How can we apply the concepts of frameless rendering to a double-buffered system?
If you'd like to really dive into what frameless rendering is, there's a short paper on it here!
If you're really so lazy you don't want to read a two-page paper, the TLDR is that instead of rendering an entire frame and then uploading it to a presentation buffer to display on the screen, you randomly sample and update pixels as soon as they are ready. This style of rendering offers some advantages, primarily that we can reduce the delay between user input and a result on the screen, since we don't need to wait for an entire swapchain of images to render before seeing our updated product. Additional advantages are that there is less screen tearing, and there can even be some implicit motion blur effects!
So considering this technique, what is the objective? Fundamentally, the Vulkan pipeline that I've constructed still sends out batches of work for each "frame"; the system is still a "double-buffered" renderer, where an entire frame is calculated and then swapped to a presentation image. Frameless rendering, strictly speaking, does not really apply to a GPU-based system that batches work groups together and sends them to the GPU all at once. If we were to simply select random pixels to update and present one at a time, the overhead of moving each tiny workload to the GPU and reading it back would make it slower than just doing the work on the CPU. Not to mention, individual GPU cores are weaker than CPU cores, so the calculation itself would also be slower! The optimal way to use a GPU, at the end of the day, is to batch together groups of work and send them all at once.
So why try to apply this to my renderer? Well, the overall objective for this raytracing project of mine is to make a 3D modeler/renderer that can be viewed in real time. Depending on the number of triangles and overall workload, it becomes quite easy for the renderer's FPS to dip below 60. However, if we reduce the number of rays traced each frame by borrowing the ideas behind frameless rendering, we can keep the FPS at a visually smooth level, which looks better than typical frame skipping. While the system is still double-buffered, as long as the FPS stays at the monitor's refresh rate, there's no visual difference between CPU-based frameless rendering and what will be our GPU-based approximation of it.
Our overall objective is to update only a random sample of pixels in each of our frames so as to reduce the number of traces we have to perform. To do this, we could simply update a given pixel in our compute shader with some fixed probability p. However, this can lead to some unwanted behavior, such as lagging pixels that don't update for a significant amount of time due to an unlucky roll or a quirk in our chosen random function. To accommodate this, we should instead calculate the probability that our pixel should have been updated by now, given how long it has been since its last update. To do this, we redefine p as the chance that the pixel will update within one second, and compare our random draw against 1.0 - pow(1.0 - p, time_passed). When we do this, the output looks a little like this:
Frameless Rendering a Julia Set SDF on 400x225
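For concreteness, here's a rough sketch of what that per-pixel update check might look like in a GLSL compute shader. Everything here is illustrative rather than my actual pipeline: the bindings, the hash-based random, and the tracePixel stand-in are all placeholders.

```glsl
#version 450
// Sketch only: bindings, names, and the hash are illustrative stand-ins,
// not the real pipeline layout.
layout(local_size_x = 8, local_size_y = 8) in;
layout(rgba8, binding = 0) uniform image2D outImage;
layout(std430, binding = 1) buffer LastUpdate { float lastUpdate[]; };
layout(binding = 2) uniform Params { float time; float p; uint width; uint height; };

// Cheap per-pixel, per-frame hash mapped to [0, 1).
float rand(uvec2 pixel, float t) {
    uint h = pixel.x * 374761393u + pixel.y * 668265263u + uint(t * 4096.0);
    h = (h ^ (h >> 13u)) * 1274126177u;
    return float(h & 0x00FFFFFFu) / 16777216.0;
}

// Stand-in for the real work: the actual shader traces a ray / marches the SDF here.
vec3 tracePixel(uvec2 pixel) {
    return vec3(float(pixel.x) / float(width), float(pixel.y) / float(height), 0.5);
}

void main() {
    uvec2 pixel = gl_GlobalInvocationID.xy;
    if (pixel.x >= width || pixel.y >= height) return;
    uint idx = pixel.y * width + pixel.x;

    // Chance that this pixel should have updated by now, given that p is
    // the probability of a pixel updating within one second.
    float timePassed = time - lastUpdate[idx];
    float threshold = 1.0 - pow(1.0 - p, timePassed);

    if (rand(pixel, time) >= threshold) {
        return; // skip the trace; whatever was in this image stays put
    }
    lastUpdate[idx] = time;

    vec3 color = tracePixel(pixel);
    imageStore(outImage, ivec2(pixel), vec4(color, 1.0));
}
```

The important bit is that the threshold grows the longer a pixel goes without an update, so an unlucky pixel can't stay stale forever.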
While this looks like it's working at the moment, a glaring issue becomes apparent when we put the renderer under load and the FPS goes down - there's some flickering between pixels. This is because each "frame" in our swapchain is randomly sampled and modified individually. Since the geometry changes between frames, pixels flicker back and forth between what they looked like at different points in time. The effect is even more apparent when we make our swapchain 20 images long - you can visually see a "rewind" effect as the swapchain loops through its images and reveals pixels recorded further in the past:
The Rewind Effect
To fix this, instead of simply not updating a pixel, we have to update it to be a transparent pixel. Then we can blend the current swapchain image into a single final image so that only the most recently updated pixels are displayed at any given point in time.
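Concretely, the skip branch in the update shader now writes vec4(0.0) instead of returning early, and a small composite pass blends the per-frame image into a persistent final image. A hedged sketch of that composite pass, again with illustrative binding names:

```glsl
#version 450
// Sketch of the composite pass: the per-frame image holds alpha = 0 for pixels
// that were skipped this frame, so only freshly traced pixels affect the result.
layout(local_size_x = 8, local_size_y = 8) in;
layout(rgba8, binding = 0) uniform readonly image2D frameImage; // this frame's sparse samples
layout(rgba8, binding = 1) uniform image2D finalImage;          // persistent accumulated image

void main() {
    ivec2 pixel = ivec2(gl_GlobalInvocationID.xy);

    vec4 src = imageLoad(frameImage, pixel);
    vec4 dst = imageLoad(finalImage, pixel);

    // Standard "over" blend: alpha 0 leaves the final image untouched,
    // alpha 1 fully replaces it with this frame's sample.
    vec3 blended = mix(dst.rgb, src.rgb, src.a);
    imageStore(finalImage, pixel, vec4(blended, 1.0));
}
```

The same effect could presumably also be achieved with fixed-function alpha blending when drawing the frame image over the final image; doing it in a compute pass just keeps everything in the same style of pipeline.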
Finally, just to add a finishing touch, we can make it so an updated pixel is written at only 10% opacity, so new samples blend smoothly into the final image over several frames.
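In shader terms that's a one-line change: new samples get written with a low alpha instead of a fully opaque one, and the composite pass above does the rest (the 0.1 is just a tunable constant, and outImage/color follow the illustrative names from the earlier sketch):

```glsl
// Write freshly traced pixels at 10% opacity so they ease into the final image
// over several frames instead of popping in all at once.
imageStore(outImage, ivec2(pixel), vec4(color, 0.1));
```

Putting this all together, we get something that looks like this: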
Final Render
While the 20 FPS GIF doesn't quite do it justice, the smooth blending effect really gives it a nice sense of motion blur, which looks phenomenal on a 4D fractal rendering such as the Julia set!
As for performance results, it's all very dependent on the geometry, which features are enabled, and so on. But when rendering the Stanford Happy Buddha model (just over 1 million triangles) at 1600x900, my render time goes from 78 milliseconds to 24 when I set p to 0.5! That's a 3.25x speedup! If I set p any lower than 0.5, pixels update a bit too slowly for comfort, but that kind of speedup is definitely significant enough to consider using this technique when the renderer starts bottlenecking.