**Part 3: DDGI Overview** [Dynamic Diffuse Global Illumination](index.html) 2019 May 3 Updated 2019 May 3
This part of the DDGI article describes integrated ray-traced glossy GI and dynamic diffuse GI at a level suitable for a product manager or art director to begin evaluating the technique. I describe the previous state of the art techniques and show many examples of DDGI handling particularly difficult cases.

Glossy GI
============================================================================

Environment Maps
----------------------------------------------------------------------------

Glossy global illumination effects produce the recognizable reflections and highlights seen on shiny surfaces. Since the 1970s, glossy GI has been approximated with reflection maps [#Blinn76]. The core technique has been expanded over time with various clever parallax distortions and for surfaces of varying roughness. These are also known as environment maps, environment probes, radiance probes, and light probes. These are distinct from the "irradiance probes" that I'll mention for diffuse GI shortly.

Except for car racing games, which often render a single reflection probe in real-time, most real-time programs use baked (precomputed) probes. This means that dynamic objects and lighting conditions cannot affect reflections.

Some tricks such as rendering mirrors to texture, rendering the scene inverted, and grabbing distorted samples from the screen or skybox have been used in specific titles for planar reflectors. But none were general purpose real-time glossy GI solutions; that requires ray tracing.

Ray Tracing
----------------------------------------------------------------------------

For the past several years, screen-space ray tracing has been used as an approximation of reflections between close objects that are both visible on the screen. This first appeared in CryEngine 3 [#Sousa11] in its modern form, and was quickly adopted and extended by many games [#Wronski14] [#Valient14] [#Stachowiak15].
I analyzed this technique in detail in a [previous research paper](http://jcgt.org/published/0003/04/04/). Screen-space ray tracing of course cannot produce glossy reflection of objects or parts of objects that are not on screen. True geometric ray tracing solves that problem [#Deligiannis19] [#Qian19]. Today, many games use at least one and sometimes both of these techniques simultaneously.

![Geometric and screen-space ray traced glossy GI in recent games.](x3-glossy-games.jpg width=80%)

Algorithm
------------------------------------------------------------------------

Glossy GI on a perfect mirror is straightforward to render. For mirrors, the shaded value at the ray hit is the glossy GI value. For rough surfaces that produce blurred reflections, there are two choices. You can trace stochastic rays distributed according to the roughness and then blur out the noise, or trace perfect mirror rays and then blur the mirror reflection. Both approaches work with both screen-space and geometry ray tracing.

![Glossy GI by ray tracing, deferred shading and blurring to MIP maps, and then sampling per pixel.](x3-glossy-steps.jpg)

The figure above shows glossy GI rendered by this process.

*Step 1* traces a G-buffer that was half the vertical resolution of the screen by generating a mirror reflection ray at each pixel. I halved the resolution to double performance. I chose to do so vertically because most reflections for this content were on the floor and are thus vertically blurred anyway in screen space. I then ran the regular deferred shading code on the mirror G-buffer. The default [G3D](https://casual-effects.com/g3d) deferred shader can reconstruct hit position and view direction from the z-buffer and pixel coordinate (as most deferred shaders do), or by explicitly reading them from buffers. For this case, I passed the ray hit and direction buffers because the shading should be in the reflected direction and not towards the camera.
G3D has persistent shadow maps, so the shading was able to compute direct illumination shadows from those. In an engine such as Unreal that uses transient shadow maps, it is probably best to use shadow ray casts on reflection points. The second order glossy GI and the diffuse GI on the points seen in reflection were both computed with DDGI.

*Step 2* blurs the mirror buffer [#Valient14] at progressively larger scales into a MIP map chain. The blurring respects edges using a bilateral filter. It also blurs into areas that have no reflection result at all, so that bilinear filtering in the following step will not hit black.

*Step 3* samples this glossy GI at primary surfaces. It computes the correct MIP level to sample based on the roughness of the primary surface, its distance from the camera, and the distance of the reflected object (secondary surface) from the primary surface.

I chose to blur mirror reflections instead of stochastic glossy ones because I've found it easier to avoid per-pixel flicker (also known as "specular aliasing" and "fireflies") this way. The result is slightly less physically correct.

In general, per-pixel flicker is the main image quality challenge with glossy GI. Final-frame TAA can both help and hurt: it fixes some aliasing, but it has to deal with motion vectors that are inconsistent on the reflected object versus the reflective surface, and the TAA camera jitter also reveals aliasing on bright highlights. As a result, I also run FXAA over the image and disable parallax mapping and normal mapping on surfaces seen in reflections. You could imagine using TAA on the reflection buffer itself instead of the final frame to address the motion vector problem.

As general advice, apply everything that you already know about filtering for primary rays in the glossy shader, including geometric LOD, MIP bias, and normal to roughness.
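
The mirror ray generation in Step 1 above can be sketched in a few lines. This is a minimal illustration, not the G3D implementation: the function names `reflect`, `mirror_ray`, and `trace_resolution` are hypothetical, vectors are plain tuples, and the surface normal is assumed to face the camera.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def reflect(incident, normal):
    """Mirror a direction pointing toward the surface about a unit normal."""
    d = sum(i * n for i, n in zip(incident, normal))
    return tuple(i - 2.0 * d * n for i, n in zip(incident, normal))

def mirror_ray(camera_pos, surface_pos, surface_normal):
    """Origin and direction of the mirror reflection ray at a primary surface point."""
    view_dir = normalize(tuple(s - c for s, c in zip(surface_pos, camera_pos)))
    return surface_pos, reflect(view_dir, normalize(surface_normal))

def trace_resolution(width, height):
    """Half vertical resolution for the reflection trace, as described in Step 1."""
    return width, height // 2

if __name__ == "__main__":
    origin, direction = mirror_ray((0.0, 1.0, -1.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    print(origin, direction)
    print(trace_resolution(1920, 1080))
```

In a real renderer this runs in a ray generation shader and the origin is offset slightly along the normal to avoid self-intersection; that offset is omitted here.
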

On RTX 2080 Ti for this scene with a few million polygons, the complete glossy GI pass cost between 1 and 2 ms depending on viewpoint, at 1920x1080 resolution. This includes the amortized cost of BVH refit. Mixing in less expensive screen-space reflection for very nearby objects and environment probes for very distant ones can roughly halve this cost by shortening rays. Checkerboard rendering instead of half-resolution may improve image quality at the risk of more horizontal aliasing, or it could be combined with half-resolution and/or DLSS for lower cost, especially on less powerful GPUs.

Previous Diffuse GI
===============================================================

Dynamic glossy GI was solved by ray tracing several years ago. Since then, the field has been working on handling off-screen objects (via accelerated geometry ray tracing and DXR), increasing performance (again, DXR), and avoiding flicker and noise (an ongoing process).

Diffuse GI has never had a robust, dynamic, and efficient solution. However, there have been many good previous ideas and good results for specific applications and scenes.

Strategies
---------------------------------------------------------------

Some strategies that real-time renderers have previously used for diffuse global illumination are:

- *Light maps* [#Quake97] [#Mitchell06]
- *Irradiance probes/voxels* [#Greger98] [#Tatarchuk05] [#Ramamoorthi11] [#Gilabert12]
- *Virtual point lights* [#Keller97] [#Kaplanyan10] [#Ding14] [#Xu16] [#Sassone19] [#White19]
- *Reflective shadow maps* [#Dachsbacher05] [#Kaplanyan10] [#Ding14] [#Malmros17] [#Xu16] [#White19]
- *Light propagation volumes* [#Kaplanyan09] [#Kaplanyan10]
- *Sparse voxel cone tracing* [#Crassin11] [#McLaren16]
- *Denoised ray tracing* [#Mara17] [#Schied17] [#Metro19] [#Archard19]

Baked light maps and light probes are the dominant techniques right now for DX11-class games and are supported by most game engines.
The [Enlighten](https://www.siliconstudio.co.jp) middleware can also update light maps and probes at runtime using simplified models. It has been used in several games and is part of the inspiration for DDGI. Similar techniques have been employed by some other renderers for slowly updating data structures.

In addition to this list, there are many other research techniques for real-time GI. The methods that I listed above have been documented as shipping in games. The citations are representative but not intended to be comprehensive. See [_Real-Time Rendering_](http://www.realtimerendering.com/) 4th edition for a complete survey of techniques.

I'll focus the evaluation here on classic irradiance probes because that is the previous technique that DDGI accelerates and upgrades. The limitations of these classic irradiance probes are representative of the problems that most of the above techniques experience.

Classic Irradiance Probes
----------------------------------------------------------------------

The technique of sampling and storing the irradiance field at sparse locations goes back to 1998 [#Greger98] and is supported by most engines today. The engine fills the scene with small probes that measure and store diffuse GI. This is mathematically [irradiance](https://en.wikipedia.org/wiki/Irradiance), the integral of incident radiance weighted by a clamped cosine: $E_\mathrm{e}(X) = \int_{\Omega} L(X, \w) \max(0, \w \cdot \n) ~ d\w$, where $\n$ is the direction in which irradiance is measured (the normal), $X$ is the point, and $L$ is the radiance function.

Irradiance probes are usually baked offline, although some engines have updated them at runtime by casting rays against very low level of detail (LOD) versions of the scene, relighting [#Gilabert12], splatting low LOD points, or rasterizing and blurring cube maps.

![Examples of classic irradiance probes in multiple engines.
Image Credits: Left [#Kaplanyan10b], Center and Right [#Asirvatham05].](x3-cascades.jpg width=80%)

We recommend 32 x 4 x 32 = 4096 probes around the camera in the highest-resolution grid cascade. These will update frequently. Use coarser cascades in space and time to scale out to big scenes. Fade out and then entirely disable visibility on very coarse cascades to halve memory consumption and shading cost, just as you would for shadow maps.

Data Structure
----------------------------------------------------------------------------

Below is the texture memory layout for one cascade. What you're seeing on the top of the figure is the irradiance data in a 2D R11G11B10F texture map, and on the bottom one channel of the visibility data in 2D RG16F format.

![Layout in memory of the packed DDGI probe data for one cascade.](x3-memory-layout.jpg width=80%)

Each of those textures contains four large squares arranged horizontally. These correspond to top views at different elevations of the Greek Villa scene. The black areas are probes that are inside of walls.

Within each of the large squares are tiny squares. Each of these is the data for a single probe. Each is a full sphere of data projected onto an octahedron and then unfolded into a square. The probe contains 6x6 irradiance values, including a 1-pixel border to enable fast bilinear interpolation, and 16x16 visibility (distance) values including their border.

At the formats and resolutions described, the irradiance data consumes 590 kB per cascade and the visibility data consumes about 4 MB. The entire cascade is thus less than 5 MB, which is smaller than a typical HDR framebuffer or sun shadow map cascade.

Algorithm
----------------------------------------------------------------------------

If your engine already has classic probe support, you can add dynamic diffuse GI by repurposing existing data paths and tools. Keep the good parts of your workflow and what you already know about probes.
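
As a check on the Data Structure numbers above, here is a small sketch that recomputes the per-cascade memory footprint and illustrates the sphere-to-square octahedral mapping. The probe counts, texel resolutions, and texture formats come from the text. The encode and decode functions follow the octahedral convention I see most often, and it is an assumption that this matches the exact texel layout in the figure. All function names are hypothetical.

```python
import math

PROBES = 32 * 4 * 32          # probes in one cascade (from the text)
IRRADIANCE_TEXELS = 6 * 6     # per probe, including the 1-pixel border
VISIBILITY_TEXELS = 16 * 16   # per probe, including the border
BYTES_R11G11B10F = 4          # 11 + 11 + 10 bits = 32 bits per texel
BYTES_RG16F = 4               # 2 channels x 16 bits = 32 bits per texel

def cascade_bytes():
    """Return (irradiance_bytes, visibility_bytes) for one cascade."""
    irradiance = PROBES * IRRADIANCE_TEXELS * BYTES_R11G11B10F
    visibility = PROBES * VISIBILITY_TEXELS * BYTES_RG16F
    return irradiance, visibility

def sign_not_zero(x):
    """Sign function that maps 0 to +1 so the fold below stays well defined."""
    return 1.0 if x >= 0.0 else -1.0

def oct_encode(v):
    """Project a unit vector onto the octahedron and unfold it to [-1, 1]^2."""
    x, y, z = v
    l1 = abs(x) + abs(y) + abs(z)
    u, w = x / l1, y / l1
    if z < 0.0:
        # Fold the lower hemisphere outward over the square's diagonals.
        u, w = (1.0 - abs(w)) * sign_not_zero(u), (1.0 - abs(u)) * sign_not_zero(w)
    return u, w

def oct_decode(u, w):
    """Invert oct_encode, returning a unit vector."""
    z = 1.0 - abs(u) - abs(w)
    if z < 0.0:
        u, w = (1.0 - abs(w)) * sign_not_zero(u), (1.0 - abs(u)) * sign_not_zero(w)
    length = math.sqrt(u * u + w * w + z * z)
    return u / length, w / length, z / length

if __name__ == "__main__":
    irradiance, visibility = cascade_bytes()
    print(irradiance, visibility, irradiance + visibility)
    print(oct_decode(*oct_encode((0.0, 0.0, -1.0))))
```

The byte counts work out to 589,824 for irradiance and 4,194,304 for visibility, which agrees with the 590 kB and roughly 4 MB quoted above.
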

![The DDGI algorithm at a high level.](x3-algorithm.jpg width=80%)

The DDGI algorithm is similar to the glossy GI algorithm sketched earlier in this part. It has three steps: trace, update probes by blending, and then shade the points visible on screen.

*Step 1* generates 100-300 rays for each probe (192 is a good default) and traces them through the scene. It generates a G-buffer from these hits and runs the standard deferred shader on them. We packed the probe rays and G-buffer so that they fit into the bottom half of the screen-space buffer used for the glossy GI ray trace. This means that we can launch and shade both kinds of rays at once, which reduces overhead and achieves better scheduling. Because the diffuse GI for the points that are hit by the probe rays is provided by the probes from the previous frame, DDGI provides not just 1-bounce but _infinite_ bounce GI, converging quickly over a few frames when the scene changes.

*Step 2* updates the probes from the memory layout diagram with the new shaded data. It iterates over the probe texels, and for each, gathers all ray hitpoints that affect it and blends those in. The blending uses a *hysteresis* value so that new and old data are combined. This, along with some careful math for the visibility data, ensures that results smoothly change without requiring explicit history. This is similar to how TAA and other kinds of exponentially-weighted moving averages (EWMA) operate. Hysteresis values of 90% to 99.5% (of the previous frame) are viable; lower gives faster update but can flicker. We recommend 97% hysteresis for the general case and have some adaptive filters for special cases described in the optimization part of this article.

*Step 3* samples the probes per pixel, per frame. This can be done for deferred shading, forward shading, some mixture of them, and even for volumetrics and glossy ray hits. Step 3 is extremely fast.
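
To make the hysteresis blend from Step 2 concrete, here is a minimal sketch of an exponentially-weighted moving average on a single scalar texel. It is a simplification under stated assumptions: a real probe update blends a weighted gather of many ray hits per texel and treats visibility data separately, and the names `blend` and `updates_to_settle` are hypothetical.

```python
import math

def blend(previous, new_sample, hysteresis=0.97):
    """Keep `hysteresis` of the stored value and mix in the remainder from the new sample."""
    return hysteresis * previous + (1.0 - hysteresis) * new_sample

def updates_to_settle(hysteresis, residual=0.05):
    """Probe updates until only `residual` of a step change in lighting remains.

    After n updates the old value's weight is hysteresis ** n.
    """
    return math.ceil(math.log(residual) / math.log(hysteresis))

if __name__ == "__main__":
    texel = 0.0                      # probe texel starts dark
    for _ in range(updates_to_settle(0.97)):
        texel = blend(texel, 1.0)    # the light switches on and stays on
    print(texel)
    for h in (0.90, 0.97, 0.995):
        print(h, updates_to_settle(h))
```

This shows the tradeoff described above: lower hysteresis converges in fewer updates, but each noisy sample then has more weight, which is where flicker comes from.
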
The main costs of Step 3 are 16 bilinear fetches per sample, which are nearly perfectly coherent and so always in cache, and at most nine division operations. It is also the only step that is proportional to the refresh rate or screen resolution.

The more computationally intensive steps 1 and 2 are _independent_ of frame rate and resolution. This allows you to scale the per-frame time cost of DDGI to whatever fits your frame budget. We used 1-2 ms for our prototype on RTX 2080 Ti at 60 Hz 1920x1080. This gives about 100 ms of latency on the indirect light, which is often imperceptible as it is in world space, not screen space.

For lower end cards, the indirect light has more latency but the image quality is identical when nothing is moving. While a better GPU definitely gives a more impressive result as the scene changes, you'll have next-generation lighting everywhere.

When combined with glossy GI ray tracing, a high-end GPU can match offline path traced quality for the global illumination using DDGI. It will also scale to future GPUs. Increase probe density, probe visibility information resolution, or cascade grid extent in "ultra" mode to sharpen indirect light shadows and provide even better lighting in the distance.

On legacy platforms where ray tracing is too expensive, you can even set the latency to "infinity" and use baked DDGI probes. There the GI will not capture dynamic light sources or geometry, but it will still improve your artist lighting workflow. So, you'll get previous generation lighting quality with a lot less production cost.

Evaluation
----------------------------------------------------------------------------

### Diffuse Interreflection

The first test one always runs on global illumination is the real 3D version of the toy colored box example that I described in the previous part.
Here's the "Cornell Box" on the left, showing correct color bleeding of the red and green walls onto the white surfaces:

![Two colored boxes showing accurate diffuse interreflection with DDGI.](x3-accurate.jpg width=80%)

On the right is the orange-and-cyan box that I previously described, now with a statue in the middle. You can see that the left side of the statue is colored orange by the diffuse indirect light and the right side is colored by the cyan indirect light. There is _no_ direct illumination on the statue at all.

### Accurate and Noise-free

One of the strengths of DDGI is that there is no per-pixel noise, and thus no denoising pass or ghosting. The images below compare an offline path tracer with 256 paths per pixel on the left to real-time DDGI on the right. In each case, there is a white box containing only a white spotlight and a red dragon. When the dragon is on the right side of the box, the box is mostly white because the spotlight hits the white floor first with direct illumination. When the dragon moves to the left side of the box, everything turns red because all box surfaces are lit only by indirect light bouncing off the dragon. This is an extreme dynamic diffuse lighting case.

![A white box containing a white spotlight and red dragon.](x3-noise-free.jpg width=80%)

It is usually acceptable for a real-time method to give slightly different results from an offline reference path tracer. However, we need to know what those differences are in order to art direct around them and design content appropriately. For this test, you can see that DDGI is very close to the path traced reference image. There is a slight loss of contrast inside of the dragon's mouth for DDGI, which we would recover with screen-space radiosity or screen-space ambient occlusion.

There is one other difference: DDGI doesn't have all of the noise from path tracing. In this aspect, the real-time method is actually better than path tracing.

### Dynamic Lighting

Dynamic lighting, especially time of day changes, is one of the primary features of a global illumination system. Players and designers both want lighting cycles for games to affect not just direct sunlight but also all of the indirect lighting and any local geometric changes.

The following figure shows two different locations each at two times of day. In each, the lighting shifts from mostly direct to mostly diffuse global illumination as the sun sets and shadows shift. Note that all of the diffuse GI matches both the changing sun and sky colors as well as the overall intensity and direction of lighting.

Unlike interpolation between baked probes, no additional storage space is required for these transitions and they are continuous. During sunset, the GI colors rapidly shift through many different shades that could not be captured by a small number of baked probe variations.

![Dynamic time of day.](x3-time-of-day.jpg width=80%)

### Dynamic Geometry

Dynamic geometry is the most difficult case for probes. There is no place to put the probes where dynamic geometry can't intersect them during gameplay. Both irradiance and visibility are constantly changing due to moving objects. The worst situation is when a character stands right where a probe is, suddenly making that probe go dark and leaking shadow everywhere. Previous real-time diffuse GI approximations have to put probes high in the air in the hope that characters won't walk through them.

We created a really hard test that defeats that strategy: a hundred bright beach balls tumbling into a courtyard using physical simulation (here PhysX on the GPU). The balls are constantly passing around and through the probes, so there is no safe place to hide the probes. In the video you can see that DDGI correctly computes the visibility at every frame, so even though the balls are bouncing chaotically there are no shadow leaks.
You can also see that DDGI correctly scatters red and yellow diffuse GI from the balls onto the walls and reflects light from the walls back onto the balls.

![Dynamic geometry video. Captured in real time.](https://www.youtube.com/watch?v=mgJPQvcbVOI)

![Still frames from the dynamic geometry test case.](x3-dynamic.jpg)

### Realistic

Here are three complex scenes that were originally designed for _offline_ rendering. These show DDGI and ray traced glossy GI interacting correctly. The left column shows the direct illumination contribution only. The right column shows the result with full global illumination. All run in 1-2 ms per frame.

![Greek exterior scene with reflecting pool. Note that the interior and inside ceilings of the temple are completely lit by indirect light.](x3-realistic.jpg)

![Living room lit by two interior lamps and sunlight through large windows. Note the subtle GI effects on the table top and couch as well as the obvious GI on the ceiling and mirror.](x3-realistic-livingroom.jpg)

![This bathroom is entirely lit by GI; the only source is the emissive skybox seen through the window. The glossy reflection in the mirror is only visible because the _secondary_ surfaces are also lit by DDGI.](x3-realistic-bathroom.jpg)

### Large Scenes

DDGI can use a variety of techniques for scaling to large scenes within a fixed time budget, including sleeping, priority scheduling, and cascades. To show performance even under brute force, for this test we loaded a 1 km^2 scene and filled it with 16,000 probes. We then updated _every_ probe every frame on GeForce RTX 2080 Ti. This takes about 4 ms per frame for DDGI and is able to handle dynamic lighting and geometry across this large area.

![Large scenes](x3-large-scenes.jpg)

With proper cascades, that runtime can be limited to 2 ms on this GPU and can span a practically unlimited view distance due to the exponential range in the number of cascades.
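
To illustrate why a handful of cascades can cover a very long view distance, here is a rough sketch of cascade coverage. Only the 32 x 4 x 32 grid comes from this article. The 1 m base probe spacing and the doubling of spacing per cascade are assumptions chosen for illustration, and the function names are hypothetical.

```python
GRID = (32, 4, 32)  # probes per axis in each cascade (from the text)

def cascade_extents(base_spacing, num_cascades, scale=2.0):
    """World-space size of each cascade, assuming spacing grows by `scale` per level."""
    extents = []
    for k in range(num_cascades):
        spacing = base_spacing * scale ** k
        extents.append(tuple(n * spacing for n in GRID))
    return extents

def cascades_needed(base_spacing, view_distance, scale=2.0):
    """Smallest cascade count whose coarsest level reaches `view_distance` from the camera."""
    count = 1
    # Half of the horizontal extent is the reach from a camera at the grid center.
    while 0.5 * GRID[0] * base_spacing * scale ** (count - 1) < view_distance:
        count += 1
    return count

if __name__ == "__main__":
    count = cascades_needed(1.0, 1000.0)
    print(count, count * GRID[0] * GRID[1] * GRID[2])
    print(cascade_extents(1.0, count)[-1])
```

Under these assumptions the probe count grows linearly with the number of cascades while the covered distance grows exponentially, so a brute-force grid over the same area needs far more probes at the same near-camera density.
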

### Avoids Leaks

The core of DDGI is of course computing visibility per probe sample to avoid leaks. Without that, probes could not be dynamic or enable an efficient lighting workflow. Here is a targeted test case to compare leaking between classic irradiance probes and DDGI. Each uses the same resolution and probe locations.

I built this simple house, which is lit by a white directional light. The ground plane is orange so that diffuse GI will be obviously tinted orange to distinguish it from leaks. The colored spheres show the probe locations in this image:

![House test scene exterior. Spheres indicate probe locations and are artificially color coded and numbered for debugging purposes.](x3-avoid-leaks-overview.jpg width=80%)

Inside, the house is divided into two rooms. One room has the open doorway to outside, allowing light to flow in. The other is completely sealed. The open room should have subtle, multi-bounce diffuse GI. The sealed room should be black everywhere. Here is the interior of the open room.

![Interior open room, showing correct multibounce DDGI.](x3-avoid-leaks-inside.jpg width=80%)

Note that the setup corresponds to the case where Hooker [#Hooker16] showed light leaking in an early _Call of Duty_ build (they fixed this by artist intervention and some clever tools). They had light from the bright sun hitting the roof and leaking onto the interior ceiling. DDGI eliminates this light leak. There is some bright light on the ceiling in the DDGI image--that is the orange light that bounced off the ground and correctly lit the ceiling!

To evaluate both rooms, I moved the camera to a bird's eye top view and lowered the near clipping plane. This cuts the roof off the building in the camera's view, but leaves it present in the lighting simulation so that we can see inside without disturbing the scene.

![Classic probes (left) leak light and shadow, which DDGI (right) avoids.](x3-avoid-leaks-house.jpg width=80%)

On the left are classic light probes.
They leak significant light into both rooms of the house, and also leak shadow out through the wall at the bottom of the image. On the right side, the house lit by DDGI has no light or shadow leaks. The sealed room is 100% black.

The following figure shows a more complex scene of a cathedral. It is entirely lit by the sun, which is very bright on the exterior, and reflects white off interior walls or red off interior carpet (not visible in these angles). The classic light probes on the left give an attractive image, but it is wrong. Most of the light here is actually leaking through the exterior wall. In some places, probes inside walls are also leaking darkness. DDGI corrects both.

![Left: light leaks from classic probes. Right: the leaks corrected by DDGI.](x3-avoid-leaks-sibenik.jpg width=80%)

Tilting the camera up to the inside of the cupola shows even more obvious leaks from probes on the exterior, which DDGI also fixes:

![Cupola view with exterior light leaks fixed by DDGI.](x3-avoid-leaks-dome.jpg width=80%)

### Video Walkthrough

Here's a video of DDGI and individual terms in motion from a live recording of an early Feb 2019 build.

![GDC'19 Video](https://www.youtube.com/watch?v=_gkLDuuEVMI)

The following parts of this article describe the latest best practices, which have slightly improved performance and image quality and significantly improved indirect lighting lag since the build captured in this video.

Summary
===================================================================

I've now given an overview of the previous state of the art and a complete system for next-generation global illumination using ray traced glossy GI and Dynamic Diffuse Global Illumination. I also showed an evaluation of the image quality on some targeted test cases.

This should provide enough information to make a decision on whether the DDGI solution is a good fit for your real-time diffuse lighting needs.
If it is, implementors will want to read the next three parts of this article. They describe the mathematics of the key algorithms, provide a detailed implementation guide, and then give advice on optimizing for performance and quality in production.