I will now describe our Screen Space Ambient Occlusion algorithm in greater detail. Depending on how the final compositing is to be done, this can be accomplished in one of two ways. The first method requires that the scene be rendered twice. The first pass will render the depth and normal data only. The SSAO algorithm can then generate the ambient occlusion output in an intermediate step, and the scene can be rendered again in full color.
With this approach, the ambient occlusion map in screen space can be sampled by direct lights from the scene to have their contribution modulated by the ambient occlusion term as well, which can help make the contributions from direct and indirect lighting more coherent with each other.
This approach is the most flexible but is somewhat less efficient because the geometry has to be passed to the hardware twice, doubling the API batch count and, of course, the geometry processing load.
A different approach is to render the scene only once, using multiple render targets bound as output to generate the depth and normal information as the scene is first rendered without an ambient lighting term. SSAO data is then generated as a post-step, and the ambient lighting term can simply be added. This is a faster approach, but in practice artists lose the flexibility to decide which individual lights in the scene may or may not be affected by the ambient occlusion term, should they want to do so.
Using a fully deferred renderer and pushing the entire scene lighting stage to a post-processing step can get around this limitation to allow the entire lighting setup to be configurable to use ambient occlusion per light. Whether to use the single-pass or dual-pass method will depend on the constraints that are most important to a given graphics engine.
In all cases, a suitable format must be chosen to store the depth and normal information. When supported, a 16-bit per-channel floating-point format will be the easiest to work with, storing the normal in the red, green, and blue channels and depth in the alpha channel.
Screen Space Ambient Occlusion is very bandwidth intensive, and minimizing sampling bandwidth is necessary to achieve optimal performance. Moreover, if using the single-pass multi-render target approach, all bound render targets typically need to be of the same bit depth on the graphics hardware.
To minimize bandwidth and storage, the depth and normal can be encoded in as little as a single 32-bit RGBA color, storing the x and y components of the normal in the 8-bit red and green channels while packing a 16-bit depth value into the blue and alpha channels (Listing 1); a sketch of one such packing follows.
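A minimal C++ sketch of one way such a packing could work; the quantization details, byte order, and the sign used when reconstructing the normal's z component are assumptions of this sketch, not the chapter's actual code:

```cpp
#include <cstdint>
#include <cmath>

// Pack a view-space normal's x/y and a normalized depth into 8/8/16 bits.
struct PackedRGBA8 { uint8_t r, g, b, a; };

PackedRGBA8 packNormalDepth(float nx, float ny, float depth01)
{
    PackedRGBA8 p;
    p.r = (uint8_t)((nx * 0.5f + 0.5f) * 255.0f); // [-1,1] -> [0,255]
    p.g = (uint8_t)((ny * 0.5f + 0.5f) * 255.0f);
    uint16_t d = (uint16_t)(depth01 * 65535.0f);  // [0,1] -> 16 bits
    p.b = (uint8_t)(d >> 8);                      // high byte in blue
    p.a = (uint8_t)(d & 0xFF);                    // low byte in alpha
    return p;
}

void unpackNormalDepth(PackedRGBA8 p,
                       float& nx, float& ny, float& nz, float& depth01)
{
    nx = (p.r / 255.0f) * 2.0f - 1.0f;
    ny = (p.g / 255.0f) * 2.0f - 1.0f;
    // The normal is unit length and faces the viewer, so z can be rebuilt;
    // the sign depends on the engine's view-space convention.
    nz = -std::sqrt(std::fmax(0.0f, 1.0f - nx * nx - ny * ny));
    depth01 = (float)((p.b << 8) | p.a) / 65535.0f;
}
```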
At any visible point on a surface on the screen, we need to explore neighboring points to determine whether they could occlude our current point. Multiple samples are thus taken from neighboring points in the scene using a filtering process described by the HLSL shader code in Listing 1.
From these three pieces of information, the 3D position of the point can be reconstructed using the shader code shown in Listing 1. Transforming each offset vector by a matrix can be expensive; one alternative is to take the dot product between the offset vector and the normal vector at that point and to flip the offset vector if the dot product is negative, as shown in Figure 1.
This is a cheaper way to solve for the offset vectors without doing a full matrix transform, but it has the drawback of using fewer samples when samples are rejected for falling behind the plane of the surface at the point p; samples behind the surface are simply flipped over to stay within the hemisphere, as sketched below. Sampling the depth buffer at each neighboring point gives us the depth di of the frontmost object along the ray that connects the eye to that point, and from this we can establish whether a solid object likely occupies the space there.
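A minimal C++ sketch of that flipping test, with illustrative vector types:

```cpp
struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// If the offset points away from the surface normal at p, mirror it so the
// sample stays inside the visible hemisphere instead of being wasted.
Vec3 constrainToHemisphere(Vec3 offset, const Vec3& normal)
{
    if (dot(offset, normal) < 0.0f) {
        offset.x = -offset.x;
        offset.y = -offset.y;
        offset.z = -offset.z;
    }
    return offset;
}
```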
How do we determine ambient occlusion? The depth di gives us some hints as to whether a solid object occupies the space at each of the sampled neighboring points. We can devise some reasonable heuristics from the information we do have and use a probabilistic method. The further in front of the sample point the stored depth is, the less likely it is that an object occupies that space. Also, the greater the distance between the point p and the neighbor point, the smaller the occlusion, as the object covers a smaller part of the hemisphere.
Thus, we can derive our occlusion heuristics from:
- the difference between the sampled depth di and the depth of the point qi
- the distance between p and qi
For the first relationship, we can formulate an occlusion function that maps depth deltas to occlusion values. If the aim is to be physically correct, then the occlusion function should be quadratic.
In our case we are more concerned with letting our artists adjust the occlusion function, and thus the occlusion function can be arbitrary. Really, it can be any function that adheres to the following criteria:
- Negative depth deltas give zero occlusion: the occluding surface is behind the sample point.
- Smaller depth deltas give higher occlusion values.
- The occlusion value must fall back to zero beyond a certain depth delta, as the object is too far away to occlude.
For our implementation, we simply chose a linearly stepped function that is entirely controlled by the artist; a graph of our occlusion function is shown in Figure 1. There is a full-occlusion threshold below which every positive depth delta receives complete occlusion of one, and a no-occlusion threshold beyond which no occlusion occurs. Depth deltas between these two extremes fall off linearly from one to zero, and the result is raised to a specified occlusion power. A sketch of this falloff appears below.
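A C++ sketch of that falloff; the threshold and power parameter names are illustrative:

```cpp
#include <algorithm>
#include <cmath>

// Artist-driven occlusion falloff: full occlusion below one threshold, none
// beyond another, and a linear ramp in between raised to a tunable power.
float occlusionFromDelta(float depthDelta,
                         float fullOcclusionThreshold,
                         float noOcclusionThreshold,
                         float occlusionPower)
{
    if (depthDelta <= 0.0f)
        return 0.0f;                        // occluder is behind the sample
    if (depthDelta <= fullOcclusionThreshold)
        return 1.0f;                        // complete occlusion
    float t = 1.0f - (depthDelta - fullOcclusionThreshold) /
                     (noOcclusionThreshold - fullOcclusionThreshold);
    t = std::clamp(t, 0.0f, 1.0f);          // linear falloff from one to zero
    return std::pow(t, occlusionPower);     // artist-tunable shaping
}
```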
If a more complex occlusion function is required, it can be pre-computed in a small 1D texture to be looked up on demand.

Figure: SSAO blocker function.

Sampling Randomization

Sampling neighboring pixels at regular vector offsets will produce glaring artifacts to the eye, as shown in Figure 1.
Figure: SSAO without random sampling.

To smooth out the results of the SSAO lookups, the offset vectors can be randomized. A good approach is to generate a 2D texture of random normal vectors and perform a lookup on this texture in screen space, thus fetching a unique random vector per pixel on the screen, as illustrated in Figure 1. We have n neighbors we must sample, and thus we will need to generate a set of n unique vectors per pixel on the screen.
These will be generated by passing a set of offset vectors in the pixel shader constant registers and reflecting these vectors through the sampled random vector, resulting in a semi-random set of vectors at each pixel, as illustrated by Listing 1. The set of vectors passed in as registers is not normalized; having varying lengths helps to smooth out the noise pattern and produces a more even distribution of the samples inside the occlusion hemisphere.
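The reflection is the standard mirror formula, which is what HLSL's reflect() intrinsic computes; a small C++ sketch, assuming the fetched random vector is unit length:

```cpp
struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Reflect a constant offset vector through the per-pixel random vector:
// r = v - 2 * dot(v, n) * n, yielding a semi-random direction per pixel.
Vec3 reflectThrough(const Vec3& v, const Vec3& rnd)
{
    float d = 2.0f * dot(v, rnd);
    return { v.x - d * rnd.x, v.y - d * rnd.y, v.z - d * rnd.z };
}
```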
The offset vectors must not be too short, however, to avoid clustering samples too close to the source point p. In general, varying the offset vectors from half to full length of the occlusion hemisphere radius produces good results.
The size of the occlusion hemisphere becomes a parameter controllable by the artist that determines the size of the sampling area.

Figure: Randomized sampling process.

With wider sampling areas, however, a further blurring of the ambient occlusion result becomes necessary. The ambient occlusion results are low frequency, and losing some of the high-frequency detail due to blurring is generally preferable to the noisy result obtained by the previous steps.
Figure: SSAO term after random sampling is applied; applying blur passes will further reduce the noise to achieve the final look.
To smooth out the noise, a separable Gaussian blur can be applied to the ambient occlusion buffer. However, the ambient occlusion must not bleed through edges onto objects that are physically separate within the scene, so a form of bilateral filtering is used. This filter samples the nearby pixels as a regular Gaussian blur shader would, yet the normal and depth for each of the Gaussian samples are sampled as well.
Encoding the normal and depth in the same render targets presents significant advantages here. If the depth from the Gaussian sample differs from the center tap by more than a certain threshold, or the dot product of the Gaussian sample and the center tap normal is less than a certain threshold value, then the Gaussian weight is reduced to zero. The sum of the Gaussian samples is then renormalized to account for the missing samples.
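A C++ sketch of one horizontal pass of such an edge-aware blur; the Sample layout, thresholds, and weight table are assumptions for illustration:

```cpp
#include <cmath>
#include <cstdlib>

struct Sample { float ao, depth, nx, ny, nz; };

// Blur one pixel of a row. gaussWeights holds radius+1 one-sided weights;
// the caller must guarantee center +/- radius stays inside the row.
float bilateralBlurPixel(const Sample* row, int center, int radius,
                         const float* gaussWeights,
                         float depthThreshold, float normalThreshold)
{
    const Sample& c = row[center];
    float sum = 0.0f, weightSum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        const Sample& s = row[center + i];
        float w = gaussWeights[std::abs(i)];
        // Zero the weight across depth or normal discontinuities.
        if (std::fabs(s.depth - c.depth) > depthThreshold)
            w = 0.0f;
        if (s.nx * c.nx + s.ny * c.ny + s.nz * c.nz < normalThreshold)
            w = 0.0f;
        sum += s.ao * w;
        weightSum += w;
    }
    // Renormalize to account for the rejected samples.
    return (weightSum > 0.0f) ? sum / weightSum : c.ao;
}
```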
Figure: Result of the Gaussian blur.

Handling Edge Cases

The offset vectors are in view space, not screen space, and thus the projected length of the offset vectors will vary depending on how far the point is from the viewer. This can result in using an insufficient number of samples at close-up pixels, resulting in a noisier result for these pixels. Of course, samples can also go outside the 2D bounds of the screen, and depth information outside of the screen is naturally not available.
In our implementation, we make samples that fall outside the screen return a large depth value, ensuring they never occlude any neighboring pixels. To prevent unacceptable breakdown of SSAO quality in extreme close-ups, the number of samples can be increased dynamically in the shader based on the distance of the point p to the viewer. This can improve the quality of the visual results but can result in erratic performance.
Alternatively, the 2D offset vector lengths can be artificially capped to some threshold value regardless of distance from the viewer.

Optimizing Performance

Screen Space Ambient Occlusion can have a significant payoff in terms of mood and visual quality of the image, but it can be quite an expensive effect. The main bottleneck of the algorithm is the sampling itself. The performance of the texture cache will also be very dependent on the sampling area size, with wider areas straining the cache more and yielding poorer performance.
Our artists quickly got in the habit of using SSAO to achieve a faked global illumination look that suited their purposes. This required more samples and wider sampling areas, so extensive optimization became necessary for us. One method to bring SSAO to an acceptable performance level relies on the fact that ambient occlusion is a low-frequency phenomenon.
Thus, there is generally no need for the depth buffer sampled by the SSAO algorithm to be at full-screen resolution. The initial depth buffer can be generated at screen resolution, since the depth information is generally reused for other effects, and it potentially has to fit the size of other render targets, but it can thereafter be downsampled to a smaller depth buffer that is a quarter size of the original on each side. The downsampling itself does have some cost, but the payback in improved throughput is very significant.
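The chapter does not spell out the downsampling itself; the following naive C++ sketch reduces the buffer to a quarter of its size on each side, and taking the nearest depth per 4x4 block is an assumption of this sketch rather than a prescribed policy:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reduce a width x height depth buffer to (width/4) x (height/4), keeping
// the nearest (smallest) depth in each 4x4 block. Assumes dimensions are
// multiples of four for brevity.
std::vector<float> downsampleDepth4x4(const std::vector<float>& depth,
                                      int width, int height)
{
    const int w = width / 4, h = height / 4;
    std::vector<float> out(static_cast<std::size_t>(w) * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float d = 1.0f; // farthest possible depth
            for (int sy = 0; sy < 4; ++sy)
                for (int sx = 0; sx < 4; ++sx)
                    d = std::min(d, depth[static_cast<std::size_t>(y * 4 + sy)
                                          * width + (x * 4 + sx)]);
            out[static_cast<std::size_t>(y) * w + x] = d;
        }
    return out;
}
```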
Downsampling the depth buffer also makes it possible to convert it from a wide 32-bit floating-point format to a more bandwidth-friendly 16-bit packed format.

Fake Global Illumination and Artistic Styling

If the ambient occlusion hemisphere is large enough, the SSAO algorithm eventually starts to mimic behavior seen from general global illumination; a character relatively far away from a wall could cause the wall to catch some of the subtle shadowing cues a global illumination algorithm would detect.
If the sampling area of the SSAO is wide enough, the look of the scene changes from darkness in nooks and crannies to a softer, ambient feel. This can pull the art direction in two somewhat conflicting directions: on the one hand, the need for tighter, high-contrast occluded zones in deeper recesses, and on the other hand, the desire for the larger, softer, ambient look of the wide-area sampling.
One approach is to split the SSAO samples between two different sets of SSAO parameters: some samples are concentrated in a small area with a rapidly increasing occlusion function (generally a quarter of all samples), while the remaining samples use a wide sampling area with a gentler function slope. The two sets are then averaged independently, and the final result uses the value from the set that produces the darkest occlusion.
This is the approach that was used in StarCraft II.

Figure: SSAO with different sampling-area radii.

The edge-enhancing component of the ambient occlusion does not require as many samples as the global-illumination component, so a quarter of the samples can be assigned to crease enhancement while the remainder are assigned to the larger sampling area; a sketch of the combination step follows.
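A sketch of the combination, assuming the two sets' occlusion values have already been gathered; the names and counts are illustrative:

```cpp
// Average each sample set separately, then keep whichever set produces the
// strongest (darkest) occlusion, where 1.0 means fully occluded.
float combineOcclusionSets(const float* creaseSamples, int numCrease,
                           const float* areaSamples,   int numArea)
{
    float crease = 0.0f, area = 0.0f;
    for (int i = 0; i < numCrease; ++i) crease += creaseSamples[i];
    for (int i = 0; i < numArea;   ++i) area   += areaSamples[i];
    crease /= (float)numCrease;
    area   /= (float)numArea;
    return (crease > area) ? crease : area; // darkest of the two results
}
```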
Though SSAO provides important lighting cues that enhance the depth of the scene, there was still demand from our artists for more accurate control, which was only feasible through some painted-in ambient occlusion. The creases from SSAO in particular cannot reach the accuracy that a simple texture can without an enormous number of samples. Thus the use of SSAO does not preclude the need for some static ambient occlusion maps to be blended into the final ambient occlusion result, which is what we have done here.
Figure: Combined small- and large-area SSAO result.

For our project, complaints about image noise, balanced against concerns about performance, were the main issues to address for the technique to gain acceptance among our artists. Increasing the number of SSAO samples helps reduce the noise, yet it takes an ever-increasing number of samples to get ever-smaller gains in image quality.

Transparency

It should be noted that the depth buffer can only contain one depth value per pixel, and thus transparencies cannot be fully supported.
This is generally a problem with all algorithms that rely on screen space depth information. There is no easy solution to this, and the SSAO process itself is intensive enough that dealing with edge cases can push the algorithm outside of the real-time realm.
In practice, for the vast majority of scenes, correct ambient occlusion for transparencies is a luxury that can be skimped on; very transparent objects will typically be barely visible either way. For transparent objects that are nearly opaque, the choice can be given to the artist to allow some transparencies to write to the depth buffer used as input to the SSAO algorithm (not the z-buffer used for hidden surface removal), overriding opaque objects behind them.
Final Results

Color Plate 1 shows some results portraying what the algorithm contributes in its final form. The top-left pane shows lighting without the ambient occlusion, while the top-right pane shows lighting with the SSAO component mixed in. The final colored result is shown in the bottom pane. Here the SSAO samples are very wide, bathing the background area with an effect that would otherwise only be obtained with a full global illumination algorithm.
The SSAO term adds depth to the scene and helps anchor the characters within the environment.

Conclusion

This gem has described the Screen Space Ambient Occlusion technique used at Blizzard and presented various problems and solutions that arise. Screen Space Ambient Occlusion offers a different perspective in achieving results that closely resemble what the eye expects from ambient occlusion.
The technique is reasonably simple to implement and amenable to artistic tweaks in real time, making it ideal for fitting an artistic vision.
Deferred shading enables game engines to handle many local lights without repeated geometry processing because it replaces geometry processing with pixel processing [Saito90, Shishkovtsov05, Valient07, Koonce07, Engel09, Kircher09].
In other words, shading costs are independent of geometric complexity, which is important as the CPU cost of scene-graph traversal and the GPU cost of geometry processing grows with scene complexity. Despite this decoupling of shading cost from geometric complexity, we still seek to optimize the pixel processing necessary to handle many local lights, soft shadows, and other per-pixel effects.
In this gem, we present a technique that we call multi-resolution deferred shading, which provides adaptive sub-sampling using a hierarchical approach to shading by exploiting spatial coherence of the scene. Multi-resolution deferred shading efficiently reduces pixel shading costs as compared to traditional deferred shading without noticeable aliasing.
As shown in Figure 1, the technique closely matches the quality of standard deferred shading at a significantly higher frame rate.

Figure: Deferred shading (left: 20 fps), multi-resolution deferred shading (center: 38 fps), and their difference image (right).

Deferred Shading

Unlike traditional forward rendering approaches, deferred shading costs are independent of scene complexity.
This is because deferred shading techniques store geometry information in textures, often called G-buffers, replacing geometry processing with pixel processing [Saito90, Shishkovtsov05, Valient07, Koonce07]. Deferred shading techniques start by rendering the scene into a G-buffer, which is typically implemented using multiple render targets to store geometry information, such as positions, normals, and other quantities instead of final shading results.
Next, deferred shading systems render a screen-aligned quad to invoke a pixel shader at all pixels in the output image. The pixel shader retrieves the geometry information from the G-buffer and performs shading operations as a post process.
Naturally, one must carefully choose the data formats and the precise quantities to store in a G-buffer in order to make the best possible use of both memory and memory bandwidth. For example, the game Killzone 2 utilizes four buffers containing lighting accumulation and intensity, normal XY in 16-bit floating-point format, motion vector XY, specular and diffuse albedo, and sun occlusion [Valient07].
The Z component of the normal is computed from normal XY, and position is computed from depth and pixel coordinates. As shown in Color Plate 3, we simply use two four-channel buffers of 16-bit floating-point precision per channel, without any advanced encoding schemes, for ease of description and implementation. The first of our buffers contains the view-space position in the RGB channels and a material ID in the alpha channel.
The other buffer contains view-space normal in the RGB channels and depth in the alpha channel. We could also use material buffers that store diffuse reflectance, specular reflectance, shininess, and so on. However, material buffers are not necessary if we separate lighting and material phases from the shading phase using light pre-pass rendering [Engel09]. Unlike traditional deferred shading, light pre-pass rendering first computes lighting results instead of full shading.
This method can then incorporate material properties in an additional material phase with forward rendering. Although this technique requires a second geometry rendering pass, such separation of lighting and material phases gives added flexibility during material shading and is compatible with hardware multi-sample antialiasing.
A related technique, inferred lighting, stores lighting results in a single low-resolution buffer instead of a full-resolution buffer [Kircher09]. To avoid discontinuity problems, this technique filters edges using depth and object ID comparisons in the material phase. As we will describe in the next section, our technique is similar to inferred lighting, but our method finds discontinuous areas based on spatial proximity and then solves the discontinuity problems using a multi-resolution approach during the lighting or shading phase.
Multi-Resolution Deferred Shading

Although deferred shading improves lighting efficiency, computing illumination for every pixel is still expensive, despite the fact that illumination is often fairly low frequency. We have developed a multi-resolution deferred shading approach to exploit the low-frequency nature of illumination.
We perform lighting in a lower-resolution buffer for spatially coherent areas and then interpolate results into a higher-resolution buffer. This key concept is based upon our prior work [Ki07a]. Here, we generalize this work and improve upon it to reduce aliasing. The algorithm has three steps, as shown in Color Plate 4: geometry pass, multi-resolution rendering pass, and composite pass.
The geometry pass populates the G-buffers. Our technique is compatible with any sort of G-buffer organization, but for ease of explanation we will stick with the 8-channel G-buffer layout described previously. The next step is multi-resolution rendering, which consists of resolution selection (non-edge detection), shading (lighting), and interpolation (up-sampling).
We allocate buffers, which we call R-buffers, to store rendering results at various resolutions. If the full-resolution image is especially large, we could choose to decrease the resolutions of the R-buffers even more drastically than just one-quarter resolution at each step.
Multi-resolution rendering uses rendering iterations from lower-resolution to higher-resolution R-buffers. We prevent repeated pixel processing by exploiting early-Z culling to skip pixels processed in earlier iterations using lower-resolution R-buffers [Mitchell04].
To start shading our R-buffers, we set the lowest-resolution R-buffer as the current render target and clear its depth buffer to one (the farthest depth). During this pass, the pixel shader reads geometry information from mip-mapped versions of our G-buffers and estimates spatial proximity for non-edge detection.
Then, we compare the differences in normal and depth values using tunable thresholds. If spatial proximity is low for the current pixel, we should use a higher-resolution R-buffer for better quality, and thus we discard the current pixel in the shader to skip writing Z. After this pass, pixels whose spatial proximity is high (in other words, non-edge pixels at the current resolution) contain meaningful Z values because they were not discarded.
The pixels whose spatial proximity is low (in other words, edge pixels) still have the farthest Z values left over from the initial clear. This means that only spatially coherent pixels at this resolution will pass the Z-test, as illustrated in Color Plate 4.
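A C++ rendering of the proximity test described above; the thresholds and exact comparisons are tunable assumptions, and in the shader a false result corresponds to discarding the pixel so that no Z is written:

```cpp
#include <cmath>

// Returns true when the pixel's neighborhood is spatially coherent (a
// non-edge pixel) and can therefore be shaded at the current resolution.
bool spatiallyCoherent(float depth, float neighborDepth,
                       float nx, float ny, float nz,   // pixel normal
                       float mx, float my, float mz,   // neighbor normal
                       float depthThreshold, float normalThreshold)
{
    if (std::fabs(depth - neighborDepth) > depthThreshold)
        return false;                     // depth discontinuity: edge
    float nDot = nx * mx + ny * my + nz * mz;
    return nDot >= normalThreshold;       // normals must be nearly parallel
}
```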
In the pixel shader, we read geometric data from the G-buffers and compute illumination as in light pre-pass rendering. On a textured surface, such as a wall or floor, pixel colors often differ even though the spatial proximity between neighboring pixels is high.
Such cases can cause serious aliasing in the resulting images. To solve this problem, we store only lighting results instead of full shading results into R-buffers, and we handle material properties with stored illumination in R-buffers in the composite pass. We have found that bilinear filtering is adequate, though we could use bi-cubic filtering or other higher-order filtering for better quality.
We repeat the process described above at the next higher resolution, estimating spatial proximity, writing Z, and computing illumination, until we reach the full-resolution R-buffer. A full-screen quad is drawn three times per iteration. If a given pixel was shaded in a prior iteration in a lower-resolution R-buffer, that pixel is not shaded again at the higher resolution, thanks to early-Z culling.
In this way, we are able to perform our screen-space shading operations at the appropriate resolution for different regions of the screen, as shown in Figure 1.

Figure: Visualization of hierarchical pixel processing. The middle image shows the pixels shaded in the second iteration at one-quarter resolution, and only the pixels in the image on the right were shaded at full image resolution.
Because this approach exploits image scaling from low resolution to high resolution with interpolation, discontinuity artifacts can appear at boundaries of lighting or shadows. We address this issue during the multi-resolution rendering phase.
We write 1.0 for pixels whose shading results are valid at the current resolution, and we then interpolate these pixels into the higher-resolution buffer. Otherwise, we consider the pixels to lie on a boundary, and thus we discard them in the interpolation pass (see Figure 1).
In a monolithic build, at static initialization time (before the allocators are bootstrapped), allocations are routed directly to the underlying operating system. These static allocations are tracked in a fixed-size set and sent back to the OS when they are freed. They are also reported separately to memory tracking in the Global category.
To discover the memory that is being allocated globally, set a breakpoint in AZ::Internal::GlobalAlloc.

PoolAllocator is not thread safe. If you need a thread-safe version, use ThreadPoolAllocator, or inherit from ThreadPoolBase and then write custom code to handle the synchronization.

This is the preferred schema. It combines a small block allocator for small allocations and a red-black tree for large allocations, which provides good general-purpose performance (it uses nedmalloc internally); a generic sketch of the idea appears below.
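As a loose illustration of the small-block-plus-tree idea (not the engine's actual schema code), the sketch below routes small requests to a pool path and tracks large blocks in a std::map, which stands in for the red-black tree; the pool itself is stubbed out:

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>

class HybridAllocator {
public:
    static constexpr std::size_t kSmallLimit = 256; // illustrative cutoff

    void* allocate(std::size_t size) {
        if (size <= kSmallLimit)
            return allocateFromPool(size);  // small-block allocator path
        void* p = std::malloc(size);
        large_[p] = size;                   // tree tracks large blocks
        return p;
    }

    void deallocate(void* p) {
        auto it = large_.find(p);
        if (it != large_.end()) {           // found in the large-block tree
            large_.erase(it);
            std::free(p);
            return;
        }
        returnToPool(p);                    // otherwise it came from a pool
    }

private:
    // Stubs: a real schema would carve fixed-size pages into size classes.
    void* allocateFromPool(std::size_t size) { return std::malloc(size); }
    void  returnToPool(void* p)              { std::free(p); }

    std::map<void*, std::size_t> large_;    // stand-in for a red-black tree
};
```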
If you don't mind a bibliography, take this list as a starting point. I wanted to add a bit more detail since I own these books, but I won't have access to them until the end of the month; here's what I can somewhat recall from memory, though.

The most significant change you can make is simplifying the search space. The most common space representations are a grid, a navigation mesh, or a visibility graph.
Tweaking the heuristic can also affect the speed of your pathfinding. In theory, the heuristic has to be admissible (i.e., it should never overestimate the remaining cost) for A* to return optimal paths. But sometimes, if you overestimate the heuristic a little, the search will speed up and the returned paths will be similar enough: maybe not perfect every time, but significantly close. Finally, there are many variations on the algorithm itself to make it work under different circumstances.
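For instance, a weighted heuristic for an 8-connected grid could look like this sketch; the octile formula is standard, and w is the overestimation factor mentioned above (w = 1 keeps the heuristic admissible):

```cpp
#include <algorithm>
#include <cstdlib>

// Octile distance scaled by a weight. With w > 1 (say 1.2f), A* expands
// fewer nodes and the returned path costs at most w times the optimum.
float weightedOctileHeuristic(int x0, int y0, int x1, int y1, float w)
{
    float dx = (float)std::abs(x1 - x0);
    float dy = (float)std::abs(y1 - y0);
    const float diag = 1.41421356f; // cost of one diagonal step
    float h = std::max(dx, dy) + (diag - 1.0f) * std::min(dx, dy);
    return w * h;
}
```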
Bit of an old one (I stumbled into this searching for something else entirely), but I didn't see Jump Point Search explicitly mentioned. There's a good article here, which I think is from someone who worked closely on the development of the approach, that describes very well how it works, how to implement it, and why it's faster.
There's no need to add the node you just came from to the search space, obviously, but there's also no need to add any common neighbors either: it will always be faster to go from the parent to the common neighbor directly than to pass through the current node. In the accompanying diagram, where the arrow represents the parent-to-current direction, the gray nodes can be ignored, and the white nodes are the new ones that need to be added to the search space.
There are some special-case rules for dealing with obstacles, as well as for simply heading in a straight line until an obstacle is reached before adding any nodes to the search space at all; hence the name of the optimization, "jump point": the code locates virtual "lily pads" that it jumps to as the only important nodes in the path.
We stop the recursion when we hit an obstacle or when we find a so-called jump point successor. Jump points are interesting because they have neighbours that cannot be reached by an alternative symmetric path: the optimal path must go through the current node.
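To make that concrete, here is a hedged C++ sketch of the straight-line jump step on a grid; the Grid type and forced-neighbor tests are simplified illustrations, and diagonal jumps (which recurse into their horizontal and vertical components) are omitted:

```cpp
struct Grid {
    int w, h;
    const unsigned char* blocked; // row-major, 1 = obstacle
    bool walkable(int x, int y) const {
        return x >= 0 && y >= 0 && x < w && y < h && !blocked[y * w + x];
    }
};

// March from (x, y) in cardinal direction (dx, dy) until we hit an obstacle,
// reach the goal, or find a forced neighbor; the latter two make the current
// cell a jump point, returned in (jx, jy).
bool jumpStraight(const Grid& g, int x, int y, int dx, int dy,
                  int goalX, int goalY, int& jx, int& jy)
{
    for (;;) {
        x += dx; y += dy;
        if (!g.walkable(x, y)) return false;          // blocked: no jump point
        if (x == goalX && y == goalY) { jx = x; jy = y; return true; }
        if (dx != 0) { // moving horizontally: look for openings above/below
            if ((!g.walkable(x, y + 1) && g.walkable(x + dx, y + 1)) ||
                (!g.walkable(x, y - 1) && g.walkable(x + dx, y - 1))) {
                jx = x; jy = y; return true;          // forced neighbor
            }
        } else {       // moving vertically: look for openings left/right
            if ((!g.walkable(x + 1, y) && g.walkable(x + 1, y + dy)) ||
                (!g.walkable(x - 1, y) && g.walkable(x - 1, y + dy))) {
                jx = x; jy = y; return true;          // forced neighbor
            }
        }
    }
}
```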
Figure 3: Jumping Examples.