The Brain Dump: A more streamlined shader system

I’m currently porting N3’s shader system to the PS3, and while I got it basically working, I’m not satisfied because it’s not really possible to come up with a solution that’s really optimized for the PS3 without creating my own version of the DirectX FX system. Currently N3 depends too much on obscure FX features which actually aren’t really necessary for a realtime 3D engine. So this is a good time to think about trimming off a lot of fat from the shader classes in CoreGraphics.

I think the higher level state management parts are in pretty good shape. The FrameShader system divides a render-frame into render-passes, posteffect-passes and render-batches, draw-calls are grouped by shader-instance. All together these do a good job of preventing redundant state switches. So above the CoreGraphics subsystem (almost) everything is fine.

Here’s how shaders are currently used in N3:

FramePass: a pass-shader sets render states which are valid throughout the whole pass (for instance, a depth-pass may disable color-writes, and so on…), but doesn’t contain vertex- or pixel-shaders
FrameBatch: this is one level below passes, a batch-shader sets render state which is valid for rendering a batch of ModelNodeInstances with the same ModelNodeType (for instance, a render batch which contains all SrcAlpha/InvSrcAlpha-blended objects would configure the alpha blending render states accordingly). FrameBatches usually don’t have a vertex/pixel-shader assigned, but there are cases where this is useful, if everything that is rendered during the batch uses the same shader anyway.
PostEffects: a post-effect-pass uses a shader to render a fullscreen-quad
StateNodes and StateNodeInstances: node-instances are always rendered in batches, so that material parameters only need to be set once at the start of a node-batch, other parameters (most notably the ModelViewProjection matrix) must be set before each draw-call.
Specific subsystems may use shaders for their own rendering as they please, but let’s just say that those will have to live with the limitations of the new shader system ;)

This defines a pretty small set of requirements for a refactored Nebula3 shader system. By removing feature requirements and making the system *less* flexible, we have a better foundation for low-level optimizations and especially for porting the code to other platforms without compromises.

Here’s a quick overview of the current shader system:

The important classes and concepts to understand N3’s shader system are Shader, ShaderInstance, ShaderVariation and ShaderVariable. The basic idea of the shader system is to move all the slow stuff into the setup phase (for instance looking up shader variables) instead of doing it every frame.

A Shader encapsulates an actual shader file loaded from disc. Shaders are created once at application start, one Shader object per shader file in the shd: directory. Shaders are not actually used for anything except providing a template for ShaderInstances.

A ShaderInstance is created from a specific shader and contains its own set of shader parameter values. For instance, every Models::StateNode object creates its own ShaderInstance and initializes it with its specific material parameters.

A ShaderVariable is created from a ShaderInstance and is used to set the value of a specific shader parameter of the ShaderInstance.

ShaderVariations are the most interesting part of the system. Under the hood, a ShaderVariation is an FX Technique, and thus can have a completely different set of render states and vertex/pixel shader code. The variation that is currently used for rendering is selected by a “feature bit mask”, and the bits of the feature mask can be set and deleted in different parts of the rendering pipeline. For instance, the Depth Pass may set the “Depth” feature bit, and a character rendering node may set the “Skinned” feature bit, resulting in the feature mask “Depth|Skinned”. A Shader can contain any number of variations, each with a specific feature mask. Thus a shader which supports rendering in the depth pass, and also supports skinning would implement (among others) a variation “Depth|Skinned” for rendering skinned geometry during the depth pass.

The biggest problem with the current system is that ShaderVariables are so damn powerful, thanks to the FX system. FX implicitly caches the current value of shader variables, so that a variable can be set at any time before or after the shader is actually used for rendering. FX also tracks changes to shader variables and knows which shader variables need to be updated when it comes to rendering. This extra caching layer is actually not really required by Nebula3 and removing it allows for a much more streamlined implementation if FX is not available or not desirable.

Here are the changes I currently have in mind:

A lot of reflection information will not be available. A Shader object won’t be required to enumerate the shader parameters and query their type. The only information that will probably survive is whether a specific parameter exists, but not its datatype, array size, etc… This is not really a problem for a game application, since the reflection information is usually only needed in DCC tools.
Querying the current value of a shader parameter won’t be possible (in fact that’s already not possible in N3, even though FX offers this feature).
ShaderInstances will offer a way to setup shader parameters as “immutable”. Immutable parameters are set once but cannot be changed afterwards. For instance, the material parameters of a StateNode won’t be changed once they are initialized, and setting them up as immutable allows the shader to record them into some sort of opaque command buffer or state block and then completely forget about them (so it doesn’t have to keep track of immutable parameters at all once they have been setup).
Mutable shader parameters (like the ModelViewProjection matrix) are not required to cache their current value, this means that N3 may only set mutable shader parameters after the ShaderInstance has been set as the active shader. This may be a bit inconvenient here and there, but it relieves the low-level shader system from a lot of house-keeping.
With the 2 previous points a ShaderInstance doesn’t have to keep track of shader parameters at all. All the immutables are stored away in command buffers, and all the mutables could directly go through to the rendering API.
It won’t be possible to change rasterization state through shader variables at all (like CullMode or AlphaRef - basically, everything that’s set inside a FX Technique), shader variables may only map to vertex- or pixel-shader uniform constants.
N3 still needs to call a Commit() method on the shader instance after setting mutable parameters and before issuing the draw-call since depending on the platform, some housekeeping and batching may still be necessary for setting mutable parameters efficiently.
Shared shader parameters will (probably) go away. They are convenient but hard to replicate on others platforms. Maybe a similar mechanism will be implemented through a new N3 class so that platform ports have more freedom in implementing shared shader parameters (shared parameters will likely have to be registered by N3 instead of being automatically setup by the platform’s rendering API).
The last fix will be in the higher-level code where ModelNodeInstances are rendered:

Currently, draw-calls are grouped by their ShaderInstance, but those instance-groups themselves are not sorted in a particular order. In the future, rendering will also be sorted by Shader and active ShaderVariation, this allows to change the vertex and pixel shaders less frequently (changing the pixel shader is particularly bad).
Currently, a ShaderInstance object contains a complete cloned FX effect instance, in the future, a shader instance will not use a cloned effect - instead, immutable parameters will be captured into a parameter block and mutable parameter changes will be routed directly to the “parent” FX effect.

Another mid-term-goal I want to aim for is to reduce the number of per-draw-call shader-parameter-changes as much as possible. There are a lot of lighting parameters fed into the pixel shader for each draw call currently. Whether this means going to some sort of deferred rendering approach or maybe setting up per-model lighting parameters in a dynamic texture, or trying to move more work back into the vertex shader I don’t know yet.

27 Jun 2009

A more streamlined shader system