17 Aug 2012

Twiggy's Low Level Render Pipeline

Twiggy's render thread is much simpler than Nebula3's current "fat thread". Instead of running a completely autonomous graphics world in the render thread, it simply accepts rendering commands from a push buffer, fed by the main thread (and maybe other threads later on). Eventually the render thread will also be extensible through some sort of plugin modules which may implement new low-level rendering commands.

Extra care has been taken to keep data structures and their relations simple, memory granularity low (arrays-of-structs instead of arrays-of-pointers) and to keep related data close to each other in memory (to increase CPU cache efficiency).
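To make the arrays-of-structs point concrete, here is a minimal C++ sketch. `InstanceData` and its fields are hypothetical, not Twiggy's actual types. The sketch only shows the difference in memory layout between the two kinds of containers.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical per-instance record (not Twiggy's actual layout).
struct InstanceData {
    float world[16];  // world-space matrix
    int   meshIndex;  // index into a mesh array rather than a pointer
};

// Array-of-pointers: every element is a separate heap allocation, so a
// traversal follows one pointer per element into scattered memory.
using InstancePtrArray = std::vector<std::unique_ptr<InstanceData>>;

// Array-of-structs: all elements live in one contiguous allocation, so a
// traversal walks memory linearly, which suits the CPU cache and prefetcher.
using InstanceArray = std::vector<InstanceData>;

int sumMeshIndices(const InstancePtrArray& instances) {
    int sum = 0;
    for (const auto& inst : instances) {
        assert(inst != nullptr);
        sum += inst->meshIndex;  // one pointer dereference per element
    }
    return sum;
}

int sumMeshIndices(const InstanceArray& instances) {
    int sum = 0;
    for (const InstanceData& inst : instances) {
        sum += inst.meshIndex;  // elements are adjacent in memory
    }
    return sum;
}
```

Both functions compute the same result. Only the memory access pattern differs.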

The push buffer is at least double-buffered. More buffers are possible but probably don't make much sense, since they would only increase latency when the main thread runs too far ahead of the render thread. The render thread will starve if the main thread stops feeding commands. That needs to be fixed for special cases like loading screens, probably by adding flow-control commands to the push buffer so that the render thread can loop over a block of rendering commands.
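A double-buffered push buffer can be sketched with a mutex and two condition variables. This is a minimal illustration that assumes commands are already encoded as raw bytes. The class name `PushBuffer` and its methods are hypothetical, not Twiggy's API. The main thread fills one buffer while the render thread executes the other. `submitFrame()` blocks if the render thread hasn't finished the previous frame, which limits the main thread to running one frame ahead.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical double-buffered push buffer; commands are opaque bytes here.
class PushBuffer {
public:
    // Main thread: append encoded command bytes to the buffer being filled.
    void write(const void* data, std::size_t size) {
        const auto* bytes = static_cast<const std::uint8_t*>(data);
        std::vector<std::uint8_t>& buf = buffers_[writeIndex_];
        buf.insert(buf.end(), bytes, bytes + size);
    }

    // Main thread: hand the filled buffer to the render thread and start
    // filling the other one. Blocks while the render thread still works on
    // the previous frame, so the main thread stays at most one frame ahead.
    void submitFrame() {
        std::unique_lock<std::mutex> lock(mutex_);
        consumed_.wait(lock, [this] { return !frameReady_; });
        readIndex_ = writeIndex_;
        writeIndex_ = 1 - writeIndex_;
        // This buffer held the previous frame, which the render thread released.
        buffers_[writeIndex_].clear();
        frameReady_ = true;
        produced_.notify_one();
    }

    // Render thread: wait for a submitted frame. If the main thread stops
    // submitting, the render thread starves here.
    const std::vector<std::uint8_t>& acquireFrame() {
        std::unique_lock<std::mutex> lock(mutex_);
        produced_.wait(lock, [this] { return frameReady_; });
        return buffers_[readIndex_];
    }

    // Render thread: mark the acquired frame as fully executed.
    void releaseFrame() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(frameReady_ && "releaseFrame() without a submitted frame");
        frameReady_ = false;
        consumed_.notify_one();
    }

private:
    std::vector<std::uint8_t> buffers_[2];
    int writeIndex_ = 0;   // touched only by the main thread
    int readIndex_ = 0;    // changed only while the render thread holds no frame
    bool frameReady_ = false;
    std::mutex mutex_;
    std::condition_variable produced_;
    std::condition_variable consumed_;
};
```

In a real setup, `write()` and `submitFrame()` run on the main thread, while `acquireFrame()` and `releaseFrame()` run in the render thread's loop.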

When doing an experimental port of N3's current render pipeline to Native Client, I implemented such a push-buffer-driven render pipeline because of NaCl's "call-on-main-thread" limitation. Each OpenGL call was serialized into a command buffer and pulled by the "Pepper Thread", where the actual GL calls were executed. This worked, but it was a terrible hack and convinced me that it isn't a good idea to feed the render thread with such low-level commands: there were way too many commands per frame, and the feeder thread had to wait for the render thread each time a getter function or resource-creation function was called.

Thus the command protocol of Twiggy's render thread is higher level, but not so high-level that it is hard-wired to a specific rendering technique.

Something that has worked very well since Nebula2 is the frame-shader system (in N2 these were called RenderPaths). Frame shaders describe how a complete frame is rendered by dividing the frame into passes and batches. It's a nice and generic medium-level renderer architecture: much less verbose than D3D or OpenGL, but flexible enough to implement various rendering techniques (like forward rendering on low-end platforms versus pre-light-pass rendering on platforms with a bit more fill rate). It was quite natural to use the frame-shader vocabulary for Twiggy's render command protocol.

A Twiggy render frame is built from the following commands (excluding display setup and resource creation):

  • 1 BeginFrame
    • [UpdateProjTransform]
    • 1 UpdateViewTransform
    • 1..N BeginPass
      • 1..N BeginBatch
        • 1..N BeginInstances
          • 1..N Draw
          • 1..N DrawInstanced
          • 1 DrawFullscreenQuad
        • EndInstances
      • EndBatch
    • EndPass
  • EndFrame
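Because the command vocabulary forms a strict nesting, a render thread (or a debug layer in front of it) can check a command stream cheaply. The following C++ sketch uses the command names from the list above. The `Cmd` enum, its encoding and `validateFrame()` are hypothetical. The sketch checks only the nesting, not the 1..N counts.

```cpp
#include <cassert>
#include <vector>

// Command names from the list above; the enum itself is a hypothetical encoding.
enum class Cmd {
    BeginFrame, EndFrame,
    UpdateProjTransform, UpdateViewTransform,
    BeginPass, EndPass,
    BeginBatch, EndBatch,
    BeginInstances, EndInstances,
    Draw, DrawInstanced, DrawFullscreenQuad
};

// Current nesting level while scanning the stream.
enum class Scope { None, Frame, Pass, Batch, Instances };

// Returns true if every command appears at the nesting level the list
// prescribes and all scopes are closed at the end.
bool validateFrame(const std::vector<Cmd>& cmds) {
    Scope scope = Scope::None;
    // Moves from 'from' to 'to' if the current scope matches, else fails.
    auto enter = [&scope](Scope from, Scope to) {
        if (scope != from) return false;
        scope = to;
        return true;
    };
    for (Cmd c : cmds) {
        bool ok = false;
        switch (c) {
        case Cmd::BeginFrame:          ok = enter(Scope::None, Scope::Frame); break;
        case Cmd::EndFrame:            ok = enter(Scope::Frame, Scope::None); break;
        case Cmd::UpdateProjTransform:
        case Cmd::UpdateViewTransform: ok = (scope == Scope::Frame); break;
        case Cmd::BeginPass:           ok = enter(Scope::Frame, Scope::Pass); break;
        case Cmd::EndPass:             ok = enter(Scope::Pass, Scope::Frame); break;
        case Cmd::BeginBatch:          ok = enter(Scope::Pass, Scope::Batch); break;
        case Cmd::EndBatch:            ok = enter(Scope::Batch, Scope::Pass); break;
        case Cmd::BeginInstances:      ok = enter(Scope::Batch, Scope::Instances); break;
        case Cmd::EndInstances:        ok = enter(Scope::Instances, Scope::Batch); break;
        case Cmd::Draw:
        case Cmd::DrawInstanced:
        case Cmd::DrawFullscreenQuad:  ok = (scope == Scope::Instances); break;
        }
        if (!ok) return false;
    }
    return scope == Scope::None;
}
```

A draw command issued directly inside a pass, without an enclosing batch and instance block, is rejected. So is a frame that is never closed.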

BeginFrame / EndFrame: these encapsulate an entire render frame. Calling EndFrame() signals the render pipeline that all rendering commands for this frame have been issued and the result can be displayed.

UpdateProjTransform: updates the projection matrix. Some rendering techniques need to change the projection mid-frame (for instance before rendering to a shadow map).

UpdateViewTransform: updates the view matrix. Depending on the rendering technique used, this can be called several times per frame.

BeginPass / EndPass: a pass sets (and optionally clears) the active render target, and sets render states which should remain valid for the entire pass.

BeginBatch / EndBatch: a batch only sets a couple of render states which remain valid between BeginBatch and EndBatch. This is just a way to reduce redundant state switches when rendering instances. Typical examples are separate batches for solid and alpha-blended objects.
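The effect of batching can be illustrated by counting state switches. In this hypothetical sketch, the blend mode stands in for the batch-level render states. `groupIntoBatches()` uses `std::stable_partition` to order the objects so that all solid objects form one batch and all alpha-blended objects form another. The types and functions are illustrative assumptions, not Twiggy code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical batch-level state: only the blend mode, for illustration.
enum class Blend { Solid, AlphaBlended };

struct Object {
    Blend blend;
    int   mesh;  // hypothetical mesh handle
};

// Counts how often the blend state has to be set when the objects are drawn
// in the given order (the first object always sets it).
std::size_t countBlendSwitches(const std::vector<Object>& objects) {
    std::size_t switches = 0;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        if (i == 0 || objects[i].blend != objects[i - 1].blend) ++switches;
    }
    return switches;
}

// Orders objects into two batches, solid first and alpha-blended second,
// keeping the original order inside each batch.
std::vector<Object> groupIntoBatches(std::vector<Object> objects) {
    std::stable_partition(objects.begin(), objects.end(),
                          [](const Object& o) { return o.blend == Blend::Solid; });
    return objects;
}
```

With alternating solid and alpha-blended objects, every draw sets new state. After grouping, the state is set once per batch. Real renderers usually also sort alpha-blended objects back to front, which this sketch ignores.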

BeginInstances / EndInstances: this sets up all data needed by a series of Draw commands for the same 3D object (geometry, textures, shaders, render states).

Draw / DrawInstanced: several ways to issue the actual draw commands. The simplest version takes a world-space matrix and performs one draw call to render one instance. DrawInstanced will be used to render several instances with a single command, preferably using hardware instancing.
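A rough sketch of how the two kinds of draw command might differ in push-buffer cost. The command IDs, struct layouts and the `push()` helper are hypothetical, since the post doesn't describe Twiggy's binary encoding. A Draw command carries a full matrix. An instanced draw command only references a range in an instance buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

struct Matrix44 { float m[16]; };

// Hypothetical command IDs and layouts.
enum class CmdId : std::uint32_t { Draw = 1, DrawInstanced = 2 };

// Draw: carries one world-space matrix and renders one instance.
struct DrawCmd {
    CmdId    id = CmdId::Draw;
    Matrix44 world{};
};

// DrawInstanced: refers to a range of per-instance data in an instance
// buffer, so N instances cost one small command (ideally one
// hardware-instanced draw call on the render thread).
struct DrawInstancedCmd {
    CmdId         id = CmdId::DrawInstanced;
    std::uint32_t instanceBuffer = 0;  // hypothetical buffer handle
    std::uint32_t firstInstance = 0;
    std::uint32_t instanceCount = 0;
};

// Appends a trivially copyable command to a byte-oriented push buffer.
template <typename T>
void push(std::vector<std::uint8_t>& buffer, const T& cmd) {
    static_assert(std::is_trivially_copyable<T>::value, "commands must be POD-like");
    const std::size_t offset = buffer.size();
    buffer.resize(offset + sizeof(T));
    std::memcpy(buffer.data() + offset, &cmd, sizeof(T));
}
```

Rendering 100 instances with Draw writes 100 matrices into the push buffer. One instanced command writes a few integers, and the matrices stay in the instance buffer.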

DrawFullscreenQuad: Just draws a fullscreen quad, mainly used for post-effects.

I'll leave resource management out for now; this topic is interesting enough for its own post.

What's currently missing is (1) the whole topic of dynamic resources (instance buffers, dynamic geometry, dynamic textures, ...) and (2) an elegant way to update variable shader parameters.

There will be a way to write dynamic resources from the main thread, and use them from the render thread, to prevent excessive data copying over the push buffer (it's a bit silly to move thousands of instance matrices over the push buffer, especially if the matrices are mostly static).
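One possible shape for such a dynamic resource is sketched below. It is based on stated assumptions, not on Twiggy's design. The main thread writes matrices into an instance buffer that tracks a dirty range. When the buffer is flushed, only that range is copied to the GPU-side copy, which is simulated here by a second `std::vector`. `InstanceBuffer`, `set()` and `flush()` are hypothetical names. Synchronization between the main thread and the render thread is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Matrix44 { float m[16]; };

// Hypothetical dynamic instance buffer written by the main thread. Writes
// extend a single dirty range [dirtyBegin_, dirtyEnd_); flush() copies only
// that range, so mostly static matrices aren't re-sent every frame.
// Thread synchronization with the render thread is deliberately omitted.
class InstanceBuffer {
public:
    explicit InstanceBuffer(std::size_t count) : data_(count) {}

    void set(std::size_t index, const Matrix44& m) {
        assert(index < data_.size());
        data_[index] = m;
        if (dirtyBegin_ == dirtyEnd_) {  // nothing dirty yet
            dirtyBegin_ = index;
            dirtyEnd_ = index + 1;
        } else {
            dirtyBegin_ = std::min(dirtyBegin_, index);
            dirtyEnd_ = std::max(dirtyEnd_, index + 1);
        }
    }

    // Copies the dirty range into 'gpuCopy' (a stand-in for a GPU buffer),
    // clears the dirty range and returns the number of matrices copied.
    std::size_t flush(std::vector<Matrix44>& gpuCopy) {
        if (gpuCopy.size() != data_.size()) gpuCopy.resize(data_.size());
        const std::size_t count = dirtyEnd_ - dirtyBegin_;
        std::copy(data_.data() + dirtyBegin_, data_.data() + dirtyEnd_,
                  gpuCopy.data() + dirtyBegin_);
        dirtyBegin_ = dirtyEnd_ = 0;
        return count;
    }

private:
    std::vector<Matrix44> data_;
    std::size_t dirtyBegin_ = 0;
    std::size_t dirtyEnd_ = 0;  // empty range when dirtyBegin_ == dirtyEnd_
};
```

A single dirty range is the simplest policy. Writes at both ends of the buffer would mark everything in between as dirty, so a real implementation might track several ranges instead.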

Next up: more info about Twiggy's resource management!

4 comments:

Ash said...

Hey Floh,

Any chance you're going to release a new version of Nebula3?

xoyojank said...

I think the articles are more important than code :D

GingerBoats said...

Glad that you started posting on the blog again! It has definitely been a while.

With the push buffer that you're implementing for Twiggy, do you plan on encapsulating the commands into a standard Nebula engine object, or on creating something separate that only incorporates the GL rendering subset you want to expose plus the required resource commands?

S said...

I have code for this if anyone wants it... I actually would have given it to floh if the nebula device were still open source. Possibly if he asked nicely. It's DirectX only though.