31 Mar 2008

In Oblivion

I can't believe I started playing Oblivion again. This time I finally want to finish the main quest; last time I played through nearly all the guild quests and then didn't have enough motivation left to go on with the main story... I was immediately sucked into the game again. Dungeon crawling is where Oblivion really shines, even more than Morrowind (which I still consider the better game overall). I have created a new Dark Elf nightblade, which I'm playing as a stealthy, magic-wielding, arrow-shooting bad-ass assassin :o)

I also started to play around a bit with SVG, since I was looking for a cheap way to render diagrams for the Nebula3 debugging and profiling subsystem. I'll go into more detail in a later post, but let me just say that SVG kicks ass and is exactly what I was looking for. Fun fact: the only browser that can't render SVG out of the box is IE7 (Firefox, Opera and Safari are fine).

19 Mar 2008


Just came across this quote on Slashdot:

"Any third-rate engineer or researcher can increase complexity; but it takes a certain flair of real insight to make things simple again." - E. F. Schumacher.

Every programmer and game designer should bow before these mighty words of wisdom. Guess I need to read his book "Small Is Beautiful" now.

16 Mar 2008

Gaming Weekend

I'm not feeling very productive this weekend, which might have to do with the shitty weather in Berlin... just the right weather to stay at home and play some games. I bought Bully for my 360 last week. I didn't play the original on the PS2, and although I had read that the game "might freeze on some older consoles", I gave it a try. And guess what: it froze on me about 2 hours into the game, losing at least 1 hour of progress since the last save. I'm now waiting for the patch Rockstar promised for last week, since the game really looks like fun. Lousy certification job though; this bug shouldn't have slipped through.

Played through the first chapter of Rainbow Six Vegas again... and I must say the game hasn't aged very well. The graphics are a bit too dirty; it's very hard to make out enemies against the background, at least in the Mexican setting at the beginning. There's still no better cover system in any other game, though, but I had a hard time adapting to the controls again (I blew myself up several times because 'B' is 'throw grenade' instead of crouch). I think it was the right decision to make the graphics in Vegas 2 that much cleaner, even though I was turned off at first by the "cartoony" look of the screenshots.

After that mildly frustrating experience I played some more Ninja Gaiden Black on Hard difficulty. I finally want to kick Alma's ass. This game just gets better the more you play it. The structure of the game is very different on Hard: there are new enemies, items are distributed differently in the world, you get weapons and their upgrades much later, and the boss fights are much more challenging because the bosses are now accompanied by minions. It is amazing how well balanced the rock-paper-scissors system in Ninja Gaiden is. A different weapon can make a subtle but very important difference against a specific enemy type. For instance, at first glance the new cat-demons on Hard just look like a more annoying version of the Black Spider Ninjas, but while the ninjas can be controlled very nicely with the Lunar Staff, I feel much more comfortable fighting the cat-demons with the nunchaku (I need to do some experimentation with the Vigoorian Flail, though). Hard difficulty also forces you to learn blocking, jumping and rolling much more efficiently to avoid attacks. I recently downloaded Ninja Gaiden as an Xbox Original title even though I also own the disc version. Not having to swap discs for a quick round of Ninja Gaiden fun is well worth the 1200 points IMHO :)

I also played an hour of Crackdown. This game is still so much fun... I was playing around with some of the more advanced stuff I didn't use during my earlier play-throughs, for instance specifically aiming for body or car parts (head-shots with the sniper rifle over insane distances, or causing havoc on the highways by blowing up the gas tank or tires of passing vehicles). I read somewhere that GTA4 will use a similar targeting system. If that's true, it would be great; I really started to appreciate the added targeting functionality in Crackdown, especially when playing a bit more tactically instead of blowing up the whole perimeter Terminator-style.

I finally ended the day with a few rounds of COD4 multi-player. I'm now on my second prestige-round. I guess I have finally finished my transition from a keyboard/mouse- to a gamepad-FPS player. I can pull off shit with the gamepad now which I deemed impossible one year ago :)

15 Mar 2008

Vertex Component Packing

I finally got around to optimizing vertex component sizes for Drakensang. A typical vertex (coords, normal, tangent, binormal, one uv-set) is now 28 bytes instead of 56 bytes, a light-mapped mesh vertex (2 uv-sets) is now 32 bytes instead of 64, and a skinned vertex has been reduced to 36 bytes instead of 88. With this step I have finally burned all DX7 bridges; all our projects have a 2.0 minspec now (since Radon Labs also does casual titles, we had to support Win98 and DX7 for much too long). As a result, the size of all mesh resources in Drakensang has been reduced from a whopping 1.2 GByte down to about 650 MByte. This also means reduced loading times and better vertex throughput when transferring vertex data to the graphics chip. Some vertex components need to be scaled to the proper range in the vertex shader, but this is at most one multiply-add operation per component.
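The byte counts above follow directly from the per-component formats listed in this post. As a quick sanity check, here is a C++ sketch with hypothetical structs (not Nebula3's actual vertex declarations). It assumes the old layouts stored every component, including skin weights and joint indices, as 32-bit floats, which is what the 56/64/88 byte figures imply:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layouts, not Nebula3's vertex declarations.
// The "old" structs assume every component was a 32-bit float,
// which is what the 56/64/88 byte figures imply.
struct OldVertex {             // coords, normal, tangent, binormal, one uv-set
    float position[3];         // 12 bytes
    float normal[3];           // 12 bytes
    float tangent[3];          // 12 bytes
    float binormal[3];         // 12 bytes
    float uv0[2];              //  8 bytes
};
struct OldLightmappedVertex { OldVertex base; float uv1[2]; };                   // + 8 bytes
struct OldSkinnedVertex { OldVertex base; float weights[4]; float joints[4]; };  // +32 bytes

// Packed layouts using the default component formats from the list in this post.
struct PackedVertex {
    float        position[3];  // Float3:                     12 bytes
    std::uint8_t normal[4];    // UByte4N:                     4 bytes
    std::uint8_t tangent[4];   // UByte4N:                     4 bytes
    std::uint8_t binormal[4];  // UByte4N:                     4 bytes
    std::int16_t uv0[2];       // Short2 as 4.12 fixed point:  4 bytes
};
struct PackedLightmappedVertex { PackedVertex base; std::int16_t uv1[2]; };     // + 4 bytes
struct PackedSkinnedVertex {
    PackedVertex base;
    std::uint8_t weights[4];   // UByte4N skin weights:        4 bytes
    std::uint8_t joints[4];    // UByte4 joint indices:        4 bytes
};

// Compile-time check of the sizes quoted in the post.
static_assert(sizeof(OldVertex) == 56, "old standard vertex");
static_assert(sizeof(OldLightmappedVertex) == 64, "old light-mapped vertex");
static_assert(sizeof(OldSkinnedVertex) == 88, "old skinned vertex");
static_assert(sizeof(PackedVertex) == 28, "packed standard vertex");
static_assert(sizeof(PackedLightmappedVertex) == 32, "packed light-mapped vertex");
static_assert(sizeof(PackedSkinnedVertex) == 36, "packed skinned vertex");
```

Common compilers insert no padding here, because every member is a 1-, 2- or 4-byte type and every struct size is already a multiple of 4.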

I also implemented support for the new vertex formats in Nebula3. N3 has always supported packed vertex components, so all I had to do was add a few lines to the legacy NVX2 mesh loader and fix a few places in the vertex shaders for unpacking normals and texcoords.

Here's how the vertex components are now packed by default:
  • Position: Float3 (just as before)
  • Normal, Tangent, Binormal: UByte4N (unsigned byte, normalized)
  • TexCoord: Short2 as 4.12 fixed point
  • Color: UByte4N
  • Skin Weights: UByte4N
  • Skin Joint Indices: UByte4
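A rough C++ sketch of what packing and unpacking could look like for the two non-trivial formats. Two assumptions are made here that the post doesn't spell out: normal-type components in [-1, 1] are biased into [0, 1] before being stored as UByte4N (so the shader undoes it with one multiply-add), and the 4.12 fixed-point texcoords reach the shader as raw integers that get scaled by 1/4096. The function names are made up for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Pack a normal/tangent/binormal component in [-1, 1] into an unsigned
// normalized byte (assumed bias/scale of 0.5).
std::uint8_t PackUnitComponent(float v) {
    assert(v >= -1.0f && v <= 1.0f);
    return static_cast<std::uint8_t>(std::lround((v * 0.5f + 0.5f) * 255.0f));
}

// What the vertex fetch plus shader reconstruct from the stored byte.
float UnpackUnitComponent(std::uint8_t b) {
    float normalized = b / 255.0f;    // UByte4N fetch yields b / 255 in [0, 1]
    return normalized * 2.0f - 1.0f;  // the extra multiply-add in the shader
}

// 4.12 fixed point: 4 integer bits (including sign), 12 fractional bits.
// Representable range is [-8, 32767/4096] with a step of 1/4096;
// out-of-range values are saturated.
std::int16_t PackTexCoord(float uv) {
    long fixed = std::lround(uv * 4096.0f);
    fixed = std::clamp(fixed, -32768L, 32767L);
    return static_cast<std::int16_t>(fixed);
}

// Assumes the shader sees the raw integer and scales it with one multiply.
float UnpackTexCoord(std::int16_t s) {
    return s / 4096.0f;
}
```

The worst-case error for a normal component is about 1/255. With 4.12, texcoords are limited to roughly [-8, 8), which is usually enough for non-tiled UVs.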
Normals, tangents, binormals and tex-coords need an extra unpacking instruction in the vertex shader. Skin weights need to be "re-normalized" in the vertex shader because they lose too much precision:

float4 weights = packedWeights / dot(packedWeights, float4(1.0, 1.0, 1.0, 1.0));

This makes sure that the components add up to 1.0. In case you're wondering, the dot product with (1, 1, 1, 1) is equivalent to s = (x + y + z + w); it's just more efficient, because the dot product is a native vertex shader instruction (although I must confess that I haven't checked yet whether fxc's optimizer is clever enough to turn the horizontal sum into a dot product automatically).
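The same renormalization on the CPU, as a small C++ sketch (RenormalizeWeights is a made-up name). It shows why the step is needed: for example, the stored bytes 128, 77 and 51 decode to about 0.502, 0.302 and 0.2, which sum to 256/255 (about 1.004) rather than 1.0:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// CPU-side mirror of the HLSL line above:
//   float4 weights = packedWeights / dot(packedWeights, float4(1, 1, 1, 1));
// 'packed' holds UByte4N skin weights as stored in the vertex buffer.
std::array<float, 4> RenormalizeWeights(const std::array<std::uint8_t, 4>& packed) {
    std::array<float, 4> w{};
    float sum = 0.0f;                    // horizontal sum, i.e. dot(w, 1)
    for (int i = 0; i < 4; ++i) {
        w[i] = packed[i] / 255.0f;       // UByte4N fetch: b / 255
        sum += w[i];
    }
    assert(sum > 0.0f);                  // a skinned vertex needs at least one weight
    for (float& x : w) x /= sum;         // components now add up to 1.0
    return w;
}
```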

5 Mar 2008

Nebula3's Multithreaded Rendering Architecture

Alright! The Application Layer is now running through the new multithreaded rendering pipeline.

Here's how it works:

  • The former Graphics subsystem has been renamed to InternalGraphics and is now running in its own "fat thread" together with all the lower-level Nebula3 subsystems required for rendering.
  • There's a new Graphics subsystem running in the application thread with a set of proxy classes which mimic the InternalGraphics subsystem classes.
  • The main thread no longer contains any rendering-related subsystems, so trying to call e.g. RenderDevice::Instance() will result in a runtime error.
  • Extra care has been taken to make the overall design as simple and "fool-proof" as possible.
  • Very little communication is necessary between the main and render threads: usually just one SetTransform message for each graphics entity that has changed its position.
  • Communication is done with standard Nebula3 messages through a single message queue in the new GraphicsInterface singleton. This is an "interface singleton" which is visible from all threads. The render thread receives messages from the main thread (or other threads) and never actively sends messages to other threads (with one notable exception on the Windows platform: mouse and keyboard input).
  • Client-side code doesn't have to deal with creating and sending messages, because it talks to the render thread through proxy objects. Proxy objects provide a typical C++ interface, and since there's a 1:1 relationship between a proxy and its server-side object, they may cache data on the client side to prevent a round-trip into the render thread (so there's some data duplication, but a lot less locking).
  • The Graphics subsystem offers the following public proxy classes at the moment:
    • Graphics::Display: setup and query display properties
    • Graphics::GraphicsServer: creates and manages Stages and Views
    • Graphics::Stage: a container for graphics entities
    • Graphics::View: renders a "view" into a Stage into a RenderTarget
    • Graphics::CameraEntity: defines a view volume
    • Graphics::ModelEntity: a typical graphics object
    • Graphics::GlobalLightEntity: a global, directional light source
    • Graphics::SpotLightEntity: a local spot light
  • These proxy classes are just pretty interfaces and don't do much more than creating messages and sending them into the GraphicsInterface singleton.
  • There are typically 3 types of messages sent into the render thread:
    1. Synchronous messages, which block the caller thread until they are processed. These exist only for convenience and only for methods which are usually not called while the main game loop is running (like Display::GetAvailableDisplayModes()).
    2. Asynchronous messages, which return immediately but pass a return value back at some later time. These are non-blocking, but the result will only be available in the next graphics frame. The proxy classes do everything possible to hide this fact, either by caching values on the client side so that no communication is necessary at all, or by returning the previous value until the graphics thread gets around to processing the message.
    3. The best and simplest messages are those which don't require a return value. They are just sent off by the client-side proxy and processed at some later time by the render thread. Fortunately, most messages sent during a frame are of this kind (e.g. updating entity transforms).
  • Creation of graphics entities is an asynchronous operation: it is possible to manipulate the client-side proxy object immediately after creation, even though the server-side entity doesn't exist yet. The proxy classes take care of all these details internally.
  • There is a single synchronization event per game frame where the game thread waits for the graphics thread. This event is signalled by the graphics thread after it has processed the pending messages for the current frame and before culling and rendering. This is necessary to prevent the game thread from running faster than the render thread and thus spamming its message queue. The game thread may run at a lower, but never at a higher, frame rate than the render thread.
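To make the proxy idea more concrete, here is a stripped-down C++ sketch of the fire-and-forget case (message type 3 in the list). All names (Matrix, SetTransformMsg, MessageQueue, ModelEntityProxy) are hypothetical stand-ins, not the actual Nebula3 classes, and a real system would carry many message types rather than a single struct:

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-ins, not Nebula3's real classes.
struct Matrix { float m[16]; };

// A fire-and-forget message: no return value, processed later by the render thread.
struct SetTransformMsg {
    int entityId;
    Matrix transform;
};

// Minimal thread-safe queue in the spirit of the GraphicsInterface singleton:
// any thread may call Send(), only the render thread calls TakeAll().
class MessageQueue {
public:
    void Send(const SetTransformMsg& msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(msg);
    }
    // Hands over everything queued so far; the lock is held only for the swap.
    std::vector<SetTransformMsg> TakeAll() {
        std::vector<SetTransformMsg> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(pending_);
        return out;
    }
private:
    std::mutex mutex_;
    std::vector<SetTransformMsg> pending_;
};

// Client-side proxy: a plain C++ interface that caches its own state,
// so getters never need a round-trip into the render thread.
class ModelEntityProxy {
public:
    ModelEntityProxy(int id, MessageQueue& queue) : id_(id), queue_(queue), transform_{} {}
    void SetTransform(const Matrix& m) {
        transform_ = m;                        // update the client-side copy
        queue_.Send(SetTransformMsg{id_, m});  // notify the render thread
    }
    const Matrix& GetTransform() const { return transform_; }  // no locking needed
private:
    int id_;
    MessageQueue& queue_;
    Matrix transform_;
};
```

The proxy answers GetTransform() from its own copy, so the game thread never has to lock anything or wait for the render thread just to read back a value it set itself.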

Here's some example code from the testviewer application. It actually looks simpler than before, since all the setup code has become much tighter:
using namespace Graphics;
using namespace Resources;
using namespace Util;

// setup the render thread

Ptr<GraphicsInterface> graphicsInterface = GraphicsInterface::Create();

// setup and open the display
Ptr<Display> display = Display::Create();
// ... optionally change display settings here...

That's all that is necessary to open a default display and get the render thread up and running. The render thread will now happily run its own render loop.

To actually have something rendered we need at least a Stage, a View, a camera, at least one light and a model:

// create a GraphicsServer, Stage and a default View
Ptr<GraphicsServer> graphicsServer = GraphicsServer::Create();

Attr::AttributeContainer dummyStageBuilderAttrs;
Ptr<Stage> stage = graphicsServer->CreateStage(StringAtom("DefaultStage"), /* ... */);

Ptr<View> view = graphicsServer->CreateView(InternalGraphics::InternalView::RTTI, /* ... */);

// create a camera and make it the active camera for our view
Ptr<CameraEntity> camera = CameraEntity::Create();
camera->SetTransform(matrix44::translation(0.0f, 0.0f, 10.0f));

// create a global light source
Ptr<GlobalLightEntity> light = GlobalLightEntity::Create();

// finally create a visible model
Ptr<ModelEntity> model = ModelEntity::Create();

That's the code to set up a simple graphics world in the asynchronous rendering case. There are a few issues I still want to fix (like the InternalGraphics::InternalView::RTTI thing). The only thing left is to add a call to GraphicsInterface::WaitForFrameEvent() somewhere in the game loop before updating the game objects for the next frame. The classes App::RenderApplication and App::ViewerApplication in the Render layer will actually take care of most of this stuff.
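The per-frame synchronization point can be sketched with a standard mutex and condition variable. FrameEvent and RunGameFrames are hypothetical names; the Wait() call plays the role of GraphicsInterface::WaitForFrameEvent(), but this is only an illustration, not how Nebula3 implements it:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical auto-reset event standing in for the per-frame sync point.
class FrameEvent {
public:
    // Render thread: called after this frame's messages have been processed,
    // before culling and rendering start.
    void Signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cond_.notify_one();
    }
    // Game thread: blocks until the render thread has signalled, then resets
    // the event so the next call needs a fresh signal.
    void Wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return signaled_; });
        signaled_ = false;
    }
private:
    std::mutex mutex_;
    std::condition_variable cond_;
    bool signaled_ = false;
};

// Runs a hypothetical game loop for gameFrames frames against a dummy render
// thread and returns how many render frames were executed meanwhile.
int RunGameFrames(int gameFrames) {
    FrameEvent frameEvent;
    std::atomic<bool> quit{false};
    std::atomic<int> renderFrames{0};

    std::thread renderThread([&] {
        while (!quit) {
            // ... process pending messages from the game thread ...
            ++renderFrames;
            frameEvent.Signal();
            // ... cull and render ...
        }
    });

    for (int i = 0; i < gameFrames; ++i) {
        frameEvent.Wait();  // plays the role of GraphicsInterface::WaitForFrameEvent()
        // ... update game objects and send messages for the next frame ...
    }
    quit = true;
    renderThread.join();
    return renderFrames.load();
}
```

Because the event auto-resets, several render frames may pass while the game thread is busy (it simply picks up the latest signal), but every game frame needs at least one fresh signal, so the game thread can never get ahead of the render thread.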

Working in an asynchronous rendering environment requires some brain-adaptation:

  • there's always a delay of up to one graphics frame until a manipulation actually shows up on screen
  • it's hard (and inefficient) to get data back from the render thread
  • it's impossible for client-threads to read, modify and write-back data within one render-frame

For the tricky application-specific stuff I'm planning to implement some sort of installable client handlers. Client threads can install their own custom handler objects, which would run completely in the render-thread context. This is IMHO the only sensible way to implement application-specific graphics functionality which requires exact synchronization with the render loop.
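Since these handlers are only planned, the following is purely a hypothetical C++ sketch of the idea. RenderThreadHandler and HandlerRegistry are invented names. Install() only queues the handler, and the render thread adopts it at the start of its next frame, so the handler's callback always runs in the render-thread context:

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical interface: custom code that runs inside the render thread,
// so it can read and modify render-side data within a single frame.
class RenderThreadHandler {
public:
    virtual ~RenderThreadHandler() = default;
    virtual void OnRenderFrame(int frameIndex) = 0;
};

class HandlerRegistry {
public:
    // May be called from any thread; the handler becomes active when the
    // render thread picks it up at the start of its next frame.
    void Install(std::shared_ptr<RenderThreadHandler> handler) {
        std::lock_guard<std::mutex> lock(mutex_);
        pendingInstalls_.push_back(std::move(handler));
    }
    // Called by the render thread once per frame.
    void RunHandlers(int frameIndex) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (auto& h : pendingInstalls_) active_.push_back(std::move(h));
            pendingInstalls_.clear();
        }
        // active_ is only touched by the render thread, so no lock is held here.
        for (auto& h : active_) h->OnRenderFrame(frameIndex);
    }
private:
    std::mutex mutex_;
    std::vector<std::shared_ptr<RenderThreadHandler>> pendingInstalls_;
    std::vector<std::shared_ptr<RenderThreadHandler>> active_;
};
```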

I had to make a few other changes to the existing code base for asynchronous rendering to work. Mouse and keyboard events under Windows are produced by the application window (which is owned by the render thread), but the input subsystem lives in the game thread, so there needs to be a way for the render thread to pass those input events on to the main thread. I decided to derive a ThreadSafeDisplayEventHandler class (and a ThreadSafeRenderEventHandler for the sake of completeness). Client threads can install these event handlers to be notified about display and render events coming out of the render thread.
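A rough C++ sketch of the idea behind such a thread-safe event handler, with made-up names. DisplayEvent here is a simplified stand-in, and ThreadSafeEventHandlerSketch is not the real ThreadSafeDisplayEventHandler. The render thread only appends events under a lock, and the client thread drains and handles them from its own loop:

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Simplified, hypothetical event type.
struct DisplayEvent {
    enum Code { KeyDown, KeyUp, MouseMove } code;
    int param;
};

// The render thread (which owns the window) calls PutEvent(); the client
// thread calls HandlePendingEvents() from its own loop, so HandleEvent()
// always runs in the client thread.
class ThreadSafeEventHandlerSketch {
public:
    virtual ~ThreadSafeEventHandlerSketch() = default;

    void PutEvent(const DisplayEvent& e) {            // render thread
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push_back(e);
    }

    void HandlePendingEvents() {                      // client thread
        std::vector<DisplayEvent> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(events_);
        }
        for (const DisplayEvent& e : batch) HandleEvent(e);  // no lock held here
    }

protected:
    virtual void HandleEvent(const DisplayEvent& e) = 0;

private:
    std::mutex mutex_;
    std::vector<DisplayEvent> events_;
};
```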

The second, bigger change affected the Http subsystem. Previously, HttpRequestHandlers had to live in the same thread as the HttpServer, which isn't very useful anymore now that important functionality has been moved out of the main thread. So I basically moved the whole Http subsystem into its own thread as well, and HttpRequestHandlers may now be attached from any thread. A nice side effect is that an HTTP request now only stalls the thread of the HttpRequestHandler which processes it.

There's still more work to do:

  • need to write some stress-tests to uncover any thread-synchronization bugs
  • need to do performance investigations and profiling (are there any unintended synchronization issues?)
  • thread-specific low-level optimization in the Memory subsystem as detailed in one of my previous posts
  • optimize the messaging system as much as possible (especially creation and dispatching)
  • I also want to implement some sort of method to run the rendering in the main thread, partly for debugging, partly for platforms with simple single-core CPUs

Phew, that's all for today :)