27 Feb 2008

Lowlevel Optimizations

I'm currently doing memory optimizations in Drakensang, and together with the new ideas from the asynchronous rendering code in N3 I'm going to do a few low-level optimizations in the memory subsystem over the next time in Nebula3. Here's what I'm planning to do:
  • Add a thread-safe flag to the Heap class, currently a heap is always thread-safe, but there will be quite a few cases now where it makes sense to not have the additional thread-safety-overhead in the allocation routines.
  • Add some useful higher-level allocators:
    • FixedSizeAllocator: This optimizes the allocation of many small same-size objects from the heap, it will pre-allocate big pages, and manage the memory within the pages itself. The main-advantage comes from the fact that all blocks in the page are guaranteed to be the same size.
    • BucketAllocator: This is a general allocator which holds a number of buckets for e.g. 16, 32, 48, ...256 byte blocks (the buckets are just normal FixedSizeAllocators). Small allocations can be satisified from the buckets, larger allocation go directly through the heap as usual.
  • Overwrite the new-operator in all RefCounted-derived classes to use the a BucketAllocator (however, I'll actually do some profiling whether this is actually faster then Windows' Low Fragmentation Heaps). This stuff will be behind the scenes in the DeclareClass()/ImplementClass()-macros, so there are no changes necessary to the class source code.
The biggest new feature (which depends on all of the above) is that I want to split RefCounted classes into thread-safe and thread-local classes. The idea is that a thread-local class promises that creation, manipulation and destruction of its objects happens from the same thread. A thread-local class can do a few important things with less overhead:
  • thread-local classes would create their instances from a thread-local BucketAllocator which doesn't have to be thread-safe
  • thread-local classes could use normal increment and decrement operations for their refcounting instead of Interlocked::Increment() and Interlocked::Decrement(). Since every Ptr<> assignment changes the refcount of an object this may add up quite a bit.
By far most classes in Nebula3 can be thread-local, only message objects which are used for thread-communication, and some low-level classes need to be thread-safe. I'm planning to enforce thread-locallity in Debug mode by storing a pointer to the local thread-name in each instance, and checking in AddRef, Release and other RefCounted methods whether the current thread-context is the same (that's a simple pointer-comparision, and it happens only in Debug mode).

The general strategy is to get away as far as possible from the general Alloc()/Free() calls, and to make memory management more "context sensitive" and also more static (in the sense that the memory layout doesn't change wildly over the runtime of the game). It's extremely important for a game application to set aside fixed amounts of memory right from the beginning of the project (and to let the game fail hard if the limits are violated), otherwise everything will grow without control, and towards the end of the project much time must be invested to cut everything back to reasonable limits.


Sebastian Schuberth said...

This might be a little bit out of scope, but the question just came to my mind: Will Drakensang use Nebula 3 already?

Floh said...

Technically no, but there's a lot of Nebula3 code in Drakensang's engine. Drakensang was started on Nebula2 long before Nebula3 took shape. But over the last year, a lot of the Nebula3 code and ideas has been flowing back into our inhouse N2 version. N2 is a very solid and mature production engine, while N3 is still more of an experimentation testbed where I can try out some of the more radical stuff. We expect that N2 will still be our engine of choice for our small-to-medium size 2008 PC-projects. Console projects will start right away with N3 since porting N2 over to consoles would mean a lot more trouble then N3 where the 360 and Wii port is done in parallel to the PC version.