16 Aug 2009

XNAMath

With all the focus on the console platforms I didn’t notice one very cool addition to the March DirectX SDK: XNAMath. This is basically the traditional Xbox360 vector math lib, ported to the PC with SSE2 and inlining support. The N3 math classes now run from the same code base on top of XNAMath for the PC and Xbox360 platforms. Maik spent a few days analyzing the generated code, and after some tweaking the improvements in our simple math benchmarks are absolutely dramatic: up to 4x faster on the PC side!

We had to change our memory allocation routines on the PC to always return 16-byte aligned memory. Without this, XNAMath isn’t really useful, since the aligned load/store functions can’t be used on vectors residing in heap buffers. It’s really strange that there isn’t a way to do this through the Win32 heap functions directly (or is there?).

Other than that I’m currently deep into “jobifying” the render thread, in order to free the PS3-PPU from the mundane number-crunching tasks. Properly jobified code will also “automatically” run about 2x faster on a 2-core PC, and about 3..4x faster on the Xbox360, since even single jobs will be split and the processing will be distributed to worker threads. The actual speedup may be even higher, since the data must be re-organized into small independent chunks (“slices”) of about 16..32 kByte each in order to make the best use of the SPU local memory, and this improved spatial locality is also extremely beneficial for CPU caches on the other platforms (I think I’m starting to sound like a broken record, but I can’t stress enough how good this data re-organization will be for N3 on ALL platforms :) ).
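To make the slicing idea a bit more concrete, here is a rough sketch using plain std::thread workers on the PC. It is an illustration under assumptions, not the N3 job system: the names ScaleSlice and ScaleJobified, the 16 kByte slice size and the round-robin distribution are all invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical slice size: 16 kByte worth of floats per slice.
constexpr std::size_t kSliceBytes = 16 * 1024;
constexpr std::size_t kSliceElems = kSliceBytes / sizeof(float);

// Stand-in for a job function: scales one independent slice in place.
void ScaleSlice(float* data, std::size_t count, float factor) {
    for (std::size_t i = 0; i < count; ++i) data[i] *= factor;
}

// Splits the buffer into slices and hands them to worker threads round-robin.
// Slices never overlap, so the workers need no locking.
void ScaleJobified(std::vector<float>& buffer, float factor, unsigned numWorkers) {
    const std::size_t numSlices = (buffer.size() + kSliceElems - 1) / kSliceElems;
    numWorkers = std::max(1u, numWorkers);
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < numWorkers; ++w) {
        workers.emplace_back([&buffer, factor, numSlices, numWorkers, w] {
            for (std::size_t s = w; s < numSlices; s += numWorkers) {
                const std::size_t begin = s * kSliceElems;
                const std::size_t count = std::min(kSliceElems, buffer.size() - begin);
                ScaleSlice(buffer.data() + begin, count, factor);
            }
        });
    }
    for (std::thread& t : workers) t.join();
}
```

Because each slice is a small contiguous block, a worker touches only a cache-friendly chunk of memory at a time, which is the same property that lets a slice fit into SPU local memory.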

4 comments:

Sebastian Schuberth said...

So, out of curiosity, are you now using _aligned_malloc() or something like

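void* data = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, size + 15);

// round up to the next 16-byte boundary (pointer arithmetic needs an integer cast)
void* data_aligned = (void*)(((UINT_PTR)data + 15) & ~(UINT_PTR)15);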

for 16-byte aligned heap allocations? Guessing from various Internet sources, the latter should be faster.

Floh said...

We're using HeapAlloc wrapper functions (called __HeapAlloc16, __HeapFree16, etc...), so it's like your second example.

gmail_ping said...

Hello! I don't know exactly where to ask this, but I'll try here. I've been on Drakensang and was impressed with its technical level (N2). (I used the nnpack tool to get at the game data :) ). What I found most amazing are the shader trees.
Very tricky and nice looking. So I wonder, why did they drop these trees in DS:DE:River of Time? Is this about N3 specs?
