<h1>The Brain Dump</h1>
<p><em>Game development, Nebula Device, personal mumblings...</em></p>
<h2>Moving to github</h2>
<p><em>2016-01-09</em></p>
I'm moving this blog over to github in a desperate attempt to finally keep all my web stuff in a single place (plus: keeping the blog in a version-controlled repo is so much nicer than Blogger, finally no more looking for the perfect blog editor, better control over the layout etc etc etc...).<br />
<br />
The new URL is:<a href="http://floooh.github.io/" target="_blank"> http://floooh.github.io/</a><br />
<br />
See you on the other side :)<br />
-Floh.

<h2>New Adventures in 8-Bit Land</h2>
<p><em>2014-11-10</em></p>
<p>Alan Cox’s recent <a href="https://plus.google.com/111104121194250082892/posts/a2jAP7Pz1gj">announcement</a> of his Unix-like operating system for old home computers got me thinking: wouldn’t it be cool to write programs for the KC85/3 in C, a language it never officially supported? </p>
<p>For youngsters and Westerners: the <a href="http://en.wikipedia.org/wiki/Robotron_KC_85">KC85 home computer line</a> was built in the 80’s in East Germany. The most popular version, the KC85/3, had a 1.75MHz Z80-compatible CPU and 16kByte each of general RAM, video RAM and ROM (so 32kByte RAM and 16kByte ROM). The ROM was split in half: 8kByte BASIC, 8kByte OS. The display was 320x256 pixels, and a block of 8x4 pixels could have 1-out-of-16 foreground and 1-out-of-8 background colors. No sprite support, no dedicated sound chip, and the video RAM layout was extra-funky and had very slow CPU access.</p>
<p><a href="https://github.com/mamedev/mame">MAME/MESS</a> has rudimentary support for the KC85 line (and many other computers built behind the Iron Curtain) and I dabbled with the KC85 emulation in JSMESS a while ago, as can be seen here: <a href="http://www.flohofwoe.net/history.html">http://www.flohofwoe.net/history.html</a>. So far this dabbling was all about running old games on old (emulated) machines.</p>
<h3 id="new-code-on-old-machines">New Code on Old Machines</h3>
<p>But what about running new code on old machines? And not just Z80 assembler code, but ‘modern’ C99 code? </p>
<p>Good 8-bit C compilers are surprisingly easy to find, since the Z80 lived on well into the 2000s for embedded systems. I first started looking for a Z80 LLVM backend, but after some more googling I decided to go for <a href="http://sdcc.sourceforge.net/">SDCC</a>, which looks like the ‘industry standard’ for 8-bit CPUs and is still actively developed.</p>
<p>On OSX, a recent SDCC can be installed with brew:</p>
<pre class="prettyprint"><code class=" hljs markdown"><span class="hljs-blockquote">> brew install sdcc</span></code></pre>
<p>After playing with the compiler for a few minutes I decided that starting right with C was a few steps too far.</p>
<h3 id="mess">MESS</h3>
<p>First I had to get MESS running again. MESS is the son of MAME, focusing on vintage computer emulation instead of arcade machines. Since I last used it, MESS has been merged back into MAME, and development has moved to github: <a href="https://github.com/mamedev/mame">https://github.com/mamedev/mame</a></p>
<p>So first, git-clone and compile mess:</p>
<pre class="prettyprint"><code class=" hljs markdown"><span class="hljs-blockquote">> git clone git@github.com:mamedev/mame.git mame</span>
<span class="hljs-blockquote">> cd mame</span>
<span class="hljs-blockquote">> make TARGET=mess</span></code></pre>
<p>This produces a ‘mess64’ executable on OSX. Next, KC85/3 and /4 system ROMs are needed; these can be found by googling for ‘kc85_3.zip MESS’ (for what it’s worth, I consider these ROMs abandonware). With the compiled mess and the ROMs, a KC85/3 session can now be started in MESS:</p>
<pre class="prettyprint"><code class=" hljs lasso"><span class="hljs-subst">></span><span class="hljs-built_in">.</span>/mess64 kc85_3 <span class="hljs-attribute">-rompath</span> <span class="hljs-built_in">.</span> <span class="hljs-attribute">-window</span> <span class="hljs-attribute">-resolution</span> <span class="hljs-number">640</span>x512</code></pre>
<p>And here we go: <br>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpr3-9jqH6Vgju0ykNzjlBLIXnE2aZDedPJYZlznkfucXFR7p9C-aoXF3YjJHMeggZbNu5rMd-4DwubP2GMytJ6pjfHyrh8JPiV4CncznDCNZZYa27edupzrmV7I8GQ7ZjMT9Q2pT1Eye1/s0/mess_kc85_3.png" alt="enter image description here" title="mess_kc85_3.png"></p>
<h3 id="getting-stuff-into-mess">Getting stuff into MESS</h3>
<p>Next we need to figure out how to get code onto the emulator. The KC85 operating system ‘CAOS’ (<strong>C</strong>assette <strong>A</strong>ided <strong>O</strong>perating <strong>S</strong>ystem - yes, even East-German engineers had a sense of humor) didn’t have an ‘executable format’ like ELF; instead, raw chunks of code and data were loaded from cassette tapes into memory. There was, however, a standardised format for how the data was stored on tape: it was divided into chunks of 128 bytes, with the first chunk being a header describing at which address to load the following data. This tape format has survived as the ‘KCC file format’, where the first 128-byte chunk looks like this (taken from the <a href="https://github.com/mamedev/mame/blob/master/src/mess/machine/kc.c#L10">kc85.c MESS driver source code</a>):</p>
<pre class="prettyprint"><code class=" hljs scss">struct kcc_header
{
UINT8 name<span class="hljs-attr_selector">[10]</span>;
UINT8 reserved<span class="hljs-attr_selector">[6]</span>;
UINT8 number_addresses;
UINT8 load_address_l;
UINT8 load_address_h;
UINT8 end_address_l;
UINT8 end_address_h;
UINT8 execution_address_l;
UINT8 execution_address_h;
UINT8 pad<span class="hljs-attr_selector">[128-2-2-2-1-16]</span>;
};</code></pre>
<p>A .KCC file can be loaded into MESS using the <strong>-quik</strong> command line arg, e.g.:</p>
<pre class="prettyprint"><code class=" hljs lasso"><span class="hljs-subst">></span><span class="hljs-built_in">.</span>/mess64 kc85_3 <span class="hljs-attribute">-quik</span> test<span class="hljs-built_in">.</span>kcc <span class="hljs-attribute">-rompath</span> <span class="hljs-built_in">.</span> <span class="hljs-attribute">-window</span> <span class="hljs-attribute">-resolution</span> <span class="hljs-number">640</span>x512</code></pre>
<p>So if we had a piece of KC85/3 compatible machine code, and put it into a file with a 128-byte KCC header in front, we should be able to load this into the emulator.</p>
<p>The canonical ‘Hello World’ program for the KC85/3 looks like this in Z80 machine code:</p>
<pre class="prettyprint"><code class=" hljs bash"><span class="hljs-number">0</span>x7F <span class="hljs-number">0</span>x7F <span class="hljs-string">'HELLO'</span> <span class="hljs-number">0</span>x01
<span class="hljs-number">0</span>xCD <span class="hljs-number">0</span>x03 <span class="hljs-number">0</span>xF0
<span class="hljs-number">0</span>x23
<span class="hljs-string">'Hello World\n\r'</span> <span class="hljs-number">0</span>x00
<span class="hljs-number">0</span>xC9</code></pre>
<p>That’s a complete ‘Hello World’ in 27 bytes! Put these bytes somewhere in the KC85’s RAM, and after executing the command ‘MENU’ a new menu entry will show up named ‘HELLO’. To execute the program, type ‘HELLO’ and hit Enter: <br>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikjjC2ywoJXt1h5dwbn796d-VJkzXlTwKzpix6owo-f4LsedE6PhpIaOL5aJq5I-bu4EAYmWXXP2K4fJNkrpI-L6D9F20RiO1XUX9pF0dZTeBGSm-pCAnL8IB9V1RlLT1G5uGpEuXzbcEr/s0/kc85_hello.png" alt="enter image description here" title="kc85_hello.png"></p>
<p>How does this magic work? At the start is a special ‘7F 7F’ header which identifies these 27 bytes as a command line program called ‘HELLO’:</p>
<pre class="prettyprint"><code class=" hljs bash"><span class="hljs-number">0</span>x7F <span class="hljs-number">0</span>x7F <span class="hljs-string">'HELLO'</span> <span class="hljs-number">0</span>x01</code></pre>
<p>Execution starts right after the 0x01 byte:</p>
<pre class="prettyprint"><code class=" hljs ">0xCD 0x03 0xF0
0x23</code></pre>
<p>The <strong>CD</strong> is the machine code of the Z80 subroutine-call instruction, followed by the call-target address <strong>0xF003</strong> (the Z80 is little-endian, like the x86). This is a call to a central operating system ‘jump vector’. The following <strong>0x23</strong> byte identifies the operating system function, in this case <strong>OSTR</strong> for ‘Output STRing’ (<a href="http://www.mpm-kc85.de/dokupack/KC85_3_uebersicht.pdf">see page 43 of the system manual</a>). This function outputs a string at the current cursor position. The string is not provided as a pointer, but directly embedded into the code after the call and terminated with a zero byte:</p>
<pre class="prettyprint"><code class=" hljs tex">'Hello World<span class="hljs-command">\n</span><span class="hljs-command">\r</span>' 0x00</code></pre>
<p>After the operating system function has executed, it will resume execution after the string’s 0-terminator byte.</p>
<p>The final <strong>C9</strong> byte is the Z80 RETurn statement, which will give control back to the operating system.</p>
<p>This was the point where I started to write a bit of Python code which takes a chunk of Z80 code, puts a KCC header in front and writes it to a .kcc file. And indeed, the MESS loader accepted such a self-made ‘executable’ without problems.</p>
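<p>For illustration, here’s a minimal C sketch of what such a tool has to compute. The struct mirrors the kcc_header definition above (with standard stdint types instead of MAME’s UINT8); the helper function name and the example addresses are my own, not part of any existing tool:</p>

```c
#include <stdint.h>
#include <string.h>

/* same layout as the kcc_header struct from the MESS driver above */
struct kcc_header {
    uint8_t name[10];
    uint8_t reserved[6];
    uint8_t number_addresses;
    uint8_t load_address_l, load_address_h;
    uint8_t end_address_l, end_address_h;
    uint8_t execution_address_l, execution_address_h;
    uint8_t pad[128 - 2 - 2 - 2 - 1 - 16];
};

/* hypothetical helper: fill a KCC header for a code blob that should be
   loaded at 'load' and is 'num_bytes' long; the 16-bit addresses are
   stored as little-endian low/high byte pairs */
static void kcc_fill_header(struct kcc_header* hdr, const char* name,
                            uint16_t load, uint16_t num_bytes, uint16_t exec) {
    const uint16_t end = load + num_bytes;
    memset(hdr, 0, sizeof(*hdr));
    strncpy((char*)hdr->name, name, sizeof(hdr->name));
    hdr->number_addresses    = 3;   /* load-, end- and exec-address are valid */
    hdr->load_address_l      = load & 0xFF;
    hdr->load_address_h      = load >> 8;
    hdr->end_address_l       = end & 0xFF;
    hdr->end_address_h       = end >> 8;
    hdr->execution_address_l = exec & 0xFF;
    hdr->execution_address_h = exec >> 8;
}
```

<p>The 128-byte header is simply written to the output file, followed by the raw code bytes. For the 27-byte hello-world blob loaded at 0x200, the end address would be 0x21B; as far as I can tell from the MESS driver, the execution address is only used when number_addresses is at least 3.</p>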
<h3 id="mnemonics">Mnemonics</h3>
<p>Before tackling the C programming challenge I decided to start smaller, with Z80 assembly code. The SDCC compiler comes (among others) with a Z80 assembler, but I found this hard to use (for instance, it generates intermediate ASCII <a href="http://en.wikipedia.org/wiki/Intel_HEX">Intel HEX</a> files instead of raw binary files).</p>
<p>After some more googling I found <a href="http://www.nongnu.org/z80asm/">z80asm</a> which looked solid and easy to use. Again this can be installed via brew:</p>
<pre class="prettyprint"><code class=" hljs markdown"><span class="hljs-blockquote">> brew install z80asm</span></code></pre>
<p>The simple Hello World machine code blob from above looks like this in Z80 assembly mnemonics:</p>
<pre class="prettyprint"><code class=" hljs">    org 0x200   ; start at address 0x200
    db 0x7F,0x7F,"HELLO",1
    call 0xF003
    db 0x23
    db "Hello World\r\n\0"
    ret</code></pre>
<p>Much easier to read, right? And even with comments! Running this file through z80asm yields a binary file with the exact same 27 bytes as the hand-crafted machine code version:</p>
<pre class="prettyprint"><code class=" hljs cpp">> z80asm hello.s -o hello.bin
> hexdump hello.bin
<span class="hljs-number">0000000</span> <span class="hljs-number">7f</span> <span class="hljs-number">7f</span> <span class="hljs-number">48</span> <span class="hljs-number">45</span> <span class="hljs-number">4</span>c <span class="hljs-number">4</span>c <span class="hljs-number">4f</span> <span class="hljs-number">01</span> cd <span class="hljs-number">03</span> f0 <span class="hljs-number">23</span> <span class="hljs-number">48</span> <span class="hljs-number">65</span> <span class="hljs-number">6</span>c <span class="hljs-number">6</span>c
<span class="hljs-number">0000010</span> <span class="hljs-number">6f</span> <span class="hljs-number">20</span> <span class="hljs-number">57</span> <span class="hljs-number">6f</span> <span class="hljs-number">72</span> <span class="hljs-number">6</span>c <span class="hljs-number">64</span> <span class="hljs-number">0</span>d <span class="hljs-number">0</span>a <span class="hljs-number">00</span> c9
<span class="hljs-number">000001</span>b</code></pre>
<p>With some more Python plumbing I was then able to ‘cross-assemble’ new programs for the KC85 in a modern development environment. Very cool!</p>
<h3 id="c99">C99</h3>
<p>But the real challenge remains: compiling and running C code! Compiling a C source through SDCC generates a lot of output files, but none of them is the expected binary blob of executable code:</p>
<pre class="prettyprint"><code class=" hljs avrasm">> sdcc hello<span class="hljs-preprocessor">.c</span>
> ls
hello<span class="hljs-preprocessor">.asm</span> hello<span class="hljs-preprocessor">.ihx</span> hello<span class="hljs-preprocessor">.lst</span> hello<span class="hljs-preprocessor">.mem</span> hello<span class="hljs-preprocessor">.rst</span>
hello<span class="hljs-preprocessor">.c</span> hello<span class="hljs-preprocessor">.lk</span> hello<span class="hljs-preprocessor">.map</span> hello<span class="hljs-preprocessor">.rel</span> hello<span class="hljs-preprocessor">.sym</span></code></pre>
<p>There are 2 interesting files: <strong>hello.asm</strong> is a human-readable assembler source file, and <strong>hello.ihx</strong> is the final executable, but in Intel HEX format. The .ihx file can be converted into a raw binary blob using the <strong>makebin</strong> program that also comes with SDCC.</p>
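<p>The conversion looks roughly like this (the exact makebin options may differ between SDCC versions; <strong>-p</strong> should trim the output to the highest address actually written, instead of emitting a full-sized memory image):</p>
<pre class="prettyprint"><code class=" hljs">> sdcc -mz80 hello.c
> makebin -p hello.ihx hello.bin</code></pre>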
<p>But even with a very simple C program there are already a few things off:</p>
<ul>
<li>global variables are placed at address 0x8000 (32kBytes into the address space); on the KC85/3 this is video memory, so the default address for data wouldn’t work</li>
<li>if any global variables are initialized, the resulting binary file is also at least 32kBytes big, and has a lot of empty space inside</li>
<li>there’s a few dozen bytes of runtime initialization code which isn’t needed in the KC85 environment (at least as long as we don’t want to use the C runtime)</li>
</ul>
<p>Thankfully, SDCC allows all of this to be tweaked, and can compile (and link) pieces of C code into raw blobs of machine code without any ‘runtime overhead’; it doesn’t even need a main function to produce a valid executable. </p>
<p>Currently I’m placing global data at address 0x200 and code at address 0x300 (so there are 256 bytes for global data), and I’m disabling anything C-runtime related. And of course we need to tell the compiler to generate Z80 code. These are the important command line options for sdcc:</p>
<pre class="prettyprint"><code class=" hljs">-mz80
--no-std-crt0 --nostdinc --nostdlib
--code-loc 0x300
--data-loc 0x200</code></pre>
<p>With these compiler settings I’m getting the bare-bones Z80 code I want on the KC85. All that’s left now is some macros and system call wrapper functions to provide a KC-style runtime environment, and TADAA:</p>
<p>C99 programming on a 30 year old 8-bit home computer :D</p>
<iframe width="560" height="315" src="//floooh.github.io/kc85sdk/kc85_c.webm" allowfullscreen=""></iframe>
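<p>To give an idea of what such a wrapper can look like: this is an untested sketch (not the actual kc85sdk code) of a ‘HELLO’ command line program written with SDCC’s inline assembly, using the 0x7F 0x7F prologue and the OSTR system call described earlier:</p>
<pre class="prettyprint"><code class=" hljs">// untested sketch, not the actual kc85sdk code: emit the command
// prologue and an OSTR call via SDCC inline assembly
void hello(void) __naked {
    __asm
        .db  0x7F, 0x7F        ; command prologue
        .ascii "HELLO"
        .db  0x01              ; execution starts after this byte
        call 0xF003            ; CAOS jump vector
        .db  0x23              ; function code: OSTR
        .ascii "Hello World"
        .db  0x0D, 0x0A, 0x00
        ret
    __endasm;
}</code></pre>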
<p>Here’s the github link to the ‘kc85sdk’ (work in progress):</p>
<p><a href="https://github.com/floooh/kc85sdk">https://github.com/floooh/kc85sdk</a></p>
<blockquote>
<p>Written with <a href="https://stackedit.io/">StackEdit</a>.</p>
</blockquote>

<h2>Cross-Platform Multitouch Input</h2>
<p><em>2014-10-08</em></p>
<p><strong>TL;DR</strong>: A look at the low-level touch-input APIs on iOS, Android NDK and emscripten, and how to unify them for cross-platform engines, with links to source code.</p>
<h3 id="why">Why</h3>
<p>Compared to mouse, keyboard and gamepad, handling multi-touch input is a complex topic because it usually involves gesture recognition, at least for simple gestures like tapping, panning and pinching. When I worked on mobile platforms in the past, I usually tried to avoid processing low-level touch input events directly, and instead used gesture recognizers provided by the platform SDKs:</p>
<p>On <strong>iOS</strong>, gesture recognizers are provided by UIKit; they are attached to a UIView object, and when a gesture recognizer detects a gesture it invokes a callback method. The details are here: <a href="https://developer.apple.com/library/ios/documentation/EventHandling/Conceptual/EventHandlingiPhoneOS/GestureRecognizer_basics/GestureRecognizer_basics.html">GestureRecognizer_basics.html</a></p>
<p>The <strong>Android NDK</strong> itself has no built-in gesture recognizers, but comes with source code for a few simple gesture detectors in the <a href="https://android.googlesource.com/platform/development/+/master/ndk/sources/android/ndk_helper/">ndk_helper source code directory</a>.</p>
<p>There are 2 problems with using SDK-provided gesture detectors. First, iOS and Android detectors behave differently: a pinch in the Android NDK is something slightly different than a pinch in the iOS SDK. And second, the <strong>emscripten SDK</strong> only provides the low-level touch events as defined by the <a href="http://www.w3.org/TR/touch-events/">HTML5 Touch Event API</a>, no high-level gesture recognizers.</p>
<p>So, to handle all 3 platforms in a common way, there doesn’t seem to be a way around writing your own gesture recognizers and trying to reduce the platform-specific touch event information into a platform-agnostic common subset.</p>
<h3 id="platform-specific-touch-events">Platform-specific touch events</h3>
<p>Let’s first look at the low-level touch events provided by each platform in order to merge their common attributes into a generic touch event:</p>
<h4 id="ios-touch-events">iOS touch events</h4>
<p>On iOS, touch events are forwarded to <strong>UIView</strong> callback methods (more specifically, <strong>UIResponder</strong>, which is a parent class of UIView). Multi-touch is disabled by default and must be enabled first by setting the property <strong>multipleTouchEnabled</strong> to YES.</p>
<p>The callback methods are:</p>
<pre class="prettyprint"><code class=" hljs">- touchesBegan:withEvent:
- touchesMoved:withEvent:
- touchesEnded:withEvent:
- touchesCancelled:withEvent:</code></pre>
<p>All methods get an <strong>NSSet of UITouch objects</strong> as first argument and a <strong>UIEvent</strong> as second argument.</p>
<p>The arguments are a bit non-obvious: the set of UITouches in the first argument is not the overall number of current touches, but only the touches that have <em>changed their state</em>. So if there’s already 2 fingers down, and a 3rd finger touches the display, a <strong>touchesBegan</strong> will be received with a <strong>single UITouch object</strong> in the NSSet argument, which describes the touch of the 3rd finger that just came down. Same with <strong>touchesEnded</strong> and <strong>touchesMoved</strong>: if one of 3 fingers goes up (or moves), the NSSet will only contain a single UITouch object for the finger that has changed its state.</p>
<p>The <em>overall</em> number of current touches is contained in the UIEvent object, so if 3 fingers are down, the UIEvent object contains 3 UITouch objects. The 4 callback methods and the NSSet argument are actually redundant, since all that information is also contained in the UIEvent object. A single <em>touchesChanged</em> callback method with a single UIEvent argument would have been enough to communicate the same information.</p>
<p>Let’s have a <a href="https://developer.apple.com/library/ios/Documentation/UIKit/Reference/UIEvent_Class/index.html#//apple_ref/c/tdef/UIEventType">look at the information</a> provided by UIEvent: first there’s the method <strong>allTouches</strong>, which returns an NSSet of all UITouch objects in the event, and there’s a <strong>timestamp</strong> for when the event occurred. The rest is contained in the returned <a href="https://developer.apple.com/library/ios/Documentation/UIKit/Reference/UITouch_Class/index.html#//apple_ref/occ/cl/UITouch">UITouch objects</a>:</p>
<p>The UITouch method <strong>locationInView</strong> provides the position of the touch, the <strong>phase</strong> value gives the current state of the touch (began, moved, stationary, ended, cancelled). The rest is not really needed or specific to the iOS platform.</p>
<h4 id="android-ndk-touch-events">Android NDK touch events</h4>
<p>On Android, I assume that the Native Activity is used, with the <a href="https://android.googlesource.com/platform/development/+/master/ndk/sources/android/native_app_glue/android_native_app_glue.h">android_native_app_glue.h</a> helper classes. The application wrapper class <strong>android_app</strong> allows setting a single input event callback function which is called whenever an input event occurs. Android NDK input events and access functions are defined in the “android/input.h” header. The input event struct <strong>AInputEvent</strong> itself isn’t public and can only be accessed through accessor functions defined in the same header.</p>
<p>When an input event arrives at the user-defined callback function, first check whether it is actually a touch event:</p>
<pre class="prettyprint"><code class=" hljs">int32_t type = AInputEvent_getType(aEvent);
if (AINPUT_EVENT_TYPE_MOTION == type) {
    // yep, a touch event
}</code></pre>
<p>Once it’s sure that the event is a touch event, the <strong>AMotionEvent_</strong> set of accessor functions must be used to extract the rest of the information. There’s a whole lot of them, but we’re only interested in the attributes that are also provided by other platforms:</p>
<pre class="prettyprint"><code class=" hljs scss"><span class="hljs-function">AMotionEvent_getAction()</span>;
<span class="hljs-function">AMotionEvent_getEventTime()</span>;
<span class="hljs-function">AMotionEvent_getPointerCount()</span>;
<span class="hljs-function">AMotionEvent_getPointerId()</span>;
<span class="hljs-function">AMotionEvent_getX()</span>;
<span class="hljs-function">AMotionEvent_getY()</span>;</code></pre>
<p>Together, these functions provide the same information as the iOS UIEvent object, but the information is harder to extract.</p>
<p>Let’s start with the simple stuff: A motion event contains an array of touch points, called ‘pointers’, one for each finger touching the display. The number of touch points is returned by the <strong>AMotionEvent_getPointerCount()</strong> function, which takes an AInputEvent* as argument. The accessor functions <strong>AMotionEvent_getPointerId()</strong>, <strong>AMotionEvent_getX()</strong> and <strong>AMotionEvent_getY()</strong> take an AInputEvent* and an index to acquire an attribute of the touch point at the specified index. AMotionEvent_getX()/getY() extract the X/Y position of the touch point, and the AMotionEvent_getPointerId() function returns a unique id which is required to track the same touch point across several input events.</p>
<p><strong>AMotionEvent_getAction()</strong> provides 2 pieces of information in a single return value: the actual ‘action’, and the index of the touch point this action applies to:</p>
<p>The lower 8 bits of the return value contain the action code for a touch point that has changed state (whether a touch has started, moved, ended or was cancelled):</p>
<pre class="prettyprint"><code class=" hljs ">AMOTION_EVENT_ACTION_DOWN
AMOTION_EVENT_ACTION_UP
AMOTION_EVENT_ACTION_MOVE
AMOTION_EVENT_ACTION_CANCEL
AMOTION_EVENT_ACTION_POINTER_DOWN
AMOTION_EVENT_ACTION_POINTER_UP</code></pre>
<p>Note that there are 2 down events, DOWN and POINTER_DOWN. The NDK differentiates between ‘primary’ and ‘non-primary pointers’. The first finger down generates a DOWN event, the following fingers POINTER_DOWN events. I haven’t found a reason why these should be handled differently, so both DOWN and POINTER_DOWN events are handled the same in my code.</p>
<p>The upper 24 bits contain the index (not the identifier!) of the touch point that has changed its state.</p>
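<p>Decoding this packed value can be sketched like this; the mask and shift values below mirror the AMOTION_EVENT_ACTION_MASK, AMOTION_EVENT_ACTION_POINTER_INDEX_MASK and AMOTION_EVENT_ACTION_POINTER_INDEX_SHIFT constants from “android/input.h”, redefined here only to keep the snippet self-contained:</p>

```c
#include <stdint.h>

/* values mirror the AMOTION_EVENT_ACTION_* constants in android/input.h */
enum {
    ACTION_MASK                = 0x00ff,
    ACTION_POINTER_INDEX_MASK  = 0xff00,
    ACTION_POINTER_INDEX_SHIFT = 8
};

/* split the packed AMotionEvent_getAction() return value into the action
   code (lower 8 bits) and the index of the changed touch point (upper bits) */
static void decode_action(int32_t action, int32_t* action_code, int32_t* pointer_index) {
    *action_code   = action & ACTION_MASK;
    *pointer_index = (action & ACTION_POINTER_INDEX_MASK) >> ACTION_POINTER_INDEX_SHIFT;
}
```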
<h4 id="emscripten-sdk-touch-events">emscripten SDK touch events</h4>
<p>Touch input in emscripten is provided by the new HTML5 wrapper API in the ‘emscripten/html5.h’ header, which allows setting callback functions for nearly all types of HTML5 events (the complete API documentation <a href="http://kripken.github.io/emscripten-site/docs/api_reference/html5.h.html">can be found here</a>).</p>
<p>To receive touch-events, the following 4 functions are relevant:</p>
<pre class="prettyprint"><code class=" hljs bash">emscripten_<span class="hljs-keyword">set</span>_touchstart_callback()
emscripten_<span class="hljs-keyword">set</span>_touchend_callback()
emscripten_<span class="hljs-keyword">set</span>_touchmove_callback()
emscripten_<span class="hljs-keyword">set</span>_touchcancel_callback()</code></pre>
<p>These set the application-provided callback functions that are called when a touch event occurs.</p>
<p>There’s a caveat when handling touch input in the browser: usually a browser application doesn’t start in fullscreen mode, and the browser itself uses gestures for navigation (like scrolling, page-back and page-forward). The emscripten API allows restricting the events to specific DOM elements (for instance the WebGL canvas of the application instead of the whole HTML document), and the callback can decide to ‘swallow’ the event so that standard handling by the browser is suppressed.</p>
<p>The first argument to the callback setter functions above is a C-string pointer identifying the DOM element. If this is a null pointer, events from the whole webpage will be received. The most useful value is “#canvas”, which limits the events to the (WebGL) canvas managed by the emscripten app.</p>
<p>In order to suppress default handling of an event, the event callback function should return ‘true’ (or ‘false’ if default handling should happen, but this is usually not desired, at least for games).</p>
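<p>Putting the registration together might look like this (a sketch: the handler name <em>touch_cb</em> and the setup function are my own, not part of the API; all 4 event types are routed into the same handler):</p>
<pre class="prettyprint"><code class=" hljs">#include &lt;emscripten/html5.h&gt;

// sketch: one handler for all 4 touch event types
static EM_BOOL touch_cb(int eventType, const EmscriptenTouchEvent* e, void* userData) {
    // convert e->touches[0 .. e->numTouches-1] into generic touch events here...
    return EM_TRUE;     // swallow the event, suppress default browser handling
}

static void setup_touch_callbacks(void* userData) {
    // "#canvas": only receive events from the WebGL canvas
    emscripten_set_touchstart_callback("#canvas", userData, EM_TRUE, touch_cb);
    emscripten_set_touchend_callback("#canvas", userData, EM_TRUE, touch_cb);
    emscripten_set_touchmove_callback("#canvas", userData, EM_TRUE, touch_cb);
    emscripten_set_touchcancel_callback("#canvas", userData, EM_TRUE, touch_cb);
}</code></pre>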
<p>The touch event callback function is called with the following arguments:</p>
<pre class="prettyprint"><code class=" hljs">int eventType,
const EmscriptenTouchEvent* event,
void* userData</code></pre>
<p><strong>eventType</strong> will be one of:</p>
<pre class="prettyprint"><code class=" hljs ">EMSCRIPTEN_EVENT_TOUCHSTART
EMSCRIPTEN_EVENT_TOUCHEND
EMSCRIPTEN_EVENT_TOUCHMOVE
EMSCRIPTEN_EVENT_TOUCHCANCEL</code></pre>
<p>The 4 different callbacks are again kind of redundant (like on iOS); it often makes sense to route all 4 callbacks to the same handler function and differentiate there through the eventType argument.</p>
<p>The actual touch event data is contained in the EmscriptenTouchEvent structure; interesting for us are the member <strong>int numTouches</strong> and an array of <strong>EmscriptenTouchPoint</strong> structs. A single EmscriptenTouchPoint has the fields <strong>identifier</strong>, <strong>isChanged</strong> and the position of the touch in <strong>canvasX, canvasY</strong> (other members omitted for clarity).</p>
<p>Except for the timestamp of the event, this is the same information provided by the iOS and Android NDK touch APIs.</p>
<h4 id="bringing-it-all-together">Bringing it all together</h4>
<p>The cross-section of all 3 touch APIs provides the following information:</p>
<ul>
<li>a notification when the touch state changes: <br>
<ul><li>a touch-down was detected (a new finger touches the display)</li>
<li>a touch-up was detected (a finger was lifted off the display)</li>
<li>a movement was detected</li>
<li>a cancellation was detected</li></ul></li>
<li>information about all current touch points, and which of them has changed state <br>
<ul><li>the x,y position of the touch</li>
<li>a unique identifier in order to track the same touch point over several input events</li></ul></li>
</ul>
<p>The touch point identifier is a bit non-obvious in the iOS API, since the UITouch class doesn’t have an identifier member. On iOS, the pointer to a UITouch object serves as the identifier; the same UITouch object is guaranteed to exist as long as the touch is active.</p>
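<p>One simple way to track the same touch across events on all 3 platforms is a small identifier-to-slot map; this is a hypothetical sketch, not the actual Oryol code (a free slot is marked with -1, since 0 is a valid Android pointer id):</p>

```c
#include <stdint.h>

#define MAX_TOUCHES (8)

/* maps platform touch identifiers to stable local slot indices */
typedef struct {
    intptr_t ids[MAX_TOUCHES];      /* -1 marks a free slot */
} touch_tracker;

static void tracker_init(touch_tracker* t) {
    for (int i = 0; i < MAX_TOUCHES; i++) {
        t->ids[i] = -1;
    }
}

/* return the slot for 'id', allocating a free slot for new identifiers;
   returns -1 if all slots are taken */
static int tracker_slot(touch_tracker* t, intptr_t id) {
    int free_slot = -1;
    for (int i = 0; i < MAX_TOUCHES; i++) {
        if (t->ids[i] == id) {
            return i;                   /* identifier already tracked */
        }
        if ((t->ids[i] == -1) && (free_slot == -1)) {
            free_slot = i;
        }
    }
    if (free_slot != -1) {
        t->ids[free_slot] = id;         /* start tracking a new touch */
    }
    return free_slot;
}

/* call on touch-up or cancel so the slot can be reused */
static void tracker_release(touch_tracker* t, intptr_t id) {
    for (int i = 0; i < MAX_TOUCHES; i++) {
        if (t->ids[i] == id) {
            t->ids[i] = -1;
        }
    }
}
```

<p>With this, a finger keeps the same slot index from touch-down to touch-up, no matter whether the platform hands over a pointer value (iOS), an integer id (Android) or an identifier field (emscripten).</p>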
<p>Also, another crucial piece of information is the timestamp when the event occurred. iOS and Android NDK provide this with their touch events, but not the emscripten SDK. Since the timestamps on Android and iOS have different meaning anyway, I’m simply tracking my own time when the events are received.</p>
<p>My unified, platform-agnostic <strong>touchEvent</strong> now basically looks like this:</p>
<pre class="prettyprint"><code class=" hljs">struct touchEvent {
    enum touchType {
        began,
        moved,
        ended,
        cancelled,
        invalid,
    } type = invalid;
    TimePoint time;
    int32 numTouches = 0;
    static const int32 MaxNumPoints = 8;
    struct point {
        uintptr identifier = 0;
        glm::vec2 pos;
        bool isChanged = false;
    } points[MaxNumPoints];
};</code></pre>
<p><strong>TimePoint</strong> is an Oryol-style timestamp object. The <strong>uintptr</strong> datatype for the identifier is an unsigned integer with the size of a pointer (32- or 64-bit depending on platform).</p>
<p>Platform-specific touch events are received, converted to generic touch events, and then fed into custom gesture recognizers:</p>
<ul>
<li><a href="https://github.com/floooh/oryol/blob/d640cf7840cabe866e34290e3a00c4309cc198a3/code/Modules/Input/ios/iosInputMgr.mm">iOS touch event source code</a></li>
<li><a href="https://github.com/floooh/oryol/blob/d640cf7840cabe866e34290e3a00c4309cc198a3/code/Modules/Input/android/androidInputMgr.cc">Android touch event source code</a></li>
<li><a href="https://github.com/floooh/oryol/blob/d640cf7840cabe866e34290e3a00c4309cc198a3/code/Modules/Input/emsc/emscInputMgr.cc">emscripten touch event source code (plus mouse and keyboard input handling)</a></li>
</ul>
<p>Simple gesture detector source code:</p>
<ul>
<li><a href="https://github.com/floooh/oryol/blob/d640cf7840cabe866e34290e3a00c4309cc198a3/code/Modules/Input/touch/tapDetector.cc">tap detector</a></li>
<li><a href="https://github.com/floooh/oryol/blob/d640cf7840cabe866e34290e3a00c4309cc198a3/code/Modules/Input/touch/panDetector.cc">panning detector</a></li>
<li><a href="https://github.com/floooh/oryol/blob/d640cf7840cabe866e34290e3a00c4309cc198a3/code/Modules/Input/touch/pinchDetector.cc">pinch detector</a></li>
</ul>
<p>And a simple demo (the WebGL version has only been tested on iOS8; mobile Safari’s WebGL implementation still has bugs):</p>
<ul>
<li><a href="http://floooh.github.io/oryol/TestInput.html">WebGL demo</a></li>
<li><a href="http://floooh.github.io/oryol/TestInput-debug.apk">Android self-signed APK</a></li>
</ul>
<p>And that’s all for today :)</p>
<blockquote>
<p>Written with <a href="https://stackedit.io/">StackEdit</a>.</p>
</blockquote>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-19381447989174549352014-05-24T21:02:00.001+01:002014-05-24T21:14:15.974+01:00Shader Compilation and IDEs<p>I recently played around with shader code generation and the GLSL reference compiler in <a href="https://www.github.com/floooh/oryol">Oryol</a>.</p>
<p>The result is IMHO pretty neat:</p>
<p>Shader source files (*.shd) live in the IDE next to C++ files: <br>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIlK6jfDB27qCEK1JJMdWzoa25u310D2TysvqnjTqYRuQ6l5C54XO06KnqGdUdUUQPuW5fDosyd_MZcwljQN-lYBz6FXtmHNYMC3intkjqIfuOyM56jKu6LdWniAzpVtwVUmGqeUk7suUh/s0/shd1.png" alt="enter image description here" title="shd1.png"></p>
<p>Shader files are written in normal GLSL syntax with custom annotations (those @ and $ tags): <br>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYfa5UhnBVcHuqvzry51xErNt4PLgcFtctE8b2Z75PWgaZGO58E17SuMlDw8nKRue5PNCWqCu25K8JifZ4KCogh3k53yCG1RYL3YJETdS_O07D54kQs0Mvmyrhpg-fW2AoagI-0v_EJ7pr/s0/annotated_glsl.png" alt="enter image description here" title="annotated_glsl.png"></p>
<p>When compiling the project, a custom build step will generate vertex- and fragment shaders for different GLSL versions and run them through the GLSL reference compiler. Any errors from the reference compiler are converted to a format which can be parsed by the IDE:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyNKDvycB94jKfj-vUXXnK37ZEbPSPmmbaoXSlZFWL_d9kG2XGsK8tKlKzEWoGxkXir4HCrHapRYzVOKlMoMbVbxITrvpmswaFeGogjX6p3dF7eNV5Bae-OBP7WOCT0Sb3HF42GKyc8OS9/s0/xcode_error.png" alt="enter image description here" title="xcode_error.png"></p>
<p>Error parsing also works in Visual Studio: <br>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxVzArIVlv7l2RdCfEcyM95NXUy9sDsCV6rqNl8TqqouN4U2MhyJVkE6arxli3S4Be4zX8JIg_z_3rtc1-vNFX6wTUGAeeKzWYW5L7lhEe23VS68Nr23uGCg5V4yHqeEyo5i7V0C19SIlN/s0/shd_error_vstudio.png" alt="enter image description here" title="shd_error_vstudio.png"></p>
<p>Unfortunately I couldn’t get error parsing to work in QtCreator on Linux. The error messages are recognised, but double-clicking them doesn’t work.</p>
<p>After the GLSL compiler pass, a C++ header/source file pair will be created which contains the GLSL shader code and some C++ glue to make the shader accessible from the engine side.</p>
<p>The edit-compile-test cycle is only one or two seconds, depending on the link time of the demo code. Also, since the shader generation runs as a normal build step, shader code will also be generated and validated in command line builds.</p>
<h4 id="heres-how-it-works">Here’s how it works:</h4>
<p>When <strong>cmake</strong> runs to create the build files it will look for XML files in the source code directories. For each XML file, a custom build target will be created which invokes a python script. This ‘generator script’ will generate a C++ header/source pair during compilation.</p>
<p>This generic code generation has only been used so far for the Oryol Messaging system, but it is flexible enough to cover other code generation scenarios (like generating shader code).</p>
<p>Setting up the custom build target involves 3 steps:</p>
<p>The actual build target must be created, cmake has the <strong>add_custom_target</strong> macro for this:</p>
<pre class="prettyprint prettyprinted"><code><span class="pln">add_custom_target</span><span class="pun">(</span><span class="pln">$</span><span class="pun">{</span><span class="pln">target</span><span class="pun">}</span><span class="pln">_gen
COMMAND $</span><span class="pun">{</span><span class="pln">PYTHON</span><span class="pun">}</span><span class="pln">
$</span><span class="pun">{</span><span class="pln">ORYOL_ROOT_DIR</span><span class="pun">}/</span><span class="pln">generators</span><span class="pun">/</span><span class="pln">generator</span><span class="pun">.</span><span class="pln">py
$</span><span class="pun">{</span><span class="pln">xmlFiles</span><span class="pun">}</span><span class="pln">
COMMENT </span><span class="str">"Generating sources..."</span><span class="pun">)</span></code></pre>
<p>This statement takes a variable <em>target</em> with the name of the build target which will compile the generated C++ sources, plus an <em>xmlFiles</em> list variable, and generates a new build target called [<em>target</em>]_gen. The variables PYTHON and ORYOL_ROOT_DIR are config variables pointing to the python executable and the Oryol root directory.</p>
<p>To get the right build order, a target dependency must be defined so that the generated target is always run before the build target which needs the generated C++ source code:</p>
<pre class="prettyprint prettyprinted"><code><span class="pln">add_dependencies</span><span class="pun">(</span><span class="pln">$</span><span class="pun">{</span><span class="pln">target</span><span class="pun">}</span><span class="pln"> $</span><span class="pun">{</span><span class="pln">target</span><span class="pun">}</span><span class="pln">_gen</span><span class="pun">)</span></code></pre>
<p>Finally we need to resolve a chicken-egg situation. All C++ files must exist when cmake assembles the build files, but the generated C++ files will only be created during the first build. To fix this situation, empty placeholder files are created if the generated sources don’t exist yet: </p>
<pre class="prettyprint prettyprinted"><code><span class="kwd">foreach</span><span class="pun">(</span><span class="pln">xmlFile $</span><span class="pun">{</span><span class="pln">xmlFiles</span><span class="pun">})</span><span class="pln">
</span><span class="kwd">string</span><span class="pun">(</span><span class="pln">REPLACE </span><span class="pun">.</span><span class="pln">xml </span><span class="pun">.</span><span class="pln">cc src $</span><span class="pun">{</span><span class="pln">xmlFile</span><span class="pun">})</span><span class="pln">
</span><span class="kwd">string</span><span class="pun">(</span><span class="pln">REPLACE </span><span class="pun">.</span><span class="pln">xml </span><span class="pun">.</span><span class="pln">h hdr $</span><span class="pun">{</span><span class="pln">xmlFile</span><span class="pun">})</span><span class="pln">
</span><span class="kwd">if</span><span class="pln"> </span><span class="pun">(</span><span class="pln">NOT EXISTS $</span><span class="pun">{</span><span class="pln">src</span><span class="pun">})</span><span class="pln">
file</span><span class="pun">(</span><span class="pln">WRITE $</span><span class="pun">{</span><span class="pln">src</span><span class="pun">}</span><span class="pln"> </span><span class="str">" "</span><span class="pun">)</span><span class="pln">
endif</span><span class="pun">()</span><span class="pln">
</span><span class="kwd">if</span><span class="pln"> </span><span class="pun">(</span><span class="pln">NOT EXISTS $</span><span class="pun">{</span><span class="pln">hdr</span><span class="pun">})</span><span class="pln">
file</span><span class="pun">(</span><span class="pln">WRITE $</span><span class="pun">{</span><span class="pln">hdr</span><span class="pun">}</span><span class="pln"> </span><span class="str">" "</span><span class="pun">)</span><span class="pln">
endif</span><span class="pun">()</span><span class="pln">
endforeach</span><span class="pun">()</span><span class="pln"> </span></code></pre>
<p>These 3 steps take care of the build configuration via cmake.</p>
<h4 id="on-to-the-python-generator-script">On to the python generator script:</h4>
<p>First, the generator script parses the XML ‘source file’ which caused its invocation. For the shader generator, the XML file is very simple:</p>
<pre class="prettyprint prettyprinted"><code><span class="tag"><Generator</span><span class="pln"> </span><span class="atn">type</span><span class="pun">=</span><span class="atv">"ShaderLibrary"</span><span class="pln"> </span><span class="atn">name</span><span class="pun">=</span><span class="atv">"Shaders"</span><span class="pln"> </span><span class="tag">></span><span class="pln">
</span><span class="tag"><AddDir</span><span class="pln"> </span><span class="atn">path</span><span class="pun">=</span><span class="atv">"shd"</span><span class="tag">/></span><span class="pln">
</span><span class="tag"></Generator></span></code></pre>
<p>The most important piece is the <em>AddDir</em> tag, which tells the generator script where to find the actual shader source files. More than one <em>AddDir</em> can be added if the shader sources are spread over different directories.</p>
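<p>Parsing such a file is a few lines with python’s built-in <strong>xml.etree</strong>; a hypothetical sketch (the function name is invented, the element and attribute names match the sample above):</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of reading the generator XML file; element and
# attribute names match the sample above, the function is invented.
def parse_generator_xml(xml_string):
    root = ET.fromstring(xml_string)
    gen_type = root.get('type')     # e.g. 'ShaderLibrary'
    name = root.get('name')         # e.g. 'Shaders'
    dirs = [d.get('path') for d in root.findall('AddDir')]
    return gen_type, name, dirs

sample = '<Generator type="ShaderLibrary" name="Shaders"><AddDir path="shd"/></Generator>'
```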
<p>Generator scripts must also include a dirty-check and only overwrite the target C++ files when the source files (in this case: the XML file and all shader sources) are newer than the target sources, to prevent unneeded compilation of dependent files.</p>
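<p>Such a dirty-check boils down to comparing file modification times; a hypothetical Python sketch (the function name is invented, the real script’s logic may differ):</p>

```python
import os

# Hypothetical dirty-check sketch: regenerate when any source file is
# newer than any target file, or when a target is missing entirely.
def needs_regeneration(source_files, target_files):
    # missing targets always force a generator run
    if not all(os.path.exists(t) for t in target_files):
        return True
    newest_src = max(os.path.getmtime(s) for s in source_files)
    oldest_tgt = min(os.path.getmtime(t) for t in target_files)
    return newest_src > oldest_tgt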
<h4 id="shader-file-parsing">Shader File Parsing</h4>
<p>Shader files will be processed by a simple line-parser:</p>
<ol>
<li>comments and white-space will be removed</li>
<li>find and process ‘@’ and ‘$’ keywords</li>
<li>gather GLSL code lines and keep track of their source file and line numbers (this is important for mapping error messages back later)</li>
</ol>
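<p>The bookkeeping from step 3 could look roughly like this in Python (names and data layout are invented; real ‘@’ keyword processing is omitted):</p>

```python
# Hypothetical sketch of the line-parser bookkeeping: each emitted GLSL
# code line remembers its source file and line number, so that
# reference-compiler errors can be mapped back later.
def parse_shader_source(path, text):
    lines = []   # list of (code, src_path, src_line_number)
    for line_num, raw in enumerate(text.splitlines()):
        # strip '//' comments and surrounding whitespace
        code = raw.split('//')[0].strip()
        if not code:
            continue                 # drop empty/comment-only lines
        if code.startswith('@'):
            pass                     # '@' keywords would be processed here
        else:
            lines.append((code, path, line_num))
    return lines
```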
<p>A very minimal shader file looks like this:</p>
<pre class="prettyprint prettyprinted"><code><span class="lit">@vs</span><span class="pln"> </span><span class="typ">MyVertexShader</span><span class="pln">
</span><span class="lit">@uniform</span><span class="pln"> mat4 mvp </span><span class="typ">ModelViewProj</span><span class="pln">
</span><span class="lit">@in</span><span class="pln"> vec4 position
</span><span class="lit">@in</span><span class="pln"> vec2 texcoord0
</span><span class="lit">@out</span><span class="pln"> vec2 uv
</span><span class="kwd">void</span><span class="pln"> main</span><span class="pun">()</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
$position </span><span class="pun">=</span><span class="pln"> mvp </span><span class="pun">*</span><span class="pln"> position</span><span class="pun">;</span><span class="pln">
uv </span><span class="pun">=</span><span class="pln"> texcoord0</span><span class="pun">;</span><span class="pln">
</span><span class="lit">@end</span><span class="pln">
</span><span class="lit">@fs</span><span class="pln"> </span><span class="typ">MyFragmentShader</span><span class="pln">
</span><span class="lit">@uniform</span><span class="pln"> sampler2D tex </span><span class="typ">Texture</span><span class="pln">
</span><span class="lit">@in</span><span class="pln"> vec2 uv
</span><span class="kwd">void</span><span class="pln"> main</span><span class="pun">()</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
$color </span><span class="pun">=</span><span class="pln"> $texture2D</span><span class="pun">(</span><span class="pln">tex</span><span class="pun">,</span><span class="pln"> uv</span><span class="pun">);</span><span class="pln">
</span><span class="pun">}</span><span class="pln">
</span><span class="lit">@end</span><span class="pln">
</span><span class="lit">@bundle</span><span class="pln"> </span><span class="typ">Main</span><span class="pln">
</span><span class="lit">@program</span><span class="pln"> </span><span class="typ">MyVertexShader</span><span class="pln"> </span><span class="typ">MyFragmentShader</span><span class="pln">
</span><span class="lit">@end</span></code></pre>
<p>This defines one vertex shader (between the <strong>@vs</strong> and <strong>@end</strong> tags) and a matching fragment shader (between <strong>@fs</strong> and <strong>@end</strong>). The vertex shader defines a 4x4 matrix uniform with the GLSL variable name <em>mvp</em> and the ‘bind name’ <em>ModelViewProj</em>, and it expects position and texture coordinates from the vertex. The vertex shader transforms the vertex-position into the special variable $position and forwards the texture coordinate to the fragment shader.</p>
<p>The fragment shader defines a texture sampler uniform with the GLSL variable name <em>tex</em> and the bind name <em>Texture</em>. It takes the texture coordinates emitted by the vertex shader, samples the texture and writes the color into the special variable $color.</p>
<p>Finally a shader <strong>@bundle</strong> with the name ‘Main’ is defined, and one shader program created from the previously defined vertex- and fragment-shader is attached to the bundle. A shader bundle is an Oryol-specific concept and is simply a collection of one or more shader programs that are related to each other.</p>
<p>Two more useful tags which aren’t used in this simple example are <strong>@block</strong> and <strong>@use</strong>. A @block encapsulates a piece of code which can then be included with a @use tag in other blocks or vertex-/fragment-shaders. This is basically the missing <strong>#include</strong> mechanism for GLSL files.</p>
<p>Here’s some @block sample code: first a <em>Util</em> block is defined with general utility functions, then a block <em>VSLighting</em> which would contain lighting functions for vertex shaders, and <em>FSLighting</em> with lighting functions for fragment shaders. Both VSLighting and FSLighting want to use functions from the Util block (via <strong>@use Util</strong>). Finally, the vertex- and fragment-shaders would contain a <em>@use VSLighting</em> and <em>@use FSLighting</em> (not shown). The shader code generator then resolves all block dependencies and includes the required code blocks in the generated shader source in the right order:</p>
<pre class="prettyprint prettyprinted"><code><span class="lit">@block</span><span class="pln"> </span><span class="typ">Util</span><span class="pln">
</span><span class="com">// general utility functions</span><span class="pln">
vec4 bla</span><span class="pun">()</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
vec4 result</span><span class="pun">;</span><span class="pln">
</span><span class="pun">...</span><span class="pln">
</span><span class="kwd">return</span><span class="pln"> result</span><span class="pun">;</span><span class="pln">
</span><span class="pun">}</span><span class="pln">
</span><span class="lit">@end</span><span class="pln">
</span><span class="lit">@block</span><span class="pln"> </span><span class="typ">VSLighting</span><span class="pln">
</span><span class="com">// lighting functions for the vertex shader</span><span class="pln">
</span><span class="lit">@use</span><span class="pln"> </span><span class="typ">Util</span><span class="pln">
vec4 vsBlub</span><span class="pun">()</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
</span><span class="kwd">return</span><span class="pln"> bla</span><span class="pun">();</span><span class="pln">
</span><span class="pun">}</span><span class="pln">
</span><span class="lit">@end</span><span class="pln">
</span><span class="lit">@block</span><span class="pln"> </span><span class="typ">FSLighting</span><span class="pln">
</span><span class="com">// lighting functions for the fragment shader</span><span class="pln">
</span><span class="lit">@use</span><span class="pln"> </span><span class="typ">Util</span><span class="pln">
vec4 fsBlub</span><span class="pun">()</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
</span><span class="kwd">return</span><span class="pln"> bla</span><span class="pun">();</span><span class="pln">
</span><span class="pun">}</span><span class="pln">
</span><span class="lit">@end</span></code></pre>
<h4 id="glsl-code-generation-and-validation">GLSL Code Generation and Validation</h4>
<p>From the ‘tagged shader source’, the shader generator script will create actual vertex- and fragment-shader code for different GLSL versions and feed it to the reference compiler for validation.</p>
<p>For instance, the above simple vertex/fragment-shader source would produce the following GLSL 1.00 source code (for OpenGLES2 and WebGL):</p>
<pre class="prettyprint prettyprinted"><code><span class="pln">uniform mat4 mvp</span><span class="pun">;</span><span class="pln">
attribute vec4 position</span><span class="pun">;</span><span class="pln">
attribute vec2 texcoord0</span><span class="pun">;</span><span class="pln">
varying vec2 uv</span><span class="pun">;</span><span class="pln">
</span><span class="kwd">void</span><span class="pln"> main</span><span class="pun">()</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
gl_Position </span><span class="pun">=</span><span class="pln"> mvp </span><span class="pun">*</span><span class="pln"> position</span><span class="pun">;</span><span class="pln">
uv </span><span class="pun">=</span><span class="pln"> texcoord0</span><span class="pun">;</span><span class="pln">
</span><span class="pun">}</span></code></pre>
<p>The output for a more modern GLSL version would look slightly different:</p>
<pre class="prettyprint prettyprinted"><code><span class="com">#version 150</span><span class="pln">
uniform mat4 mvp</span><span class="pun">;</span><span class="pln">
</span><span class="kwd">in</span><span class="pln"> vec4 position</span><span class="pun">;</span><span class="pln">
</span><span class="kwd">in</span><span class="pln"> vec2 texcoord0</span><span class="pun">;</span><span class="pln">
</span><span class="kwd">out</span><span class="pln"> vec2 uv</span><span class="pun">;</span><span class="pln">
</span><span class="kwd">void</span><span class="pln"> main</span><span class="pun">()</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
gl_Position </span><span class="pun">=</span><span class="pln"> mvp </span><span class="pun">*</span><span class="pln"> position</span><span class="pun">;</span><span class="pln">
uv </span><span class="pun">=</span><span class="pln"> texcoord0</span><span class="pun">;</span><span class="pln">
</span><span class="pun">}</span></code></pre>
<p>The GLSL reference compiler is called once per GLSL version and vertex-/fragment-shader and the resulting output is captured into a string variable. The python code to start an exe and capture its output looks like this:</p>
<pre class="prettyprint prettyprinted"><code><span class="pln">child </span><span class="pun">=</span><span class="pln"> subprocess</span><span class="pun">.</span><span class="typ">Popen</span><span class="pun">([</span><span class="pln">exePath</span><span class="pun">,</span><span class="pln"> glslPath</span><span class="pun">],</span><span class="pln"> stdout</span><span class="pun">=</span><span class="pln">subprocess</span><span class="pun">.</span><span class="pln">PIPE</span><span class="pun">)</span><span class="pln">
</span><span class="kwd">out</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> </span><span class="str">''</span><span class="pln">
</span><span class="kwd">while</span><span class="pln"> </span><span class="kwd">True</span><span class="pln"> </span><span class="pun">:</span><span class="pln">
</span><span class="kwd">out</span><span class="pln"> </span><span class="pun">+=</span><span class="pln"> child</span><span class="pun">.</span><span class="pln">stdout</span><span class="pun">.</span><span class="pln">read</span><span class="pun">()</span><span class="pln">
</span><span class="kwd">if</span><span class="pln"> child</span><span class="pun">.</span><span class="pln">poll</span><span class="pun">()</span><span class="pln"> </span><span class="pun">!=</span><span class="pln"> </span><span class="kwd">None</span><span class="pln"> </span><span class="pun">:</span><span class="pln">
</span><span class="kwd">break</span><span class="pln">
</span><span class="kwd">return</span><span class="pln"> </span><span class="kwd">out</span></code></pre>
<p>The output is then parsed for error messages and error line numbers. Since these line numbers point into the generated source code they are not useful by themselves, and must be mapped back to the original source-file-path and line numbers. This is why the line-parser had to store this information with each extracted source code line.</p>
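<p>Assuming the reference compiler emits errors in the form ‘ERROR: 0:12: …’ (the glslangValidator-style format; an assumption here, as are all the names), the mapping step could look roughly like this:</p>

```python
import re

# Hypothetical sketch: pull the line number out of a reference-compiler
# error message and map it back through the per-line bookkeeping the
# parser stored earlier. The error-message format is an assumption.
ERROR_RE = re.compile(r"ERROR:\s*\d+:(\d+):\s*(.*)")

def map_error(line_map, error_line):
    """line_map: list of (src_path, src_line) per generated GLSL line."""
    m = ERROR_RE.match(error_line)
    if m is None:
        return None                          # not an error line
    gen_line, msg = int(m.group(1)), m.group(2)
    src_path, src_line = line_map[gen_line - 1]   # compiler lines are 1-based
    return src_path, src_line, msg
```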
<p>The mapped source-file-path, line-number and error message must then be formatted into the gcc/clang- or VStudio-error-message format, and if an error occurs, the python script will terminate with an error code so that the build is stopped:</p>
<pre class="prettyprint prettyprinted"><code><span class="kwd">if</span><span class="pln"> platform</span><span class="pun">.</span><span class="pln">system</span><span class="pun">()</span><span class="pln"> </span><span class="pun">==</span><span class="pln"> </span><span class="str">'Windows'</span><span class="pln"> </span><span class="pun">:</span><span class="pln">
</span><span class="kwd">print</span><span class="pln"> </span><span class="str">'{}({}): error: {}'</span><span class="pun">.</span><span class="pln">format</span><span class="pun">(</span><span class="typ">FilePath</span><span class="pun">,</span><span class="pln"> </span><span class="typ">LineNumber</span><span class="pln"> </span><span class="pun">+</span><span class="pln"> </span><span class="lit">1</span><span class="pun">,</span><span class="pln"> msg</span><span class="pun">)</span><span class="pln">
</span><span class="kwd">else</span><span class="pln"> </span><span class="pun">:</span><span class="pln">
</span><span class="kwd">print</span><span class="pln"> </span><span class="str">'{}:{}: error: {}\n'</span><span class="pun">.</span><span class="pln">format</span><span class="pun">(</span><span class="typ">FilePath</span><span class="pun">,</span><span class="pln"> </span><span class="typ">LineNumber</span><span class="pln"> </span><span class="pun">+</span><span class="pln"> </span><span class="lit">1</span><span class="pun">,</span><span class="pln"> msg</span><span class="pun">)</span><span class="pln">
</span><span class="kwd">if</span><span class="pln"> terminate</span><span class="pun">:</span><span class="pln">
sys</span><span class="pun">.</span><span class="kwd">exit</span><span class="pun">(</span><span class="lit">10</span><span class="pun">)</span></code></pre>
<p>This formatting works for Xcode and VisualStudio. The error is displayed by the IDE and can be double-clicked to position the text cursor over the right source code location. It doesn’t work in Qt Creator yet unfortunately, and I haven’t tested Eclipse yet.</p>
<p>Another thing to keep in mind is that build jobs can run in parallel. At first I was writing the intermediate GLSL files for the reference compiler into files with simple filenames (like ‘vs.vert’ and ‘fs.frag’). This didn’t cause any problems when doing trivial tests, but once I had converted all Oryol samples to use the shader generator I was sometimes getting weird errors from the reference compiler which didn’t make any sense at first.</p>
<p>The problem was that build jobs were running at the same time and overwrote each other’s intermediate files. The solution was to use randomized filenames which cannot collide. As always, python has a module for exactly this case, called ‘tempfile’:</p>
<pre class="prettyprint prettyprinted"><code><span class="com"># this writes to a new temp vertex shader file </span><span class="pln">
f </span><span class="pun">=</span><span class="pln"> tempfile</span><span class="pun">.</span><span class="typ">NamedTemporaryFile</span><span class="pun">(</span><span class="pln">suffix</span><span class="pun">=</span><span class="str">'.vert'</span><span class="pun">,</span><span class="pln"> </span><span class="kwd">delete</span><span class="pun">=</span><span class="kwd">False</span><span class="pun">)</span><span class="pln">
writeFile</span><span class="pun">(</span><span class="pln">f</span><span class="pun">,</span><span class="pln"> lines</span><span class="pun">)</span><span class="pln">
f</span><span class="pun">.</span><span class="pln">close</span><span class="pun">()</span><span class="pln">
</span><span class="com"># call the validator</span><span class="pln">
</span><span class="pun">...</span><span class="pln">
</span><span class="com"># delete the temp file when done</span><span class="pln">
os</span><span class="pun">.</span><span class="pln">unlink</span><span class="pun">(</span><span class="pln">f</span><span class="pun">.</span><span class="pln">name</span><span class="pun">)</span></code></pre>
<h4 id="the-c-side">The C++ Side</h4>
<p>Last but not least a quick look at the generated C++ source code. The C++ header defines a namespace with the name of the shader-library, and one class per shader-bundle. The very simple vertex/fragment-shader sample from above would generate a header like this:</p>
<pre class="prettyprint prettyprinted"><code><span class="com">#pragma</span><span class="pln"> once
</span><span class="com">/* #version:1#
machine generated, do not edit!
*/</span><span class="pln">
</span><span class="com">#include</span><span class="pln"> </span><span class="str">"Render/Setup/ProgramBundleSetup.h"</span><span class="pln">
</span><span class="kwd">namespace</span><span class="pln"> </span><span class="typ">Oryol</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
</span><span class="kwd">namespace</span><span class="pln"> </span><span class="typ">Shaders</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
</span><span class="kwd">class</span><span class="pln"> </span><span class="typ">Main</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
</span><span class="kwd">public</span><span class="pun">:</span><span class="pln">
</span><span class="kwd">static</span><span class="pln"> </span><span class="kwd">const</span><span class="pln"> int32 </span><span class="typ">ModelViewProj</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> </span><span class="lit">0</span><span class="pun">;</span><span class="pln">
</span><span class="kwd">static</span><span class="pln"> </span><span class="kwd">const</span><span class="pln"> int32 </span><span class="typ">Texture</span><span class="pln"> </span><span class="pun">=</span><span class="pln"> </span><span class="lit">1</span><span class="pun">;</span><span class="pln">
</span><span class="kwd">static</span><span class="pln"> </span><span class="typ">Render</span><span class="pun">::</span><span class="typ">ProgramBundleSetup</span><span class="pln"> </span><span class="typ">CreateSetup</span><span class="pun">();</span><span class="pln">
</span><span class="pun">};</span><span class="pln">
</span><span class="pun">}</span><span class="pln">
</span><span class="pun">}</span></code></pre>
<p>Note the ModelViewProj and Texture constant definitions. These are used to set the uniform values in the C++ render loop.</p>
<p>How this code is actually used for rendering is a topic of its own. For now let me just point to the Oryol sample source code:</p>
<p><a href="https://github.com/floooh/oryol/tree/master/code/Samples/Render">https://github.com/floooh/oryol/tree/master/code/Samples/Render</a></p>
<h4 id="whats-next">What’s next</h4>
<p>The existing shader tags are already quite useful, but only the beginning. The real problem I want to solve is managing slightly differing variations of the same shader. For instance there might exist a specific high-level material which must be applied to static and skinned geometry (2 variations), can cast shadows (2 more variations: a static and a skinned shadow caster), and should be available in a forward-renderer and deferred-renderer (== many more slightly different shader variations). Sometimes an ueber-shader approach is better, and sometimes genuinely separate shaders for each variation are better. </p>
<p>The guts of those material shaders are always built from the same small code fragments, just arranged and combined differently.</p>
<p>Hopefully a couple of new ‘@’ and ‘$’ tags will be enough, but what this will look like in detail I don’t know yet. One inspiration is web-template engines, which build web pages from a set of templates and rules. Another inspiration is the existing connect-the-dots shader editors (even though I want to keep the focus on ‘shaders-as-source-code’, not ‘shaders-as-data’, some limited runtime code generation would still make sense).</p>
<p>And of course the right middle-ground between ‘modern GLSL’ and ‘legacy GLSL’ must be found. Unfortunately OpenGL ES2 / WebGL1.0 will have to be the foundation for quite some time.</p>
<p>And that’s all for today :)</p>
<blockquote>
<p>Written with <a href="https://stackedit.io/">StackEdit</a>.</p>
</blockquote>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-67089197447588057442014-04-20T19:51:00.001+01:002014-04-20T22:28:13.965+01:00cmake and the Android NDK<p>TL;DR: how to build Android NDK applications with cmake instead of the custom NDK build system, this is useful for projects which already use cmake to create multiplatform/cross-compiling build files.</p>
<p><strong>Update:</strong> Thanks to <a href="http://thp.io">thp</a> for pointing out a rather serious bug: packaging the standard shared libraries into the APK should NOT be necessary, since these are pre-installed on the device. I noticed that I didn’t set a library search path to the toolchain lib dir in the linker step (-L…), which might explain the crash I had earlier, but unfortunately I can’t reproduce this crash anymore with the old behaviour (no library search path and no shared system libraries in the APK). I’ll keep an eye on that and update the blog post with my findings.</p>
<hr>
<p>I’ve spent the last 2.5 days adding Android support to Oryol’s build system. This wasn’t exactly on my to-do list until I sorta “impulse-bought” a Nexus7 tablet last Thursday. It basically went like this “hey that looks quite neat for a non-iPad tablet => wow, scrolling feels smooth, very non-Android-like => holy shit it runs my Oryol WebGL samples at 60fps => hmm 179 Euros seems quite reasonable…” - I must say I’m impressed how far the Android “user experience” has come since I last dabbled with it. The UI finally feels completely smooth, and I didn’t have any of those Windows8-Metro-style WTF-moments yet.</p>
<p>Ok, so the logical next step would be to add support for Android to the Oryol build system (if you don’t know what Oryol is: it’s a new experimental C++11 multi-plat engine I started a couple months ago: <a href="https://github.com/floooh/oryol">https://github.com/floooh/oryol</a>).</p>
<p>The Oryol build system is cmake-based, with a python script on top which simplifies managing the dozens of possible build-configs. A build-config is one specific combination of target-platform (osx, ios, win32, win64, …), build-tools (make, ninja, Visual Studio, Xcode, …) and compile-mode (Release, Debug) stored under a descriptive name (e.g. osx-xcode-debug, win32-vstudio-release, emscripten-make-debug, …).</p>
<p>The front-end python script called ‘oryol’ is used to juggle all the build-configs, invoke cmake with the right options, and perform command line builds.</p>
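<p>A build-config name is just its three parts joined with dashes, so splitting one apart is trivial; a hypothetical sketch (the real script’s internals may differ):</p>

```python
# Hypothetical sketch of splitting a build-config name into its parts;
# the actual 'oryol' script's internals may look different.
def parse_config_name(config):
    """'osx-xcode-debug' -> ('osx', 'xcode', 'debug')"""
    platform, build_tool, mode = config.split('-')
    return platform, build_tool, mode
```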
<p>One can for instance simply call:</p>
<pre class="prettyprint prettyprinted"><code><span class="pun">></span><span class="pln"> </span><span class="pun">./</span><span class="pln">oryol update osx</span><span class="pun">-</span><span class="pln">xcode</span><span class="pun">-</span><span class="pln">debug</span></code></pre>
<p>…to generate an Xcode project.</p>
<p>Or to perform a command line build with xcodebuild instead:</p>
<pre class="prettyprint prettyprinted"><code><span class="pun">></span><span class="pln"> </span><span class="pun">./</span><span class="pln">oryol build osx</span><span class="pun">-</span><span class="pln">xcode</span><span class="pun">-</span><span class="pln">debug</span></code></pre>
<p>Or to build Oryol for emscripten with make in Release mode (provided the emscripten SDK has been installed):</p>
<pre class="prettyprint prettyprinted"><code><span class="pun">></span><span class="pln"> </span><span class="pun">./</span><span class="pln">oryol build emscripten</span><span class="pun">-</span><span class="pln">make</span><span class="pun">-</span><span class="pln">release</span></code></pre>
<p>This also works on Windows (32- or 64-bit): </p>
<pre class="prettyprint prettyprinted"><code><span class="pun">></span><span class="pln"> oryol build win64</span><span class="pun">-</span><span class="pln">vstudio</span><span class="pun">-</span><span class="pln">debug
</span><span class="pun">></span><span class="pln"> oryol build win32</span><span class="pun">-</span><span class="pln">vstudio</span><span class="pun">-</span><span class="pln">debug</span></code></pre>
<p>…or on Linux:</p>
<pre class="prettyprint prettyprinted"><code><span class="pun">></span><span class="pln"> </span><span class="pun">./</span><span class="pln">oryol build linux</span><span class="pun">-</span><span class="pln">make</span><span class="pun">-</span><span class="pln">debug</span></code></pre>
<p>Now, what I want to do with my shiny new Nexus7 is of course this:</p>
<pre class="prettyprint prettyprinted"><code><span class="pun">></span><span class="pln"> </span><span class="pun">./</span><span class="pln">oryol build android</span><span class="pun">-</span><span class="pln">make</span><span class="pun">-</span><span class="pln">debug</span></code></pre>
<p>This turned out to be harder than usual. But let’s start at the beginning:</p>
<p>A cross-compiling scenario is normally well defined in the GCC/cmake world:</p>
<p>A <strong>toolchain</strong> wraps the target-platform’s compiler tools, system headers and libs under a standardized directory structure:</p>
<p>The compiler tools usually reside in a <strong>bin</strong> subdirectory and are called <strong>gcc</strong> and <strong>g++</strong> (or in the LLVM world: <strong>clang</strong> and <strong>clang++</strong>). Sometimes the tools have a prefix (<strong>pnacl-clang</strong> and <strong>pnacl-clang++</strong>), or completely different names (like <strong>emcc</strong> in the emscripten SDK).</p>
<p>Headers and libs are often located in a <strong>usr</strong> directory (<strong>usr/include</strong> and <strong>usr/lib</strong>).</p>
<p>The toolchain headers contain at least the C-Runtime headers, like <strong>stdlib.h</strong> and <strong>stdio.h</strong>, usually the C++ headers (<strong>vector</strong>, <strong>iostream</strong>, …), and often also the OpenGL headers and other platform-specific header files.</p>
<p>Finally the lib directory contains precompiled system libraries for the target platform (for instance <strong>libc.a</strong>, <strong>libc++.a</strong>, etc…).</p>
<p>With such a standard gcc-style toolchain, cross-compilation is very simple. Just make sure that the toolchain-compiler tools are called instead of the host platform’s tools, and that the toolchain headers and libs are used.</p>
<p>cmake standardizes this process with its so-called <strong>toolchain-files</strong>. A toolchain-file defines what compiler tools, headers and libraries should be used instead of the ‘default’ ones, and usually also overrides compile and linker flags.</p>
<p>The typical strategy when adding a new target platform to a cmake build system looks like this:</p>
<ul>
<li>set up the target platform’s SDK</li>
<li>create a new toolchain file (obviously)</li>
<li>tell cmake where to find the compiler tools, header and libs</li>
<li>add the right compile and linker flags</li>
</ul>
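<p>For illustration, the core of such a toolchain file could look like the following sketch (all paths are placeholders, not an actual SDK layout):</p>
<pre class="prettyprint"><code># hypothetical cross-compile toolchain file (all paths are placeholders)
set(CMAKE_SYSTEM_NAME Linux)

# use the toolchain's compilers instead of the host compilers
set(CMAKE_C_COMPILER "/path/to/toolchain/bin/gcc")
set(CMAKE_CXX_COMPILER "/path/to/toolchain/bin/g++")

# search headers and libs only inside the toolchain, never on the host
set(CMAKE_FIND_ROOT_PATH "/path/to/toolchain/usr")
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)</code></pre>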
<p>Once the toolchain file has been created, call cmake with the toolchain file:</p>
<pre class="prettyprint prettyprinted"><code><span class="pun">></span><span class="pln"> cmake </span><span class="pun">-</span><span class="pln">G</span><span class="str">"Unix Makefiles"</span><span class="pln"> </span><span class="pun">-</span><span class="pln">DCMAKE_TOOLCHAIN_FILE</span><span class="pun">=[</span><span class="pln">path</span><span class="pun">-</span><span class="pln">to</span><span class="pun">-</span><span class="pln">toolchain</span><span class="pun">-</span><span class="pln">file</span><span class="pun">]</span><span class="pln"> </span><span class="pun">[</span><span class="pln">path</span><span class="pun">-</span><span class="pln">to</span><span class="pun">-</span><span class="pln">project</span><span class="pun">]</span></code></pre>
<p>Then run make in verbose mode to check whether the right compiler is called, and with the right options:</p>
<pre class="prettyprint prettyprinted"><code><span class="pun">></span><span class="pln"> make VERBOSE</span><span class="pun">=</span><span class="lit">1</span></code></pre>
<p>This approach works well for platforms like emscripten or Google Native Client. Some platforms require a bit of additional cmake-magic, a Portable Native Client executable for instance must be “finalized” after it has been linked. Additional build steps like these can be added easily in cmake with the <strong>add_custom_command</strong> macro.</p>
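<p>Such a finalize step boils down to a single post-build command, roughly like this (a sketch; <strong>PNACL_TOOLCHAIN_BIN</strong> and the output path are made-up placeholders):</p>
<pre class="prettyprint"><code># sketch: 'finalize' a linked PNaCl executable as a post-build step;
# PNACL_TOOLCHAIN_BIN is a made-up variable pointing into the NaCl SDK
add_custom_command(TARGET ${target} POST_BUILD
    COMMAND ${PNACL_TOOLCHAIN_BIN}/pnacl-finalize ${CMAKE_CURRENT_BINARY_DIR}/${target}.pexe
    COMMENT "Finalizing PNaCl executable")</code></pre>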
<p>Integrating Android as a new target platform isn’t so easy though:</p>
<ul>
<li>the Android SDK itself only allows creating pure Java applications; for C/C++ apps, the separate Android NDK (Native Development Kit) is required</li>
<li>the NDK doesn’t produce complete Android applications, it needs the Android Java SDK for this</li>
<li>native Android code isn’t a typical executable, but lives in a shared library which is called from Java through JNI</li>
<li>the Android SDK and NDK both have their own build systems which hide a lot of complexity</li>
<li>…this complexity comes from the combination of different host platforms (OSX, Linux, Windows), target API levels (android-3 to android-19, roughly corresponding to Android versions), compiler versions (gcc4.6, gcc4.9, clang3.3, clang3.4), and finally CPU architectures and instruction sets (ARM, MIPS, X86, with several variations for ARM: armv5, armv7, with or without NEON, etc…)</li>
<li>C++ support is still bolted on, the C++ headers and libs are not in their standard locations</li>
<li>the NDK doesn’t follow the standard GCC toolchain directory structure at all</li>
</ul>
<p>The custom build system coming with the NDK does a good job of hiding all this complexity, for instance it can automatically build for all CPU architectures, but it stops after the native shared library has been compiled: it cannot create a complete Android APK. For this, the Android Java SDK tools must be called from the command line.</p>
<p>So back to how to make this work in cmake:</p>
<p>The plan looks simple enough:</p>
<ol>
<li>compile our C/C++ code into a shared library instead of an executable</li>
<li>somehow get this into a Java APK package file…</li>
<li>…deploy APK to Android device and run it</li>
</ol>
<p>Step 1 starts rather innocently: create a toolchain file, look up the paths to the compiler tools, headers and libs in the NDK, then look up the compiler and linker command line args by watching a verbose build. Then put all this stuff into the right cmake variables. At least this is how it usually works. Of course for Android it’s all a bit more complicated:</p>
<ul>
<li>first we need to decide on a target CPU architecture and what compiler to use. I settled for ARM and gcc4.8, which leads us to <strong>[…]/android-ndk-r9d/toolchains/arm-linux-androideabi-4.8/prebuilt</strong></li>
<li>in there is a directory <strong>darwin-x86_64</strong> so we need separate paths by host platform here</li>
<li>finally in there is a bin directory with the compiler tools, so GCC would be for instance at <strong>[..]/android-ndk-r9d/toolchains/arm-linux-androideabi-4.8/prebuilt/darwin-x86_64/bin/arm-linux-androideabi-gcc</strong></li>
<li>there’s also an include, lib and share directory but the stuff in there definitely doesn’t look like system headers and libs… bummer.</li>
<li>the system headers and libs are under the platforms directory instead: <strong>[..]/android-ndk-r9d/platforms/android-19/arch-arm/usr/include</strong>, and <strong>[..]/android-ndk-r9d/platforms/android-19/arch-arm/usr/lib</strong></li>
<li>so far so good… put this stuff into the toolchain file and it seems to compile fine, until the first C++ header must be included. WTF?</li>
<li>on closer inspection, the system include directory doesn’t contain any C++ headers, and there are different C++ library implementations to choose from under <strong>[..]/android-ndk-r9d/sources/cxx-stl</strong></li>
</ul>
<p>This was the point where I was seriously thinking about calling it a day, until I stumbled across the <strong>make-standalone-toolchain.sh</strong> script in build/tools. This is a helper script which builds a standard GCC-style toolchain for one specific Android API-level and target CPU:</p>
<pre class="prettyprint"><code>sh make-standalone-toolchain.sh \
    --platform=android-19 \
    --ndk-dir=/Users/[user]/android-ndk-r9d \
    --install-dir=/Users/[user]/android-toolchain \
    --toolchain=arm-linux-androideabi-4.8 \
    --system=darwin-x86_64</code></pre>
<p>This will extract the right tools, headers and libs, and also integrate the C++ headers (by default gnustl, but this can be selected with the --stl option). When the script is done, a new directory ‘android-toolchain’ has been created which follows the GCC toolchain standard and is much easier to integrate with cmake:</p>
<p>The important directories are: <br>
- <strong>[..]/android-toolchain/bin</strong>: this is where the compiler tools are located; these are still prefixed (e.g. <strong>arm-linux-androideabi-gcc</strong>) <br>
- <strong>[..]/android-toolchain/sysroot/usr/include</strong>: CRT headers, plus EGL, GLES2, etc…, but NOT the C++ headers <br>
- <strong>[..]/android-toolchain/include</strong>: the C++ headers are here, under ‘c++’ <br>
- <strong>[..]/android-toolchain/sysroot/usr/lib</strong>: .a and .so system libs; libstdc++.a/.so is also here, no idea why</p>
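<p>In the toolchain file, these directories map to something like the following fragment (a sketch; <strong>ANDROID_TOOLCHAIN_ROOT</strong> is a made-up variable pointing at the generated ‘android-toolchain’ directory):</p>
<pre class="prettyprint"><code># sketch: toolchain file entries for the generated standalone toolchain;
# ANDROID_TOOLCHAIN_ROOT is a made-up variable, set it to the install-dir
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_C_COMPILER "${ANDROID_TOOLCHAIN_ROOT}/bin/arm-linux-androideabi-gcc")
set(CMAKE_CXX_COMPILER "${ANDROID_TOOLCHAIN_ROOT}/bin/arm-linux-androideabi-g++")
# system headers plus the C++ headers (which live under include/c++,
# in a gcc-version subdirectory)
include_directories(
    "${ANDROID_TOOLCHAIN_ROOT}/sysroot/usr/include"
    "${ANDROID_TOOLCHAIN_ROOT}/include/c++")
link_directories("${ANDROID_TOOLCHAIN_ROOT}/sysroot/usr/lib")</code></pre>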
<p>After setting these paths in the toolchain file and telling cmake to create shared libs instead of exes when building for the Android platform, the compile and link steps worked: instead of a CoreHello executable, I got a libCoreHello.so. So far so good.</p>
<p>The next step was to figure out how to get this .so into an APK which can be uploaded to an Android device.</p>
<p>The NDK doesn’t help with this, so this is where we need the Java SDK tools, which use yet another build system: ant. From looking at the SDK samples I figured out that it is usually enough to call <strong>ant debug</strong> or <strong>ant release</strong> within a sample directory to build an .apk file into a bin subdirectory. ant requires a <strong>build.xml</strong> file which defines the build tasks to perform. Furthermore, Android apps have an embedded AndroidManifest.xml file which describes how to run the application and what privileges it requires. None of these exist in the NDK sample directories though…</p>
<p>After some more exploration it became clear: The SDK has a helper script called <strong>android</strong> which is used (among many other things) to set up a project directory structure with all required files for ant to create a working APK:</p>
<pre class="prettyprint prettyprinted"><code><span class="pun">></span><span class="pln"> android create project
</span><span class="pun">--</span><span class="pln">path </span><span class="typ">MyApp</span><span class="pln">
</span><span class="pun">--</span><span class="pln">target android</span><span class="pun">-</span><span class="lit">19</span><span class="pln">
</span><span class="pun">--</span><span class="pln">name </span><span class="typ">MyApp</span><span class="pln">
</span><span class="pun">--</span><span class="kwd">package</span><span class="pln"> com</span><span class="pun">.</span><span class="pln">oryol</span><span class="pun">.</span><span class="typ">MyApp</span><span class="pln">
</span><span class="pun">--</span><span class="pln">activity </span><span class="typ">MyActivity</span></code></pre>
<p>This will set up a directory ‘MyApp’ with a complete Android Java skeleton app. Run ‘ant debug’ in there and it will create a ‘MyApp-debug.apk’ in the ‘bin’ subdirectory, which can be deployed to the Android device with ‘adb install MyApp-debug.apk’; when executed it displays a ‘Hello World, MyActivity’ string.</p>
<p>Easy enough, but there are 2 problems. <strong>First</strong>: how to get our native shared library packaged and called? And <strong>second</strong>: the Java SDK project directory hierarchy doesn’t really fit well into the source tree of a C/C++ project; there should be a directory per sample app with a couple of C++ files and a CMakeLists.txt file and nothing more.</p>
<p>The first problem is simple to solve: the project directory hierarchy contains a libs directory, and all .so files in there will be copied into the APK by ant (to verify this: an .apk is actually a zip file, so simply change the file extension to zip and peek into the file). One important point: the libs directory contains one sub-directory level per CPU architecture, so once we start to support multiple CPU instruction sets we need to put them into subdirectories like this:</p>
<pre class="prettyprint prettyprinted"><code><span class="typ">FlohOfWoe</span><span class="pun">:</span><span class="pln">libs floh$ ls
armeabi armeabi</span><span class="pun">-</span><span class="pln">v7a mips x86</span></code></pre>
<p>Since my cmake build-system currently only supports building for armeabi-v7a I’ve put my .so file in the armeabi-v7a subdirectory.</p>
<p>Now I thought that I had everything in place: I had an APK file with my native code .so lib in it, I used the NativeActivity and android_native_app_glue.h approach, and logged a “Hello World” to the system log (which can be inspected with <strong>adb logcat</strong> from the host system).</p>
<p>And still the app didn’t start; instead this showed up in the log:</p>
<pre class="prettyprint prettyprinted"><code><span class="pln">D</span><span class="pun">/</span><span class="typ">AndroidRuntime</span><span class="pun">(</span><span class="pln"> </span><span class="lit">482</span><span class="pun">):</span><span class="pln"> </span><span class="typ">Shutting</span><span class="pln"> down VM
W</span><span class="pun">/</span><span class="pln">dalvikvm</span><span class="pun">(</span><span class="pln"> </span><span class="lit">482</span><span class="pun">):</span><span class="pln"> threadid</span><span class="pun">=</span><span class="lit">1</span><span class="pun">:</span><span class="pln"> thread exiting </span><span class="kwd">with</span><span class="pln"> uncaught exception </span><span class="pun">(</span><span class="kwd">group</span><span class="pun">=</span><span class="lit">0x41597ba8</span><span class="pun">)</span><span class="pln">
E</span><span class="pun">/</span><span class="typ">AndroidRuntime</span><span class="pun">(</span><span class="pln"> </span><span class="lit">482</span><span class="pun">):</span><span class="pln"> FATAL EXCEPTION</span><span class="pun">:</span><span class="pln"> main
E</span><span class="pun">/</span><span class="typ">AndroidRuntime</span><span class="pun">(</span><span class="pln"> </span><span class="lit">482</span><span class="pun">):</span><span class="pln"> </span><span class="typ">Process</span><span class="pun">:</span><span class="pln"> com</span><span class="pun">.</span><span class="pln">oryol</span><span class="pun">.</span><span class="typ">CoreHello</span><span class="pun">,</span><span class="pln"> PID</span><span class="pun">:</span><span class="pln"> </span><span class="lit">482</span><span class="pln">
E</span><span class="pun">/</span><span class="typ">AndroidRuntime</span><span class="pun">(</span><span class="pln"> </span><span class="lit">482</span><span class="pun">):</span><span class="pln"> java</span><span class="pun">.</span><span class="pln">lang</span><span class="pun">.</span><span class="typ">RuntimeException</span><span class="pun">:</span><span class="pln"> </span><span class="typ">Unable</span><span class="pln"> to start activity </span><span class="typ">ComponentInfo</span><span class="pun">{</span><span class="pln">com</span><span class="pun">.</span><span class="pln">oryol</span><span class="pun">.</span><span class="typ">CoreHello</span><span class="pun">/</span><span class="pln">android</span><span class="pun">.</span><span class="pln">app</span><span class="pun">.</span><span class="typ">NativeActivity</span><span class="pun">}:</span><span class="pln"> java</span><span class="pun">.</span><span class="pln">lang</span><span class="pun">.</span><span class="typ">IllegalArgumentException</span><span class="pun">:</span><span class="pln"> </span><span class="typ">Unable</span><span class="pln"> to load </span><span class="kwd">native</span><span class="pln"> library</span><span class="pun">:</span><span class="pln"> </span><span class="str">/data/</span><span class="pln">app</span><span class="pun">-</span><span class="pln">lib</span><span class="pun">/</span><span class="pln">com</span><span class="pun">.</span><span class="pln">oryol</span><span class="pun">.</span><span class="typ">CoreHello</span><span class="pun">-</span><span class="lit">1</span><span class="pun">/</span><span class="pln">libCoreHello</span><span class="pun">.</span><span class="pln">so
E</span><span class="pun">/</span><span class="typ">AndroidRuntime</span><span class="pun">(</span><span class="pln"> </span><span class="lit">482</span><span class="pun">):</span><span class="pln"> at android</span><span class="pun">.</span><span class="pln">app</span><span class="pun">.</span><span class="typ">ActivityThread</span><span class="pun">.</span><span class="pln">performLaunchActivity</span><span class="pun">(</span><span class="typ">ActivityThread</span><span class="pun">.</span><span class="pln">java</span><span class="pun">:</span><span class="lit">2195</span><span class="pun">)</span></code></pre>
<p>This was the second time I banged my head against the wall for a while, until I started to look into how linker dependencies are resolved for the shared library. I was pretty sure that I gave all the required libs on the linker command line (-lc -llog -landroid, etc); my error was assuming that these would be linked statically. Instead, linking against system libraries is dynamic by default. The ndk-depends tool helps in finding the dependencies:</p>
<pre class="prettyprint prettyprinted"><code><span class="pln">localhost</span><span class="pun">:</span><span class="pln">armeabi</span><span class="pun">-</span><span class="pln">v7a floh$ </span><span class="pun">~</span><span class="str">/android-ndk-r9d/</span><span class="pln">ndk</span><span class="pun">-</span><span class="pln">depends libCoreHello</span><span class="pun">.</span><span class="pln">so
libCoreHello</span><span class="pun">.</span><span class="pln">so
libm</span><span class="pun">.</span><span class="pln">so
liblog</span><span class="pun">.</span><span class="pln">so
libdl</span><span class="pun">.</span><span class="pln">so
libc</span><span class="pun">.</span><span class="pln">so
libandroid</span><span class="pun">.</span><span class="pln">so
libGLESv2</span><span class="pun">.</span><span class="pln">so
libEGL</span><span class="pun">.</span><span class="pln">so</span></code></pre>
<p><del>This is basically the list of .so files which must be contained in the APK. After I copied these to the SDK project's lib directory, together with my libCoreHello.so</del>. <strong>Update:</strong> These shared libs are not supposed to be packaged into the APK! Instead the standard system shared libraries which already exist on the device should be linked at startup. </p>
<p>I finally saw the sweet, sweet ‘Hello World!’ showing up in the adb log!</p>
<p>But I skipped one important part: so far I fixed everything manually, but of course I want automated Android batch builds, without having those ugly Android skeleton project files in the git repository.</p>
<p>To solve this I did a bit of cmake-fu:</p>
<p>Instead of having the Android SDK project files committed into version control, I’m treating these as temporary build files.</p>
<p>When cmake runs for an Android build target, it does the following additional steps:</p>
<p>For each application target, a temporary Android SDK project is created in the build directory (basically the ‘android create project’ call described above):</p>
<pre class="prettyprint prettyprinted"><code><span class="com"># call the android SDK tool to create a new project</span><span class="pln">
execute_process</span><span class="pun">(</span><span class="pln">COMMAND $</span><span class="pun">{</span><span class="pln">ANDROID_SDK_TOOL</span><span class="pun">}</span><span class="pln"> create project
</span><span class="pun">--</span><span class="pln">path $</span><span class="pun">{</span><span class="pln">CMAKE_CURRENT_BINARY_DIR</span><span class="pun">}/</span><span class="pln">android
</span><span class="pun">--</span><span class="pln">target $</span><span class="pun">{</span><span class="pln">ANDROID_PLATFORM</span><span class="pun">}</span><span class="pln">
</span><span class="pun">--</span><span class="pln">name $</span><span class="pun">{</span><span class="pln">target</span><span class="pun">}</span><span class="pln">
</span><span class="pun">--</span><span class="kwd">package</span><span class="pln"> com</span><span class="pun">.</span><span class="pln">oryol</span><span class="pun">.</span><span class="pln">$</span><span class="pun">{</span><span class="pln">target</span><span class="pun">}</span><span class="pln">
</span><span class="pun">--</span><span class="pln">activity </span><span class="typ">DummyActivity</span><span class="pln">
WORKING_DIRECTORY $</span><span class="pun">{</span><span class="pln">CMAKE_CURRENT_BINARY_DIR</span><span class="pun">})</span></code></pre>
<p>The output directory for the shared library linker step is redirected to the ‘libs’ subdirectory of this skeleton project:</p>
<pre class="prettyprint prettyprinted"><code><span class="com"># set the output directory for the .so files to point to the android project's 'libs/[cpuarch]' directory</span><span class="pln">
</span><span class="kwd">set</span><span class="pun">(</span><span class="pln">ANDROID_SO_OUTDIR $</span><span class="pun">{</span><span class="pln">CMAKE_CURRENT_BINARY_DIR</span><span class="pun">}/</span><span class="pln">android</span><span class="pun">/</span><span class="pln">libs</span><span class="pun">/</span><span class="pln">$</span><span class="pun">{</span><span class="pln">ANDROID_NDK_CPU</span><span class="pun">})</span><span class="pln">
set_target_properties</span><span class="pun">(</span><span class="pln">$</span><span class="pun">{</span><span class="pln">target</span><span class="pun">}</span><span class="pln"> PROPERTIES LIBRARY_OUTPUT_DIRECTORY $</span><span class="pun">{</span><span class="pln">ANDROID_SO_OUTDIR</span><span class="pun">})</span><span class="pln">
set_target_properties</span><span class="pun">(</span><span class="pln">$</span><span class="pun">{</span><span class="pln">target</span><span class="pun">}</span><span class="pln"> PROPERTIES LIBRARY_OUTPUT_DIRECTORY_RELEASE $</span><span class="pun">{</span><span class="pln">ANDROID_SO_OUTDIR</span><span class="pun">})</span><span class="pln">
set_target_properties</span><span class="pun">(</span><span class="pln">$</span><span class="pun">{</span><span class="pln">target</span><span class="pun">}</span><span class="pln"> PROPERTIES LIBRARY_OUTPUT_DIRECTORY_DEBUG $</span><span class="pun">{</span><span class="pln">ANDROID_SO_OUTDIR</span><span class="pun">})</span></code></pre>
<p><del>The required system shared libraries are also copied there:</del> (DON’T DO THIS, normally the system’s standard shared libraries should be used)</p>
<pre class="prettyprint prettyprinted"><code><span class="com"># copy shared libraries over from the Android toolchain directory</span><span class="pln">
</span><span class="com"># FIXME: this should be automated as post-build-step by invoking the ndk-depends command</span><span class="pln">
</span><span class="com"># to find out the .so's, and copy them over</span><span class="pln">
file</span><span class="pun">(</span><span class="pln">COPY $</span><span class="pun">{</span><span class="pln">ANDROID_SYSROOT_LIB</span><span class="pun">}/</span><span class="pln">libm</span><span class="pun">.</span><span class="pln">so DESTINATION $</span><span class="pun">{</span><span class="pln">ANDROID_SO_OUTDIR</span><span class="pun">})</span><span class="pln">
file</span><span class="pun">(</span><span class="pln">COPY $</span><span class="pun">{</span><span class="pln">ANDROID_SYSROOT_LIB</span><span class="pun">}/</span><span class="pln">liblog</span><span class="pun">.</span><span class="pln">so DESTINATION $</span><span class="pun">{</span><span class="pln">ANDROID_SO_OUTDIR</span><span class="pun">})</span><span class="pln">
file</span><span class="pun">(</span><span class="pln">COPY $</span><span class="pun">{</span><span class="pln">ANDROID_SYSROOT_LIB</span><span class="pun">}/</span><span class="pln">libdl</span><span class="pun">.</span><span class="pln">so DESTINATION $</span><span class="pun">{</span><span class="pln">ANDROID_SO_OUTDIR</span><span class="pun">})</span><span class="pln">
file</span><span class="pun">(</span><span class="pln">COPY $</span><span class="pun">{</span><span class="pln">ANDROID_SYSROOT_LIB</span><span class="pun">}/</span><span class="pln">libc</span><span class="pun">.</span><span class="pln">so DESTINATION $</span><span class="pun">{</span><span class="pln">ANDROID_SO_OUTDIR</span><span class="pun">})</span><span class="pln">
file</span><span class="pun">(</span><span class="pln">COPY $</span><span class="pun">{</span><span class="pln">ANDROID_SYSROOT_LIB</span><span class="pun">}/</span><span class="pln">libandroid</span><span class="pun">.</span><span class="pln">so DESTINATION $</span><span class="pun">{</span><span class="pln">ANDROID_SO_OUTDIR</span><span class="pun">})</span><span class="pln">
file</span><span class="pun">(</span><span class="pln">COPY $</span><span class="pun">{</span><span class="pln">ANDROID_SYSROOT_LIB</span><span class="pun">}/</span><span class="pln">libGLESv2</span><span class="pun">.</span><span class="pln">so DESTINATION $</span><span class="pun">{</span><span class="pln">ANDROID_SO_OUTDIR</span><span class="pun">})</span><span class="pln">
file</span><span class="pun">(</span><span class="pln">COPY $</span><span class="pun">{</span><span class="pln">ANDROID_SYSROOT_LIB</span><span class="pun">}/</span><span class="pln">libEGL</span><span class="pun">.</span><span class="pln">so DESTINATION $</span><span class="pun">{</span><span class="pln">ANDROID_SO_OUTDIR</span><span class="pun">})</span></code></pre>
<p>The default AndroidManifest.xml file is overwritten with a customized one:</p>
<pre class="prettyprint prettyprinted"><code><span class="com"># override AndroidManifest.xml </span><span class="pln">
file</span><span class="pun">(</span><span class="pln">WRITE $</span><span class="pun">{</span><span class="pln">CMAKE_CURRENT_BINARY_DIR</span><span class="pun">}/</span><span class="pln">android</span><span class="pun">/</span><span class="typ">AndroidManifest</span><span class="pun">.</span><span class="pln">xml
</span><span class="str">"<manifest xmlns:android=\"http://schemas.android.com/apk/res/android\"\n"</span><span class="pln">
</span><span class="str">" package=\"com.oryol.${target}\"\n"</span><span class="pln">
</span><span class="str">" android:versionCode=\"1\"\n"</span><span class="pln">
</span><span class="str">" android:versionName=\"1.0\">\n"</span><span class="pln">
</span><span class="str">" <uses-sdk android:minSdkVersion=\"11\" android:targetSdkVersion=\"19\"/>\n"</span><span class="pln">
</span><span class="str">" <uses-feature android:glEsVersion=\"0x00020000\"></uses-feature>"</span><span class="pln">
</span><span class="str">" <application android:label=\"${target}\" android:hasCode=\"false\">\n"</span><span class="pln">
</span><span class="str">" <activity android:name=\"android.app.NativeActivity\"\n"</span><span class="pln">
</span><span class="str">" android:label=\"${target}\"\n"</span><span class="pln">
</span><span class="str">" android:configChanges=\"orientation|keyboardHidden\">\n"</span><span class="pln">
</span><span class="str">" <meta-data android:name=\"android.app.lib_name\" android:value=\"${target}\"/>\n"</span><span class="pln">
</span><span class="str">" <intent-filter>\n"</span><span class="pln">
</span><span class="str">" <action android:name=\"android.intent.action.MAIN\"/>\n"</span><span class="pln">
</span><span class="str">" <category android:name=\"android.intent.category.LAUNCHER\"/>\n"</span><span class="pln">
</span><span class="str">" </intent-filter>\n"</span><span class="pln">
</span><span class="str">" </activity>\n"</span><span class="pln">
</span><span class="str">" </application>\n"</span><span class="pln">
</span><span class="str">"</manifest>\n"</span><span class="pun">)</span></code></pre>
<p>And finally, a custom build-step to invoke the ant-build tool on the temporary skeleton project to create the final APK:</p>
<pre class="prettyprint prettyprinted"><code><span class="kwd">if</span><span class="pln"> </span><span class="pun">(</span><span class="str">"${CMAKE_BUILD_TYPE}"</span><span class="pln"> STREQUAL </span><span class="str">"Debug"</span><span class="pun">)</span><span class="pln">
</span><span class="kwd">set</span><span class="pun">(</span><span class="pln">ANT_BUILD_TYPE </span><span class="str">"debug"</span><span class="pun">)</span><span class="pln">
</span><span class="kwd">else</span><span class="pun">()</span><span class="pln">
</span><span class="kwd">set</span><span class="pun">(</span><span class="pln">ANT_BUILD_TYPE </span><span class="str">"release"</span><span class="pun">)</span><span class="pln">
endif</span><span class="pun">()</span><span class="pln">
add_custom_command</span><span class="pun">(</span><span class="pln">TARGET $</span><span class="pun">{</span><span class="pln">target</span><span class="pun">}</span><span class="pln"> POST_BUILD COMMAND $</span><span class="pun">{</span><span class="pln">ANDROID_ANT</span><span class="pun">}</span><span class="pln"> $</span><span class="pun">{</span><span class="pln">ANT_BUILD_TYPE</span><span class="pun">}</span><span class="pln"> WORKING_DIRECTORY $</span><span class="pun">{</span><span class="pln">CMAKE_CURRENT_BINARY_DIR</span><span class="pun">}/</span><span class="pln">android</span><span class="pun">)</span></code></pre>
<p>With all this in place, I can now do a:</p>
<pre class="prettyprint prettyprinted"><code><span class="pun">></span><span class="pln"> </span><span class="pun">./</span><span class="pln">oryol make </span><span class="typ">CoreHello</span><span class="pln"> android</span><span class="pun">-</span><span class="pln">make</span><span class="pun">-</span><span class="pln">debug</span></code></pre>
<p>To compile and package a simple Hello World Android app!</p>
<p>What’s currently missing is a simple wrapper to deploy and run an app on the device:</p>
<pre class="prettyprint prettyprinted"><code><span class="pun">></span><span class="pln"> </span><span class="pun">./</span><span class="pln">oryol deploy </span><span class="typ">CoreHello</span><span class="pln">
</span><span class="pun">></span><span class="pln"> </span><span class="pun">./</span><span class="pln">oryol run </span><span class="typ">CoreHello</span></code></pre>
<p>These would be simple wrappers around the adb tool; later this should of course also work for iOS apps.</p>
<p>Right now the Android build system only works on OSX and only for the ARM V7A instruction set, and there’s no proper Android port of the actual code yet, just a single log message in the CoreHello sample.</p>
<p>Phew, that’s it! All this stuff is also available on github (<a href="https://github.com/floooh/oryol/tree/master/cmake">https://github.com/floooh/oryol/tree/master/cmake</a>).</p>
<blockquote>
<p>Written with <a href="https://stackedit.io/">StackEdit</a>.</p>
</blockquote>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-66806979082457543232014-02-02T16:13:00.001+01:002014-02-02T16:13:13.633+01:00It's so quiet here...<p>…because I’m doing a lot of weekend coding at the moment. I basically caught the github bug over the holidays:</p>
<p><a href="http://www.github.com/floooh">http://www.github.com/floooh</a></p>
<p>I’ve been playing around with C++11, python, Vagrant, puppet and chef recently:</p>
<p><strong>C++11:</strong></p>
<ul>
<li>I like: move semantics, for (:), variadic template arguments, std::atomic, std::thread, std::chrono, possibly std::function and std::bind (haven’t played around with these yet)</li>
<li>(still) not a big fan of: auto, std containers, exceptions, rtti, shared_ptr, make_shared</li>
<li>thread_local vs __thread vs __declspec(thread) is still a mess across Clang/OSX, GCC and VisualStudio</li>
<li>the recent crazy-talk about integrating a 2D drawing API into the C++ standard gives me the shivers, what a terrible, terrible idea!</li>
</ul>
<p><strong>Python</strong></p>
<ul>
<li>best choice/replacement for command-line scripts and asset tools (all major 3D modelling/animation tools are python-scriptable)</li>
<li>performance of the standard python interpreter is disappointing, and making something complex like FBX SDK work in alternative Python compilers is difficult or impossible</li>
</ul>
<p><strong>Vagrant plus Puppet or Chef</strong></p>
<ul>
<li>Vagrant is extremely cool for having an isolated cross-compilation Linux VM for emscripten and PNaCl: instead of writing a readme with all the steps required to get a working build machine, you can simply check a Vagrantfile into the version-control repository, and other programmers simply do a ‘vagrant up’ and have a VM which ‘just works’</li>
<li>the slow performance of shared directories on VirtualBox requires some silly workarounds; supposedly this is better with VMware Fusion, but I haven’t tried that yet</li>
<li>Puppet vs Chef are like Coke vs Pepsi for such simple “stand-alone” use-cases. Chef seems to be more difficult to get into, but I think in the end it is more rewarding when trying to “scale up”</li>
</ul>
<blockquote>
<p>Written with <a href="https://stackedit.io/">StackEdit</a>.</p>
</blockquote>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-49607748076379761362013-12-20T15:56:00.001+01:002013-12-20T16:12:24.818+01:00Asset loading in emscripten and PNaCl<p>Loading data from a file on disk doesn’t look like a big deal in a normal C application:</p>
<pre class="prettyprint prettyprinted" style=""><code class="language-c"><span class="typ">int</span><span class="pln"> main</span><span class="pun">()</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
</span><span class="com">// open file for reading</span><span class="pln">
</span><span class="typ">FILE</span><span class="pun">*</span><span class="pln"> fh </span><span class="pun">=</span><span class="pln"> fopen</span><span class="pun">(</span><span class="str">"filename"</span><span class="pun">,</span><span class="pln"> </span><span class="str">"rb"</span><span class="pun">);</span><span class="pln">
</span><span class="kwd">if</span><span class="pln"> </span><span class="pun">(</span><span class="pln">fh</span><span class="pun">)</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
</span><span class="com">// read some bytes</span><span class="pln">
</span><span class="kwd">char</span><span class="pln"> buffer</span><span class="pun">[</span><span class="lit">128</span><span class="pun">];</span><span class="pln">
fread</span><span class="pun">(</span><span class="pln">buffer</span><span class="pun">,</span><span class="pln"> </span><span class="kwd">sizeof</span><span class="pun">(</span><span class="pln">buffer</span><span class="pun">),</span><span class="pln"> </span><span class="lit">1</span><span class="pun">,</span><span class="pln"> fh</span><span class="pun">);</span><span class="pln">
</span><span class="com">// close the file</span><span class="pln">
fclose</span><span class="pun">(</span><span class="pln">fh</span><span class="pun">);</span><span class="pln">
fh </span><span class="pun">=</span><span class="pln"> </span><span class="lit">0</span><span class="pun">;</span><span class="pln">
</span><span class="pun">}</span><span class="pln">
</span><span class="kwd">return</span><span class="pln"> </span><span class="lit">0</span><span class="pun">;</span><span class="pln">
</span><span class="pun">}</span></code></pre>
<p>When doing a real-world game this simple approach has a couple of problems:</p>
<ul>
<li><strong>blocking</strong>: The above code is blocking, when reading from a fast hard disk this is probably not even noticeable, but try loading from a DVD or Bluray disk or some sort of network drive over a slow connection and the game loop will stutter</li>
<li><strong>hard-coded paths</strong>: The concept of a <em>current directory</em> is often not portable; you can’t depend on the current directory being set to where your executable is. It is better to establish an absolute root location and have all filename paths in the game relative to that (of course, how to establish this root location is platform-dependent again; for instance, get the absolute path to the executable and go on from there)</li>
<li><strong>can’t use different transfer protocols</strong>: the above code works fine for local filesystems, but not for loading data from a web or FTP server, and operations like creating a new file, or randomly seeking in a file, may not be available with other protocols.</li>
</ul>
<p>It is a good idea to restrict the type of file operations that a game can use, e.g.:</p>
<ul>
<li><strong>do we really need write and create access?</strong> An offline game may need to write save-game files and options, while an online game probably doesn’t need access to the local file system at all.</li>
<li><strong>do we really need random seek?</strong> Randomly seeking in a file can be either impossible (HTTP) or slow because some mechanical device must be moved around; it’s often better to read a file straight into memory and seek there, or to avoid such operations altogether.</li>
<li><strong>do we really need to iterate directory content?</strong> Again, this can be either expensive (mechanical storage device) or impossible (in plain HTTP, for instance)</li>
<li><strong>do we really need free-form file paths?</strong> Games usually need to access very few places in the file system (the asset directory which is usually read-only, and maybe some sort of per-user writable location for settings and save-games)</li>
<li><strong>do we really need access to file attributes?</strong> Stuff like last modification time, ownership, readable/writable. Usually this is not needed.</li>
<li><strong>do we really need the concept of a “current directory”?</strong> This can be tricky for portability, and some platforms don’t have the concept of a current working directory at all</li>
</ul>
<p>That’s a lot of features we don’t need in a game and which are also often not provided by web-based runtime platforms like PNaCl and JS. It helps to look at the HTTP protocol for inspiration, since that is where we need to load our data from anyway in the web scenario:</p>
<ul>
<li>file system paths become URLs</li>
<li>only one read operation, GET, which usually provides an entire file (but can also load a part of a file)</li>
<li>no directory iteration</li>
<li>no “write access” unless specifically allowed by the server</li>
<li>state-less, no current directory or current read position</li>
<li>operations can take very long (seconds or even minutes)</li>
</ul>
<p>For a game which wants to load its assets from the web, the IO system should be designed around those restrictions.</p>
<p>As an example, here’s an overview of the Nebula3 IO system:</p>
<ul>
<li><strong>all paths are URLs</strong>: Not much to say about this :)</li>
<li><strong>a single root location</strong>: At application start, a root location is established; this is usually a file:// URL pointing to the app’s installation directory, but it can be overridden to point (for instance) to an http:// URL. Loading all data from a web server instead of the local hard disk is done with a single line of code which sets a different root location.</li>
<li><strong>Amiga assigns as path aliases</strong>: A filesystem path to a texture looks like this in N3: <em>tex:walls/brickwall.dds</em>, where the <em>tex:</em> is an “AmigaOS assign” which is replaced with an absolute path, incorporating the root directory.</li>
<li><strong>all paths are absolute</strong>: there is no concept of a “current directory” in Nebula3, instead all paths resolve to an absolute location at runtime by replacing assigns in the path.</li>
<li><strong>pluggable “virtual filesystem” modules associated with the URL scheme</strong>: URLs starting with file:// are handled by a different file system module than http://, plus Nebula3 apps can plug in their own filesystem modules if they want</li>
<li><strong>stream objects, stream readers and stream writers</strong>: this is interesting in the web context only because there’s a MemoryStream object which is used to store and transfer downloaded data in RAM</li>
<li><strong>asynchronous IO is really simple</strong>: more on that later in this post :)</li>
</ul>
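<p>To make the assign mechanism a bit more concrete, here is a small stand-alone sketch of how such path resolution could look. Note that the function name, the assign table and the example URLs are all made up for illustration; this is not the actual Nebula3 API:</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// hypothetical assign table; "root" is the overridable root location,
// and assigns may resolve to other assigns
static std::map<std::string, std::string> assigns = {
    { "root", "http://myserver.com/myapp" },
    { "tex",  "root:textures" },
};

// repeatedly replace the "xyz:" prefix until an absolute URL remains
std::string ResolveAssigns(std::string path) {
    for (;;) {
        auto colon = path.find(':');
        // stop at URL schemes like "http://" (colon followed by "//"),
        // or when there is no assign prefix left
        if ((colon == std::string::npos) || (path.compare(colon, 3, "://") == 0)) {
            return path;
        }
        auto it = assigns.find(path.substr(0, colon));
        if (it == assigns.end()) {
            return path;
        }
        path = it->second + "/" + path.substr(colon + 1);
    }
}
```

<p>With this table, a path like <em>tex:walls/brickwall.dds</em> resolves to an absolute http:// URL in two steps, and switching the whole app to a different data source is just a matter of changing the single "root" entry.</p>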
<p>Since Nebula3 is also used as a command-line-tools framework, the IO subsystem is a bit of a hybrid, which in hindsight was a design fault. There are still all these writing and file creation operations, blocking IO, directory walking etc… which makes the API quite bloated. In a new engine I would probably strictly separate the two scenarios, use the engine as a game framework only, which only supports very simple asynchronous read operations, and write the tools with another framework (or even other language, like python). </p><div class="se-section-delimiter"></div>
<h4 id="asynchronous-io-in-nebula">Asynchronous IO in Nebula3</h4>
<p>Let’s look at async IO in Nebula3 a bit closer since this is the most interesting feature for web-based platforms. This is based on the “non-blocking future” pattern (or whatever you wanna call it) and depends on a frame-driven instead of event- or callback-driven application architecture.</p>
<p>Here’s some pseudo code:</p><div class="se-section-delimiter"></div>
<pre class="prettyprint prettyprinted" style=""><code class="language-cpp"><span class="kwd">void</span><span class="pln"> </span><span class="typ">StartLoading</span><span class="pun">()</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
</span><span class="com">// To start loading data we need to create an </span><span class="pln">
</span><span class="com">// IO request object and "send it off" to the</span><span class="pln">
</span><span class="com">// IoInterface singleton for asynchronous processing</span><span class="pln">
</span><span class="typ">Ptr</span><span class="pun"><</span><span class="pln">IO</span><span class="pun">::</span><span class="typ">ReadStream</span><span class="pun">></span><span class="pln"> req </span><span class="pun">=</span><span class="pln"> IO</span><span class="pun">::</span><span class="typ">ReadStream</span><span class="pun">::</span><span class="typ">Create</span><span class="pun">();</span><span class="pln">
req</span><span class="pun">-></span><span class="typ">SetURI</span><span class="pun">(</span><span class="str">"tex:walls/brickwall.dds"</span><span class="pun">);</span><span class="pln">
</span><span class="typ">IoInterface</span><span class="pun">::</span><span class="typ">Singleton</span><span class="pun">()-></span><span class="typ">Send</span><span class="pun">(</span><span class="pln">req</span><span class="pun">);</span><span class="pln">
</span><span class="com">// The IoRequest is now "in flight" and will contain</span><span class="pln">
</span><span class="com">// a result at some point in the future. Because we need</span><span class="pln">
</span><span class="com">// to check for completion in some later frame we need to</span><span class="pln">
</span><span class="com">// store the smart pointer somewhere</span><span class="pln">
</span><span class="kwd">this</span><span class="pun">-></span><span class="pln">pendingRequest </span><span class="pun">=</span><span class="pln"> req</span><span class="pun">;</span><span class="pln">
</span><span class="com">// ok, we're done for this frame...</span><span class="pln">
</span><span class="pun">}</span><span class="pln">
</span><span class="kwd">void</span><span class="pln"> </span><span class="typ">HandlePendingRequest</span><span class="pun">()</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
</span><span class="com">// this function must be called regularly (e.g. per</span><span class="pln">
</span><span class="com">// frame) to check whether the async loading operation</span><span class="pln">
</span><span class="com">// has finished</span><span class="pln">
</span><span class="kwd">if</span><span class="pln"> </span><span class="pun">(</span><span class="kwd">this</span><span class="pun">-></span><span class="pln">pendingRequest</span><span class="pun">.</span><span class="pln">isvalid</span><span class="pun">()</span><span class="pln"> </span><span class="pun">&&</span><span class="pln">
</span><span class="kwd">this</span><span class="pun">-></span><span class="pln">pendingRequest</span><span class="pun">-></span><span class="typ">Handled</span><span class="pun">())</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
</span><span class="com">// ok, the request has been completed, if </span><span class="pln">
</span><span class="com">// the file was loaded successfully we get</span><span class="pln">
</span><span class="com">// a MemoryStream object with its content</span><span class="pln">
</span><span class="kwd">if</span><span class="pln"> </span><span class="pun">(</span><span class="kwd">this</span><span class="pun">-></span><span class="pln">pendingRequest</span><span class="pun">-></span><span class="typ">GetSuccess</span><span class="pun">())</span><span class="pln"> </span><span class="pun">{</span><span class="pln">
</span><span class="com">// actually load the data from the memory</span><span class="pln">
</span><span class="com">// stream and throw the request object away,</span><span class="pln">
</span><span class="com">// since all file data is in memory, we can</span><span class="pln">
</span><span class="com">// actually use the normal open/seek/read/close</span><span class="pln">
</span><span class="com">// pattern on the stream object</span><span class="pln">
</span><span class="kwd">this</span><span class="pun">-></span><span class="typ">LoadFromStream</span><span class="pun">(</span><span class="kwd">this</span><span class="pun">-></span><span class="pln">pendingRequest</span><span class="pun">-></span><span class="typ">GetStream</span><span class="pun">());</span><span class="pln">
</span><span class="com">// delete the request object, </span><span class="pln">
</span><span class="com">// remember, this is a smart pointer :)</span><span class="pln">
</span><span class="kwd">this</span><span class="pun">-></span><span class="pln">pendingRequest </span><span class="pun">=</span><span class="pln"> </span><span class="lit">0</span><span class="pun">;</span><span class="pln">
</span><span class="pun">}</span><span class="pln">
</span><span class="pun">}</span><span class="pln">
</span><span class="pun">}</span></code></pre>
<p>There may be less verbose or more elegant versions of this code of course, but the basic idea is that you start loading a file in one frame, and then need to check in the following frames if loading has finished (or failed), and get the completely loaded data in a memory buffer which can be parsed with “traditional” read and seek functions (and which is very fast since everything happens in memory).</p>
<p>This implies that the engine needs to know what to do while some required data has not been loaded yet. For a graphics pipeline this is quite simple by either rendering nothing or some placeholder while the data is still loading.</p>
<p>But there are cases where the code cannot progress without important data being loaded, or where it would be very tricky or impossible to implement asynchronous IO (for instance when integrating complex 3rd party libraries like sqlite).</p>
<p>If we could simply block, this wouldn’t be a problem: the worst thing that would happen is that our game loop would stutter. But on web platforms we cannot simply block the main thread (it is easier on PNaCl, where it is recommended to move the game loop into a separate thread, which can then block waiting for the main thread to process asynchronous IO requests).</p>
<p>For Nebula3 I fixed this with an additional application object state called the “Preloading Phase”. The idea is that the app enters this state outside of the normal game loop (for instance while displaying a loading screen), and during this state, populates a simple in-memory filesystem (basically just a lookup-table with URLs as keys and MemoryStream objects as values) with the asynchronously loaded data. When all data has been loaded (or failed to load), the app will leave the preloading phase (and hide the loading screen) and synchronous loader code will transparently get the data from the in-memory file system instead of starting an actual asynchronous IO request. Since all this preloaded data resides in memory this means of course that only small data and few files should be preloaded, and the majority of data should be asynchronously streamed on demand during the game loop. It’s really only a workaround for the few cases where synchronous access is absolutely necessary.</p>
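<p>The core of such a "preloading phase" file system is really just a lookup table; here is a minimal sketch (the class and method names are invented for illustration, the real Nebula3 code looks different):</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch of the in-memory filesystem described above: URLs as keys,
// fully downloaded file content as values.
class PreloadFileSystem {
public:
    // called from the async IO completion handler while the
    // loading screen is displayed
    void Add(const std::string& url, std::vector<uint8_t> data) {
        this->files[url] = std::move(data);
    }
    // synchronous loader code transparently checks this table first,
    // instead of firing an actual asynchronous IO request
    const std::vector<uint8_t>* Lookup(const std::string& url) const {
        auto it = this->files.find(url);
        return (it != this->files.end()) ? &it->second : nullptr;
    }
private:
    std::unordered_map<std::string, std::vector<uint8_t>> files;
};
```

<p>Since everything lives in RAM, lookups are cheap, but this also makes it obvious why only small and few files should go through this path.</p>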
<p>More details about this in one of my presentations: <a href="http://www.slideshare.net/andreweissflog3/gdce2013-cpp-ontheweb">http://www.slideshare.net/andreweissflog3/gdce2013-cpp-ontheweb</a></p><div class="se-section-delimiter"></div>
<h4 id="emscripten-and-pnacl-details">emscripten and PNaCl details</h4>
<p>Ok, almost done!</p>
<p>For the emscripten and PNaCl platforms I basically wrote a simple Nebula3 filesystem module which fires HTTP GET requests through the respective emscripten and PNaCl API calls, and copies the received data into MemoryStream objects; it’s only a few hundred lines of code each. </p>
<p>The main difference between the two platforms lies in the use of threading:</p>
<ul>
<li>PNaCl works like “traditional” platforms, there are a number of IO threads (about 10, but that’s tweakable) each of them processes one IO request at a time, so that as many IO requests can be in flight as there are IO threads. Those threads also directly handle processing of the received data like decompression.</li>
<li>In emscripten, the IO calls (sending an HTTP request, and the callback when the response has been received) are handled on the main thread, but the expensive processing (e.g. decompression) of the received data is handed over to a WebWorker pool (usually 4 WebWorker threads). There can still be multiple IO requests in flight because the IO system doesn’t “wait” for an IO request to finish before firing a new one (but it is still throttled to restrict the number of requests in flight in case a lot of requests arrive in a very short time period).</li>
</ul>
<p>The actual code implementation is straightforward so I’ll spare you the source code samples. The respective class in PNaCl is called <strong>pp::URLLoader</strong>, and emscripten offers a whole set of rather specialized C functions which all start with <strong>emscripten_async_wget</strong>. Both fire an HTTP request (emscripten does an XmlHttpRequest, and PNaCl presumably under the hood as well - this has some unfortunate cross-domain implications), and invoke callbacks on failure or when data has arrived. PNaCl needs a bit more coding work since data is received in chunks (and the receive callback can be called multiple times), while emscripten waits until all data is received before calling the received-callback once.</p>
<p>emscripten has more options to integrate the data with the web page DOM (for instance it can automatically create DOM image objects from downloaded image files), and it also has a very advanced CRT IO emulation layer (so you actually <em>can</em> directly use fopen/fclose after the data has been downloaded or preloaded), but I haven’t looked into these advanced concepts very closely since Nebula3 already does a lot of this layering itself.</p>
<p>There’s a similar filesystem layer for NaCl called nacl-mounts, but similarly to emscripten I didn’t look into this very closely since the low-level URL loading functions were a better fit for N3.</p>
<p>That’s it for today, have a nice Christmas everyone :)</p>
<blockquote>
<p>Written with <a href="https://stackedit.io/">StackEdit</a>.</p>
</blockquote>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-33298515762959551122013-11-03T16:11:00.001+01:002013-11-03T16:11:44.717+01:00Messing around with MESS (and JSMESS)<p>And now for something completely different:</p>
<p>Since I’m dabbling with emscripten I’ve had this idea in my head to write or port a <a href="http://en.wikipedia.org/wiki/KC85">KC85/3</a> emulator, so that I could play the games I wrote as a kid directly in the browser. The existing KC85 emulators I was aware of are not trivial to port; they either depend on x86 inline assembly, or are hardwired to a specific UI framework (if you read German, here’s an overview of what’s out there: <a href="http://www.kc85emu.de/Emulatoren/Emulatoren.htm">http://www.kc85emu.de/Emulatoren/Emulatoren.htm</a> )</p>
<p>About 2 weeks ago I started to look around more seriously for a little side project to spend my 3 weeks of vacation around Christmas on (I need to burn my remaining vacation days; in Germany employees are basically required by law to take all their vacation - tough shit ;) My original plan was to cobble together a minimal emulator, just enough to run my old games: take an existing Z80 CPU emulator like the one from <a href="http://fuse-emulator.sourceforge.net/">FUSE</a>, hack some keyboard input and video output and go on from there.</p>
<p>Thankfully I then had a closer look at <a href="http://www.mess.org/">MESS</a>. I always thought that MESS could only emulate the most popular Western game machines like the C64 or Atari 400, but it turns out that this beast can emulate pretty much any computer that ever existed (between 600 and 1700, depending on how you count), it even has support for the PDP-1 from the early 60’s! When searching through the list of emulated systems here (<a href="http://www.progettoemma.net/mess/sysset.php">http://www.progettoemma.net/mess/sysset.php</a>) I stumbled over the following entries: </p>
<ul>
<li>HC900 / KC 85/2</li>
<li>KC 85/3</li>
<li>KC 85/4</li>
<li>KC Compact</li>
<li>Lerncomputer LC 80</li>
<li>KC 85/1</li>
<li>Z1013</li>
<li>Poly-Computer 880</li>
<li>BIC A5105</li>
</ul>
<p>That’s the entire list of East-German “hobby computers”. But wait, there’s more:</p>
<ul>
<li>Robotron PC-1715</li>
<li>A5120</li>
<li>A7150</li>
</ul>
<p>These were GDR office computers. The 1715 was a CP/M-compatible 8-bit PC, and the A7150 was a not-quite-compatible x86 IBM-PC clone. I’m actually not sure what the 5120 was, just that it was a big ugly box with a built-in monochrome monitor.</p>
<p>Since all those systems are marked as “not working” in this list I wasn’t too enthusiastic yet, but I had to be sure. The latest MESS compiled out of the box on OSX, and it was easy to find the right ROM images on the net. So I started MESS with:</p>
<blockquote>
<p>./mess64 kc85_3 -window</p>
</blockquote>
<p>To my astonishment I watched a complete boot sequence into the operating system:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGONfjBpsJEfy6vF2D3cyGIgb5aiw6cQ8_KQFb2Z0ioEb9DihmgTOEXCkUPUf2pmVuKrm5WHSDeR3LyVSLCDR9KZR2zoS39BDLcB80pUI8JOgRoGm7umYpZgjWXEq4fqOzWtZTciPecKMg/s640/kc85_3.png" alt="KC85/3 system shell" title="kc85_3.png"></p>
<p>Excite!</p>
<p>I had also come across the <a href="https://github.com/jsmess/jsmess">JSMESS</a> project before, which is a port of MESS to Javascript using emscripten. So my next step was to compile JSMESS and see whether the KC emulator works there as well. It booted, but didn’t accept any keyboard input :( After comparing the source code it dawned on me that JSMESS was far behind MESS, about 2 years to be exact. But this was a good excuse to dive a bit deeper into how MESS actually works, and the deeper I crawled the more impressed I became.</p>
<p>MESS had been derived from the well-known MAME arcade machine emulator project, with the goal of extending the emulation to “real computers”. Later MESS merged with MAME again, so that today both projects compile from the same code base. </p>
<p>A specific emulated machine is called a “system driver” and can be described by just a few lines of code listing what CPU to use, the RAM and ROM areas, what ROM image to load, and what memory-mapped IO registers exist. You’ll also have to provide several callback routines for handling reads and write to IO addresses and to convert the system’s video memory into a standardized bitmap representation. For a very simple computer built from standard chips a working emulator can be plugged together in a couple of hours, but writing a complete and “cycle-perfect” emulator is of course still a tough challenge, especially if custom chips are used. The overall amount of research and implementation work that went into MESS is almost overwhelming. Pretty much every computer, every mass-produced chip that ever existed is emulated in there, often with all of their undocumented quirks!</p>
<p>Ok, back to the KC85/3: after analyzing the source code of the KC driver it quickly became clear that the keyboard input emulation was the toughest part, since this was where the original KC engineers were very “creative”. As far as I understood the several pages of email exchange which are included as a comment in the MESS KC driver, the KC keyboard used a very exotic TV remote control chip to send serial pulses to the main unit (the KC had an external keyboard connected with a “very thin” wire, so it was very likely a simple serial connection). The base unit which received the signal didn’t have a “decoder chip” however, but used its universal Z80-CTC (timer) and -PIO (in/out) chips to decode the signal. Emulating this behaviour seems to be very tricky, since a lot of KC emulators have janky keyboard input (not registering key presses, or inserting random key codes when typing fast, etc…). </p>
<p>Since I didn’t get this to work reliably even after back-porting the latest keyboard input code from MESS (which somewhat works, but still has problems with random keys triggering), I decided to be a bit naughty and implement a shortcut (the “cycle-perfect” emulator purists will likely kill me for this heresy):</p>
<p>After the KC-ROM reads a keyboard scan-code through this tricky serial-pulse decoding described above, it converts the scan code to ASCII and writes it to memory location 0x1FD, and then sets bit 0 in memory location 0x1F8 to signal that a new key code is available. It also maintains a keyboard repeat counter at address 0x1FA. All of this can be gathered from the keyboard handling code in ROM (and is also explained in that very informative, very long comment in the source code). I’m basically “shortcutting” this with C code which writes the ASCII code directly to 0x1FD and also handles the key repeat directly in C. The tricky serial decoding stuff in ROM is never triggered this way. With this hack the keyboard input is fairly responsive (sometimes the first key is swallowed, don’t know yet what’s up with this).</p>
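<p>Against a fake 64 kByte memory array (standing in for the real MESS memory interface, which works differently), the core of this hack boils down to something like the following; the function name is made up, only the addresses are the ones from the KC85/3 ROM described above:</p>

```cpp
#include <cassert>
#include <cstdint>

// fake 64 kByte address space standing in for the emulated KC85/3 RAM
static uint8_t mem[0x10000];

// shortcut keyboard input: bypass the serial-pulse decoding in ROM and
// poke the key code directly into the ROM's keyboard state variables
void PokeKey(uint8_t ascii) {
    mem[0x1FD] = ascii;    // ASCII code of the new key
    mem[0x1FA] = 0;        // keyboard repeat counter (repeat handling
                           // is glossed over in this sketch)
    mem[0x1F8] |= 0x01;    // bit 0: "new key code available"
}
```

<p>The ROM’s own keyboard routines then pick the key code up from 0x1FD exactly as if the serial decoding had happened.</p>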
<p>Next I had to fix the RGB colors which were off both in MESS and JSMESS (bright yellow-green looked more like puke-yellowish, and all other “inbetween colors” were off too), and I finally back-ported (and also optimized a bit) the video memory mapping code from MESS to JSMESS.</p>
<p>You can check all my changes here on GitHub: <a href="https://github.com/floooh/jsmess/tree/floh">https://github.com/floooh/jsmess/tree/floh</a> Right now a “reboot” is going on in the JSMESS project to bring it up to date with the latest MESS version. I’ll wait with any pull requests until this is finished and I’ve refreshed my own fork as well. Also, I will not try to contribute my “dirty hacks” back to the main code base of course; the MESS guys are right to insist on perfect emulation instead of shortcut hacks like the keyboard hack described above. But my (rather egoistic) main goal is to get my own games running on my web page, so I think I can get away with such hacks in my own fork.</p>
<p>The next challenge is to get all of my games running in JSMESS. This is harder than I thought. Part of the problem is that there exist several memory dump files which are not original. I found dump files with the wrong entry address, and dumps where others have implemented cheats and trainers. So far I’ve got 3 out of 7 games working. Getting the remaining 4 games into working condition might take a while since I may have to do some hardcore assembly debugging to find out what’s wrong.</p>
<p>Thankfully MESS has a complete assembler-level debugger built in:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvpjKuMcjsjUIz5TfgHzYArEWgk7FsE2gpu1EDstue7x6vNE0sdMYckduTo8SX-QwNXZLwJzRBWzLPj7uZ-Ttluv6x60macILNXwZdWbdNYoVIeBCC-JADJYSaesBPr3ySBL3oU9b5IskP/s640/mess_debugger.png" alt="MESS Debugger" title="mess_debugger.png"></p>
<p>Reconstructing the program flow of this 25-year-old game which I wrote in machine code (instead of using an assembler) is actually quite a lot of fun, much easier than trying to reconstruct a program which was written in a high-level language and compiled to machine code. Subroutines often start at “even” addresses and have a block of NOP instructions appended, in case I needed to add instructions when fixing bugs; strings are usually embedded right into the instruction sequence instead of a central “string pool”. Analyzing the program flow comes down to figuring out what a given subroutine does (drawing a sprite? handling keyboard input? updating the hiscore display?), and what variables are stored at specific memory addresses (for instance the current life counter, the current position, and so on).</p>
<p>What’s remarkable is how small the game code actually is, even though it is not very dense, with all those NOPs in between and a lot of redundant code segments (i.e. I didn’t particularly care about code size). Of the roughly 12 kByte of my (very simple) Pacman clone, only about 3.5 kByte are actual code. The entire game code fits on a single screen (marked in yellow here):</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZm4otadbjX48LytE-ItEpdXSL5bf4a8FHNcyalwwTSAUundWjcS02XtN2q2tbS2JJSlzVzK41W7ZnYFnmpE8nhE2-94qEER7ZjaMahyl24Jp-J0bfvZgAyFkmGiGYQuL_OPdFSbT7uVX3/s640/pacman_dump.png" alt="enter image description here" title="pacman_dump.png"></p>
<p>Finally, here’s the current result of this work: a JSMESS KC85/3 and KC85/4 emulator, and 3 of my old games running directly in the browser. Don’t try this on an iPhone though (or in Safari in general). Firefox or an up-to-date Chrome works very well:</p>
<p><a href="http://www.flohofwoe.net/history.html">http://www.flohofwoe.net/history.html</a></p>
<blockquote>
<p>Written with <a href="https://stackedit.io/">StackEdit</a>.</p>
</blockquote>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-62512576329721979822013-10-08T21:04:00.001+01:002013-10-08T21:17:01.844+01:00Farewell DirectX<p>Today I ported the OpenGL rendering code in Nebula3's bleeding edge branch back to Windows:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg12m4L1VLT6WvceJg3m8Gi1jZEwj0N4HAT5BNfsz645xgp3lx9t_RX3d7OCGwPHwh6KvciF1REHXmsasl8dLd9sjqXYKf2wvrbNPbDM1mTQVnzkXoDDnsn4zdPCIzZvJdBkbKXLYlTdks4/s480/cg2_win32.png" alt="enter image description here" title="Windows version of N3 running in OpenGL"></p>
<p>This is remarkable in 2 ways:</p>
<ol>
<li>It's the first time since around 1997 that I ported a significant amount of code <em>to</em> Windows. Usually it was from Windows towards another platform.</li>
<li>This is also the end of DirectX in our code base (well almost, we're still using the xnamath.h header, which is a standalone header and doesn't require linking against any DX DLL).</li>
</ol>
<p>Why do I think this is remarkable?</p>
<p>It is the end of an era! In 1997 I ported Urban Assault from DOS to Windows95 and Direct3D5. This was just around the time when Windows started its career as a gaming platform. D3D5 was the first D3D version which didn't completely suck, because it had the new DrawPrimitive API; before that, rendering commands had to be issued through an incredibly arcane "execute buffer" concept (theoretically a good idea if GPUs had been able to directly parse this buffer, but terrible to use in real-world code). The Urban Assault port to D3D was pretty inefficient since we ported from a software rasterizer (with perspective correction and all that cool shit), and if I remember correctly we issued every single triangle through its own DrawPrimitive call (although that wasn't such a big deal at the time). And the only graphics card which had somewhat acceptable D3D support was the RIVA128 from an underdog company called nVidia (this was before their breakthrough with the TNT2), while the top dog was the 3dfx Voodoo2, which had much better support for Glide than for D3D. But since UA was published by Microsoft we had to be D3D-exclusive of course.</p>
<p>From 1998 on, Direct3D was our primary rendering API. I dabbled with OpenGL from time to time, but nothing serious. We made the jump to D3D7, D3D8, and finally D3D9. Each new version sucked less and less, and D3D9 is still a really good API. We never made the jump to D3D10 because of Microsoft's exceptionally stupid decision not to back-port D3D10 from Vista to Windows XP, and since Nebula was never about high-end rendering features but about running on a broad range of old hardware, we could never justify adding D3D10 support, since we couldn't give up D3D9.</p>
<p>And as silly as it sounds, this boneheaded Microsoft decision from 7 years ago is one important reason why I'm ditching D3D today. World-wide, WindowsXP is the <em>fastest growing</em> Windows version. It's growing a lot faster than Windows8. Don't believe me? See the Unity hardware stats page for a scary reality check:</p>
<p><a href="http://stats.unity3d.com/web/index.html">http://stats.unity3d.com/web/index.html</a></p>
<p>The Chinese Dragon has awoken, and it is running exclusively on XP. WindowsXP is also very popular in Eastern Europe and the Middle East. So if you want to support markets east and south of Middle Europe you're basically fucked if you don't support XP.</p>
<p>Another important reason is streamlining the code base. The currently "interesting platforms" (browser and mobile) are all running some variant of POSIX+OpenGL. In this new world the Windows APIs are the exotics, and Microsoft doesn't exactly help the cause by repeating their errors of the past (limiting Windows Store apps to D3D11). By using a single rendering code base (and especially shader code base!) across all platforms we're reducing our technical debt in the future.</p>
<p>I have a fallback plan of course, because there are a few risks:</p>
<ul>
<li>What if OpenGL driver quality on Windows is as bad as everybody says?</li>
<li>What if we need to support native Windows Store apps (as opposed to a WebGL version running embedded in a browser)?</li>
</ul>
<p>The fallback plan has 2 stages:</p>
<ol>
<li>Use <a href="https://code.google.com/p/angleproject">ANGLE</a>, which layers OpenGL ES2 (with some important extensions) over D3D9 or D3D11; this is the preferred solution, since we wouldn't need to touch the render layer code and shader library.</li>
<li>If ANGLE isn't good enough, write native D3D9 and D3D11 ports of the CoreGraphics2 subsystem, and ideally use some API-agnostic shader language wrapper. This wouldn't be as bad as it sounds: each wrapper would be around 7k lines of code, which is about 4.5% of Nebula3 in its minimal useful configuration (about 150k lines of code; depending on which other N3 modules are added this can go up to 500k lines of code).</li>
</ol>
<p>OpenGL isn't perfect of course. It has some incredibly crufty corners; most of those have been fixed through extensions and newer GL versions over time, but realistically we can't use anything newer than OpenGL ES2 with very few extensions for the renderer's base feature set.</p>
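<p>On such an ES2-level baseline, anything beyond the base feature set has to be detected at runtime via the extension string. Here is a minimal sketch of the classic token check (in real code the string would come from glGetString(GL_EXTENSIONS); it is passed in as a parameter here so the helper stays GL-free):</p>

```cpp
#include <cstring>

// Sketch of the classic ES2-style extension check: scan the
// space-separated extension string for an exact token match, so that
// e.g. "GL_EXT_foo" does not falsely match "GL_EXT_foobar".
bool HasExtension(const char* extensions, const char* name) {
    const size_t len = std::strlen(name);
    const char* p = extensions;
    while ((p = std::strstr(p, name)) != nullptr) {
        const bool startOk = (p == extensions) || (p[-1] == ' ');
        const char end = p[len];
        if (startOk && (end == ' ' || end == '\0')) {
            return true;
        }
        p += len;
    }
    return false;
}
```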
<p>When I removed the DirectX library stubs from the Nebula3 CMake files this afternoon I really had to stop and think for a moment. Who knows, maybe in a future blog post in about 15 years I will write "this was around the time when Windows became irrelevant as a gaming platform"? ;)</p>
<blockquote>
<p>Written with <a href="http://benweet.github.io/stackedit/">StackEdit</a>.</p>
</blockquote>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-88412039050393918422013-09-07T16:12:00.001+01:002013-09-07T16:16:16.306+01:00emscripten and PNaCl: App entry in PNaCl<p>The is the followup to last week's post about <a href="http://flohofwoe.blogspot.de/2013/09/emscripten-and-pnacl-app-entry-in.html">application entry in emscripten</a>. If you haven't done yet I would recommend reading this first before continuing.</p>
<p>2 main points to keep in mind about the (P)NaCl platform:</p>
<ol>
<li>Blocking the main thread will block the entire browser tab.</li>
<li>NaCl has true threading support which can be used to work around these blocking limitations.</li>
</ol>
<p>Point (1) is the same as on the emscripten platform, and point (2) is the big difference to emscripten.</p>
<p>In a Nebula3/PNaCl application, the main function looks the same as on any other platform (I'm using emscripten's "simulate_infinite_loop" approach now):</p>
<pre><code>#include "myapplication.h"
ImplementNebulaApplication();
void
NebulaMain(const Util::CommandLineArgs& args)
{
MyApplication app;
app.SetCommandLineArgs(args);
app.StartMainLoop();
}
</code></pre>
<p>However under the hood, the startup process until the NebulaMain() function is entered is completely different from other platforms, since <strong>PNaCl doesn't have a main() function</strong>. Instead PNaCl has the concept of application <strong>Module</strong> and <strong>Instance</strong> objects. This is where the plugin-nature of a PNaCl app shines through. There is a single Module object created on a web page containing a PNaCl app, and for each <code><embed></code> element on the page, one Instance object. In reality though, most of the time there will be exactly one Module and one Instance object, so the distinction doesn't really matter.</p>
<p>PNaCl offers two different startup APIs for C and C++. The C++ API is easier to grasp IMHO, so I'll concentrate on it here (this dual C/C++ nature continues through the whole NaCl API: there's a pure C API, extended by a slightly higher-level C++ API).</p>
<p>Hooking up your code to NaCl basically means to write 2 subclasses, one deriving from <strong>pp::Module</strong>, and one deriving from <strong>pp::Instance</strong>, and the NaCl runtime will then call into these classes through virtual methods for initialisation and notifying the application about events.</p>
<p>But first things first: </p>
<p>Everything starts at a global function called <strong>pp::CreateModule()</strong> which you must provide, and which must return a new object of your pp::Module subclass (called <strong>N3NaclModule</strong> in this case):</p>
<pre><code>namespace pp
{
Module* CreateModule()
{
return new N3NaclModule();
};
}
</code></pre>
<p>Although this is the very first function that NaCl will call, you should be aware that initialisers in the global scope (static objects) will already be initialised and have had their constructors called at this point.</p>
<p>The main job of the derived Module class is to create Instance objects, but we can also put some one-time init code in there. There's a pair of functions to initialise and shutdown GL rendering called <strong>glInitializePPAPI()</strong> and <strong>glTerminatePPAPI()</strong>. The only rule is that no GL calls must be made outside these two functions, so I guess we could also put them somewhere else, as long as it is guaranteed that they are not called multiple times.</p>
<p>But - the most important method in the derived Module class is the factory method for Instance objects called <strong>CreateInstance</strong>. In my case, I have created a subclass of pp::Instance called <strong>NACL::NACLBridge</strong>.</p>
<p>The entire N3NaclModule class looks like this:</p>
<pre><code>class N3NaclModule : public pp::Module
{
public:
virtual ~N3NaclModule()
{
glTerminatePPAPI();
}
virtual bool Init()
{
return glInitializePPAPI(get_browser_interface()) == 1;
}
virtual pp::Instance* CreateInstance(PP_Instance instance)
{
return new NACL::NACLBridge(instance);
};
};
</code></pre>
<p>All the really interesting stuff from here on happens in the NACLBridge object.</p>
<p>These two source snippets live inside the ImplementNebulaApplication() macro which all in all looks like this:</p>
<pre><code>...
#elif __NACL__
#define ImplementNebulaApplication() \
class N3NaclModule : public pp::Module \
{ \
public: \
virtual ~N3NaclModule() \
{ \
glTerminatePPAPI(); \
} \
virtual bool Init() \
{ \
return glInitializePPAPI(get_browser_interface()) == 1; \
} \
virtual pp::Instance* CreateInstance(PP_Instance instance) \
{ \
return new NACL::NACLBridge(instance); \
}; \
}; \
namespace pp \
{ \
Module* CreateModule() \
{ \
return new N3NaclModule(); \
}; \
}
#elif __MACOS__
...
</code></pre>
<p>Now on to the NACLBridge class. This is (I know I'm repeating myself) derived from the pp::Instance class, but it is called "Bridge" for a reason: on PNaCl we spawn a dedicated thread for the game loop and leave the main thread (aka the Pepper thread) for event handling and rendering. Our derived pp::Instance subclass serves as a "bridge" between these 2 threads, which is why it's called <strong>NACLBridge</strong>.</p>
<p>The NaCl runtime will call into virtual methods of an pp::Instance object for handling events, the most important of these are <strong>Init(), DidChangeView(), HandleInputEvent()</strong>. For a complete overview and exhaustive documentation of those callback methods I recommend sifting directly through the SDK header: <strong>include/ppapi/cpp/instance.h</strong></p>
<p>In the Init() method I'm only building a CommandLineArgs object from the provided raw arguments (these have been extracted from our <code><embed></code> element in the HTML page).</p>
<p>The actual initialisation work happens (in my case) in the first call to <strong>DidChangeView()</strong>, by calling a Setup() method in the NACLBridge object. I chose this place because this is where I get the current display dimensions of the <code><embed></code> element, which are required for the renderer initialisation (although, now that I think about it, I might also be able to extract these from the arguments provided in the Init() method; I need to try this out some time).</p>
<p>The <strong>NACLBridge::Setup()</strong> method only does one thing: create a thread with the <strong>NebulaMain()</strong> function as entry point, and then return to the NaCl runtime. The code inside NebulaMain() works just as on any other platform, with the only difference that it is not running on the main thread, but in its own dedicated game thread.</p>
<p>The big advantage of running the game loop in its own thread is that you "own the game loop", and you can perform blocking operations, for instance waiting for IO. The disadvantage is that you can't call any PPAPI (NaCl system) functions from the game thread, which is a blog-post-topic on its own.</p>
<p>So to recap: The <strong>ImplementNebulaApplication</strong> macro runs on the main thread, and creates one <strong>pp::Module</strong> and one <strong>pp::Instance</strong> object. The <strong>pp::Instance</strong> object creates the dedicated game thread, which calls into the <strong>NebulaMain()</strong> function, which from that moment on runs the game loop like on any other platform. With this approach we don't need to slice the game loop into frames like on the emscripten platform.</p>
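<p>To make the recap concrete, here is a minimal sketch of that spawn-a-game-thread idea in plain C++ (all names are hypothetical; the real NACLBridge also forwards Pepper events, passes the command line args, and of course never joins the game thread):</p>

```cpp
#include <atomic>
#include <thread>

// Hedged sketch of the bridge pattern: the (Pepper) main thread spawns
// a dedicated game thread and returns to the runtime; the game thread
// runs the "infinite" game loop (NebulaMain() in the real port).
static std::atomic<int> framesRun{0};

static void GameThreadEntry() {
    // stands in for NebulaMain(args); here we just simulate 3 frames
    for (int i = 0; i < 3; ++i) {
        ++framesRun;
    }
}

int SpawnGameThreadAndWait() {
    std::thread gameThread(GameThreadEntry);
    // the real bridge would return to the Pepper thread immediately and
    // let the game thread live on; joining only keeps this demo self-contained
    gameThread.join();
    return framesRun.load();
}
```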
<p>Now that you heroically worked your way through all of this I'll tell you a secret: NaCl also provides a simple alternative to this complicated mess called the <strong>ppapi_simple</strong> library, which essentially provides a classic main() function running in its own thread, and because blocking is allowed on this thread, it also provides normal POSIX fopen()/fclose() style blocking IO functions (sound familiar?).</p>
<p>Check out the header file <strong>include/ppapi_simple/ps.h</strong> as starting point.</p>
<p>Unfortunately this ppapi_simple library didn't exist when I started dabbling with NaCl about 2 years ago; it certainly would have made life a lot easier. On the other hand, the work that had already gone into the NaCl port made the emscripten port easier, which wouldn't have been the case had I used the ppapi_simple wrapper code.</p>
<blockquote>
<p>Written with <a href="http://benweet.github.io/stackedit/">StackEdit</a>.</p>
</blockquote>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-43457172902203294822013-09-01T14:21:00.001+01:002013-09-01T14:32:04.925+01:00emscripten and PNaCl: App entry in emscripten<p>When quickly hacking a graphics demo on the PC or consoles, the main function usually looks like this:</p>
<pre><code>int main()
{
if (Initialize())
{
while (!Finished())
{
Update();
Render();
}
Cleanup();
}
return 0;
}
</code></pre>
<p>Trying this on one of the browser platforms like emscripten or PNaCl results in a freeze, and after a little while the browser will kill your tab :(</p>
<p>The problem is that the browser won't "let you own the game loop"; this is a general problem of event- or callback-driven platforms (iOS and Android have the same problem, for instance). On such platforms the execution flow of the main thread is not controlled by your game code; instead there's some outer event loop which calls into your code from time to time. If you spend too much time in your allotted slice of the pie you will drag the entire system event loop down, and other important events (such as input events) can't be handled fast enough. The result is that the entire user interface feels sluggish and unresponsive (for instance, scrolling in your browser tab will stutter or even freeze for multiple seconds). And if you don't return for about 30 seconds, the browser will kill your app (Aw Snap!).</p>
<p>This is all bad user experience of course, we want the browser to remain responsive, and scrolling smooth <strong>all</strong> the time, also during initialisation and load time.</p>
<p>The core problem is that your code <strong>must always</strong> return to the browser within a few milliseconds (e.g. 16 or 33, depending on whether you're aiming for 60 or 30 fps), and this is the big riddle we need to solve for a game application running in a browser.</p>
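<p>One common way to stay inside such a frame budget is to chop long-running work into small tasks and only execute as many per frame as the budget allows. A hedged sketch of this idea (invented names, not actual Nebula3 code):</p>

```cpp
#include <chrono>
#include <deque>
#include <functional>

// Hypothetical sketch of budgeted per-frame work: run queued tasks
// until the frame's time budget is used up, then return control to
// the browser's event loop and continue next frame.
class WorkQueue {
public:
    void Push(std::function<void()> task) {
        tasks.push_back(std::move(task));
    }
    // returns the number of tasks executed this frame
    int RunSlice(std::chrono::microseconds budget) {
        using Clock = std::chrono::steady_clock;
        const auto start = Clock::now();
        int executed = 0;
        while (!tasks.empty() && (Clock::now() - start) < budget) {
            tasks.front()();     // do one small piece of work
            tasks.pop_front();
            ++executed;
        }
        return executed;         // remaining tasks wait for the next frame
    }
    bool Empty() const { return tasks.empty(); }
private:
    std::deque<std::function<void()>> tasks;
};
```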
<p>For a Flash or Javascript coder, or someone who mainly writes event-driven UI applications, this will all be familiar: they are used to having all their code run inside event handlers and callbacks. But typical UI apps usually don't need to do anything continuously. Event-driven applications sleep most of the time, react to (mostly input) events from the outside, and go to sleep again. Games, however, need to do continuous rendering, and thus are <strong>frame-driven</strong>, not <strong>event-driven</strong>, and mixing these two programming models isn't a very good idea because it's hard to follow the code flow. The usual way to implement games on event-driven platforms is to set up a timer which calls a per-frame callback function many times per second. I think hacks like this are why game programmers have a deep hatred for UI-centric platforms (and why I still like Windows despite its other shortcomings, because the recommended event-handling model for games on Windows (PeekMessage -> TranslateMessage -> DispatchMessage) actually lets you "own the game loop" in a very simple and elegant way through message polling).</p>
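<p>For illustration, the polling model can be sketched in a few lines of plain C++ (names invented, events reduced to strings): each frame first drains all pending events non-blockingly, then runs the frame, so the game code stays in control of the loop:</p>

```cpp
#include <queue>
#include <string>

// Hypothetical illustration of the PeekMessage-style polling model:
// the game loop asks for pending events instead of being called back.
struct EventQueue {
    std::queue<std::string> events;
    // non-blocking poll, the PeekMessage equivalent
    bool Poll(std::string& out) {
        if (events.empty()) {
            return false;
        }
        out = events.front();
        events.pop();
        return true;
    }
};

// runs a fixed number of frames; returns how many events were handled
int RunFrames(EventQueue& queue, int frames) {
    int handled = 0;
    for (int f = 0; f < frames; ++f) {
        std::string event;
        while (queue.Poll(event)) {  // drain all events for this frame
            ++handled;               // Translate/Dispatch would go here
        }
        // Update(); Render();      // the continuous, frame-driven part
    }
    return handled;
}
```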
<p>There are a few different approaches to either get a true continuous game loop, or at least to create the illusion of a continuous game loop on platforms where polling isn't possible, mainly depending on whether "true" pthreads-style multi-threading is supported or not.</p>
<p>In a Nebula3/emscripten application this isn't the case: the actual game loop and the rendering code run on the main thread. The reason is that emscripten's multithreading support is built on WebWorkers. pthreads emulation isn't possible in emscripten, since WebWorkers can't share memory with the main thread; furthermore, WebWorkers can't call into WebGL. This puts a lot of restrictions on our "game loop problem", and it required refactoring Nebula3's application model: in all previous ports there was always a way to somehow run a continuous game loop, mostly by moving the game loop into its own thread, but we don't have this option in emscripten (yet ... but hopefully one day, with more flexible WebWorkers).</p>
<p>Traditionally, a Nebula3 application used to go through a simple "Open -> Run -> Close -> Exit" sequence. An N3 main file looked like this for instance:</p>
<pre><code>#include "myapplication.h"
ImplementNebulaApplication();
void
NebulaMain(const Util::CommandLineArgs& args)
{
MyApplication app;
app.SetCommandLineArgs(args);
if (app.Open())
{
app.Run();
app.Close();
}
app.Exit();
}
</code></pre>
<p>Instead of a main() function, there's a NebulaMain() wrapper function and a macro called <em>ImplementNebulaApplication()</em>. These hide the fact that not all platforms have a standard main() (for a Windows application, one would typically use WinMain() for instance).</p>
<p>The actual system main function is hidden inside the <em>ImplementNebulaApplication()</em> macro, for a PC-like platform the macro code looks like this:</p>
<pre><code>int __cdecl main(int argc, const char** argv)
{
Util::CommandLineArgs args(argc, argv);
return NebulaMain(args);
}
</code></pre>
<p>Now back up to the NebulaMain() function's content: the Application::Open() method could take a while to execute (couple of seconds, worst case), and the Application::Run() will contain the "infinite" game loop, which only returns when the application should quit.</p>
<p>Since this wasn't a very good fit for the emscripten platform (because of this "infinite" loop inside the Run() method), first step was to make the app entry even more abstract to give the platform-specific code more wiggle room:</p>
<pre><code>#include "myapplication.h"
ImplementNebulaApplication();
void
NebulaMain(const Util::CommandLineArgs& args)
{
static MyApplication* app = new MyApplication();
app->SetCommandLineArgs(args);
app->StartMainLoop();
}
</code></pre>
<p>The most obvious change is that there's only a single StartMainLoop() method instead of the Open->Run->Close->Exit sequence. And at closer inspection some strange stuff is going on here: The application object is now created on the heap, the pointer to the object lives in the global scope, and the app object is never deleted. WTF?!?</p>
<p>To understand what's going on we need to dive a bit deeper into the emscripten system API.</p>
<p>The StartMainLoop function is actually only a one-liner on the emscripten platform:</p>
<pre><code>emscripten_set_main_loop(OnPhasedFrame, 0, 0);
</code></pre>
<p>This sets the per-frame callback (called OnPhasedFrame) which the browser runtime will call regularly, and we'll have to do <strong>everything</strong> inside this callback function. The first 0 argument is the intended callback frequency per second (e.g. 60); 0 has a special meaning: in this case emscripten uses the modern requestAnimationFrame mechanism to call our per-frame function (instead of the old-school setInterval or setTimeout way). The second argument is called simulateInfiniteLoop, and to understand what it does it is first necessary to understand what happens when it is <em>not</em> used:</p>
<p>The emscripten_set_main_loop() function will simply return, all the way up to main(), which will also return right after it has started! WTF indeed...</p>
<p>In a normal C program, returning from the main() function means that the program is shutting down of course. Local-scope objects will be destroyed before leaving main(), then global-scope objects (static initialisers).</p>
<p>In emscripten's case, a program which has called emscripten_set_main_loop() continues to run after main() has returned. This is a bit of a strange design decision, but makes for familiar looking code (e.g. hello_world.cpp is the same as on any other platform). Objects in the global scope <strong>will continue to exist</strong> in emscripten after main() returns, but objects in the local scope of main() will be destroyed, thus this strange way to create our application object, to prevent the app object from being destroyed after main() is left:</p>
<pre><code> static MyApplication* app = new MyApplication();
</code></pre>
<p>And now back to that <strong>simulate_infinite_loop</strong> argument: This is a new argument which was introduced after I started the Nebula3 emscripten port. Setting this argument to 1 will cause the emscripten_set_main_loop() function to not return to the caller, instead a Javascript exception will be thrown which essentially means that execution bails out of the C/C++ code without unwinding the (C/C++) stack, thus leaving local-scope objects of the main() function alive, everything after emscripten_set_main_loop() will never be called. So with this fix we could just as well write:</p>
<pre><code>void
NebulaMain(const Util::CommandLineArgs& args)
{
MyApplication app;
app.SetCommandLineArgs(args);
app.StartMainLoop();
}
</code></pre>
<p>Which looks a lot more friendly indeed.</p>
<p>So this basically covered emscripten's application startup process. We now have a per-frame function (called OnPhasedFrame) which will be called back at 60 fps, and we just need to cram everything the application has to do into these 1/60 sec time slices. This is fine for the actual game loop after everything has been loaded and initialised, but it can be a problem for stuff like loading a new level, which could take a couple of seconds. In a traditional game, the worst thing that could happen in this case is that the loading screen animation (if there is any) may stutter, but in a browser environment such pauses affect the entire browser tab (freezing, no scrolling, etc...), which makes a very bad first impression on the user.</p>
<p>So what to do? For Nebula3 I created a new Application base class called "PhasedApplication". Such a phased application goes through different life time phases (== states), such as:</p>
<pre><code>Initial -> app has just become alive
Preloading -> currently preloading data
Opening -> currently initializing
Running -> currently running the game loop
Closing -> currently shutting down
Quit -> shutting down has finished
</code></pre>
<p>Each of these phases (or states) has an associated per-frame callback method (OnInitial, OnPreloading, OnOpening, etc...). The central per-frame callback will simply call into one of those methods based on the current phase/state. Each phase method invocation must return quickly (the browser's responsiveness depends on this), and may be called many times until the next phase is activated. So instead of doing a lot of stuff in a single frame, we do many small things across many frames.</p>
<p>The best example to illustrate this is the OnOpening() method. Suppose we need to do a lot of initialisation work during the app's Opening phase: files need to be loaded, subsystems must be initialised, and so on. This may take a couple of seconds. But the rule is that we should ideally return within 1/60 sec, and we also don't have an independent render thread which could hide the main-thread freeze behind a smooth loading animation. So we need to do just a little bit of initialisation work, possibly update the rendering of the loading screen, and return to the browser runtime. But since we haven't switched to the next state yet, OnOpening() will be called back again, and we can do the next piece of initialisation work. This sounds awkward of course, and it is, but there's not a lot we can do about it.</p>
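<p>A toy model of this incremental OnOpening() pattern might look like this (class and member names are illustrative, not the actual Nebula3 PhasedApplication interface):</p>

```cpp
// Hypothetical sketch of a phased application: the central per-frame
// callback dispatches on the current phase, and each phase method does
// only a small slice of work before returning to the browser.
enum class Phase { Initial, Opening, Running };

class PhasedApp {
public:
    void OnFrame() {
        switch (phase) {
            case Phase::Initial: phase = Phase::Opening; break;
            case Phase::Opening: OnOpening(); break;
            case Phase::Running: ++frames; break;  // the actual game loop
        }
    }
    Phase CurrentPhase() const { return phase; }
    int FramesRun() const { return frames; }
private:
    void OnOpening() {
        // do ONE slice of initialisation per call; switch to Running
        // once everything is done (browser stays responsive inbetween)
        if (++initStepsDone >= initStepsTotal) {
            phase = Phase::Running;
        }
    }
    Phase phase = Phase::Initial;
    int initStepsDone = 0;
    static const int initStepsTotal = 5;  // e.g. 5 chunks of init work
    int frames = 0;
};
```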
<p>A new Javascript concept called <strong>generators</strong> could help clean up this mess: with generators it should be possible to chop a long sequence of actions into small slices while leaving the function context intact (essentially like a yield() function in a cooperative multithreading system), catapulting Javascript into the illustrious company of Windows 1.x and Classic MacOS. But enough with the ranting ;)</p>
<p>A somewhat cleaner method for long initialisation work is to start asynchronous actions through WebWorker jobs in the first call to OnOpening(), and during subsequent OnOpening() calls check whether all of those actions have finished, gather the results, and finally switch to the next state, which would be <em>Running</em>. In the worst case, initialisation code must literally be chopped into little slices running on the main thread.</p>
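<p>The start-async-then-poll variant can be sketched like this (again with invented names, and with the async jobs simulated instead of being real WebWorker requests):</p>

```cpp
#include <array>

// Hypothetical sketch: kick off async work in the first OnOpening()
// call, then poll for completion in the following per-frame calls.
class AsyncInit {
public:
    // called once per frame from OnOpening(); returns true when all
    // async jobs have finished and the app may switch to Running
    bool PollOpening() {
        if (!started) {
            started = true;   // first call: start the async jobs here
            return false;     // ...and return to the browser right away
        }
        // later calls: check the completion flags; in the real thing
        // these would be set by async completion callbacks, here we
        // simulate one job finishing per frame
        for (bool& done : jobsDone) {
            if (!done) {
                done = true;
                return false;
            }
        }
        return true;          // everything loaded
    }
private:
    bool started = false;
    std::array<bool, 3> jobsDone{};  // e.g. textures, meshes, shaders
};
```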
<p>So that's it for this blog post. Originally I wanted to compare emscripten's and PNaCl's startup process, but this would be way too much text for a single post, so next up will very likely be a similar walk-through of the PNaCl application start, and after that the next big topic: how to handle asset loading.</p>
<blockquote>
<p>Written with <a href="http://benweet.github.io/stackedit/">StackEdit</a>.</p>
</blockquote>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-87502643851474524042013-08-26T22:32:00.001+01:002013-08-26T22:32:24.508+01:00emscripten and PNaCl: Build Systems<p>I recently ported Nebula3 to Google's PNaCl. Main motivation was that I wanted to see how it compares to asm.js both for performance and "ease of use". This was basically a drive-by port, I didn't want to put too much effort into it. Thankfully I had old NaCl code lying around which I could reuse and after 2 or 3 afternoons (and some WTF-moments) I had a pretty clean port running which I'm planning to keep updated into the foreseeable future.</p>
<p>The big news about PNaCl is that deployment no longer has to go through the Chrome Web Store, instead it is now finally possible to host PNaCl applications from any URL.</p>
<p>You can check out the Nebula3 PNaCl demos here: <a href="http://www.flohofwoe.net/demos.html">http://www.flohofwoe.net/demos.html</a>. Just make sure you're running the latest Google Chrome Canary, and if an error pops up saying that PNaCl isn't enabled, just restart Chrome and wait a little bit. The first start can take up to one minute, since PNaCl support is installed on demand, which is a multi-MByte download.</p>
<p>Over the next few weeks I'm intending to write up a little series of blog posts comparing the PNaCl and emscripten Nebula3 ports. From a coder's perspective, the two systems are actually fairly close when seen from high above.</p>
<p>As a "pragmatic programmer", I don't really care about the political side. Both asm.js and PNaCl had to take a lot of flak from web purists. The only thing that counts to me is that both technologies provide a seamless software distribution channel directly from the coder to the user. No app shops, gate-keepers, code-signing-certificates or approval processes inbetween.</p>
<h4 class="wmd-title" id="the-build-system">The Build System</h4>
<p>First step is of course to get the SDKs. Both emscripten and PNaCl offer a GCC-style cross-compiling toolchain based on Clang-LLVM. Quick disclaimer: I'm running on OSX, haven't looked at the Windows side of things yet. </p>
<p>The <strong>emscripten SDK</strong> is simply installed and updated through a github repository. There's a stable <strong>master</strong> branch, and a bleeding-edge <strong>incoming</strong> branch. emscripten requires a couple of external tools, most notably Clang-LLVM, python and node.js. Even though clang is the standard compiler on OSX, I installed a separate version because emscripten required a newer version than the one that ships with OSX 10.7. Paths to external tools must be provided through a <strong>.emscripten</strong> config file in your home dir.</p>
<p>The NaCl SDK is a normal download-archive which should be unzipped to a nacl_sdk directory in your home directory. This download only contains a script file called "naclsdk" which takes care of downloading and updating the actual SDK files in the future. The NaCl SDK contains versioned bundles, each of which is actually a complete SDK in itself, with tools, headers, libraries and examples. This is the same philosophy as the DirectX SDKs. You pick a version to work with and decide yourself when to switch to a newer version, this guarantees you a stable API, and gives the dev team the freedom to change APIs in new versions without breaking code compiled against older versions.</p>
<p>One challenge about the NaCl SDK is finding the right compiler tools and runtime libs, since there are so many choices. The "classic" CPU-specific NaCl had different toolchains for the ARM and Intel CPU architectures, and two different C runtime libs to choose from: newlib or glibc.</p>
<p>PNaCl is much simpler though: there are no longer different target CPU architectures, since PNaCl executables are essentially LLVM bitcode, and the only available C runtime lib is newlib (which is the better choice anyway, since it is much slimmer than glibc).</p>
<p>In Nebula3 I'm using <strong>cmake</strong> to generate build files for different target platforms and build systems / IDEs. For each platform, you write a so-called <strong>toolchain file</strong> which contains the paths to the cross-compiling tools, search paths for headers and libraries, and compiler/linker settings.</p>
<p>Writing such a toolchain file can involve a bit of guesswork, but there are examples floating around the net; emscripten also comes with sample cmake toolchain files which might be helpful as a starting point.</p>
<p>Here are a couple of tips which might save you some trouble:</p>
<ul>
<li>don't set "ld" as the linker tool; in both toolchains the normal compiler tool also serves as linker (in emscripten this is <strong>emcc</strong>, in PNaCl use <strong>pnacl-clang++</strong>)</li>
<li>PNaCl requires an additional post-build step after linking, called pnacl-finalize; cmake has the <strong>add_custom_command</strong> macro for this</li>
</ul>
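<p>Following these tips, a minimal emscripten toolchain file could look roughly like this. This is only a sketch, not a drop-in file: the paths, the EMSCRIPTEN_ROOT variable and the AsmJS flag set are assumptions and depend on your SDK installation:</p>

```cmake
# hypothetical emscripten.toolchain.cmake sketch - adjust paths to your setup
set(CMAKE_SYSTEM_NAME Generic)

set(EMSCRIPTEN_ROOT "$ENV{HOME}/emscripten" CACHE PATH "emscripten SDK root")

# the compiler driver also acts as the linker (tip 1: don't set "ld")
set(CMAKE_C_COMPILER   "${EMSCRIPTEN_ROOT}/emcc")
set(CMAKE_CXX_COMPILER "${EMSCRIPTEN_ROOT}/em++")
set(CMAKE_AR           "${EMSCRIPTEN_ROOT}/emar" CACHE FILEPATH "archiver")
set(CMAKE_RANLIB       "${EMSCRIPTEN_ROOT}/emranlib" CACHE FILEPATH "ranlib")

# compiler/linker settings for the custom AsmJS build config
set(CMAKE_CXX_FLAGS_ASMJS "-O2")
set(CMAKE_EXE_LINKER_FLAGS_ASMJS "-s ASM_JS=1")
```

A PNaCl toolchain file would look similar, with pnacl-clang++ as compiler/linker and the pnacl-finalize step added via add_custom_command.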
<p>To properly separate the different build files I have a directory structure like this:</p>
<pre><code>nebula3/
    code/
    cmake/
        emscripten_asmjs/
        emscripten_debug/
        pnacl_release/
        pnacl_debug/
</code></pre>
<p>All the source code lives under /code, and all the build files are generated under cmake/ with one directory per target platform and build configuration.</p>
<p>To actually generate the build files, I have a couple of shell scripts under /code which invoke cmake like this:</p>
<pre><code>cd ../cmake/emscripten_asmjs
cmake -G "Eclipse CDT4 - Ninja" -DCMAKE_BUILD_TYPE="AsmJS" -DNEBULA_PLATFORM=EMSCRIPTEN -DCMAKE_TOOLCHAIN_FILE="../../bin/emscripten.toolchain.cmake" ../../code
</code></pre>
<p>The <strong>-G</strong> option is the cmake "generator"; we're telling cmake here that we want Eclipse project files using the ninja build tool (ninja is a more modern make alternative). <strong>-DCMAKE_BUILD_TYPE</strong> sets the AsmJS build config (cmake lets us define any number of custom configs, commonly just Release and Debug, but for emscripten I have defined an extra AsmJS config). <strong>-DNEBULA_PLATFORM=EMSCRIPTEN</strong> is one of our own custom symbol definitions; this simply tells our cmake files that we're building for the emscripten target platform (actually this is redundant, a better place for this definition would be the toolchain file). Next we tell cmake which toolchain file to use, and finally where the source code is located (or more specifically: where to find the root CMakeLists.txt file - CMakeLists.txt files tell cmake what targets to build, and from what sources).</p>
<p>When cmake has run, we could import the generated project into Eclipse, or we can just run ninja from the command line:</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6RtGJIQGn6TMTficyviplfAyy7Wdekqpf3UWy7J6r_9cI8AL7ANuXmumkZ4OHrN0pC21rbj4CXlAroCW-knM7BtABdIJ1bFyDLWhyphenhyphenXFogingHJ-N7zlh8D4HlbqoWnmBmvA3OY0Y0lfT7/s0/Screen+Shot+2013-08-26+at+10.34.10+PM.png" alt="ninja invocation" title="ninja.png"></p>
<p>Writing a proper cmake based build environment can be a lot of work, but it is definitely worth it. Managing a multi-platform build environment across Linux, OSX and Windows and probably several game consoles, spanning different IDEs like Visual Studio, Xcode and Eclipse would be a nightmare without a meta-build-tool like cmake.</p>
<h3 class="wmd-title" id="deployment">Deployment</h3>
<p>Big jump here, but no worries, I'll deal with all the in-between stuff in the following blog posts.</p>
<p>The common thing between emscripten and PNaCl when deploying is that the generated files are embedded into a web page, and thus can be easily integrated into existing web site build- and deployment-processes.</p>
<p>The details are a little bit different between the two though:</p>
<p>An emscripten "executable" is either a .js file or a complete HTML page (the so-called shell page) which embeds the generated Javascript code. The emscripten linker looks at the output file extension to decide whether it should generate a .js or .html file. Emscripten comes with a default html shell file which should be used as a starting point for a customised web page.</p>
<p>Integrating emscripten generated code into a web page is just the same as integrating any piece of complex Javascript code. Since emscripten-generated code is just Javascript, it is also very easy to interact with the rest of the page through direct JS function calls.</p>
<p>PNaCl on the other hand integrates like a plugin into the HTML page using the <strong>embed</strong> element:</p>
<pre><code><embed src="dragons.nmf" class="pnacl" id="pnacl_module" name="pnacl_module" width="800" height="452" type="application/x-pnacl"/>
</code></pre>
<p>Instead of the .pexe file, a .nmf <strong>manifest</strong> file is given to the embed element which contains the name of the .pexe file (this manifest file used to look more interesting in classic NaCl since it contained one entry for each target cpu architecture, but for PNaCl there's only one useful piece of information):</p>
<pre><code>{
"program": {
"portable": {
"pnacl-translate": {
"url": "dragons.pexe"
}
}
}
}
</code></pre>
<p>Finally, the <strong>type="application/x-pnacl"</strong> attribute is important for Chrome to recognise the embed element as a PNaCl application.</p>
<p>Interaction between a PNaCl application and the surrounding web page works through the Javascript messaging system. To get events from the PNaCl application, just add event listeners to the embed element:</p>
<pre><code><script type="text/javascript">
// ...
var naclModule = document.getElementById("pnacl_module");
naclModule.addEventListener('loadstart', handleLoadStart, true);
naclModule.addEventListener('progress', handleProgress, true);
naclModule.addEventListener('load', handleLoad, true);
naclModule.addEventListener('error', handleError, true);
naclModule.addEventListener('crash', handleCrash, true);
naclModule.addEventListener('message', handleMessage, true);
// ...
</script>
</code></pre>
<p>The other way around works as well, by sending messages to the PNaCl app through postMessage.</p><h3 class="wmd-title" id="the-end">The End</h3>
<p>Ok, that's it. Next up I'll go through the changes to the Nebula3 Application Model which were necessary for the web platforms!</p>
<blockquote>
<p>Written with <a href="http://benweet.github.io/stackedit/">StackEdit</a>.</p>
</blockquote>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-14362648375385757502013-07-06T21:00:00.001+01:002013-07-06T21:00:51.714+01:00Entity-Component-System Revisited<p>This old blog post about the <a href="http://flohofwoe.blogspot.de/2007/11/nebula3s-application-layer-provides.html">Nebula3 Application Layer</a> is the third-most-popular post on my blog, very likely because it was linked from <a href="http://stackoverflow.com/questions/1189236/data-structures-for-message-passing-within-a-program">Stack Overflow</a>. I always wanted to write a follow-up to this post, because if I designed such a system again, it would look quite different today.</p>
<p>First a quick recap of the original system:</p>
<ul>
<li>the original system consists of the following classes:
<ul><li><strong>Entity:</strong> a container for Properties and Attributes, can receive Messages which are distributed to its Properties</li>
<li><strong>Property:</strong> attached to an Entity, implements some part of the entity's "game logic", receives and processes messages</li>
<li><strong>Message:</strong> a small object which is sent to an Entity and distributed to Properties which may handle them</li>
<li><strong>Attribute:</strong> key/value pairs attached to entities</li>
<li><strong>Manager:</strong> singletons which implement global game logic</li>
<li>the only pre-defined Manager is the EntityManager which is a container for Entities, and allows to query for entities</li>
<li>Entities and Properties have several per-frame callbacks and are called back by the EntityManager</li></ul></li>
<li>the motivation behind this system:
<ul><li>to have a simple, extensible high-level framework for <strong>implementing game-play logic</strong></li>
<li>fix extension-through-inheritance problems through <strong>composition</strong></li></ul></li>
<li>and the problems of the original system:
<ul><li><strong>poor spatial locality:</strong> Entities, Properties and Messages are isolated heap objects and can be spread all over the address space in the worst case</li>
<li><strong>high cost for creation and destruction:</strong> all objects are dynamically allocated, this is especially a problem for Messages, there may be thousands of Messages created and destroyed per frame</li>
<li><strong>high cost for setting/getting Attributes:</strong> setting or getting an attribute value involves an O(log2 n) lookup</li>
<li><strong>high overhead for on-frame callbacks</strong>: the EntityManager calls several callbacks every frame on each entity, with many entities the call-overhead is non-trivial</li>
<li><strong>reliance on virtual methods</strong>: almost all public methods in properties are virtual, because the message handler and callback methods are implemented in a Property base class, with specialised properties as subclasses</li></ul></li>
</ul>
<p>In the old single-player Drakensang games we had up to two-thousand game entities in some bigger maps, and we ran into real performance problems because the entity system is so heavy-weight.</p>
<p>So here's how I would implement a similar system today, keep in mind that this is just a "<a href="http://en.wikipedia.org/wiki/Gedankenexperiment">Gedankenexperiment</a>", and I will make up some stuff while I type (but most of it has been lingering in the back of my head for quite a while now). </p>
<p>The main goals are to improve performance by making the system less dynamic, reduce memory fragmentation and reduce message-passing and object creation overhead.</p>
<p>Here we go:</p>
<h4 id="1-move-all-the-interesting-code-into-separate-subsystems">1. Move all the interesting code into separate subsystems</h4>
<p>In the original entity system, Managers and Properties would often implement actual game logic, and could become big, complicated and unwieldy.</p>
<p>The new entity system would only be minimal glue code between (ideally autonomous) subsystems, each with a Facade singleton as its main public interface. Such subsystems could be: rendering, AI, physics, audio, and also <strong>anything else that makes up the game</strong>. The last point is important: even when already using such autonomous subsystems for low-level stuff like rendering or audio, it is tempting to write the actual game logic "along the way" inside Properties without separating it into additional "game logic subsystems", which is guaranteed to soon end in an unmaintainable mess.</p>
<p>Ideally, each of the autonomous subsystems can live (and be tested) on its own, and will not interact with other subsystems (the physics world must not know about the rendering world or the audio world and so on).</p>
<p>One of the main jobs of the entity-component-system is to control and coordinate the data flow between those autonomous subsystems; it glues the subsystems together (e.g. getting the desired motion from the AI/navigation system into the physics system, and getting position updates from the physics system into the rendering system).</p>
<p>The other job is to provide different types of game objects (for instance different unit types in a strategy games) by combining small, reusable Component objects which implement different aspects/behaviours of the game logic.</p>
<p>The important thing to keep in mind is that all the classes of the new entity system will only provide a slim layer of glue between subsystems which contain all the meaty stuff.</p>
<h4 id="whats-in-the-new-entity-system">What's in the new entity system</h4>
<p>Properties will now be called Components, but their role will be the same. Managers and Attributes will go away (reasons are detailed below). Entities and Messages will keep their names and roles. </p>
<h4 id="fixing-the-spatial-locality-and-cost-of-creation">Fixing the Spatial Locality and Cost of Creation</h4>
<p>Entities and Components would be created from pre-allocated object pools. Live Entities and Components would ideally be located next to each other without big memory holes in between. As the public handle to an Entity I would probably use an EntityID instead of a (smart) pointer; the EntityID would be a 32-bit integer, with some bits used as an index into the entity pool, and some bits as a unique wrap-around counter to prevent an old Id from pointing to a recycled object in the pool.</p>
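<p>Such a handle could be sketched like this. The 20/12 bit split and all names here are made up for illustration, not the actual Nebula3 layout:</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical EntityId layout: low 20 bits index into the entity pool,
// high 12 bits hold a wrap-around unique counter.
typedef uint32_t EntityId;

enum {
    EntityIndexBits = 20,
    EntityIndexMask = (1 << EntityIndexBits) - 1    // 0x000FFFFF
};

inline EntityId MakeEntityId(uint32_t poolIndex, uint32_t uniqueCount) {
    assert(poolIndex <= (uint32_t) EntityIndexMask);
    return (uniqueCount << EntityIndexBits) | poolIndex;
}
inline uint32_t EntityPoolIndex(EntityId id)  { return id & EntityIndexMask; }
inline uint32_t EntityUniqueCount(EntityId id) { return id >> EntityIndexBits; }
```

Resolving an id then means: look up slot EntityPoolIndex(id) in the pool, and check that the unique counter stored in that slot still matches EntityUniqueCount(id); if it doesn't, the slot has been recycled and the id is stale.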
<h4 id="entities-and-components">Entities and Components</h4>
<p>An Entity would be a template class which must be partly implemented by the game programmer, tailored to his project. The max number of Components the Entity can hold is a template parameter. There's a private C array of raw pointers to Components contained inside the Entity class, and programmer-provided template methods to gain safe access to those Component objects.</p>
<p>An example: let's say the components-access template method would be called Component(), then invoking a method "SetTransform()" on a component "Location" would look like this:</p>
<pre><code>entity->Component<Location>()->SetTransform(m);
</code></pre>
<p>Hmm, this looks mighty ugly though... The advantage is that the Component<> method will resolve to a simple inlined pointer indirection, which is as cheap as it gets. But I will have to think of some nicer looking code...</p>
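<p>A minimal sketch of how such an Entity template could work. All names, and the compile-time Slot index mechanism used to map a component type to an array element, are assumptions for illustration, not the actual Nebula3 code:</p>

```cpp
#include <cassert>

class ComponentBase {
public:
    virtual ~ComponentBase() {}
};

template<int MAXCOMPONENTS> class Entity {
public:
    Entity() {
        for (int i = 0; i < MAXCOMPONENTS; i++) {
            this->comps[i] = 0;
        }
    }
    // register a component under its compile-time slot index
    template<class TYPE> void AddComponent(TYPE* c) {
        this->comps[TYPE::Slot] = c;
    }
    // resolves to a simple inlined pointer indirection
    template<class TYPE> TYPE* Component() const {
        assert(this->comps[TYPE::Slot] != 0);
        return static_cast<TYPE*>(this->comps[TYPE::Slot]);
    }
private:
    ComponentBase* comps[MAXCOMPONENTS];
};

// example component with a made-up compile-time slot index
class Location : public ComponentBase {
public:
    enum { Slot = 0 };
    Location() : transform(0.0f) {}
    void SetTransform(float m) { this->transform = m; }
    float GetTransform() const { return this->transform; }
private:
    float transform;    // stand-in for a real matrix
};
```

The Component&lt;Location&gt;() call compiles down to an array access plus a static_cast, both of which the compiler can inline away completely.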
<h4 id="attributes">Attributes</h4>
<p>Attributes will very likely go away completely because the cost for setting/getting is too high (this involved a binary search). Instead, entity state will be exposed through simple inline getter methods in Component classes. There are no setter methods, because direct, unchecked manipulation of internal entity state by an "outsider" would be too dangerous. Manipulating an entity is exclusively done by sending messages to the entity.</p>
<p>There must still be a more dynamic, generalised way to initialise and manipulate an entity (this was a nice side-effect of the general attribute-system), for instance to implement persistence or communicate with remote applications (like a level editor). For this, some general serialisation mechanism to and from a simple binary stream must be implemented.</p>
<h4 id="the-entity-registry">The Entity Registry</h4>
<p>This would be a singleton used as factory and container of entities (basically the facade of the entity system). It would allow creation of entities, resolve an EntityID into a pointer, probably look up entities by name (if having human-readable entity names makes sense at all), and send messages to entities. This would be similar to the old EntityManager, but it would not call any per-frame methods on entities (it would be desirable if the new entity system didn't need any type of per-frame tick at all).</p>
<h4 id="components-and-messages">Components and Messages</h4>
<p>Sending a message to an entity should not involve creating a message object; instead, a message is just a simple, short-lived stream of plain-old-data bytes in some hidden memory buffer. There will be a unique message type identifier, which is a simple 32-bit integer value (or maybe an enum) at the front of the byte stream.</p>
<p>Messages are processed by Component objects, which can subscribe to specific message types at the central EntityRegistry by associating a message type with a handler method:</p>
<pre><code>entityRegistry->Subscribe(msgType, componentType, methodPtr);
</code></pre>
<p>A message is sent to one or more entities through the central EntityRegistry by calling one of several "PushMsg" template methods which accept a variable number of arguments. Each combination of arg types will resolve to a template specialisation under the hood. The advantage, again, is that none of this involves expensive "dynamic" code; each specific message signature will resolve to a piece of code which is very likely inlined and just consists of writing values to memory:</p>
<pre><code>entityRegistry->PushMsg(entityId, msgType, arg0, ...);
</code></pre>
<p>This will write the args to an internal memory area (with proper alignment), and call the handler methods of the subscribers, which will be provided with some sort of pointer to the start of the arguments, read/decode the arguments and perform some action with them. The disadvantage here is that there's no type-safety for the message arguments. If the caller and handlers don't agree about the order and types of the arguments, bad things will happen at run time, so it might still be better to use simple message classes instead of multiple typed arguments:</p>
<pre><code>MyMsg msg(x, y, z);
entityRegistry->PushMsg(entityId, msg);
</code></pre>
<p>This would have the overhead of an extra object created on the stack (still better than on the heap), and would involve defining dozens or hundreds of message classes which would only consist of setters and getters; this should be a job for a code generator (we have something similar already called NIDL files, which are used to generate C++ message classes from a simple XML description). The advantage is type-safety and automatic agreement between sender and handler about the message arguments, plus the message class constructor can set up default argument values.</p>
<p>The default PushMsg() method will probably call the subscribers immediately. It might be desirable to also have deferred message handling, where the sender defines a time in the future when the message should be handled. It might also be possible to use this mechanism to send messages between remote objects across threads, processes and physical machines, but this might go a bit too far.</p>
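<p>To make the "args as POD bytes in a scratch buffer" idea more concrete, here is a greatly simplified sketch. All names are made up, subscribers are reduced to plain function pointers instead of component methods, and STL containers are used purely for brevity:</p>

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <vector>

typedef unsigned int MsgType;
typedef unsigned int EntityId;
typedef void (*MsgHandler)(EntityId entity, const void* args);

class EntityRegistry {
public:
    // associate a message type with a handler (simplified to a function ptr)
    void Subscribe(MsgType msgType, MsgHandler handler) {
        this->subscribers[msgType].push_back(handler);
    }
    // one overload per arg signature; a real system would generate these
    template<class ARG0, class ARG1>
    void PushMsg(EntityId entity, MsgType msgType, ARG0 a0, ARG1 a1) {
        // write the args tightly packed into a scratch buffer...
        unsigned char* ptr = this->scratch;
        std::memcpy(ptr, &a0, sizeof(a0)); ptr += sizeof(a0);
        std::memcpy(ptr, &a1, sizeof(a1));
        // ...and call the subscribed handlers immediately
        std::vector<MsgHandler>& handlers = this->subscribers[msgType];
        for (size_t i = 0; i < handlers.size(); i++) {
            handlers[i](entity, this->scratch);
        }
    }
private:
    std::map<MsgType, std::vector<MsgHandler> > subscribers;
    unsigned char scratch[256];
};

// example handler: decode two packed floats (sender and handler must agree!)
static float g_lastSum = 0.0f;
static void HandleMove(EntityId entity, const void* args) {
    float x, y;
    std::memcpy(&x, args, sizeof(float));
    std::memcpy(&y, static_cast<const unsigned char*>(args) + sizeof(float),
                sizeof(float));
    g_lastSum = x + y;
}
```

Note how the lack of type-safety shows up right in the handler: it simply assumes two floats at fixed offsets, which is exactly the caller/handler agreement problem described above.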
<h4 id="what-about-the-managers">What about the Managers?</h4>
<p>Managers don't really have a place in the new entity-system. Their role is taken over by the Facade singletons of the autonomous subsystems.</p>
<h4 id="conclusion">Conclusion</h4>
<p>I think the original ideas behind the Nebula3 Application Layer as a flexible Entity-Component-System still make a lot of sense for a high level game framework, but today I look at the original implementation as too "heavy-weight" both in design and implementation. If I were to rewrite the system (and I'm tempted, but other stuff has higher priority) I would start as described here. What the end-result would look like is on another page, I tend to restart such systems from scratch several times if the code "doesn't look right" :)</p>
<blockquote>
<p>Written with <a href="http://benweet.github.io/stackedit/">StackEdit</a>.</p>
</blockquote>Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-11987048016345464552013-06-21T22:34:00.001+01:002013-06-23T12:46:31.016+01:00Sane C++<strong>TL;DR</strong>: An attempt to outline the 'good parts' of C++ from my experience of porting Nebula3 to various platforms over the years. Some of it controversial.<br />
<br />
<b>Update: </b><span style="font-weight: normal;">some explanation why STL and C++11 are currently "forbidden", see below!</span>
<br />
<div>
<span style="font-weight: normal;"><br /></span></div>
<b>C++</b><br />
<br />
...is relatively famous for how easy it is to shoot yourself in the foot in many interesting ways. The types of bugs which are simply impossible in other languages are legion.<br />
So then, why is C++ so damn popular in game development? One of the most important reasons (IMHO) is that C++ allows you to write very high-level and very low-level code. If needed, you can have full control over the memory layout, when and how dynamic memory is allocated and freed, and how exactly memory is accessed. At the same time you can write very clean and high-level code with the right framework and not care about memory management at all.<br />
Especially the significance of low-level programming, e.g. controlling the exact memory layout of your data, is often ignored by other, higher-level languages, even though it can have a dramatic effect on performance.<br />
One of the most common C++ newbie errors is to tackle a big software project without a proper high-level "toolbox". C++ doesn't come with a luxurious standard framework like all those fancy-pancy modern languages. <br />
And with only hello_world.cpp under their belt newbies quickly end up with this typical mess of object ownership problems, spaghetti-inheritance, seg-faults, memory leaks and lots of redundant code all over the place after just a few ten-thousand lines of code.<br />
On the other hand, it is incredibly easy to write really slow code in a high-level environment since you don't really know (or need to care) what's going on under all those layers of convenience. <br />
The most important rule when diving into C++ is: Know when to write high-level and when to write low-level code, these are completely different beasts!<br />
So what's the difference between high-level and low-level C++ code? I think there's no clear-cut separation line, but a good rule of thumb is: if it needs to run a few thousand times per frame, it better be really well optimised low-level code!<br />
<ul>
<li>If you look at a typical rendering pipeline, there's this typical cascade where every stage in the pipeline is executed at least an order of magnitude more often than the previous one: outer-most there's stuff that happens only once per frame, next code is executed once per graphics object, then once per bone/joint, then per vertex, and finally per pixel. The realm of low-level code starts somewhere between per-object and per-bone (IMHO).</li>
<li>Typical high-level code to me is "game play logic". This is also where thinking object-oriented still makes the most sense (as opposed to a more data-oriented approach). You have a couple of "game objects" which need to interact with each other in fairly complex ways. On this level you don't want to think about object ownership or memory layout, and high-level concepts like events, delegates, properties etc... start to make sense. Shit starts to hit the fan when you have thousands of such game objects.</li>
<li>It is of course desirable to get the performance advantages of low-level code combined with the simplicity and convenience of high-level code. This is basically the holy grail of games programming. Hiding complex or complicated code under simple interfaces is a good start.</li>
</ul>
Ok, so before I drift completely into the metaphysical, here's a simple check-list:<br />
<h4 id="forbidden-c">
Forbidden C++:</h4>
This stuff is completely forbidden in our coding-style:<br />
<ul>
<li>exceptions</li>
<li>RTTI</li>
<li>STL</li>
<li>multiple inheritance</li>
<li>iostream</li>
<li>C++11</li>
</ul>
That's right, we're not using C++ exceptions, RTTI, multiple inheritance or the STL. C++11 is pretty cool, but still too fresh. Most of these restrictions will make your multiplatform-life a lot easier (and not much of importance is lost IMHO). <br />
<br />
<b>Update:</b> I should have explained why the STL and C++11 are on this list. First the STL: historically, the STL came with a lot of problems because quality differed a lot between compilers, porting to non-PC platforms was difficult if your code depended on the STL, and I am reluctant to pull more complex dependencies into the engine (like boost, for example). Today's STL implementations are much better, so on most platforms this is probably no longer an issue.<br />
<br />
Personally, I think the STL is an ugly library, <i>at least</i> the container classes. You have to admire its orthogonality and flexibility, but in reality one project only ever needs 3 or 4 specialisations. What we did was write a handful of container classes (Array, Dictionary, Queue, Stack, List) in the spirit of C#'s container classes (those are probably not as flexible as STL containers, but they do look nicer, and the generated code should be the same in most cases). Beautiful-looking source code is important, I think. This may all change with C++11 though. C++11 is extremely cool, but I think it is still too early to jump on it if we need to cover a lot of platforms. But C++11 together with the STL is much more powerful than either of the two alone, so I will very likely revert my stance on the STL once we switch to C++11.<br />
<br />
But I think this switch should be done throughout the entire engine (starting at the core with the new move semantics which are really useful for containers, through the new threading support, lambdas, function objects and so on), so switching to C++11 will involve a major rewrite of Nebula3, maybe even justify a major version number bump. I think it doesn't make sense to sprinkle bits and pieces of C++11 and the STL here and there into the code.
<h4 id="tolerated-c">
Tolerated C++:</h4>
Use with care, don't go crazy:<br />
<ul>
<li>templates</li>
<li>operator overloading</li>
<li>new/delete</li>
<li>virtual methods</li>
</ul>
<strong>Templates</strong> are very powerful, they can make your code both more readable, AND faster because more type information is known at compile time. But you really need to keep an eye on the generated code size. Don't nest them too deeply, and keep it simple.<br />
<strong>Operator overloading</strong> is restricted to very few places (containers and items in containers). We're NOT having operator overloading in our math library. dot(vec,vec) is much more readable than vec*vec.<br />
<strong>Not using new/delete</strong> in C++ code sounds a bit crazy, I know. But most of the time where you need to create an object on the heap you'll also want to hand its pointer somewhere else, which quickly introduces ownership problems. That's why we're using smart pointers to heap objects which hide the delete call. And since a new without its delete looks a bit silly, we're also hiding the new behind a static Create() method. It's better to avoid heap objects altogether though, especially in low-level code.<br />
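The smart-pointer-plus-Create() pattern can be sketched like this. This is a simplified illustration of the idea, not the actual Nebula3 Ptr/RefCounted code (it omits assignment from raw pointers, comparison operators and so on):<br />

```cpp
#include <cassert>

// refcounted base class: delete is hidden inside Release()
class RefCounted {
public:
    RefCounted() : refCount(0) {}
    virtual ~RefCounted() {}
    void AddRef() { ++this->refCount; }
    void Release() { if (0 == --this->refCount) delete this; }
    int GetRefCount() const { return this->refCount; }
private:
    int refCount;
};

// intrusive smart pointer which manages the refcount
template<class TYPE> class Ptr {
public:
    Ptr() : ptr(0) {}
    Ptr(TYPE* p) : ptr(p) { if (this->ptr) this->ptr->AddRef(); }
    Ptr(const Ptr& rhs) : ptr(rhs.ptr) { if (this->ptr) this->ptr->AddRef(); }
    ~Ptr() { if (this->ptr) this->ptr->Release(); }
    Ptr& operator=(const Ptr& rhs) {
        if (rhs.ptr) rhs.ptr->AddRef();   // addref first: safe for self-assignment
        if (this->ptr) this->ptr->Release();
        this->ptr = rhs.ptr;
        return *this;
    }
    TYPE* operator->() const { assert(this->ptr); return this->ptr; }
private:
    TYPE* ptr;
};

// the 'new' is hidden behind a static Create() method
class MyObject : public RefCounted {
public:
    static Ptr<MyObject> Create() { return Ptr<MyObject>(new MyObject()); }
    int GetValue() const { return 42; }
};
```

Client code only ever sees Ptr&lt;MyObject&gt; obj = MyObject::Create(); the object is deleted automatically when the last Ptr goes out of scope, so ownership problems largely disappear.<br />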
<strong>Virtual methods</strong> are important of course, BUT: Just spend a second to think about whether a method really must be virtual (or more importantly: do you really need run-time polymorphism, or is compile-time polymorphism enough?). The more "static" your code is, the more optimisation options the compiler has.<br />
<h4 id="forbidden-c-1">
Forbidden C:</h4>
Some unusual stuff here as well:<br />
<ul>
<li>all CRT functions like fopen() or strcmp() are forbidden, except the math.h functions</li>
<li>directly calling malloc()/free() is forbidden</li>
</ul>
Most of the CRT functions are straight-out terrible (strpbrk, strtok, ...) and/or dangerous (strcpy), so we're wrapping them all away and/or using better platform-specific functions under the hood (this can also reduce executable size, which is always good).<br />
Overriding malloc/free with central wrapper functions is really useful once you need to do memory-debugging and -profiling, also makes it easier to try out different memory allocator libs.<br />
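A sketch of what such central wrapper functions could look like; the names are made up, and a simple live-allocation counter stands in for real memory-debugging and -profiling hooks:<br />

```cpp
#include <cassert>
#include <cstdlib>

// hypothetical central allocation wrappers: all engine code calls
// Memory::Alloc / Memory::Free instead of malloc/free, so debug hooks
// or a different allocator lib can be dropped in at one single place
namespace Memory {

static int AllocCount = 0;   // number of live allocations, for leak checks

inline void* Alloc(size_t numBytes) {
    void* ptr = std::malloc(numBytes);
    assert(ptr != 0);
    ++AllocCount;
    return ptr;
}

inline void Free(void* ptr) {
    assert(ptr != 0);
    --AllocCount;
    std::free(ptr);
}

} // namespace Memory
```

At shutdown (or per frame in debug builds), a non-zero AllocCount immediately points at a leak, and swapping std::malloc for another allocator lib touches only these two functions.<br />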
<h4 id="tolerated-c-1">
Tolerated C:</h4>
Some "dangerous" stuff is only allowed in performance-critical low-level code:<br />
<ul>
<li>raw pointers and pointer arithmetics</li>
<li>raw C arrays </li>
<li>raw memory buffers</li>
</ul>
These are all recipes for disaster in the hands of an inexperienced programmer (or an experienced programmer who needs to juggle too many things in his head). Instead of pointers, use smart pointers to refcounted objects (see above), or indices into containers. Instead of raw arrays, use containers. Never directly allocate and access memory buffers in high-level code.<br />
All of these "dangerous techniques" are essential for really performance-critical low-level code though, but this is only at a handful places in the code, and when the really mysterious kind of crashes happen, at least you know where to look.<br />
<h4 id="the-end">
The End</h4>
One last point: our code is riddled with asserts which are also enabled in release mode (this hardly makes a performance difference, but the uncompressed executable size is up to 20% larger because of the expression strings; thankfully those strings compress very well). <br />
The essential, must-have assert checks are for invalid smart pointer accesses (null pointers), boundary checks in container classes and checking for valid method parameters.<br />
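Such a release-mode assert can be sketched as a macro like this (illustrative only, the name is made up; as described below, the real checks would additionally capture a call-stack and method signature):<br />

```cpp
#include <cstdio>
#include <cstdlib>

// assert macro which stays active in release builds; the stringified
// expression (#exp) is what bloats the uncompressed executable, the
// runtime cost of the check itself is negligible
#define game_assert(exp) \
    do { \
        if (!(exp)) { \
            std::printf("assertion failed: (%s), file %s, line %d\n", \
                        #exp, __FILE__, __LINE__); \
            std::abort(); \
        } \
    } while (0)
```

Unlike the standard assert, this is not compiled out by NDEBUG, so a failing check in a shipped build still produces a precise post-mortem location instead of a mystery crash.<br />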
With all of the above, we're rarely ever hitting a seg-fault (maybe twice a year on the server-side). If something breaks, then it is very likely an assertion check which got hit, and this is usually very easy to post-mortem-debug since it comes with a call-stack and method signature.Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-15359280991895265062013-05-04T16:58:00.000+01:002013-05-04T17:01:25.375+01:00Minor demos and web page updateCouple of minor changes at <a href="http://www.flohofwoe.net/">http://www.flohofwoe.net</a>:<br />
<ul>
<li>I have removed the non-asm.js demos. Since the asm.js code generation in emscripten is now always faster than the "traditional" code generation, it doesn't make sense to have the non-asm.js code around. I'll keep support for the old code generation in my build pipeline for now, to be able to run comparisons between the new and old code from time to time though.</li>
<li>The demos are now compiled with link-time optimization enabled. Previously this caused subtle and hard-to-debug code generation problems, but it looks like this is fixed now (fingers crossed). Performance and code size don't seem to differ much, however.</li>
<li>Demos have been recompiled with the latest emscripten incoming branch.</li>
<li>I added experimental support for uncompressed textures if the WebGL implementation doesn't support DXT textures (e.g. mobile platforms). This will decompress textures on the fly after download. For now this is just a workaround/hack and hasn't been tested that much. Also, since uncompressed textures are 4..8x bigger, this isn't really useful for complex games.</li>
<li>I have added a high-level source code page for people who like to read some code: <a href="http://www.flohofwoe.net/sources.html">http://www.flohofwoe.net/sources.html</a></li>
<li>Finally, <a href="http://n3emscripten.appspot.com/">http://n3emscripten.appspot.com</a> will no longer be updated, and I've put a link to the new demos there.</li>
</ul>
<div>
-Floh.</div>
Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-85886778341395218692013-04-25T18:05:00.001+01:002013-04-25T18:05:23.212+01:00Quo Vadis Talk, New Demo PlaceQuick update:<br />
<br />
Just came back from Quo Vadis 2013 in Berlin where I talked about "C++ on the Web" in front of a crowded room (thanks to all who've been there :), the slides are here:<br />
<br />
<a href="http://de.slideshare.net/andreweissflog3/quovadis2013-cpp-ontheweb">http://de.slideshare.net/andreweissflog3/quovadis2013-cpp-ontheweb</a><br />
<br />
And I have moved the Nebula3/emscripten demos to my own web site here:<br />
<br />
<a href="http://www.flohofwoe.net/demos.html">http://www.flohofwoe.net/demos.html</a><br />
<br />
The demos at the old appspot.com URL haven't been updated in a while. When I get around to it I'll redirect to the new demo page from there.<br />
<br />
Over and out :)<br />
-Floh.Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-73029686165396125582013-03-22T16:04:00.002+01:002013-03-22T16:07:33.042+01:00Why I spend my precious spare time with emscriptenI recently realized that I have spent much more time with emscripten than with any other "weekend project" so far. At the least, the emscripten-based demos became the most advanced of all my spare-time coding platforms of the past 2 years, like iOS, Android, Google Native Client and flascc.<br />
<div>
<br /></div>
<div>
I think it comes down to "open, free and painless", for spare-time projects these are all extremely important points. I want to spend my free time with stuff that is fun.</div>
<div>
<br /></div>
<div>
Let's look why the other stuff isn't as much fun:</div>
<div>
<br /></div>
<div>
<b>iOS:</b> The tools you need for development are all free, Xcode is a very slick IDE to work in, and unlike VisualStudio there's no artificial distinction between a (feature-cut) free and a (pricey) professional version. So far so good. The pain starts when you want to run your code on your actual iOS device. Welcome to provisioning profile hell. First you need to hand over $99 per year for the privilege to run your own code on your own hardware, but that's the least of it. Next you need to create "provisioning profiles" on Apple's developer portal, registering each team member, device and application and set up who may do what. In the end you essentially get per-app/per-device code-signing-certificates which expire every three months. So all the iOS demos which I did 2 years ago don't work anymore unless I go through all that hell again. Nope.</div>
<div>
<br /></div>
<div>
<b>Android:</b> Android C++ development sucks, plain and simple. It's a pain in the ass to set up (it's less painful if you use nVidia's ready-made installer), remote debugging a native app is so slow it's essentially useless, and you can't use the cool new stuff since most of the world is still running an Android version from the stone age. To be fair, this was all 1.5 years ago, but I have little motivation to waste further weekends on finding out whether things have improved since then ;)</div>
<div>
<br /></div>
<div>
<b>Google Native Client:</b> The main reasons why I stopped dabbling with Native Client are that it is still not opened up (it only works with Chrome Web Store bundled apps), and that pNaCl seems to take forever to be finished. To be fair, Native Client has very good middleware support (like FMOD or RakNet), but it doesn't look like it will ever be implemented outside of Chrome.</div>
<div>
<br /></div>
<div>
<b>flascc:</b> I played around with flascc for a weekend or two; 2 main reasons why it didn't set my heart on fire: (1) Compiling/linking is extraordinarily slow AND/OR uses infinite amounts of RAM. For reasonably big code bases (like Nebula3) it's unusable because my 4GB Mac simply ran out of memory. (2) Since working with flascc is so damn slow I wasn't motivated to actually go on with writing a Stage3D wrapper for N3's rendering layer.</div>
<div>
<br /></div>
<div>
So all in all, emscripten is the most frictionless way for me to write and actually publish 3D demos. I can host the demos wherever I want, update them without a certification or signing process getting in the way, the demos won't expire, they are automatically multi-platform and finally, there's no vendor or platform lock-in. Most of the code I'm writing is platform-agnostic C++ and will compile and run anywhere, and the host platform's "API footprint" is minimal: a subset of POSIX and OpenGL, which will also compile almost anywhere else with minimal changes.</div>
<div>
<br /></div>
Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-5046228749917238272013-03-18T23:15:00.000+01:002013-03-20T22:54:57.558+01:00Updated Nebula3/emscripten Demos<b>Update 3:</b> I replaced SQLite with a TableData addon, this reduces the map-viewer-demo size from 8 MB down to 5 MB (uncompressed), and reduces startup time dramatically.<br />
<b><br /></b>
<b>Update 2: </b>Demos should now properly work on all WebGL configs again (all configs which support DXT textures, to be exact). I had been using more than 254 vertex shader uniforms, and at least ANGLE restricts this number (even if the GPU could actually handle a lot more).<br />
<b><br /></b>
<b>Update: </b>Demos don't work on Windows and some other configs since one of the new GLSL shaders doesn't compile. Tested configs are: OSX 10.7.5 with GeForce 9400M, Intel HD3000, HD4000 and Radeon HD 6770M. Fix is coming later today.<br />
<br />
Finally a new demo update! If you're a Chrome user, please be aware that you need to run these demos in the very latest Chrome Canary (Version 27.0.1444.3 canary) since this contains a bugfix in the V8 Javascript engine (details are here: <a href="https://code.google.com/p/chromium/issues/detail?id=177883" target="_blank">https://code.google.com/p/chromium/issues/detail?id=177883</a>). This bug was also the reason why I held back updates for so long, I couldn't overwrite the version which reproduces this bug, but I also didn't feel like setting up yet another AppEngine project.<br />
<div>
<br /></div>
<div>
Updated demos are here: <a href="http://n3emscripten.appspot.com/" target="_blank">http://n3emscripten.appspot.com</a></div>
<div>
<br /></div>
<div>
The DSO map viewer demo is now much closer to the actual map renderer of the Drakensang Online client:<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://n3emscripten.appspot.com/dsomapviewer.html" target="_blank"><img border="0" height="325" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiArqUimaiCWqhqNlu6Q_J8Kh0WBS5yJReqQAdKVjH3gMwf2RXVuLS-d9HYzc8VTB3SnYlZHDAHzp21VsDugyuN-zR4kN_rpgYHnuVSQyYHGTBwivXV16wSbH7Ejp0Xs6Lyy_Da0digqg4D/s640/Screen+Shot+2013-03-18+at+10.54.35+PM.png" width="640" /></a></div>
<div>
<br /></div>
</div>
<div>
The ground-decals system has been moved over, which helps a lot in hiding the tiling structure of the level. The rendering pipeline now includes posteffects like bloom and color-balancing. You're now controlling a "player character", and I added a few more "NPCs" to the map in order to check performance with a couple of characters on screen.</div>
<div>
<br /></div>
<div>
All demos now come in 2 flavours: "regular" and "asm.js". </div>
<div>
<br /></div>
<div>
ASM.JS is a Mozilla project to define a small subset of Javascript which can be exceptionally well optimized. More about that here: <a href="http://asmjs.org/">http://asmjs.org/</a></div>
<div>
<br /></div>
<div>
I also identified the cause of the long pause at the start of the map viewer demo. Originally I thought it was caused by generating the collision mesh, which is built at startup from tens-of-thousands of very small mesh fragments, but surprisingly this is extremely fast. The pause is actually caused by parsing the structure of an SQLite database file and reading many small items from the database. Replacing this with a more efficient "table data" subsystem is the next thing on my weekend todo list. The SQLite stuff is really a left-over from the single-player Drakensangs where the world-state was loaded from and written back to SQLite database files.</div>
<div>
<br /></div>
<div>
That's it for today!</div>
<div>
<br /></div>
Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-20272071208470717352013-02-10T19:24:00.004+01:002013-02-10T19:24:55.651+01:00Diminishing ReturnsWeekend was kinda semi-successful as far as coding is concerned. I tried various ways to reduce GL calls further, and was able to reduce the number of GL calls by about 25%: from about 4100 down to about 3000 in the initial screen of the Drakensang Online map viewer demo. Although this sounds pretty good, I'm a bit disappointed because I was hoping that bundling vertex data chunks into big vertex buffers would have a bigger effect:<br />
<br />
- Bundling vertex data into big vertex buffers cut the number of glVertexAttribPointer() calls by almost half, from about 950 down to about 500. With the GL_vertex_array_object extension however, I could save twice as many GL calls for "free" (so the demo would be down to 3100 GL calls without any additional optimizations), and the savings would be more consistent (right now it depends a lot on the order of draw calls). The bundling added *a lot* of complex code, so it's probably not really worth it; since at least Chrome already supports OES_vertex_array_object in WebGL, it would make more sense to support that extension instead.<br />
<br />
- All the rest was gained by simply filtering redundant texture updates (glActiveTexture, glBindTexture, glUniform1i). This was a big win for very little code, but this also varies with the actual textures applied to the objects. Fewer shared textures means more updates.<br />
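This kind of redundant-state filtering is just a small cache sitting in front of the GL calls. Here's a minimal sketch of the idea (not the actual Nebula3 code; the real glActiveTexture/glBindTexture calls are replaced by a counter so the saving is easy to see):

```cpp
#include <array>
#include <cstdint>
#include <cassert>

// Hypothetical redundant-texture-bind filter: remember the last texture
// bound to each unit and skip the GL calls when nothing changed.
struct TextureBindFilter {
    static const int MaxUnits = 16;
    std::array<uint32_t, MaxUnits> boundTexture{};  // 0 = nothing bound yet
    int glCallCount = 0;    // stands in for the real GL calls

    void Bind(int unit, uint32_t texture) {
        if (boundTexture[unit] == texture) {
            return;         // redundant state change filtered out
        }
        boundTexture[unit] = texture;
        glCallCount += 2;   // would be glActiveTexture() + glBindTexture()
    }
};
```

How much this saves depends entirely on the scene: fewer shared textures mean fewer filtered calls, just as described above.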
<br />
I also tried to generally filter redundant shader uniform updates, but with little effect. Apart from the texture updates, an entire frame had fewer than 10 redundant uniform updates, so not worth it.<br />
<br />
I'll give the GL call optimization a little rest for now and concentrate on adding features. There's still some untapped potential in grouping transform matrix updates into arrays, and by better sorting inside batches. But right now I've had enough ;)<br />
<br />Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-3293835870253250392013-01-23T21:08:00.000+01:002013-01-23T21:28:48.389+01:00A Radeon Fix and MoreThe Nebula3/emscripten demos (<a href="http://n3emscripten.appspot.com/" target="_blank">http://n3emscripten.appspot.com</a>) had a serious performance problem on Macs with Radeon GPUs in the instancing demos. Problem was that my pseudo-instancing code used an additional vertex-buffer with 1-dimensional ubyte vertex components as fake InstanceIds. This worked fine on nVidia and Intel GPU, but triggered a horrible slow-path in the OSX Radeon driver. After replacing this with ubyte4 components everything worked fine on Radeons, but I wasn't happy that the InstanceId buffer would now be 4 times as large, with 3/4 of the the size dead weight. Then today in the train from Hamburg back to Berlin the embarrassingly obvious solution occured to me to stash the InstanceId in the unused w-component of the vertex normals. These are in packed ubyte4 format, with the last byte unused. And with this simple fix I could get rid of the second vertex buffer completely and actually throw away most of the pseudo-instancing code. Win-Win!<br />
<br />
And now on to the actual issue: I didn't really pay attention to the code path which is used if the GL vertex array object extension isn't available, and I was shocked when I discovered that the dsomapviewer demo performs 7000 GL calls per frame (not draw-calls, but all types of GL calls), and then I was astonished that Javascript+WebGL crunches through those 7k calls without a problem even on my puny laptop. But something had to be done about that of course.<br />
<br />
OpenGL / WebGL without extensions is very verbose even compared to Direct3D9. To prepare the geometry for rendering, you need to bind a vertex buffer (or several), bind an index buffer, and for each vertex component call glEnableVertexAttribArray() and glVertexAttribPointer(), aaaand each unused vertex attribute must be disabled with glDisableVertexAttribArray(). Depending on the max number of vertex attributes supported in the engine, this can add up to dozens of calls just to switch geometry. And whenever a different vertex buffer is bound, at least the glVertexAttribPointer() functions must be called again, and if the vertex specification has changed, vertex attribute arrays must be enabled or disabled accordingly.<br />
<br />
With the vertex array object extension all of this can be combined into a single call.<br />
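The difference can be made concrete with a back-of-the-envelope call count (the attribute numbers are illustrative assumptions, not engine constants):

```cpp
#include <cassert>

// Rough count of GL calls needed to switch geometry in plain WebGL,
// versus a single glBindVertexArrayOES() with the VAO extension.
inline int CallsWithoutVAO(int activeAttrs, int maxAttrs) {
    int calls = 2;                      // glBindBuffer for vertex + index buffer
    calls += activeAttrs * 2;           // glEnableVertexAttribArray + glVertexAttribPointer each
    calls += (maxAttrs - activeAttrs);  // glDisableVertexAttribArray for the unused attributes
    return calls;
}

inline int CallsWithVAO() {
    return 1;                           // one glBindVertexArrayOES()
}
```

With, say, 4 active attributes out of 16 supported, that's 22 calls per geometry switch collapsed into one.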
<br />
This particular part of defining the vertex layout is by far the least elegant area of the OpenGL spec, and even the vertex array object stuff could be nicer. To me it doesn't make a lot of sense to include the buffer binding in the vertex attribute state, keeping the buffer separate from the vertex layout would make more sense IMHO. But enough with the ranting.<br />
<br />
Other high-frequency calls are the glUniformXXX() functions to update shader variables, and the whole process of assigning textures to shaders. Un-extended WebGL doesn't provide functions to bundle these static shader updates into some sort of buffers.<br />
<br />
These types of high-frequency calls are exactly what we don't want in Javascript and WebGL. In a native OpenGL app, these calls are usually extremely cheap, so it doesn't matter that much. But when calling a WebGL function from emscripten, there's quite a lot of overhead (at least compared to a native GL app). First, emscripten maintains some lookup tables to associate numeric GL ids with Javascript objects. Then the WebGL JS functions are called; in Chrome, these calls are serialized into a command buffer which is transferred to another process, and in this GPU process the commands are unpacked, validated, and the actual GL function is called. But it doesn't end there. On Windows, the ANGLE wrapper translates the OpenGL calls to Direct3D9 calls. So what's an extremely cheap GL call in a native app comes with some serious overhead in a WebGL app. Considering all this it is really mind-blowing that WebGL is still so fast!<br />
<br />
All this means though, that it really makes a lot of sense to filter redundant GL calls, especially in a WebGL application, and every GL extension which helps to reduce the number of API calls is many times more valuable under WebGL!<br />
<br />
So my mission in the train from Berlin to Hamburg and back today was to filter out those redundant GL calls.<br />
<br />
First I wanted to know what calls are actually the problem. The OSX OpenGL Profiler tool can help with this. It records a trace of all OpenGL calls, can create a quick stat of the most-called functions, and the sequence of calls with their arguments reveals which calls suffer most from redundancy.<br />
<br />
In the dsomapviewer demo these are: glEnableVertexAttribArray(), glDisableVertexAttribArray(), glBindBuffer() and glUseProgram().<br />
<br />
Apart from filtering those lowlevel calls I also implemented a separate high-level filter which skips complete mesh assignment operations (that whole call sequence of buffer bindings and vertex attribute specification I talked about before).<br />
<br />
All in all the results were encouraging: per-frame GL calls dropped from 7k down to 4k. In comparison: when using the vertex array object extension the number of GL calls goes down to about 3k.<br />
<br />
This could be improved even more by reducing the number of vertex buffers, and bundling the vertex data of many graphics objects into one or few big vertex buffers, since then much fewer buffer binds and vertex attribute specification calls would be needed (at least if they occur in the right sequence). But for this I would either need the glDrawElementsBaseVertex() function, which is not available in WebGL, or I would need to fix-up lots of indices whenever vertex data is created or destroyed (but this would limit the size of one compound vertex buffer to 64k vertices, and limit the efficiency of the bundling, hmm...).<br />
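The index fix-up variant would look roughly like this (an illustrative sketch, not engine code): without glDrawElementsBaseVertex(), each mesh's 16-bit indices must be rebased when its vertices are appended to a big shared vertex buffer, and the rebased indices must still fit into 16 bits, which is exactly what limits a compound buffer to 64k vertices.

```cpp
#include <vector>
#include <cstdint>
#include <cassert>

// Rebase a mesh's indices by the position of its vertices inside a big
// shared vertex buffer. Returns false if an index would overflow the
// 16-bit range (a real implementation would check this before mutating).
inline bool RebaseIndices(std::vector<uint16_t>& indices, uint32_t baseVertex) {
    for (uint16_t& i : indices) {
        uint32_t rebased = uint32_t(i) + baseVertex;
        if (rebased > 0xFFFF) {
            return false;   // exceeds the 16-bit index range
        }
        i = uint16_t(rebased);
    }
    return true;
}
```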
<br />
Anyway, to wrap this up, Chrome already exposes the OES_vertex_array_object extension, and an ANGLE_instanced_arrays extension seems to be on the way. Both should help a lot to reduce GL calls already. Then the only remaining problem is texture assignment and uniform updates in scenes with many different materials.<br />
<br />
But I think before working on reducing GL calls even more I'll try to do something about the stuttering when new graphics assets are streamed in.<br />
<br />
Over & Out,<br />
-Floh.<br />
<br />Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-24803027589649589572013-01-19T18:36:00.001+01:002013-01-23T21:16:10.822+01:00A Drakensang Online map viewer in emscripten<b>Update 2: </b>The OSX/Radeon performance problem should be fixed now. See here: <a href="http://flohofwoe.blogspot.de/2013/01/a-radeon-fix-and-more.html" target="_blank">http://flohofwoe.blogspot.de/2013/01/a-radeon-fix-and-more.html</a><br />
<b><br /></b>
<b>Update: </b>Just found out that the demo runs incredibly slowly on a 15" MacBook Pro when running on the discrete AMD Radeon HD 6770M chip (it's actually much faster on the integrated Intel HD 3000). This happens in both Chrome and Firefox, reason unknown yet. So if you have one of these, note that the demo normally runs a lot smoother ;)<br />
<br />
I did a very simple proof-of-concept Drakensang Online map viewer in Nebula3/emscripten (as always, Chrome or Firefox required), to see how JS+WebGL can deal with a close-to-real-world 3D scenario:<br />
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><span style="margin-left: auto; margin-right: auto;"><a href="http://n3emscripten.appspot.com/dsomapviewer.html" target="_blank"><img border="0" height="324" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGX7Vuh5muqz4ciyFTeSLpBIPgQb43TvtFV7vdI_qjTcd0_gqVk180ljllGJrXdCiuya449xk_r9gdBJkqK-u0JyjLf3j73_fOD45Pdw9fkUMI_rOK6ipQtde2r99sPNWL1f0sRgFNeiYx/s640/dsomapviewer.png" width="640" /></a></span></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><a href="http://n3emscripten.appspot.com/dsomapviewer.html" target="_blank">Drakensang Online map viewer</a></td></tr>
</tbody></table>
This is work in progress and I will spend more time with optimizations before moving on to the next demo.<br />
<div>
<br /></div>
<div>
You'll notice that there's still frame-rate-stuttering when moving around the map (with left-mouse-button + dragging). The bad type of stuttering is caused by asset loading which happens on demand when new graphics objects are pulled in as they enter the view volume. I don't know yet what causes the lighter stuttering when moving around in areas which are completely loaded. I need to do a detailed profiling session to figure out what's going on there exactly. The stuttering also happens (to a lesser extent) in the native OSX version of the demo. It's most likely the preparation and creation of OpenGL resources, like vertex buffers, index buffers and textures. I will need to figure out how to move more of the asset creation stuff out of the main thread.<br />
<div>
<br /></div>
<div>
The demo is also quite demanding on WebGL. Despite the pseudo-instancing which I implemented recently there's still a lot of OpenGL calls per frame. Support for the <b>OES_vertex_array_object</b> (Chrome already exposes this) and something like <b>ARB_instanced_arrays</b> would help a lot to reduce the number of GL calls drastically (the JS profiler currently shows the vertex array definition as the most expensive rendering-related code, followed by the matrix array uniform updates for the pseudo instancing code).</div>
<div>
<br /></div>
<div>
Finally I've added a new Nebula3 code module to this demo: the ODE-based physics and collision subsystem is now also running in emscripten (no changes were necessary); the demo sets up a static collide world at startup and uses this to perform stabbing checks under the mouse pointer. Unfortunately adding ODE almost doubled the size of the generated Javascript code. This is another incentive to finally get rid of our (somewhat bloated) physics wrapper code and ODE, and build a new slim collision system, probably on top of the Bullet collision classes (we're mainly using the current physics wrapper for simple collision checks on a static collide world in the live version of Drakensang Online, so not much of value will be lost).</div>
<div>
<br /></div>
<div>
Also, originally I wanted to include SQLite into the demo, since extra map info is currently stored in a separate SQLite file (lighting information, player start position, etc...). But this didn't work out of the box because SQLite's file i/o code must be adapted.</div>
<div>
<br /></div>
<div>
This wouldn't be hard to fix, but I've actually wanted to get rid of SQLite for a long time. SQLite was really useful as a save-game system in the single-player Drakensang games, but if you don't need to save game world changes back, a complete SQL implementation in the client is just overkill. So this is another good reason to finally get started with a nice and small TableData-subsystem in Nebula3.</div>
<div>
<br /></div>
<div>
The frame-stuttering is a tiny bit disheartening, but on the other hand this is to be expected when bringing a complex code base over to a new platform. Most important right now is to really know what's going on, so I will probably spend some time adding profiling code and do some performance analysis next - together with text rendering to get some continuous debug statistics output on screen.</div>
<div>
<br />
Exciting stuff :D</div>
<div>
<br /></div>
</div>
Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-76112694316365854202013-01-13T15:31:00.000+01:002013-01-13T15:34:17.809+01:00Multithreading in emscripten with HTML5 WebWorkersMultithreading in emscripten is different from what we C/C++ coders are used to. There is no concept of threads with shared memory state in Javascript, so emscripten can't simply offer a pthreads wrapper like NaCl does. Instead it uses HTML5 WebWorkers and a highlevel message-passing API to spread work across several CPU cores.<br />
<br />
You basically pass a memory buffer over to the worker thread as input data, the worker thread does its processing and passes a memory buffer with the result data back to the main thread.<br />
<br />
The downsides are <b>(1)</b> you can't simply port your existing multi-threaded code over to emscripten, <b>(2)</b> it is (currently) somewhat expensive to pass data around since it involves copying, and <b>(3)</b> you cannot express all multithreading patterns in emscripten. The upside, though, is that it's really hard to shoot yourself in the foot, since there's no shared state, and all the multithreading primitives you love to hate (like mutexes, semaphores, cond-vars, atomic-ops) simply don't exist.<br />
<br />
Let's have a quick look at emscripten's worker API, only 4 API-functions and 2 user-provided functions are necessary:<br />
<br />
<b>worker_handle emscripten_create_worker(const char* url);</b><br />
<br />
This creates a new worker object; it takes the URL of a separate emscripten-generated Javascript file.<br />
<br />
The worker file must export at least one C-function (the name doesn't matter, but the function name must be explicitly exported using emscripten's new "-s EXPORTED_FUNCTIONS" switch so that it isn't removed by dead-code elimination). The worker function prototype looks like this:<br />
<br />
<b>void dowork(char* data, int size);</b><br />
<br />
The arguments define the location and size of the input data.<br />
<br />
The function to invoke the worker is:<br />
<br />
<b>void emscripten_call_worker(worker_handle worker, const char *funcname, char *data, int size, void (*callback)(char *, int, void*), void *arg);</b><br />
<br />
This takes the worker handle returned by emscripten_create_worker(), the name of the worker function (in our case "dowork"), a pointer to and size of the input data, a completion callback function pointer, and finally a custom argument which is passed through to the completion callback to associate the completion call with the invocation call.<br />
<br />
At some point after emscripten_call_worker() is called, the dowork-function will be called in the worker thread with a data pointer and size. Since the worker has its own address space, the actual pointer value will be different from the pointer value in the emscripten_call_worker call of course.<br />
<br />
The worker function now uses this input data to compute a result, and (optionally) hands this result back to the main thread using this function:<br />
<br />
<b>void emscripten_worker_respond(char* data, int size);</b><br />
<b><br /></b>
The return-data will be copied inside the function, so if the worker function has allocated a result buffer it remains the owner of that buffer and is responsible for releasing it.<br />
<br />
Finally, once the worker has finished, the completion callback will be called on the main thread with the result data, and the custom arg given in the emscripten_call_worker() call:<br />
<br />
<b>void completion_callback(char* data, int size, void* arg);</b><br />
<br />
The callback does not gain ownership of the data buffer; it may read or copy the received data, but must not write to or free the buffer.<br />
<br />
Finally there's a function to destroy a worker:<br />
<br />
<b>void emscripten_destroy_worker(worker_handle worker);</b><br />
<br />
As with threads, creating and destroying workers is not cheap, so you should create a couple of workers at the start of the application and keep them around, instead of creating and destroying workers repeatedly. It's also wise to batch as much work as possible per worker invocation to offset the call-overhead as much as possible (don't call a worker many times per frame, ideally only once), but this is all pretty much common sense.<br />
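The copy/ownership rules described above are easiest to see in code. The following is a single-threaded mock of the round trip (NOT the real emscripten API, which runs the worker function in a separate WebWorker process): the worker builds a result in its own storage and "responds", the response is copied before the completion side sees it, so the worker keeps ownership of its own result buffer.

```cpp
#include <vector>

// Mock response slot; stands in for the buffer that the real
// emscripten_worker_respond() copies the result data into.
static std::vector<char>* g_responseSlot = nullptr;

// stands in for emscripten_worker_respond(): copies the result out
static void worker_respond_mock(const char* data, int size) {
    g_responseSlot->assign(data, data + size);
}

// the user-provided worker function (here it just doubles every byte)
static void dowork(const char* data, int size) {
    std::vector<char> result(data, data + size);
    for (char& c : result) c = char(c * 2);
    worker_respond_mock(result.data(), (int)result.size());
}   // 'result' dies here: fine, because respond already copied it

// stands in for emscripten_call_worker() plus the completion callback:
// returns what the callback would receive as its (read-only) data buffer
std::vector<char> RoundTrip(const std::vector<char>& input) {
    std::vector<char> response;
    g_responseSlot = &response;
    dowork(input.data(), (int)input.size());
    return response;
}
```

In the real API the same flow is split across two address spaces, which is exactly why everything is passed by copy rather than by pointer sharing.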
<br />
The worker Javascript file must be created as a separate compilation unit; it's a bit like on the PS3 where the SPU code also must be compiled into small, complete "SPU executables". To keep the code size small I decided to keep the runtime environment in the worker scripts as slim as possible: there's no complete Nebula3 environment, only a minimal C runtime environment. But this is not a limitation of emscripten, only a decision on my part. Most of the time the workers will contain simple math code which loops over arrays of data instead of high-level object-oriented code. To avoid downloading redundant code it might also make sense to put several worker functions into a single JS file.<br />
<br />
The updated Nebula3/emscripten demos at <a href="http://n3emscripten.appspot.com/" target="_blank">http://n3emscripten.appspot.com</a> now decompress the downloaded asset files in up to 4 WebWorker threads in parallel to the main thread; this speeds up asset loading tremendously and avoids the excessive frame hiccups which happened before. This is important, since real-world Nebula3 apps stream asset data on demand while the render loop is running. The whole thing took me about half a day, but unfortunately I stumbled across a Chrome bug which required a small workaround (see here: <a href="http://code.google.com/p/chromium/issues/detail?id=169705" target="_blank">http://code.google.com/p/chromium/issues/detail?id=169705</a>).<br />
<br />
It's not completely perfect yet. There's data copying happening on the main thread, and there's also some expensive stuff going on when creating the WebGL resources (for instance vertex and index data is unrolled for the instanced rendering hack). The ultimate goal is to move as much resource creation work off the main thread in order to guarantee smooth rendering while resources are created.<br />
<br />
There are also browser improvements in sight which will make WebWorkers more efficient in the future, mainly to avoid extra data copies by transferring ownership of the passed data over to the web worker, basically a move instead of a copy.<br />
<br />
And that's it for today :)<br />
<br />
<br />
<br />Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-85599061295012007412013-01-04T19:24:00.003+01:002013-01-04T23:08:19.415+01:00Happy New Year 2013!I've been playing around a bit more with the Nebula3/emscripten port over the holidays. Emscripten had some nice improvements during the past 2 months, mainly to generate smaller and faster code, and to drastically reduce code generation time in the linker stage (read this up on <a href="http://mozakai.blogspot.de/" target="_blank">azakai's blog</a>).<br />
<br />
The work I did on my experimental Nebula3 branch was only partially emscripten-related: The biggest chunk of work went into refactoring to adapt the higher level parts of the rendering pipeline for the new CoreGraphics2 subsystem (lighting, view volume culling, and the highlevel graphics subsystem which is concerned with Stages, Views and GraphicsEntities). A lot of code was thrown away or moved around, but from the outside everything looks quite similar to before. External code which depends on the Graphics subsystem must be fixed-up, but not rewritten.<br />
<br />
Another big chunk of work went into implementing instanced rendering for the new CoreGraphics2 system. OpenGL offers several extensions for instanced rendering, but since none of the current WebGL implementations support any of these extensions I first wrote a fallback solution which works without extensions, but uses bigger "unrolled" vertex- and index-data, and an instance-matrix palette in the vertex shader. With the current implementation, up to 64 instances can be collapsed into a single drawcall. This depends on the number of available vertex shader uniforms, and since the <a href="http://code.google.com/p/angleproject/" target="_blank">ANGLE</a> wrapper used by Chrome and Firefox on Windows generally restricts the number of vertex shader uniforms to 254, I had to go with only 64 instances per drawcall. This restricts the usage scenarios of this approach, but when rendering a Drakensang Online map (for instance), this comes pretty close to the average number of instances of environment objects in the view volume. For particle rendering this approach would be useless though.<br />
<br />
I also rewrote the emscripten filesystem wrapper. The original implementation was only a quick hack to get data loaded into the engine at all. I wrapped this now into a proper subsystem which uses new emscripten API calls to directly download data into a memory buffer without mirroring the data into a "virtual filesystem", and the new implementation also accepts the file compression of Drakensang Online's HTTP filesystem (it's not the complete HTTP filesystem implementation yet though, the table-of-content-files are ignored, as well as the per-file MD5 hashes, and there's no local file cache apart from the normal browser cache). Also, while the emscripten filesystem wrapper is asynchronous, it is not yet multithreaded through the new WebWorker API. Decompression currently happens on the main thread and may lead to frame stuttering, but the plan is to move this into separate worker threads.<br />
<br />
Finally I've uploaded a few new demos to <a href="http://n3emscripten.appspot.com/" target="_blank">http://n3emscripten.appspot.com</a>. As always you should use an up-to-date Chrome or Firefox browser to try them out.<br />
<br />
First, here's the old Dragons demo, recompiled with the latest emscripten version. Thanks to the improvements in emscripten, and the house-cleaning to remove old code, the (compressed) download size of the Javascript-code is now only 308kByte:<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://n3emscripten.appspot.com/dragons.html" style="margin-left: auto; margin-right: auto;" target="_blank"><img border="0" height="363" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigP7wZ2IObnBRldatC7-DF4Pfya2QKnp0h24O02fgrP2GW5sSMhXxS7b4Zc4_K3fru818lqH6e-7DvfVdVwX1EPfgrlp5Pi9kh-OagtFjYAo0B-TeKrDufYD6l2IQDLhiTGAwV2B4J9saO/s640/dragons.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><a href="http://n3emscripten.appspot.com/dragons.html" target="_blank">Dragons Demo (Cursor up to add more dragons)</a></td></tr>
</tbody></table>
<br />
Next is a demo for the new instanced rendering. On startup, 1000 independently animated cubes are rendered, and by pressing cursor-up you can add 1000 more. There's also 128 point lights in the scene. Every 1000 cubes require about 32 draw-calls (that's (1000/64)*2: the instancing collapses 64 cubes into one draw call, and then *2 because of the NormalDepth- and Material-Passes of the Light Pre-Pass Renderer). For every cube, a world-space transform matrix is computed per frame on the CPU (a conversion from polar-coordinates to cartesian coordinates, involving two sin() and two cos(), and a matrix-lookat involving several normalizations and cross-products).<br />
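The draw-call arithmetic above can be captured in a tiny helper (just the estimate from the text as code, illustrative rather than engine code):

```cpp
#include <cassert>

// Estimate the per-frame draw calls for the pseudo-instancing demo:
// up to 64 instances collapse into one draw call, and everything is
// drawn twice (NormalDepth + Material pass of the Light Pre-Pass renderer).
inline int DrawCallsPerFrame(int numInstances, int instancesPerCall = 64, int passes = 2) {
    int batches = (numInstances + instancesPerCall - 1) / instancesPerCall;  // round up
    return batches * passes;
}
```

For the 1000-cube startup scene this gives the "about 32 draw-calls" quoted above.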
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><span style="margin-left: auto; margin-right: auto;"><a href="http://n3emscripten.appspot.com/instancing.html" target="_blank"><img border="0" height="362" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_fESZboScfZKpe4E5h2QbmZq6ZnFguO5DjEfq_w65bYKAFWV48re4XGzJ6LAFYJHbFDjSrhPP4EdPF2fsEajfolvUMbuc6taTIcE3nOi5Drd3AdttaG5CEV8PlJdSL4Qbfx7vD1w4JMnu/s640/instancing.png" width="640" /></a></span></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><a href="http://n3emscripten.appspot.com/instancing.html" target="_blank">Pseudo Instancing</a></td></tr>
</tbody></table>
<br />
By hitting the space key you can also enable a disco-light posteffect for giggles; this adds an additional single-pass fullscreen effect which does a lot of texture sampling:<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://n3emscripten.appspot.com/instancing.html" style="margin-left: auto; margin-right: auto;" target="_blank"><img border="0" height="362" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTS_hIHDKFmPTG8bDwoYexvyjOaPKGtsxze_tXYLCdXBH-CUkITRqd1vr1VOsYUeKyA6qRYWS0Le7HNoc8VVyZEIwHq9crLAZOtYBOF-0H8nDTPkeh-1cbYRU0uL-zLb4AYZmnn87kMTBN/s640/instancing_posteffect.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><a href="http://n3emscripten.appspot.com/instancing.html" target="_blank">Pseudo Instancing with Disco posteffect (press Space)</a></td></tr>
</tbody></table>
<br />
And finally I wrote a little Drakensang Online monster viewer. With cursor-up/down you can switch to the next/previous monster, with cursor-right you can flip between different skin lists (appearances), and with cursor-left you can toggle a few animations (usually idle and running anims). Obviously the material shader is different from Drakensang Online: the color texture is replaced with plain white, and the specular effect is exaggerated (which is actually a nice showcase for the really good normal maps of our character models). This is only a snapshot of what's currently in the game; especially most of the animations are not included. The strange cubes which are displayed sometimes are the mesh-placeholder objects. I think I'll remove them and just use no placeholder as long as the mesh is not loaded; at least they show that the placeholder system is working right ;)<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><span style="margin-left: auto; margin-right: auto;"><a href="http://n3emscripten.appspot.com/dsocharviewer.html" target="_blank"><img border="0" height="364" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtm9LexzJIF3QdaXL3ca4aqHIQF5qCk8gfFUpCBSWRTX_rp8YxCCdbFcO8jbgCkWqNh4yAARkTmMLBPCrw6PxX68rmYL081odkmLUwkcCHEAETVRJyrbDA0P85jWnTqLVKp-bKa8SvZe7q/s640/monster.png" width="640" /></a></span></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><a href="http://n3emscripten.appspot.com/dsocharviewer.html" target="_blank">Drakensang Online Monster Viewer</a></td></tr>
</tbody></table>
That's it for today :)<br />
<br />
<br />Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-37301664929757538992012-12-16T18:06:00.000+01:002012-12-16T18:52:30.977+01:00CoreGraphics2That's Twiggy's official name now.<br />
<br />
I've basically written a vertical slice of the new Nebula3 Render Layer during the past few weekends where I'm trying out a few ideas of what the Nebula3 rendering system will look like in the future.<br />
<br />
The lowest-level subsystem is <b>CoreGraphics2</b>, which I wrote about already a little bit.<br />
<br />
It wraps the host platform's 3D API (e.g. OpenGL or Direct3D), but its rendering vocabulary is higher level / less verbose than OpenGL/D3D. It runs the render thread, but can also be compiled without threading (on the emscripten platform, for instance). There's a facade singleton object (CoreGraphics2Facade) which wraps the entire functionality into a surprisingly simple interface.<br />
<div>
<br /></div>
<div>
CoreGraphics2 works with only 5 resource types:</div>
<div>
<ol>
<li><i style="font-weight: bold;">Texture:</i> Just what the name implies, a texture resource object. This also includes render targets.</li>
<li><i style="font-weight: bold;">Mesh:</i> This encapsulates all the required geometry data for a drawing operation: vertex buffer, index buffer (optional), vertex layout / vertex array definition, and "primitive groups" (basically sub-mesh definitions). </li>
<li><i style="font-weight: bold;">DrawState:</i> This wraps all the required shader and render-state data for a drawing operation: a reference to a shader object, shader constants (one-time-init, immutable), shader variables (mutable) and an (immutable) state-block for render-states.</li>
<li><i style="font-weight: bold;">Pass:</i> A pass object holds all required data for a rendering pass, this includes a render-target-texture object, and a DrawState object which defines state which is valid for the rendering pass. All rendering must happen inside passes. Typical passes in a pre-light-pass renderer are for instance the NormalDepth-Pass, the Light-Pass, the Material-Pass, and a Compose-Pass. The pass object also contains the information whether and how the render target should be cleared at the start of the pass.</li>
<li><i style="font-weight: bold;">Batch:</i> A batch object just contains a DrawState object which defines render state for several draw operations, so this is just a way to reduce redundant state switches.</li>
</ol>
<div>
Resource objects are opaque to the outside. To the caller, these are just ResourceId objects, there's no way to directly access the data in the resource objects (since they actually live in the render thread).</div>
</div>
<div>
<br /></div>
<div>
Resource creation happens by passing a Setup object to one of the Create methods in the CoreGraphics2Facade singleton. There's one Setup class for each resource type (so basically TextureSetup, MeshSetup, DrawStateSetup, PassSetup and BatchSetup). The Setup object basically describes how the resource should be created and shared (for instance when creating a texture resource, the Setup object would contain the path to the texture file, whether the texture should be loaded asynchronously, whether the texture object should be a render target, and so on). The render thread will keep the Setup objects around, so it has all information available to re-create the resource (for instance because of D3D's lost device state, or for more advanced resource management where currently unused resources can be removed from memory, and re-loaded later).</div>
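The Setup-object pattern might look roughly like the following minimal sketch (all names here are made up for illustration, not the real Nebula3 API). The key point is that the facade keeps every Setup object around, so a resource can be rebuilt from its "recipe" after a lost-device event or after being evicted:

```cpp
#include <cstdint>
#include <map>
#include <string>

using ResourceId = std::uint32_t;

// describes how a texture resource should be created and shared
struct TextureSetup {
    std::string path;          // where to load the texture from
    bool async = true;         // load asynchronously?
    bool renderTarget = false; // is this a render target?
};

class Facade {
public:
    ResourceId CreateTexture(const TextureSetup& setup) {
        ResourceId id = nextId++;
        setups[id] = setup;    // keep the recipe around for re-creation
        // ... allocate the actual GPU-side resource here ...
        return id;
    }
    // after a lost device, every resource can be rebuilt from its Setup
    const TextureSetup& GetSetup(ResourceId id) const { return setups.at(id); }

private:
    ResourceId nextId = 1;
    std::map<ResourceId, TextureSetup> setups;
};
```

To the caller, only the opaque ResourceId ever escapes; the stored Setup stays on the render-thread side.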
<div>
<br /></div>
<div>
All rendering happens by calling methods of CoreGraphics2Facade:</div>
<div>
<br /></div>
<div>
<b>Begin / End methods:</b></div>
<div>
These methods structure a frame into segments. </div>
<div>
<ul>
<li><i>BeginFrame / EndFrame:</i> Signal the start and end of a render frame. </li>
<li><i>BeginPass / EndPass:</i> Signal start and end of a rendering pass. BeginPass takes the ResourceId of a Pass object, makes the render target of the pass active, optionally clears the render target, and applies the render state of the DrawState object of the pass.</li>
<li><i>BeginBatch / EndBatch:</i> Signal start and end of a rendering batch. This simply applies the render state of the DrawState object of the batch.</li>
<li><i>BeginInstances / EndInstances:</i> This is where it gets interesting. BeginInstances sets all the required state for a series of Draw commands. It takes a Mesh ResourceId, a DrawState ResourceId, and a "shader variation bitmask". The bitmask basically selects a "technique" from the shader (in D3DXEffects terms). For instance, to select the right shader technique for rendering the NormalDepth-pass of a skinned object, one would pass "NormalDepth|Skinning" as the bitmask.</li>
</ul>
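The "shader variation bitmask" idea can be illustrated with a tiny sketch (made-up names, not the actual Nebula3 shader system): each technique in a shader carries a feature mask, and BeginInstances picks the technique whose mask matches the requested combination, e.g. "NormalDepth|Skinning":

```cpp
#include <cstdint>
#include <string>
#include <vector>

// feature bits that can be combined into a shader variation bitmask
enum FeatureBits : std::uint32_t {
    NormalDepth = 1 << 0,
    Material    = 1 << 1,
    Skinning    = 1 << 2,
};

// a "technique" (in D3DXEffects terms) tagged with its feature mask
struct Technique {
    std::uint32_t mask;
    std::string name;
};

// return the technique that exactly matches the requested feature bitmask,
// or nullptr if no such variation was compiled into the shader
const Technique* SelectTechnique(const std::vector<Technique>& techs,
                                 std::uint32_t mask) {
    for (const Technique& t : techs) {
        if (t.mask == mask) return &t;
    }
    return nullptr;
}
```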
<div>
<b>Apply methods:</b></div>
</div>
<div>
This method group applies dynamic state changes during a frame:</div>
<div>
<ul>
<li><i>ApplyProjectionTransform, ApplyViewTransform, ApplyModelTransform: </i>Sets the projection, view and model matrices.</li>
<li><i>ApplyVariable: </i>applies a shader variable value to the currently active DrawState object (which has been set during BeginInstances). This is a template method, specialized for each shader variable data type (float, int, float4, matrix44, bool).</li>
<li><i>ApplyVariableArray:</i> same as ApplyVariable, but for an array of values.</li>
</ul>
</div>
<div>
<b>Draw methods:</b></div>
<div>
This method group performs actual drawing operations:</div>
<div>
<ul>
<li><i>Draw:</i> Performs a single draw call, must be called inside BeginInstances/EndInstances. Renders a PrimitiveGroup (aka material group) from the currently active Mesh, using the render state defined in the currently active DrawState. For non-instanced rendering one would usually perform several ApplyModelTransform() / Draw() pairs in a row.</li>
<li><i>DrawInstanced:</i> Like Draw, but takes an array of per-instance transforms to render the same mesh at many different positions. Tries to use some sort of hardware instancing, but falls back to a "tight render loop" if no hardware instancing is available.</li>
<li><i>DrawFullscreenQuad:</i> simply renders a fullscreen quad with the currently set DrawState; this is used for fullscreen post-effects.</li>
</ul>
<div>
And that's it basically. I'm quite happy with how simple everything looks from the outside, and how straight-forward the innards work. For instance, leaving the shader system aside (which is implemented in a separate subsystem CoreShader), the OpenGL specific code in CoreGraphics2 is just 7 classes, and the biggest file is around 600 lines of code.</div>
</div>
<div>
<br /></div>
<div>
And it's simple to use, for instance here's the render loop to render the point lights in the new LightPrePassRenderer (hopefully the Blogger editor won't screw up my formatting):</div>
<pre><code>CoreGraphics2Facade* cg2Facade = CoreGraphics2Facade::Instance();
if (this-&gt;pointLights.Size() &gt; 0)
{
    cg2Facade-&gt;BeginInstances(this-&gt;pointLightMesh, this-&gt;lightDrawState, this-&gt;pointLightFeatureBits, false);
    IndexT i;
    for (i = 0; i &lt; this-&gt;pointLights.Size(); i++)
    {
        const Light* curLight = this-&gt;pointLights[i];
        const matrix44&amp; lightTransform = curLight-&gt;GetTransform();

        // compute light position in view space, and set .w to inverted light range
        float4 posAndRange = matrix44::transform(lightTransform.get_position(), this-&gt;viewTransform);
        posAndRange.w() = 1.0f / lightTransform.get_zaxis().length();

        // update shader params
        cg2Facade-&gt;ApplyModelTransform(lightTransform);
        cg2Facade-&gt;ApplyVariable&lt;float4&gt;(LightPosRange, posAndRange);
        cg2Facade-&gt;ApplyVariable&lt;float4&gt;(LightColor, curLight-&gt;GetColor());
        cg2Facade-&gt;ApplyVariable&lt;float&gt;(LightSpecularIntensity, curLight-&gt;GetSpecularIntensity());
        cg2Facade-&gt;Draw(0);
    }
    cg2Facade-&gt;EndInstances();
}
</code></pre>
<div>
The only things still missing from CoreGraphics2 are dynamic resources and a plugin system to extend the functionality of the render-thread side with custom code (for instance for non-essential stuff like runtime resource baking).</div>
<div>
<br /></div>
<div>
As much as I'd love to have a rendering system where dynamic resources aren't needed at all, there's no way around them yet. We still need them for particle systems and UI rendering.</div>
<div>
<br /></div>
<div>
On the front-end of the render layer, there's the new <b>Graphics2</b> subsystem. The changes are not as radical as in CoreGraphics2 (with good reason, because changes in this subsystem would affect a lot of high-level gameplay code). There are still the basic object types <b>Stage</b>, <b>View</b>, <b>Camera</b>, <b>Light</b> and <b>Model</b>. There's now a new <b>GraphicsFacade</b> object, which drastically simplifies setup and manipulation of the graphics world. And I tried out a new component system for GraphicsEntities (Models, Lights and Cameras). Instead of an inheritance hierarchy for the various GraphicsEntity types, there's now only one GraphicsEntity class which owns a set of Component objects. The combination of those components is what turns a GraphicsEntity into a visible 3D model, a light source, or a camera. The main driver behind this was that 90% of all data in a ModelEntity was character-related, but less than 10% of the graphics objects in a typical graphics world are actually characters.</div>
<div>
<br /></div>
<div>
I've split the existing functionality into the following entity components:</div>
<div>
<ul>
<li><b>TransformComponent:</b> defines the entity's position and bounding box volume in world space.</li>
<li><b>TimingComponent:</b> keeps track of the entity-local time</li>
<li><b>VisibilityComponent:</b> attached the entity to the Visibility subsystem (view frustum culling)</li>
<li><b>ModelComponent:</b> renders the entity as a simple 3D object</li>
<li><b>CharacterComponent:</b> additional functionality for skinned characters (animations, skins, joint attachments, ...)</li>
<li><b>LightComponent:</b> turns the entity into a light source</li>
<li><b>CameraComponent:</b> turns the entity into a camera</li>
</ul>
<div>
This component model hasn't really been written to allow strange combinations (you might be tempted to attach a CameraComponent to a Character-entity for a first-person shooter). Theoretically something like this might even be possible, but I don't think it is a good idea. The driving force behind the component model was cleaner code and better memory usage.</div>
</div>
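A minimal sketch of that component idea (illustrative types only, not the actual Graphics2 classes): one entity type with optional, heap-allocated components, so only the few actual characters pay for the character-related data:

```cpp
#include <memory>
#include <string>

// optional building blocks of a graphics entity
struct TransformComponent { float pos[3] = { 0.0f, 0.0f, 0.0f }; };
struct ModelComponent     { std::string resource; };
struct CharacterComponent { int numJoints = 0; /* animations, skins, ... */ };

// a single entity class instead of a Model/Light/Camera class hierarchy;
// the set of attached components determines what the entity actually is
struct GraphicsEntity {
    // every entity has a transform; the rest is optional and pay-per-use
    TransformComponent transform;
    std::unique_ptr<ModelComponent> model;
    std::unique_ptr<CharacterComponent> character;

    bool IsCharacter() const { return character != nullptr; }
};
```

A plain 3D object allocates only its ModelComponent; the 90% of per-entity character data mentioned above is simply absent for non-characters.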
<div>
<br /></div>
Unknownnoreply@blogger.comtag:blogger.com,1999:blog-2948438400037317662.post-50895454845502239542012-10-23T21:00:00.002+01:002012-10-24T16:28:40.379+01:00Mea CulpaOk, let me just say that I went from "Saulus to Paulus" (as we say in Germany) in the past few days. In my ongoing stealth mission to evaluate all the C++-to-Web technologies currently available (Google Native Client, Adobe's Flash C/C++ compiler, and Mozilla's emscripten) I actually wanted to pick Adobe's solution next, since I didn't really believe that emscripten's approach of compiling C++ to Javascript could possibly work. I had a fixed idea in my mind of how fast a C++-to-bytecode VM solution would be (that's what Adobe is doing), and how fast Javascript could possibly be, and Javascript would lose by a long shot. No way did I think it possible to run really math-heavy code in a language with such a shitty type system (excuse my French).<br />
<div>
<br /></div>
<div>
There are numbers flying around like 25% to 50% of native performance (even up to 80% for Adobe's solution) which I thought to be extremely optimistic even for hand-picked benchmarks. For instance if you look at Epic's famous Unreal Flash demo, there's not a lot of dynamic stuff happening in the 3D world you're moving through. Sure it looks impressive, but it mainly demonstrates a good art style and how fast your GPU is, but doesn't say much about how efficient the "CPU code" is running in the Adobe VM.<br />
<div>
<br /></div>
<div>
Then I started to look closer at emscripten, spent a few days with porting, and imagine my surprise when I first started the optimized version of this:</div>
<div>
<br /></div>
<div>
<a href="http://n3emscripten.appspot.com/" target="_blank">http://n3emscripten.appspot.com</a> (disclaimer: uptodate Firefox or Chrome recommended, no IE)</div>
<div>
<br /></div>
<div>
...and I added dragons and more dragons, and even more dragons, until the frame rate finally started to drop. Of course it's not Native Client performance, but it is much (much!) better than I expected.</div>
<div>
<br /></div>
<div>
Let me explain what you're seeing: </div>
<div>
<br /></div>
<div>
The demo is built from a set of Nebula3 modules consisting of about 120k lines of C++ code cross-compiled to Javascript+WebGL through the emscripten compiler infrastructure. There is a lot (really a LOT) of vector floating point math C++ code running in the animation engine because I must admit that I actually wanted to "break" the JS engine and show how incredibly much faster NaCl would be. Well, that didn't quite work out ;)</div>
<div>
<br /></div>
<div>
Of those 120k lines of code, only a few hundred lines are actually specific to the emscripten platform. So there's less than 0.5% of platform-specific code, and about 99.5% of the code is exactly the same as in the NaCl demo, or in an actual "native" (OpenGL-based) desktop version of the demo. If you take all of Nebula3 (about half a million lines of C++ code), then the ratio of platform-specific code is even more impressive.</div>
<div>
<br /></div>
<div>
Let this sink in for a while: </div>
<div>
<br /></div>
<div>
You can take a really big C++ code base with dozens of man-years of engineering effort and a mature asset pipeline attached to it, spend about 2 weekends(!) of tinkering, and run your code at a really good performance level in a browser without plugins! You still have to be realistic about CPU performance of course. It helps if a game is designed to make relatively little use of the CPU and to move as much work as possible onto the GPU, but these are the normal realities of game development. Of all the target platforms for a project you should choose the weakest as the "lead platform", make the game run well there, and use the extra power of the other platforms for non-essential but pretty "bells'n'whistles".</div>
<div>
<br /></div>
<div>
And you don't have to burn bridges behind you: you can still use the exact same code base and asset pipeline to create traditional native applications for mobile platforms, desktop apps, or game consoles. And all of this in a programming language and a graphics API which many considered almost on their death beds a few years ago.</div>
<div>
<br /></div>
<div>
I'd say that C++ and OpenGL are on a goddamn come-back tour right now :)</div>
</div>
<div>
<br /></div>
<div>
Not all is golden though, each of the C++-to-web solutions has at least one serious weakness:</div>
<div>
<ul>
<li>Native Client: extremely fast and feature rich, but only supported by Google</li>
<li>emscripten (more specifically WebGL): not supported by Microsoft</li>
<li>Adobe flascc: Adobe wants a "speed tax" if you're using certain "premium features" required for high-performance 3D games and earn money with it</li>
</ul>
<div>
Sucks badly, since most of this is purely politics-driven, not for technological reasons.</div>
</div>
<div>
<br /></div>
<div>
So... originally I wanted this blog post to be a technical post-mortem of the emscripten port, but on one hand it's already quite long, and on the other hand: there's really not much to write about, since it went so smooth.</div>
<div>
<br /></div>
<div>
I installed the SDK and a few required tools to my MacBook, wrote (yet another) simple CMake toolchain file, and was able to compile most of Nebula3 out of the box within an hour. The most exciting event was that I found a minor bug in the OpenGL wrapper code, which was fixed within the day by the emscripten team (kudos and many thanks to kripken (aka azakai) for being so incredibly fast and helpful).</div>
<div>
<br /></div>
<div>
The only area where I had to spend a bit more time was on threading (or rather the lack thereof). Since emscripten (like NaCl) is running in the browser context, it suffers from many of the same limitations - like not being able to do synchronous IO, and you cannot "own the game loop" but need to run in little per-frame time-slices inside callbacks from the browser. NaCl offers a working pthreads API to work around these limitations, but emscripten cannot support true threading since the underlying Javascript runtime doesn't allow sharing state between threads. This looked like a real show-stopper to me in the beginning, but after a few nights of sleeping on this problem I found a really simple solution by moving up to a higher level, into Nebula3's asynchronous messaging system. Up there it was relatively easy to replace the multithreaded message handler code with a frame-callback system with minimal code changes.</div>
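The frame-callback replacement can be sketched like this (made-up names, not the actual Nebula3 messaging code): a message port that, instead of being drained by a dedicated handler thread, is pumped once per frame from the browser's animation callback on the main thread:

```cpp
#include <functional>
#include <queue>
#include <string>

// single-threaded stand-in for an asynchronous message port: messages are
// queued by the caller and handled in batches, once per frame
class MessagePort {
public:
    void Send(std::string msg) { pending.push(std::move(msg)); }

    // called once per frame from the browser's per-frame callback;
    // this replaces the render-thread's message handler loop
    void OnFrame(const std::function<void(const std::string&)>& handler) {
        while (!pending.empty()) {
            handler(pending.front());
            pending.pop();
        }
    }

private:
    std::queue<std::string> pending;
};
```

Because the messaging API was already asynchronous, callers can't tell whether their messages are handled by another thread or by the next frame callback, which is why the change stayed so small.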
<div>
<br /></div>
<div>
It sucks a bit right now that everything runs on the main thread (so the current N3 demo cannot take advantage of multiple CPU cores), but a solution for a higher level multithreaded API based on HTML5 webworkers is in the works right now (I really can't believe how fast these guys are!).</div>
<div>
<br /></div>
<div>
I'm tempted to write a lot more about how incredibly clever the C++-to-JS cross-compilation is, and how the generated JS code can be even faster than handwritten code, and how surprisingly compact the generated code is, but if you're getting all excited about this type of stuff it's better if you read it first hand: </div>
<div>
<br /></div>
<div>
<a href="https://github.com/kripken/emscripten/wiki" target="_blank">https://github.com/kripken/emscripten/wiki</a></div>
<div>
<br /></div>
<div>
So where next? I'll polish the emscripten port a bit more and implement support for the new web worker API to load and decompress asset files in the background, and then take a little detour and port the Dragons demo to Adobe's flascc (perfect timing, since they just went into open beta). After that I need to do some cleanup work on all three ports, and on the higher-level parts of the render pipeline, since the low-level rendering code has been replaced with the new "Twiggy" stuff.</div>
<div>
<br /></div>
<div>
In the meantime, here's another exciting development: RakNet has started supporting Google Native Client: <a href="http://www.jenkinssoftware.com/forum/index.php?topic=4980.0" target="_blank">http://www.jenkinssoftware.com/forum/index.php?topic=4980.0</a>.</div>
<div>
<br /></div>
<div>
All the pieces are slowly falling into place...</div>
<div>
<br /></div>
Unknownnoreply@blogger.com