The Brain Dump: 09/13

7 Sept 2013

emscripten and PNaCl: App entry in PNaCl

The is the followup to last week's post about application entry in emscripten. If you haven't done yet I would recommend reading this first before continuing.

2 main points to keep in mind about the (P)NaCl platform:

Blocking the main thread will block the entire browser tab.
NaCl has true threading support which can be used to workaround these blocking limitations.

Point (1) is the same as on the emscripten platform, and point (2) is the big difference to emscripten.

In a Nebula3/PNaCl application, the main function looks the same as on any other platform (I'm using emscripten's "simulate_infinite_loop" approach now):

#include "myapplication.h"

ImplementNebulaApplication();

void
NebulaMain(const Util::CommandLineArgs& args)
{
    MyApplication app;
    app.SetCommandLineArgs(args);
    app.StartMainLoop();
}

However under the hood, the startup process until the NebulaMain() function is entered is completely different from other platforms, since PNaCl doesn't have a main() function. Instead PNaCl has the concept of application Module and Instance objects. This is where the plugin-nature of a PNaCl app shines through. There is a single Module object created on a web page containing a PNaCl app, and for each <embed> element on the page, one Instance object. In reality though, most of the time there will be exactly one Module and one Instance object, so the distinction doesn't really matter.

PNaCl offers two different startup APIs for C and C++. The C++ API is easier to grasp IMHO, so I'll just concentrate on this (this dual C/C++ nature continues through the whole NaCl API, there's a pure C API, extended by a slightly higher-level C++ API.

Hooking up your code to NaCl basically means to write 2 subclasses, one deriving from pp::Module, and one deriving from pp::Instance, and the NaCl runtime will then call into these classes through virtual methods for initialisation and notifying the application about events.

But first things first:

Everything starts at a global C Function called pp::CreateModule() which you must provide, and which must return a new object of your pp::Module subclass (called N3NaclModule in this case):

namespace pp
{
    Module* CreateModule()
    {
        return new N3NaclModule();
    };
}

Although this is the very first function that NaCl will call, you should be aware that initialisers in the global scope (static objects) will already be initialised and have had their constructors called at this point.

The main job of the derived Module class is to create Instance objects, but we can also put some one-time init code in there. There's a pair of functions to initialise and shutdown GL rendering called glInitializePPAPI() and glTerminatePPAPI(). The only rule is that no GL calls must be made outside these two functions, so I guess we could also put them somewhere else, as long as is guaranteed that they are not called multiple times.

But - the most important method in the derived Module class is the factory method for Instance objects called CreateInstance. In my case, I have created a subclass of pp::Instance called NACL::NACLBridge.

The entire N3NaclModule class looks like this:

class N3NaclModule : public pp::Module
{
public:
    virtual ~N3NaclModule()
    {
        glTerminatePPAPI();
    }
    virtual bool Init()
    {
        return glInitializePPAPI(get_browser_interface()) == 1;
    }
    virtual pp::Instance* CreateInstance(PP_Instance instance)
    {
        return new NACL::NACLBridge(instance);
    };
};

All the really interesting stuff from here on happens in the NACLBrigde object.

These two source snippets live inside the ImplementNebulaApplication() macro which all in all looks like this:

...
#elif __NACL__
#define ImplementNebulaApplication() \
class N3NaclModule : public pp::Module \
{ \
public: \
    virtual ~N3NaclModule() \
    { \
        glTerminatePPAPI(); \
    } \
    virtual bool Init() \
    { \
        return glInitializePPAPI(get_browser_interface()) == 1; \
    } \
    virtual pp::Instance* CreateInstance(PP_Instance instance) \
    { \
        return new NACL::NACLBridge(instance); \
    }; \
}; \
namespace pp \
{ \
    Module* CreateModule() \
    { \
        return new N3NaclModule(); \
    }; \
}
#elif __MACOS__
...

Now on to the NACLBridge class, this is (I know I'm repeating myself) derived from the pp::Instance class, but is called "Bridge" for a reason: in the PNaCl we're spawning a dedicated thread for the game loop, and leave the main thread (aka the Pepper thread) for event handling and rendering. Our derived pp::Instance subclass serves as a "bridge" between these 2 threads, that's why it's called NACLBridge.

The NaCl runtime will call into virtual methods of an pp::Instance object for handling events, the most important of these are Init(), DidChangeView(), HandleInputEvent(). For a complete overview and exhaustive documentation of those callback methods I recommend sifting directly through the SDK header: include/ppapi/cpp/instance.h

In the Init() method I'm only building a CommandLineArgs object from the provided raw arguments (these have been extracted from our <embed> element in the HTML page).

The actual initialisation work happens (in my case) in the first call to DidChangeView() by calling a Setup() method in the NACLBridge object. I choose this place because this is where I'm getting the current display dimensions of the <embed> element, which is required for the renderer initialisation (although now thinking about it, I might also be able to extract these from the arguments provided in the Init() method, need to try this out some time).

The NACLBridge::Setup() method only does one thing: create a thread with the NebulaMain() function as entry point, and then return to the NaCl runtime. The code inside NebulaMain() works just as on any other platform, with the only difference that it is not running on the main thread, but in its own dedicated game thread.

The big advantage to run the game loop in its own thread is that you "own the game loop", and you can perform blocking, for instance to wait for IO. The disadvantage is that you can't call any PPAPI (NaCl system functions) from the game thread, which is a blog-post-topic on its own.

So to recap: The ImplementNebulaApplication macro runs on the main thread, and creates one pp::Module and one pp::Instance object. The pp::Instance object creates the dedicated game thread, which calls into the NebulaMain() function, which from that moment on runs the game loop like on any other platform. With this approach we don't need to slice the game loop into frames like on the emscripten platform.

Now that you heroically worked your way through through all of this I'll tell you a secret: NaCl also provides a simple alternative to this complicated mess called the ppapi_simple library, which essentially provides a classic main() function running in its own thread, and because blocking is allowed on this thread, also provides normal POSIX fopen()/fclose() style blocking IO functions (sound familiar?).

Check out the header file include/ppapi_simple/ps.h as starting point.

Unfortunately this ppapi_simple library didn't exist when I started dabbling with NaCl about 2 years ago, certainly would have made life a lot easier. On the other hand, the work that had already gone into the NaCl port made the emscripten port easier, which wouldn't be the case had I used the ppapi_simple wrapper code.

Written with StackEdit.

1 Sept 2013

emscripten and PNaCl: App entry in emscripten

When quickly hacking a graphics demo on the PC or consoles, the main function usually looks like this:

int main() 
{
    if (Initialize()) 
    {
        while (!Finished()) 
        {
            Update();
            Render();
        }
        Cleanup();
    }
    return 0;
}

Trying this in on one of the browser platforms like emscripten or PNaCl results in a freeze and after a little while the browser will kill your tab :(

The problem is that the browser won't "let you own the game loop", and this is a general problem of event- or callback-driven platforms (iOS and Android have the same problem for instance). On such platforms the execution flow of the main thread is not controlled by your game code, instead there's some outer event loop which will call into your code from time to time. If you spent too much time in your allotted slice of the pie you will drag the entire system event loop down and other important events (such as input events) can't be handled fast enough. Result is that the entire user interface will feel sluggish and unresponsive to the user (for instance, scrolling in your browser tab will stutter or even freeze for multiple seconds). And if you don't return for about 30 seconds, then the browser will kill your app (Aw Snap!).

This is all bad user experience of course, we want the browser to remain responsive, and scrolling smooth all the time, also during initialisation and load time.

The core problem is that your code must always return within a few milliseconds back to the browser (e.g. 16 or 33, depending on whether you're aiming for 60 or 30fps), and this is the big riddle we need to solve for a game application running in a browser.

For a Flash or Javascript coder, or someone who's mainly writing event-driven UI applications this will all be familiar, they are used to have all their code run inside event handlers and callbacks, but typical UI apps usually don't need to do anything continuous. Event-driven applications sleep most of the time, react to (mostly input-) events from the outside, and go to sleep again. But games need to do continuous rendering, and thus are frame-driven, not event-driven, and mixing these two programming models isn't a very good idea because its hard to follow the code-flow. The usual way to implement games on event-driven platforms is to setup a timer which calls a per-frame callback function many times per second. I think hacks like this is why game programmers have a deep hatred for UI-centric platforms (and why I still like Windows despite its other shortcomings, because the recommended event handling model in Windows for games (PeekMessage -> TranslateMessage -> DispatchMessage) actually lets you "own the game loop" in a very simple and elegant way through message polling).

There are a few different approaches to either get a true continuous game loop, or at least to create the illusion of a continuous game loop on platforms where polling isn't possible, mainly depending on whether "true" pthreads-style multi-threading is supported or not.

In a Nebula3/emscripten application this isn't the case, the actual game loop and the rendering code runs on the main thread. Reason for this is that emscripten's multithreading support is built on WebWorkers. pthreads emulation isn't possible in emscripten since WebWorkers can't share memory with the main thread, furthermore, WebWorkers can't call into WebGL. This puts a lot of restrictions on our "game loop problem", and it required to refactor Nebula3's application model: in all previous ports there was always a way to somehow run a continuous game loop, mostly by moving the game loop into its own thread, but we don't have this option in emscripten (yet ... but hopefully one day, with more flexible WebWorkers).

Traditionally, a Nebula3 application used to go through a simple "Open -> Run -> Close -> Exit" sequence. An N3 main file looked like this for instance:

#include "myapplication.h"

ImplementNebulaApplication();

void
NebulaMain(const Util::CommandLineArgs& args)
{
    MyApplication app;
    app.SetCommandLineArgs(args);
    if (app.Open())
    {
        app.Run();
        app.Close();
    }
    app.Exit();
}

Instead of a main() function, there's a NebulaMain() wrapper function and a macro called ImplementNebulaApplication(). These hide the fact that not all platforms have a standard main() (for a Windows application, one would typically use WinMain() for instance).

The actual system main function is hidden inside the ImplementNebulaApplication() macro, for a PC-like platform the macro code looks like this:

int __cdecl main(int argc, const char** argv)
{
    Util::CommandLineArgs args(argc, argv);
    return NebulaMain(args);
}

Now back up to the NebulaMain() function's content: the Application::Open() method could take a while to execute (couple of seconds, worst case), and the Application::Run() will contain the "infinite" game loop, which only returns when the application should quit.

Since this wasn't a very good fit for the emscripten platform (because of this "infinite" loop inside the Run() method), first step was to make the app entry even more abstract to give the platform-specific code more wiggle room:

#include "myapplication.h"

ImplementNebulaApplication();

void
NebulaMain(const Util::CommandLineArgs& args)
{
    static MyApplication* app = new MyApplication();
    app->SetCommandLineArgs(args);
    app->StartMainLoop();
}

The most obvious change is that there's only a single StartMainLoop() method instead of the Open->Run->Close->Exit sequence. And at closer inspection some strange stuff is going on here: The application object is now created on the heap, the pointer to the object lives in the global scope, and the app object is never deleted. WTF?!?

To understand what's going on we need to dive a bit deeper into the emscripten system API.

The StartMainLoop function is actually only a one-liner on the emscripten platform:

emscripten_set_main_loop(OnPhasedFrame, 0, 0);

This sets the per-frame callback (called OnPhasedFrame) which the browser runtime will call regularly, and we'll have to do everything inside this callback function. The first 0-arg is the intended callback frequency per second (e.g. 60). 0 has a special meaning: in this case emscripten is using the modern requestAnimationFrame mechanism to call our per-frame function (instead of of the old-school setInterval or setTimeout way). The second argument is called simulateInfiniteLoop, and to understand what this does it is first necessary to understand what happens when it is not used:

The emscripten_set_main_loop() function will simply return, all the way up to main(), which will also return right after it has started! WTF indeed...

In a normal C program, returning from the main() function means that the program is shutting down of course. Local-scope objects will be destroyed before leaving main(), then global-scope objects (static initialisers).

In emscripten's case, a program which has called emscripten_set_main_loop() continues to run after main() has returned. This is a bit of a strange design decision, but makes for familiar looking code (e.g. hello_world.cpp is the same as on any other platform). Objects in the global scope will continue to exist in emscripten after main() returns, but objects in the local scope of main() will be destroyed, thus this strange way to create our application object, to prevent the app object from being destroyed after main() is left:

    static MyApplication* app = new MyApplication();

And now back to that simulate_infinite_loop argument: This is a new argument which was introduced after I started the Nebula3 emscripten port. Setting this argument to 1 will cause the emscripten_set_main_loop() function to not return to the caller, instead a Javascript exception will be thrown which essentially means that execution bails out of the C/C++ code without unwinding the (C/C++) stack, thus leaving local-scope objects of the main() function alive, everything after emscripten_set_main_loop() will never be called. So with this fix we could just as well write:

void
NebulaMain(const Util::CommandLineArgs& args)
{
    MyApplication app;
    app.SetCommandLineArgs(args);
    app.StartMainLoop();
}

Which looks a lot more friendly indeed.

So this basically covered emscripten's application startup process, we now have a per-frame function (called OnPhasedFrame) which will be called back at 60 fps. We just need to cram everything the application has to do into these 1/60sec time slices. This is fine for the actual game loop after everything has been loaded and initialised, but can be a problem for stuff like loading a new level, which could take a couple of seconds. In a traditional game, worst thing that could happen in this case is that the loading screen animation (if there is any) may stutter, but in a browser environment, such pauses will affect the entire browser tab (freezing, no scrolling, etc...), which makes a very bad first impression to the user.

So what to do? For Nebula3 I created a new Application base class called "PhasedApplication". Such a phased application goes through different life time phases (== states), such as:

Initial     -> app has just become alive
Preloading  -> currently preloading data
Opening     -> currently initializing
Running     -> currently running the game loop
Closing     -> currently shutting down
Quit        -> shutting down has finished

Each of these phases (or states) has an associated per-frame callback method (OnInitial, OnPreloading, OnOpening, etc...). The central per-frame callback will simply call into one of those methods based on the current phase/state. Each phase method invocation must return quickly (the browser's responsiveness depends on this), and may be called many times until the next phase is activated. So instead of doing a lot of stuff in a single frame, we do many small things across many frames.

Best example to illustrate this is the OnOpening() method. Suppose we need to do a lot of initialisation work during the apps Opening phase. Files need to be loaded, subsystems must be initialised and so on. This may take a couple of seconds. But the rule is that we must ideally return within 1/60sec, and we also don't have an independent render thread which could hide the main-thread freeze behind a smooth loading animation. So we need to do just a little bit of initialisation work, possibly update rendering of the loading screen, and return to the browser runtime. But since we haven't switched to the next state yet, OnOpening() will be called back again, and we can do the next piece of initialisation work. Sounds awkward of course, and it is, but there's not a lot we can do about it.

A new Javascript concept called generators could help to clean up this mess, with these it should be possible to chop a long sequence of actions into small slices while leaving the function context intact (essentially like a yield() function in a cooperative multithreading system) - catapulting Javascript into the illustrious company of Windows1.x and Classic MacOS. But enough with the ranting ;)

A somewhat cleaner method for long initialisation work is starting asynchronous actions through a WebWorker job in the first call to OnOpening() and during the next OnOpening calls check for all of those actions to have finished, gather the results, and finally switch to the next state, which would be Running. In the worst case, initialisation code must literally be chopped into little slices running on the main thread.

So that's it for this blog post. Originally I wanted to compare emscripten's and PNaCl startup process, but this would be way too much text for a single posts, so next will very likely be a similar walk through of the PNaCl application start, and after that the next big topic: how to handle asset loading.

Written with StackEdit.