20 Dec 2013

Asset loading in emscripten and PNaCl

Loading data from a file on disk doesn’t look like a big deal in a normal C application:

int main() {
    // open file for reading
    FILE* fh = fopen("filename", "rb");
    if (fh) {

        // read some bytes
        char buffer[128];
        fread(buffer, sizeof(buffer), 1, fh);

        // close the file
        fclose(fh);
        fh = 0;
    }
    return 0;   
}

When doing a real-world game this simple approach has a couple of problems:

  • blocking: The above code is blocking, when reading from a fast hard disk this is probably not even noticeable, but try loading from a DVD or Bluray disk or some sort of network drive over a slow connection and the game loop will stutter
  • hard-coded paths: The concept of a current directory is often not portable, you can’t depend on the current directory being set to where your executable is. It is better to establish an absolute root location and have all filename paths in the game relative to that (of course how to establish this root location is platform-dependent again, for instance get the absolute path to the executable, and go on from there)
  • can’t use different transfer protocols: the above code works fine for local filesystems, but not loading data from a web- or ftp-server, and operations like creating a new file, or randomly seeking in a file may not be available with other protocols.

It is a good idea to restrict the type of file operations that a game can use, e.g.:

  • do we really need write and create access? An offline game may need to write save-game files and options, while an online game probably doesn’t need access to the local file system at all.
  • do we really need random seek? Randomly seeking in a file can be either impossible (HTTP) or slow because some mechanical device must be moved around, it’s often better to read a file straight into memory and seek there or to avoid such operations at all.
  • do we really need to iterate directory content? again, this can be either expensive (mechanical storage device) or impossible (in plain HTTP for instance)
  • do we really need free-form file paths? Games usually need to access very few places in the file system (the asset directory which is usually read-only, and maybe some sort of per-user writable location for settings and save-games)
  • do we really need access to file attributes? Stuff like last modification time, ownership, readable/writable. Usually this is not needed.
  • do we really need the concept of a “current directory”? This can be tricky for portability, and some platforms don’t have the concept of a current working directory at all

That’s a lot of features we don’t need in a game and which are also often not provided by web-based runtime platforms like PNaCl and JS. It helps to look at the HTTP protocol for inspiration, since that is where we need to load our data from anyway in the web scenario:

  • file system paths become URLs
  • only one read operation GET, which usually provides an entire file (but can also load a part of a file)
  • no directory iteration
  • no “write access” unless specifically allowed by the server
  • state-less, no current directory or current read position
  • operations can take very long (seconds or even minutes)

For a game which wants to load its asset from the web the IO system should be designed around those restrictions.

As an example, here’s an overview of the Nebula3 IO system:

  • all paths are URLs: Not much to say about this :)
  • a single root location: At application start, a root location is established, this is usually a file:// URL pointing to the app’s installation directory, but can be overriden to point (for instance) to an http:// URL. Loading all data from a web server instead of the local hard disk is done with a single line of code which sets a different root location.
  • Amiga assigns as path aliases: A filesystem path to a texture looks like this in N3: tex:walls/brickwall.dds, where the tex: is an “AmigaOS assign” which is replaced with an absolute path, incorporating the root directory.
  • all paths are absolute: there is no concept of a “current directory” in Nebula3, instead all paths resolve to an absolute location at runtime by replacing assigns in the path.
  • pluggable “virtual filesystem” modules associated with the URL scheme: URLs starting with file:// are handled by a different file system module than http://, plus Nebula3 apps can plug in their own filesystem modules if they want
  • stream objects, stream readers and stream writers: this is interesting in the web context only because there’s a MemoryStream object which is used to store and transfer downloaded data in RAM
  • asynchronous IO is really simple: more on that later in this post :)

Since Nebula3 is also used as a command-line-tools framework, the IO subsystem is a bit of a hybrid, which in hindsight was a design fault. There are still all these writing and file creation operations, blocking IO, directory walking etc… which makes the API quite bloated. In a new engine I would probably strictly separate the two scenarios, use the engine as a game framework only, which only supports very simple asynchronous read operations, and write the tools with another framework (or even other language, like python).

Asynchronous IO in Nebula3

Let’s look at async IO in Nebula3 a bit closer since this is the most interesting feature for web-based platforms. This is based on the “non-blocking future” pattern (or whatever you wanna call it) and depends on a frame-driven instead of event- or callback-driven application architecture.

Here’s some pseudo code:

void StartLoading() {
    // To start loading data we need to create an 
    // IO request object and "send it off" to the
    // IoInterface singleton for asynchronous processing
    Ptr<IO::ReadStream> req = IO::ReadStream::Create();
    req->SetURI("tex:walls/brickwall.dds");
    IoInterface::Singleton()->Send(req);

    // The IoRequest is now "in flight" and will contain
    // a result at some point in the future. Because we need
    // to check for completion in some later frame we need to
    // store the smart pointer somewhere
    this->pendingRequest = req;

    // ok, we're done for this frame...
}

void HandlePendingRequest() {
    // this function must be called regularly (e.g. per
    // frame) to check whether the async loading operation
    // has finished
    if (this->pendingRequest.isvalid() &&
        this->pendingRequest->Handled()) {

        // ok, the request has been completed, if 
        // the file was loaded successfully we get
        // a MemoryStream object with its content
        if (this->pendingRequest->GetSuccess()) {

            // actually load the data from the memory
            // stream and throw the request object away,
            // since all file data is in memory, we can
            // actually use the normal open/seek/read/close
            // pattern on the stream object
            this->LoadFromStream(this->pendingRequest->GetStream());

            // delete the request object, 
            // remember, this is a smart pointer :)
            this->pendingRequest = 0;
        }
    }
}

There may be less verbose or more elegant versions of this code of course, but the basic idea is that you start loading a file in one frame, and then need to check in the following frames if loading has finished (or failed), and get the completely loaded data in a memory buffer which can be parsed with “traditional” read and seek functions (and which is very fast since everything happens in memory).

This implies that the engine needs to know what to do while some required data has not been loaded yet. For a graphics pipeline this is quite simple by either rendering nothing or some placeholder while the data is still loading.

But there are cases where the code cannot progress without important data being loaded, or where it would be very tricky or impossible to implement asynchronous IO (for instance when integrating complex 3rd party libraries like sqlite).

If we could simply block this wouldn’t be a problem: the worst thing that would happen is that our game loop would stutter, but on web platforms we cannot simply block the main thread (it is easier on PNaCl where it is recommended to move the game loop into a separate thread, which then can block waiting for the main thread to process asynchronous IO requests).

For Nebula3 I fixed this with an additional application object state called the “Preloading Phase”. The idea is that the app enters this state outside of the normal game loop (for instance while displaying a loading screen), and during this state, populates a simple in-memory filesystem (basically just a lookup-table with URLs as keys and MemoryStream objects as values) with the asynchronously loaded data. When all data has been loaded (or failed to load), the app will leave the preloading phase (and hide the loading screen) and synchronous loader code will transparently get the data from the in-memory file system instead of starting an actual asynchronous IO request. Since all this preloaded data resides in memory this means of course that only small data and few files should be preloaded, and the majority of data should be asynchronously streamed on demand during the game loop. It’s really only a workaround for the few cases where synchronous access is absolutely necessary.

More details about here in one of my presentations: http://www.slideshare.net/andreweissflog3/gdce2013-cpp-ontheweb

emscripten and PNaCl details

Ok, almost done!

For the emscripten and PNaCl platforms I basically wrote a simple Nebula3 filesystem module which fires HTTP GET requests through he respective emscripten and PNaCl API calls, and copies the received data into MemoryStream objects, it’s only a few hundred lines of code each.

The main difference between the two platforms lies in the use of threading:

  • PNaCl works like “traditional” platforms, there are a number of IO threads (about 10, but that’s tweakable) each of them processes one IO request at a time, so that as many IO requests can be in flight as there are IO threads. Those threads also directly handle processing of the received data like decompression.
  • In emscripten, the IO calls (sending a HTTP request, and the callback when the response has been received) is handled on the main thread, but the expensive processing (e.g. decompression) of the received data is handed over to a WebWorker pool (usually 4 WebWorker threads). There can still be multiple IO requests in flight because the IO system doesn’t “wait” for an IO request to finish before firing a new one (but it is still throttled to restrict the number of requests in flight in case a lot of requests arrive in a very short time period).

The actual code implementation is straightforward so I’ll spare you the source code samples. The respective class in PNaCl is called pp::URLLoader, and emscripten offers a whole set of rather specialized C functions which all start with emscripten_async_wget. Both fire an HTTP request (emscripten does an XmlHttpRequest, and PNaCl presumably under the hood as well - this has some unfortunate cross-domain implications), and invoke callbacks on failure or when data has arrived. PNaCl needs a bit more coding work since data is received in chunks (and the receive callback can be called multiple times), while emscripten waits until all data is received before calling the received-callback once.

emscripten has more options to integrate the data with the web page DOM (for instance it can automatically create DOM image objects from downloaded image files), and it also has a very advanced CRT IO emulation layer (so you actually can directly use fopen/fclose after the data has been downloaded or preloaded), but I haven’t looked into these advanced concepts very closely since Nebula3 already does a lot of this layering itself.

There’s a similar filesystem layer for NaCl called nacl-mounts, but similarly to emscripten I didn’t look into this very closely since the low-level URL loading functions were a better fit for N3.

That’s it for today, have a nice Christmas everyone :)

Written with StackEdit.