21 Jun 2013

Sane C++

TL;DR: An attempt to outline the 'good parts' of C++ from my experience of porting Nebula3 to various platforms over the years. Some of it controversial.

Update: some explanation why the STL and C++11 are currently "forbidden", see below!


...is relatively famous for how easy it is to shoot yourself in the foot in many interesting ways. The types of bugs which are simply impossible in other languages are legion.
So then, why is C++ so damn popular in game development? One of the most important reasons (IMHO) is that C++ allows you to write both very high-level and very low-level code. If needed, you can have full control over the memory layout, when and how dynamic memory is allocated and freed, and how exactly memory is accessed. At the same time you can write very clean and high-level code with the right framework and not care about memory management at all.
The significance of low-level programming in particular (e.g. controlling the exact memory layout of your data) is often ignored by other, higher-level languages, even though it can have a dramatic effect on performance.
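To make "controlling the exact memory layout" a bit more concrete, here is a minimal sketch of the same particle data stored in two different layouts. The types and the UpdateAges() function are made up for illustration and are not taken from Nebula3.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example types, invented for this sketch.

// Array-of-structs: the fields of each particle are interleaved in memory.
struct ParticleAoS {
    float x, y, z;
    float age;
};

// Struct-of-arrays: each field is stored contiguously, so a loop that only
// touches 'age' streams through memory without stepping over positions.
const std::size_t MaxParticles = 1024;

struct ParticlesSoA {
    float x[MaxParticles];
    float y[MaxParticles];
    float z[MaxParticles];
    float age[MaxParticles];
};

// Advance the age of the first 'count' particles by dt.
// Only the 'age' array is read and written.
inline void UpdateAges(ParticlesSoA& p, std::size_t count, float dt) {
    assert(count <= MaxParticles);
    for (std::size_t i = 0; i < count; ++i) {
        p.age[i] += dt;
    }
}
```

Which layout is faster depends on the access pattern, so this is something to measure rather than assume.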
One of the most common C++ newbie errors is to tackle a big software project without a proper high-level "toolbox". C++ doesn't come with a luxurious standard framework like all those fancy-pancy modern languages.
And with only hello_world.cpp under their belt, newbies quickly end up with this typical mess of object ownership problems, spaghetti-inheritance, seg-faults, memory leaks and lots of redundant code all over the place after just a few tens of thousands of lines of code.
On the other hand, it is incredibly easy to write really slow code in a high-level environment since you don't really know (or need to care) what's going on under all those layers of convenience.
The most important rule when diving into C++ is: Know when to write high-level and when to write low-level code, these are completely different beasts!
So what's the difference between high-level and low-level C++ code? I think there's no clear-cut separation line, but a good rule of thumb is: if it needs to run a few thousand times per frame, it better be really well optimised low-level code!
  • If you look at a typical rendering pipeline, there's this typical cascade where every stage in the pipeline is executed at least an order of magnitude more often than the previous one: outermost there's stuff that happens only once per frame, next code is executed once per graphics object, then once per bone/joint, then per vertex, and finally per pixel. The realm of low-level code starts somewhere between per-object and per-bone (IMHO).
  • Typical high-level code to me is "game play logic". This is also where thinking object-oriented still makes the most sense (as opposed to a more data-oriented approach). You have a couple of "game objects" which need to interact with each other in fairly complex ways. On this level you don't want to think about object ownership or memory layout, and high-level concepts like events, delegates, properties etc... start to make sense. Shit starts to hit the fan when you have thousands of such game objects.
  • It is of course desirable to get the performance advantages of low-level code combined with the simplicity and convenience of high-level code. This is basically the holy grail of games programming. Hiding complex or complicated code under simple interfaces is a good start.
Ok, so before I drift completely into the metaphysical, here's a simple check-list:

Forbidden C++:

This stuff is completely forbidden in our coding-style:
  • exceptions
  • RTTI
  • STL
  • multiple inheritance
  • iostream
  • C++11
That's right, we're not using C++ exceptions, RTTI, multiple inheritance or the STL. C++11 is pretty cool, but still too fresh. Most of these restrictions will make your multiplatform-life a lot easier (and not much of importance is lost IMHO).

Update: I should have explained why the STL and C++11 are on this list. First the STL: Historically the STL came with a lot of problems because quality differed a lot between compilers, porting to non-PC platforms was difficult if your code depended on the STL, and I am reluctant to include more complex dependencies into the engine (like boost for example). Today STL implementations are much better, so on most platforms this is probably no longer an issue.

Personally, I think the STL is an ugly library, *at least* the container classes. You have to admire its orthogonality and flexibility, but in reality a project only ever needs 3 or 4 specialisations. What we did was write a handful of container classes (Array, Dictionary, Queue, Stack, List) in the spirit of C#'s container classes (those are probably not as flexible as STL containers, but they do look nicer, and the generated code should be the same in most cases). Beautiful looking source code is important I think. This may all change with C++11 though. C++11 is extremely cool, but I think it is still too early to jump on if we need to cover a lot of platforms. But C++11 together with the STL is much more powerful than either of the two alone, so I will very likely revise my stance on the STL once we switch to C++11.

But I think this switch should be done throughout the entire engine (starting at the core with the new move semantics which are really useful for containers, to the new threading support, lambdas, function objects and so on), so switching to C++11 will involve a major rewrite of Nebula3, and may even justify a major version number switch. I think it doesn't make sense to sprinkle bits and pieces of C++11 and STL here and there into the code.
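As a rough illustration of what a container "in the spirit of C#" might look like, here is a minimal dynamic array sketch. This is not Nebula3's actual Array class: the method names, the growth policy and the requirement that T is default-constructible and assignable are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C#-flavoured dynamic array, written without STL or C++11.
template<class T>
class Array {
public:
    Array() : items(0), size(0), capacity(0) {}
    ~Array() { delete[] items; }

    // Append a copy of 'item', growing the storage when it is full.
    void Append(const T& item) {
        if (size == capacity) {
            Grow();
        }
        items[size++] = item;
    }

    std::size_t Size() const { return size; }
    bool IsEmpty() const { return size == 0; }

    // Element access with a boundary check.
    T& operator[](std::size_t index) {
        assert(index < size);
        return items[index];
    }
    const T& operator[](std::size_t index) const {
        assert(index < size);
        return items[index];
    }

private:
    // Copying is disabled to keep the sketch short (pre-C++11 idiom).
    Array(const Array&);
    Array& operator=(const Array&);

    // Double the capacity (starting at 4) and copy the old elements over.
    void Grow() {
        std::size_t newCapacity = (capacity == 0) ? 4 : capacity * 2;
        T* newItems = new T[newCapacity];
        for (std::size_t i = 0; i < size; ++i) {
            newItems[i] = items[i];
        }
        delete[] items;
        items = newItems;
        capacity = newCapacity;
    }

    T* items;
    std::size_t size;
    std::size_t capacity;
};
```

Usage then reads close to the C# equivalent: create an Array<int>, call Append() a few times, and index it with [].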

Tolerated C++:

Use with care, don't go crazy:
  • templates
  • operator overloading
  • new/delete
  • virtual methods
Templates are very powerful, they can make your code both more readable, AND faster because more type information is known at compile time. But you really need to keep an eye on the generated code size. Don't nest them too deeply, and keep it simple.
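A small sketch of the "more type information at compile time" point: a fixed-capacity array whose size is a template parameter. FixedArray is a made-up name for this example, not an engine class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity array. N is part of the type, so there is no
// heap allocation and every loop bound is a compile-time constant.
template<class T, std::size_t N>
class FixedArray {
public:
    std::size_t Size() const { return N; }

    T& operator[](std::size_t index) {
        assert(index < N);
        return items[index];
    }
    const T& operator[](std::size_t index) const {
        assert(index < N);
        return items[index];
    }

    // Set every element to 'value'.
    void Fill(const T& value) {
        for (std::size_t i = 0; i < N; ++i) {
            items[i] = value;
        }
    }

private:
    T items[N];
};
```

The flip side is the code size mentioned above: every distinct combination of T and N that gets used is a separate instantiation.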
Operator overloading is restricted to very few places (containers and items in containers). We're NOT having operator overloading in our math library. dot(vec,vec) is much more readable than vec*vec.
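For illustration, a sketch of the named-function style with a minimal, made-up Vec3 type (this is not the engine's math library):

```cpp
#include <cassert>

// Hypothetical minimal vector type for this sketch.
struct Vec3 {
    float x, y, z;
};

// Named functions instead of an overloaded operator*: the call site says
// which product is meant.
inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline Vec3 cross(const Vec3& a, const Vec3& b) {
    Vec3 r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    return r;
}

// Usage: the x and y axes are perpendicular, and their cross product is z.
inline void VectorProductDemo() {
    Vec3 xAxis = { 1.0f, 0.0f, 0.0f };
    Vec3 yAxis = { 0.0f, 1.0f, 0.0f };
    Vec3 zAxis = cross(xAxis, yAxis);
    assert(dot(xAxis, yAxis) == 0.0f);
    assert(zAxis.z == 1.0f);
}
```

With vec*vec the reader has to look up whether the operator means dot product, cross product or component-wise multiplication.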
Not using new/delete in C++ code sounds a bit crazy, I know. But most of the time where you need to create an object on the heap you'll also want to hand its pointer somewhere else, which quickly introduces ownership problems. That's why we're using smart pointers to heap objects which hide the delete call. And since a new without its delete looks a bit silly, we're also hiding the new behind a static Create() method. It's better to avoid heap objects altogether though, especially in low-level code.
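The pattern described above could be sketched roughly like this. It is a simplified, single-threaded illustration with intrusive reference counting; RefCounted, Ptr and Texture are names assumed for the example, and the real engine classes will differ in detail.

```cpp
#include <cassert>

// Hypothetical base class carrying the reference count.
class RefCounted {
public:
    RefCounted() : refCount(0) {}
    void AddRef() { ++refCount; }
    void Release() {
        assert(refCount > 0);
        if (--refCount == 0) {
            delete this;   // the only delete call, hidden from user code
        }
    }
    int GetRefCount() const { return refCount; }
protected:
    virtual ~RefCounted() {}
private:
    RefCounted(const RefCounted&);
    RefCounted& operator=(const RefCounted&);
    int refCount;
};

// Hypothetical smart pointer to a RefCounted-derived object.
template<class T>
class Ptr {
public:
    Ptr() : ptr(0) {}
    explicit Ptr(T* p) : ptr(p) { if (ptr) ptr->AddRef(); }
    Ptr(const Ptr& rhs) : ptr(rhs.ptr) { if (ptr) ptr->AddRef(); }
    ~Ptr() { if (ptr) ptr->Release(); }

    Ptr& operator=(const Ptr& rhs) {
        if (rhs.ptr) rhs.ptr->AddRef();   // add first: safe for self-assignment
        if (ptr) ptr->Release();
        ptr = rhs.ptr;
        return *this;
    }

    // Access asserts on a null pointer instead of crashing somewhere later.
    T* operator->() const {
        assert(ptr != 0);
        return ptr;
    }
    bool IsValid() const { return ptr != 0; }

private:
    T* ptr;
};

// Example heap object: the constructor is private, so the only way to get
// one is through Create(), which hides the new.
class Texture : public RefCounted {
public:
    static Ptr<Texture> Create() { return Ptr<Texture>(new Texture); }
    int width;
private:
    Texture() : width(0) {}
};
```

User code then only ever writes Ptr<Texture> tex = Texture::Create(); and the object is destroyed when the last Ptr referencing it goes away.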
Virtual methods are important of course, BUT: Just spend a second to think about whether a method really must be virtual (or more importantly: do you really need run-time polymorphism, or is compile-time polymorphism enough?). The more "static" your code is, the more optimisation options the compiler has.
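A minimal sketch of the two options, with made-up shape types: the virtual version dispatches through a vtable at run time, while the template version is resolved at compile time and can be inlined.

```cpp
#include <cassert>

// Run-time polymorphism: the call goes through the vtable.
class Shape {
public:
    virtual ~Shape() {}
    virtual float Area() const = 0;
};

class VirtualRect : public Shape {
public:
    VirtualRect(float w, float h) : width(w), height(h) {}
    virtual float Area() const { return width * height; }
private:
    float width, height;
};

// Compile-time polymorphism: any type with an Area() method works, and the
// compiler knows the exact type at each call site.
struct PlainRect {
    float width, height;
    float Area() const { return width * height; }
};

template<class SHAPE>
float SumAreas(const SHAPE* shapes, int count) {
    assert(count >= 0);
    float sum = 0.0f;
    for (int i = 0; i < count; ++i) {
        sum += shapes[i].Area();
    }
    return sum;
}
```

If the set of types is known when the code is compiled, the template version is usually enough; the virtual version earns its keep when the concrete type is only known at run time.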

Forbidden C:

Some unusual stuff here as well:
  • all CRT functions like fopen() or strcmp() are forbidden, except the math.h functions
  • directly calling malloc()/free() is forbidden
Most of the CRT functions are straight out terrible (strpbrk, strtok, ...) and/or dangerous (strcpy), so we're wrapping them all away and/or using better platform-specific functions under the hood (this can also reduce executable size, which is always good).
Overriding malloc/free with central wrapper functions is really useful once you need to do memory-debugging and -profiling, and it also makes it easier to try out different memory allocator libs.
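A central allocation wrapper can be as small as the following sketch. The Memory namespace and its function names are assumptions for this example, and the live-allocation counter is not thread-safe; a real version would need atomics or a lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical central allocation wrappers; the only place in the code base
// that is allowed to call malloc()/free() directly.
namespace Memory {

// Number of allocations that have not been freed yet.
inline std::size_t& LiveAllocs() {
    static std::size_t count = 0;
    return count;
}

inline void* Alloc(std::size_t size) {
    assert(size > 0);
    void* p = std::malloc(size);
    assert(p != 0);   // out of memory is treated as fatal in this sketch
    ++LiveAllocs();
    return p;
}

inline void Free(void* p) {
    if (p != 0) {
        assert(LiveAllocs() > 0);
        --LiveAllocs();
        std::free(p);
    }
}

} // namespace Memory
```

Because every allocation passes through these two functions, swapping in another allocator library or adding leak tracking is a local change.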

Tolerated C:

Some "dangerous" stuff is only allowed in performance-critical low-level code:
  • raw pointers and pointer arithmetic
  • raw C arrays
  • raw memory buffers
These are all recipes for disaster in the hands of an inexperienced programmer (or an experienced programmer who needs to juggle too many things in his head). Instead of pointers, use smart-pointers to refcounted objects (see above), or indices into containers. Instead of raw arrays use containers. Never directly allocate and access memory buffers in high-level code.
All of these "dangerous techniques" are essential for really performance-critical low-level code though. This is only the case in a handful of places in the code, and when the really mysterious kind of crashes happen, at least you know where to look.
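As a small sketch of the "indices instead of pointers" idea, here is a joint hierarchy where each joint refers to its parent by index. The Joint struct and ChainLength() are invented for this example.

```cpp
#include <cassert>

// Hypothetical skeleton joint: the parent is referenced by its index in the
// joint array, with -1 marking the root.
struct Joint {
    int parentIndex;
    float length;
};

// Sum the bone lengths from joint 'index' up to the root.
// Every index is range-checked before it is used.
inline float ChainLength(const Joint* joints, int numJoints, int index) {
    float total = 0.0f;
    while (index != -1) {
        assert(index >= 0 && index < numJoints);
        total += joints[index].length;
        index = joints[index].parentIndex;
    }
    return total;
}
```

Unlike a stored pointer, an index stays valid when the array is copied or moved to a different address, and a bad index can be caught by a simple range check.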

The End

One last point: our code is riddled with asserts which are also enabled in release mode (hardly makes a performance difference, but the uncompressed executable size is up to 20% larger because of the expression strings, thankfully those strings compress very well).
The essential, must-have assert checks are for invalid smart pointer accesses (null pointers), boundary checks in container classes and checking for valid method parameters.
With all of the above, we're rarely ever hitting a seg-fault (maybe twice a year on the server-side). If something breaks, then it is very likely an assertion check which got hit, and this is usually very easy to post-mortem-debug since it comes with a call-stack and method signature.
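An assert that stays enabled in release builds could look roughly like the sketch below. SANE_ASSERT and AssertFail are hypothetical names for this example; unlike the standard assert macro, this one is not compiled out when NDEBUG is defined.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical failure handler: report the expression and location, then stop.
inline void AssertFail(const char* expr, const char* file, int line) {
    std::fprintf(stderr, "*** ASSERTION FAILED: %s (%s:%d)\n", expr, file, line);
    std::abort();
}

// Always-on assert. The stringified expression (#expr) is what makes the
// executable larger, as mentioned above.
#define SANE_ASSERT(expr) \
    do { if (!(expr)) { AssertFail(#expr, __FILE__, __LINE__); } } while (0)

// Example of the "must-have" checks: valid parameters and boundary checks.
inline int CheckedGet(const int* values, int size, int index) {
    SANE_ASSERT(values != 0);
    SANE_ASSERT(index >= 0 && index < size);
    return values[index];
}
```

A production version would typically also capture a call stack in the failure handler, which is what makes post-mortem debugging straightforward.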


Andreas Bergmeier said...

With what you describe, you are a perfect candidate to use Qt. Which is mostly what you seem to use: C with classes.
While I can accept the opinion, I would rather like to know what the actual arguments (not opinions) are, why you are holding back on C++11.

Igal said...

I would also like to know what the reasons are behind not using the C++11 standard. It seems both GCC and Clang fully support the new standard.

Is Visual Studio the reason for that? I see it still does not support all C++11 features.

Andre Weissflog said...

Just want to let the dust settle a bit until I go into C++11, I haven't made experiments yet how well it is supported on various platforms. Visual Studio is of course important, and some other platforms which come with their own modified compilers, like flascc or Native Client, or even Linux systems not running the latest distributions, are often behind a few versions. So I'm a bit conservative about switching to new compiler features ;)

Andre Weissflog said...

I'm actually a big Qt fan, I think it does many things right, at least for a high-level C++ framework.

Igal said...

Thanks for this clarification.

kripken said...

Do you do anything for smart pointer cycles? Do you just avoid them, or do you use tools in some way?

Andre Weissflog said...

we're having a weakptr to break the cycle, but in general we just try to avoid them

Nils said...

I wonder why you forbid the STL? What are you using for strings, containers, etc your own lib or Qt or something else?

Andre Weissflog said...

Nils: I updated the post with some more explanations. We basically wrote our own containers and string classes. The string class has gone through many changes over time, and I'm still not happy (also not with the STL string class). We have a StringAtom class which has become more and more important (relatively slow to create, but extremely fast to compare and copy, since these are just pointer operations). The String class itself will become immutable in the future, and all the expensive manipulation functions will go into some sort of StringBuilder class.

Nils said...

Thx for the quick update! About multiple inheritance: Are you using some kind of interfaces? Like classes with only pure virtual functions. In that case do you allow the implementation of an interface and inheritance from a base class? In C++ this would technically result in multiple inheritance.

Andre Weissflog said...

Nils: We don't use an interface-like approach anywhere in N3, although I like the concept in other languages. The way multiple inheritance is implemented in C++ is just too weird, at least combined with virtual methods. In general, we don't rely very much on inheritance to extend existing behaviour, we have a few places with deep inheritance chains (maybe 3..4 deep), but those have always been sore spots. Nowadays I prefer composition and simple template classes over inheritance in most cases, these also seem to cause fewer problems in the long run (refactoring etc...)

mikaelhc said...

I think operator overloading makes math libraries easier to read/use. For ambiguous operators nobody forces you to overload: I'd use cross(...) and dot(...) for vector products, only overloading * for scalar multiplication - following the syntax of e.g. GLSL/HLSL.

Unknown said...

Just tested demos on Android (WebGL should be enabled in about:flags). Only problem I see is wrong terrain textures in Drakensang map viewer but everything else works as it should.
Great work!

Bruce said...

Emscripten is fine with C++11 ... I updated the libcxx that it uses to a much more current one and fixed a lot of bugs in the process. :) It'll be even better once emscripten moves to LLVM / Clang 3.3.

There were some fun things with the Windows platform, but those are addressed as well.

I've no idea about flascc, but it probably wouldn't be hard to get Native Client to a good place if it isn't already.

But I'd be conservative about it as well due to older Linux distributions, I doubt Nintendo is up to date, and Visual Studio still needs a pretty current version to be solid.

Larry said...

"Beautiful looking source code is important I think". I've always been afraid to say this out loud, now I can.