Know your assembly (part 5)

The other day I was looking at a crash dump for a friend. A discussion that followed made me realize it might be worth to write a short post explaining why sometimes 2 seemingly almost identical function calls behave very differently. Consider the following code snippet (simplified): 1struct Foo 2{ 3 __declspec(noinline) const int& GetX() const { return x; } 4 virtual const int& GetY() const { return y; } 5 6 int x, y; 7}; 8 9int Cat(int); 10int Lol(Foo* f) 11{ 12 const int& x = f->GetX(); 13 const int& y = f->GetY(); // *** 14 15 return Cat(x+y); 16}

Simple multithreading tricks

Today I’d like to share a simple multithreading trick you can use to minimize “bubbles” in your pipeline. “Simple” because it applies mostly to “oldschool” threading systems, ie. the ones with main thread concept that kicks jobs and eventually flushes them. Cool kids using proper task graphs where everything just flows beautifully should not need it. Imagine we have a simple scenario, our thread produces some work, continues with whatever it’s doing, eventually waits for the job to finish (ideally this overlaps the work from previous point, so not much to do here) and processes results.

Zig pathtracer

If you’ve been reading this blog for a bit, you might know that when I experiment with new programming languages, I like to write a simple pathtracer, to get a better “feel”. It’s very much based on smallpt (99-line pathtracer in C++), but I do not want to make it as short as possible. I am more interested in how easily I can get it to run and what language ‘features’ I can use.

A debugger barrier

I’ve recently been asked by a friend for a little help with debugging a problem he was running into. Occasionally the program would freeze while trying to process a chunk of data and never moved on to the next one. Application is heavily threaded and processing is done by thread B, while thread A does its own job and periodically checks if work has been finished. If so, it sends it for further transformations and queues more work for thread B.

Grafana for dummies

Grafana is a very popular “analytics platform” or in more professional terms - a system to create pretty graphs. It’s very popular for monitoring system metrics, but really can be used for any timeseries data. It supports plethora of data sources and there is a decent chance you can use one of the off-the-shelf solutions to do 99% of the work for you (for example for some basic system metrics, especially on Linux).

Microcorruption writeup

Microcorruption is my ongoing “distraction” – it’s an online CTF. I’m way late to the party and have been doing it on and off since… 2013. How does it work exactly? To use their own description: “tl;dr: Given a debugger and a device, find an input that unlocks it. Solve the level with that input. You’ve been given access to a device that controls a lock. Your job: defeat the lock by exploiting bugs in the device’s code.

API granularity

I’d be the first to admit I don’t have much experience designing public APIs. I typically work on code that’s fairly specific to the game we’re making and while some of it is expected to be reused, our potential user pool is very limited, we’re still talking just one team, so <40 people and definitely not thousands. I’m still successfully using some little utilities/helpers I designed/coded 10+ years ago, but every now and then I run into a situation where decisions taken back then catch up with me and force to rewrite the API.

Ranged based "for" story

Consider the following, seemingly innocent (and completely made up) fragment of code: typedef std::map<int, std::string> MyMap; void foo(const MyMap& m) { for(const std::pair<int, std::string>& i : m) { if(i.first) { printf("%s\n", i.second.c_str()); } } } Looks simple enough, right? Just iterate over all elements of the map and print the value for non-zero keys. We use range-based for construct, use references, so do not expect any copies. Let’s just quickly make sure it all works as expected and consult Compiler Explorer.

Two Stage Push & Pop

I probably mentioned this before, but SPSC (single producer, single consumer) queue is one of my favorite structures. If implemented correctly, it’s actually 100% lock-free and is also surprisingly versatile (not all problems require MPMC!). I typically use a slightly modified version of this or an unbounded version, based on code by Dmitry Vyukov. Both implementations are very simple and possibly lack some of the modern C++ bells and whistles, like emplace, but these should not be hard to add.

Circular buffers (to the rescue)

Circular buffers are one (if not the) of my favorite data structures for some quick&dirty debugging. A simple, not production ready version can be implemented in a few lines of code (not ashamed to admit, I usually just copy paste these and remove when I’m done) and they’re a great tool to “record” a history of generic events. Any time I run into a seemingly random/unpredictable issue that might take a long time to repro, they’re on my short list.

Vanishing warning

Yet another MSVC story. Visual Studio has a nice compile-time warning when trying to access a static array with invalid index - C4789. According to documentation it’s mostly meant for various ‘copy’ functions (memcpy/strcpy etc), but it seems to work on ‘simple’ accesses as well. Consider (here’s a Godbolt link): struct Tab { float tab[2]; }; void Foo(const Tab&); void Bar(float forward) { Tab tab;[0] = forward;[2] = forward; // OOB access!

Compilers are smart

I recently transitioned to Visual Studio 2017 and while it went relatively painless, the new and improved optimizer uncovered some subtle issues lurking in the code (to be fair, Clang/GCC has been behaving same way for a long time now). The code in question was actually quite ancient and originated from this Devmaster forum post (gone now, but found if using The Wayback Machine): Fast and accurate sine/cosine. To be more precise, it was this version with ‘fast wrapping’:

Neural networks and the stock market

Machine learning and neural networks seem to be all the rage recently. I was sifting through my old drives the other day and found my master thesis. It’s actually vaguely related, subject was Neural Networks for Stock Market Forecasting. Complete thesis can be downloaded here, but comes with 2 major caveats: it’s 100% in Polish (so actual title is Sztuczne sieci neuronowe w prognozowaniu kursów giełdowych) it’s over 15 years old Sadly, I could not find an accompanying MATLAB program I wrote.

Who ate my stack

The Old New Thing is one of my favorite blogs. It’s a collection of Windows development anecdotes, but every now and then Raymond will post a gnarly debugging/crash story. I’ve recently found some of my old notes related to a crash I was chasing in a third-party library, it reminded me a little bit of The Old New Thing and I decided to try something similar. The whole thing took place a few years ago, we were using some super early versions of a certain vendor library (no sources).

Cache effects, illustrated

Few years ago I published an article about love and care for your cache. Every now and then I receive emails asking for clarification or some extra question. It seems like the basic rules are easy enought, but once you add multiple cores to the mix, it might get a little bit confusing. If we only have one core everything is fairly simple. There’s just 1 cache, which is essentially a variant of hash-table with limited number of slots.

A Halloween Story - get_temporary_buffer

A little bit late, as Halloween was yesterday, but I think std::get_temporary_buffer is scary enough to qualify. A co-worker called me today to show me an ‘interesting’ crash. It was deep in guts of the STL, more specifically in the std::_Merge method. Corresponding C++ line seemed innocent: *_Dest++ = _Move(*_First1++); Nothing very interesting, not our code and yet it was crashing with: Exception thrown at 0x01293C05 in foo.exe: 0xC0000005: Access violation reading location 0xFFFFFFFF

Who crashed?

Last week I was investigating a crash originating somewhere in a code that looked like this (GPF in this C++ line): obj->GetFoo()->GetBar().Call(player->GetCat(), this, &MyType::SomeFunc, moreArgs We could discuss the number of indirections or the fact that if this code had been split into multiple lines it’d be obvious, but that’s not the main point here. I didn’t have this luxury, I had to find out which pointer exactly was NULL here.

My check-in procedure

A few weeks ago we had a discussion a forum discussion. Question was - what tips did we have for junior developers (not only programmers, all specialties). My #1 tip was: always diff before checking in your changes! I still stand by it, but thought I’d elaborate a little bit and give a quick overview of how I handle my commits these days. Not a rocket science for sure, but the presentation (and hints) were aimed at people just starting in the industry.

Choices and consequences (part 2)

I’ve been doing some minor refactoring recently and - once again - it got my thinking how seemingly tiny C++ changes can affect generated assembly code (and performance). It was a fairly simple scenario, had a collection of ~60 small items identified by an unique ID. ID was a 32-bit number, but realistically the range seemed to be limited to 0-5000 (although I couldn’t rely on it staying like that forever).

Dynamic initializers strike back

Back in the day I wrote about the ‘dynamic initializers’ problem. Basically, older versions of MSVC (up to 2012, not sure about 2013, seems better in 2015) had problems with static const floats that depended on other static const floats. Values were not calculated compile-time, there was actually a short function generated and it’d do it. The immediate problem is a code bloat (if our constant is placed in a global header), but the other potential issue stems from the fact that these ‘dynamic initializer’ functions respect the optimization settings (/fp:fast).

Wordpress to Hugo

I’ve been planning on converting my blog to some static site generator for a while now, but the (perceived) amount of work involved seemed scary, so I kept coming up with reasons not to do it. I really like the idea of static sites. Using gamedev jargon, this basically means we “precompute” as much as possible. Instead of retrieving posts from the database and building pages on the fly (like WP), we do it all offline and then push static HTML files online.

Elo vs Glicko

If you’ve ever done any matchmaking coding (…or played chess) you’re most likely familiar with the Elo rating system (PSA: it’s Elo, not ELO, it’s not an acronym, it’s named after a person - Arpad Elo). It’s a rating system used by most chess federations (most notably, FIDE) and has been adapted by many video games. It’s trivial to implement and get up and running, but suffers from a few issues:

DD2016 - video

A video from my DD2016 talk has been uploaded and can be viewed here:


I’ve been debugging a rare memory corruption bug recently and - as usual - it turned out to be an exercise in frustration. This time only part of that was because of the bug itself, the other was because methods I chose were not very helpful (in my defense, it’s been a while, so I was a little bit rusty). The bug itself was fairly boring - sometimes, when leaving a certain UI screen game would crash.

Know your assembly (part N)

An entertaining head-scratcher from today. Application has suddenly started crashing at launch, seemingly inside steam_api.dll. It’d be easy to blame external code, but as mentioned - it’s a new thing, so most likely related to our changes. To make things more interesting, it’s only 32-bit build that crashes, 64-bit seems fine. High-level code looks like (simplified): 1struct tester 2{ 3 virtual bool foo(); 4 virtual void bar() 5 { 6 } 7 virtual void lol() 8 { 9 if(!

Instrumenting crash dumps

I’ve been planning to write a post about debugging multiplayer games (post-mortem) for a while now, but it keeps getting bigger and bigger, so instead of waiting until I can get enough time to finish it, I thought it’d be easier to share some quick’n’easy tricks first. I’d like to show a simple way of “instrumenting” a crash dump so that it gives us more info about the crime scene. Let’s assume we’ve received a crash from the following piece of code (it’s actually very close to a real-life scenario I encountered).

NaN memories

A nice thing about Twitter is that single tweet can bring so many memories. It all started when Fiora posted this: This reminds me of how HLSL very carefully defines "saturate" in a way that makes NaNs turn into 0: @rygorous — Fiora (@FioraAeterna) March 11, 2015 It reminded me of an old bug I encountered few years ago when working on a multi-platform (PC/X360/PS3) title. One day we started getting strange bug reports.

MESIng with cache

(Please excuse the terrible pun, couldn’t help myself). As we all know, computer cache is a touchy beast, seemingly little modifications to the code can result in major performance changes. I’ve been playing with performance monitoring counters recently (using Agner Fog’s library I mentioned before). I was mostly interested in testing how cmpxchg instruction behaves under the hood, but wanted to share some other tidbits as well. Let’s assume we’re working with a simple spinlock code.

Rust pathtracer

Last year I briefly described my adventure with writing apathtracer in the Go language. This year, I decided to give Rust a try. It’s almost exact 1:1 port of my Go version, so I’ll spare you the details, without further ado - here’s a short list of observations and comparisons. As previously, please remember this is written from position of someone who didn’t know anything about the language 2 weeks ago and still is a complete newbie (feel free to point out my mistakes!

Hidden danger of the BSF instruction

Had a very interesting debugging session recently. I’ve been investigating some of external crash reports, all I had was a crash dump related to a fairly innocent-looking piece of code. Here’s a minimal snippet demonstrating the problem: struct Tree { void* items[32]; }; #pragma intrinsic(_BitScanForward) __declspec(noinline) void* Grab(Tree* t, unsigned int n) { unsigned int j = 0; _BitScanForward((unsigned long *)&j, n); return t->items[j]; } Easy enough, right? Seemingly, nothing can go wrong here.

Z-Machine interpreter in Go

Recently, I had an inspiring discussion with fellow programmers, we were talking about interesting side projects/programs to quickly “try out” new programming language/job interview tasks. One that’s been mentioned was coding a Z-machine interpreter that’s capable of playing Zork I. The Z-machine is a virtual machine developed by Joel Berez and Marc Blank, used for numerous Infocom text adventure games, most notably the Zork series. In all honesty, I’m probably a few years too young so didn’t get to play Zork when it was big (I did play old Sierra adventures back when you actually had to type commands, though, one of the the reasons I started to learn English was Police Quest I.

A Byte Too Far

A short cautionary tale from today. I’ve been modifying some code and one of the changes I made was to use a type of Lol as a key in a map-like structure (key-pair container, uses < operator for comparisons). Structure itself looked like: 1struct Lol 2{ 3 byte tab[16]; 4 short cat; 5 bool foo; 6}; …and here’s the operator< 1bool Lol::operator<(const Lol& other) const 2{ 3 return(memcmp(this, &other, sizeof(other)) < 0); 4} The problem was - it seemed like sometimes, in seemingly random cases, we’d try to insert an instance of Lol to a container even though exactly the same element was already there.

Going deeper - addendum

There’s been some comments to my previous post wondering about C++ compilers and their capabilities. Normally, I’m all for compiler bashing, in this case I’d probably cut them some slack. It’s easy to optimize when you’re focused on a single piece of code, way more difficult when you have to handle plethora of cases. On top of that, uops handled differently on different CPUs, e.g. in my limited tests Haswell seems to care less.

Going deeper

Few weeks ago I encountered a discussion on a Polish gamedev forum – participants were wondering whether it’s faster to access stack or heap memory. I didn’t pay much attention (there should be no significant difference) until someone had posted a test case. Out of curiosity, I ran it and to my surprise discovered, it was consistently faster for buffers allocated on the stack. We’re not talking about few cycles here and there, too, the difference was almost 20% (Ivy Bridge laptop).

Patching binaries

There may come a time in game programmer’s life when he has to fix a bug in a library he doesn’t have the source code for. It doesn’t happen often, it might never happen, but it’s good to be prepared. If I remember correctly, I had to do it only two times, one was fairly recently. We were getting quite a few crash reports and were assured that fix in the third-party library was coming, but I decided to see if it’s possible to do anything about it in the meantime.

Smartness overload #2

Today’s article is brought to you by a friend of mine. He’s been doing some home experiments and noticed a particular piece of code was running faster in Debug than Release (using Visual C++ compiler). He mentioned this to me, I found this intriguing enough to ask him for this snippet and investigated it yesterday. Turned out it was a classical case of compiler trying to be too smart and shooting itself in the foot.

Delete current instruction macro

I must admit I am not as die hard fan of ProDG debugger as some other coders out there, perhaps I’ve not been using it long enough. One tiny thing I miss though was the possibility of replacing an assembly instruction under the cursor with NOP with a single keystroke. Sure, with Visual Studio you can achieve same result with memory/immediate window, but it’s much more cumbersome. Today I decided to finally bite the bullet and recreate this little feature with VB macro:

Go pathtracer

Recently I’ve been experimenting with the Go programming language a little bit. It’s my second approach actually - I gave it a half-hearted try last year, but gave up pretty quickly (I think it was some petty reason, too, probably K&R braces). This time around I actually managed to stick to it a little bit longer and learn a thing or two. I decided my test application would be a simple pathtracer.

Performance-Monitoring Events

When experimenting with some of the more esoteric features of modern CPUs it’s sometimes not immediately obvious if we’re actually taking advantage of them. Sure, you can compare cycles, but the differences are not always big enough to justify conclusions. Luckily for us, in the Pentium processor Intel introduced a set of performance-monitoring counters. They are model specific (not compatible among different processor families) and allow you to monitor just about every aspect of CPU pipeline.

Forward that store

I know I’ve been bitching about load-hit-store (too) many times before, but it’s been one of the most annoying things we had to deal with at the previous generation consoles. LHS happens when we try to load data from the address that has been recently written to. X360/PPC CPUs were fairly simple, they could neither try to execute some other instruction (no OOE) nor retrieve the data without waiting for it to reach cache.

cmov fun

If you’ve been coding for current (prev?) gen consoles, you know your optimization guidelines - be nice to your cache, avoid LHS and branches, in general - do not stress the pipeline too much. With next (current?) generation moving back to x86, things get a little bit more blurry. With out-of-order/speculative execution, register renaming and advanced branch predictors, it’s sometimes easy to shoot yourself in the foot when trying to be smarter than compiler/CPU.

Parallel 101

With the unveiling of next gen console specifications it’s clear that multi-threaded code is here to stay (I don’t think anyone expected otherwise). If anything, it’ll be even more common, new CPUs run at relatively low frequencies (when compared to modern PCs), so we’ll definitely have to go wide to use their potential. Here’s a quick cheat sheet I’m usually following when trying to move code to a background thread. Please note: move.

MemTracer 64

It’s been a long time coming, but I finally found time to update MemTracer C# to support 64-bit applications (so 64-bit memory pointers + callstack addresses).

Choices & consequences

Spent a little bit time tweaking RDE vector class again. As I already mentioned, the container itself is not terribly fascinating, there are not too many choices here. There’s another battle going on the lower level though, it’s interesting to see how an innocent instruction like_ size()_ can be a cause of a slowdown. Typically, a vector class has 3 properties it needs to keep track of: buffer properties (pointer and size),

The undefined flag

This week I had one of the most interesting debugging sessions in a while. Here’s the minimal code snippet exhibiting the problem. It doesn’t really make much sense in itself, but I tried to remove everything not related to the bug itself (the actual case was much more convoluted): 1struct SVar 2{ 3 SVar() : m_v(23000) {} 4 void operator=(unsigned short t) 5 { 6 if(_rotr16(m_v, 1) != t) 7 { 8 m_v += t * 100; 9 } 10 } 11 unsigned short m_v; 12}; 13static const int MAX_T = 1000; 14struct Lol 15{ 16 __declspec(noinline) void Cat(int t) 17 { 18 unsigned short newT = static_cast<unsigned short>(t < MAX_T ?

x86/x64 MSVC plugins

Recently I had to write a tiny MSVC plugin to help visualizing one of our structures. It’s been a while since I’ve done it last time, so I started Googling for help. The good news is – it’s now much easier to find information/articles (mostly unofficial). The bad news – there are still many dark corners & I ran into a problem that took me a while to figure out.

Coding in a debugger

Recently, I spent some time debugging all kinds of crazy, once-in-a-blue-moon type of bugs, usually MP related and often happening in final/release builds only. The annoying problem with these is they tend to “hide” when you try to repro them and then pop up 5 minutes later when you’re investigating something else. That’s why it’s often crucial to catch every chance you get and try to extract as much information as possible out of every case.

Copy or move

I’ve been optimizing RDE’s vector class a little bit recently. In all honesty, it’s probably the least interesting container, there are only so many ways to do things, so it all boils down to working at the instruction level, trying to eliminate everything that’s not needed. Those few cycles don’t even matter in the grand scheme of things, but it can be an interesting learning experience at times. One of the functions I was interested in was erase.

Pointers to member functions

Pointers to member functions is one of those aspects of C++ that I use every 2 years or so. I know how they work, but I usually need some refreshing on the details (especially the syntax kills me every time, although I might actually remember it now). One of the gotchas with them is - you’re not supposed to cast it to void* (see this question for example). That’s one of those guidelines that I read long time ago, absorbed and never gave a second thought.

A Tale of 3 Characters

Had an interesting little debugging adventure recently. Few weeks ago we started getting QA reports about some of the visual effects looking all weird. I tested it locally, it all seemed fine. QA dug deeper and they discovered it only occured with disc builds (not necessarily running from a DVD, just after installing fully preprocessed build), P4 version was good. The problem with installed builds is they are very close to final, fully optimized, all assets in final version with no debug info etc, so it’s pain in the ass to debug.

Pass by reference

I’ve not been reading too many C++ books recently, but I have a foggy memory of a recommendation they include: do not pass arguments by value, unless it is a built-in type. Justification is given, probably, but it does not seem to stick and people tend to only remember the “pass by value - bad” part. There is plenty of programmers out there, who do it automatically - built-in by value, anything else - by reference.

Attach to process macro

Recently I got fed up with manually attaching to game process every time. I knew there were some macros that would search for given executable and attach automatically, I even found some of them. Sadly, it seems like API changes slightly with every release, so couldn’t get any of them to work with MSVC2010. In the end, I just hacked my own one, attaching it below (MSVC2010, probably won’t work with any other version).

A Decade

I can’t remember the exact date, but I know it was early May or the end of April, 2002. I’ve just realized I had started my gamedev adventure almost exactly 10 years ago. It seemed like a logical step, never really considered any other career choice, it was my dream job and quite natural progression after few years of demo coding. I responded to a job advertisement at the gbadev mailing list (wow, I love the Internet, I actually found this post), flew to Palermo for a weekend, survived my first job interview (even though I was still slightly hung over from the party 7th Sense guys took me the day before…) and before I knew it - I was flying to Italy again, this time for good.

Null references - addendum

My recent article turned out to be quite popular, however it seems that people focus on the problem of undefined behavior rather than optimization itself. Quick clarification then, so that we’re all on the same page. I do not advocate relying on undefined behaviors. They are… undefined (duh). I merely noticed in practice you can have ‘null references’, just as you can have null pointers. I also would be very surprised if there was any compiler that actually generates code testing against null pointer dereference.

Compressing integers

Programming games is an uphill battle. We always try to fight for more, but our resources are limited. There’s never enough RAM, the CPUs are never too powerful and obviously same thing applies to bandwidth - we could always use some more. The eternal problem of mutiplayer game is trying to send as much information as possible using as little memory as possible. One of the weapons in our arsenal is compression.

Null references

One of the most popular questions that fresh C++ programmers ask is about differences between pointers and references and which one to use. One of the differences people cite is “references can never be NULL”. That’s true in theory and according to the standard, but in practice, especially when mixing pointers and references there’s nothing preventing you from doing this:

Resolving PS3 callstacks

Hopefully, your engine includes a crash handler. If it does not, stop reading, add one (here’s simple example) and then continue. In most cases, when something goes wrong, log files contains fully resolved callstack. However, sometimes, especially at the last stage and (semi) final builds, crash reporter might be disabled and now all we have is a list of addresses. Luckily, it’s still possible to resolve those, using Sony’s tools (that’s also what I use in PS3 version of MemTracer).

The Broken Windows Theory

The broken windows theory is a criminological theory first introduced in 1982 article by James Q. Wilson and George L. Kelling. The gist of it is given in this example: “Consider a building with a few broken windows. If the windows are not repaired, the tendency is for vandals to break a few more windows. Eventually, they may even break into the building, and if it’s unoccupied, perhaps become squatters or light fires inside.

Optimizing without fear

As you can probably tell from this blog, I spent fair chunk of my time optimizing code/thinking about optimizations. One of the main problems with optimizing code is making sure we didn’t break anything in the process. It might be tricky, especially when modifying algorithms with many edge cases, but even simple changes are dangerous, probably because people don’t test them so thoroughly. I still feel a little bit nervous upon seeing “Little optimization in XXX” in P4 changelist description (heck, I’ll be the first one to admit, I broke code as recently as last week…).

STL interface

As you are probably aware, game developers have mixed feelings about STL. Sure, it has advantages, it’s already there, doesn’t have to be coded from scratch, it’s standard. On the other hand, the guidelines are very broad, implementations can vary wildly. Performance/memory guarantees seems to be hard to enforce. Even if the interface is always the same, there’s enough wiggle room underneath to seriously break some applications. Take clear() method for example.

Flattening trees

Trees are one of the most popular structures in game development. Your skeletal hierarchy, your attachments – all this can be represented using trees. They lend nicely to short & clean code, especially with recursive functions – you typically perform your operation on a root, then call the same function for all children. Problem is – it’s not always the most effective way of doing things. Consider the most canonic form of tree traversal:

Hidden enemy

Consider the following piece of code: void FindFrames(const S* seq, float d, UINT& outFrameA) { const UINT n= static_cast<UINT>(seq->fd.size()); outFrameA = 0; while(outFrameA + 1 < n && seq->fd[outFrameA + 1] < d) { ++outFrameA; } } Looks innocent, right? Well, it was slow enough to show up quite high in the profiler… It might seem surprising at first, but if you’ve done some console programming in the past, you can probably recognized our old foe - load-hit-store here.


Normally, I try to avoid C++ bashing, so trendy nowadays. It is what it is, but we’ve to use it anyway. Today, however I encountered a funny bug that made me wonder a little bit. Consider the following snippet:

Conditional breakpoints on steroids

Imagine a situation where you’re debugging some problem and need to set a breakpoint in function that’s called very often from many points in your application (like operator[] in your array class for example). Thing is, you’re only interested in one particular codepath, as that’s where things go wrong, calls from other places are fine. If you simply set a breakpoint in line that interests you, it’ll trigger hundreds of times every frame, making it tricky to find the moment that’s interesting for us.

A little bit of fun

This recent tweet: "SomeEnum blah : 8;" Added new enum elements so max value is now >127. GUESS WHAT HAPPENS. — Fabian Giesen (@rygorous) April 1, 2011 from Ryg reminded me of another old story related to bitfields in C++. Consider the following piece of code: typedef char BOOL; ... // Some other file struct SomeStruct { BOOL m_something : 1; BOOL m_isVisible : 1; BOOL m_somethingElse : 1; BOOL m_someOtherStuff : 5; .

Cruncher - Google code project

Remember Cruncher#? That’s my little tool for extracting structure layout information from PDB files. It’s pretty much done and I rarely modify it, but from time to time something irritates me enough to force me to get back to it. Recently I’ve added simple, but useful feature. If you double click member variable, it’ll automatically jump to it’s type definition (if it can find it, it must be another UDT). Example:

vector constructor iterator

As promised, today a little bit about mysterious ‘vector constructor iterator’. Essentially, it’s a function generated by MSVC to initialize arrays of UDTs. Unfortunately, sometimes compiler gets confused and tends to spit out superflous code. Consider the following example, taken almost as-is from certain engine (names hidden/changed to protect innocent):

Attack of dynamic initializers

Recently, I’ve been looking a little bit what’s taking space in our executables and found an interesting ‘feature’ of MSVC. It has troubles with expanding floating-point constants that rely on another constants (…that rely on another constant, yes it requires 2 levels of indirection to break). Consider the following example:

Compile time log2

Been working on some wrappers over data containers allowing to move code to SPU easier recently and found myself in need of a compile-time log2 (for shifts, so integers only). Luckily, it’s pretty straightforward to achieve using some template voodoo and boils down to using fact that log2(x) == log2(x / 2) + 1:

Reflection in C++ – metadata

About a year ago I published a few articles on my experiments with C++ reflection system. It extracts all the type information from PDB files. Nice thing - it’s all automatic and requires none to very little source code changes. One thing I didn’t like was problem with adding metadata to our types/fields. It’s mostly for editing purposes, but wouldn’t it be nice to have at least a help/description string for fields?

TMplayer to MicroDVD converter

One of my Christmas presents (that I bought myself) was WD TV Live Plus media streamer. I left my projector back in Warsaw (sniff) and I was tired of watching stuff on my 15” laptop. WD TV is a pretty neat little machine, it effortlessly plays every video I throw at it. The only problem I had was that it seemed to struggle with TMPlayer subtitle format which is very popular in Poland for some reason.

Quick iteration, insertion, look-up, removal

Wouldn’t it be nice to get all four of those with some container type? Sadly, it seems quite tricky. Vector gives the most efficient iteration and pretty good insertion (at tail), but is more problematic when it comes to look-up & removal. Intrusive linked lists are pretty good here, but pointer chasing may hurt during iteration. Hash map falls somewhere in the middle, but again - iteration will most probably be slower than vector (we have to skip over empty buckets).

MemTracer - episode IV

(I’m seriously running out of subtitle ideas). Some years ago I started to toy with an memory tracing/debugging tool idea. First proof-of-concept implementation has been created in few days and while it worked with a very simple test app, it required major overhaul before integrating with any real-world program. Somewhere in February 2008 I had a new version, that proved itself very useful during my adventure with enhanced version of certain PC RPG (as it later turned out, it was by far the most memory stressing project’).

Declaration matching

I encountered an interesting problem today. Consider the following piece of code: namespace Test { struct Lol {}; struct Cat {}; struct Foo { virtual void Do(Lol&); virtual void Do(Cat&); }; struct Bar : public Foo { virtual void Do(Lol&); void DoSomethingElse() { Cat cat; Do(cat); } }; } Question: what happens when you execute Bar::DoSomethingElse? Answer: nothing, as trying to compile this piece of code will result in the following error: ‘Test::Bar::Do’ : cannot convert parameter 1 from ‘Test::Cat’ to ‘Test::Lol &’.

Some more benchmarks

Gave tr1::unordered_map a try, it seems to do perfectly average, nothing spectacular, nothing awful, except for predict_grow with 4 byte objects, where it’s almost 10x slower than the next implementation (that’s not an anomaly, it happens every time). I also found a bug in RDE where I overoptimized it a little bit - could result in a crash when searching for item that’s just been removed. It slowed fetch by about 3%, however later I found a way to remove some branches and sped it up by ~6% again, so final version is even faster.

EA STL released, updated benchmarks

Big weekend news were EA releasing some of their codebase source, including parts of EASTL. Given that the original EASTL document has been the main inspiration for my RDE STL experiment, I was very interested in finally seeing the code itself. First impression - it’s still big. It is leaner than STL, but it is a big library. That’s quite understandable if you remember it’s not a local, one studio thing, it’s used by the whole organization.

Debugging heap corruptions, part 2

Few years ago I wrote a short article about dealing with heap corruptions (buffer over/underruns). I mentioned that even after applying described techniques, the biggest problem with overruns is that they’re detected when memory block is freed, not when the actual corruption happens. Those two moments can be very far apart from each other, which complicates debugging. I described some workarounds, but they were not really anything I’d call a satisfying solution.

Process Explorer & 64-bit OS

Few years ago I mentioned a nice little program by SysInternals - Process Explorer (task manager replacement). I install it everywhere I go. Recently, I ran into weird problem, however. My work machine runs Windows 7 64-bit. PE works perfectly, the only problem is that I couldn’t make it to replace standard Task Manager. There is a menu option that should do just that, but running it didn’t change nothing.

Clash of the Thread Pools

Recently, I’ve been experimenting a little bit with different kinds of MT-safe containers. I wanted to compare a performance of various kinds of containers I’ve had lieing around. It turned out it’s more tricky than I expected, as under Windows, results vary wildly from one run to another. I’m not even speaking about container performance, just running same task can be many times slower (see this note for example). Include similar problems with managing the threads themselves (sometimes they wake up too late or not at all) and you should get the picture.

Smartness overload – addendum

In my previous note I mentioned that IDs are my favourite form of weak references. By pure coincidence, just recently Noel made one of his Inner Product articles public and it deals with very related subject. As a matter of fact Noel’s implementation of HandleManager has been a starting point for the one I use for my home projects. I use it not only for dealing with resources, but as a general ID->pointer resolver.

Smartness overload

Over the years, I’ve seen plenty of different code bases - open source projects, internal game engines, my own experiments. Some of them were just bad and buggy, but in many situations I found something that could only be described as ‘smartness overload’. An obviously skilled & experienced programmer just tried too hard. There’s a great quote attributed to Brian Kernighan: ‘Debugging is twice as hard as writing the code in the first place.

Optimization 101: ordering conditions

One of the most basic truths about optimizing existing code is: there are no low hanging fruits. Your coworkers are not stupid, it’s not like you can just add some switch or line and code will magically run two times faster (at least not often). ‘Easiest’ way nowadays is probably some form of parallalization, but it’s not always possible. Usually, it’s more a series of mundane tweaks, day after day, shaving few cycles here, few there.

Be nice to your cache

Short list of tips & guidelines that every game developer should keep in mind. No rocket science, common sense, really, but it’s still relatively rare to find codebases that apply to them. It’s especially aimed at gameplay programmers, who operate a little bit further up from the metal. With todays hardware, cache can be your biggest friend or enemy, CPUs got the the point where a few extra calculations hurt much less than a single cache miss (so LUT is not always the best option, for example).

Data breakpoints

Data breakpoints are one of the most helpful debugger features when trying to hunt for memory overwrites/ninja variable modifications. In majority of cases it’s enough to set them up from debugger, however, there are situations when it’s not possible. Sometimes breaking into debugger changes program behavior (I had this problem just yesterday), sometimes we don’t want to catch every variable access, just some of them (as others are legal). In situations like that we need to set data breakpoints from code.

Adding layers

Simple technique that can help in making code more future-proof and easier to modify. Applicable in most situations where two subsystems need to exchange informations. Let’s imagine the following scenario - for some reason we need to introduce ‘unlimited mana’ feature in our game. If hero is in berserker mode, his attributes increase, time slows down, he is able to cast spells even if he’s out of mana and his mana pool doesn’t deplete at all.

Playing with bits

Little challenge posted by a workmate. Good for those 5 minute breaks at work when you need to force your brain to think about something else than your current task. Recode the following piece of code so that it doesn’t use branches/multiplies: unsigned char* oldStart = 0; if (m_start + size < m_bottom) { oldStart = m_start; m_start += size; } return oldStart; Don’t read below if you want to try it yourself.

Are Movies Getting Longer?

Few weeks ago I’ve been watching The Curious Case of Benjamin Button. It is a nice movie, but also painfully long - almost 3 hours. It got me thinking - are movies getting longer & longer? I could have sworn that I have remembered them to be shorter, usually around 1.5h, now it feels like it’s closer to 2h. Obviously, this was only my impression, I had no hard data to back it up.

Remote debugging in Visual Studio

Today I’d like to describe a relatively unknown feature of Visual Studio that can be extremely useful in certain situations - remote debugging. Remember all those situations when application work perfectly on your machine, but crashes on tester’s? ‘Old school’ way of trying to debug this problem was to add a very detailed logging, then analyze it. Fortunately, there is a more convenient method - we can simply connect to his machine and attach debugger to running application.

To assert or not to assert

(or rather: how & when to assert). Assertion is probably one of my favourite programming tools, however there are still few areas that I’m not entirely sure how to solve in an optimal way. First, let’s start with things that I’m rather convinced about, few simple guidelines for my ideal assert macro (funnily enough, during my career I’ve never seen implementation respecting all those rules at once): when it breaks into debugger make sure it stops in the same line assertion actually failed, not two hierarchy levels down (so every time it’s triggered one has to rewind first to see what’s wrong exactly)!

Reflection in C++ - Load In Place

I played a little bit with my Load-In-Place system. It used to only support vectors of PODs (stored by value). Now it handles also vectors of pointers & vectors of classes. Sample structure that’ll be saved/loaded automatically: struct IntContainer { int* pInt; }; struct SuperBar { unsigned long i; // [Hidden] float* p; bool b; signed char s; Color color; SuperBar* psb; typedef rde::vector tVec; tVec v; rde::vector<Color*> someColors; rde::vector<SuperBar*> superBars; rde::vector containers; IntContainer ic; }; [.

Reflection in C++ – part 2

In a previous note I wrote a little bit about my experimental reflection system for C++. Today, I’d like to describe a simple load-in-place mechanism I built on top of that. Basic idea behind LIP is to minimize the overhead of any processing after data is loaded from a file. Ideally, all that needs to be done is call to ‘fread’ to a memory buffer. For more detailed description of such system see Fast File Loading article by Ent/Incognita.

Reflection in C++ - a teaser

Whether we like it or no - C++ is an ancient language. It doesn’t have many features that are considered normal in some more modern languages. One of such missing features is reflection. It also never been lucky enough to get any kind of standard implementation. Over the years programmers, both main-stream & gamedev - have been patching their own systems. Today, I’d like to present yet another approach that I’ve developed.

Cruncher# update

I’ve read this note on Rachel’s blog and found out there’s tool similar to Cruncher, just for ELF, named pahole. It has a nifty feature of showing cacheline boundaries. Had a little bit of spare time this weekend, so decided to add it to Cruncher# as well. By default, it assumes that object’s start offset corresponds with cacheline start, but you can set ‘prefetch’ start offset from context RMB menu.

Anatomy of Duff's Device

Duff’s Device is one of the most brilliant exploits of C syntax. It’s used to unroll loops and save some cycles spent on loop ‘maintenance’. Let’s take a look at typical fill_n function: template RDE_FORCEINLINE void fill_n(T* first, size_t n, const T& val) { for (size_t i = 0; i < n; ++i) first[i] = val; } Now, assuming that element is not too expensive to copy, execution time of this function can be heavily influenced just by loop ‘maintenance’ I mentioned (counter modifications + jumps).

Vector swizzling in C++

Everyone who’s done at least some vertex/pixel shader/HLSL programming has probably encountered mechanism called “swizzling”. It’s an operation where we create new vector using arbitrarily selected components of another vector (also a little bit similiar to SSE shuffling). Code snippet is worth 100 words, so some examples: a = b.zyzx; // a.x = b.z, a.y = b.y, a.z = b.z, a.w = b.x a = b.wy; // a.x = b.

Coders at work - observations

Recently, I’ve been reading “Coders At Work” book when commuting to the office. It only takes me 15 minutes, so it’s going rather slowly. It’s a collection of interviews with some of the most famous programmers, really interesting. I’ve only read about 200 pages so far (5 interviews), but already found two intriguing points: almost everyone so far insisted on using printf for debugging. Yeah, they use some GDB, but not much.

Game Engine Architecture – review

Took me a while to dig through this one, book is over 800 pages. It has been written by Jason Gregory, who’s currently working as programmer at Naughty Dog. Jason has years of practical experience in the gamedev (Midway, EA, ND) and it really shows. There are too many books out there written by people who have never shipped anything bigger. This is not the case. When he writes how to do/don’t something, he usually backs it up with a real-world scenario.

Optimization is a full time job!

From time to time, when I have a spare moment, I like to run the game under profiler and see what’s changed since the last test. I do it every 3-4 weeks, roughly, and the basic conclusion is that code is like food - it has an expiration date, later it rots. It doesn’t really matter that just 2 weeks ago you eliminated 1000 cache misses, now they’re back, in a different place perhaps, but total number of misses per frame is almost the same again.

Crunching bytes. Harder.

Been doing some size/layout optimizations again, recently. Cruncher# proves very useful for tracking problems, however it’s not that great when I want to quickly examine the progress. I have to compile/link application again (takes time, especially if modified structure is included from many files) and reload PDB (in case of bigger projects - takes some more time). Luckily, we can force the compiler to do the work for us. It’s possible to configure MSVC to warn about all padding it adds after structure fields.

Ordering randomness

One of the things that game developers hate the most are ‘random bugs’. I’d love to get an euro for every time I got a bug report and when asked how could I reproduce it, received: ‘I don’t know, it’s random’. Thing is, it simply is not true in majority of cases. There are no truly random bugs, they are caused by certain sequence of events. Sure, sequence may be incredibly rare, very unlikely, but usually - it’s not random.

Random links, 31/08/2009

Some interesting links: Recast & Detour libraries. Recast is library that automatically generates navigation meshes from level geometry, Detour is a pathfinding library. Both created by Mikko Mononen. Demosceners may remember his Demopajaa tool (demo authoring tool) or Moppi Productions demos. Pierre Terdiman uses Recast in his Konoko Payne project, you can read about his experiences at his blog. Konoko Payne is a one-man project, Oni-meets-Max Payne game.

Siggraph 2009

You know the drill, will keep adding links here: Beyond Programmable Shading 2009, NVIDIA papers, SG 2009 paper list by Ke-Sen Huang, (both links courtesy of folks), Light Propagation Volumes in CryEngine 3 (A. Kaplanyan/Crytek), Inferred Lighting: Fast dynamic lighting and shadows for opaque and transluscent objects (S. Kircher, A. Lawrance/Volition Inc.) Old comments Link Mini-Dump 08/17Jeremy Shopf 2013-08-06 02:25:30 […] Maciej Siniloâ?

(Source) code scalability

Today I’d like to write about source code scalability in its most literal form, as in true source code scalability, ie - how much one would have to type if he’d want to extend existing codebase. It’s an often forgotten, bastard sibling of ‘true’ scalability, but an interesting topic nonetheless. There are certain constructions that don’t really lend to modifications that well. I’d like to list some of them here’

Cruncher# - update

Just little fix (see original note comments for details) and one column more - padding/size ratio (lets me to find what structures waste most space, in relative terms). Same download URL as before. Old comments sconosciuta 2010-04-01 03:24:27 […] Place. Gregory on Data breakpoints. L'assert, questa sconosciuta on Chromed editor …Cruncher# update | .mischief.mayhem.soap.Just little fix (see original note comments for details) and one column more - padding/size ratio […]

Random links, 02/07/2009

Dropbox - wanted to make some weekend experiments, so finally gave it a try and it really is as good as people say. Sharing files between different computers without hassle. It can either work completely in the background or you can treat it as a file server. Great piece of software. Paris Game AI Conference 2009 - report, highlights, presentations. Worth checking out even if you’re not AI programmer.

Crunching bytes

As you’re probably aware - there ain’t such thing as ‘too much memory’. It doesn’t matter how much there is, it’s never enough. Even if we fit, it still pays off to use the smallest structures possible, as they’re more cache friendly. The more we can squeeze them (without wasting too much time for depacking of course) - the better. Recently, I’ve been optimizing some X360 code and one of main problems (as usually with consoles) were related to cache-usage.

Lazy calculations considered harmful?

Actually, the trap I’m going to write about is a little more subtle and doesn’t apply only to lazy calculations, but it’s one of the most typical scenarios. Consider the following situation: 1float Rectangle::GetArea() const 2{ 3 if (m_isAreaDirty) 4 CalcArea(); // Calculates area (width * height), clears isAreaDirty flag 5 return m_area; 6} That’s rather typical (albeit very simplified, of course) example of performing calculations only when they’re really needed.

Virtual functions - an experiment

Some months ago I’ve read this article by Elan Ruskin, when he measures the overhead of virtual functions. There’s also a follow-up with test code, make sure to check it out as well. I’ve decided to extend the test application a little bit and add one more solution - direct function pointers. Instead of having global per-class vtable, we can store function pointers in each object, thus removing one level of indirection.


This word is usually enough to send shivers down the spine of most developers. This time, however, it’s something nicer. Check out the session program for Siggraph 2009 (bottom of the page). If I’m not mistaken it’s the first case that Polish gamedev is represented here, so it’s a kind of milestone for our small industry. My mate - Michal (we spent 5+ years at CD Projekt Red) - wrote paper together with Peter-Pike Sloan (yes, this PPS) and he’ll be presenting it.

Random links, 05/04/2009

“Almost perfect…” - story of Word Perfect (competitor of MS Word, for young readers), interesting read, Two-part blog entry from Jeff Vogel, if you’ve ever wondered on indie RPG sales - this one’s for you (Jeff’s owner of Spiderweb Software, probably the most famous indie RPG developer), Superannuation - great source of information about top-secret/canned projects. Usually data-mined from resumes, but not only. People are really careless sometimes…


GDC2009 has started, first publications should be available soon. Preparing for them, I wanted to organize my documents collection. “Standard”, filesystem organization isn’t perfect here. I usually try to categorize papers by conference or author, but it quickly gets messy. Say, I’ve paper from Siggraph2k8 by Bungie programmer. Should it go to Siggraph or Bungie directory? End result is that it always takes me too much time to find the publication I need.

Breakpoints in system libraries

I’ll just put it here, because I’m fed up with forgetting it and experimenting with syntax every time I need it. There are times when you need to put breakpoint in system function, like malloc or OutputDebugString. For example, I use the latter one sometimes, in order to intercept DirectX warnings (couldn’t find a way to make it break on warning, not only on error). Syntax for breakpoint window is (function): {,,kernel32.

Go with the flow

If you’re interested in lock-free/multicore programming, make sure to visit Charles Bloom blog and read his Low Level Threading series. Table of contents is a good start. You’ll find plenty of valuable links there, always good to have them gathered in one place (if you’re not familiar with the topic, reserve two weeks, reading all this stuff may take some time). On top of that, some Relacy-tested code (I tend to agree that code that has not been extensively tested with some automatic suite, cannot be considered safe) and many good insights.

Aligning arrays

When developing stack-based containers for RDESTL I’ve encountered the following problem - how to get block of uninitialized memory that’s aligned properly for type T. Consider fixed_vector class: template class fixed_array { ... char m_data[N * sizeof(T)]; Size is OK (we need N elements of type T), sadly alignment is invalid here. It may not be a problem for majority of cases, but try storing _m128s… Even when using 32-bit variables, they should be aligned on natural boundary (4 bytess) in order to rely on writes/reads being atomic.

The Law of Demeter

I’ve been refactoring big pieces of legacy code recently and my work would be much easier if authors would have followed the Law of Demeter. It’s not even LoD per-se, it’s mutated version, because I’m not talking about calling methods, rather accessing fields. The general guideline is – where possible, function should only operate on a minimal required subset of data. Consider the following code snippet: float GetDistanceFromCamera(const Object& obj, const Camera& camera) { return (obj.

Single producer/single consumer bounded queue

Another little snippet. I’m a big fan of bounded containers for multithreaded environment. Sure, they’re not as flexible, but at least you don’t have to worry so much about memory allocation. Most of unbounded containers rely on some kind of list, so nodes have to be managed separately, which complicates the code and actually introduces hidden locks (unless you use your own multithreaded allocator). For specific types of applications (like games) you usually know roughly what’s maximum capacity is needed, so why shouldn’t we take advantage of this knowledge.

Crash handler/reporter (Win32)

Usually, I try to write about less common programming issues here, this time it may be something less “flashy”, but very useful nonetheless. To be honest, if I had to choose single most crucial feature I coded for The Witcher it would be this crash reporter/handler. It took me few hours of home-coding, but has proven invaluable in later years of development process. If you don’t have similar system in place yet and you develop for Win32 – just stop doing whatever you’re doing and implement it now, you’ll thank me later.

Lock-free double ended queue (bounded)

Just a small snippet – as a title says - lock-free double ended, bounded queue in C++. Original version can be be found in Herlihy’s book, but it’s in Java. It’s not very general, it has been created specifically to be used in my work-stealing thread pool. Basically, each thread adds tasks to the bottom of it’s own pool and tries to steal from top of other threads’ pools when low on tasks.

Know your assembly, part 3

I’ve been playing with radix sort for RDESTL recently and MSVC optimizer surprised me one more time. First of all, it seems to generate much better code if my histogram array is local to function and not member (in the first case it makes better use of registers and keep less stuff in temporary variables). It costs us 4k of stack, but I can live with that. Other than that, generated code is pretty smart, sometimes almost too smart.

Google interview process

What’s the best way to drive traffic to your blog these days? … Write an entry about your Google job interview. Recently I’ve found yet another one of those. This one is actually well written and quite interesting, so go read it first. Today, I wanted to focus on another issue, though – if you read comments for this article on Reddit, 90% of them seem to be from people complaining about the number of interviews he had to take.

Chromed editor

As you probably have heard - some months ago Google released its own browser - Chrome. One of it’s distinguishing features is that every tab is a separate process, so if one tab crashes, it won’t bring whole application down. I’ve read in different places about tries of applying this scheme for game editors. Imagine game (renderer) and editor being two separate processes. Should one of them crash, the other is still running, so that we can at least save our work.

Poor man’s Thread Profiler

There’s an interesting observation in Eric Steven Raymond’s essay - ‘The Cathedral and the Bazaar’ (I’ve heard about it thanks to the reference in ‘Dreaming in Code’). Eric notices: “Every good work of software starts by scratching a developer’s personal itch’. It seems to me like my problem is I’ve too many itches. When I was a young bedroom coder I’ve been basically scratching constantly. Every time I’ve learned about some new technique, I couldn’t resist and just had to try it in my code.

MemTracer on steroids

If you’re familiar with MemTracer project you may be interested to know there’s a version that can be actually hooked to external application without much hassle. It has been created by Jean-Silvestre Zirani, visit project page for details. Jean contacted me some time ago, we had nice chats about issues with debugging memory related problems, I sent him latest sources and he decided to move MT to a more usable state.

RDESTL - changes

Seems like I’m mainly pimping RDESTL recently, but that’s all I’ve time/power to work on when I’m back at hotel. Quick update: latest hash_map included in Attractive Chaos benchmark (+bug fixes). It did quite well. One problem is it consumes memory like crazy, but benchmark works on a really huge set. Anyhow, memory/performance ratio can be controlled via load factor parameter. I’ve added simple stack class (built on top of other generic container, like in STL).

Warnings considered useful

As stated before, I’m a big fan of oversensitive warnings and try to compile my projects with the highest warning level (/W4) under MSVC. However, as it turns out, even at this level there are still warnings that are not enabled. For a complete list of warnings that are off by default, see here. The easiest way to enable them all is to use /Wall switch, but it’s too much even for me (mainly because of C4820, feel free to give it a try, though).

RDESTL: hash_map, another take

I’ve been reading lots of interesting stuff on hash tables recently and was trying to improve my RDESTL implementation of hash_map. At one point I had 5 different versions, but finally managed to create something that’s good enough for most applications. Granted, I exploit the fact that performance’s the most important factor, so I can live with some limitations that wouldnt fly in any more general library. I effectively “ban” two hash values (0xFFFFFFFE & 0xFFFFFFFF) for example and use it for my own purposes.

RDESTL - fixed_list

I didnt have much time to work on RDESTL recently, but I try to commit some new stuff every now and then (usually bug fixes). One rather interesting recent addition is a fixed_list class. Basically, it’s a rough equivalent of already existing fixed_vector. It never allocates any memory and works only withing given buffer. Let’s try to create linked list of integers, that can hold up to 64 elements: rde::fixed_list<int, 64> lst; lst.

Less painful prototyping in C++

Just a bunch of useful links/ideas that may help you if you need to make a quick prototype in C++ one day. I know, it may sound stupid, but sometimes it really is a good idea, especially if you’ve the engine+ compatible assets ready. Digression: of course, initally, you may try to be a smart ass and have this idea of making your prototype in C#/XNA. After all, it doesnt have to be very fast and it’s 100x more friendly with better turn-around times and so on.

Gamefest 2008 papers

XNA Gamefest 2008 papers are finally online. Get them while they’re hot. (this year with recordings, too!) Old comments Riddlemaster 2008-09-03 16:36:00 This time they are really nice. Of course much is about XNA (bleh) but the rest is very interesting. However, I’m a bit anxious that the way of DirectX development they chose is not the very best one… It seems they lack fresh ideas. Yeah, they introduce many new things but the core is still unchanged.

Distributed Processing

This year our company introduced a nice tradition - lucky folks who went to GDC referred talks they had attended to interested people. One of workmates presented Bungie’s conference - ‘Life on the Bungie Farm: Fun Things to Do with 180 Servers’. Basically, it’s an interesting talk about Bungie’s distributed build system (deals with code builds, lightmap calculations, resource packaging and some other tasks). It’s not a new concept of course, one popular commercial example of similar environment is XGE.

Real-Time Collision Detection review

I always try to take some books with me when flying for vacation, this year I finally managed to read Real-Time Collision Detection. I’ve never been a “true” physics programmer, I am not one now and I doubt I ever will, so I thought it’s not a book for me. After few pages I already knew I had been wrong. Turns out it’s book that every game programmer should read.

“My” Visual Studio color scheme

If you read comments to my CRC32 article, you’ll notice that the most important thing there is MSVC color scheme appearently. Some hate it, some like it, it seems to be Paris Hilton of color schemes. I also got several mails asking about my settings, so instead of replying in the comments I’ll just post here. It’s not really ‘my’ scheme, original version has been made by Oren Ellenbogen. I made some minor modifications, but they’re really miniscule, so I link to Oren’s version instead.

Simple instrumenting profiler

For every bigger project, sooner (if you’re wise) or later (if you’re, errr, not that wise) comes a moment when you need to answer yourself one very important question: ‘why the hell is it so friggin’ slow?!?’. Most common methods for determining bottlenecks in games/demos evolved with time. Back in eighties/early nineties, when we were young, restless and beautiful, the most common method was counting scanlines. Those were the times when we were not really interested in finding out how many frames per second we’re running.


First, there was an essay by Joel Spolsky. Joel claims that #1 cardinal criteria your potential employee should satisfy is: Smart, and Gets Things Done. It became a popular saying, then Joel wrote a whole book on this topic. Finally, some weeks ago Steve Yegge wrote some more about it (and paraphrased to Done, and Gets Things Smart). Now, if you’re familiar with Steve’s blog, you should realize that “some more” is euphemism, basically, it’s another of those 10 page articles (although, as usually, it’s really interesting and personally, I enjoy his writing style).

Hashing made (even more) useful

One of the most basic issues in the game development is: how do we identify “things”. Thing can be, well, anything - logical object, resource, level, entity, animation etc. The most straightforward way is of course to name them and identify with strings (tags). So, you have “texture_leaves0”, “badguy_01” and so on. This solution has two advantages - it’s dead simple and contains bonus useful info. Unfortunately, it has also lots of problems.

The Final Warning

Recently I’ve been playing with SQLite for my new experiment (more about it later, when I’m done). It’s one sweet little library, integrating it and coding a quick C++ wrapper took me maybe 1 hour. As authors claim themselves - it just works. I’ve only one tiny complaint, though. As I mentioned before, all my home projects are compiled with maximum warning level and warnings treated as errors. Sadly, this seems to break roughly 50% of external libraries, including SQLite (trying to compile sqlite3.

Stupid C++ trick(s)

(idea shamelessly stolen from Jim Tilander and Power of Two Games. Not fullblown post, just random notes from time to time, as I learn/think about it) how to easily make sure all our header are self-contained? Include header as the first one in corresponding .cpp file. For example (ThreadPool class, ThreadPool.cpp file): #include "thread/ThreadPool.h" #include "thread/LockFreeMemList.h" #include "thread/LockFreeQueue.h" #include "thread/Task.h" #include "thread/TaskGroup.h"If my coding standard could only have single rule - it would be this one.

ITP – addendum

As promised, I performed some tests of my thread pool on quad-core machine. Difference between lock-based and lock-free version is bigger (than on dual core), but definitelly it’s not as significant as one could guess looking at results in Intel Thread Profiler (higher concurrency, smaller contention). The most important speed-up can be gained by letting main thread help workers when it’s waiting anyway (50 fps ‘> 61 fps), but this one is independent from other modifications and can be applied to lock-based version as well.

Intel Thread Profiler

Few days ago I decided to have closer look at my multi-threaded experiments, mainly to check if there really is any kind of difference between code based on locks and lock-free version. Downloaded evaluation version of Intel Thread Profiler and gave it a try. Especially for this experiment, I modified scenario a little bit. Mandelbrot is not the best test as tasks take relatively long time (with selected number of iterations one ‘frame’ would take over 2 seconds).

Random thoughts (#1)

Just some random notes, that are too short for their own entry, but still rather interesting: Stack Overflow - new initiative by perhaps two biggest celebrities in the programming blogging community - Jeff Attwood and Joel Spolsky (even if he insists he’s not blogging, just writing essays). In terms of marketing it’s blogging equivalent of Marlon Brando and Jack Nicholson doing movie together. It’s also my first serious contact with podcasts.

Condition Variables under Win32

If you’ve been programming POSIX threads (or Java, C#, D…), chances are you’ve used condition variables. It’s also probable that after that you switched to Win32 and found nasty surprise - no “native” condition variables here. Of course, they can be implemented with help of other synchronization primitives, but it’s not for the faint-hearted (I’ve to read this article in small portions, like 10% at a time, otherwise my head starts exploding).

Generating subsets

I’ve been reading “The Algorithm Design Manual” (Steven S. Skiena) recently and I have to share with you one of the chapters, just because it’s too cool. In “The Witcher” we have the following problem: we’ve plenty of attack animations, most of them dynamic (ie, the character is moving forward) and not so many damage animations. Most of the attack consist of multiple hits. The problem is with synchronizing damage and attack animations (so that our opponent moves back roughly at same pace and similar distance as we move forward, playing his damage animations at the same time).

Lock free hell

For the last days (weeks) apart from playing GTA IV (mainly) I’ve been improving my thread pool class. I decided to implement system similar to the one described by Ian Lewis from Microsoft in “Getting More from Multicore” from GDC2008. Basically, it’s an extension of my system, it gives greater control over inter-task dependencies. Task cannot be started until all tasks it depends on are finished. In theory it’s rather easy, I just provide extra scheduler thread that deals with incoming tasks, if any of them is “ready” it’s put in a priority queue (priority in the most simple case is number of dependents), so that it can be processed by worker threads.

RDESTL - hash_map with linear reprobing

There are several approaches to implementing a hash map. Main difference is in the way we handle collisions between entries falling the the same bucket. Probably the most popular and elegant is “chaining”, where for every slot we have a linked list of entries. In case collision we just enqueue entry in list for correct bucket. Variation of this method is used for example in MSVC’s library. I implemented it in the first version for RDESTL as well, just because it was the most straightforward.

Google Tech Talks

I think I’ve already posted links to some videos from Google Tech Talks, but I’ve been catching up recently and found some other interesting movies. They all have a nice, distinct geek spirit… Where else can you find 50 minute presentation about the beauty of quicksort implemention? Three Beautiful Quicksorts (Jon Bentley) - who said sorting is boring? How To Design A Good API And Why It Matters (Joshua Blooch),

Number of array elements - addendum

OK, so how does this snippet work? There’s a function that takes array of N elements as an argument. It returns an array of N chars. Assuming sizeof(char) == 1, size of the return type for this function is N. There’s no function body, because it’s not needed, function is never called. On the other note, we’ve been hit by the Reddit Effect. Of course, in this case it wasnt a problem of breaking the server, but it screwed all my stats.

Number of array elements

(or one more reason to love C++). Problem: how to find out number of elements in a C++ array? The most popular form is probably: #define RDE_COUNT_OF(arr) (sizeof(arr)/sizeof(arr[0])) ... Foo myTab[10]; size_t numElems = RDE_COUNT_OF(myTab); assert(numElems == 10); It’s being widely used, however most people do not realize the potential danger with this small piece of code. Imagine that one day we decide to change our static array to dynamically allocated block of memory: Foo* myTab = new Foo[numNeededFoos]; size_t numElems = RDE_COUNT_OF(myTab); // Huh?

Lock-free, multi-threaded Mandelbrot

(yeah, the title is stupid) After I got a little bored with memory experiment I moved on to something more interesting - lock free containers and multi-threaded tasks/jobs. This is all rather new to me, so it seemed like an interesting stuff. What we have here is simple system for distributing independent tasks on multiple threads. Basic system was inspired by Industrial Light&Magic’s code from OpenEXR. Main difference is that attached snippet is basically lock-free (it only locks when initializing threads at the very beginning).

MemTracer strikes back

You may remember my proof-of-concept experiment - MemTracer. It started as a hobby project, but recently I had chance to test it in a real world application. It quickly turned out there’s a big difference between my simple test program and full blown game performing hundreds of memory operations per second. Below you can find list of conclusions I drew from this experience: realtime analysis is next to useless.

Know your assembly, part 2

It’s interesting to examine code generated by C++ compilers, if only for nostalgia value. It turns out that compiler (well, MSVC) is pretty smart usually and employs same tricks we used when squeezing cycles out of software renderers years ago. For example it’ll almost always use _xor reg, reg _instead of mov reg, 0. Eons ago, on 8086 it was one cycle faster IIRC (sadly, cant find great Agner Fog’s instruction sheet anymore).

Know your assembly

Consider this simple code snippet, where we call copy constructor on an uninitialized piece of memory:void CopyConstruct0(int* mem, int orig) { new (mem) int(orig); } I didnt really bother with creating special version of this routine for fundamental types, after all this version: void CopyConstruct1(int* mem, int orig) { *mem = orig; } should result in same code as the one above, right? WRONG. In the first case compiler (MSVC) tests memory against null first.

Debugging heap corruptions

If I would have to choose family of bugs I hate the most (I would have to be under the gun obviously, as it’s a little bit like choosing your favourite illness) I would be inclined to go with heap corruptions. They’re silent, random, hard to reproduce and track down. Today I’ll describe my small collection of simple tips that helped me to stay sane during debugging sessions. First of all, let’s list the most obvious heap problems:

Debugging file leaks - the easy way

Ever had problems with opening file that surely is present at given path? It most probably is the case of “too many open handles” problem. First of all, make sure it really is the reason, it should be easy enough with our new friend – pseudoregisters, just examine error code. The most common reason for this are file leaks (file is opened, but never closed). Files are “normal” Windows resources so you can use tools like htrace for analysis.

Pseudoregisters in MSVC - spreading the word

Very interesting article at CodeProject. Shame to admit, but I’ve been working with Visual Studio for years and it’s new to me. Good way to track down MSVC hardcore freaks in your company – see who knows about it. Old comments ed 2008-02-18 22:19:16 looks like undocumented feature to me :) fyi: recently I’ve also discovered a nice feature in debugger called “automatically expanding data”. You can edit file “autoexp.

More new/delete overriding fun.

Consider the following code snippet: struct Foo { void* operator new(size_t bytes); void operator delete(void* ptr); }; struct Bar : public Foo { void* operator new(size_t bytes); void operator delete(void* ptr); }; [...] Foo* b = new Bar(); delete b; Can you see a problem here? Yes, there’s no virtual destructor. It will not manifest itself (at least under Visual Studio) in an obvious way if you do not do anything in derived class’ destructor.

Advanced Windows Debugging - review

I bought Advanced Windows Debugging book last week. Shortest review would be: if you’re serious about developing big applications for Windows platform - just get it. There are chapters that were of less interest for me (yet?) - security or interprocess communication, but everyone will find chapters that justify every price. I found parts about heap/stack corruption, resource leaks and Vista especially valuable (visit - I think there’s a sample available for download).

Protothreads in pure C

Now that’s some smart code. K& R would be proud.

Walking the stack - one more way

Recently I’ve been trying to move MemTracer from experiment category to something that’s actually usable in real world scenario. It requires more work than I expected, but it’s slowly moving forward. One of the first problems I encountered was StackWalk64 function. It works nicely, but when called very often it can cause noticeable slowdowns. Walking the stack can be done manually by ESP/EBP traversing, but on Windows XP/Vista platforms it’s actually easier to use undocumented RtlCaptureStackBackTrace function.

Cheating for fun and profit

Yesterday I had one of my weirdest debugging sessions in the last few years. Background: for some time we’ve been having problems with a third party library that comes without source code, only .lib/headers (no names given to protect innocent). It seems like not all objects were deleted after deinitialization. After basic investigation it turned out to be some kind of problem with reference counting, objects couldnt be removed because they were still referenced.

Spot a bug

I’ve found an interesting bug in a very old code today. Consider the following code snippet: 1struct foo 2{ 3 void* operator new(size_t t); 4 void operator delete(void* p); 5 int i; 6}; 7[...] 8{ 9 foo* f = new foo(); 10 ::delete(f); 11} Can you spot a bug here? Read rest of the article for an answer. It all boils down to those two ‘:’ chars. It forces compiler to call global operator delete instead of this defined for class foo.

RDESTL - string class

Copy-on-write behavior of the string class is implemented as policy now as it may or may not be desired. I’ll add “standard” storage policy as well, so it can be easily switched between. Old comments sconosciuta 2010-04-01 03:24:27 […] new products and limited editions of international brands as well as independent ones. We…RDESTL string class | .mischief.mayhem.soap.Copy-on-write behavior of the string class is implemented as policy now as it may or may not be […]


Instead of uploading complete code package every time, I set up SVN repository for RDESTL. Click here for the latest version of the code. Progress will be a little slower now, that I’m back at work, but I managed to add first, very basic version of COW string class. Old comments admin 2010-05-15 20:00:29 Pavel: Personally, never used it as a whole package and sole container/algorithm library, so more of a playground.

MSVC curiosities

While developing/optimizing RDESTL I found two funny curiosities in Microsoft Visual C++ 2008 (Express). All my home projects are compiled with maximum warning level and warnings treated as errors. One of the additional warnings in this setting are unused variables. MSVC is a little bit oversensitive, though, because it doesnt treat calling a destructor as ‘using’. Example: template<typename T> void destruct(T* mem, int_to_type) { mem->~T(); } Compiler will complain about ‘mem’ not being used.


I’ve some days off after finishing the project, flying to warm countries for holidays soon, but in the meantime I started reinventing the wheel for n-th time. I was inspired by EASTL (Electronic Arts’ version of STL) and tried to create small subset of STL, aimed at game development. I started with stuff that I knew would be needed, I dont plan to add multiset equivalent soon :). Main goals were:

MemTracer (part 2)

Seems like I should add some more info about previous note. There are two applications: memtest.exe - C++, it acts as a server, starts and waits for connection. As soon as it’s established it starts sending info about memory operations. MemTracer - C#, client application. It connects to the server and perform all the analysis of memory allocations/frees. In real world scenario memtest would be our game, role of MemTracer stays the same as it is.


As we’re more or less done with The Witcher, I’ve some time (and energy) to do some hobby coding again. Recently I’ve been toying with a new approach to memory allocations tracking. “Traditional” way is to overload new/delete operators, so that they take additional arguments (file+line) then use some preprocessor magic to call them with __FILE__ + __LINE__. Best known example of this approach is probably Paul Nettle’s Memory Manager.


Tabview must be a program with one of the best size:usefulness ratios out there. I’ve been searching for a way to examine our profiling data and it’s just perfect.

Interesting links

Random collection of links I’ve found recently: Competing on the Basis of Speed - very inspiring talk by Mary Poppendieck, it made me order their (her and Tom’s) book (“Lean Software Development”). Scrum and XP from the Trenches - the best article on Scrum I’ve seen. It’s very practical, down to earth and all the examples are based on a real project, not some ideal, simplified scenario. Read it now.

VS font/color schemes

I found this very nice font/color settings file for VS.NET at Jeff Atwood’s page. The only thing I modified was identifier color, which I made a little darker, other than that it was perfect. Give it a try if you’re bored with default settings and too lazy to create your own one from the scratch.

About std::for_each

Few weeks ago I promised to whine some more about std::for_each. _In many C++ books and articles it’s presented as an ultimate loop solution. Typical example is some _for _loop, which is later replaced with for_each form. Funny thing is, even there, in many cases the latter piece of code is 2-3 times longer than the original. Is it worth the hassle? Typical example, taken from here (this is randomly chosen article, no hard feelings):

Improve your build process in 5 days

5 days ago our build system was rather traditional, not to say old school. We had a one-click full build, but it was executed manually. Usually about 1 build per day, sometimes more often, sometimes more seldom, it depended on the number/importance of changes. I’ve been thinking about improving it for long time, but never really could force myself to actually try it. Finally, about one week ago I decided it’s really high time to give it a go.

A Critical View of C++ Practices - very interesting paper about some modern C++ practices. I really enjoyed reading it, because I could find relations to my own “coding biography” there.When I first found out about all those nice design patterns , I started to implement them everywhere I could (if all you have is hammer…). I replaced all my global variables with Singletons and so on. Now, after few years I’m almost at the same point where I started.


How can you not love this group? Where else can you see discussion about the perfect ‘for’ loop taking 240+ posts and counting?