A debugger barrier

A friend recently asked me for a little help debugging a problem he was running into. Occasionally the program would freeze while trying to process a chunk of data and never move on to the next one. The application is heavily threaded: processing is done by thread B, while thread A does its own job and periodically checks whether the work has finished. If so, it sends the result on for further transformations and queues more work for thread B. It seemed that in some rare cases the code would not recognize that the data had been fully processed and would keep spinning, trying to finish it. To be more precise, the app was working on two buffers at the same time, and a chunk was considered “done” only if both were fully processed. The first thing we noticed was that when the problem occurred, buffer X was fully done, but buffer Y was typically short by a few bytes… Since Y depended on data from X, and there was none left, it would never finish. The first thing we needed to do was to actually detect that case and break into the debugger. That part is simple: if X is fully done, Y should be too. We added an assertion and eventually hit it. The code was more or less:

if(processor.IsDone())
{
    processor.FinishBatch(&offsetX, &offsetY);
    if(offsetX == endX)
    {
        assert(offsetY == endY);
    }
}

Offsets are simply copied from the processor’s member variables (but those are set on another thread as the data is being consumed). Now, an interesting thing we noticed was that even though offsetY was incorrect, the corresponding member variable in the processor instance was actually fine (these two should be the same!). Behavior like this is a giveaway of a certain family of bugs… it typically points to a memory race/barrier issue. The code looked safe, though: everything was guarded by an isDone flag, and the barriers seemed to be in place:

...
// "publish"
MEMORY_WRITE_BARRIER; // macro
isDone = true;

// use
// (inside Processor::FinishBatch)
if(isDone)
{
    MEMORY_READ_BARRIER;
    outOffsetX = mOffsetX;
    outOffsetY = mOffsetY;
}

As you can see, we have both a write barrier (which earlier writes can’t cross) and a read barrier (which later reads can’t cross). On a hunch I checked both macros, though, and it turned out they were not “strong” enough. They were only compiler barriers, i.e. they guaranteed that the compiler would not move reads/writes around… However, this particular CPU was a weakly-ordered one, which means the CPU itself was still allowed to reorder reads/writes across the barrier. As a matter of fact, as far as the processor was concerned, there was no barrier at all; we confirmed this by looking at the assembly, which contained no fence instruction. What does that mean? Well, it’s quite possible that even though our flag (isDone) reads as true, the values we read from mOffsetX/mOffsetY might not be up to date yet.
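To make the compiler-barrier vs. CPU-barrier distinction concrete, here is a minimal sketch of the same publish/consume pattern written with standard C++11 fences. It is not the project's actual code: the Processor struct below is a hypothetical stand-in, Publish is a made-up name for the worker-side step, and the original macros are assumed to have behaved roughly like std::atomic_signal_fence (compiler-only). std::atomic_thread_fence also orders the accesses as seen by other threads, which on weakly-ordered CPUs normally means a real fence instruction.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the post's Processor; all names are illustrative.
struct Processor {
    std::uint32_t mOffsetX = 0;
    std::uint32_t mOffsetY = 0;
    std::atomic<bool> isDone{false};

    // Worker side (thread B): write the payload, then "publish" it.
    void Publish(std::uint32_t x, std::uint32_t y) {
        mOffsetX = x;
        mOffsetY = y;
        // std::atomic_signal_fence here would only restrain the compiler,
        // which is the assumed behavior of the broken macro.
        // atomic_thread_fence also orders the writes for other threads.
        std::atomic_thread_fence(std::memory_order_release);
        isDone.store(true, std::memory_order_relaxed);
    }

    // Consumer side (thread A): returns false if the batch is not done yet.
    bool FinishBatch(std::uint32_t* outOffsetX, std::uint32_t* outOffsetY) {
        assert(outOffsetX != nullptr && outOffsetY != nullptr);
        if (!isDone.load(std::memory_order_relaxed)) {
            return false;
        }
        // Pairs with the release fence in Publish: the offset reads below
        // cannot be satisfied with values older than the published ones.
        std::atomic_thread_fence(std::memory_order_acquire);
        *outOffsetX = mOffsetX;
        *outOffsetY = mOffsetY;
        return true;
    }
};
```

In this sketch thread B would call Publish once a chunk is consumed and thread A would poll FinishBatch. Making isDone a std::atomic is part of the fix, too: a plain bool shared between threads gives no ordering guarantees at all in standard C++.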

Now, the question is: how do we confirm that our theory is correct? It looks fine on paper, but it is always a bit of a struggle to convince people it’s a real problem, because (a) it happens so rarely and (b) it is a bit bizarre to think about, especially if you’re used to x86, which offers fairly strong ordering guarantees. Sure, we could add proper barriers, run the code for a while, and if the problem doesn’t show up in N tries, it’s fixed, right? Probably… I tried to think of a way to “prove” it in a more convincing fashion and eventually decided to try using the debugger as a memory barrier. Officially it is not one, but my reasoning was that breaking into it should surely be intrusive enough to flush any outstanding memory operations (as shown, to some extent, by our original assertion, where the value of mOffsetY in the Processor differed from outOffsetY). How can we leverage that? We added another assertion, this time inside FinishBatch itself (I had to temporarily pass in some extra info):

// Processor::FinishBatch
if(isDone)
{
    MEMORY_READ_BARRIER;
    if(mOffsetX == mTempDbg_EndX)
    {
        Assert(mOffsetY == mTempDbg_EndY);
    }
    // ... same as before
}

We ran it a few times and hit the assertion; this time, however, the debugger had inserted a barrier for us. mOffsetY was wrong when it was tested, but by the time we broke into the debugger it had been updated. If we now let the program continue, outOffsetY would contain the correct value and the outer assertion would not be hit… This might not be the 100% confirmation we were looking for, but it certainly gave us more confidence that our original theory was correct.

PS. You might ask yourself: how is it possible to run code with memory barrier macros that are not fully functional and not notice it? Well, as it turns out, it’s easier than you might think. Some platforms, even those with weak memory models, might treat atomic (interlocked) operations as barriers. So if your spinlock class uses cmpxchg, for example, you might get away with a broken barrier.
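As a rough illustration of why lock-protected code can hide a broken barrier macro, here is a minimal spinlock sketch in standard C++. SpinLock and Counter are hypothetical names, not the classes from the project in question. The point is that the ordering comes from the atomic compare-exchange and store themselves, so data touched only under the lock stays consistent even if separate barrier macros do nothing at the hardware level.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical spinlock; its ordering comes from the atomic operations alone.
class SpinLock {
public:
    void Lock() {
        bool expected = false;
        // On x86 a compare-exchange like this typically becomes lock cmpxchg.
        while (!mLocked.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            expected = false; // a failed exchange overwrites 'expected'
        }
    }

    // Single attempt; returns true if the lock was taken.
    bool TryLock() {
        bool expected = false;
        return mLocked.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed);
    }

    void Unlock() {
        assert(mLocked.load(std::memory_order_relaxed)); // must be held
        // Release: writes made under the lock are visible to the next owner.
        mLocked.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> mLocked{false};
};

// Plain (non-atomic) data that is only ever touched under the lock.
struct Counter {
    SpinLock lock;
    int value = 0;

    int Increment() {
        lock.Lock();
        const int result = ++value;
        lock.Unlock();
        return result;
    }
};
```

Code that goes through a lock like this never relies on the standalone barrier macros, which is one plausible way a compiler-only barrier can survive unnoticed until some lock-free path, such as the isDone flag above, depends on it.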
