A Halloween Story - get_temporary_buffer

A little bit late, as Halloween was yesterday, but I think std::get_temporary_buffer is scary enough to qualify. A co-worker called me today to show me an ‘interesting’ crash. It was deep in guts of the STL, more specifically in the std::_Merge method.

Corresponding C++ line seemed innocent:

*_Dest++ = _Move(*_First1++);

Nothing very interesting, not our code and yet it was crashing with: Exception thrown at 0x01293C05 in foo.exe: 0xC0000005: Access violation reading location 0xFFFFFFFF

Let’s open the disassembly window and see what instruction is causing problems exactly:

01293C05  movaps      xmmword ptr [ebx],xmm0 

EBX=0x001d0638 here, not 0xFFFFFFFF, though, so what’s going on? It can be a little bit confusing at first, but if you worked with SSE/MSVC before you probably know you can’t always trust the debugger here. It’s true that we hit an access violation, but it’s not because our memory region is invalid, it’s simply a misaligned access (notice the movaps instruction that expects data to be aligned at a 16-byte boundary 0x001d0638 is not). OK, progress, let’s backtrack and see what our code is doing. Again, fairly simple:

void SomeFunc(SIMDStruct* begin, SIMDStruct* end)
    std::stable_sort(begin, end);

(SIMDStruct is some struct containing __m128 elements inside). I thought it’d be easy, it’s probably just not aligned properly, but we inspected both structure declaration and our container allocates and they were handling it properly. begin was nicely aligned. Hmph, OK, let’s dig deeper and see where does EBX even come from… At first I assumed it must have been an element of given array, but the address (0x001d0638) was not in the [begin, end) range. Luckily, I had some vague recollection of stable_sort using merge sort and as we know, merge sort requires an extra buffer. Just to confirm, let’s look at our callstack and inspect function signatures:

// We crash inside this one when writing to _Dest
_OutIt _Merge(_InIt1 _First1, _InIt1 _Last1,
		_InIt2 _First2, _InIt2 _Last2,
		_OutIt _Dest, _Pr _Pred, bool _In_place = false);
void _Chunked_merge(_BidIt _First, _BidIt _Last, _OutIt _Dest,
		_Diff _Chunk, _Diff _Count, _Pr _Pred);

As you can see _Dest is passed down the callstack, but if we keep going up we eventually hit _Buffered_merge_sort that does:

_Chunked_merge(_First, _Last, _Tempbuf._Init(),	_Chunk, _Count, _Pred);
_Dest is a third argument of _Chunked_merge, so it does indeed correspond to _Tempbuf._Init(). How is this temporary buffer allocated? Using std::get_temporary_buffer and that’s where the the scary part comes in. “Bogus implementation” aside, even if our array size is small, there’s no obvious/easy way to redirect it to use any other allocation technique. It simply calls global operator new (say goodbye to your alignment). It bothered me in the past, but back then it was usually a matter of perf (I’d love to just give it my own work buffer), in this case it’s worse as it’s crashing the app… The good news is, it’s finally deprecated in C++17. The way I see it, until then we have 3 options:

  • try to somehow replace get_temporary_buffer with own version
  • use own stable sort
  • do not mix SIMD types and std::stable_sort
More Reading