The undefined flag

This week I had one of the most interesting debugging sessions in a while. Here’s the minimal code snippet exhibiting the problem. It doesn’t really make much sense in itself, but I tried to remove everything not related to the bug itself (the actual case was much more convoluted):

 1struct SVar
 2{
 3    SVar() : m_v(23000) {}
 4    void operator=(unsigned short t)
 5    {
 6        if(_rotr16(m_v, 1) != t)
 7        {
 8            m_v += t * 100;
 9        }
10    }
11    unsigned short    m_v;
12};
13static const int MAX_T = 1000;
14struct Lol
15{
16    __declspec(noinline) void Cat(int t)
17    {
18        unsigned short newT = static_cast<unsigned short>(t < MAX_T ? t : MAX_T);
19        m_t = newT; // ***********
20        printf("New t = %d\n", newT);
21    }
22    SVar m_t;
23};
24void LolCat()
25{
26    Lol lol;
27    lol.Cat(100);
28    lol.Cat(100);
29}

Focus on the Cat method. First, let’s imagine that line 19, marked with stars, is commented out. All that our method does is select smaller of given argument (always 100 in our case) & MAX_T (1000) and print it on the screen. Not surprisingly, it’ll print 100 twice (as 100 is smaller than 1000 obviously). Now, uncomment the assignment (m_t = newT) and things get more interesting. When compiled in release, 64-bit mode, the program now prints 100 & 1000 (tested with both MSVC 2008 & 2010)! It doesn’t make much sense, assignment operator never modifies newT, what’s going on here? We clearly need to dig deeper…

Let’s take a look at generated assembly code:

 1movzx       eax,word ptr [rcx]
 2mov         r8d,3E8h // 0x3E8 == 1000
 3cmp         edx,r8d
 4ror         ax,1
 5cmovl       r8w,dx
 6cmp         ax,r8w
 7je          `anonymous namespace'::Lol::Cat+28h (13F143F5Ch)
 8movzx       eax,r8w
 9imul        ax,ax,64h
10add         ax,word ptr [rcx]
11mov         word ptr [rcx],ax
12lea         rcx,[string "New t = %d\n" (13F1D3E10h)]
13movzx       edx,r8w
14jmp         qword ptr [__imp_printf (13F24F938h)]

It might be more obvious what’s wrong now, but don’t kick yourself if you don’t see it. I only caught it after single stepping through this code with the registers window open… Feel free to try it first before continuing, it’s a fun one, I promise.

Focus on lines 3-5. First, we compare t & MAX_T, then execute ROR instruction from SVar::operator= and finally use a conditional move to get newT. Do you see a problem now? cmovl only moves src to dest if the value of the overflow flag is not equal to the value of the sign flag. In our case sign flag should always be set (100 < 1000) and the overflow flag should always be clear (-900 is fine)… That’s all nice, but the compiler decided to interleave our cmp/cmovl pair with ROR instruction, which** can modify the overflow flag!** During the second call to Cat(), _mv is 33000 (highest bit set), when rotated right, OF is set to the exclusive OR of the two most significant bits of the result (0 & 1 in our case, so OF is set). After ROR is excuted, OF == SF and cmovl doesn’t do what it was supposed to do, hell freezes and min(100, 1000) is 1000 now. We’re not fully done, though. Here’s the kicker – the OF flag is only defined for 1-bit rotates (as in our example), it is undefined in all other cases. Our original code didn’t rotate by 1, it was 5. As it turns out, it runs just fine on my i5-3320m laptop and i7-850 at work (so prints 100 twice). It doesn’t seem like the OF flag is ever set. On i7 2600k however, it seems to behave similar to 1-bit rotates (didn’t have a chance to confirm exact behavior for 100%, not my machine, I’m positive it does set the OF flag in some cases, though). Exactly the same code, different result. How to fix it? There are a few solutions, one I find the most amusing is quite unorthodox use of the _WriteBarrier function:

1unsigned short newT = static_cast<unsigned short>(t < MAX_T ? t : MAX_T);
2_WriteBarrier();
3m_t = newT;
4[...]
5mov         r9,rcx
6cmp         edx,r8d
7cmovl       r8w,dx
8movzx       eax,word ptr [rcx]
9ror         ax,1

There you go, a compiler bug (OK, we were pushing it a little bit, but still) and undefined CPU flag in action, all that before I had a chance to eat my lunch that day…

Old comments

Ofek Shilon 2013-02-09 17:11:34

Wow! Good stuff.

Kos 2013-02-15 15:54:58

Another nice bedtime story… I admire the dedication of MSVC fans. ;-)

More Reading