A Tale of 3 Characters

Had an interesting little debugging adventure recently. Few weeks ago we started getting QA reports about some of the visual effects looking all weird. I tested it locally, it all seemed fine. QA dug deeper and they discovered it only occured with disc builds (not necessarily running from a DVD, just after installing fully preprocessed build), P4 version was good. The problem with installed builds is they are very close to final, fully optimized, all assets in final version with no debug info etc, so it’s pain in the ass to debug. I started with running it under PIX and discovered that for some reason shader was not getting all the parameters it expected, so it was using whatever random data was in registers. Seemed like a preprocessing bug. Next step was trying to discover the difference between my local data (OK) and the one disc build was using. I discussed it with the engine guys and found out that we recently switched the build/cache machine to 64-bit. (I’m running both, but mostly 32). Switched to 64-bit locally, forced preprocessing and finally got it to ‘break’ locally. At this point it was quite clear there was a problem with preprocessing in 64-bit builds. It went quick from here and after few minutes I got to the bottom of the issue.

It’s all caused by one line, looking roughly like this:

// uint64 Foo::GetX()
const uint64 p = 1 << foo->GetX();
// p used afterwards

Looks innocent… However, for some resource types GetX() was returning 31. Now, in theory - that’s a problem already. See this quiz about integers in C - apparently you cannot shift into, or past, the signed bit. In practice, for this particular platform it was OK (i.e., the result is what you’d expect, 1 in the highest bit), things went south a little bit later. Let’s see generated assembly:

0000000063B0EC12 0F B6 C8             movzx       ecx,al
0000000063B0EC15 8B 84 24 A0 1A 00 00 mov         eax,dword ptr [rsp+1AA0h]
0000000063B0EC1C D3 E0                shl         eax,cl
0000000063B0EC1E 48 98                cdqe
0000000063B0EC20 48 89 84 24 D0 00 00 00 mov         qword ptr [rsp+0D0h],rax

cdqe? That’s our problem right there (that’s debug, it was actually movsxd in release build, but same issue)! What happens is this: we first shift left, result is an integer, then** sign-extend** it to 64 bits, getting some crazy value (18446744071562067968 to be exact). Fix was very simple and required adding 3 characters:

const uint64 p = 1ULL << foo->GetX();

Compare generated code:

0000000063B0EC13 0F B6 C8             movzx       ecx,al
0000000063B0EC16 48 8B 84 24 A0 1A 00 00 mov         rax,qword ptr [rsp+1AA0h]
0000000063B0EC1E 48 D3 E0             shl         rax,cl
0000000063B0EC21 48 89 84 24 D0 00 00 00 mov         qword ptr [rsp+0D0h],rax

As you can see, now all the calculations are performed in the 64 bits, so no conversion is required and we get our expected result (2147483648).

That was the end of it, and I knew (intuitively) what was wrong, but I was curious why exactly it behaved this way and way it generated no warning (level 4 + Lint). First thing is, shift operator behaves a little bit unusual when it comes to arithmetic conversion. The “standard” behavior, if arguments are of different types, is to convert the “smaller” argument to the type of a “bigger” one (ie. char + int will promote the first addend to an int). If shift operator followed the same rules, it’d convert 1 to uint64 as that’s the type of object returned from Foo::GetX(). With this particular operator however, integer promotions are performed on each of the operand individually. For x86/64 the right operand will be either an immediate value or it’ll be stored in the CL register, in any case, it’ll easily fit in one byte. It’s always the type of the left operand that defines the type of the result. So, in our case, the type of 1 << GetX() expression will be an integer and it doesn’t matter that GetX() returns an uint64. Integer is later converted to uint64, which will perform the sign extension (now that we know it, another way to work around that would be to explicitly convert it to unsigned long first). I am a little bit surprised Lint was quiet about it, it normally complains about mixing signed/unsigned. Also, interestingly, when I tried a constant, not value returned from a function, MSVC was smart enough to warn us:

uint64 x = (1 << 31);

gives “warning C4245: ‘initializing’ : conversion from ‘int’ to ‘size_t’, signed/unsigned mismatch”.

Old comments

vince 2012-09-04 02:28:27

Reminds me of a subtle bug I found in some licensed code
// int64 blahFlags; defined earlier
int32 bDoSomething = blahFlags & SOME_FLAG_DEFINE_IN_UPPER_32_BITS;
if(bDoSomething)
{
// this branch will never be taken
}
Now this one I think will be caught by a warning, but the codebase helpfully had that warning disabled.

More Reading