Dynamic initializers strike back

Back in the day I wrote about the ‘dynamic initializers’ problem. Basically, older versions of MSVC (up to 2012; not sure about 2013, and it seems better in 2015) had problems with static const floats that depended on other static const floats. The values were not calculated at compile time. Instead, the compiler generated a short function that computed them at run time. The immediate problem is code bloat: if the constant lives in a global header, every translation unit that includes it gets its own initializer. The other potential issue stems from the fact that these ‘dynamic initializer’ functions respect the optimization settings (/fp:fast). A few weeks ago I encountered an interesting case of this problem.
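A minimal sketch of one way to sidestep dynamic initializers altogether, assuming a compiler with C++11 constexpr support (this is not something the code in this post uses). A constexpr variable must be initialized by a constant expression, so the compiler has to compute the value itself and no initializer function is emitted. The names mirror the fragment further down; whether MSVC's constant evaluator is affected by /fp:fast is not something this sketch checks.

```cpp
// Sketch only: the constants from the fragment, declared constexpr (C++11).
// A constexpr variable must be initialized by a constant expression, so the
// compiler computes these values itself instead of emitting a dynamic
// initializer that runs before main().
constexpr float A    = 1.41421356237f; // sqrt(2)
constexpr float A2   = 65536.0f;
constexpr float SM_B = A2 / A;
constexpr float SM_C = SM_B / A;

// Checked at compile time: the build fails if the folded value is not A2 / 2.
static_assert(SM_C == 32768.0f, "SM_C should fold to A2 / 2");

// constexpr here is optional; it only lets callers use Foo2 in constant
// expressions as well.
constexpr float Foo2(float x)
{
    return x + SM_C;
}
```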
I was investigating an animation processing issue and noticed that the data generated by debug builds was a little smaller after compression than the data generated by release/retail builds. Now, that’s not very alarming in itself (we never use debug builds to preprocess final assets anyway), but it seemed interesting enough to warrant some more digging. Going through the code I noticed the following fragment (modified slightly):

#include <cstdio>

static const float A = 1.41421356237f; // sqrt(2)
static const float A2 = 65536.0f;
static const float SM_B = A2 / A;
static const float SM_C = SM_B / A;  // mathematically A2 / 2 = 32768
float Foo2(float x)
{
    return(x + SM_C);
}

int main()
{
    // [..]
    printf("f=%f\n", Foo2(0.0f));
    return 0;
}

It might not be immediately obvious, but SM_C is simply A2/2 = 32768.0 (A = sqrt(2), and sqrt(2)*sqrt(2) = 2), so in our basic example we expect it to print 32768.0. That’s exactly what happens in debug. Release, however, outputs 32767.9980, the closest float below 32768. That value then undergoes some further transformations, and the end result compresses slightly worse. As I said, I’ve seen this type of thing before, so it didn’t take long to track down, and I guess I could have expected it. I’m not sure this is something you keep in mind when writing the code, though (“oh, snap, too many indirections, precision problems to follow”). You could argue we should have made it easier for the compiler and simply written SM_C = A2/2.0, but again, it’s not necessarily the first thing to consider when writing math code (I think SM_B was actually being used somewhere too, so the chained form seemed more natural).
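Spelled out, treating A as exactly sqrt(2) in real arithmetic, the expected value follows directly from the definitions. The second line uses the standard 24-bit float significand to show that the release result is exactly one step below it:

```latex
\mathrm{SM\_C} = \frac{\mathrm{SM\_B}}{A} = \frac{A_2 / A}{A} = \frac{A_2}{A^2} = \frac{65536}{2} = 32768

% Floats in [2^{14}, 2^{15}) are spaced 2^{14-23} = 2^{-9} apart, so the
% largest float below 32768 is
32768 - 2^{-9} = 32767.998046875 \approx 32767.9980
```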

Root of the problem? The ‘dynamic initializer’ combined with /fp:fast. Let’s compare the generated code. Debug:

movss	xmm0, DWORD PTR SM_B
divss	xmm0, DWORD PTR A
movss	DWORD PTR SM_C, xmm0

Release + fp_fast:

SM_B	DD	0473504f3r			# 46340.9
movss	xmm0, DWORD PTR SM_B
mulss	xmm0, DWORD PTR __real@3f3504f3
movss	DWORD PTR SM_C, xmm0

As you can see, the compiler tried to be smart and replaced the division with a multiplication by the inverse (i.e., 1/A). It’s a noble attempt and would make sense in runtime code, but in this case it is exactly what causes our precision problem. Just for comparison, here’s the generated code after we surround our block with #pragma float_control(precise, on, push) / #pragma float_control(pop):

movss	xmm0, DWORD PTR SM_B
divss	xmm0, DWORD PTR __real@3fb504f3
movss	DWORD PTR SM_C, xmm0
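For reference, wrapping the fragment in the pragmas described above might look like the sketch below. float_control is an MSVC pragma; recent Clang versions also accept it, while GCC ignores it (warning only under -Wunknown-pragmas). The CheckConstants helper is hypothetical and not part of the original code; it just makes the expected value explicit.

```cpp
#include <cassert>

// Request precise floating-point semantics for the initializers in this
// block, even if the translation unit is built with /fp:fast.
#pragma float_control(precise, on, push)
static const float A    = 1.41421356237f; // sqrt(2)
static const float A2   = 65536.0f;
static const float SM_B = A2 / A;
static const float SM_C = SM_B / A;      // kept as a real division
#pragma float_control(pop)

float Foo2(float x)
{
    return(x + SM_C);
}

// Hypothetical sanity check: with precise semantics the two divisions
// land exactly on A2 / 2.
void CheckConstants()
{
    assert(SM_C == 32768.0f);
    assert(Foo2(0.0f) == 32768.0f);
}
```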

The precise version gives us 32768.0 as well. VS2015 is actually much smarter and doesn’t suffer from the whole ‘dynamic initializer’ problem (or at least it’s not triggered as easily):

CONST	SEGMENT
__real@47000000 DD 047000000r			# 32768
addss       xmm0,dword ptr [__real@47000000]

Clang/GCC (1191182336 == 0x47000000):

.LCPI0_0:
    .long   1191182336              # float 32768
Foo2(float):                        # @Foo2(float)
    addss   xmm0, dword ptr [rip + .LCPI0_0]
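To see both code paths side by side without depending on compiler flags, the sketch below spells out the two computations in portable C++ (C++11 constexpr, so they can also be checked at compile time). ViaDivision, ViaReciprocal, InvA and PrintComparison are hypothetical names. ViaReciprocal mimics the /fp:fast sequence by rounding 1/A to float first, which is the value held by the __real@3f3504f3 constant in the release listing.

```cpp
#include <cstdio>

// Constants from the fragment. Worked out by hand: A rounds to the float
// 0x3FB504F3, and SM_B = A2 / A happens to round to exactly A * 2^15
// (0x473504F3, the 46340.9 in the release listing).
constexpr float A    = 1.41421356237f; // sqrt(2)
constexpr float A2   = 65536.0f;
constexpr float SM_B = A2 / A;

// 1/A rounded to float; equals A / 2 (0x3F3504F3), the constant the
// /fp:fast initializer multiplies by.
constexpr float InvA = 1.0f / A;

// Debug / precise: one IEEE division, exact because SM_B == A * 2^15.
constexpr float ViaDivision()
{
    return SM_B / A;
}

// /fp:fast: multiply by the pre-rounded reciprocal instead. The product
// A * A * 2^14 is slightly below 32768 and rounds to 32768 - 2^-9.
constexpr float ViaReciprocal()
{
    return SM_B * InvA;
}

void PrintComparison()
{
    std::printf("division:   %f\n", ViaDivision());   // 32768.000000
    std::printf("reciprocal: %f\n", ViaReciprocal()); // 32767.998047
}
```

Both results match the values observed above: the division lands exactly on 32768, while the reciprocal version ends up at the closest float below it, 32767.998046875.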