Hidden enemy
20/Jun 2011
Consider the following piece of code:
void FindFrames(const S* seq, float d, UINT& outFrameA)
{
const UINT n= static_cast<UINT>(seq->fd.size());
outFrameA = 0;
while(outFrameA + 1 < n && seq->fd[outFrameA + 1] < d)
{
++outFrameA;
}
}
Looks innocent, right? Well, it was slow enough to show up quite high in the profiler… It might seem surprising at first, but if you’ve done some console programming in the past, you can probably recognized our old foe - load-hit-store here. Let’s see generated assembly code:
823690D4 cmplwi cr6,r10,1
823690D8 ble cr6,FindFrames + 00d0h (82369110h)
823690DC lwz r11,0(r31)
823690E0 lwz r8,0A4h(r30)
823690E4 addi r7,r11,1
823690E8 slwi r6,r7,2
823690EC lfsx fr0,r6,r8
823690F0 fcmpu cr6,fr0,fr31
823690F4 bge cr6,FindFrames + 00d0h (82369110h)
{
++outFrameA;
823690F8 addi r11,r11,1
823690FC stw r11,0(r31)
82369100 rlwinm r11,r11,0,0,31
82369104 addi r8,r11,1
82369108 cmplw cr6,r8,r10
8236910C blt cr6,FindFrames + 009ch (823690dch)
Whooah, some serious spills there. As you can see compiler doesn’t store outFrameA in register, instead it loads (823690DC) & stores (823690FC) it in memory on every iteration. There’s not much work done between those calls, so obviously – that’s where all those LHSes come from. To be honest - I am not exactly sure why it behaves like this. There’s no aliasing, perhaps it’s so that outside code sees the most up-to-date value of outFrameA (even though it’s not volatile?). Either way, doesn’t make too much sense to me (see also Charles’ post + comments). Solution? Simple, copy to local variable, operate on it then store back. Modified code:
UINT frameA = 0;
while(frameA + 1 < n && seq->fd[frameA + 1] < d)
{
++frameA;
}
outFrameA = frameA;
Compare generated assembly:
823690D8 cmplwi cr6,r7,1
823690DC ble cr6,FindFrames + 00d0h (82369110h)
{
UINT frameA = 0;
823690E0 mr r11,r5
823690E4 li r10,1
823690E8 add r6,r5,r9
823690EC lfs fr0,4(r6)
823690F0 fcmpu cr6,fr0,fr31
823690F4 bge cr6,FindFrames + 00d0h (82369110h)
{
++frameA;
823690F8 addi r11,r11,4
823690FC addi r10,r10,1
82369100 addi r8,r8,1
82369104 cmplw cr6,r10,r7
82369108 add r6,r11,r9
8236910C blt cr6,FindFrames + 00ach (823690ech)
Old comments
NickP 2011-06-20 12:54:01
>> To be honest â?? I am not exactly sure why it behaves like this. Thereâ??s no aliasing <<
It looks like fd is an STL vector or some other templated container? My guess is that the compiler is storing + loading the value because of the call to operator[]. It's odd that it does that even though the function gets inlined. Perhaps if operator[] were declared __forceinline it would not do so?
It's also interesting that the problem cascades. Notice that it's reloading the base pointer of the array from memory every iteration, presumably because it cannot be sure outFrameA and the base pointer of the array do not alias.
admin 2011-06-20 17:09:57
Nick - yeah, fd is my version of STL vector. It doesn’t matter here, though, even if I change function to look like this:
[code lang=“cpp”]
void FindFrames(const float* fd, int n, float d, UINT& outFrameA)
[/code]
Generated assembly will be exactly the same.
Arseny Kapoulkine 2011-06-20 20:01:18
> Thereâ??s no aliasing
The assembly style suggests that the platform is X360 - I was under the impression that MSVC does not perform type-based alias analysis (i.e. ‘strict aliasing’ problems don’t apply), so outFrameA could alias anything, including the container struct & container memory.
admin 2011-06-20 22:06:04
Yeah, I thought so, too, but adding __restrict to both outFrame (changing it to pointer, obviously) & fd (function signature showed above) will not help here.