Null references
25/Mar 2012
One of the most popular questions that fresh C++ programmers ask is about the differences between pointers and references and which one to use. One of the differences people cite is “references can never be NULL”. That’s true in theory and according to the standard, but in practice, especially when mixing pointers and references, there’s nothing preventing you from doing this:
1 void Bar(Foo& f)
2 {
3     f.x = 5;
4 }
5 ...
6 Foo* pf = NULL;
7 Bar(*pf);
Now, technically, this code doesn’t conform to the standard: dereferencing a NULL pointer (line 7) is undefined behaviour. In practice, I have yet to see a compiler that gives a damn. The program will most likely crash later, when trying to write to x (line 3).
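One way to keep a null pointer from turning into a "null reference" is to check at the exact spot where the pointer gets dereferenced. The sketch below is only an illustration: the definition of Foo and the helper name CheckedDeref are assumptions, not something from the original code, and a codebase that builds without exceptions would use an assert or an early return instead of throwing.

```cpp
#include <stdexcept>

// Assumed definition; the snippet above only tells us that Foo has a member x.
struct Foo { int x = 0; };

void Bar(Foo& f)
{
    f.x = 5;
}

// Hypothetical helper: converts a pointer to a reference and fails loudly
// on null, so the undefined behaviour never happens.
Foo& CheckedDeref(Foo* p)
{
    if (p == nullptr)
        throw std::invalid_argument("CheckedDeref: null pointer");
    return *p;
}
```

With this in place, Bar(CheckedDeref(pf)) either reaches Bar with a real object or reports the problem at the call site, instead of crashing later inside Bar.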
What’s interesting, however, is that some compilers (I know of GCC/SNC, but there might be more) take advantage of the fact that standard-conforming code cannot contain null references.
Consider the following snippet:
struct Lol { int x;};
struct Cat { int y;};
struct Lolcat : public Lol, public Cat {};
Cat* GetCatPtr(Lolcat* x) { return x; }
Cat& GetCatRef(Lolcat& x) { return x; }
With multiple inheritance, when converting to one of the base classes, we need to add its “offset” (sizeof(Lol) for Cat) to the object address. There’s one gotcha, though: we do not shift NULL pointers, because we obviously do not want to get 4 (or whatever the offset is) from 0. Let’s see what MSVC does:
Cat *GetCatPtr(Lolcat *x) {return x;}
007A3340 8B 44 24 04 mov eax,dword ptr [esp+4]
007A3344 85 C0 test eax,eax
007A3346 74 04 je GetCatRef+0Ch (7A334Ch)
007A3348 83 C0 04 add eax,4
007A334B C3 ret
007A334C 33 C0 xor eax,eax
007A334E C3 ret
It treats both cases exactly the same, so the reference version gets the NULL check as well. In fact, it only generates one function and calls it in both situations.
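To make the test/add sequence in the listing above easier to read, here is roughly what the pointer conversion does, written out by hand. This is a sketch for illustration only: CatOffset, ManualCatPtr and CheckManualMatchesCompiler are made-up names, and the reinterpret_cast arithmetic is not something to write in real code, since the compiler's implicit conversion already does the job.

```cpp
#include <cassert>
#include <cstddef>

struct Lol { int x; };
struct Cat { int y; };
struct Lolcat : public Lol, public Cat {};

Cat* GetCatPtr(Lolcat* x) { return x; }

// Byte offset of the Cat subobject inside a Lolcat, measured on a real object.
std::ptrdiff_t CatOffset()
{
    Lolcat probe{};
    Cat* cat = &probe; // implicit derived-to-base conversion
    return reinterpret_cast<char*>(cat) - reinterpret_cast<char*>(&probe);
}

// Hand-written version of the conversion: NULL stays NULL, anything else
// is shifted by the offset of the base class.
Cat* ManualCatPtr(Lolcat* p)
{
    if (p == nullptr)
        return nullptr;
    return reinterpret_cast<Cat*>(reinterpret_cast<char*>(p) + CatOffset());
}

// Sanity check: the manual version agrees with the compiler's conversion.
void CheckManualMatchesCompiler()
{
    Lolcat lc{};
    assert(ManualCatPtr(&lc) == GetCatPtr(&lc));
    assert(ManualCatPtr(nullptr) == GetCatPtr(nullptr));
}
```

The early return in ManualCatPtr corresponds to the test/je pair in the disassembly, and the pointer arithmetic corresponds to the add.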
The code spat out by GCC is more interesting:
000000000000000c <._Z9GetCatPtrP6Lolcat>:
c: 2c 03 00 00 cmpwi r3,0
10: 38 80 00 00 li r4,0
14: 41 82 00 0c beq 20 <._Z9GetCatPtrP6Lolcat+0x14>
18: 30 83 00 04 addic r4,r3,4
1c: 78 84 00 20 clrldi r4,r4,32 # 20
20: 60 83 00 00 ori r3,r4,0
24: 4e 80 00 20 blr
0000000000000028 <._Z9GetCatRefR6Lolcat>:
28: 30 63 00 04 addic r3,r3,4
2c: 78 63 00 20 clrldi r3,r3,32 # 20
30: 4e 80 00 20 blr
As you can see, there is no NULL check in the GetCatRef function, as the compiler assumes the reference is always valid. It’s doubtful you’ll see the impact of this optimization, especially in games, where it’s relatively rare to use MI, not to mention casting to a base class thousands of times a frame… Still, it’s a fun bit of trivia and one example where there’s an actual difference between pointers and references in C++.
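A practical consequence: if the object can legitimately be missing, the NULL check has to live on the pointer side, because the reference version is allowed to skip it. A minimal sketch, assuming the structs from above (GetCatPtrViaRef is a made-up name used only for illustration):

```cpp
struct Lol { int x; };
struct Cat { int y; };
struct Lolcat : public Lol, public Cat {};

Cat* GetCatPtr(Lolcat* x) { return x; }
Cat& GetCatRef(Lolcat& x) { return x; }

// Check the pointer first and only form a reference to a real object.
// GetCatRef is never handed a dereferenced NULL pointer this way.
Cat* GetCatPtrViaRef(Lolcat* x)
{
    return x ? &GetCatRef(*x) : nullptr;
}
```

For a valid object, all three functions end up at the same Cat subobject; only the pointer-based ones have a defined answer for NULL.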
Old comments
Trillian 2012-03-25 23:02:25
Nice to know, I never thought about that.
Out of curiosity, I can see that MSVC disassembly is x86, but what is the architecture of the GCC one? I’ve never seen those opcodes.
wiewior 2012-03-26 00:36:26
@Trillian
That’s PowerPC, hint was GCC/SNC in text (SNC being the console compiler.)
peterchen 2012-04-01 19:14:00
Never rely on undefined behavior. You get away with it only for so long.
See e.g. here: http://blog.regehr.org/archives/213