Going deeper - addendum
4/May 2014
There have been some comments on my previous post wondering about C++ compilers and their capabilities. Normally I’m all for compiler bashing, but in this case I’d probably cut them some slack. It’s easy to optimize when you’re focused on a single piece of code, and way more difficult when you have to handle a plethora of cases. On top of that, uops are handled differently on different CPUs, e.g. in my limited tests Haswell seems to care less about these differences. Anyhow, I’d expect compilers to replace INC with ADD x,1 in most cases, but I’d be much less optimistic about SIB byte elimination. MSVC seems a little inconsistent about it: it sometimes uses INC, sometimes ADD, and I’m not sure what determines that. Out of curiosity, I decided to use Matt Godbolt’s excellent Compiler Explorer to see how different GCC versions and Clang behave. Results:
GCC 4.9.0 eliminates the SIB byte and uses ADD instead of INC (it generates pretty much the same code as my final, hand-optimized version)
Clang 3.2 eliminates the SIB byte, but uses INC [mem]
g++ 4.4 - same as Clang
I didn’t really test the more exotic versions; follow the link if you’re interested.
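If you want to poke at this yourself, here is a minimal sketch of the kind of loop you could paste into Compiler Explorer. It is a hypothetical stand-in, not the code from the previous post: the function name `bump_all`, its parameters, and the register names in the comments are my own assumptions, picked only to show a read-modify-write on memory where the compiler has to choose between INC and ADD, and between indexed and plain base addressing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the loop discussed above (not the original code):
// add one to every element of an array in memory.
void bump_all(std::uint32_t* values, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        // A compiler may emit this as an indexed read-modify-write, e.g.
        //   inc dword ptr [rdi + rax*4]   ; index register, so a SIB byte is needed
        // or walk a pointer and address through a plain base register, e.g.
        //   add dword ptr [rax], 1        ; no index register, no SIB byte
        // Register names are illustrative only. ++x and x += 1 mean the same
        // thing in the source; INC vs ADD is entirely the compiler's pick.
        values[i] += 1;
    }
}

// Optional sanity check; being inline, it is not emitted unless something calls it.
inline void bump_all_sanity_check()
{
    std::uint32_t v[3] = { 0, 41, 0xFFFFFFFFu };
    bump_all(v, 3);
    // Unsigned wraparound is well defined, so the last element becomes 0.
    assert(v[0] == 1 && v[1] == 42 && v[2] == 0);
}
```

Keep in mind that what you see depends heavily on compiler version and flags. At higher optimization levels the compiler may vectorize a loop like this, in which case neither scalar form shows up in the hot path.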