There may come a time in a game programmer’s life when he has to fix a bug in a library he doesn’t have the source code for. It doesn’t happen often, and it might never happen, but it’s good to be prepared. If I remember correctly, I have had to do it only twice, once fairly recently. We were getting quite a few crash reports and were assured that a fix in the third-party library was coming, but I decided to see if it was possible to do anything about it in the meantime. Things were further complicated by the fact that we had never seen this crash internally; it was all based on user reports (and it was quite rare in the wild, too).
I started by investigating crash dumps in WinDbg. The crash itself was a division by zero; it seemed the code was not handling all the edge cases correctly. It would load a value from a table, apply some transformation and divide by the result. That worked fine in most cases, but broke if the value read from the table was zero (it would pass through all the transformations and come out as zero on the other end). We had no sources and no symbols, so I wasn’t even sure what this function was supposed to do, but it seemed like the array should not contain zeros in the first place.
Now, I didn’t really care about a 100% correct solution, as it was obvious I was treating symptoms. I just wanted something that would eliminate the crashes (and wouldn’t break rendering completely; I was fine with temporary artifacts). What I had to do was squeeze in a test against zero, handle it and also set the original array element to something other than 0 (to cut a long story short, I found out about the last requirement in the process: it would crash in another function without it). Easy, right? That’s like ~12 bytes worth of opcodes in x64. The block I was comfortable with modifying (I didn’t want to mess with the whole function) was roughly 40-45 bytes, maybe a little more, so I had to find a way to shrink the existing code by ~25-30%.
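To make the goal of the patch concrete, here is a rough high-level sketch of the behaviour described above. It is not the library's code: `transform`, `divide_unpatched`, `divide_patched` and the `fallback` value are all hypothetical names invented for illustration, and the only property borrowed from the real function is that a zero read from the table comes out of the transformation as zero.

```python
def transform(value: int) -> int:
    # Hypothetical stand-in for the library's unknown transformation.
    # Assumption taken from the post: zero in means zero out.
    return value * 3


def divide_unpatched(table: list, index: int, numerator: int) -> int:
    # Original behaviour: a zero table entry survives the transformation
    # and reaches the division, which then fails.
    return numerator // transform(table[index])


def divide_patched(table: list, index: int, numerator: int,
                   fallback: int = 1) -> int:
    value = table[index]
    if value == 0:
        # Treat the symptom: overwrite the bad entry so that other code
        # reading the same table does not trip over it later, then carry
        # on with the (hypothetical) fallback value.
        table[index] = fallback
        value = fallback
    return numerator // transform(value)
```

The write-back to `table[index]` mirrors the extra requirement mentioned above: guarding the division alone was not enough, because another function read the same element.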
I will not focus on the actual modifications too much, as they’re not applicable to anyone else and, to stress this one more time, you do not need stuff like this often, if ever. Instead, I’ll try to present some of the tricks & tools of the trade that can come in useful in other situations, too.
Let’s start with writing code that does the same as the original fragment, but is smaller. Luckily for me, the code was using additional registers (R8D and up, outside the EAX-EDI range), even though it was not operating on 64-bit numbers (so it only used the lower 32 bits). When using extended registers, we have to emit an additional REX prefix, so most of the time the instructions are at least 1 byte longer than their 32-bit counterparts. Example:
    mov ecx, eax    ; 8b c8    = 2 bytes
    mov ecx, r8d    ; 41 8b c8 = 3 bytes
                    ; (0x41 encodes default operand size
                    ; (32-bit for mov) & extends the MODRM)
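The byte counts above follow from how the REX prefix is put together: it is a single byte of the form 0100WRXB, and it is only needed when one of those four bits has to be set. Below is a minimal sketch that rebuilds both encodings by hand. The helper names (`rex_prefix`, `modrm`, `encode_mov_r32_r32`) are made up for this example, and it only covers the register-to-register form of opcode 8B.

```python
def rex_prefix(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX byte: 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the ModRM byte from its three fields (only the low 3 register bits fit)."""
    return (mod << 6) | ((reg & 7) << 3) | (rm & 7)


def encode_mov_r32_r32(dst: int, src: int) -> bytes:
    """Encode `mov dst, src` for 32-bit registers using opcode 8B /r.

    Registers are passed by number: eax=0, ecx=1, ..., edi=7, r8d=8, ..., r15d=15.
    """
    r = dst >> 3  # REX.R extends the ModRM reg field (the destination here)
    b = src >> 3  # REX.B extends the ModRM rm field (the source here)
    body = bytes([0x8B, modrm(0b11, dst, src)])
    if r or b:
        # REX.W stays 0, so the operand size remains the default 32 bits.
        return bytes([rex_prefix(r=r, b=b)]) + body
    return body


EAX, ECX, R8D = 0, 1, 8
print(encode_mov_r32_r32(ECX, EAX).hex(" "))  # 8b c8
print(encode_mov_r32_r32(ECX, R8D).hex(" "))  # 41 8b c8
```

Swapping R8D for a register in the EAX-EDI range makes the prefix disappear, which is exactly the one-byte saving per instruction.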
By changing parts of the code to operate on EAX-EDI I was able to get within 1 byte of my goal, but for the last stretch I had to resort to a riskier modification: using CDQ (a 1-byte opcode) instead of XOR EDX, EDX (2 bytes). CDQ sign-extends EAX into EDX, so the two are equivalent only as long as EAX is non-negative, which luckily was the case here. Surprisingly, the x86 version was somewhat easier. I could not use “smaller” registers, but the generated code was a little bit redundant, so I modified the algorithm slightly to do the same thing with fewer instructions.
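Why the CDQ swap is only conditionally safe can be shown with a tiny model of the two instructions. This is a sketch of their effect on EDX, not an emulator; `cdq` and `xor_edx_edx` are illustrative names. For reference, CDQ encodes as the single byte 99, while XOR EDX, EDX needs two bytes (33 d2 or 31 d2).

```python
MASK32 = 0xFFFFFFFF


def cdq(eax: int) -> int:
    """Return the EDX value CDQ produces: EAX's sign bit copied into every bit."""
    return MASK32 if eax & 0x80000000 else 0


def xor_edx_edx() -> int:
    """Return the EDX value after XOR EDX, EDX: always zero."""
    return 0


# Sign bit clear: both instructions leave EDX == 0.
for eax in (0, 1, 0x7FFFFFFF):
    assert cdq(eax) == xor_edx_edx()

# Sign bit set: CDQ fills EDX with ones, so the substitution would be wrong.
assert cdq(0x80000000) == 0xFFFFFFFF
```

The trick therefore depends on knowing something about the data in EAX at that point, which is why it counts as the riskier change.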
Getting the final opcodes was trivial for x86: I simply used inline assembly and copy-pasted from the disassembly window. I could not do the same for x64, as MSVC does not support inline assembly in this mode (intrinsics only). Looking back, I should have just downloaded some x64 assembler, but if I had, I would not have discovered ODA. It’s a great online disassembler supporting every platform you’ve ever coded for and a bunch you’ve never heard about. My only complaint is that it sometimes takes a while to notice that the opcodes have changed and keeps showing you the old code, but other than that it’s simply awesome. x64 encoding is not terribly user friendly, especially when you need to generate instructions like INC BYTE PTR [R12+0x4], but I kept plowing through. Intel’s manuals are a good starting point, but I found the OSDev Wiki to be a more concise reference.
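As an illustration of why INC BYTE PTR [R12+0x4] is awkward to encode by hand, here is a small sketch that assembles it field by field. The function name is invented for this example and it handles only one addressing form (base register plus 8-bit displacement). The catch is that R12 shares its low three bits with RSP, and that bit pattern in the ModRM rm field means "a SIB byte follows", so the instruction needs a REX prefix and a SIB byte on top of the opcode, ModRM and displacement.

```python
def encode_inc_byte_ptr_base_disp8(base: int, disp: int) -> bytes:
    """Encode `inc byte ptr [base + disp8]` (opcode FE /0) with a 64-bit base.

    base is the register number: rax=0 ... rdi=7, r8=8 ... r15=15.
    """
    if not -128 <= disp <= 127:
        raise ValueError("this sketch only handles 8-bit displacements")
    out = bytearray()
    if base >= 8:
        out.append(0x41)  # REX.B: the base register is one of r8-r15
    low = base & 7
    out.append(0xFE)      # opcode; the /0 extension goes into ModRM.reg
    if low == 0b100:
        # rm=100 means "SIB byte follows", so rsp and r12 need a SIB byte.
        out.append((0b01 << 6) | (0 << 3) | 0b100)      # mod=01, reg=/0, rm=SIB
        out.append((0b00 << 6) | (0b100 << 3) | low)    # no index, base=low bits
    else:
        out.append((0b01 << 6) | (0 << 3) | low)        # mod=01, reg=/0, rm=base
    out.append(disp & 0xFF)
    return bytes(out)


R12 = 12
print(encode_inc_byte_ptr_base_disp8(R12, 0x4).hex(" "))  # 41 fe 44 24 04
```

Five bytes for what looks like a trivial increment, compared with three (fe 40 04) for the same operation on [RAX+0x4].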
For the actual editing I used HTE for x86. It probably pushes the definition of oldschool a little bit too far (no mouse support…), but it has a built-in disassembler, so I could immediately verify that my changes made sense. I could not find any hex editor/disassembler for 64-bit code, so I used my trusty xvi32 and the debugger for verification.
This brings us to the last point: how to set a breakpoint in an unknown piece of code, with no function name and no symbols. Well, immediate window to the rescue! (Side note: I feel this is probably one of the most underappreciated features of Visual Studio. IME many of the programmers complaining about the MSVC debugger either do not use it often enough or do not use it to its full potential.) We know the opcodes, so we can search for them in memory. Start with getting the address range of the module you’re interested in (open the Modules window, copy-paste). Now, in the immediate window we can use the memory search command. For example, let’s assume you’re looking for the mov ecx, eax instruction (in a real-life scenario you’d obviously want to choose something less common) and your module address range is 0x003D0000-0x0044B000:
.S -W 0x003D0000 0x0044B000 0xc88b
(-W = 16-bit number, -D = 32-bit. The value is read as a little-endian number, so the byte sequence 8b c8 is written as 0xc88b.)
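The idea behind the search can be sketched offline as well, for instance against a dumped copy of the module. The snippet below is only a model of the technique, not of the debugger command itself: `search_words` is a made-up helper, the `image` bytes are invented sample data, and it reports matches at any byte offset, which the real command may or may not do.

```python
import struct


def search_words(memory: bytes, base: int, value: int) -> list:
    """Return absolute addresses where the 16-bit little-endian value occurs."""
    pattern = struct.pack("<H", value)  # 0xC88B -> bytes 8b c8
    hits = []
    pos = memory.find(pattern)
    while pos != -1:
        hits.append(base + pos)
        pos = memory.find(pattern, pos + 1)
    return hits


# Invented sample: nop / mov ecx, eax / mov ecx, r8d / ret
image = bytes.fromhex("90 8b c8 41 8b c8 c3")
print([hex(a) for a in search_words(image, 0x003D0000, 0xC88B)])
# ['0x3d0001', '0x3d0004']
```

Note the second hit: it lands in the middle of mov ecx, r8d, just after the REX prefix. A short pattern like this produces false positives, which is one more reason to search for a longer, less common byte sequence.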
All that’s left to do is to open the disassembly window and copy-paste the addresses returned by the .S command (hopefully not too many of them) into the ‘Address’ field one by one. It shouldn’t take long to find the function we’re looking for. That was the last stage of my experiment. I could now verify that the code was indeed running, modify data on the fly and prove that it did crash upon encountering a zero in the array (remember, I had never actually experienced this bug myself; it was all based on user error reports, and hoping to just run into it was a fool’s errand). More importantly, I could verify that it was no longer crashing after my changes and that they introduced no noticeable side effects.