Hashing made (even more) useful
9/Jul 2008
One of the most basic issues in game development is: how do we identify “things”? A thing can be, well, anything - a logical object, resource, level, entity, animation etc. The most straightforward way is of course to name them and identify them with strings (tags). So, you have “texture_leaves0”, “badguy_01” and so on. This solution has two advantages - it’s dead simple and the names carry useful information for free. Unfortunately, it also has lots of problems. I don’t want to reiterate what’s already been said many times, so in short: memory consumption, low efficiency, variable size/dynamic memory (complicates serialization). For a more comprehensive discussion see the article by Mick West - Practical Hash IDs (read it before continuing with this one).
What’s the alternative? Well, we can use hash values of the strings instead of the strings themselves. There are many different hashing algorithms, but for gamedev purposes CRC32 is by far the most popular. It’s quick to calculate and, importantly, hash values take only 32 bits. Comparing two IDs boils down to comparing two 32-bit values, they fit structures nicely and take much less space than full strings. See Mick’s article again for more info. CRCs overcome most of the problems with strings and if you’re not using them yet - start. In an ideal situation your final game data shouldn’t contain any strings apart from UI stuff.
The main disadvantage of CRCs is that they make debugging more complicated. Ultimately, they’re just numbers, so your nice “badguy_01” becomes 0xD3576AAF now. Unless you’re some kind of superhero with the extraordinary power of calculating hashes back and forth, that doesn’t say much. We’d really prefer to see “badguy_01” in the debugger in this case.
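To make the hashed-ID idea concrete, here is a minimal C++ sketch. It assumes the common reflected CRC-32 variant (polynomial 0xEDB88320, the one zlib uses); the article doesn’t say which variant rde::CRC32 implements, so the values this code produces may not match the 0xD3576AAF quoted above. Crc32 and HashId are hypothetical names used for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320).
// ASSUMPTION: this variant is chosen for illustration; short but slow,
// production code would normally use a 256-entry lookup table.
inline std::uint32_t Crc32(const char* data, std::size_t length)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < length; ++i)
    {
        crc ^= static_cast<unsigned char>(data[i]);
        for (int bit = 0; bit < 8; ++bit)
        {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;
}

// Hypothetical ID type: stores only the 32-bit hash, never the string.
struct HashId
{
    std::uint32_t value;

    explicit HashId(const char* name) : value(Crc32(name, std::strlen(name))) {}

    // Comparing two IDs is a single 32-bit integer comparison.
    bool operator==(const HashId& other) const { return value == other.value; }
};
```

An ID built with HashId("badguy_01") takes four bytes no matter how long the source name is, which is what makes it easy to embed in structures and serialize.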
Fortunately, this is a widely known problem, so we may hope for ready-made solutions. It turns out that a fellow Pandemic programmer, Ivo Beltchev, wrote an article on this topic - Debugger Support for CRC Hash Values (read it now). The basic idea is to use Visual Studio’s autoexp.dat file (by default located in the MSVC directory, in Common7\Packages\Debugger). It’s a configuration file that lets users specify how the debugger displays structure contents (in the watch window/tooltips). The most straightforward way to use it is to provide a method performing reverse lookup for hash values (i.e., given a CRC, it returns the original string). Let’s assume it’s named GetStr (as in Ivo’s article) and our CRC type is rde::CRC32. All we have to do now is add a single line to autoexp.dat ([AutoExpand] section): rde::CRC32=
This approach has two problems:
doesn’t work with remote debugging,
memory overhead. A typical implementation of the GetStr() method relies on a huge structure mapping all existing IDs to their source strings. Of course, the usual way of bypassing this problem is to have it enabled only for production builds and remove it in the VeryFinalWeReallyWontHaveToDebugThisOneEVAR (and we want our 5MB of memory back!) build. Five minutes after releasing this exec Murphy’s Law kicks in and some poor bastard has to stare at 0x563FF2C wondering what NPC it may be.
Ivo solves the first problem by writing an Expression Evaluator plugin. In that case the debugger takes care of retrieving the needed data from the application being debugged, so it can work remotely. The second problem persists, however. Here’s the way I decided to tackle it:
all IDs/strings are stored in an external database, managed with SQLite. The nice thing is that it solves collision detection automagically. Every time your game editor needs a CRC, it calculates it and tries to fetch the matching entry from the database. If an entry exists, the stored string is compared with the new one; if the strings differ, we have a collision. If the hash value doesn’t exist in the database, it is inserted. The game doesn’t need to touch the database at all, it only operates on CRCs.
when evaluating the contents of a CRC, the EE plugin does a reverse lookup using our global database. We can detect some obvious problems here (like indexing with an ID that’s not present in the database). The SQLite DB is just a file, so it can be copied around.
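The register-or-detect-collision flow and the reverse lookup described in the two points above can be sketched roughly as follows. This is not the code from the source package: a std::map stands in for the SQLite table so the example stays self-contained, and IdDatabase, Register, Lookup and RegisterResult are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Outcome of asking the database about a (hash, string) pair.
enum class RegisterResult { Inserted, AlreadyKnown, Collision };

// ASSUMPTION: in the real tool this data lives in a SQLite file keyed by
// hash; an in-memory map is used here purely as a stand-in.
class IdDatabase
{
public:
    // Editor side: called whenever a tool computes a hash for a name.
    RegisterResult Register(std::uint32_t hash, const std::string& name)
    {
        auto it = m_entries.find(hash);
        if (it == m_entries.end())
        {
            // Hash not seen before: remember its source string.
            m_entries.emplace(hash, name);
            return RegisterResult::Inserted;
        }
        // Hash already present: same string is fine, a different one collides.
        return it->second == name ? RegisterResult::AlreadyKnown
                                  : RegisterResult::Collision;
    }

    // Debugger side: reverse lookup. Returns nullptr for an unknown hash,
    // which is one of the "obvious problems" worth flagging.
    const std::string* Lookup(std::uint32_t hash) const
    {
        auto it = m_entries.find(hash);
        return it == m_entries.end() ? nullptr : &it->second;
    }

private:
    std::map<std::uint32_t, std::string> m_entries;
};
```

Only the editor and the debugger plugin would ever call into something like this; the game itself keeps working with bare 32-bit values.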
The first part was trivial; writing a VC plugin, however, is a totally different story. There doesn’t seem to be much information on the web about it. Add problems with different compiler versions and trouble with debugging the plugin itself, and it quickly becomes rather irritating. Here are a few tips that may save some pain for brave souls trying to code MSVC plugins:
First things first - a plugin is just a DLL implementing some poorly documented MS functions. In general, it should reside in the same directory as devenv.exe or somewhere in the path.
Preferred syntax for your plugin in autoexp.dat should be: rde::CRC32=
(that’s name_of_dll,name_of_function). One guaranteed way to make it work that I found is to always provide the full path. Right now I’m lucky enough to have it working with just the DLL name, but I’m pretty sure that in a previous version it made a difference and started working only after I added the full path (another explanation is that I should give up drinking). Remember to provide a module definition file and make sure it’s recognized by the linker. It’s possible to reference add-in functions without it, but you’ll have to use a decorated C name (like _AddIn_CRC32@28 or whatever).
If after all those steps the plugin doesn’t seem to work, first make sure the DLL is loaded and the function is executed. To do that, put some debug string in the result buffer and see if the debugger shows it. In general, if you see ‘???’ it means that either your plugin wasn’t executed or it crashed (that’s why you should start with a simple one, to rule out crashes). Try various possibilities in autoexp.dat: full path, DLL name only, undecorated/decorated function names. At least one of those should work.
Finally, a little source package implementing the basic functionality described above (tested under MSVC 2008, should work with 2005, most probably won’t work with 6.0). There are two projects in the solution: sqlgen and eecrc32. sqlgen is the “editor/game” application. It simply iterates over all the files in the c:\Windows\System32 directory and calculates a CRC for each of them, inserting it into the database if needed (GetCRC32 is the main function that your tool would need). Run it once to generate the SQL database. eecrc32 is the EE plug-in; it needs to be built and copied to the IDE directory (for MSVC2008 it’s done as a PostBuild step, for other versions you’ll most likely have to modify the target dir). Finally, modify autoexp.dat to use our new plugin for expanding CRC32 structures (add rde::CRC32=$ADDIN(eecrc32.dll,AddIn_CRC32)). Run sqlgen.exe again under the debugger and try to inspect the calculated hash values. If everything went OK, we should see the original strings next to the CRCs, as in the snapshot below:
To be fully functional, you’d probably have to find an elegant way to specify which database it should use (right now it’s C:\crc32.db). I couldn’t really figure out what directory it should be put in (tried plugin dir, debugger dir, application dir), so I use absolute paths (it may be a good idea anyway: there’ll be one global database, and it may even reside on a shared drive). Also, it may be beneficial to create an external ID-to-string lookup application (something similar to Microsoft’s Error Lookup), so we can resolve IDs coming from various sources (error reports, crashes, values extracted from registers etc.) without the game application ever requiring the database to work. Download code here.
PS. Went to see Kung Fu Panda today, fun movie, highly recommended (+ sick fur rendering).
PPS. If you’re interested in hashing, you may want to check out the new algorithm all the cool kids use (supposedly it’s very fast; I haven’t tried it myself [yet], but now I think I should) - Murmur Hash.
Old comments
js 2008-07-10 07:30:26
OMG, you won the contest of “ugliest visual studio color settings” EVER puke
admin 2008-07-10 07:33:59
It kicks ass, I know. This is my final setting after long years of research and experimentation :) Reminds me of days of coding in DOS Navigator internal editor.
jos8cal 2008-08-04 00:15:59
Care to share the visual studio color settings? :)
js 2008-08-04 08:49:29
Are you both color blind ? This set makes me dizzy :(
Sesso 2009-01-24 23:53:54
Great site.