There are several approaches to implementing a hash map. The main difference lies in how we handle collisions between entries falling into the same bucket. Probably the most popular and elegant is “chaining”, where every slot holds a linked list of entries. On collision, we simply append the entry to the list of the correct bucket. A variation of this method is used, for example, in MSVC’s standard library.
I implemented it in the first version of RDESTL as well, simply because it was the most straightforward. However, this solution has certain disadvantages:
- poor cache usage, the usual problem with linked lists; in my implementation a miss only happens on collision, but it may still be an issue,
- a node allocation for each collision,
- increased memory usage (additional links for list management).
I’ve seen some implementations using a vector of vectors (a vector of entries for each bucket instead of a list). It probably has better cache usage for buckets, but the collision problem remains and it takes even more memory, so I didn’t really consider it.
After watching Cliff Click’s presentation about his lock-free hash map (for Java), I decided to experiment with an “open addressing” hash table with linear probing. In this solution, in case of a collision we simply iterate over the table until we find a free slot. This means the whole hash table is just one linear block of memory, with no external links. It has its share of problems, for example it’s prone to “clustering” (entries falling into consecutive buckets glue together, making the search for free slots more expensive), but it was intriguing enough to give it a shot. To minimize clustering, it’s advised to maintain a relatively low load factor (50% or less if possible).
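The core of the linear-probing scheme can be sketched like this (hypothetical, heavily simplified: int keys, no deletion or rehashing, and it assumes the load factor stays below 100% so a free slot always exists):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Open-addressing table with linear probing: the whole table is one flat
// array of slots. On collision we walk forward (wrapping around) until we
// hit either the key we're looking for or an empty slot.
struct ProbedMap {
    struct Slot { int key; int value; bool used; };
    std::vector<Slot> slots;

    explicit ProbedMap(size_t capacity) : slots(capacity, {0, 0, false}) {}

    void insert(int key, int value) {
        size_t i = static_cast<size_t>(key) % slots.size();
        while (slots[i].used && slots[i].key != key)
            i = (i + 1) % slots.size();      // linear probe: try next slot
        slots[i] = {key, value, true};
    }
    int* find(int key) {
        size_t i = static_cast<size_t>(key) % slots.size();
        while (slots[i].used) {              // an empty slot ends the probe run
            if (slots[i].key == key) return &slots[i].value;
            i = (i + 1) % slots.size();
        }
        return nullptr;
    }
};
```

Note how a lookup has to scan the entire cluster of occupied slots before it can report a miss; that is exactly why long clusters hurt and why keeping the load factor low matters.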
Below you can find the results of a simple performance test. It consists of enumerating all the files in the c:\Windows\system32 directory (2435 files in my case), iterating over all of them and finally removing a randomly chosen 45% of entries. Results (sum of 1000 iterations):
- RDESTL, chaining: 7211.706ms [6477.079, 130.545, 247.396]
- RDESTL, linear probing, power-of-2 capacity: 5911.379ms [5405.187, 93.174, 244.688]
- RDESTL, linear probing, prime number capacity: 5788.215ms [5231.420, 85.864, 305.579]
- stdext: 7224.713ms [6174.747, 83.457, 692.800]
(RDESTL hash maps with 50% load factor; this resulted in a capacity of 6151 for versions 1 & 3 and 8192 for version 2).
The overall quickest was version 3; however, I’d argue that in most cases you’d want to choose 2, because it performs better in the most common operations like iteration and removal (this tests find() as well… stdext is slooooow here). It was slower during enumeration, mainly because the table is bigger (= rehashing costs more). The longest cluster was 8 entries. Memory usage (with a cached hash value per entry):
- ~289960 bytes
- ~246040 bytes
- ~327680 bytes
I haven’t decided yet whether to replace my hash map with this new one, so for the time being it’s added as hash_map2. You can tweak the capacity “policy” (power-of-2 or prime number) and the load factor. I’m not sure it’s a good idea to expose such low-level characteristics in the final version, probably not, but they may have a big impact on final performance and memory usage, so let them stay this way for a while.
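The practical difference between the two capacity policies boils down to how the bucket index is computed (a sketch with names of my own, not RDESTL’s actual policy interface): a power-of-2 capacity allows a cheap bitwise AND, while a prime capacity needs a real modulo but tends to spread poorly distributed hashes more evenly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Power-of-2 policy: requires capacity to be a power of two, so the
// modulo reduces to masking off the high bits.
inline size_t bucket_pow2(uint32_t hash, size_t capacity) {
    return hash & (capacity - 1);
}

// Prime policy: generic modulo, more expensive per lookup but less
// sensitive to regularities in the hash values.
inline size_t bucket_prime(uint32_t hash, size_t capacity) {
    return hash % capacity;
}
```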
PS. It seems like GTAIV is officially the bestest game evah. In other news, we’ll soon need to bump mark scale to 200% or something, if only for GTA V.
(…and my preorder still isn’t here!)