RDESTL: hash_map, another take

¹/Oct 2008 By admin

I’ve been reading lots of interesting stuff on hash tables recently and was trying to improve my RDESTL implementation of hash_map. At one point I had 5 different versions, but finally managed to create something that’s good enough for most applications. Granted, I exploit the fact that performance’s the most important factor, so I can live with some limitations that wouldnt fly in any more general library. I effectively “ban” two hash values (0xFFFFFFFE & 0xFFFFFFFF) for example and use it for my own purposes.

Some of the articles I found:

Comparison of Hash Table Libraries - comparison of 12 different hash table implementations (performance/memory consumption).
From the same author - The Google Dense Hashtable Library - analysis of Google’s dense hash_map from sparsehash lib.
Notes on hash tables from Dr. Jonathan Mohr. It’s especially interesting for analysis of average number of lookup probes as a function of load factor/collision resolving system.
Chapter in Beautiful Code - “Python’s Dictionary Implementation: Being All Things to All People”.
See also dictnotes.txt in Python’s code repository for more information about this implementation.

Final result can be found in RDESTL. I measured it’s performance against Google dense_hash_map in their own performance tests and it looks promising:

	dense_hash_map	rde::hash_map	stdext::hash_map	std::map
map_grow	2553.8/886.0 ns	878.2/374.4 ns	7500.0/2666.0 ns	5163.3/1694.2 ns
map_predict/grow	1112.3/457.0 ns	641.2/291.7 ns	7421.0/2648.9 ns	5193.3/1689.5 ns
map_replace	684.9/294.9 ns	794.3/355.7 ns	1530.3/583.4 ns	2287.0/770.7 ns
map_replace2	620.9/269.9 ns	508.6/245.0 ns	1405.5/532.0 ns	2379.0/775.3 ns
map_fetch	653.6/288.7 ns	563.1/257.4 ns	1488.3/566.3 ns	2430.5/809.6 ns
map_fetch_empty	207.5/68.6 ns	187.2/62.4 ns	837.7/265.2 ns	628.7/191.9 ns
map_remove	929.8/374.4 ns	522.6/252.7 ns	5171.5/1781.6 ns	5692.4/1836.1 ns
map_toggle	2054.5/667.7 ns	981.2/377.5 ns	9321.1/3043/5 ns	3135.7/982.8 ns

Tests performed on two machines: 2.2GHz Core Duo laptop and 2.4GHz Quad Core desktop (Q6600). (side note: it’s almost depressing how much is my laptop slower…). I chose dense_hash_map as it seemed one of the fastest implementations out there (judging by Attractive Chaos benchmark). Just for reference I included “standard” hash_map/map as well. Tests are:

map_grow: add N elements to the hash_map,
map_predict/grow: reserve space first, then add elements,
map_replace: add elements, overwrite their values in another loop,
map_replace2: same as map_replace, another implementation (see below),
map_fetch: fetch all elements one-by-one (indexing by key), N elements in map,
map_fetch_empty: same, on empty map,
map_remove: remove all elements from map, one by one,
map_toggle: in every iteration add element, then erase it immediately.

As you can see the only test when rde::hash_map is slower is map_replace. In this scenario we first fill the table, and in the second loop we overwrite all existing values:

template<class MapType>
static void time_map_replace(int iters) {
    MapType set;
    Rusage t;
    int i;

    SET_EMPTY_KEY(set, -2);
    for (i = 0; i < iters; i++) {
        set[i] = i+1;
    }

    t.Reset();
    for (i = 0; i < iters; i++) {
        set[i] = i+1;
    }
    double ut = t.UserTime();

    report("map_replace", ut, iters, 0);
}

Reason for this is implementation of subscription operator. Personally, I hate this operator with passion (in maps, I think its behaviour is higly confusing), but I included it for compatibility reasons. My version is optimized for case when key is not present in the map, so it’s not optimal for this test (it still does reasonably well). For reference, I created another version of this test where crucial fragment looks like:

for (i = 0; i < iters; i++)
{
    typename MapType::iterator it = set.find(i);
    it->second = i + 1;
   //set[i] = i+1;
}

I’ve tested various reprobing methods and opted for quadratic. Interesting alternative can be found in Python’s dictionary implementation. It’s a little bit more complicated, but according to authors, it can make up for not-so-optimal hash functions. In my case I decided it’s more important to have quick collision resolution and rely on good hashing (obviously, my situation is more specific, Python’s dictionary is one of the backbones of their system). New hashmap implementation works rather nicely in my old tests as well (and is more memory efficient, it only needs ~144kb).

(PS. I’m a bad bad project owner. There was a bug in RDESTL reported in July that I’ve only noticed… Sorry. Fortunately it was hash_map related :) PPS. When profiling hash_map I found a bug in my profiler, updated version uploaded (it would include profiler::Disable, while it shouldnt). PPPS. Updates on this site may be a little slower for a while. I’m abroad with a special mission, sadly cannot say much more for the present or I’d have to track you all down and eliminate.

Old comments

js 2008-10-01 09:46:54

It looks great,
I didn’t look at the implementation but is it possible to set a custom allocators to a rde::hash_map ?

attractivechaos 2008-10-01 10:32:19

Thanks for linking to my blog. I am interested to see how rdestl performs under GCC/Unix and so I downloaded your library and put into my benchmark. I have made three changes to make g++ report no compiling errors:
1. replace hash_map.cc:24 “class THashFunc = rdestl::hash,” to “class THashFunc,”. It seems that there is no rdestl::hash
2/3. replace “new(mem) node()” to “(node*)mem = node()“. g++ does not recognize such grammar.
Now my test program can be compiled but the result seems to be wrong. I use rdestl in almost exactly the same way as sgi-stl::hash_map. Here is the relevant code:
struct hashf_int {
inline unsigned operator()(unsigned key) const { return key; }
};
typedef rdestl::hash_map inthash;
[code lang=“C++”]
int test_int(int N, const unsigned *data)
{
int i, ret;
inthash *h = new inthash;
for (i = 0; i < N; ++i) {
pair p = h->insert(pair(data[i], i));
if (p.second == false) h->erase(p.first);
}
ret = h->size();
delete h;
return ret;
}
[/code]
Am I doing anything wrong?
In addition, I am surprised that in MSVC std::map is in general faster than stdext::hash_map. A search tree can hardly be faster than a hash table.
Are you using the same (or almost the same) time_hash_map from google-sparsehash? Have you applied optimization? My laptop just has a 2.0GHz CPU, while the result seems to be siginificantly faster than your 2.4GHz desktop.

attractivechaos 2008-10-01 10:34:34

Sorry, some charaters are masked somehow. Such as pair. But you must know what it should be.

admin 2008-10-01 11:03:26

Hmm, I dont have g++ here, so cant test it by myself, but:
1) there’s rde::hash defined just above hash_map class, will have to look into it.
2, 3) This is perfectly legal placement new operator, so it should compile, I’m surprised GCC has problems with constructs like that.
Tonight, when I get home, I’ll download your test suite and try to plug rde::hash_map in.

attractivechaos 2008-10-01 12:22:43

1) Maybe rdestl::hash is absent in your online version? There is no other class before hash_mapbase.
2/3) According to this page:
http://msdn.microsoft.com/en-us/library/kewsb8ba(VS.71).aspx
“new(mem) node()” does not match any form. In wiki:
http://en.wikipedia.org/wiki/New(C%2B%2B)
I cannot find a similar example, either. Maybe this is an extension in MSVC?

Josh Szepietowski 2008-10-01 16:16:26

@attractivechaos
That form is the ‘Placement new’ http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.10 and is part of the standard.
Here is some info of GCC’s hiccup on this syntax that might be helpful:
http://www.desy.de/user/projects/C++/g++faq/placement_new_syntax.html

admin 2008-10-01 19:27:15

Yes, it’s placement new. Thanks Josh for GCC link, I had no idea it doesnt work there.
@attractivechaos: I think you used old version of hash_map from zip file. Try the one from SVN repository.
@js: yes, it should be possible to plug in your allocators, but the philosophy is closer to the one in EASTL rather than in STL.

attractivechaos 2008-10-01 19:30:07

Thank you, Maciej and Josh. I did not know this “placement new”. However, I am afraid this is an extension in some compilers, but not part of the standard. Even invokded in C++98 mode, g++ cannot compile placement new. Intel C++ compiler cannot, either. According to this page:
http://www.glenmccl.com/nd_cmp.htm
placement new is not implemented in all compilers. Icc/gcc cannot compile the example showed in that page. Icc/gcc are essentially all the matured C++ compilers outside the Windows’s world on x86 CPUs. It may be good to avoid placement new for better portability.
I can see placement new is useful. I do not know how SGI STL manages cases where placement new is preferred. Maybe it is worth checking.

admin 2008-10-01 19:48:22

It definitelly is part of the standard. It puzzles me it’s still not implemented by GCC, I’ve been using it under MSVC for years.
[Edit] This construction is used also by boost (object_pool for example), so I’d bet it should be quite safe/portable.

attractivechaos 2008-10-01 21:49:51

Sorry, it is my fault. I have overlooked one sentence in the page suggested by Josh: “#include // Must #include this to use placement new”. In the following line (overloading new operator) solves the problem:
inline void *operator new (std::size_t, void *mem) { return mem; }
Your SVN version has added “#include ” and so placement new works. I do not need change point 2/3 in your new library. Thanks a lot.

Axel 2008-11-25 11:57:12

While looking for a faster alternative to ms-stl hashmaps we are currently trying RDESTL. We have found a bug in the clear method of the hashmap. After clearing a hashmap containing already deleted elements you will get an assertion during the next insert.
The following implementation fixes the problem:
void clear()
{
node* endNode = m_nodes + m_capacity;
for (node* iter = m_nodes; iter != endNode; ++iter)
{
if (iter->is_occupied())
{
// We can make them unused, because we clear whole hash_map,
// so we can guarantee there’ll be no holes.
rde::destruct(&iter->data);
}
iter->hash = node::kUnusedHash;
}
m_size = 0;
m_numUsed = 0;
}
By the way, a swap method would be nice!

admin 2008-11-25 21:28:06

Thank you for a bugfix! I incorporated it and added swap() method as well (fairly straightforward, but still should be quicker than tmp = a; a = b; b = tmp). I normally try to avoid swapping containers, that’s what it was missing.

Axel 2008-11-26 12:53:06

Thank you for the swap method, but it crashes because of the “m_nodes” member is NULL in the rehash function.
I tried the following implementation, which works fine
void swap(hash_map& other)
{
node* tmp_nodes(other.m_nodes);
int tmp_size(other.m_size);
int tmp_capacity(other.m_capacity);
int tmp_numUsed(other.m_numUsed);
THashFunc tmp_hashFunc(other.m_hashFunc);
TKeyEqualFunc tmp_keyEqualFunc(other.m_keyEqualFunc);
TAllocator tmp_allocator(other.m_allocator);
other.m_nodes=m_nodes;
other.m_size=m_size;
other.m_capacity=m_capacity;
other.m_numUsed=m_numUsed;
other.m_hashFunc=m_hashFunc;
other.m_keyEqualFunc=m_keyEqualFunc;
other.m_allocator=m_allocator;
m_nodes=tmp_nodes;
m_size=tmp_size;
m_capacity=tmp_capacity;
m_numUsed=tmp_numUsed;
m_hashFunc=tmp_hashFunc;
m_keyEqualFunc=tmp_keyEqualFunc;
m_allocator=tmp_allocator;
}
but I’m not sure about the need to exchange allocator and functions too.

admin 2008-11-26 14:09:34

Doh, of course. I don’t know what was I thinking, your method is much quicker. Functions should be swapped definitelly, with allocators it’s tricky. If they’re “equal” (ie A can free memory allocated by B) then it doesn’t matter. Otherwise – you probably shouldn’t swap the containers. See LWG #431 for more detailed discussion. Personally, I’m most concerned about efficiency, so I’d go with undefined behavior here and wouldn’t swap allocators :/. I’ll investigate it some more and upload new version later.

.m.m.s.

Maciej Sinilo