EA STL released, updated benchmarks
October 22, 2010 – 3:36 amBig weekend news were EA releasing some of their codebase source, including parts of EASTL. Given that the original EASTL document has been the main inspiration for my RDE STL experiment, I was very interested in finally seeing the code itself. First impression – it’s still big. It is leaner than STL, but it is a big library. That’s quite understandable if you remember it’s not a local, one studio thing, it’s used by the whole organization. Thus, it includes features like optional exception support, that in other situation could be easily ignored. I was especially curious to test how their hash map implementation performed when compared to Google’s or RDESTL.
hash_map tests were done using Google’s sparsehash (v1.9), ran them on my old laptop (Core Duo 2.2GHz, first value) and desktop machine (8 core, i7@2.8GHz, second value). All benchmarks were run for three different types of objects – sizes of 4 bytes (5000000 iterations), 16 bytes (1250000 iterations) and 256 bytes (156250 iterations). Results presented in tables below (omitted sparse_hash_map (slowest by far) and std::map. See my previous post for test description):
| 4 byte objects | dense_hash_map | rde::hash_map | stdext::hash_map | eastl::hash_map |
| map_grow | 1273.8/365.9 ns | 699.0/205.9 ns | 2095.8/535.7 ns | 1815.9/402.3 ns |
| map_predict/grow | 535.6/169.4 ns | 461.5/151.3 ns | 2209.3/544.2 ns | 1833.8/421.3 ns |
| map_replace | 351.9/128. ns | 369.9/116.9 ns | 540.2/167.0 ns | 395.6/121.9 ns |
| map_replace2 | 377.0/130.4 ns | 322.3/94.2 ns | 513.5/158.3 ns | 354.1/118.8 ns |
| map_fetch | 370.9/125.8 ns | 309.3/98.4 ns | 514.3/155.0 ns | 412.9/103.7 ns |
| map_fetch_empty | 101.1/28.2 ns | 35.5/8.4 ns | 224.6/54.9 ns | 191.3/42.8 ns |
| map_remove | 495.2/158.8 ns | 370.4/121.4 ns | 2131.5/577.2 ns | 587.9/163.2 ns |
| map_toggle | 1050.9/238.9 ns | 479.5/121.6 ns | 3470.2/844.6 ns | 1193.9/264.5 ns |
| 16 byte objects | dense_hash_map | rde::hash_map | stdext::hash_map | eastl::hash_map |
| map_grow | 2760.6/685.5 ns | 1235.3/282.7 ns | 1964.5/1016.3 ns | 2057.8/522.9 ns |
| map_predict/grow | 934.5/258.5 ns | 715.4/210.7 ns | 1802.9/1026.8 ns | 2044.4/528.7 ns |
| map_replace | 693.0/199.8 ns | 569.9/172.0 ns | 774.8/178.5 ns | 651.9/214.8 ns |
| map_replace2 | 826.6/204.4 ns | 549.5/168.9 ns | 783.1/176.4 ns | 628.4/204.2 ns |
| map_fetch | 712.9/203.6 ns | 537.5/159.0 ns | 765.9/177.1 ns | 627.4/200.2 ns |
| map_fetch_empty | 152.3/40.5 ns | 87.7/23.7 ns | 169.5/91.6 ns | 369.7/78.8 ns |
| map_remove | 931.8/252.9 ns | 588.1/164.0 ns | 1708.3/1481.7 ns | 823.1/221.4 ns |
| map_toggle | 2220.7/509.0 ns | 903.3/215.5 ns | 1414.6/896.6 ns | 1785.7/371.3 ns |
| 256 byte objects | dense_hash_map | rde::hash_map | stdext::hash_map | eastl::hash_map |
| map_grow | 16839.8/3724.5 ns | 5786.5/1360.1 ns | 12130.4/3012.7 ns | 7629.7/1831.8 ns |
| map_predict/grow | 4675.8/1174.3 ns | 3845.0/934.9 ns | 12281.7/2969.9 ns | 7411.1/1852.1 ns |
| map_replace | 3893.9/999.0 ns | 3334.1/800.1 ns | 3307.8/789.5 ns | 3239.0/763.0 ns |
| map_replace2 | 4039.3/1018.4 ns | 3298.5/807.4 ns | 3222.9/815.0 ns | 3234.3/838.3 ns |
| map_fetch | 3941.6/978.8 ns | 3186.0/820.3 ns | 3173.7/797.6 ns | 3188.8/819.7 ns |
| map_fetch_empty | 233.6/66.3 ns | 158.3/46.1 ns | 3057.8/749.3 ns | 2954.8/720.3 ns |
| map_remove | 4457.8/1116.7 ns | 3301.5/785.4 ns | 9395.7/2347.9 ns | 3408.8/928.7 ns |
| map_toggle | 9773.2/2334.4 ns | 6438.6/1525.4 ns | 13127.5/2931.8 ns | 9738.0/2437.8 ns |
Note: couldn’t find reserve/resize method for EASTL, that’s why map_grow is the same as map_predict/grow. RDE tries to be as cache friendly as possible and uses open addressing, so it’s the most effective with small objects. When iterating to next object means a cache miss anyway, most of its advantages is lost, as is clearly visible for 256 byte items. Actually, for the last group, std::map was the fastest, beating all hash_maps easily. Below you can find comparative charts (results from the 8-core machine), last one includes std::map.
I ran also my own test, which included adding names of all files from c:\Windows\System32 to a (hash)map, performing some look-ups and then removing them. In general EASTL was quite close to RDESTL here (excluding construction), both quite clearly faster than standard implementation. std::map, rde::map & eastl::map were included in the test as well, and while they were slower than hash_maps, EASTL was clearly the fastest from those three (with RDE ‘fetch’ being the slowest, slower than MSVC implementation even, will have to take a look there). Typical results (in ms, laptop):
| Grow | Find | Remove | Total | |
|---|---|---|---|---|
| RDE hashmap | 191 | 133 | 105 | 429 |
| EASTL hashmap | 182 | 155 | 108 | 445 |
| STD hashmap | 222 | 185 | 240 | 647 |
| EASTL map | 253 | 584 | 278 | 1115 |
| RDE map | 252 | 709 | 338 | 1299 |
| STD map | 228 | 601 | 482 | 1311 |
I tried some less interesting containers, like vector as well. EASTL implementation is really nice, it’s a little bit faster than RDE in most important areas (push_back), while slightly slower when removing objects. There are some neat ideas there, like storing capacity same way as begin/end (ie. as vector), so that we don’t need to ‘convert’ between pointer/index when testing if vector should grow. I plan to incorporate some of those into RDE soon (it could use some overhaul/clean-up anyway).
In general, switching to EASTL should be a no brainer for any companies that are still using STL, if only for better memory management. If you have your own implementation, you probably want to run similar benchmarks yourself first, but I’d say it’s worth at least giving a try (not the whole package maybe, as I mentioned – it may be a little too big for everyone’s taste) . It’s clearly visible that it was written by someone, who actually cared about performance/memory management.
Appendix: all charts in this note were generated using Perl. I wrote a little script that parsed program output and generated images using built-in module. It was the first time I wrote something in Perl. Impressions? Boy, is it powerful when it comes to working with strings (shocker, I know). Whole parsing code is maybe 15 lines and I’m sure could be half of it, were it written by someone who actually knew what he was doing. Syntax is ugly, yes… It’s good to be concise, but here it was pushed a little bit too far. I think I still prefer it to Tcl. Thing I absolutely hated was that statements need to end with a semicolon. For some reason when I code in a scripting language I immediately switch to sloppy, non semicolon mode. What’s more annoying – error messages are completely bogus in such case, I think I’ve seen at least 10 different ones. After a while I just learnt that if message didn’t make any sense – it’s a semicolon missing somewhere.










19 Responses to “EA STL released, updated benchmarks”
Which compiler and OS did you run theses tests with?
Have you also compared to tr1:unordered_map?
By Paul on Oct 22, 2010
Maybe add STLPort to the benchmark?
By Ofek on Oct 22, 2010
Thanks for posting these benchmarks. Like the previous poster I think it would be great if you could include a test case for unordered_map. I’m mostly interested in boost’s implementation.
http://www.boost.org/doc/libs/1_44_0/doc/html/unordered.html
By Sebastian on Oct 22, 2010
You should try do this with with (pre-)C++0x enabled library implementations that support features such as move semantics and again later once C++0x offically out and has better support with other compilers.
VC++2010 supports move semantics and in their STL implementation. tr1::unordered_map is std::unordered_map in C++0x, there should not be any significant difference between std::unordered_map and stdext::hash_map except for move semantics and slight interface differences.
By snk_kid on Oct 22, 2010
Code was compiled with MSVC 2008.
Tests run under Windows Vista 32-bit (laptop) and Windows 7 64-bit (desktop).
I’ll try to add boost/STLPort over the weekend. Basically, it’s unmodified Google’s benchmark from their sparsehash library, so if anyone’s in hurry, it should be simple enough to give it a whirl.
By admin on Oct 23, 2010
Ideally you should do this for a number of compilers, there will be differences. You should definately do this with VC++2010 because you should see a sigifnicant differences because of rvalue references and move semantics and a better optimizing compiler. Get the express edition it’s free.
By snk_kid on Oct 23, 2010
I’ll give 2010 a try some day. Reason why I’m not particularly interested in testing move semantics right now is that it’s not widely support by console (X360/PS3) compilers anyway.
By admin on Oct 23, 2010
http://github.com/paulhodge/EASTL/
By Steve on Oct 25, 2010
for those wondering how unordered_map performs, check out this benchmark: http://attractivechaos.wordpress.com/2008/10/07/another-look-at-my-old-benchmark/
it’s a little dated but I think most results are still good today.
By xp on Oct 25, 2010
Would be interesting to see a comparison vs PS3/360.
I’d bet that they’ve paid more attention to keeping a smaller cache happy and the differences between best and worst are exaggerated further.
Adrian
By Adrian on Nov 18, 2010
X360 benchmarks can be found here: http://msinilo.pl/blog/?p=675 .
By admin on Nov 19, 2010
Would be interesting to see a comparison vs PS3/360.
By ครีมหมอจุฬา on Nov 22, 2010
I did a comparison of std STL, RDE, ea STL on PC/win7, x360 and ps3: http://imageshack.us/f/545/wyniky.png/
Compiler was VS2010, all optimizations on
Can you comment on the results? Especially about vector.erase on x360
If you want I can share benchmark’s code
By Hubert on Dec 8, 2011
Thanks for the test, looks interesting. What was the stored element type? I’d have to take a look at erase() myself, I might be doing something stupid there, I mostly use unordered version, so didn’t focus on it too much (it’s still surprising it’s so slow only on X360). If stored element is POD, specializing is_pod might help.
By admin on Dec 8, 2011
Thanks for answer :] All tests for vector, list and deque were made on uint32_t.
Today I tested it also on android 2.3 NEON (Xperia Play), and again – vector.erase in rde lasts 13843m, compared to ~8200 in stl and ea. I’m gonna test also how it behaves for bigger structs, and come back here.
By Hubert on Dec 9, 2011
Doh, forgot about this one. Seems like there was a bug that would cause unsigned values to be treated as types requiring copy constructor and nullify some of optimizations. It has been fixed, feel free to give a new version a try.
By admin on Sep 5, 2012