EA STL released, updated benchmarks

October 22, 2010 – 3:36 am

Big weekend news were EA releasing some of their codebase source, including parts of EASTL. Given that the original EASTL document has been the main inspiration for my RDE STL experiment, I was very interested in finally seeing the code itself. First impression – it’s still big. It is leaner than STL, but it is a big library. That’s quite understandable if you remember it’s not a local, one studio thing, it’s used by the whole organization. Thus, it includes features like optional exception support, that in other situation could be easily ignored. I was especially curious to test how their hash map implementation performed when compared to Google’s or RDESTL.

hash_map tests were done using Google’s sparsehash (v1.9), ran them on my old laptop (Core Duo 2.2GHz, first value) and desktop machine (8 core, i7@2.8GHz, second value). All benchmarks were run for three different types of objects – sizes of 4 bytes (5000000 iterations), 16 bytes (1250000 iterations) and 256 bytes (156250 iterations). Results presented in tables below (omitted sparse_hash_map (slowest by far) and std::map. See my previous post for test description):

4 byte objects dense_hash_map rde::hash_map stdext::hash_map eastl::hash_map
map_grow 1273.8/365.9 ns 699.0/205.9 ns 2095.8/535.7 ns 1815.9/402.3 ns
map_predict/grow 535.6/169.4 ns 461.5/151.3 ns 2209.3/544.2 ns 1833.8/421.3 ns
map_replace 351.9/128. ns 369.9/116.9 ns 540.2/167.0 ns 395.6/121.9 ns
map_replace2 377.0/130.4 ns 322.3/94.2 ns 513.5/158.3 ns 354.1/118.8 ns
map_fetch 370.9/125.8 ns 309.3/98.4 ns 514.3/155.0 ns 412.9/103.7 ns
map_fetch_empty 101.1/28.2 ns 35.5/8.4 ns 224.6/54.9 ns 191.3/42.8 ns
map_remove 495.2/158.8 ns 370.4/121.4 ns 2131.5/577.2 ns 587.9/163.2 ns
map_toggle 1050.9/238.9 ns 479.5/121.6 ns 3470.2/844.6 ns 1193.9/264.5 ns
16 byte objects dense_hash_map rde::hash_map stdext::hash_map eastl::hash_map
map_grow 2760.6/685.5 ns 1235.3/282.7 ns 1964.5/1016.3 ns 2057.8/522.9 ns
map_predict/grow 934.5/258.5 ns 715.4/210.7 ns 1802.9/1026.8 ns 2044.4/528.7 ns
map_replace 693.0/199.8 ns 569.9/172.0 ns 774.8/178.5 ns 651.9/214.8 ns
map_replace2 826.6/204.4 ns 549.5/168.9 ns 783.1/176.4 ns 628.4/204.2 ns
map_fetch 712.9/203.6 ns 537.5/159.0 ns 765.9/177.1 ns 627.4/200.2 ns
map_fetch_empty 152.3/40.5 ns 87.7/23.7 ns 169.5/91.6 ns 369.7/78.8 ns
map_remove 931.8/252.9 ns 588.1/164.0 ns 1708.3/1481.7 ns 823.1/221.4 ns
map_toggle 2220.7/509.0 ns 903.3/215.5 ns 1414.6/896.6 ns 1785.7/371.3 ns
256 byte objects dense_hash_map rde::hash_map stdext::hash_map eastl::hash_map
map_grow 16839.8/3724.5 ns 5786.5/1360.1 ns 12130.4/3012.7 ns 7629.7/1831.8 ns
map_predict/grow 4675.8/1174.3 ns 3845.0/934.9 ns 12281.7/2969.9 ns 7411.1/1852.1 ns
map_replace 3893.9/999.0 ns 3334.1/800.1 ns 3307.8/789.5 ns 3239.0/763.0 ns
map_replace2 4039.3/1018.4 ns 3298.5/807.4 ns 3222.9/815.0 ns 3234.3/838.3 ns
map_fetch 3941.6/978.8 ns 3186.0/820.3 ns 3173.7/797.6 ns 3188.8/819.7 ns
map_fetch_empty 233.6/66.3 ns 158.3/46.1 ns 3057.8/749.3 ns 2954.8/720.3 ns
map_remove 4457.8/1116.7 ns 3301.5/785.4 ns 9395.7/2347.9 ns 3408.8/928.7 ns
map_toggle 9773.2/2334.4 ns 6438.6/1525.4 ns 13127.5/2931.8 ns 9738.0/2437.8 ns

Note: couldn’t find reserve/resize method for EASTL, that’s why map_grow is the same as map_predict/grow. RDE tries to be as cache friendly as possible and uses open addressing, so it’s the most effective with small objects. When iterating to next object means a cache miss anyway, most of its advantages is lost, as is clearly visible for 256 byte items. Actually, for the last group, std::map was the fastest, beating all hash_maps easily. Below you can find comparative charts (results from the 8-core machine), last one includes std::map.

graph4
graph16

graph256

I ran also my own test, which included adding names of all files from c:\Windows\System32 to a (hash)map, performing some look-ups and then removing them. In general EASTL was quite close to RDESTL here (excluding construction), both quite clearly faster than standard implementation. std::map, rde::map & eastl::map were included in the test as well, and while they were slower than hash_maps, EASTL was clearly the fastest from those three (with RDE ‘fetch’ being the slowest, slower than MSVC implementation even, will have to take a look there). Typical results (in ms, laptop):

Grow Find Remove Total
RDE hashmap 191 133 105 429
EASTL hashmap 182 155 108 445
STD hashmap 222 185 240 647
EASTL map 253 584 278 1115
RDE map 252 709 338 1299
STD map 228 601 482 1311

I tried some less interesting containers, like vector as well. EASTL implementation is really nice, it’s a little bit faster than RDE in most important areas (push_back), while slightly slower when removing objects. There are some neat ideas there, like storing capacity same way as begin/end (ie. as vector), so that we don’t need to ‘convert’ between pointer/index when testing if vector should grow. I plan to incorporate some of those into RDE soon (it could use some overhaul/clean-up anyway).

In general, switching to EASTL should be a no brainer for any companies that are still using STL, if only for better memory management. If you have your own implementation, you probably want to run similar benchmarks yourself first, but I’d say it’s worth at least giving a try (not the whole package maybe, as I mentioned – it may be a little too big for everyone’s taste) . It’s clearly visible that it was written by someone, who actually cared about performance/memory management.

Appendix: all charts in this note were generated using Perl. I wrote a little script that parsed program output and generated images using built-in module. It was the first time I wrote something in Perl. Impressions? Boy, is it powerful when it comes to working with strings (shocker, I know). Whole parsing code is maybe 15 lines and I’m sure could be half of it, were it written by someone who actually knew what he was doing. Syntax is ugly, yes… It’s good to be concise, but here it was pushed a little bit too far. I think I still prefer it to Tcl. Thing I absolutely hated was that statements need to end with a semicolon. For some reason when I code in a scripting language I immediately switch to sloppy, non semicolon mode. What’s more annoying – error messages are completely bogus in such case, I think I’ve seen at least 10 different ones. After a while I just learnt that if message didn’t make any sense – it’s a semicolon missing somewhere.

  1. 19 Responses to “EA STL released, updated benchmarks”

  2. Which compiler and OS did you run theses tests with?
    Have you also compared to tr1:unordered_map?

    By Paul on Oct 22, 2010

  3. Maybe add STLPort to the benchmark?

    By Ofek on Oct 22, 2010

  4. Thanks for posting these benchmarks. Like the previous poster I think it would be great if you could include a test case for unordered_map. I’m mostly interested in boost’s implementation.

    http://www.boost.org/doc/libs/1_44_0/doc/html/unordered.html

    By Sebastian on Oct 22, 2010

  5. You should try do this with with (pre-)C++0x enabled library implementations that support features such as move semantics and again later once C++0x offically out and has better support with other compilers.

    VC++2010 supports move semantics and in their STL implementation. tr1::unordered_map is std::unordered_map in C++0x, there should not be any significant difference between std::unordered_map and stdext::hash_map except for move semantics and slight interface differences.

    By snk_kid on Oct 22, 2010

  6. Code was compiled with MSVC 2008.
    Tests run under Windows Vista 32-bit (laptop) and Windows 7 64-bit (desktop).
    I’ll try to add boost/STLPort over the weekend. Basically, it’s unmodified Google’s benchmark from their sparsehash library, so if anyone’s in hurry, it should be simple enough to give it a whirl.

    By admin on Oct 23, 2010

  7. Ideally you should do this for a number of compilers, there will be differences. You should definately do this with VC++2010 because you should see a sigifnicant differences because of rvalue references and move semantics and a better optimizing compiler. Get the express edition it’s free.

    By snk_kid on Oct 23, 2010

  8. I’ll give 2010 a try some day. Reason why I’m not particularly interested in testing move semantics right now is that it’s not widely support by console (X360/PS3) compilers anyway.

    By admin on Oct 23, 2010

  9. http://github.com/paulhodge/EASTL/

    By Steve on Oct 25, 2010

  10. for those wondering how unordered_map performs, check out this benchmark: http://attractivechaos.wordpress.com/2008/10/07/another-look-at-my-old-benchmark/

    it’s a little dated but I think most results are still good today.

    By xp on Oct 25, 2010

  11. Would be interesting to see a comparison vs PS3/360.

    I’d bet that they’ve paid more attention to keeping a smaller cache happy and the differences between best and worst are exaggerated further.

    Adrian

    By Adrian on Nov 18, 2010

  12. X360 benchmarks can be found here: http://msinilo.pl/blog/?p=675 .

    By admin on Nov 19, 2010

  13. Would be interesting to see a comparison vs PS3/360.

    By ครีมหมอจุฬา on Nov 22, 2010

  14. I did a comparison of std STL, RDE, ea STL on PC/win7, x360 and ps3: http://imageshack.us/f/545/wyniky.png/

    Compiler was VS2010, all optimizations on

    Can you comment on the results? Especially about vector.erase on x360

    If you want I can share benchmark’s code

    By Hubert on Dec 8, 2011

  15. Thanks for the test, looks interesting. What was the stored element type? I’d have to take a look at erase() myself, I might be doing something stupid there, I mostly use unordered version, so didn’t focus on it too much (it’s still surprising it’s so slow only on X360). If stored element is POD, specializing is_pod might help.

    By admin on Dec 8, 2011

  16. Thanks for answer :] All tests for vector, list and deque were made on uint32_t.

    Today I tested it also on android 2.3 NEON (Xperia Play), and again – vector.erase in rde lasts 13843m, compared to ~8200 in stl and ea. I’m gonna test also how it behaves for bigger structs, and come back here.

    By Hubert on Dec 9, 2011

  17. Doh, forgot about this one. Seems like there was a bug that would cause unsigned values to be treated as types requiring copy constructor and nullify some of optimizations. It has been fixed, feel free to give a new version a try.

    By admin on Sep 5, 2012

  1. 3 Trackback(s)

  2. Oct 22, 2010: Tweets that mention EA STL released, updated benchmarks | .mischief.mayhem.soap. -- Topsy.com
  3. Nov 12, 2010: EASTL publicada como open source « martin b.r.
  4. Feb 11, 2011: A Fast URL Shortener « Neerav Kumar

Post a Comment