MemTracer - episode IV

(I’m seriously running out of subtitle ideas.) Some years ago I started to toy with an idea for a memory tracing/debugging tool. The first proof-of-concept implementation was created in a few days, and while it worked with a very simple test app, it required a major overhaul before it could be integrated with any real-world program. Sometime in February 2008 I had a new version that proved itself very useful during my adventure with an enhanced version of a certain PC RPG (as it later turned out, it was by far the most memory-stressing project). I never worked on it systematically; it was more like a series of ad-hoc changes and improvements. A few months later I recoded it almost from scratch (the C++ part), ported it to X360 and tested it with another game. Sometime in 2009 I created a PS3 version that helped me save some memory in yet another commercial title. Quite recently I decided that the tool is mature and helpful enough to clean it up a little bit and release it to the public.

Some random notes first:

  • this is not a plug’n’play solution, it’ll require some amount of integration work. I do provide default wrappers to help you hit the ground running.

  • two main groups of functions that are needed:

    • general hooks (start/stop thread, mutexes, module information). Default implementations provided in DefaultFunctions.h/cpp (if RDE_MEMTRACER_NEED_DEFAULT_FUNCTIONS is defined)

    • socket management. Win32 implementation in DefaultSocketWrapperWin32.cpp (if RDE_MEMTRACER_USE_DEFAULT_SOCKET_WRAPPER is defined).

  • published version will compile/work on Win32 platform. Making it work on X360 is a matter of providing platform specific function hooks (those were not included, they’re not part of my ‘home’ version). PS3 is a little bit more tricky because there are no PDBs, so function addresses need to be extracted from map file (tool side). All my console versions used proprietary code, so I cannot release them, sadly.

Typical application flow:

  • game acts as a server, C# MemTracer application is a client. Game should initialize MemTracer and wait for connection. This is all done with a single call to MemTracer::Init(FunctionHooks& hooks, unsigned short port, int maxTracedThreads, BlockingMode::Enum mode). As you can see, you provide basic function hooks, the listening port, the maximum number of traced threads (i.e. threads performing memory operations) and whether we should block waiting for a connection or not. Ideally, you call this when the application starts, just after basic initialization. Then start MemTracer and connect to the game.

  • as mentioned before - game needs some basic socket related functions. I decided not to make them part of the function hooks structure passed to the Init function (it’d bloat it too much). I admit I’m still not sure what’s the best way to let the user specify his own implementations. I’m primarily a game programmer, and I’ll be the first one to admit I’m rubbish at designing library interfaces. I knew I didn’t want some abstract ISocket interface that would make the user provide his own class + factory. I went with a very low tech solution. The interface is defined in SocketWrapper.h and it’s up to the user to provide a .cpp file with the implementation (failing to do so will result in a linker error). See DefaultSocketWrapperWin32.cpp for an example. For most games (that probably have their own socket support) it should boil down to providing simple, 1-2 line wrappers.

  • game starts sending information to MemTracer. There are several types of operations, but the most important are memory allocation & memory free (see CommandId in MemTracer.h for others). Sample snippet sending all global memory operations:

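      // Route every global allocation through MemTracer ("SYSM" is the block tag).
      void* MyAlloc(size_t bytes)
      {
        void* ptr = malloc(bytes);
        MemTracer::OnAlloc(ptr, bytes, "SYSM");
        return ptr;
      }
      void MyFree(void* ptr)
      {
        // free(NULL) is legal, but there is nothing to report for it.
        if (ptr) MemTracer::OnFree(ptr);
        free(ptr);
      }
      // Global operators forward to the traced wrappers.
      void* __cdecl operator new(size_t bytes) { return MyAlloc(bytes); }
      void __cdecl operator delete(void* ptr) throw() { MyFree(ptr); }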
      
    
    As you can see, it’s relatively simple and boils down to calling OnAlloc/OnFree. I do not recommend sending information about every single memory allocation, it can slow down the tracing application quite a bit and make later analysis harder. One simple trick I use is to first only track blocks bigger than N bytes. After that’s done, it’s time to analyse the small blocks separately. Also, it’s usually a good idea to use it per subsystem/allocator/pool.

  • Third argument passed to OnAlloc is block tag (4CC code), identifying allocator. For more detailed information you can use PushTag/PopTag functions to tag all blocks allocated between those two calls (with resource name for example).

  • There is some experimental support for frame-by-frame analysis. I have not used this one in real world applications yet. From early tests it looks like it might be too heavy to have it enabled all the time, it probably makes more sense to only send frame markers for suspicious/interesting moments.

  • MemTracer can have substantial memory overhead, which is directly dependent on the number of traced threads and the size of local queues. With default settings (queue size of 4096 elements, max call stack depth of 20) it’s about 400k per thread (in non-sequential mode). It can be reduced by decreasing the max call stack depth, or by tracing fewer threads at a time. Another way is to experiment with different queue sizes, but remember - if you make the queue too small, eventually allocating threads will start blocking, waiting for the MemTracer thread to consume data (i.e. you’re producing events quicker than they’re consumed and there’s not enough room in the queue).

  • ‘Sequential’ mode. I never needed this one before; the need only showed up when I was writing a test for this release. Sometimes you want to ensure that the order of received mem operations is exactly the same as it was when they were generated (over the course of a single frame). Normally, it’s not a problem (it’s guaranteed to be correct for a single thread), however if in thread B you’re releasing memory allocated in the same frame in thread A, order is important. Sequential mode has additional overhead roughly equivalent to one thread (memory wise). I’d say it’s safe to disable it for most applications (well, games) out there.
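
The "only track blocks bigger than N bytes" trick from the allocation bullet could be sketched roughly like this. It is a minimal illustration under stated assumptions, not code from MemTracer itself: MemTracerStub stands in for the real MemTracer::OnAlloc/OnFree calls so the snippet compiles on its own, and kMinTrackedBytes, BlockHeader and DemoThreshold are hypothetical names. The size header is there only so MyFree can tell whether the block was reported in the first place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in for the real MemTracer calls. Only the call shapes
// (OnAlloc(ptr, bytes, tag), OnFree(ptr)) come from the snippet in the post;
// the counters are purely illustrative.
namespace MemTracerStub
{
    static int g_reportedAllocs = 0;
    static int g_reportedFrees = 0;
    inline void OnAlloc(void*, std::size_t, const char*) { ++g_reportedAllocs; }
    inline void OnFree(void*) { ++g_reportedFrees; }
}

// Hypothetical threshold: blocks smaller than this are not reported.
static const std::size_t kMinTrackedBytes = 256;

// Header stored in front of every block so MyFree knows the block size.
// The union keeps the user pointer aligned for any type.
union BlockHeader
{
    std::size_t bytes;
    std::max_align_t align;
};

void* MyAlloc(std::size_t bytes)
{
    BlockHeader* header =
        static_cast<BlockHeader*>(std::malloc(sizeof(BlockHeader) + bytes));
    if (!header) return nullptr;
    header->bytes = bytes;
    void* ptr = header + 1;
    if (bytes >= kMinTrackedBytes)
        MemTracerStub::OnAlloc(ptr, bytes, "SYSM");
    return ptr;
}

void MyFree(void* ptr)
{
    if (!ptr) return;
    BlockHeader* header = static_cast<BlockHeader*>(ptr) - 1;
    // Report the free only if the matching alloc was reported.
    if (header->bytes >= kMinTrackedBytes)
        MemTracerStub::OnFree(ptr);
    std::free(header);
}

// Usage demo: the 16-byte block is skipped, the 1024-byte block is reported.
void DemoThreshold()
{
    void* small = MyAlloc(16);
    void* big = MyAlloc(1024);
    MyFree(small);
    MyFree(big);
    assert(MemTracerStub::g_reportedAllocs == 1);
    assert(MemTracerStub::g_reportedFrees == 1);
}
```

If your allocator already knows block sizes at free time, the header is unnecessary and the size check can use that information instead.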
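
For the PushTag/PopTag pairing mentioned in the tagging bullet, a small scope guard is one way to make sure the two calls always match. This is only a sketch: the post does not show the real signatures, so PushTag(const char*) and PopTag() are assumptions, MemTracerStub is a stand-in that merely records the tag stack, and ScopedMemTag and LoadTexture are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for MemTracer::PushTag/PopTag (signatures assumed, see above).
namespace MemTracerStub
{
    static std::vector<std::string> g_tagStack;
    inline void PushTag(const char* tag) { g_tagStack.push_back(tag); }
    inline void PopTag()
    {
        assert(!g_tagStack.empty()); // every pop needs a matching push
        g_tagStack.pop_back();
    }
}

// Hypothetical RAII helper: pushes a tag on construction and pops it when the
// scope ends, so an early return cannot leave a stale tag behind.
class ScopedMemTag
{
public:
    explicit ScopedMemTag(const char* tag) { MemTracerStub::PushTag(tag); }
    ~ScopedMemTag() { MemTracerStub::PopTag(); }
    ScopedMemTag(const ScopedMemTag&) = delete;
    ScopedMemTag& operator=(const ScopedMemTag&) = delete;
};

// Example: everything allocated while loading a resource carries its name.
// Returns the tag stack depth observed while the tag is active.
std::size_t LoadTexture(const char* fileName)
{
    ScopedMemTag tag(fileName);
    // ... allocations made here would be tagged with fileName ...
    return MemTracerStub::g_tagStack.size();
}
```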
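
To get a feel for how queue size and call stack depth drive the per-thread overhead quoted above, here is a back-of-the-envelope estimate. The event layout is a guess and not taken from the MemTracer source: it assumes each queued event stores one return address per call stack level plus a small fixed payload (address, size, tag and so on). EstimateQueueBytes and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Rough per-thread queue footprint under the assumed layout:
// queueElements * (fixedPayloadBytes + maxCallStackDepth * addressBytes).
std::size_t EstimateQueueBytes(std::size_t queueElements,
                               std::size_t maxCallStackDepth,
                               std::size_t addressBytes,
                               std::size_t fixedPayloadBytes)
{
    assert(addressBytes > 0);
    const std::size_t eventBytes =
        fixedPayloadBytes + maxCallStackDepth * addressBytes;
    return queueElements * eventBytes;
}
```

With the default 4096 elements and depth of 20, 4-byte addresses and an assumed 16-byte payload, EstimateQueueBytes(4096, 20, 4, 16) gives 393,216 bytes (384 KiB), which is in line with the "about 400k" figure. Halving the depth to 10 under the same assumptions gives 229,376 bytes (224 KiB).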
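
Why ordering matters in the cross-thread case can be shown with a tiny replay of events on the receiving side. This is an illustration only: MemEvent and CountLiveBlocks are hypothetical, and the way a free for an unknown block is handled here (silently ignored) is an assumption, not a description of what the MemTracer tool does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Simplified event: either an allocation or a free of a given address.
struct MemEvent
{
    bool isAlloc;
    std::uintptr_t address;
};

// Replays events in arrival order and returns how many blocks look live.
std::size_t CountLiveBlocks(const std::vector<MemEvent>& events)
{
    std::set<std::uintptr_t> live;
    for (std::size_t i = 0; i < events.size(); ++i)
    {
        if (events[i].isAlloc)
            live.insert(events[i].address);
        else
            live.erase(events[i].address); // unknown block: nothing to erase
    }
    return live.size();
}

// Demo: thread A allocates a block, thread B frees it in the same frame.
void DemoOrdering()
{
    const MemEvent allocA = { true, 0x1000 };
    const MemEvent freeB = { false, 0x1000 };

    std::vector<MemEvent> inOrder;
    inOrder.push_back(allocA);
    inOrder.push_back(freeB);

    std::vector<MemEvent> swapped;
    swapped.push_back(freeB);
    swapped.push_back(allocA);

    assert(CountLiveBlocks(inOrder) == 0); // correct: block is gone
    assert(CountLiveBlocks(swapped) == 1); // reordered: phantom live block
}
```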

Not much more to it, really. It should be fairly straightforward, especially after analyzing the sample application. I hosted the MemTracer project at Google Code (NOTE: later moved to Github). You’re free to do whatever you want with it, however if you come up with some amazing modification, it’d be nice if you committed it back to the main branch. That’s one of the reasons I’m releasing it: it works well enough for me, so I kinda lost the urge to tinker with it, but I figure there’s lots of cool stuff that could be added still. Even if not - I believe it may be very helpful in its current shape as well. Over the last few years I’ve used it in 4 big multi-platform projects and it saved me lots of time every time (as in making the game run on a PS3 (again) in two days, instead of the planned two weeks :). Have fun.

Memory usage graph employs C2DPushGraph library by Stuart Konen.

Old comments

admin 2010-12-15 03:00:53

Not sure if I understand you correctly. If you mean callstacks for allocated memory blocks - it’s already there (you’ll need to add a snapshot, then double-click it for details).

Chris D 2010-12-13 23:10:20

Have you considered adding backtraces?

kuranes 2010-11-15 19:37:09

Thanks for sharing code, seems a very helpful tool.
Another interesting somehow similar tool I was using before is Memory Analyzer ( http://blog.makingartstudios.com/?p=39 binary here, no source code alas)

admin 2010-11-15 02:11:51

Thanks, Arseny, I knew I could count on you when it comes to PS3 stuff (had no idea about addr2line, gonna give it a try tomorrow).
BTW, sorry about crappy formatting, I’ll try to fix it soon, for now I’m fighting with some stupid mod_security errors and waiting for reply from my host provider.

Arseny Kapoulkine 2010-11-14 19:33:39

You don’t need to parse map file on PS3 - just use addr2line (or the equivalent ps3bin for self). At my previous job we actually built a binutils DLL version (with tweaks for self support) and pinvoked into it from C#, though that’s not for the faint of heart (i.e. source code needs some changes to compile under mingw-64…).
