Crash handler/reporter (Win32)

January 2, 2009 – 6:55 pm

Usually I try to write about less common programming issues here. This time it's something less “flashy”, but very useful nonetheless. To be honest, if I had to choose the single most crucial feature I coded for The Witcher, it would be this crash reporter/handler. It took me a few hours of coding at home, but it proved invaluable in the later years of development. If you don’t have a similar system in place yet and you develop for Win32, just stop whatever you’re doing and implement it now; you’ll thank me later. The basic idea is to write our own unhandled exception filter and make the application call it whenever a critical error occurs (which usually means a crash). See the MSDN entry for SetUnhandledExceptionFilter for more info.

Now, the tricky part: what can we do in this function, given that the application is going to crash in a second anyway? The best bet is to record as much information as possible so we can debug the problem later. The usual suspects are:

  • error type,
  • callstack,
  • register contents,
  • general system info (CPU, memory, OS, etc.).
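As a small illustration of the “error type” entry, the sketch below turns an exception code into a readable log line. In a real filter the code and address would come from `ExceptionInfo->ExceptionRecord->ExceptionCode` and `ExceptionAddress`; here the numeric values of a few common `EXCEPTION_*` constants are spelled out so the sketch builds without `<windows.h>`. `ExceptionCodeName` and `FormatErrorLine` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Readable names for a few common Win32 exception codes. The values match
// the EXCEPTION_* constants from the Windows SDK (hard-coded here so the
// sketch builds without <windows.h>).
const char* ExceptionCodeName(std::uint32_t code) {
    switch (code) {
        case 0xC0000005u: return "EXCEPTION_ACCESS_VIOLATION";
        case 0xC0000006u: return "EXCEPTION_IN_PAGE_ERROR";
        case 0xC000001Du: return "EXCEPTION_ILLEGAL_INSTRUCTION";
        case 0xC000008Cu: return "EXCEPTION_ARRAY_BOUNDS_EXCEEDED";
        case 0xC000008Eu: return "EXCEPTION_FLT_DIVIDE_BY_ZERO";
        case 0xC0000094u: return "EXCEPTION_INT_DIVIDE_BY_ZERO";
        case 0xC0000096u: return "EXCEPTION_PRIV_INSTRUCTION";
        case 0xC00000FDu: return "EXCEPTION_STACK_OVERFLOW";
        case 0x80000002u: return "EXCEPTION_DATATYPE_MISALIGNMENT";
        case 0x80000003u: return "EXCEPTION_BREAKPOINT";
        default:          return "UNKNOWN_EXCEPTION";
    }
}

// Writes the "error type" line of the crash log into a caller-supplied
// buffer. snprintf never writes more than 'size' bytes; the return value is
// the untruncated length of the line.
int FormatErrorLine(char* buf, std::size_t size, std::uint32_t code,
                    std::uint64_t address) {
    return std::snprintf(buf, size, "Exception 0x%08X (%s) at 0x%016llX",
                         static_cast<unsigned>(code), ExceptionCodeName(code),
                         static_cast<unsigned long long>(address));
}
```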

We output all this to a text file. On top of that, it's probably a good idea to create a minidump; it'll really make your life easier later (see MSDN for more info; for a full-blown system you’ll need a symbol server as well). It’s worth putting in a little more effort to build a system that mails all this (plus whatever else you can think of, like log files) to the coders in your company. This way, every time the game or editor crashes, the user only has to describe what he was doing and click OK. One problem is that it may be risky to do all this from your exception handler. After all, the application is in the middle of crashing: it may be out of resources (mainly memory) or a little unstable, so it's a good idea to minimize dangerous operations. One solution is to just prepare the info and let an external process do the rest. For The Witcher we used a modified version of XCrashReport (it comes with full source code) and it worked very nicely.
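One way to keep the handler itself low-risk, sketched below under the assumption that the report is plain text, is to assemble it in statically allocated storage so nothing touches the heap after the crash. The class name and the 4 KB capacity are hypothetical. The filter would then write the text out (for example with `WriteFile`), write the minidump with `MiniDumpWriteDump` from dbghelp, and start the external reporter (for example with `CreateProcess`) to do the mailing; those Win32 calls are left out so the sketch stays portable.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Fixed-size text buffer for assembling a crash report without heap
// allocation (the allocator may itself be corrupted after a crash).
// Declare the instance as a global or static so its storage is reserved
// up front rather than on a possibly exhausted stack.
class CrashReportBuffer {
public:
    // Appends printf-style text and silently truncates when full.
    // vsnprintf formats into the given buffer; whether it ever allocates
    // internally is CRT-dependent, so keep format strings simple.
    void Append(const char* fmt, ...) {
        if (used_ >= kCapacity - 1) return;  // full: keep what we have
        va_list args;
        va_start(args, fmt);
        int n = std::vsnprintf(text_ + used_, kCapacity - used_, fmt, args);
        va_end(args);
        if (n < 0) return;                   // formatting error: skip entry
        std::size_t written = static_cast<std::size_t>(n);
        // vsnprintf reports the untruncated length; clamp to what fit.
        used_ = (written < kCapacity - used_) ? used_ + written : kCapacity - 1;
    }

    const char* Text() const { return text_; }
    std::size_t Size() const { return used_; }

private:
    static constexpr std::size_t kCapacity = 4096;
    char text_[kCapacity] = {};
    std::size_t used_ = 0;
};
```

Once the filter has filled the buffer, the risky work (compressing, mailing, showing a dialog) happens in the reporter process, which does not share the crashed process's broken state.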

Sometimes all this info may not be enough. Say your application crashes when loading a particular file, and the callstack leads to the LoadMesh() routine. OK, we know it’s a mesh, but which one? We have 5000 of them! You do NOT want to log every loaded resource, or people will stop reading your logs (check out Scott Bilas’ excellent presentation). Besides, logging usually causes noticeable overhead, so it shouldn’t be called that often. Solution? A black box. Think of your application as an airplane: the black box records all vital information, so when the application crashes, you can easily find the reason. Messages aren’t displayed anywhere, so users aren’t bothered; they don’t even know it’s happening. What to record? Well, that’s up to you. In our case, the name of the loaded resource would surely be a good bet. This way, after a crash all you have to do is check the latest black box messages (for the thread that crashed).
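A minimal per-thread black box could look like the sketch below (the class, its sizes and the `t_blackBox` variable are hypothetical, not the code from the package). Each thread owns a small ring buffer, so recording needs no locking; once it is full, the oldest message is simply overwritten. Because the unhandled exception filter normally runs on the faulting thread, it can read `t_blackBox` directly to get that thread's last messages.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical per-thread "black box": remembers the last kSlots messages
// recorded on the owning thread. Nothing is displayed or written to disk;
// the crash handler reads the messages back after a crash.
class BlackBox {
public:
    static constexpr std::size_t kSlots = 16;
    static constexpr std::size_t kSlotLen = 128;

    // Copies msg (truncated to kSlotLen - 1 chars) into the next slot,
    // overwriting the oldest entry once the ring is full.
    void Record(const char* msg) {
        char* slot = slots_[next_ % kSlots];
        std::strncpy(slot, msg, kSlotLen - 1);
        slot[kSlotLen - 1] = '\0';
        ++next_;
    }

    // Number of messages currently stored (at most kSlots).
    std::size_t Count() const { return next_ < kSlots ? next_ : kSlots; }

    // ageIndex 0 is the most recent message; the caller keeps
    // ageIndex < Count().
    const char* Latest(std::size_t ageIndex) const {
        return slots_[(next_ - 1 - ageIndex) % kSlots];
    }

private:
    char slots_[kSlots][kSlotLen] = {};
    std::size_t next_ = 0;  // total number of messages ever recorded
};

// One box per thread, so Record() never needs a lock.
thread_local BlackBox t_blackBox;
```

In LoadMesh() you would call something like `t_blackBox.Record(fileName)` before parsing; the filter then loops over `Latest(0)` to `Latest(Count() - 1)` and writes those names into the crash log.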

In the attached package you can find a basic implementation of an exception handler with call stack resolving, plus a black box class. The black box is meant to work in a multithreaded environment with minimal overhead, especially when using the version without a variable number of arguments (no locking). It’s a standalone library and should be a good starting point for anyone trying to implement their own handler. Basically, you link with debug.lib and call rde::CrashHandler::Init at the start of your application. Customize MyExceptionFiler to output more info and/or spawn an external reporting application like XCrashReport. An example crash log can be seen here.
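If you would rather keep a single box shared by all threads, a no-varargs variant can still avoid locks. The sketch below is one assumption about how that might work, not the package's implementation: a writer claims a slot with one atomic increment and stores only a pointer to a message with static storage duration (typically a string literal), tagged with a small per-thread number. Nothing is formatted or copied, which keeps it cheap; the price is that dynamic strings such as file names can't be recorded this way. `SharedBlackBox` and `CurrentThreadTag` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Small per-thread number used to tag entries; starts at 1.
std::uint32_t CurrentThreadTag() {
    static std::atomic<std::uint32_t> counter{0};
    thread_local std::uint32_t tag =
        counter.fetch_add(1, std::memory_order_relaxed) + 1;
    return tag;
}

// Hypothetical black box shared by all threads. Record() is lock-free:
// one fetch_add claims a slot, then two atomic stores fill it.
class SharedBlackBox {
public:
    static constexpr std::size_t kSlots = 1024;  // power of two

    // 'literal' must outlive the process's use of the box (string literal).
    void Record(std::uint32_t threadTag, const char* literal) {
        std::size_t i = next_.fetch_add(1, std::memory_order_relaxed) % kSlots;
        slots_[i].thread.store(threadTag, std::memory_order_relaxed);
        slots_[i].msg.store(literal, std::memory_order_release);
    }

    // Copies up to maxOut of the newest messages recorded by threadTag into
    // out, newest first, and returns how many were found. Meant for the
    // crash handler; other threads may still be writing, so the result is
    // best-effort (a slot being overwritten may pair a new tag with an old
    // message).
    std::size_t Latest(std::uint32_t threadTag, const char** out,
                       std::size_t maxOut) const {
        std::size_t end = next_.load(std::memory_order_acquire);
        std::size_t scanned = end < kSlots ? end : kSlots;
        std::size_t found = 0;
        for (std::size_t k = 0; k < scanned && found < maxOut; ++k) {
            const Slot& s = slots_[(end - 1 - k) % kSlots];
            const char* msg = s.msg.load(std::memory_order_acquire);
            if (msg && s.thread.load(std::memory_order_relaxed) == threadTag)
                out[found++] = msg;
        }
        return found;
    }

private:
    struct Slot {
        std::atomic<std::uint32_t> thread{0};
        std::atomic<const char*> msg{nullptr};
    };
    Slot slots_[kSlots];
    std::atomic<std::size_t> next_{0};
};
```

A call site would look like `g_box.Record(CurrentThreadTag(), "Physics: step begin")`, and the filter would ask for the entries tagged with `CurrentThreadTag()` of the crashing thread.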

Win32 exception handling is an interesting subject in its own right. People still rarely use it as extensively as they could, especially in development builds, and it’s not only about crash reporting. See Charles Bloom’s Gametech talk, for example: every object’s Tick() in their system was placed in a try {} block. This way, if it failed, it didn’t bring the whole game down; the object just seemed to freeze. As I understand it, they used standard C++ exceptions here, but I’d give SEH a try instead: surround the update method with __try, “disable” the object if it fails, report it (of course), and continue updating the other objects.
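The per-object guard could be sketched with standard C++ exceptions, which, as far as I understand, is what the talk used (the `GameObject` and `TickAll` names are hypothetical). To get the SEH behaviour suggested above with MSVC, you would replace the `try`/`catch` with `__try`/`__except(EXCEPTION_EXECUTE_HANDLER)`; MSVC rejects `__try` in a function that needs C++ object unwinding (error C2712), so in practice the `__try` goes into a small helper that only calls `Tick()`. Alternatively, compiling with `/EHa` lets `catch (...)` catch structured exceptions as well.

```cpp
#include <cstdio>
#include <exception>
#include <memory>
#include <vector>

// Hypothetical game object interface.
struct GameObject {
    virtual ~GameObject() = default;
    virtual void Tick(float dt) = 0;
    bool disabled = false;  // set once Tick() has failed
};

// Ticks every object. An object whose Tick() throws is reported and
// disabled ("frozen") instead of taking the whole game down; the loop then
// carries on with the remaining objects. Intended for development builds.
void TickAll(std::vector<std::unique_ptr<GameObject>>& objects, float dt) {
    for (auto& obj : objects) {
        if (obj->disabled) continue;
        try {
            obj->Tick(dt);
        } catch (const std::exception& e) {
            std::fprintf(stderr, "Tick failed (%s), object disabled\n", e.what());
            obj->disabled = true;
        } catch (...) {
            std::fprintf(stderr, "Tick failed (unknown exception), object disabled\n");
            obj->disabled = true;
        }
    }
}
```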

6 Responses to “Crash handler/reporter (Win32)”

  1. Regarding the Charles Bloom presentation: you mention that you’d try SEH over C++ exceptions for that. Why is that? What benefit is there to gain?

    By Anthony on Dec 2, 2009

  2. SEH would let you catch all kinds of fatal errors, like NULL or dangling pointer accesses. (So if anything fatal happens inside an object’s Tick(), that object freezes, but the game should continue. That’s only in development builds, obviously.)

    By admin on Dec 2, 2009

  3. Always cool, these catchers. I currently use VLD for memory leak detection (a bit enhanced to also work in sections of code) and StackWalker for catching callstacks; see http://www.codeproject.com/KB/threads/StackWalker.aspx

    By Ruud van Gaal on Jun 2, 2011

3 Trackback(s)

  1. May 24, 2009: realtimecollisiondetection.net - the blog » Catching up
  2. Jun 21, 2009: Garshasp Development Blog » Turning the wheels
  3. Oct 29, 2011: Resolving PS3 callstacks | .mischief.mayhem.soap.
