Crash handler/reporter (Win32)
2/Jan 2009
Usually I try to write about less common programming issues here. This time the topic is less “flashy”, but very useful nonetheless. To be honest, if I had to choose the single most crucial feature I coded for The Witcher, it would be this crash reporter/handler. It took me a few hours of coding at home, but it proved invaluable in the later years of development. If you don’t have a similar system in place yet and you develop for Win32, just stop whatever you’re doing and implement it now; you’ll thank me later. The basic idea is to write our own unhandled exception filter and make the application call it whenever a critical error occurs (this usually means a crash). See the MSDN documentation for SetUnhandledExceptionFilter for more info.
Now, the tricky part is what we can do in this function, since the application is going to crash in a second anyway. The best bet is to record as much information as possible so that we can debug the problem later. The usual suspects are:
- error type,
- callstack,
- register contents,
- general system info (CPU, memory, OS, etc.).
We output all this to a text file. On top of that, it is probably a good idea to create a minidump; it will really make your life easier later (see MSDN for more info; for a full-blown system you’ll need a symbol server as well). It’s worth putting in a little more effort and building a system that mails all this (plus whatever else you can think of, log files for example) to the coders in your company. This way, every time the game or editor crashes, the user only has to describe what they were doing and press OK. One problem is that it may be risky to do all this from your exception handler. After all, the application is crashing: it may be out of resources (mainly memory) or a little unstable, so it is a good idea to minimize dangerous operations. One solution is to just prepare the info in the handler and let an external process do the rest. For The Witcher we used a modified version of XCrashReport (it comes with full source code) and it worked very nicely.
Sometimes all this info may not be enough. Say your application crashes when loading a particular file, and the callstack leads to the LoadMesh() routine. OK, we know it’s a mesh, but which one? We have 5000 of them! You do NOT want to log every loaded resource, or people will stop reading your logs (check out Scott Bilas’ excellent presentation). Besides, logging usually causes noticeable overhead, so it shouldn’t be called that often. The solution? A black box. Think of your application as an airplane: the black box records all vital information, so when the application crashes you can easily find the reason. Messages aren’t displayed anywhere, so users aren’t bothered; they don’t even know it’s happening. What to record? Well, it’s up to you. In our case, the name of the resource being loaded would surely be a good bet. This way, on a crash all you have to do is check the latest black box messages (for the thread that crashed).
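One way to sketch the black box idea is a small per-thread ring buffer that keeps only the most recent messages. This is an illustration under my own assumptions, not the class from the package: BlackBox, ThreadBlackBox, Add, Dump and the capacity of 8 are all hypothetical. Messages are stored as pointers only, so there is no formatting and no locking on the hot path.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical black box: a fixed-size ring of message pointers.
class BlackBox {
public:
    static constexpr std::size_t kCapacity = 8;

    // Stores the pointer only, so the message must outlive the box
    // (string literals or long-lived resource names).
    void Add(const char* message) {
        m_messages[m_next % kCapacity] = message;
        ++m_next;
    }

    // Returns the recorded messages, oldest first.
    // Allocates, so a real crash handler would rather write entries
    // straight to a file; this form is just easy to inspect.
    std::vector<std::string> Dump() const {
        std::vector<std::string> out;
        const std::size_t count = m_next < kCapacity ? m_next : kCapacity;
        for (std::size_t i = m_next - count; i < m_next; ++i)
            out.push_back(m_messages[i % kCapacity]);
        return out;
    }

private:
    std::array<const char*, kCapacity> m_messages{};
    std::size_t m_next = 0;  // total number of messages ever added
};

// One box per thread, so Add() needs no locking.
inline BlackBox& ThreadBlackBox() {
    thread_local BlackBox box;
    return box;
}
```

A loader would call something like ThreadBlackBox().Add(meshName) before touching the file, and the exception filter would dump the box of the crashing thread into the crash log.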
In the attached package you can find a basic implementation of an exception handler with call stack resolving, plus a black box class. The black box is meant to work in a multithreaded environment with minimal overhead, especially when you use the version without a variable number of arguments (no locking). It’s a standalone library and should be a good starting point for anyone trying to implement their own handler. Basically, you link with debug.lib and call rde::CrashHandler::Init at the start of your application. Customize MyExceptionFiler to output more info and/or spawn an external reporting application like XCrashReport. Example crash log can be seen here.
Win32 exception handling is an interesting subject in its own right. People still rarely use it as extensively as they could, especially in development builds, and it’s not only about crash reporting. See Charles Bloom’s Gametech talk, for example: every object’s Tick() in their system was placed in a try {} block. This way, if an object failed, it didn’t bring the whole game down; the object just seemed to freeze. As I understand it, they were using standard C++ exceptions, but I’d give it a try with SEH instead: surround the update method with __try, “disable” the object if it fails, report it (of course), and continue updating the other objects.
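The per-object isolation described above can be sketched roughly as follows. This version uses standard C++ try/catch, so it only catches thrown C++ exceptions; an SEH variant would use MSVC’s __try/__except around the Tick call to also catch things like access violations (and would need the call moved into a helper function without objects that require unwinding). GameObject and UpdateAll are hypothetical names, not taken from the talk or from any engine.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical game object: a name, an update callback and an enabled flag.
struct GameObject {
    GameObject(std::string n, std::function<void()> t)
        : name(std::move(n)), tick(std::move(t)) {}

    std::string name;
    std::function<void()> tick;
    bool enabled = true;
};

// Ticks every enabled object. An object whose tick throws is disabled and
// reported, and the loop carries on with the remaining objects.
// Returns the number of objects disabled during this call.
int UpdateAll(std::vector<GameObject>& objects, std::vector<std::string>& report) {
    int disabled = 0;
    for (GameObject& obj : objects) {
        if (!obj.enabled) continue;
        try {
            obj.tick();
        } catch (const std::exception& e) {
            obj.enabled = false;  // the object "freezes" from now on
            report.push_back(obj.name + ": " + e.what());
            ++disabled;
        } catch (...) {
            obj.enabled = false;
            report.push_back(obj.name + ": unknown exception");
            ++disabled;
        }
    }
    return disabled;
}
```

In a development build the report entries would go to the log or the crash reporter, so the failure is still visible even though the game keeps running.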
Old comments
realtimecollisiondetection.net - the blog » Catching up 2009-05-24 09:33:23
[…] talks about crash handlers/reporters. If you don’t have one, you […]
Garshasp Development Blog » Turning the wheels 2009-06-21 21:56:27
[…] team can find the problems found while the design or testing team are working a bit easier. This crash handler is one in which we are investigating at the moment. […]
Anthony 2009-12-02 19:29:03
In regards to the Charles Bloom presentation - you mention that you’d try SEH over C++ exceptions for that. Why is that? What benefit is there to gain?
admin 2009-12-02 20:13:40
SEH would let you catch all kinds of fatal errors, such as NULL or dangling pointer accesses. (So, if anything fatal happens inside an object’s Tick(), that object freezes, but the game should continue. That’s only in development builds, obviously.)
Ruud van Gaal 2011-06-01 23:51:24
Always cool these catchers. I currently use VLD for memory leak detection (a bit enhanced to also work in sections of code) and StackWalker for catching callstacks, see http://www.codeproject.com/KB/threads/StackWalker.aspx