Grafana for dummies
26/Dec 2020
Grafana is a very popular “analytics platform”, or, in more professional terms, a system for creating pretty graphs. It’s widely used for monitoring system metrics, but it can really be used for any time-series data. It supports a plethora of data sources, and there is a decent chance one of the off-the-shelf solutions will do 99% of the work for you (for example, for basic system metrics, especially on Linux). It gets a little more complicated if you want to expose custom metrics from your C++ application, but I’ll try to show how to do it with as little friction as possible.
- Data source: as mentioned, Grafana supports many of them. If you’re feeling adventurous and want to manage your own database, Influx is probably the most popular choice. However, we wanted easy, so I opted for Prometheus. It uses its own database format, but it also provides a system layer on top of it, so all you have to do is expose your data in the expected format and Prometheus handles the rest. Prometheus uses a pull model by default, which means you don’t send anything. Instead, you register a new “target” associated with an HTTP endpoint (typically server:port/metrics), and Prometheus hits it periodically to grab new data. Where exactly the endpoint lives is up to you; I just have everything on the same Raspberry Pi (Grafana, Prometheus, and the Prometheus target). That does mean I need to send the metrics over to the Pi, as opposed to simply saving them to disk, as would be the case if the endpoint were on the same machine as the application. However, it also means I only need to run one HTTP server. Prometheus even has its own simple graphing support, but it’s very basic, nowhere close to what Grafana offers.
- Grafana<->Prometheus integration: not much to write about this one. It “just works”, and Prometheus shows up as another data source in Grafana.
- Exporting metrics from the application: one option is to use one of the many client libraries available for most programming languages. If I’m being completely honest, though, the C++ libraries seemed a bit too complex for my taste. Most games already have some kind of telemetry system, so we’re 95% there, and I didn’t want to integrate a third-party library just to convert that data to another (Prometheus-friendly) format. I’m also not a fan of tying your app to a specific service if it’s avoidable. As mentioned, we typically already have the data; it’s just a matter of saving it a bit differently. Even better, my code already had an option to save selected stats to disk, albeit in a different format (CSV). What I needed was a ‘middleman’ service translating A to B, so I wrote a simple Python script that monitors the app directory (using the watchdog module). As soon as it detects newly saved stats, it converts them from “our” format to the Prometheus format and sends them over to the target endpoint. The entire script is maybe 50 lines long. It also adds some extra metrics, such as the last update time, so that Grafana can detect the app being down (no updates in X minutes). The application itself has not been touched at all and is completely unaware of any of this (other than having to enable the option to save stats every X seconds).
- Go nuts and create your own Grafana dashboards (application specific)
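The middleman script described above reacts to new stats files using the watchdog module. Since the original script isn’t shown, here is a minimal stand-in sketch that uses plain polling from the Python standard library instead of watchdog’s filesystem events (a deliberate swap, to avoid the dependency). It assumes stats land as `*.csv` files in a single directory; `find_updated_files` and `watch` are hypothetical names.

```python
import time
from pathlib import Path
from typing import Callable, Dict, List


def find_updated_files(directory: str, seen: Dict[str, float],
                       pattern: str = "*.csv") -> List[Path]:
    """Return files matching `pattern` that are new or modified since the last call.

    `seen` maps file path -> last observed modification time and is
    updated in place, so pass the same dict on every call.
    """
    updated = []
    for path in sorted(Path(directory).glob(pattern)):
        try:
            mtime = path.stat().st_mtime
        except FileNotFoundError:
            continue  # file was removed between glob() and stat()
        if seen.get(str(path)) != mtime:
            seen[str(path)] = mtime
            updated.append(path)
    return updated


def watch(directory: str, on_update: Callable[[Path], None],
          interval: float = 5.0) -> None:
    """Poll `directory` forever, calling `on_update` for each new or changed file."""
    seen: Dict[str, float] = {}
    while True:
        for path in find_updated_files(directory, seen):
            on_update(path)
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical usage: report whichever stats file changed.
    watch(".", lambda path: print(f"new stats in {path}"))
```

Polling every few seconds is plenty when the app itself only saves stats every X seconds; watchdog avoids the polling loop by subscribing to OS file events instead.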
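To make “expose your data in the expected format” more concrete, here is a sketch of the A-to-B conversion step: turning a stats CSV into Prometheus’s text exposition format, plus a last-update timestamp gauge like the one mentioned above. It assumes (hypothetically) a CSV with a header row of stat names followed by numeric data rows and exports only the latest row; the `app_` prefix and the function names are illustrative, not taken from the original script.

```python
import csv
import io
import re
import time
from typing import Optional

# Prometheus metric names must match [a-zA-Z_:][a-zA-Z0-9_:]*;
# anything else in a CSV column name is replaced with "_".
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_:]")


def to_metric_name(column: str, prefix: str = "app_") -> str:
    """Map an arbitrary CSV column name to a Prometheus metric name.

    A prefix such as "app_" also keeps names from starting with a digit.
    """
    return prefix + _INVALID_CHARS.sub("_", column.strip())


def csv_to_prometheus(csv_text: str, prefix: str = "app_",
                      now: Optional[float] = None) -> str:
    """Convert the latest row of a stats CSV to Prometheus text format.

    Assumes a header row of stat names followed by numeric data rows;
    only the last (most recent) row is exported, one gauge per column.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one data row")
    header, latest = rows[0], rows[-1]

    lines = []
    for column, raw_value in zip(header, latest):
        name = to_metric_name(column, prefix)
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(raw_value)}")

    # Extra metric added by the middleman: when fresh stats were last seen.
    timestamp = time.time() if now is None else now
    last_update = f"{prefix}last_update_timestamp_seconds"
    lines.append(f"# TYPE {last_update} gauge")
    lines.append(f"{last_update} {timestamp}")

    # The text format is line-based and newline-terminated.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(csv_to_prometheus("fps,frame time ms\n60,16.6\n58,17.2\n"), end="")
```

With a timestamp gauge in place, the “app is down” check in Grafana could be a PromQL expression along the lines of `time() - app_last_update_timestamp_seconds > 300` (no update in five minutes).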
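On the receiving end, the Prometheus target only has to accept whatever the middleman sends and serve it back on `/metrics` when Prometheus scrapes it. The post doesn’t say how that endpoint is implemented, so the following is a hypothetical standard-library sketch: it assumes the script POSTs exposition text to `/metrics` and keeps only the latest push in memory. The port 9101 and all names are arbitrary choices.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MetricsStore:
    """Thread-safe holder for the most recently pushed exposition text."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._text = ""

    def set(self, text: str) -> None:
        with self._lock:
            self._text = text

    def get(self) -> str:
        with self._lock:
            return self._text


def make_handler(store: MetricsStore):
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # The middleman pushes fresh metrics here.
            if self.path != "/metrics":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            store.set(self.rfile.read(length).decode("utf-8"))
            self.send_response(204)
            self.end_headers()

        def do_GET(self):
            # Prometheus scrapes the latest push from here.
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = store.get().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging

    return MetricsHandler


def build_server(port: int = 9101) -> ThreadingHTTPServer:
    """Create (but don't start) the metrics server on all interfaces."""
    return ThreadingHTTPServer(("", port), make_handler(MetricsStore()))


if __name__ == "__main__":
    server = build_server()
    print(f"serving on port {server.server_address[1]}")
    server.serve_forever()
```

Prometheus would then be given `<pi-address>:9101/metrics` as a scrape target, and it never needs to know that the data was pushed there first.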
There you have it. The entire “stack” is pretty simple: the only custom piece is really the middleman Python script, and everything else is off-the-shelf. I’m pretty happy with how easy it was to “integrate” and how transparent it ended up being.