Title: Lightweight data monitoring using RRDtool
Author: Solène
Date: 16 February 2023
Tags: monitoring nocloud
Description: In this article, I will introduce you to RRDtool, a robust
software to keep track of data and render graphs from it

# Introduction

I like my servers to run as little code and as few services as
possible: this eases maintenance and leaves room for other things to
run. I recently wrote about monitoring software that gathers metrics
and renders them, but such tools are all overkill if you just want to
keep track of a single value over time and graph it.

Fortunately, there is an old and robust tool that does this job well.
It's thoroughly documented, and it's called RRDtool.

RRDtool official website

RRDtool stands for "Round Robin Database Tool"; it's a set of programs
and a specific file format for gathering metrics. The trick with RRD
files is that they have a fixed size: when you create one, you define
how many values to store in it, at which frequency, and for how long.
It's best to treat these choices as final, because only a few
parameters can be adjusted afterward (see `rrdtool tune` and `rrdtool
resize`).

In addition, RRD files allow you to create derived time series that
keep track of computed values over a longer timespan, but at a lower
resolution. Think of the following use case: you want to monitor your
home temperature every 10 minutes for the past 48 hours, but you also
want to keep some information for the past year. You can tell RRD to
keep hourly averages for a week, four-hour averages for a month, and
daily averages for a year. All of this will have a fixed size.

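To make that use case concrete, here is a minimal sketch of how the row counts could be worked out and passed to `rrdtool create`. The file name `temperature.rrd`, the data source name, the 1200 second heartbeat and the -50/60 bounds are my own assumptions for illustration, and a month is counted as 30 days.

```shell
#!/bin/sh
# Sketch: a temperature RRD with one raw archive and three consolidated ones.
# File name, DS name, heartbeat and min/max bounds are illustrative assumptions.
STEP=600                          # one measurement every 10 minutes

ROWS_RAW=$((48 * 3600 / STEP))    # raw values for 48 hours
PDP_1H=$((3600 / STEP))           # measurements per hourly average
ROWS_1H=$((7 * 24))               # hourly averages for a week
PDP_4H=$((4 * 3600 / STEP))       # measurements per four-hour average
ROWS_4H=$((30 * 24 / 4))          # four-hour averages for a 30-day month
PDP_1D=$((24 * 3600 / STEP))      # measurements per daily average
ROWS_1D=365                       # daily averages for a year

echo "rows: $ROWS_RAW $ROWS_1H $ROWS_4H $ROWS_1D"

# Only create the file when rrdtool is actually installed.
if command -v rrdtool >/dev/null 2>&1; then
    rrdtool create temperature.rrd --step "$STEP" \
        DS:temperature:GAUGE:1200:-50:60 \
        "RRA:AVERAGE:0.5:1:${ROWS_RAW}" \
        "RRA:AVERAGE:0.5:${PDP_1H}:${ROWS_1H}" \
        "RRA:AVERAGE:0.5:${PDP_4H}:${ROWS_4H}" \
        "RRA:AVERAGE:0.5:${PDP_1D}:${ROWS_1D}"
fi
```

Each `RRA` argument ends with two numbers: how many measurements are averaged into one row, and how many rows are kept, which is what makes the file size predictable.
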
# Anatomy of an RRD file

RRD files can be dumped as XML, which gives a glimpse of their content
and may ease the understanding of this special file format.

Let's create a file to monitor the battery level of your computer every
10 seconds, keeping the last 5 values. Don't focus on understanding the
whole command line for now:

```shell
rrdtool create test.rrd --step 10 DS:battery:GAUGE:20:0:100 RRA:AVERAGE:0.5:1:5
```

If we dump the created file with `rrdtool dump test.rrd`, we get this
result (stripped a bit to make it fit better):

```xml
<!-- Round Robin Database Dump -->
<rrd>
  <version>0003</version>
  <step>10</step> <!-- Seconds -->
  <lastupdate>1676569107</lastupdate> <!-- 2023-02-16 18:38:27 CET -->
  <ds>
    <name> battery </name>
    <type> GAUGE </type>
    <minimal_heartbeat>20</minimal_heartbeat>
    <min>0.0000000000e+00</min>
    <max>1.0000000000e+02</max>
    <!-- PDP Status -->
    <last_ds>U</last_ds> <value>NaN</value> <unknown_sec> 7 </unkno…
  </ds>
  <!-- Round Robin Archives -->
  <rra>
    <cf>AVERAGE</cf>
    <pdp_per_row>1</pdp_per_row> <!-- 10 seconds -->
    <params> <xff>5.0000000000e-01</xff> </params>
    <cdp_prep>
      <ds>
        <primary_value>0.0000000000e+00</primary_value>
        <secondary_value>0.0000000000e+00</secondary_value>
        <value>NaN</value>
        <unknown_datapoints>0</unknown_datapoints>
      </ds>
    </cdp_prep>
    <database>
      <!-- 2023-02-16 18:37:40 CET / 1676569060 --> <row><v>N…
      <!-- 2023-02-16 18:37:50 CET / 1676569070 --> <row><v>N…
      <!-- 2023-02-16 18:38:00 CET / 1676569080 --> <row><v>N…
      <!-- 2023-02-16 18:38:10 CET / 1676569090 --> <row><v>N…
      <!-- 2023-02-16 18:38:20 CET / 1676569100 --> <row><v>N…
    </database>
  </rra>
</rrd>
```

The most important thing to understand here is that we have a "ds"
(data source) named battery, of type GAUGE, with no last value (I never
updated it), and an "RRA" (Round Robin Archive) for our average value
that contains timestamps with no value associated yet. You can see
that internally, our 5 slots already exist, each holding an unknown
value. When I update the file, the oldest slot will disappear, and a
new record holding the actual value will be added at the end.

# Monitoring a value

In this guide, I would like to share my experience of using rrdtool to
monitor my solar panel power output over the last few hours, so it can
be easily displayed on my local dashboard. The data are also collected
and sent to a Grafana server, but it's not local, and querying it just
to display the latest values wastes resources and bandwidth.

First, you need `rrdtool` to be installed. You don't need anything else
to work with RRD files.

## Create the RRD file

Creating the RRD file is the trickiest part, because its layout is hard
to change afterward.

I want to collect a value every 5 minutes (300 seconds). It's an
absolute value between 0 and 4000, so we define a step of 300 seconds
to tell the file it must receive a value every 300 seconds. The type
of the value will be GAUGE, because each value stands on its own and
doesn't depend on the previous one. If we wanted to monitor how the
power changes over time, we would use DERIVE instead, which stores the
rate of change between two successive values.

Furthermore, we need to configure the file to give up on a value slot
if it's not updated within 600 seconds. This delay is the heartbeat:
once it's exceeded, the slot is recorded as unknown.

Finally, we want to be able to graph each measurement. This is done by
adding an AVERAGE archive to the file, with a resolution of 1 value and
240 rows. This means that each time we add a value to the RRD file,
the AVERAGE is calculated with only that value as input, and we keep
240 of them, allowing us to graph up to 240 * 5 minutes (20 hours) of
data back in time.

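The retention arithmetic above generalizes to any archive: step, multiplied by the number of measurements per row, multiplied by the number of rows. A small sketch, where `rra_span` is a hypothetical helper name of mine and not an rrdtool command:

```shell
#!/bin/sh
# rra_span STEP PDP_PER_ROW ROWS: seconds of history covered by one archive.
rra_span() {
    echo $(( $1 * $2 * $3 ))
}

# Values from the text: 300 second step, 1 measurement per row, 240 rows.
SPAN=$(rra_span 300 1 240)
echo "${SPAN} seconds, or $((SPAN / 3600)) hours"
```

Running the same calculation before creating a file is a cheap way to check that an archive really covers the period you intend to graph.
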
```shell
rrdtool create solar-power.rrd --step 300 ds:value:gauge:600:0:4000 rra:avera…
                                             ^     ^     ^   ^ ^             ^ …
                                             |     |     |   | | max value   | …
                                             |     |     |   | min value     | …
                                             |     |     | time before null  | …
                                             |     | measurement type        | …
                                             | variable name
```

Your `solar-power.rrd` file is now created. You can inspect it with
`rrdtool info solar-power.rrd` or dump its content with `rrdtool dump
solar-power.rrd`.

RRDtool create documentation

## Add values to the RRD file

Now that we have prepared the file to receive data, we need to populate
it with something useful. This can be done using the command `rrdtool
update`.

```shell
CURRENT_POWER=$(some-command-returning-a-value)
rrdtool update solar-power.rrd "N:${CURRENT_POWER}"
                                ^ ^
                                | | value of the first field of the RRD file…
                                | when the value has been measured, N equals to…
```

RRDtool update documentation

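Since the measuring command can fail or print garbage, it may be worth sanitizing its output before calling `rrdtool update`. The sketch below relies on the fact that `rrdtool update` accepts `U` as an explicit unknown value; the `rrd_sample` function name and the script path in the comments are assumptions of mine.

```shell
#!/bin/sh
# rrd_sample VALUE: print an argument suitable for `rrdtool update`.
# Rough check only: a non-empty string made of digits and dots is passed
# through as "N:VALUE", anything else becomes "N:U" (unknown).
rrd_sample() {
    case "$1" in
        ''|*[!0-9.]*) printf 'N:U\n' ;;
        *)            printf 'N:%s\n' "$1" ;;
    esac
}

# Intended usage, with the placeholder command from the text:
#   CURRENT_POWER=$(some-command-returning-a-value)
#   rrdtool update solar-power.rrd "$(rrd_sample "$CURRENT_POWER")"
#
# Hypothetical crontab entry matching the 300 second step:
#   */5 * * * * /usr/local/bin/update-solar.sh
```

Recording an unknown on purpose keeps the graph honest: a gap is shown instead of a bogus value, and the update still happens within the heartbeat.
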
## Graph the content of the RRD file

Generating a usable graph from the data is the trickiest part, but also
the least risky one. The operation is not destructive as it doesn't
modify the file, so we can experiment a lot without affecting the
content.

We will generate something simple like the picture below. Of course,
you can add a lot more information, colors, axes, legends, etc., but I
need my dashboard to stay simple and clean.

A diagram displaying solar power over time (on a cloudy day)

```shell
rrdtool graph --end now -l 0 --start end-14000s --width 600 --height 300 \
    /var/www/htdocs/dashboard/solar.svg -a SVG \
    DEF:ds0=/var/lib/rrdtool/solar-power.rrd:value:AVERAGE \
    "LINE1:ds0#0000FF:power" \
    "GPRINT:ds0:LAST:current value %2.1lf"
```

I think most flags are self-explanatory; if not, you can look at the
documentation. What interests us here are the last three lines.

The `DEF` line associates the RRA AVERAGE of the variable `value` in
the file `/var/lib/rrdtool/solar-power.rrd` with the name `ds0`, which
is used later in the command line.

The `LINE1` line associates a legend and a color with the rendering of
this variable.

The `GPRINT` line adds some text to the legend. Here we take the last
value of `ds0` and format it with the printf-style string `current
value %2.1lf`.

RRDtool graph documentation

RRDtool graph examples

# Conclusion

RRDtool is very nice. It's the storage engine of monitoring software
such as collectd or munin, but we can also use it on the spot with
simple scripts. However, it has drawbacks: it doesn't scale well when
you start to create many files, as updating them generates a lot of
I/O, and rendering hundreds of pictures consumes CPU. That's why a
daemon named `rrdcached` has been created: it helps mitigate the load
issue by handling the updates of many RRD files in a more sequential
way.

# Going further

I encourage you to look at the official project website: all the other
commands can be very useful, and rrdtool can also export data as XML or
JSON if needed, which is perfect for plugging it into other software.
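
As an illustration of the export feature, here is a sketch using `rrdtool xport` with its `--json` flag on the solar file from this article. The time window and the `power` legend are reused from the graph command; the guard around the call is only there so the snippet stays harmless on a machine without rrdtool or without the file.

```shell
#!/bin/sh
# Sketch: export the recent values of the solar RRD as JSON.
RRD=/var/lib/rrdtool/solar-power.rrd   # same path as in the graph command

if command -v rrdtool >/dev/null 2>&1 && [ -f "$RRD" ]; then
    rrdtool xport --json --start end-14000s --end now \
        "DEF:ds0=${RRD}:value:AVERAGE" \
        "XPORT:ds0:power"
else
    echo "rrdtool or ${RRD} not available, nothing exported" >&2
fi
```

Without `--json`, the same command prints XML, so the choice mostly depends on what the consuming software parses more easily.
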