| Title: Lightweight data monitoring using RRDtool | |
| Author: Solène | |
| Date: 16 February 2023 | |
| Tags: monitoring nocloud | |
| Description: In this article, I will introduce you to RRDtool, a robust | |
| software to keep track of data and render graphs from it | |
| # Introduction | |
| I like my servers to run as little code and as few services as | |
| possible: this eases maintenance and leaves room for other things to | |
| run. I recently wrote about monitoring software that gathers metrics | |
| and renders them, but these programs are all overkill if you just | |
| want to keep track of a single value over time and graph it. | |
| Fortunately, there is an old and robust tool that does the job fine. | |
| It is thoroughly documented and called RRDtool. | |
| RRDtool official website | |
| RRDtool stands for "Round Robin Database Tool". It's a set of programs | |
| and a specific file format to gather metrics. The trick with RRD files | |
| is that they have a fixed size: when you create one, you need to define | |
| how many values you want to store in it, at which frequency, and for | |
| how long. This is hard to change after the file has been created. | |
| In addition, RRD files allow you to create derived time series to | |
| keep track of computed values over a longer timespan, but at a lower | |
| resolution. Think of the following use case: you want to monitor your | |
| home temperature every 10 minutes for the past 48 hours, but you also | |
| want to keep some information for the past year. You can tell RRD to | |
| keep the hourly average temperature for a week, the four-hour average | |
| for a month, and the daily average for a year. All of this will have | |
| a fixed size. | |
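As a rough sketch, the temperature use case above could translate into the command below. The file name `temperature.rrd`, the data source name, the 1200 seconds heartbeat and the -40 to 60 range are assumptions picked for illustration, and a month is counted as 30 days.

```shell
#!/bin/sh
# Sketch of the home temperature example: one measurement every 10 minutes.
STEP=600                          # seconds between two measurements

# An archive is defined by how many measurements are merged into one stored
# value, and by how many stored values (rows) are kept.
ROWS_RAW=$((48 * 3600 / STEP))    # every measurement, kept for 48 hours
PER_HOUR=$((3600 / STEP))         # measurements merged into an hourly average
ROWS_HOUR=$((7 * 24))             # hourly averages kept for a week
PER_4H=$((4 * 3600 / STEP))       # measurements merged into a 4-hour average
ROWS_4H=$((30 * 24 / 4))          # 4-hour averages kept for 30 days
PER_DAY=$((24 * 3600 / STEP))     # measurements merged into a daily average
ROWS_DAY=365                      # daily averages kept for a year

if command -v rrdtool >/dev/null 2>&1; then
    rrdtool create temperature.rrd --step "$STEP" \
        DS:temperature:GAUGE:1200:-40:60 \
        "RRA:AVERAGE:0.5:1:${ROWS_RAW}" \
        "RRA:AVERAGE:0.5:${PER_HOUR}:${ROWS_HOUR}" \
        "RRA:AVERAGE:0.5:${PER_4H}:${ROWS_4H}" \
        "RRA:AVERAGE:0.5:${PER_DAY}:${ROWS_DAY}"
else
    echo "rrdtool is not installed, nothing was created" >&2
fi
```

Each `RRA` line is one of the derived series: the fourth field is the number of measurements merged into a stored value, the fifth is the number of stored values.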
| # Anatomy of an RRD file | |
| RRD files can be dumped as XML, which gives you a glimpse that may | |
| ease the understanding of this special file format. | |
| Let's create a file to monitor the battery level of your computer every | |
| 10 seconds, keeping the last 5 values. Don't focus on understanding the | |
| whole command line for now: | |
| ```rrdtool | |
| rrdtool create test.rrd --step 10 DS:battery:GAUGE:20:0:100 RRA:AVERAGE:0.5:1:5 | |
| ``` | |
| If we dump the created file with `rrdtool dump test.rrd`, we get this | |
| result (stripped a bit to make it fit better): | |
| ```rrdtool | |
| <!-- Round Robin Database Dump --> | |
| <rrd> | |
| <version>0003</version> | |
| <step>10</step> <!-- Seconds --> | |
| <lastupdate>1676569107</lastupdate> <!-- 2023-02-16 18:38:27 CET --> | |
| <ds> | |
| <name> battery </name> | |
| <type> GAUGE </type> | |
| <minimal_heartbeat>20</minimal_heartbeat> | |
| <min>0.0000000000e+00</min> | |
| <max>1.0000000000e+02</max> | |
| <!-- PDP Status --> | |
| <last_ds>U</last_ds> <value>NaN</value> <unknown_sec> 7 </unkno… | |
| </ds> | |
| <!-- Round Robin Archives --> | |
| <rra> | |
| <cf>AVERAGE</cf> | |
| <pdp_per_row>1</pdp_per_row> <!-- 10 seconds --> | |
| <params> <xff>5.0000000000e-01</xff> </params> | |
| <cdp_prep> | |
| <ds> | |
| <primary_value>0.0000000000e+00</primary_value> | |
| <secondary_value>0.0000000000e+00</secondary_value> | |
| <value>NaN</value> | |
| <unknown_datapoints>0</unknown_datapoints> | |
| </ds> | |
| </cdp_prep> | |
| <database> | |
| <!-- 2023-02-16 18:37:40 CET / 1676569060 --> <row><v>N… | |
| <!-- 2023-02-16 18:37:50 CET / 1676569070 --> <row><v>N… | |
| <!-- 2023-02-16 18:38:00 CET / 1676569080 --> <row><v>N… | |
| <!-- 2023-02-16 18:38:10 CET / 1676569090 --> <row><v>N… | |
| <!-- 2023-02-16 18:38:20 CET / 1676569100 --> <row><v>N… | |
| </database> | |
| </rra> | |
| </rrd> | |
| ``` | |
| The most important thing to understand here is that we have a "ds" | |
| (data source) named battery, of type GAUGE, with no last value (I never | |
| updated it), but also an "RRA" (Round Robin Archive) for our average | |
| value that contains timestamps with no value associated with them. You | |
| can see that internally, our 5 slots already exist, each holding an | |
| unknown value. When I update the file, the oldest slot will disappear, | |
| and a new record will be added at the end with the actual value. | |
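The rotation described above can be modelled in a few lines of shell. This is only a conceptual sketch of a fixed-size round robin archive, not how RRDtool is implemented, and the battery levels 87 and 86 are made-up values.

```shell
#!/bin/sh
# Conceptual model of a 5-slot round robin archive, oldest slot first.
SLOTS="NaN NaN NaN NaN NaN"

# push VALUE: drop the oldest slot and append VALUE at the end,
# so the number of slots never changes.
push() {
    SLOTS="${SLOTS#* } $1"
}

push 87
push 86
echo "$SLOTS"    # NaN NaN NaN 87 86
```

However many values are pushed, the archive keeps exactly 5 slots, which is why the file size never grows.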
| # Monitoring a value | |
| In this guide, I would like to share my experience of using rrdtool to | |
| monitor my solar panel power output over the last few hours, so it can | |
| be easily displayed on my local dashboard. The data are also collected | |
| and sent to a Grafana server, but it's not local, and querying it just | |
| to display the latest values wastes resources and bandwidth. | |
| First, you need `rrdtool` to be installed. You don't need anything else | |
| to work with RRD files. | |
| ## Create the RRD file | |
| Creating the RRD file is the most delicate part, because it is hard to | |
| change afterward. | |
| I want to collect a data point every 5 minutes (300 seconds). It is an | |
| absolute value between 0 and 4000, so we will define a step of 300 | |
| seconds to tell the file it must receive a value every 300 seconds. The | |
| type of the value will be GAUGE, because it's just a value that doesn't | |
| depend on the previous one. If we were monitoring power change over | |
| time, we would use DERIVE instead, because it stores the rate of | |
| change between consecutive values. | |
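To make the difference between the two types concrete, here is a small sketch of the arithmetic. The two readings are made-up numbers from a hypothetical ever-growing energy counter, the data source name `energy` is an assumption, and the shell only does integer math whereas RRDtool works with floating point values.

```shell
#!/bin/sh
# Two hypothetical readings taken 300 seconds apart.
INTERVAL=300
PREVIOUS=12000
CURRENT=12300

# GAUGE keeps the reading itself.
GAUGE_VALUE=$CURRENT

# DERIVE keeps the rate of change per second between the two readings.
DERIVE_VALUE=$(( (CURRENT - PREVIOUS) / INTERVAL ))

echo "GAUGE stores ${GAUGE_VALUE}, DERIVE stores ${DERIVE_VALUE} per second"

# How such a data source could be declared when calling `rrdtool create`:
# heartbeat of 600 seconds, minimum 0, no maximum (U means unknown).
DS_DERIVE="DS:energy:DERIVE:600:0:U"
echo "$DS_DERIVE"
```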
| Furthermore, we need to configure the file to give up on a value slot, | |
| and mark it as unknown, if it's not updated within 600 seconds. | |
| Finally, we want to be able to graph each measurement. This can be done | |
| by adding an AVERAGE calculated value in the file, with a resolution | |
| of 1 value and 240 measurements stored. This means that each time we | |
| add a value in the RRD file, the AVERAGE field is calculated with | |
| only the last value as input, and we keep 240 of them, allowing us to | |
| graph up to 240 * 5 minutes (20 hours) of data back in time. | |
| ```shell | |
| rrdtool create solar-power.rrd --step 300 ds:value:gauge:600:0:4000 rra:avera… | |
| ^ ^ ^ ^ ^ ^ … | |
| | | | | | max value | … | |
| | | | | min value | … | |
| | | | time before null | … | |
| | | measurement type | … | |
| | variable name | |
| ``` | |
| And then, you have your `solar-power.rrd` file created. You can | |
| inspect it with `rrdtool info solar-power.rrd` or dump its content with | |
| `rrdtool dump solar-power.rrd`. | |
| RRDtool create documentation | |
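Before committing to a file layout, it can help to check how much history it will hold. The sketch below redoes the arithmetic for the step and row count used in this guide, then compares it with the real file when `solar-power.rrd` exists in the current directory (an assumption about where you run it).

```shell
#!/bin/sh
STEP=300      # seconds between two values
ROWS=240      # values kept in the archive

RETENTION=$((STEP * ROWS))
echo "the archive covers ${RETENTION} seconds, or $((RETENTION / 3600)) hours"

# Show the step and row count stored in the file, if it is available here.
if command -v rrdtool >/dev/null 2>&1 && [ -f solar-power.rrd ]; then
    rrdtool info solar-power.rrd | grep -E '^step|rows' || true
fi
```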
| ## Add values to the RRD file | |
| Now that we have prepared the file to receive data, we need to populate | |
| it with something useful. This can be done using the command `rrdtool | |
| update`. | |
| ```shell | |
| CURRENT_POWER=$(some-command-returning-a-value) | |
| rrdtool update solar-power.rrd "N:${CURRENT_POWER}" | |
| ^ ^ | |
| | | value of the first field of the RRD file… | |
| | when the value has been measured, N equals to… | |
| ``` | |
| RRDtool update documentation | |
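In practice the update runs from a script called every 5 minutes, and the measuring command may sometimes fail or print garbage. Here is a minimal sketch of such a script: `read_power` is a hypothetical placeholder for your own command, and anything that doesn't look like a positive number is stored as `U`, which RRDtool understands as an unknown value.

```shell
#!/bin/sh
# sanitize VALUE: print VALUE if it is a plain positive number,
# print "U" (unknown) otherwise.
sanitize() {
    case "$1" in
        ''|.|*[!0-9.]*|*.*.*) echo "U" ;;
        *) echo "$1" ;;
    esac
}

# Hypothetical placeholder, replace the body with your own command.
read_power() {
    echo "1234.5"
}

VALUE=$(sanitize "$(read_power)")

# Only update when rrdtool and the file are available.
if command -v rrdtool >/dev/null 2>&1 && [ -f solar-power.rrd ]; then
    rrdtool update solar-power.rrd "N:${VALUE}"
fi
```

Saved under a name of your choice, for instance `/usr/local/bin/update-solar.sh` (a made-up path), it could be scheduled with a crontab entry such as `*/5 * * * * /usr/local/bin/update-solar.sh`.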
| ## Graph the content of the RRD file | |
| The trickiest part, but also the least risky, is to generate a usable | |
| graph from the data. The operation is not destructive as it doesn't | |
| modify the file, so we can experiment a lot without affecting its | |
| content. | |
| We will generate something simple like the picture below. Of course, | |
| you can add a lot more information, colors, axes, legends, etc., but I | |
| need my dashboard to stay simple and clean. | |
| A diagram displaying solar power over time (on a cloudy day) | |
| ```shell | |
| rrdtool graph --end now -l 0 --start end-14000s --width 600 --height 300 \ | |
| /var/www/htdocs/dashboard/solar.svg -a SVG \ | |
| DEF:ds0=/var/lib/rrdtool/solar-power.rrd:value:AVERAGE \ | |
| "LINE1:ds0#0000FF:power" \ | |
| "GPRINT:ds0:LAST:current value %2.1lf" | |
| ``` | |
| I think most flags are self-explanatory. If not, you can look at the | |
| documentation. What interests us here are the last three lines. | |
| The `DEF` line associates the RRA AVERAGE of the variable `value` in | |
| the file `/var/lib/rrdtool/solar-power.rrd` to the name `ds0` that will | |
| be used later in the command line. | |
| The `LINE1` line associates a legend and a color with the rendering of | |
| this variable. | |
| The `GPRINT` line adds a text in the legend. Here we take the last | |
| value of `ds0` and format it with a printf style string, `current value | |
| %2.1lf`. | |
| RRDtool graph documentation | |
| RRDtool graph examples | |
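If you want a slightly richer picture, the command above can be extended. The following is only a sketch of one possible variant: the 4 hours window, the title, the vertical label, the filled area and the peak value in the legend are my own additions, not part of the dashboard shown earlier.

```shell
#!/bin/sh
HOURS=4
WINDOW=$((HOURS * 3600))          # graph window in seconds
RRD=/var/lib/rrdtool/solar-power.rrd
OUT=/var/www/htdocs/dashboard/solar.svg

if command -v rrdtool >/dev/null 2>&1 && [ -f "$RRD" ]; then
    # VDEF computes a single number (here the maximum) from the series,
    # AREA fills the surface under the curve drawn by LINE1.
    rrdtool graph "$OUT" -a SVG \
        --end now --start "end-${WINDOW}s" -l 0 \
        --width 600 --height 300 \
        --title "Solar power, last ${HOURS} hours" \
        --vertical-label "power" \
        "DEF:ds0=${RRD}:value:AVERAGE" \
        "VDEF:peak=ds0,MAXIMUM" \
        "AREA:ds0#CCCCFF" \
        "LINE1:ds0#0000FF:power" \
        "GPRINT:peak:peak value %2.1lf"
fi
```

Because graphing never modifies the RRD file, you can run variants like this one as often as you like until the result suits you.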
| # Conclusion | |
| RRDtool is very nice. It's the storage engine of monitoring software | |
| such as collectd or munin, but we can also use it on the spot with | |
| simple scripts. However, it has drawbacks: when you start to create | |
| many files, it doesn't scale well, generating a lot of I/O and | |
| consuming CPU if you need to render hundreds of pictures. That's why a | |
| daemon named `rrdcached` was created: it helps mitigate the load by | |
| batching the updates of many RRD files in a more sequential way. | |
| # Going further | |
| I encourage you to look at the official project website. All the other | |
| commands can be very useful, and rrdtool can also export data as XML or | |
| JSON if needed, which is perfect to plug it into other software. | |
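For a quick integration with another script, you don't even need XML or JSON: `rrdtool fetch` prints one `timestamp: value` line per stored value. The sketch below assumes that output format and a file with a single data source. The `last_known` helper is a name I made up; it prints the most recent value that is not NaN.

```shell
#!/bin/sh
# last_known: read `rrdtool fetch` output on stdin and print the most
# recent value that is not NaN, with one decimal.
last_known() {
    awk -F': ' '
        NF == 2 && tolower($2) !~ /nan/ { last = $2 }
        END { if (last != "") printf "%.1f\n", last }
    '
}

RRD=/var/lib/rrdtool/solar-power.rrd
if command -v rrdtool >/dev/null 2>&1 && [ -f "$RRD" ]; then
    rrdtool fetch "$RRD" AVERAGE --start -1h | last_known
fi
```

The header line and the slots that were never filled are skipped, so the function prints nothing at all when the file contains no known value yet.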