Title: Reed-alert: five years later
Author: Solène
Date: 10 February 2022
Tags: unix reed-alert linux lisp nocloud
Description: Feedback from using the reed-alert program on a server
# Introduction
I wrote reed-alert five years ago and have been using it ever since.
Here is some feedback about it.
The software reed-alert is meant to be used by system administrators
who want to monitor their infrastructure and get alerts when things go
wrong. I have gained a lot more experience in the monitoring field over
time, and I wanted to share some thoughts about this project.
reed-alert source code
# Reed-alert
## The name
The software name is a pun I found in a Star Trek: Enterprise episode.
Reed alert pun origins
## Project finished
The code hasn't received many commits over the last few years. I
consider the program feature complete, although new probes could be
added and bugs could be fixed. The core of the software itself is
perfect to me.
The probes are small pieces of code that monitor extra states, such as
an HTTP return code, a working ping, a started service, etc. It's
already easy to extend reed-alert with a custom probe: any shell
command that returns 0 on success and non-zero on failure will do.
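As a minimal sketch of such a custom probe, assuming reed-alert is
loaded and a mail alert is defined as in the configuration shown later
in this article, the command probe can wrap any shell command. The file
path and the description below are hypothetical.
```lisp
;; Hypothetical custom probe: alert by mail when the marker file is missing.
;; "test -f" exits with 0 if the file exists and non-zero otherwise.
(=> mail command
    :command "test -f /var/backups/last-run.ok"
    :desc "Backup marker file")
```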
## Reliability
I don't remember having a single issue with reed-alert since I set it
up on my server. It's run by a cron job every 10 minutes, which means a
Common Lisp interpreter loads the code, evaluates the configuration
file, runs the check commands and the alert commands if required, and
stops. I chose a serviceless paradigm for reed-alert as it makes the
code and usage a lot simpler. A running service could fail, leak
memory, be exploited and certainly suffer from many other bugs I can't
think of.
Reed-alert is simple as it only needs a Common Lisp interpreter. The
most notable ones, sbcl and ecl, are absolutely reliable and change
very little over time. Some standard unix commands are required for
some checks or default alerts, such as ping, service, mail or curl, but
this defers all the work to well-established binaries.
The source code is minimal, with 179 lines for the reed-alert core and
159 lines for the probes, a total of 338 lines of code (including empty
lines and comments). Hacking on reed-alert is super easy and always a
lot of fun for me. For whatever reason, my Common Lisp software often
works on the first try when I add new features, so it's always pleasant
to work on it.
## Awesome features
One aspect of reed-alert that may disturb users at first is the choice
of Common Lisp code as the configuration file. This may look
complicated, but a simple configuration doesn't require more Common
Lisp knowledge than what is explained in the reed-alert documentation.
It shows its full power when you need to loop over a list of data to
run checks, which makes the configuration dynamic instead of
handwritten.
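A short sketch of that idea, assuming reed-alert is loaded and the mail
alert is defined. It reuses the service probe and some service names
from the configuration shown later in this article, and generates one
check per name.
```lisp
;; Sketch: one service check per name instead of one handwritten line each.
(loop for name in '("dovecot" "spamd" "smtpd" "ntpd")
      do (=> mail service :name name))
```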
Using Common Lisp as configuration has other advantages: it's possible
to chain checks, so that some checks are skipped when an earlier
condition fails. Let me give a few examples:
* if you monitor a web server, you first want to check if it replies to
ICMP before trying to check and report errors at the HTTP level
* if you monitor remote servers, you first want to check if you can
reach the internet and that your local gateway is online
* if you check a local web server, it would be a good idea to check if
all the required services are running first
All the previous conditions can be expressed with reed-alert thanks to
the code-as-configuration choice.
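The first case could be sketched as follows. This assumes reed-alert is
loaded, the mail alert is defined, and a failing check returns NIL, as
the use of and in the real configuration later in this article
suggests. The host example.org is a placeholder.
```lisp
;; Sketch with a placeholder host: the HTTP check only runs if the ping
;; succeeds, because "and" stops at the first form that returns NIL.
(and
 (=> mail ping :host "example.org" :desc "example.org ping")
 (=> mail curl-http-status :url "https://example.org" :desc "example.org HTTP"))
```
This way an unreachable host produces a single ping alert instead of
one alert per dependent check.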
## Scalability
I've been asked a few times if reed-alert could be used in a
professional context. My answer is that it depends on what you call a
professional environment.
Reed-alert is dumb: it needs to be run by a scheduler (such as cron)
and runs the checks sequentially. It won't guarantee precise timing
between checks.
If you need multiple machines to run a set of checks, reed-alert is
not able to share state between them, so it can't work reliably in a
high availability environment.
With regard to resource usage, while reed-alert is small, it needs to
start the Common Lisp interpreter on every run. If you want to run
reed-alert every minute or multiple times per minute, I'd recommend
using something else.
# A real life example
Here is a chunk of the configuration I've been running for years. It
checks the system itself and some remote servers.
``` | |
(=> mail disk-usage :path "/" :limit 60 :desc "partition /") | |
(=> mail disk-usage :path "/var" :limit 70 :desc "partition /var") | |
(=> mail disk-usage :path "/home" :limit 95 :desc "partition /home") | |
(=> mail service :name "dovecot") | |
(=> mail service :name "spamd") | |
(=> mail service :name "dkimproxy_out") | |
(=> mail service :name "smtpd") | |
(=> mail service :name "ntpd") | |
(=> mail number-of-processes :limit 140) | |
;; check dataswamp server is working | |
(=> mail ping :host "dataswamp.org" :desc "Dataswamp") | |
;; check webzine related web servers | |
(and
 (=> mail ping :host "openports.pl" :desc "Liaison Grifon.fr")
 (=> mail curl-http-status :url "https://webzine.puffy.cafe" :desc "Webzine …
 (=> mail curl-http-status :url "https://puffy.cafe" :desc "Puffy.cafe" :tim…
 (=> mail ssl-expiration :host "webzine.puffy.cafe" :seconds (* 7 24 60 60))
 (=> mail ssl-expiration :host "puffy.cafe" :seconds (* 7 24 60 60)))
;; check openports.pl is working | |
(and
 (=> mail ping :host "46.23.90.152" :desc "Openports.pl ping")
 (=> mail curl-http-status :url "http://46.23.90.152" :desc "Packages OpenBS…
;; check www.openbsd.org website is replying under 10 seconds | |
(=> mail curl-http-status :url "https://www.openbsd.org" :desc "OpenBSD.org" :t… | |
;; check if a XML file is created regularly and valid | |
(=> mail file-updated :path "/var/www/htdocs/solene/openbsd-current.xml" :limit… | |
(=> mail command :command (format nil "xmllint /var/www/htdocs/solene/openbsd-c… | |
;; monitoring multiple gopher servers | |
(loop for host in '("grifon.fr" "dataswamp.org" "gopherproject.org")
      do
      (=> mail command
          :try 6
          :command (format nil "echo '/is-alive?done-by-solene-at-libera' | nc …
          :desc (concatenate 'string "Gopher " host)))
(quit) | |
``` | |
# Conclusion
I wrote a simple piece of software in an old programming language (the
ANSI Common Lisp standard dates from 1994). The result is that it's
reliable over time, requires no code maintenance and is fun to hack on.
Common Lisp on Wikipedia