| Title: Reed-alert: five years later | |
| Author: Solène | |
| Date: 10 February 2022 | |
| Tags: unix reed-alert linux lisp nocloud | |
| Description: Feedback from using the reed-alert program on a | |
| server | |
| # Introduction | |
| I wrote the program reed-alert five years ago and have been using it | |
| since its first days. Here is some feedback about it. | |
| The software reed-alert is meant to be used by system administrators | |
| who want to monitor their infrastructures and get alerts when things go | |
| wrong. I've gained a lot more experience in the monitoring field over | |
| time, and I want to share some thoughts about this project. | |
| reed-alert source code | |
| # Reed-alert | |
| ## The name | |
| The software name is a pun I found in a Star Trek Enterprise episode. | |
| Reed alert pun origins | |
| ## Project finished | |
| The code hasn't received many commits over the last few years. I | |
| consider the program feature complete, although new probes could be | |
| added and bugs could be fixed. The core of the software itself is | |
| perfect to me. | |
| Probes are small pieces of code that monitor extra states, such as an | |
| HTTP return code, a working ping or a started service. It's already | |
| easy to extend reed-alert by defining a custom probe from a shell | |
| command that returns 0 (success) or non-zero (failure). | |
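| As a rough sketch, a custom probe built on the command probe could | |
| look like the following. It assumes the mail alert used in the | |
| configuration example further down; the marker file path and the | |
| description are hypothetical. | |
| ```lisp | |
| ;; the check passes when the shell command exits with status 0 | |
| ;; (hypothetical marker file written by a backup job) | |
| (=> mail command | |
|     :command "test -f /var/backups/last-run.ok" | |
|     :desc "backup marker file exists") | |
| ``` | |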
| ## Reliability | |
| I don't remember having a single issue with reed-alert since I set it | |
| up on my server. It's run by a cron job every 10 minutes, which means | |
| a Common Lisp interpreter loads the code, evaluates the configuration | |
| file, runs the check commands and the alert commands if required, | |
| then stops. I chose a serviceless paradigm for reed-alert as it makes | |
| the code and usage a lot simpler. A long-running service could fail, | |
| leak memory, be exploited, and certainly suffer from many other | |
| problems I can't think of. | |
| Reed-alert is simple as it only needs a Common Lisp interpreter. The | |
| most notable ones, sbcl and ecl, are absolutely reliable and change | |
| very little over time. Some standard unix commands are required for | |
| some checks or default alerts, such as ping, service, mail or curl, | |
| but this defers all the work to well established binaries. | |
| The source code is minimal, with 179 lines for the reed-alert core and | |
| 159 lines for the probes, a total of 338 lines of code (including | |
| empty lines and comments). Hacking on reed-alert is super easy and | |
| always a lot of fun for me. For whatever reason, my Common Lisp | |
| software often works on the first try when I add new features, so | |
| it's always pleasant to work on it. | |
| ## Awesome features | |
| One aspect of reed-alert that may disturb users at first is the choice | |
| of Common Lisp code as a configuration file. This may look | |
| complicated, but a simple configuration doesn't require more Common | |
| Lisp knowledge than what is explained in the reed-alert | |
| documentation. It shows its full power when you need to loop over a | |
| list of entries to run checks, which makes the configuration dynamic | |
| instead of handwritten. | |
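| The looping idea above can be sketched as follows. This is a minimal | |
| illustration that assumes the mail alert and the service probe used | |
| in the real life example later in this article; the list of service | |
| names is hypothetical. | |
| ```lisp | |
| ;; run the same service check for every name in the list | |
| ;; (the service names are only examples) | |
| (loop for name in '("dovecot" "smtpd" "ntpd") | |
|       do (=> mail service :name name)) | |
| ``` | |
| Monitoring one more service then only means adding a string to the | |
| list. | |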
| Using Common Lisp as configuration has other advantages: it's | |
| possible to chain checks to easily skip some of them when a | |
| condition fails. Let me give a few examples for this: | |
| * if you monitor a web server, you first want to check if it replies on | |
| ICMP before trying to check and report errors on HTTP level | |
| * if you monitor remote servers, you first want to check if you can | |
| reach the internet and that your local gateway is online | |
| * if you check a local web server, it would be a good idea to check if | |
| all the required services are running first | |
| All the previous conditions can be expressed with reed-alert thanks | |
| to the code-as-configuration choice. | |
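| For instance, the first case could be written roughly like this, | |
| following the pattern of the real life example below. The host name | |
| is a placeholder, and the sketch assumes that a failing check | |
| returns false so that the and form stops there. | |
| ```lisp | |
| ;; and stops at the first failing check, so the HTTP check is only | |
| ;; attempted when the ping succeeded (placeholder host name) | |
| (and | |
|  (=> mail ping :host "www.example.org" :desc "example.org ping") | |
|  (=> mail curl-http-status :url "https://www.example.org" | |
|      :desc "example.org HTTP")) | |
| ``` | |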
| ## Scalability | |
| I've been asked a few times if reed-alert could be used in a | |
| professional context. My answer is that it depends on what you call a | |
| professional environment. | |
| Reed-alert is dumb: it needs to be run by a scheduler (such as cron) | |
| and runs the checks sequentially. It doesn't guarantee precise timing | |
| between checks. | |
| If you need multiple machines to run a set of checks, reed-alert is | |
| not able to share state between them, so it can't keep working | |
| reliably in a high availability environment. | |
| Regarding resource usage, while reed-alert is small, it needs to | |
| start the Common Lisp interpreter on every run. If you want to run | |
| reed-alert every minute or multiple times per minute, I'd recommend | |
| using something else. | |
| # A real life example | |
| Here is a chunk of the configuration I've been running for years. It | |
| checks the system itself and some remote servers. | |
| ``` | |
| (=> mail disk-usage :path "/" :limit 60 :desc "partition /") | |
| (=> mail disk-usage :path "/var" :limit 70 :desc "partition /var") | |
| (=> mail disk-usage :path "/home" :limit 95 :desc "partition /home") | |
| (=> mail service :name "dovecot") | |
| (=> mail service :name "spamd") | |
| (=> mail service :name "dkimproxy_out") | |
| (=> mail service :name "smtpd") | |
| (=> mail service :name "ntpd") | |
| (=> mail number-of-processes :limit 140) | |
| ;; check dataswamp server is working | |
| (=> mail ping :host "dataswamp.org" :desc "Dataswamp") | |
| ;; check webzine related web servers | |
| (and | |
|  (=> mail ping :host "openports.pl" :desc "Liaison Grifon.fr") | |
|  (=> mail curl-http-status :url "https://webzine.puffy.cafe" :desc "Webzine …
|  (=> mail curl-http-status :url "https://puffy.cafe" :desc "Puffy.cafe" :tim…
|  (=> mail ssl-expiration :host "webzine.puffy.cafe" :seconds (* 7 24 60 60)) | |
|  (=> mail ssl-expiration :host "puffy.cafe" :seconds (* 7 24 60 60))) | |
| ;; check openports.pl is working | |
| (and | |
| (=> mail ping :host "46.23.90.152" :desc "Openports.pl ping") | |
| (=> mail curl-http-status :url "http://46.23.90.152" :desc "Packages OpenBS… | |
| ;; check www.openbsd.org website is replying under 10 seconds | |
| (=> mail curl-http-status :url "https://www.openbsd.org" :desc "OpenBSD.org" :t… | |
| ;; check if a XML file is created regularly and valid | |
| (=> mail file-updated :path "/var/www/htdocs/solene/openbsd-current.xml" :limit… | |
| (=> mail command :command (format nil "xmllint /var/www/htdocs/solene/openbsd-c… | |
| ;; monitoring multiple gopher servers | |
| (loop for host in '("grifon.fr" "dataswamp.org" "gopherproject.org") | |
|       do | |
|       (=> mail command | |
|           :try 6 | |
|           :command (format nil "echo '/is-alive?done-by-solene-at-libera' | nc …
|           :desc (concatenate 'string "Gopher " host))) | |
| (quit) | |
| ``` | |
| # Conclusion | |
| I wrote a simple piece of software in an old programming language | |
| (ANSI Common Lisp dates from 1994). The result is reliable over time, | |
| requires no code maintenance and is fun to hack on. | |
| Common Lisp on Wikipedia |