| Title: Monitor your systems with reed-alert | |
| Author: Solène | |
| Date: 17 January 2018 | |
| Tags: unix lisp reed-alert | |
| Description: | |
| This article presents my software __reed-alert__, which checks | |
| user-defined states and sends user-defined notifications. I made it | |
| really easy to use but still configurable and extensible. | |
| ## Description | |
| __reed-alert__ is _not_ a monitoring tool that produces graphs or | |
| stores values. It does a job sysadmins are looking for, and for which | |
| there is no comparable alternative (the existing ones come from very | |
| large infrastructures like Zabbix, so they are not comparable). | |
| From its configuration file, __reed-alert__ checks various states | |
| and, if one of them fails, triggers a command to send a notification | |
| (entirely user-defined). | |
| ## Fetch it | |
| This is free and open-source software released under the MIT license; | |
| you can install it with the following commands: | |
| $ git clone git://bitreich.org/reed-alert | |
| $ cd reed-alert | |
| $ make | |
| $ doas make install | |
| This will install a script `reed-alert` in /usr/local/bin/ with the | |
| default Makefile variables. It will try to use ecl and then sbcl if | |
| ecl is not installed. | |
| The __README__ file explains in detail how to use it, but we will see | |
| here how to get started quickly. | |
| You will find a few files there. __reed-alert__ is written in Common | |
| LISP and, for (I hope) good reasons, I chose to make its | |
| configuration file plain Common LISP. | |
| There is a configuration file that looks like a real-world example, | |
| named **config.lisp.sample**, and another configuration file I use for | |
| testing, named **example.lisp**, which contains a lot of cases. | |
| ## Let's start | |
| In order to use __reed-alert__ we only need to create a new | |
| configuration file and then add a cron job. | |
| ### Configuration | |
| We are going to see how to configure __reed-alert__. You can find more | |
| explanations or details in the __README__ file. | |
| #### Alerts | |
| We have to configure two kinds of parameters. First, we need to set up | |
| a way to receive alerts; the easiest way to do so is by sending a mail | |
| with the "mail" command. Alerts are declared with the function | |
| **alert**, which takes as parameters the alert name and the command to | |
| be executed. Some variables are replaced with values from the probe; | |
| the __README__ file lists them, they look like %date% or | |
| %params%. | |
| In Common LISP, a function is called by opening a parenthesis before | |
| its name; everything after the name, up to the closing parenthesis, | |
| is its parameters. | |
| Example: | |
| (alert mail "echo 'problem on %hostname%' | mail [email protected]") | |
| Take care with nested quotes here. | |
| __reed-alert__ will fork a shell to start the command, so pipes and | |
| redirections work. You can be creative and write alerts that: | |
| + use an SMS service | |
| + run a script to post on a forum | |
| + publish a file on a server | |
| + send text to IRC with the ii client | |
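As an illustration of an alert that does not use mail, here is a minimal sketch of a helper that appends a timestamped line to a log file. The function name `log_alert` and the paths mentioned below are hypothetical; they are not part of reed-alert.

```shell
#!/bin/sh
# Hypothetical alert helper: the function name and log location are
# assumptions made for this example, not something reed-alert ships.

# log_alert LOGFILE MESSAGE...: append "date time message" to LOGFILE.
log_alert() {
    logfile=$1
    shift
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$logfile"
}
```

Saved as a script that ends with `log_alert /var/log/alerts.log "$@"`, it could serve as the command of an alert, for instance `(alert log "/usr/local/bin/log-alert.sh problem on %hostname%")`. Both paths are only examples.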
| #### Checks | |
| Now that we have some alerts, we will configure some checks in order | |
| to make __reed-alert__ useful. It uses *probes*, which are | |
| pre-defined checks with parameters. A probe could be "has this file | |
| not been updated for N minutes?" or "is the disk space usage of | |
| partition X more than Y?" | |
| I chose to name the function that makes a check "=>": it isn't a | |
| word, and it looks like a list item or something going forward. Both | |
| previous examples, using our mail notifier, would look like: | |
| (=> mail file-updated :path "/program/file.generated" :limit "10") | |
| (=> mail disk-usage :limit 90) | |
| It's also possible to use shell commands and check the return code | |
| using the __command__ probe, allowing the user to define useful | |
| checks. | |
| (=> mail command :command "echo '/is-this-gopher-server-up?' | nc -w 3 dataswamp.org 70" | |
| :desc "dataswamp.org gopher server") | |
| We use echo + netcat to check if a connection to a socket works. The | |
| **:desc** keyword will give a nicer name in the output instead of just | |
| "COMMAND". | |
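Since this probe relies on the return code, it can help to try a command by hand before adding it to the configuration. The sketch below assumes the usual shell convention (0 means success, anything else is a failure). `try_probe` is a hypothetical helper, not something shipped with reed-alert.

```shell
# Hypothetical helper to test a command before using it in a check.

# try_probe COMMAND: run COMMAND in a shell, hide its output and
# report how its return code would be read.
try_probe() {
    status=0
    sh -c "$1" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "success (exit 0)"
    else
        echo "failure (exit $status)"
    fi
}
```

For the gopher example, this would be called as `try_probe "echo '/is-this-gopher-server-up?' | nc -w 3 dataswamp.org 70"`.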
| #### Garniture | |
| We wrote the minimum required to configure __reed-alert__, so your | |
| **my-config.lisp** file should now look like this: | |
| (alert mail "echo 'problem on %hostname%' | mail [email protected]") | |
| (=> mail file-updated :path "/program/file.generated" :limit "10") | |
| (=> mail disk-usage :limit 90) | |
| Now, you can start it every 5 minutes from a crontab with this: | |
| */5 * * * * ( reed-alert /path/to/my-config.lisp ) | |
| The same line applies whether __reed-alert__ was installed with ecl | |
| or sbcl. | |
| The time between each run is up to you, depending on what you monitor. | |
| #### Important | |
| By default, when a check fails, __reed-alert__ will only trigger the | |
| associated notifier once the check reaches its 3rd failure. It will | |
| then notify again when the service is back (the variable %state% is | |
| replaced by start or end to tell whether the problem starts or | |
| ends). | |
| This prevents reed-alert from sending a notification each time it | |
| checks; most users have absolutely no need for that. | |
| The number of failures before triggering can be modified by using the | |
| keyword ":try" as in the following example: | |
| (=> mail disk-usage :limit 90 :try 1) | |
| In this case, you will get notified at the first failure. | |
| The failure count of each check is stored in a file (one per check) | |
| in the "states/" directory of reed-alert's working directory. | |
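To make the behaviour above concrete, here is a small shell sketch of the same idea: keep a failure count in a file, notify once when the threshold is reached, and notify again on recovery. It is not reed-alert's actual code (which is Common LISP). `record_result`, `notify`, the state file format and the choice to count failures in a row are all assumptions made for the example.

```shell
# Sketch only: mimics the described behaviour, not reed-alert's code.

# notify STATE: stand-in for a real alert command (STATE is start or end).
notify() {
    echo "notify: $1"
}

# record_result STATE_FILE TRY STATUS
#   STATE_FILE: file holding the number of failures in a row
#   TRY:        number of failures needed before notifying
#   STATUS:     return code of the check (0 means success)
record_result() {
    state_file=$1
    try=$2
    status=$3

    # A missing or empty state file counts as zero failures.
    count=$(cat "$state_file" 2>/dev/null || true)
    count=${count:-0}

    if [ "$status" -ne 0 ]; then
        count=$((count + 1))
        echo "$count" > "$state_file"
        # Notify only once, when the threshold is reached.
        if [ "$count" -eq "$try" ]; then
            notify start
        fi
    else
        # Announce the recovery only if a failure was announced before.
        if [ "$count" -ge "$try" ]; then
            notify end
        fi
        echo 0 > "$state_file"
    fi
}
```

With a threshold of 3, calling `record_result /tmp/disk.state 3 1` three times prints `notify: start` on the third call only, and a later call with status 0 prints `notify: end`.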