Title: Monitor your systems with reed-alert
Author: Solène
Date: 17 January 2018
Tags: unix lisp reed-alert
Description:

This article presents my software __reed-alert__, which checks
user-defined states and sends user-defined notifications. I made it
really easy to use but still configurable and extensible.
## Description
__reed-alert__ is _not_ a monitoring tool that produces graphs or
stores values. It does a job sysadmins are looking for, because there
is no comparable alternative: the existing alternatives, like Zabbix,
are built for very large infrastructures.

From its configuration file, __reed-alert__ checks various states
and, if a check fails, triggers a command to send a notification
(totally user-defined).
## Fetch it
This is free and open-source software released under the MIT license.
You can install it with the following commands:

    # git clone git://bitreich.org/reed-alert
    # cd reed-alert
    # make
    # doas make install

This installs a `reed-alert` script in /usr/local/bin/ with the
default Makefile variables. It will try to use ecl, and then sbcl if
ecl is not installed.

We will see here how to get started quickly.

You will find a few files in the repository. __reed-alert__ is
written in Common LISP and, for (I hope) good reasons, its
configuration file is plain Common LISP.

There is a configuration file that looks like a real-world example,
named **config.lisp.sample**, and another configuration file I use for
testing, named **example.lisp**, which contains a lot of cases.
## Let's start
To use __reed-alert__, we only need to create a new configuration
file and then add a cron job.
### Configuration
We are going to see how to configure __reed-alert__. You can find more
explanations and details in the __README__ file.
#### Alerts
We have to configure two kinds of parameters. First, we need to set up
a way to receive alerts; the easiest way is to send a mail with the
"mail" command. Alerts are declared with the function **alert**, which
takes as parameters the alert name and the command to execute. Some
variables are replaced with values from the probe; the __README__
file has the list of variables, which look like %date% or %params%.

In Common LISP, a function call starts with an opening parenthesis
followed by the function name; everything up to the closing
parenthesis is its parameters.

Example:

    (alert mail "echo 'problem on %hostname%' | mail [email protected]")

One should take care with nested quotes here.

__reed-alert__ forks a shell to run the command, so pipes and
redirections work. You can be creative when writing alerts, for
example:

+ use an SMS service
+ write a script to post on a forum
+ publish a file on a server
+ send text to IRC with the ii client
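
An alert does not have to be that elaborate. Here is a minimal sketch
of a second alert that appends a line to a log file. The alert name
`logfile` and the path are hypothetical, and only variables mentioned
in this article are used:

```lisp
;; Hypothetical alert: append one line per notification to a log file.
;; %date%, %hostname% and %state% are replaced by reed-alert before
;; the shell runs the command.
(alert logfile "echo '%date% %hostname% %state%' >> /var/log/reed-alert.log")
```

The user running reed-alert needs write access to that file.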
#### Checks
Now that we have an alert, we will configure some checks to make
__reed-alert__ useful. It uses *probes*, which are predefined checks
with parameters. A probe could be "has this file not been updated for
N minutes?" or "is the disk space usage of partition X more than Y?"

I chose to name the function that makes a check "=>": it isn't a word,
and it looks like an arrow or something going forward. Both previous
examples, using our mail notifier, would look like:

    (=> mail file-updated :path "/program/file.generated" :limit "10")
    (=> mail disk-usage :limit 90)

It's also possible to run a shell command and check its return code
with the __command__ probe, which lets users define their own checks.

    (=> mail command :command "echo '/is-this-gopher-server-up?' | nc -w 3 dataswamp.org 70"
                     :desc "dataswamp.org gopher server")

We use echo and netcat to check that a connection to a socket works.
The **:desc** keyword gives a nicer name in the output instead of just
"COMMAND".
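
Other commands can be used the same way. The two checks below are a
sketch, assuming the usual convention that an exit status of 0 means
success and anything else is a failure; the paths and descriptions are
made up for the example:

```lisp
;; Hypothetical check: `test -d` exits with 0 only if the directory exists.
(=> mail command :command "test -d /var/backups/daily"
                 :desc "daily backup directory")

;; Hypothetical check: the exit status of a pipeline is the one of its
;; last command, here grep -q, which exits with 0 when the pattern is found.
(=> mail command :command "mount | grep -q ' /home '"
                 :desc "/home is mounted")
```

Running the command by hand and then looking at `echo $?` is a quick
way to see what the probe will get.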
#### Putting it together
We wrote the minimum required to configure __reed-alert__. Your
**my-config.lisp** file should now look like this:

    (alert mail "echo 'problem on %hostname%' | mail [email protected]")
    (=> mail file-updated :path "/program/file.generated" :limit "10")
    (=> mail disk-usage :limit 90)

Now you can run it every 5 minutes from a crontab with this:

    */5 * * * * ( reed-alert /path/to/my-config.lisp )

The line is the same whether the installed `reed-alert` script uses
ecl or sbcl.

The time between each run is up to you, depending on what you monitor.
#### Important
By default, when a check fails, __reed-alert__ only triggers the
associated notifier once the check reaches its 3rd failure. It then
notifies again when the service is back (the variable %state% is
replaced by "start" or "end" to tell whether the problem starts or
stops).

This prevents reed-alert from sending a notification every time it
checks, which most users have absolutely no need for.

The number of failures before triggering can be changed with the
keyword ":try", as in the following example:

    (=> mail disk-usage :limit 90 :try 1)

In this case, you will be notified at the first failure.
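
To give a feeling for what ":try" means in practice, here is a sketch
that assumes the 5-minute cron job used in this article, that failures
are counted once per run, and that ":try" can be combined with the
other probe parameters shown earlier. The value 6 is only an
illustration:

```lisp
;; Default: notify on the 3rd failure. With a run every 5 minutes,
;; that is about 10 minutes after the first failed run.
(=> mail disk-usage :limit 90)

;; Illustrative value: notify on the 6th failure, about 25 minutes
;; after the first failed run with the same 5-minute schedule.
(=> mail file-updated :path "/program/file.generated" :limit "10" :try 6)
```

A higher value trades reaction time for fewer notifications about
short-lived problems.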

The failure count of each check is stored in a file (one per check)
in the "states/" directory of reed-alert's working directory.