* * * * *
troubling server crashes
One of the servers I'm monitoring (the most critical one, go figure) has
crashed every day for the past week, roughly every 24.5 hours. This is not
good, especially since the machine in question is a Linux system, not a
Windows one. The other admin and I (we're in a transition period as I take
over) can't figure out what is causing the problem. The only major change
this past week has been the installation of MySQL [1].
We're not sure what to make of the problem.
To help track it down, I installed Nagios [2], a framework of monitoring
programs, on another server to watch the troublesome machine. Configuring
Nagios took a while because its configuration is complex: hosts, services,
contacts, and groupings of each all get separate definitions. That
complexity lets you fine-tune the monitoring, though, and adding new hosts,
services or contacts is easy once the initial configuration is complete.
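To give a feel for those separate definitions, here is a rough Python sketch
that prints one object of each kind in Nagios's "define" syntax. It is not my
actual configuration: the host name, address, contact and group names are all
made up, the check_ping arguments are just typical sample values, and a real
setup needs more directives (check intervals, notification settings and so
on), usually inherited from templates.

```python
def render_object(kind, directives):
    """Render one Nagios-style 'define <kind> { ... }' block as text."""
    width = max(len(name) for name in directives)
    lines = [f"define {kind} {{"]
    for name, value in directives.items():
        lines.append(f"    {name.ljust(width)}  {value}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical objects, for illustration only. Each kind is defined on its
# own and refers to the others by name, which is where the complexity
# (and the flexibility) comes from.
objects = [
    ("contact", {"contact_name": "admin", "email": "admin@example.com"}),
    ("contactgroup", {"contactgroup_name": "admins", "members": "admin"}),
    ("host", {
        "host_name": "critical-server",
        "address": "192.0.2.10",
        "contact_groups": "admins",
    }),
    ("hostgroup", {
        "hostgroup_name": "linux-servers",
        "members": "critical-server",
    }),
    ("service", {
        "host_name": "critical-server",
        "service_description": "PING",
        "check_command": "check_ping!100.0,20%!500.0,60%",
        "contact_groups": "admins",
    }),
]

config_text = "\n\n".join(render_object(kind, d) for kind, d in objects)
print(config_text)
```

Adding a second machine would mean one more host entry and one more name in
the hostgroup's members; the contacts and groups stay as they are.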
I will also be rebooting the server in a few hours to see whether it always
crashes around 8:00 in the morning or about 24.5 hours after the last
reboot. I doubt the crashes are due to the janitorial staff unplugging the
computer to plug in their vacuum cleaner.
At least, I hope that's not the case.
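The logic behind the reboot test can be sketched in a few lines of Python.
This is only an illustration: the function names and the example dates are
made up, and the only numbers taken from above are the 24.5 hours and the
8:00 crash time.

```python
from datetime import datetime, timedelta

# The interval observed so far (24.5 hours).
UPTIME_BEFORE_CRASH = timedelta(hours=24, minutes=30)


def predicted_crashes(reboot_time):
    """Return the next crash time predicted by each hypothesis."""
    # Hypothesis A: the crash follows a fixed amount of uptime.
    by_uptime = reboot_time + UPTIME_BEFORE_CRASH
    # Hypothesis B: the crash is tied to the wall clock, around 8:00.
    by_clock = reboot_time.replace(hour=8, minute=0, second=0, microsecond=0)
    if by_clock <= reboot_time:
        by_clock += timedelta(days=1)
    return by_uptime, by_clock


def closer_hypothesis(reboot_time, observed_crash):
    """Say which prediction the observed crash time lies closer to."""
    by_uptime, by_clock = predicted_crashes(reboot_time)
    uptime_error = abs(observed_crash - by_uptime)
    clock_error = abs(observed_crash - by_clock)
    if uptime_error == clock_error:
        return "inconclusive"
    return "uptime" if uptime_error < clock_error else "clock"


# Hypothetical example: reboot at 14:00, crash seen at 08:05 the next day.
reboot = datetime(2024, 1, 10, 14, 0)
print(predicted_crashes(reboot))
print(closer_hypothesis(reboot, datetime(2024, 1, 11, 8, 5)))
```

Rebooting in the afternoon puts the two predictions about six and a half
hours apart, so a single crash should be enough to tell them apart.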
[1] http://www.mysql.com/
[2] http://www.nagios.org/