notes
1 Checks
1.1 Local
1.1.1 DONE ping
1.2 Coproc - Using SSH instead
1.2.1 DONE CPU - per core?
This is implemented as a single total-load reading, not per core.
- DONE Possibly need a 2s sleep before taking measurements
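This didn't improve the top CPU readings, but taking four readings, using 'tail -n +2' to skip the first one, and averaging the last three appears to give more consistent results (top's first sample reports CPU usage averaged since boot rather than over the sampling interval).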
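A minimal sketch of that skip-and-average step, assuming procps 'top' in batch mode and its '%Cpu(s)' summary line (older versions print 'Cpu(s)' with values like '98.0%id', which the regex also accepts). The function name 'cpu_usage_from_top' and the '-n 4 -d 2' flags (four samples, 2 s apart) are illustrative assumptions, not necessarily what the script uses. Usage is taken as 100 minus the idle percentage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads 'top -b' output on stdin, drops the first
# CPU summary sample, and prints the average usage (100 - idle) of the rest.
cpu_usage_from_top() {
    grep -E '^%?Cpu\(s\)' \
        | tail -n +2 \
        | awk '
            {
                # Grab the number just before "id" (the idle field). A regex is
                # used because top may print "0.0 ni,100.0 id" with no space.
                if (match($0, /[0-9.]+[ %]*id/)) {
                    idle = substr($0, RSTART, RLENGTH)
                    sub(/[ %]*id$/, "", idle)
                    total += 100 - idle
                    n++
                }
            }
            END { if (n > 0) printf "%.1f\n", total / n }'
}

# Example (assumed flags): four samples, 2 seconds apart.
# top -b -n 4 -d 2 | cpu_usage_from_top
```

Over SSH the same function can parse a remote run locally, e.g. ssh user@host 'top -b -n 4 -d 2' | cpu_usage_from_top (host name assumed).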
1.2.2 DONE Mem
This is polled from the 'free -m' output, excluding the disk cache from the computed used memory value.
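One way to compute used memory without the disk cache, as a sketch: used = total - free - cache, where cache is the 'buff/cache' column in newer procps-ng 'free' or 'buffers' + 'cached' in the older layout. The function name 'mem_used_excl_cache' and its output format are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads 'free -m' output on stdin and prints
# "<used MiB> <total MiB> <used %>", with buffers/cache not counted as used.
mem_used_excl_cache() {
    awk '
        NR == 1 {
            # Header row: map column names to field numbers. Data rows start
            # with a label ("Mem:"), so each column sits one field further right.
            for (i = 1; i <= NF; i++) col[$i] = i + 1
            next
        }
        /^Mem:/ {
            if ("buff/cache" in col)
                cache = $(col["buff/cache"])                  # newer procps-ng layout
            else
                cache = $(col["buffers"]) + $(col["cached"])  # older layout
            total = $(col["total"])
            used = total - $(col["free"]) - cache
            printf "%d %d %.1f\n", used, total, 100 * used / total
        }'
}

# Example: free -m | mem_used_excl_cache
```

Reading the header instead of hard-coding field numbers lets the same awk handle both 'free' layouts.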
1.2.3 DONE Disk
The 'df -h' output, minus its header line ('tail -n +2'), is simply parsed with awk.
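A sketch of that awk pass, assuming the check flags filesystems whose Use% reaches a threshold; the function name 'disk_over_threshold' and the threshold argument are hypothetical. Adding '-P' (POSIX output format) to 'df -h' is also an assumption here: it keeps each filesystem on one line, whereas plain 'df -h' can wrap a long device name and shift the columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads 'df' output on stdin, skips the header with
# 'tail -n +2', and prints "<mount point> <use%>" for every filesystem whose
# Use% is at or above the threshold given as $1.
disk_over_threshold() {
    local threshold=$1
    tail -n +2 | awk -v limit="$threshold" '
        {
            pct = $5              # Use% column, e.g. "90%"
            sub(/%$/, "", pct)
            if (pct + 0 >= limit) printf "%s %s%%\n", $6, pct
        }'
}

# Example: df -hP | disk_over_threshold 80
```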
1.2.4 – Top - not included in report
Not really useful, not used.
1.2.5 DONE Service - specified in config script
Service checks just search the running process list, because I couldn't find a reliable way to parse 'service serviceName status' output over SSH.
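A minimal sketch of a process-list check: fetch process names once (here with 'ps -e -o comm=' over SSH, an assumed command choice) and report every configured service that is missing. The function name 'missing_services' and the service names are hypothetical. Note that Linux truncates the comm name to 15 characters, so longer service names would need a different match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a process-name list on stdin (one per line) and
# prints each service named in the arguments that does not appear in it.
missing_services() {
    local procs svc
    procs=$(cat)
    for svc in "$@"; do
        # -x: whole-line match, -F: fixed string, so "ssh" does not match "sshd".
        if ! grep -qxF -- "$svc" <<< "$procs"; then
            echo "$svc"
        fi
    done
}

# Example (assumed host and service names):
# ssh user@host 'ps -e -o comm=' | missing_services sshd crond httpd
```

Fetching the list once and matching locally means a single SSH round trip per host, however many services the config lists.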
2 Features
2.1 Run as service - abandoned in favor of running as a cron job
2.2 rsyslog for DB log storage (stretch goal)
2.3 DONE email notifications
Email notifications appear to work, but must be sent with the heirloom mailx implementation.
The BSD and GNU implementations didn't appear to work with Exchange or the ETS SMTP server for
whatever reason, but the heirloom mailx binary works fine. An added advantage of heirloom mailx
is that the SMTP server can be specified explicitly rather than inferred from MX records.
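A sketch of sending one alert through heirloom mailx with an explicit SMTP server, assuming its '-S smtp=host:port' variable setting and '-r' From option. The server, addresses, and the 'send_alert' function are placeholders, not the script's real configuration.

```shell
#!/usr/bin/env bash
# Hypothetical settings; in the real script these would come from the config file.
SMTP_SERVER="smtp.example.com:25"
MAIL_FROM="monitor@example.com"
MAIL_TO="admin@example.com"

# Sends one alert through heirloom mailx. Setting the 'smtp' variable with -S
# makes mailx connect to that server directly instead of calling sendmail.
send_alert() {
    local subject=$1 body=$2
    printf '%s\n' "$body" \
        | mailx -S "smtp=$SMTP_SERVER" -r "$MAIL_FROM" -s "$subject" "$MAIL_TO"
}

# Example: send_alert "Disk alert on host" "/ is at 90%"
```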
2.4 DONE Only send notifications on repeat problem
The idea is that instead of sending an email notification immediately, the script writes a trigger file;
the next time the script runs, if a notification is needed and that file exists, the notification
is sent. This cuts down on getting an email every time the application is starting up or
reindexing the content database.
Notifications are now sent only the second time a threshold is broken (commit a0e2c72e328e600249bbe05e91f93093f9d823f9).
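A minimal sketch of the trigger-file logic, assuming a single trigger file for all checks and a "yes"/"no" flag saying whether any threshold is broken on this run. The path, the 'should_notify' name, and removing the file once everything is back under threshold are all assumptions, not taken from the commit.

```shell
#!/usr/bin/env bash
# Hypothetical location of the trigger file.
TRIGGER_FILE=${TRIGGER_FILE:-/tmp/monitor.trigger}

# Returns 0 (send the email) only when a threshold was also broken on the
# previous run. $1 is "yes" if any threshold is broken on this run.
should_notify() {
    if [ "$1" != yes ]; then
        rm -f -- "$TRIGGER_FILE"    # all clear: reset for the next problem
        return 1
    fi
    if [ -e "$TRIGGER_FILE" ]; then
        return 0                    # broken on this run and the last one
    fi
    touch -- "$TRIGGER_FILE"        # first failing run: remember it, stay quiet
    return 1
}

# Example: if should_notify yes; then echo "send the email here"; fi
```

Leaving the file in place after an email means a problem that persists keeps notifying on every run; a per-check trigger file would be needed to track each threshold separately.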
3 Documentation
3.1 DONE Is/Isn't
3.2 DONE SSH
3.3 DONE Config File
3.4 DONE cron
3.5 DONE mailx
3.6 DONE Top Parsing