Changing thresholds for disk usage
In this recipe, we'll configure the Nagios Core server to check its own disk usage and to flag a WARNING or CRITICAL state depending on how little free space is left on the disk. We'll accomplish this by adding a new service called DISK to the already defined localhost host. The service will run the check_local_disk command to examine the state of the mounted volumes on the server.
Disk usage tends to creep up unnoticed, and a disk that fills completely without any warning can have dire effects, so this is among the more important things to monitor in any given network.
For simplicity, we'll demonstrate this only for the monitoring server itself, as a host called localhost on 127.0.0.1. This is because the check_disk plugin can't directly check the disk usage of a remote server. However, the principles discussed here could be adapted to run the check on a remote server using check_nrpe. The use of NRPE and alternative methods to run remote checks are discussed in all the recipes in Chapter 6, Enabling Remote Execution.
Getting ready
You should have a Nagios Core 4.0 or newer server with a definition of localhost so that the monitoring host is able to check itself. A host definition for localhost is included in the sample configuration located at /usr/local/nagios/etc/objects/localhost.cfg. You should also understand the basics of how hosts and services fit together in a Nagios Core configuration and be familiar with the use of commands and plugins via the check_command directive.
We'll use the example of olympus.example.net as our Nagios Core server. We'll have it check its own disk, with its mount point at /.
How to do it...
We can add our DISK service to the existing host with custom usage thresholds as follows:
- Change to the objects configuration directory for Nagios Core. The default is /usr/local/nagios/etc/objects. If you've put the definition for your host in a different file, change to that file's directory instead:
# cd /usr/local/nagios/etc/objects
- Edit the file that contains your host definition:
# vi localhost.cfg
- Add the following definition to the end of the file. The one value that interests us the most here is the value of the check_command directive:
define service {
    use                     local-service
    host_name               localhost
    service_description     DISK
    check_command           check_local_disk!10%!5%!/
}
- Validate the configuration and reload the Nagios Core server:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
# /etc/init.d/nagios reload
With this done, a new service is created for the localhost host that checks the disk usage on / and flags a WARNING state for the service if the free space is below 10% and a CRITICAL state if the free space is below 5%. In both cases, a notification will be sent to the service's defined contacts, if configured to do so.
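Before settling on thresholds, it can help to see how much free space the volume has right now. The following is a minimal sketch and not part of Nagios Core: free_percent is a hypothetical helper name, it assumes POSIX df -P output with no spaces in the filesystem name, and its arithmetic only approximates what check_disk reports, since the plugin's own rounding and handling of reserved blocks may differ. The sample figures are made up for illustration.

```shell
# free_percent: read `df -P` output on stdin and print the integer
# percentage of free space for the first filesystem listed.
# Hypothetical helper for illustration; not part of Nagios Core.
free_percent() {
    awk 'NR == 2 {
        used = $3; avail = $4
        if (used + avail == 0) { print 0; exit }
        printf "%d\n", (avail * 100) / (used + avail)
    }'
}

# Made-up sample in `df -P` format (sizes in 1024-byte blocks).
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 3905536 1069056 2636800 29% /'

printf '%s\n' "$sample" | free_percent   # prints 71
```

On a live system, the same helper can be fed real data with df -P / | free_percent.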
How it works...
The configuration we added for our existing host creates a new service with a service_description of DISK. For the check_command, we use the check_local_disk command, which in turn uses the check_disk plugin to check the local machine's disks. The interesting part here is what follows the check_local_disk command name: the !10%!5%!/ string.
In Nagios Core, the character ! is used as a separator for arguments that should be passed to the command. In the case of check_local_disk, the first two arguments define thresholds or conditions that, if met, should make Nagios Core flag a WARNING state (the first argument, 10%) or a CRITICAL state (the second argument, 5%) for the service. The third argument defines the mount point of the disk to check, /. If you prefer, you can instead use its device name, for example /dev/sda1.
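The way the two thresholds map to service states can be sketched in a few lines of shell. This is only an illustration of the idea, not the code check_disk actually runs: classify_free_space is a hypothetical helper, it accepts whole-number percentages only, and it assumes that a value exactly equal to a threshold does not trigger it, in line with the "below 10%" and "below 5%" wording used in this recipe.

```shell
# classify_free_space FREE_PCT WARN_PCT CRIT_PCT
# Print the state name and return the matching plugin exit code
# (0 = OK, 1 = WARNING, 2 = CRITICAL). Whole-number percentages only.
# Hypothetical helper; a sketch of the idea, not check_disk's real code.
classify_free_space() {
    if [ "$1" -lt "$3" ]; then
        echo CRITICAL; return 2
    elif [ "$1" -lt "$2" ]; then
        echo WARNING; return 1
    fi
    echo OK
}

# The thresholds from the recipe: WARNING below 10%, CRITICAL below 5%.
for free in 71 8 3; do
    state=$(classify_free_space "$free" 10 5) || true
    echo "$free% free -> $state"
done
```

The CRITICAL threshold is tested first so that a value below both thresholds is reported as the more severe state.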
There's more...
If we want to look in a bit more detail at how these arguments are applied, we can inspect the command definition of check_local_disk. By default, this is in the /usr/local/nagios/etc/objects/commands.cfg file and looks as follows:
define command {
    command_name    check_local_disk
    command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}
Note that in this case, the command_name value and the name of the plugin used in the command_line are not the same.
In the value for command_line, the following four macros are used:
- $USER1$: This expands to /usr/local/nagios/libexec or the directory in which the Nagios Core plugins are normally kept, including check_disk.
- $ARG1$: This expands to the value given for the first argument of the command; in this case, the 10% string.
- $ARG2$: This expands to the value given for the second argument of the command; in this case, the 5% string.
- $ARG3$: This expands to the value given for the third argument of the command; in this case, the / string.
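The substitution described in this list can be imitated with a short shell function. This is a rough sketch under stated assumptions rather than how Nagios Core implements macros: expand_check_command is a hypothetical helper, it knows only the four macros listed here, and it assumes the values contain no characters that are special in a sed replacement.

```shell
# expand_check_command CHECK_COMMAND COMMAND_LINE USER1_DIR
# Split CHECK_COMMAND on "!" and substitute $USER1$ and $ARG1$..$ARG3$
# into COMMAND_LINE. Hypothetical helper; it handles only these four
# macros and assumes the values contain no "|", "&", or "\" characters,
# which are special to the sed replacement used here.
expand_check_command() {
    command_line=$2
    user1=$3

    # Field-split the check_command value on "!" with globbing disabled.
    old_ifs=$IFS
    IFS='!'
    set -f
    set -- $1
    set +f
    IFS=$old_ifs
    shift    # drop the command name; $1, $2, $3 are now the arguments

    printf '%s\n' "$command_line" | sed \
        -e "s|[\$]USER1[\$]|$user1|g" \
        -e "s|[\$]ARG1[\$]|${1-}|g" \
        -e "s|[\$]ARG2[\$]|${2-}|g" \
        -e "s|[\$]ARG3[\$]|${3-}|g"
}

expand_check_command \
    'check_local_disk!10%!5%!/' \
    '$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' \
    /usr/local/nagios/libexec
# prints: /usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /
```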
The complete command-line call for our specific check with all these substitutions made would therefore look like this:
/usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /
This command line makes use of the following three parameters of the check_disk program:
- -w: This specifies the thresholds to raise a WARNING state
- -c: This specifies the thresholds to raise a CRITICAL state
- -p: This specifies the mount point or device file of the disk to be checked
We can run this directly from the command line on the Nagios Core server to see what the results of the check might be:
# /usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /
The output includes both the OK result of the check and also some performance data:
DISK OK - free space: / 2575 MB (71% inode=78%);| /=1044MB;3432;3051;0;3814
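The performance data is the part after the | character, and plugins conventionally format each item as label=value;warn;crit;min;max. As a small sketch of how that part could be pulled apart in shell, assuming a single item with an unquoted label as in the output above (parse_perfdata is a hypothetical helper name, not a Nagios Core tool):

```shell
# parse_perfdata PLUGIN_OUTPUT
# Print the fields of the first performance data item, which follows the
# "|" in the plugin output as label=value;warn;crit;min;max.
# Hypothetical helper; assumes a single item with an unquoted label.
parse_perfdata() {
    perf=${1#*'|'}        # keep the text after the first "|"
    perf=${perf#' '}      # drop one leading space, if present
    label=${perf%%=*}
    values=${perf#*=}

    # Field-split the values on ";" with globbing disabled.
    old_ifs=$IFS
    IFS=';'
    set -f
    set -- $values
    set +f
    IFS=$old_ifs

    printf 'label=%s value=%s warn=%s crit=%s min=%s max=%s\n' \
        "$label" "${1-}" "${2-}" "${3-}" "${4-}" "${5-}"
}

parse_perfdata 'DISK OK - free space: / 2575 MB (71% inode=78%);| /=1044MB;3432;3051;0;3814'
# prints: label=/ value=1044MB warn=3432 crit=3051 min=0 max=3814
```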
See also
- Creating a new service, Chapter 1, Understanding Hosts, Services, and Contacts
- The Changing thresholds for PING RTT and packet loss section in this chapter
- Customizing an existing command, Chapter 2, Working with Commands and Plugins
- Creating a new command, Chapter 2, Working with Commands and Plugins
- Monitoring PING for any host, Chapter 5, Monitoring Methods