Nagios Core Administration Cookbook (Second Edition)

Changing thresholds for disk usage

In this recipe, we'll configure the Nagios Core server to check its own disk usage and to flag a WARNING or CRITICAL state depending on how little free space is left on the disk. We'll accomplish this by adding a new service called DISK to the already defined localhost host. The service will run the check_local_disk command to examine the state of the mounted volumes on the server.

Because burgeoning disk usage can creep up on any system administrator and because of the dire effect it can have when a disk suddenly gets filled completely without any warning, this is among the more important things to monitor in any given network.

For simplicity, we'll demonstrate this only for the monitoring server itself, as a host called localhost on 127.0.0.1. This is because the check_disk plugin can't directly check the disk usage of a remote server. However, the principles discussed here could be adapted to run the check on a remote server using check_nrpe. The use of NRPE and alternative methods to run remote checks are discussed in the recipes in Chapter 6, Enabling Remote Execution.

Getting ready

You should have a Nagios Core 4.0 or newer server with a definition of localhost so that the monitoring host is able to check itself. A host definition for localhost is included in the sample configuration located at /usr/local/nagios/etc/objects/localhost.cfg. You should also understand the basics of how hosts and services fit together in a Nagios Core configuration and be familiar with the use of commands and plugins via the check_command directive.

We'll use the example of olympus.example.net as our Nagios Core server. We'll have it check its own disk, with its mount point at /.

How to do it...

We can add our DISK service to the existing host with custom usage thresholds as follows:

Change to the objects configuration directory for Nagios Core. The default is /usr/local/nagios/etc/objects. If you've put the definition for your host in a different file, change to that file's directory instead:
```
# cd /usr/local/nagios/etc/objects
```
Edit the file that contains your host definition:
```
# vi localhost.cfg
```

Add the following definition to the end of the file. The one value that interests us the most here is the value of the check_command directive:

define service {
    use                  local-service
    host_name            localhost
    service_description  DISK
    check_command        check_local_disk!10%!5%!/
}

Validate the configuration and reload the Nagios Core server:
```
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
# /etc/init.d/nagios reload
```
With this done, a new service is created for the localhost host that checks the disk usage on / and flags a WARNING state for the service if the free space is below 10% and a CRITICAL state if the free space is below 5%.
In both cases, a notification will be sent to the service's defined contacts, if configured to do so.
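The validate-then-reload step can be wrapped so that the reload only runs when validation succeeds. The following is a minimal sketch, not part of Nagios Core: validate_and_reload is a hypothetical helper, and it assumes the validation command exits with a non-zero status when it finds errors. The demonstration calls use the stand-in commands true and false so that the sketch can be tried on a machine without Nagios Core installed.

```shell
#!/bin/sh
# validate_and_reload VALIDATE_CMD RELOAD_CMD
# Hypothetical helper: runs RELOAD_CMD only if VALIDATE_CMD succeeds.
# Assumes the validation command exits non-zero when it finds errors.
validate_and_reload() {
    if sh -c "$1"; then
        sh -c "$2"
    else
        echo "Configuration check failed; not reloading." >&2
        return 1
    fi
}

# Stand-in commands for demonstration only.
validate_and_reload true  'echo "reload would run here"'
validate_and_reload false 'echo "reload would run here"' || echo "reload skipped"

# On the monitoring server, the call might look like this (as root):
# validate_and_reload \
#     '/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg' \
#     '/etc/init.d/nagios reload'
```

Keeping the two commands as arguments makes it easy to substitute a different reload mechanism if your system does not use the init script shown in this recipe.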

How it works...

The configuration we added for our existing host creates a new service with a service_description of DISK. For the check_command, we use the check_local_disk command, which in turn uses the check_disk plugin to check the local machine's disks. The interesting part here is what follows the check_local_disk command name: the !10%!5%!/ string.

In Nagios Core, the character ! is used as a separator for arguments that should be passed to the command. In the case of check_local_disk, the first two arguments define thresholds or conditions that, if met, should make Nagios Core flag a WARNING state (the first argument, 10%) or a CRITICAL state (the second argument, 5%) for the service. The third argument defines the mount point of the disk to check, /. If you prefer, you can instead use its device name, for example /dev/sda1.
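To make the separator rule concrete, the following sketch splits a check_command value on the ! character and prints the command name and its numbered arguments. It only illustrates the idea and is not how Nagios Core itself parses the directive: split_check_command is a hypothetical helper, and it makes no attempt to handle escaped separators.

```shell
#!/bin/sh
# split_check_command VALUE
# Hypothetical helper: splits a check_command value on "!" and prints
# the command name followed by ARG1, ARG2, and so on.
# Simplification: escaped separators are not handled.
split_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -f          # disable globbing so characters such as * stay literal
    set -- $1       # word-split the value on "!" into positional parameters
    set +f
    IFS=$old_ifs

    echo "command_name=$1"
    shift
    i=1
    for arg in "$@"; do
        echo "ARG$i=$arg"
        i=$((i + 1))
    done
}

split_check_command 'check_local_disk!10%!5%!/'
```

For the value used in this recipe, the sketch prints the command name check_local_disk followed by ARG1=10%, ARG2=5%, and ARG3=/.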

There's more...

If we want to look in a bit more detail at how these arguments are applied, we can inspect the command definition of check_local_disk. By default, this is in the /usr/local/nagios/etc/objects/commands.cfg file and looks as follows:

define command {
    command_name  check_local_disk
    command_line  $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}

Note that in this case, the command_name value and the name of the plugin used in the command_line are not the same.

In the value for command_line, the following four macros are used:

$USER1$ : This expands to the directory in which the Nagios Core plugins, including check_disk, are kept; in a default installation, this is /usr/local/nagios/libexec.
$ARG1$ : This expands to the value given for the first argument of the command; in this case, the 10% string.
$ARG2$ : This expands to the value given for the second argument of the command; in this case, the 5% string.
$ARG3$ : This expands to the value given for the third argument of the command; in this case, the / string.

The complete command-line call for our specific check with all these substitutions made would therefore look like this:

/usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /
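The substitution described above can be imitated with a few lines of shell. This is a rough sketch of the idea only, not the macro processor that Nagios Core uses: replace_first and expand_macros are hypothetical helpers, each macro is replaced only at its first occurrence, and the plugin directory is passed in by hand rather than read from the configuration.

```shell
#!/bin/sh
# replace_first STRING SEARCH REPLACEMENT
# Hypothetical helper: replaces the first literal occurrence of SEARCH.
replace_first() {
    case $1 in
        *"$2"*)
            before=${1%%"$2"*}   # text before the first occurrence
            after=${1#*"$2"}     # text after the first occurrence
            printf '%s\n' "$before$3$after"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# expand_macros COMMAND_LINE PLUGIN_DIR [ARG...]
# Hypothetical helper: fills in $USER1$ and $ARG1$, $ARG2$, and so on.
expand_macros() {
    line=$(replace_first "$1" '$USER1$' "$2")
    shift 2
    i=1
    for arg in "$@"; do
        line=$(replace_first "$line" "\$ARG$i\$" "$arg")
        i=$((i + 1))
    done
    printf '%s\n' "$line"
}

expand_macros '$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' \
    /usr/local/nagios/libexec '10%' '5%' /
```

With the values from this recipe, the sketch prints the same command line as shown above.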

This command line makes use of the following three parameters of the check_disk program:

-w: This specifies the free-space threshold below which a WARNING state is raised
-c: This specifies the free-space threshold below which a CRITICAL state is raised
-p: This specifies the mount point or device file of the disk to be checked
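The way the two thresholds interact can be sketched as a small decision function. This is an illustration of the logic under simplifying assumptions, not the check_disk implementation: classify_free and state_exit_code are hypothetical helpers, the free space is given as a whole-number percentage, and a value exactly equal to a threshold is treated as not yet breaching it. The exit codes follow the usual Nagios plugin convention of 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN.

```shell
#!/bin/sh
# classify_free FREE_PCT WARN_PCT CRIT_PCT
# Hypothetical helper: prints the state for a given free-space percentage.
# The CRITICAL threshold is tested first because it is the stricter one.
classify_free() {
    if [ "$1" -lt "$3" ]; then
        echo CRITICAL
    elif [ "$1" -lt "$2" ]; then
        echo WARNING
    else
        echo OK
    fi
}

# state_exit_code STATE
# Maps a state name to the conventional plugin exit code.
state_exit_code() {
    case $1 in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN
    esac
}

# Sample free-space percentages checked against -w 10% -c 5%.
for free in 71 8 3; do
    state=$(classify_free "$free" 10 5)
    echo "free=${free}% -> $state (plugin exit code $(state_exit_code "$state"))"
done
```

With thresholds of 10% and 5%, 71% free is OK, 8% free is WARNING, and 3% free is CRITICAL.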

We can run this directly from the command line on the Nagios Core server to see what the results of the check might be:

# /usr/local/nagios/libexec/check_disk -w 10% -c 5% -p /

The output includes the OK result of the check, followed by some performance data after the | character:

DISK OK - free space: / 2575 MB (71% inode=78%);| /=1044MB;3432;3051;0;3814
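The performance data after the | character follows the usual plugin layout of label=value followed by the warning, critical, minimum, and maximum values, separated by semicolons. The sketch below pulls those fields out of the line above. It is a minimal illustration: parse_perfdata is a hypothetical helper, and it assumes a single metric with no quoting in the label.

```shell
#!/bin/sh
# parse_perfdata OUTPUT_LINE
# Hypothetical helper: extracts the fields of a single perfdata metric.
# Assumes one metric only and an unquoted label.
parse_perfdata() {
    perf=${1#*"|"}      # keep the text after the first "|"
    perf=${perf# }      # drop one leading space, if present
    label=${perf%%=*}   # text before the first "="
    values=${perf#*=}   # text after the first "="

    old_ifs=$IFS
    IFS=';'
    set -f
    set -- $values      # split value;warn;crit;min;max on ";"
    set +f
    IFS=$old_ifs

    echo "label=$label value=$1 warn=$2 crit=$3 min=$4 max=$5"
}

parse_perfdata 'DISK OK - free space: / 2575 MB (71% inode=78%);| /=1044MB;3432;3051;0;3814'
```

For the sample line, this prints label=/ value=1044MB warn=3432 crit=3051 min=0 max=3814.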