Nagios Core Administration Cookbook (Second Edition)

Changing thresholds for PING RTT and packet loss

In this recipe, we'll set up a service for a host that monitors PING and take a look at how to adjust the thresholds for the WARNING and CRITICAL states, which is done using command arguments. We'll accomplish this by setting up a service for an existing host that's already being checked with a check_command, such as check-host-alive. Our service will be used to monitor not whether the host is completely down, but whether it's responding to PING requests within a reasonable period of time.

This can be useful both for notifying you of connectivity problems with a host or service and for helping to diagnose them.

This recipe will therefore serve as a good demonstration of the concepts of supplying arguments to a command and adjusting the WARNING and CRITICAL thresholds for a particular service.

Getting ready

You should have a Nagios Core 4.0 or newer server with at least one host already configured, using a check_command of check-host-alive. We'll use the example of sparta.example.net, which is a host defined in its own file.

You should also understand the basics of how hosts and services fit together in a Nagios Core configuration and you should be familiar with the use of commands and plugins via the check_command directive.

How to do it...

We can add our PING service to the existing host with custom round trip time and packet loss thresholds as follows:

  1. Change to the objects configuration directory for Nagios Core. The default location is /usr/local/nagios/etc/objects. If you've put the definition of your host in a different file, move to its directory instead:
    # cd /usr/local/nagios/etc/objects
    
  2. Edit the file that contains your host definition:
    # vi sparta.example.net.cfg
    
  3. Add the following definition to the end of the file. The one value that interests us the most is the value of the check_command directive:
    define service {
        use                  generic-service
        host_name            sparta.example.net
        service_description  PING
        check_command        check_ping!100,20%!200,40%
    }
  4. Validate the configuration and reload the Nagios Core server:
    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
    # /etc/init.d/nagios reload
    

    With this done, Nagios Core will not only run a host check of check-host-alive against your original host to ensure that it's up, but it will also run a more stringent check of the PING responses from the machine as a service to check whether it's adequately responsive:

    • If the round trip time (RTT) of the PING response is greater than 100 ms (but less than 200 ms), Nagios Core will flag a WARNING state.
    • If the RTT of the PING response is greater than 200 ms, Nagios Core will flag a CRITICAL state.
    • If more than 20% (but less than 40%) of the PING requests receive no response, Nagios Core will flag a WARNING state.
    • If more than 40% of the PING requests receive no response, Nagios Core will flag a CRITICAL state.

    For either state, a notification will be sent to the service's defined contacts if the service is configured to do so. Otherwise, this service functions much the same as any other service and appears in the web interface as a service beneath the host.
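    The four rules above can be sketched as a small Python function. This is only an illustration of the logic as the recipe words it, not the plugin's own code: the name ping_state is hypothetical, and the strict "greater than" comparisons follow the text above, so behavior at exactly the threshold value is an assumption.

```python
# Nagios plugin exit codes for the three states discussed in this recipe
OK, WARNING, CRITICAL = 0, 1, 2


def ping_state(rta_ms, loss_pct, warn=(100.0, 20.0), crit=(200.0, 40.0)):
    """Map a measured RTT (ms) and packet loss (%) to a state.

    warn and crit are (rtt_ms, loss_pct) pairs; the defaults mirror the
    100,20% and 200,40% thresholds used in this recipe.
    """
    # CRITICAL is tested first, so the worse state wins when both apply
    if rta_ms > crit[0] or loss_pct > crit[1]:
        return CRITICAL
    if rta_ms > warn[0] or loss_pct > warn[1]:
        return WARNING
    return OK
```

    Note that either measurement alone is enough to raise a state: a fast reply with 30% packet loss is still a WARNING.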

How it works...

The configuration we added to our existing host creates a new service with a service_description of PING. For check_command, we use the check_ping command, which uses the plugin of the same name. The interesting part here is what follows the command name in the check_command value: the !100,20%!200,40% string.

In Nagios Core, the ! character is used as a separator for arguments that should be passed to the command. In the case of check_ping, the first argument defines thresholds or conditions that, if met, should make Nagios Core flag a WARNING state for the service. Similarly, the second argument defines the thresholds for a CRITICAL state.

Each of the two arguments consists of two comma-separated terms. The first is the threshold, in milliseconds, for the RTT of the PING request and its response; the second is the percentage of packet loss that is tolerated before the same state is raised.

This pattern of arguments is specific to check_ping; they would not work for other commands such as check_http.
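
To make the structure of the check_command value concrete, here is a minimal Python sketch that takes it apart the way this section describes. The function names are hypothetical, and the sketch ignores any escaping of the ! character; it is an illustration, not how Nagios Core itself parses the directive.

```python
def parse_check_command(value):
    """Split a check_command value into the command name and its arguments."""
    name, *args = value.split("!")
    return name, args


def parse_ping_threshold(arg):
    """Split a check_ping threshold such as '100,20%' into (rtt_ms, loss_pct)."""
    rtt, loss = arg.split(",")
    return float(rtt), float(loss.rstrip("%"))


name, args = parse_check_command("check_ping!100,20%!200,40%")
warning, critical = (parse_ping_threshold(a) for a in args)
print(name, warning, critical)
# prints: check_ping (100.0, 20.0) (200.0, 40.0)
```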

There's more...

If we want to look in a bit more detail at how these arguments are applied, we can inspect the command definition for check_ping. By default, this is in the /usr/local/nagios/etc/objects/commands.cfg file and looks as follows:

define command {
    command_name  check_ping
    command_line  $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}

In the value of command_line, the following four macros are used:

  • $USER1$: This expands to /usr/local/nagios/libexec or the directory in which the Nagios Core plugins are normally kept, including check_ping.
  • $HOSTADDRESS$: This expands to the address of the host for which the command is run. In this case, it expands to 192.0.2.21, the value of the address directive for the sparta.example.net host.
  • $ARG1$: This expands to the value given for the first argument of the command; in our recipe's case, the 100,20% string.
  • $ARG2$: This expands to the value given for the second argument of the command; in our recipe's case, the 200,40% string.

The complete command-line call for our specific check with all these substitutions made would therefore look like this:

/usr/local/nagios/libexec/check_ping -H 192.0.2.21 -w 100,20% -c 200,40% -p 5
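
The substitution can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: expand_macros is a hypothetical helper, the macro values are the ones given in this recipe, and Nagios Core's real macro processing handles many more cases.

```python
import re


def expand_macros(command_line, macros):
    """Replace each $NAME$ token with its value; unknown macros are left as-is."""
    return re.sub(
        r"\$(\w+)\$",
        lambda m: macros.get(m.group(1), m.group(0)),
        command_line,
    )


command_line = "$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5"
macros = {
    "USER1": "/usr/local/nagios/libexec",  # plugin directory
    "HOSTADDRESS": "192.0.2.21",           # address directive of the host
    "ARG1": "100,20%",                     # first argument after the !
    "ARG2": "200,40%",                     # second argument after the !
}
print(expand_macros(command_line, macros))
```

Running the sketch prints the same command line shown above.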

This command line makes use of the following four parameters of the check_ping program:

  • -H: This specifies the address of the host to be checked
  • -w: This specifies the thresholds to raise a WARNING state
  • -c: This specifies the thresholds to raise a CRITICAL state
  • -p: This specifies the number of PING requests to be sent

We can run this directly from the command line on the Nagios Core server to see what the results of the check might be:

# /usr/local/nagios/libexec/check_ping -H 192.0.2.21 -w 100,20% -c 200,40% -p 5

This yields an output that includes the OK result of the check and also some performance data:

PING OK - Packet loss = 0%, RTA = 0.17 ms|rta=0.174000ms;100.000000;200.000000;0.000000 pl=0%;20;40;0

The arguments specified in the command are, therefore, used to customize the behavior of check_command for the particular host or service that is being edited.

The check_ping plugin isn't the only way you can monitor ICMP ECHO response times for a host; an alternative in the Nagios Plugins set is the check_icmp plugin. You may find this one more efficient to run on larger Nagios Core installations, as it sends and receives the ICMP packets itself rather than forking a ping(8) process and parsing its output. To do this, it needs to be installed setuid so that it runs as root. Its options and syntax are very similar to those of check_ping:

$ /usr/local/nagios/libexec/check_ping -H 192.0.2.21 -w 100,20% -c 200,40%
PING OK - Packet loss = 0%, RTA = 0.17 ms|rta=0.166000ms;100.000000;200.000000;0.000000 pl=0%;20;40;0
$ /usr/local/nagios/libexec/check_icmp -H 192.0.2.21 -w 100,20% -c 200,40%
OK - 192.0.2.21: rta 0.109ms, lost 0%|rta=0.109ms;100.000;200.000;0; pl=0%;20;40;; rtmax=0.210ms;;;; rtmin=0.082ms;;;;
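
Both plugins append performance data after the | character, and the warning and critical thresholds we supplied are echoed back there. As a rough way to confirm that the thresholds were applied, the following Python sketch pulls them out of an output line. The parse_perfdata name is hypothetical, and the sketch assumes the simple label=value[unit];warn;crit;... layout seen in the two outputs above, with no spaces or quotes in labels.

```python
import re


def parse_perfdata(output):
    """Return {label: (value, unit, warn, crit)} from one plugin output line."""
    _, _, perfdata = output.partition("|")
    metrics = {}
    for item in perfdata.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        # Separate the numeric value from its unit, e.g. '0.166000ms'
        match = re.fullmatch(r"(-?[\d.]+)(.*)", fields[0])
        value, unit = float(match.group(1)), match.group(2)
        # Pad the list so absent or empty warn/crit fields become None
        warn, crit = (
            float(f) if f else None for f in (fields[1:3] + ["", ""])[:2]
        )
        metrics[label] = (value, unit, warn, crit)
    return metrics


ping_line = (
    "PING OK - Packet loss = 0%, RTA = 0.17 ms"
    "|rta=0.166000ms;100.000000;200.000000;0.000000 pl=0%;20;40;0"
)
icmp_line = (
    "OK - 192.0.2.21: rta 0.109ms, lost 0%"
    "|rta=0.109ms;100.000;200.000;0; pl=0%;20;40;; "
    "rtmax=0.210ms;;;; rtmin=0.082ms;;;;"
)
ping_metrics = parse_perfdata(ping_line)
icmp_metrics = parse_perfdata(icmp_line)
print(ping_metrics["pl"])     # prints: (0.0, '%', 20.0, 40.0)
print(icmp_metrics["rtmax"])  # prints: (0.21, 'ms', None, None)
```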

See also

  • Creating a new service, Chapter 1, Understanding Hosts, Services, and Contacts
  • The Changing thresholds for disk usage section in this chapter
  • Customizing an existing command, Chapter 2, Working with Commands and Plugins
  • Creating a new command, Chapter 2, Working with Commands and Plugins
  • Monitoring PING for any host, Chapter 5, Monitoring Methods