Changing thresholds for PING RTT and packet loss
In this recipe, we'll set up a service for a host that monitors PING and look at how to adjust the thresholds for the WARNING and CRITICAL states, which is done using command arguments. We'll accomplish this by adding a service to an existing host that's already being checked with a check_command such as check-host-alive. Our service will be used to monitor not whether the host is completely down, but whether it's responding to PING requests within a reasonable period of time.
This could be useful to notify and assist in diagnosing problems with the actual connectivity of a service or host.
This recipe will therefore serve as a good demonstration of the concepts of supplying arguments to a command and adjusting the WARNING and CRITICAL thresholds for a particular service.
Getting ready
You should have a Nagios Core 4.0 or newer server with at least one host already configured that uses a check_command of check-host-alive. We'll use the example of sparta.example.net, which is a host defined in its own file.
You should also understand the basics of how hosts and services fit together in a Nagios Core configuration, and you should be familiar with the use of commands and plugins via the check_command directive.
How to do it...
We can add our PING service to the existing host with custom round trip time and packet loss thresholds as follows:
- Change to the objects configuration directory for Nagios Core. The default location is /usr/local/nagios/etc/objects. If you've put the definition of your host in a different directory, change to that directory instead:
# cd /usr/local/nagios/etc/objects
- Edit the file that contains your host definition:
# vi sparta.example.net.cfg
- Add the following definition to the end of the file. The one value that interests us the most is the value of the check_command directive:
define service {
    use                  generic-service
    host_name            sparta.example.net
    service_description  PING
    check_command        check_ping!100,20%!200,40%
}
- Validate the configuration and restart the Nagios Core server:
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
# /etc/init.d/nagios reload
With this done, Nagios Core will not only run a host check of check-host-alive against your original host to ensure that it's up, but it will also run a more stringent check of the PING responses from the machine as a service to check whether it's adequately responsive:
- If the round trip time (RTT) of the PING response is greater than 100 ms (but less than 200 ms), Nagios Core will flag a WARNING state.
- If the RTT of the PING response is greater than 200 ms, Nagios Core will flag a CRITICAL state.
- If more than 20% (but less than 40%) of the PING requests receive no response, Nagios Core will flag a WARNING state.
- If more than 40% of the PING requests receive no response, Nagios Core will flag a CRITICAL state.
In either state, a notification will be sent to the service's defined contacts if it is configured to do so. Otherwise, this service functions much the same as any other service and appears in the web interface as a service beneath the host.
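The threshold rules listed above can be sketched as a small shell function. This is only an illustration of the logic as the recipe describes it, not the plugin's own code: the function name classify_ping is hypothetical, and the strict "greater than" comparisons follow the wording of this recipe, so the real plugin may treat a value that is exactly equal to a threshold differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nagios Core or the Nagios Plugins).
# Classifies one measurement against check_ping-style threshold pairs
# such as "100,20%": RTT in milliseconds, then packet loss in percent.
# Assumption: strict "greater than" comparisons, as described in the text.
classify_ping() {
    local rta="$1" loss="$2" warn="$3" crit="$4"
    awk -v rta="$rta" -v loss="$loss" -v warn="$warn" -v crit="$crit" '
        BEGIN {
            # Split "RTT,LOSS%" into its two terms and drop the percent sign.
            split(warn, w, ","); split(crit, c, ",")
            sub(/%$/, "", w[2]); sub(/%$/, "", c[2])
            # CRITICAL is tested first, so the worse state always wins.
            if (rta > c[1] + 0 || loss > c[2] + 0)      print "CRITICAL"
            else if (rta > w[1] + 0 || loss > w[2] + 0) print "WARNING"
            else                                        print "OK"
        }'
}

classify_ping 0.17 0  '100,20%' '200,40%'   # fast replies, no loss
classify_ping 150  0  '100,20%' '200,40%'   # slow replies, no loss
classify_ping 50   60 '100,20%' '200,40%'   # fast replies, heavy loss
```

Either term on its own is enough to raise a state: a host that answers quickly but drops most packets is still reported as CRITICAL.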
How it works...
The configuration we added to our existing host creates a new service with a service_description of PING. For check_command, we use the check_ping command, which uses the plugin of the same name. The interesting part here is what follows the command name in the check_command value: the !100,20%!200,40% string.
In Nagios Core, the ! character is used as a separator for arguments that should be passed to the command. In the case of check_ping, the first argument defines thresholds or conditions that, if met, should make Nagios Core flag a WARNING state for the service. Similarly, the second argument defines the thresholds for a CRITICAL state.
Each of the two arguments consists of two comma-separated terms: the first is the RTT threshold, in milliseconds, for the PING request and its response, and the second is the percentage of packet loss that is tolerated before the same state is raised.
This pattern of arguments is specific to check_ping; it would not work for other commands such as check_http.
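To make the role of the ! separator concrete, here is a minimal sketch that splits a check_command value into the command name and its numbered arguments. The function name split_check_command is hypothetical, and the sketch assumes a plain string with no escaped separators; it only mimics the splitting, not how Nagios Core parses its configuration internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a check_command value on "!" and print the
# command name followed by the $ARGn$ macro each remaining field feeds.
# Assumption: the value contains no escaped "!" characters.
split_check_command() {
    local check_command="$1"
    local -a parts
    local i
    IFS='!' read -r -a parts <<< "$check_command"
    printf 'command_name=%s\n' "${parts[0]}"
    for (( i = 1; i < ${#parts[@]}; i++ )); do
        printf '$ARG%d$=%s\n' "$i" "${parts[i]}"
    done
}

split_check_command 'check_ping!100,20%!200,40%'
```

For the value used in this recipe, the first field is the command name and the two remaining fields become the first and second arguments.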
There's more...
If we want to look in a bit more detail at how these arguments are applied, we can inspect the command definition for check_ping. By default, this is in the /usr/local/nagios/etc/objects/commands.cfg file and looks as follows:
define command {
    command_name  check_ping
    command_line  $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}
In the value of command_line, the following four macros are used:
- $USER1$: This expands to /usr/local/nagios/libexec or the directory in which the Nagios Core plugins are normally kept, including check_ping.
- $HOSTADDRESS$: This expands to the value of the address directive of the host against which the command is run. In this case, it expands to 192.0.2.21, the address of the sparta.example.net host.
- $ARG1$: This expands to the value given for the first argument of the command; in our recipe's case, the 100,20% string.
- $ARG2$: This expands to the value given for the second argument of the command; in our recipe's case, the 200,40% string.
The complete command-line call for our specific check with all these substitutions made would therefore look like this:
/usr/local/nagios/libexec/check_ping -H 192.0.2.21 -w 100,20% -c 200,40% -p 5
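The substitution can be reproduced with plain string replacement, as in the following sketch. The function name expand_command_line is hypothetical, and it handles only the four macros discussed in this recipe; Nagios Core itself supports many more macros and does this expansion internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand the four macros used by the check_ping
# command definition by literal string replacement.
# Arguments: command_line template, $USER1$ value, host address,
# first argument, second argument.
expand_command_line() {
    local line="$1" user1="$2" address="$3" arg1="$4" arg2="$5"
    # The quoted patterns are matched literally, dollar signs included.
    line=${line//'$USER1$'/"$user1"}
    line=${line//'$HOSTADDRESS$'/"$address"}
    line=${line//'$ARG1$'/"$arg1"}
    line=${line//'$ARG2$'/"$arg2"}
    printf '%s\n' "$line"
}

expand_command_line \
    '$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5' \
    /usr/local/nagios/libexec 192.0.2.21 '100,20%' '200,40%'
```

Keeping the template in single quotes stops the shell from trying to expand the macro names as its own variables.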
This command line makes use of the following four parameters of the check_ping program:
- -H: This specifies the address of the host to be checked
- -w: This specifies the thresholds to raise a WARNING state
- -c: This specifies the thresholds to raise a CRITICAL state
- -p: This specifies the number of PING requests to be sent
We can run this directly from the command line on the Nagios Core server to see what the results of the check might be:
# /usr/local/nagios/libexec/check_ping -H 192.0.2.21 -w 100,20% -c 200,40% -p 5
This yields an output that includes the OK result of the check and also some performance data:
PING OK - Packet loss = 0%, RTA = 0.17 ms|rta=0.174000ms;100.000000;200.000000;0.000000 pl=0%;20;40;0
The arguments specified in the command are, therefore, used to customize the behavior of check_command for the particular host or service that is being edited.
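The performance data that follows the | character in the plugin output echoes the thresholds back, which makes it a quick way to confirm that the arguments arrived as intended. The sketch below pulls out each label with its value, warning threshold, and critical threshold. The function name parse_perfdata is hypothetical, and the sample line is a check_ping result for the thresholds used in this recipe; the sketch assumes labels without spaces or quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the label, value, and WARNING/CRITICAL
# thresholds found in the performance data part of a plugin output line.
# Assumption: simple labels with no embedded spaces or quoting.
parse_perfdata() {
    local output="$1"
    local perf="${output#*|}"        # keep everything after the first "|"
    local item label rest value warn crit _
    for item in $perf; do            # items are separated by whitespace
        label="${item%%=*}"
        rest="${item#*=}"
        # Fields are value;warn;crit;min, separated by semicolons.
        IFS=';' read -r value warn crit _ <<< "$rest"
        printf '%s value=%s warn=%s crit=%s\n' "$label" "$value" "$warn" "$crit"
    done
}

parse_perfdata 'PING OK - Packet loss = 0%, RTA = 0.17 ms|rta=0.166000ms;100.000000;200.000000;0.000000 pl=0%;20;40;0'
```

If the warn and crit fields do not match the values given in check_command, the arguments are probably not reaching the plugin in the expected order.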
The check_ping plugin isn't the only way you can monitor ICMP ECHO response times for a host; an alternative in the Nagios Plugins set is the check_icmp plugin. You may find this one more efficient to run on larger Nagios Core installations: it needs to be installed setuid so that it runs as root, which lets it execute the network code directly rather than forking a ping(8) process and parsing its output. Its options and syntax are very similar to those of check_ping:
$ /usr/local/nagios/libexec/check_ping -H 192.0.2.21 -w 100,20% -c 200,40%
PING OK - Packet loss = 0%, RTA = 0.17 ms|rta=0.166000ms;100.000000;200.000000;0.000000 pl=0%;20;40;0
$ /usr/local/nagios/libexec/check_icmp -H 192.0.2.21 -w 100,20% -c 200,40%
OK - 192.0.2.21: rta 0.109ms, lost 0%|rta=0.109ms;100.000;200.000;0; pl=0%;20;40;; rtmax=0.210ms;;;; rtmin=0.082ms;;;;
See also
- Creating a new service, Chapter 1, Understanding Hosts, Services, and Contacts
- The Changing thresholds for disk usage section in this chapter
- Customizing an existing command, Chapter 2, Working with Commands and Plugins
- Creating a new command, Chapter 2, Working with Commands and Plugins
- Monitoring PING for any host, Chapter 5, Monitoring Methods