Implementing threshold checks in a plugin
You'll note that many of the plugins included in the Nagios Plugins set allow you to specify thresholds for different aspects of the tests that they perform, allowing custom configuration of which levels are OK, which need a warning, and which are critical. For example, the check_ping plugin requires us to specify thresholds with the -w and -c options, which define limits for round-trip time and packet loss:
$ /usr/local/nagios/libexec/check_ping -H 192.0.2.21 -w 100,20% -c 200,40%
PING OK - Packet loss = 0%, RTA=0.20 ms|rta=0.200000ms;100.000000;200.000000;0.000000 pl=0%;20;40;0
In this case, the plugin's options are set to raise a WARNING state only if the round-trip time for the check exceeds 100 milliseconds or if more than 20% of the packets are lost. It will raise a CRITICAL state if the round-trip time exceeds 200 milliseconds or if more than 40% of the packets are lost.
When you're checking numeric values in a plugin, this is a useful way to allow the user to set their own thresholds for the check, rather than hardcoding them in the plugin itself. In this recipe, we'll adapt the plugin from the Writing a new plugin from scratch recipe to allow the user to set a threshold for kernel version numbers. We'll call this plugin check_kernel_version.
Getting ready
You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already, and you should already have successfully deployed the check_vuln_kernel plugin from the Writing a new plugin from scratch recipe in this chapter, including the installation of Perl and the two modules Nagios::Plugin (or Monitoring::Plugin) and Readonly.
How to do it...
We can write, test, and implement our check_kernel_version plugin as follows:
- Change to the directory containing the plugin binaries for Nagios Core. The default location is /usr/local/nagios/libexec:
# cd /usr/local/nagios/libexec
- Start editing a new file called check_kernel_version:
# vi check_kernel_version
- Include the following code in it. Take note of the comments, which explain what each block of code does:
#!/usr/bin/env perl

# Use strict Perl style
use strict;
use warnings;
use utf8;

# Require at least Perl v5.10
use 5.010;

# Require a few modules, including Nagios::Plugin
use Nagios::Plugin;
use POSIX;
use Readonly;

# Run POSIX::uname() to get the kernel version string
my @uname   = uname();
my $version = $uname[2];
my ($version_major) = split m/[.]/msx, $version;

# Create a new Nagios::Plugin object
my $np = Nagios::Plugin->new(
    usage => 'Usage: %s -w THRESHOLD -c THRESHOLD'
);

# Add options allowing specifying warning and critical ranges
$np->add_arg(
    spec => 'warning|w=s',
    help => "-w, --warning=THRESHOLD\n"
      . '   Kernel version number threshold for returning warning',
    required => 1
);
$np->add_arg(
    spec => 'critical|c=s',
    help => "-c, --critical=THRESHOLD\n"
      . '   Kernel version number threshold for returning critical',
    required => 1
);

# Read options
$np->getopts();

# If we couldn't get the major version number, bail out with UNKNOWN.
# This has to happen before nagios_exit(), which ends the program.
if ( !$version_major ) {
    $np->nagios_die('Could not read kernel version string');
}

# Compare the major version number to the thresholds
my $code = $np->check_threshold(
    check    => $version_major,
    warning  => $np->opts->warning,
    critical => $np->opts->critical,
);

# Exit with the appropriate status
$np->nagios_exit( $code, $version );
- Make the plugin owned by the nagios group with chown(1), and executable with chmod(1):
# chown root:nagios check_kernel_version
# chmod 0770 check_kernel_version
- Run the plugin directly to test it; your output may differ depending on your system's kernel version:
# sudo -s -u nagios
$ ./check_kernel_version -w 4: -c 3:
KERNEL_VERSION WARNING - 3.16.0-4-amd64
We should now be able to use the plugin in a command and hence in a service check, just like any other command.
How it works...
The code in check_kernel_version differs from that of check_vuln_kernel in several ways:
- It tests only the major kernel version number, the very first part of the kernel version string.
- It allows us to specify the range of version numbers that should raise WARNING or CRITICAL states on the command line, rather than hardcoding them in the script. It does this using the Nagios::Plugin implementation of an option parser.
- It includes some basic output, showing usage and help information for the options. This is required by Nagios::Plugin.
- It uses the check_threshold method of the Nagios::Plugin package to check the version number against the thresholds for us.
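To make the first of these differences concrete, here is a minimal sketch of pulling the major version out of a kernel release string. The major_version helper is a hypothetical name introduced purely for illustration and is not part of the recipe's plugin. It assumes the release string begins with digits followed by a dot, as in 3.16.0-4-amd64, and returns undef otherwise:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the recipe): return the leading numeric
# component of a kernel release string, or undef if there isn't one.
sub major_version {
    my ($release) = @_;
    return undef if !defined $release;

    # Capture the digits before the first dot at the start of the string
    my ($major) = $release =~ m/\A(\d+)[.]/msx;
    return $major;
}

# Try a few sample strings; the last one has no version number at all
for my $release ( '3.16.0-4-amd64', '5.10.0-28-amd64', 'unknown' ) {
    my $major = major_version($release);
    printf "%s -> %s\n", $release, defined $major ? $major : 'UNKNOWN';
}
```

Testing whether the result is defined, rather than whether it is true, is a design choice in this sketch: it would let a major version of 0 through as a valid number.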
We could have written our own code to compare the version numbers, but there's an advantage to using the check_threshold method: it uses the standard threshold format to specify the range for an alert level. In the recipe, we used these values:
- -w 4: means that a WARNING alert is generated if the value being checked is less than 4.
- -c 3: means that a CRITICAL alert is generated if the value being checked is less than 3.
Because the major version number of the kernel in the recipe is 3, we got the WARNING output: the value is less than 4, but not less than 3.
The threshold format syntax allows you to specify the ranges for alerts very carefully. There's a breakdown of the syntax available on the Nagios Plugins website, https://nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT.
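To illustrate how such ranges are read, the following is a simplified sketch in plain Perl. It is not the code used by Nagios::Plugin; the is_alert and status_for functions are hypothetical names for this example, and the sketch assumes well-formed numeric ranges with no error handling. It covers the common forms from the guidelines: N (alert outside 0 to N), N: (alert below N), ~:N (alert above N), N:M (alert outside N to M), and a leading @ to alert inside the range instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: true if $value should raise an alert for $range.
# A simplified reading of the threshold format, not the module's own code.
sub is_alert {
    my ( $value, $range ) = @_;

    # A leading "@" inverts the test: alert when the value is INSIDE the range
    my $inside_alerts = 0;
    if ( index( $range, q{@} ) == 0 ) {
        $inside_alerts = 1;
        $range = substr $range, 1;
    }

    # A bare "N" is shorthand for "0:N"; otherwise split into start and end
    my ( $start, $end ) = ( 0, $range );
    if ( index( $range, q{:} ) >= 0 ) {
        ( $start, $end ) = split m/:/msx, $range, 2;
        $start = 0 if $start eq q{};
    }

    # "~" as the start means no lower bound; an empty end means no upper bound
    my $below   = $start ne q{~} && $value < $start;
    my $above   = $end ne q{}    && $value > $end;
    my $outside = $below || $above;

    return $inside_alerts ? !$outside : $outside;
}

# Hypothetical helper: check the critical range first, then the warning range
sub status_for {
    my ( $value, $warning, $critical ) = @_;
    return 'CRITICAL' if is_alert( $value, $critical );
    return 'WARNING'  if is_alert( $value, $warning );
    return 'OK';
}

# The recipe's case: major version 3 with -w 4: and -c 3:
print status_for( 3, '4:', '3:' ), "\n";
```

With the recipe's values, the value 3 is not below the critical bound of 3 but is below the warning bound of 4, so the sketch reports WARNING, matching the plugin's output.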
There's more...
We might set up a command and corresponding service check for this new plugin as follows:
define command {
    command_name  check_kernel_version
    command_line  $USER1$/check_kernel_version -w $ARG1$ -c $ARG2$
}
define service {
    use                  local-service
    host_name            localhost
    service_description  KERNEL_VERSION
    check_command        check_kernel_version!4:!3:
}
Note that we are able to define the WARNING and CRITICAL thresholds in the service definition as command arguments. This allows us to choose different thresholds for different services without editing the plugin's code.
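As an illustration, a second host could be given looser thresholds simply by changing the arguments. This is only a sketch: the host name legacy.example.net is a hypothetical example rather than one from the recipe, and it assumes that the check_kernel_version command and the local-service template are already defined.

```cfg
# Hypothetical second service: warn below major version 3, critical below 2
define service {
    use                  local-service
    host_name            legacy.example.net
    service_description  KERNEL_VERSION
    check_command        check_kernel_version!3:!2:
}
```

The same plugin and command definition serve both hosts; only the arguments after each ! differ.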
Ideally, when writing plugins, we should include documentation and help output, particularly if we intend to distribute them to other users. The Nagios::Plugin module enforces some bare minimums, such as specifying the usage information output for the plugin if we declare any options, and requiring help information for each option.
If we run this plugin with the --help option, we get some useful output built by the Nagios::Plugin module's methods:
$ ./check_kernel_version --help
check_kernel_version

This nagios plugin is free software, and comes with ABSOLUTELY NO WARRANTY.
It may be used, redistributed and/or modified under the terms of the GNU
General Public Licence (see http://www.fsf.org/licensing/licenses/gpl.txt).

Usage: check_kernel_version -w THRESHOLD -c THRESHOLD

 -?, --usage
   Print usage information
 -h, --help
   Print detailed help screen
 -V, --version
   Print version information
 --extra-opts=[section][@file]
   Read options from an ini file. See http://nagiosplugins.org/extra-opts
   for usage and examples.
 -w, --warning=THRESHOLD
   Kernel version number threshold for returning warning
 -c, --critical=THRESHOLD
   Kernel version number threshold for returning critical
 -t, --timeout=INTEGER
   Seconds before plugin times out (default: 15)
 -v, --verbose
   Show details for command-line debugging (can repeat up to 3 times)
Note that one of these options is --extra-opts. The module implements this so we can put any options for the plugin call into a file if we wish. For example, we could put our -w and -c options into an INI file named check_kernel_version.ini:
[check_kernel_version]
warning = 4:
critical = 3:
Then we could call it like this:
$ ./check_kernel_version --extra-opts=@check_kernel_version.ini
KERNEL_VERSION WARNING - 3.16.0-4-amd64
See also
- The Creating a new command section in this chapter
- Creating a new service, Chapter 1, Understanding Hosts, Services, and Contacts
- The Writing a new plugin from scratch section in this chapter
- The Using macros as environment variables in a plugin section in this chapter