Nagios Core Administration Cookbook (Second Edition)

Scheduling downtime for a host or service

In this recipe, we'll learn how to schedule downtime for a host or service in Nagios Core. This is useful to elegantly suppress notifications for some predictable period of time; a very good example of this is when servers require downtime to be upgraded or to have their hardware checked.

In this example, we'll demonstrate how to schedule downtime for a host named sparta.example.net, and we'll examine the changes it makes in the web interface.

Getting ready

You should have a Nagios Core 4.0 or newer server with a definition for at least one host, at least one service, and some idea of when you would like your downtime to be scheduled. You should also have a working web interface, as per the standard installation of Nagios Core 4.0.

You should also have Nagios Core configured to process external commands and have given your web interface user the permissions to apply them. If you are logging in as the nagiosadmin user as per the recommended quick start guide, you can check this is working by ensuring the following directive in /usr/local/nagios/etc/nagios.cfg is set to 1:

check_external_commands=1

Permissions to submit external commands from the web interface are defined in /usr/local/nagios/etc/cgi.cfg; verify that your username is included in these directives:

authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin

If you have followed the Nagios Core quick start guide (http://nagios.sourceforge.net/docs/nagioscore/4/en/quickstart.html), you will probably find that external commands are already accepted and working.
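The two configuration checks above can also be scripted. The following is a minimal sketch, not part of Nagios Core itself: the function names are hypothetical, and it assumes each directive appears as a plain key=value line and that, if a directive is repeated, the last occurrence is the one that counts.

```python
def read_directives(text):
    """Collect key=value directives from Nagios-style config text.

    Returns a dict mapping each directive name to the list of values
    seen for it, in file order. Comment lines and blank lines are skipped.
    """
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        directives.setdefault(key.strip(), []).append(value.strip())
    return directives


def external_commands_enabled(nagios_cfg_text):
    """True if check_external_commands is set to 1 (last occurrence, by assumption)."""
    values = read_directives(nagios_cfg_text).get("check_external_commands", [])
    return bool(values) and values[-1] == "1"


def user_may_submit_commands(cgi_cfg_text, user):
    """True if the user is listed in both command-authorization directives.

    An asterisk entry is treated as "any authenticated user".
    """
    directives = read_directives(cgi_cfg_text)
    for key in ("authorized_for_all_host_commands",
                "authorized_for_all_service_commands"):
        users = [u.strip() for v in directives.get(key, []) for u in v.split(",")]
        if user not in users and "*" not in users:
            return False
    return True


if __name__ == "__main__":
    # Paths as used in this recipe; adjust them for your installation.
    with open("/usr/local/nagios/etc/nagios.cfg") as f:
        print("external commands enabled:", external_commands_enabled(f.read()))
    with open("/usr/local/nagios/etc/cgi.cfg") as f:
        print("nagiosadmin authorized:", user_may_submit_commands(f.read(), "nagiosadmin"))
```

The parsing is kept separate from the file reading so that the checks can be tried against sample text before pointing them at a live configuration.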

How to do it...

We can set up a fixed period of scheduled downtime for our host and service as follows:

  1. Log in to the web interface for Nagios Core.
  2. Click on Hosts, which is on the left-hand side menu.
  3. Click on the host's name in the table that comes up to view the details for that host.
  4. Click on Schedule downtime for this host on the Host Commands menu.
  5. Fill out the fields in the resulting form. Include the following details:
    • Host Name: This is the name of the host for which you're scheduling downtime. This should be filled out for you.
    • Author: This refers to your name, for records regarding who scheduled the downtime. This may be grayed out and may state Nagios Admin; that's fine.
    • Comment: This refers to a comment that explains the reason for the downtime.
    • Start Time: This is the time at which the scheduled downtime should begin and notifications should stop.
    • End Time: This is the time at which the scheduled downtime should end and notifications should resume.

    In this case, our downtime will be from 4:00 PM to 6:00 PM on June 28, 2015. Click on Commit to submit the downtime definition and then on Done in the screen that follows.

    With this done, we can safely bring the sparta.example.net host down between the nominated times and any notifications for the host and any of its services will be suppressed until the downtime is over.

    Note that restarting Nagios Core is not required for this step, as it would be for changes made to Nagios Core's configuration files. The change is done on the fly.

    Note also that comments now appear in the detailed information for both the host and service, defining the downtime and including the reason specified for it.
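The same fixed downtime can be requested without the web form by writing a SCHEDULE_HOST_DOWNTIME line to the external command file. The sketch below is one possible way to do this and rests on some assumptions: the command file path is the usual location for a source install under /usr/local/nagios, the function names are hypothetical, and the comment text is only an example.

```python
import time
from datetime import datetime

# Assumption: command file location for a source install under /usr/local/nagios.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def host_downtime_command(host, start, end, author, comment, now=None):
    """Build a SCHEDULE_HOST_DOWNTIME line for fixed downtime.

    Field order: host;start;end;fixed;trigger_id;duration;author;comment
    """
    start_ts = int(start.timestamp())
    end_ts = int(end.timestamp())
    if end_ts <= start_ts:
        raise ValueError("end must be later than start")
    issued = int(time.time()) if now is None else int(now)
    fixed = 1                     # 1 = fixed downtime
    trigger_id = 0                # 0 = not triggered by another downtime entry
    duration = end_ts - start_ts  # seconds; only consulted for flexible downtime
    fields = [host, start_ts, end_ts, fixed, trigger_id, duration, author, comment]
    return "[{}] SCHEDULE_HOST_DOWNTIME;{}\n".format(
        issued, ";".join(str(f) for f in fields))


def submit(command, command_file=COMMAND_FILE):
    """Write one command line to the Nagios external command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(command)


if __name__ == "__main__":
    # Naive datetimes are interpreted in the local time zone of this machine.
    line = host_downtime_command(
        "sparta.example.net",
        datetime(2015, 6, 28, 16, 0),
        datetime(2015, 6, 28, 18, 0),
        "nagiosadmin",
        "Hardware check",
    )
    print(line, end="")  # inspect the line first; call submit(line) to apply it
```

The example only prints the command so that it can be reviewed; calling submit() requires write permission on the command file and a running Nagios Core process reading from it.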

How it works...

The preceding steps nominate a period of downtime for both the sparta.example.net server and all of its services. This accomplishes two things:

  • It suppresses all notification e-mails for the host or service for the appropriate time period, including RECOVERY notifications. The only exceptions are DOWNTIMESTART and DOWNTIMEEND notifications.
  • It adds a comment to the host or service that shows the scheduled downtime for the benefit of anyone else who might be using the web interface.

Nagios Core keeps track of any downtime defined for all the hosts and services and prevents the notifications it would normally send out during that time. Note that it will still run its checks and record the state of both the hosts and services even during downtime. All that is suppressed are the notifications, not the actual checks.
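The suppression rule described above can be summarized in a few lines. This is a simplified model for illustration only, not how Nagios Core is implemented: the function name is hypothetical, and it ignores every other notification filter (time periods, notification options, and so on).

```python
# Notification types that are still sent while downtime is in effect,
# as described in this recipe.
ALWAYS_SENT = {"DOWNTIMESTART", "DOWNTIMEEND"}


def notification_is_sent(notification_type, in_scheduled_downtime):
    """Model of the downtime filter alone.

    Outside downtime this filter lets everything through; during downtime
    only the downtime start and end notifications pass. Checks themselves
    are not modeled here because downtime does not stop them.
    """
    if not in_scheduled_downtime:
        return True
    return notification_type in ALWAYS_SENT
```

For example, a RECOVERY notification raised while the host is in downtime is dropped by this rule, while the same notification raised after the downtime ends would pass.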

There's more...

Note that downtime for individual services can be applied in much the same way, by clicking on Schedule downtime for this service in the web interface, which is under Service Commands.
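Service downtime also has an external command equivalent, SCHEDULE_SVC_DOWNTIME, which takes the service description as an extra field after the host name. The following is a minimal sketch with a hypothetical function name; the service description HTTP used in the example is an assumption and should be replaced with one defined for your host.

```python
import time


def service_downtime_command(host, service, start_ts, end_ts, author, comment, now=None):
    """Build a SCHEDULE_SVC_DOWNTIME line for fixed downtime.

    start_ts and end_ts are Unix timestamps in seconds.
    Field order: host;service;start;end;fixed;trigger_id;duration;author;comment
    """
    start_ts, end_ts = int(start_ts), int(end_ts)
    if end_ts <= start_ts:
        raise ValueError("end must be later than start")
    issued = int(time.time()) if now is None else int(now)
    fields = [host, service, start_ts, end_ts,
              1,                  # fixed downtime
              0,                  # no triggering downtime entry
              end_ts - start_ts,  # duration in seconds
              author, comment]
    return "[{}] SCHEDULE_SVC_DOWNTIME;{}\n".format(
        issued, ";".join(str(f) for f in fields))


if __name__ == "__main__":
    start = int(time.time()) + 3600  # one hour from now
    print(service_downtime_command(
        "sparta.example.net", "HTTP", start, start + 7200,
        "nagiosadmin", "Web server upgrade"), end="")
```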

What was defined in this recipe was a method to define fixed downtime, where we know ahead of time when the host or its services are likely to be unavailable. If we don't actually know what time the unavailability will start, but we do know how long it's likely to last, we can define flexible downtime. This means that the downtime can start any time within the nominated period and it will last for the length of time we specify from that point.
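The difference in timing can be made concrete with a small model. This is a simplified sketch of the behavior described above, not Nagios Core's own logic: the function name is hypothetical, all times are plain seconds, and treating both ends of the window as inclusive is an assumption. In the external command syntax, flexible downtime corresponds to setting the fixed field to 0 and supplying the duration in seconds.

```python
def flexible_downtime_span(window_start, window_end, duration, problem_time):
    """Return (start, end) of the effective downtime, or None if it never starts.

    window_start, window_end: the nominated period in which downtime may begin.
    duration: how long the downtime lasts once it has begun.
    problem_time: when the host or service actually became unavailable,
                  or None if it never did.
    """
    if duration <= 0 or window_end <= window_start:
        raise ValueError("duration must be positive and the window non-empty")
    if problem_time is None:
        return None
    if not (window_start <= problem_time <= window_end):
        return None  # the outage fell outside the nominated period
    return (problem_time, problem_time + duration)
```

With a two-hour window starting at 0 and a one-hour duration, an outage beginning 90 minutes in gives a downtime span that runs from 5400 to 9000 seconds, which is past the end of the window itself.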

A notification event called DOWNTIMESTART is fired when the host or service begins downtime, and another called DOWNTIMEEND is fired when the downtime ends. These may be useful to send to the relevant contact or contact group if they'd like to know when this happens. To arrange this, include the s flag in the notification_options directive of the host or service definition, and correspondingly in the host_notification_options and service_notification_options directives of the contact definition:

notification_options  d,u,r,f,s

See also

  • The Managing brief outages with flapping section in this chapter
  • The Adjusting flapping percentage thresholds for a service section in this chapter
  • Choosing states for notification, Chapter 4, Configuring Notifications
  • Tolerating a certain number of failed checks, Chapter 4, Configuring Notifications
  • Adding comments on hosts or services in a web interface, Chapter 7, Using the Web Interface