Puppet: Mastering Infrastructure Automation

Building a specific module

This chapter has discussed many theoretical and operational aspects of modules, but you are yet to gain an insight into the process of writing modules. For this purpose, the rest of this chapter will have you create an example module step by step.

It should be stressed again that for the most part, you will want to find general purpose modules from the Forge. The number of available modules is ever growing, so the odds are good that there is something already there to help you with what you need to do.

Assume that you want to add Cacti to your network: an RRDtool-based trend monitoring and graphing server, complete with a web interface. If you check the Forge first, you will indeed find some modules. However, let's further assume that none of them appeal to you, because either the feature set or the implementation is not to your liking. If even their interfaces don't meet your requirements, it doesn't make much sense to base your own module on an existing one (in the form of a fork on GitHub) either. You will then need to write your own module from scratch.

Naming your module

Module names should be concise and to the point. If you manage a specific piece of software, name your module after it - apache, java, mysql, and so forth. Avoid verbs such as install_cacti or manage_cacti. If your module name does need to consist of several words (because the target subsystem has a long name), they should be divided by underscore characters. Spaces, dashes, and other non-alphanumeric characters are not allowed.

In our example, the module should just be named cacti.

Making your module available to Puppet

To use your own module, you don't need to make it available for installation through the puppet module command. That would require uploading the module to the Forge first, which takes quite some additional effort. Luckily, a module will work just fine without all this preparation if you just put the source code in the proper location on your master.

To create your own cacti module, create the basic directories:

root@puppetmaster# mkdir -p /opt/puppetlabs/code/environments/testing/modules/cacti/{manifests,files}

Don't forget to synchronize all the changes to production once the agents are supposed to use them.

Implementing basic module functionality

Most modules perform all of their work through their manifests.

Tip

There are notable exceptions, such as the stdlib module. It mainly adds parser functions and a few general-purpose resource types.
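To illustrate, one of stdlib's general-purpose resource types is file_line, which ensures that a given line is present in a file. The following is a minimal sketch; the path and line values are made up, and it requires the puppetlabs-stdlib module to be installed:

```puppet
# Requires the puppetlabs-stdlib module.
# Ensures the given line exists in the file, appending it if missing.
file_line { 'enable-ip-forwarding':
  path => '/etc/sysctl.conf',
  line => 'net.ipv4.ip_forward=1',
}
```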

When planning the classes for your module, it is most straightforward to think about how you would like to use the finished module. There is a wide range of possible interface designs. The de facto standard stipulates that the managed subsystem is initialized on the agent system by including the module's main class - the class that bears the same name as the module and is implemented in the module's init.pp file.

For our Cacti module, the user should use the following:

include cacti

As a result, Puppet would take all the required steps in order to install the software and if necessary, perform any additional initialization.

Start by creating the cacti class and implementing the setup in the way you would from the command line, replacing the commands with appropriate Puppet resources. On a Debian system, installing the cacti package is enough. Other required software is brought in through the dependencies (completing the LAMP stack), and after the package installation, the interface becomes available through the web URI /cacti/ on the server machine:

# …/modules/cacti/manifests/init.pp
class cacti {
  package { 'cacti':
    ensure => installed,
  }
}

Your module is now ready for testing. Invoke it from your agent's manifest in site.pp or nodes.pp of the testing environment:

node 'agent' {
  include cacti
}

Apply it on your agent directly:

root@agent# puppet agent --test --environment testing

This will work on Debian, and Cacti is reachable via http://<address>/cacti/.

Note

Some sites use an External Node Classifier (ENC), such as The Foreman. Among other helpful things, it can centrally assign environments to the nodes. In this scenario, the --environment switch will not work.

It's unfortunate that the Cacti web interface will not come up when the home page is requested through the / URI. To enable this, give the module the ability to configure an appropriate redirection. Prepare an Apache configuration snippet in the module in /opt/puppetlabs/code/environments/testing/cacti/files/etc/apache2/conf.d/cacti-redirect.conf:

# Do not edit this file – it is managed by Puppet!
RedirectMatch permanent ^/$ /cacti/

Tip

The warning notice is helpful, especially when multiple administrators have access to the Cacti server.

It makes sense to add a dedicated class that will sync this file to the agent machine:

# …/modules/cacti/manifests/redirect.pp
class cacti::redirect {
  file { '/etc/apache2/conf.d/cacti-redirect.conf':
    ensure  => file,
    source  => 'puppet:///modules/cacti/etc/apache2/conf.d/cacti-redirect.conf',
    require => Package['cacti'],
  }
}

Tip

A short file like this can also be managed through the file type's content property instead of source:

$puppet_warning = '# Do not edit – managed by Puppet!'
$line = 'RedirectMatch permanent ^/$ /cacti/'
file { '/etc/apache2/conf.d/cacti-redirect.conf': 
  ensure  => file, 
  content => "${puppet_warning}\n${line}\n", 
} 

This is more efficient, because the content is part of the catalog and so the agent does not need to retrieve the checksum through another request to the master.

The module now allows the user to include cacti::redirect in order to get this functionality. This is not a bad interface as such, but this kind of modification is actually well-suited to become a parameter of the cacti class:

class cacti($redirect = true ) {
  if $redirect {
    contain cacti::redirect
  }
  package { 'cacti':
    ensure => installed,
  }
}

The redirect is now installed by default when a manifest uses include cacti. If the web server has other virtual hosts that serve things that are not Cacti, this might be undesirable. In such cases, the manifest will declare the class with this following parameter:

class { 'cacti': redirect => false }

Speaking of best practices, most modules will also separate the installation routine into a class of its own. In our case, this is hardly helpful, because the installation status is ensured through a single resource, but let's do it anyway:

class cacti( $redirect = true ) {
  contain cacti::install
  if $redirect {
    contain cacti::redirect
  }
}

It's sensible to use contain here in order to make the Cacti management a solid unit. The cacti::install class is put into a separate install.pp manifest file:

# …/modules/cacti/manifests/install.pp
class cacti::install {
  package { 'cacti':
    ensure => 'installed'
  }
}

On Debian, the installation process of the cacti package copies another Apache configuration file to /etc/apache2/conf.d. Since Puppet performs a normal apt installation, this result will be achieved. However, Puppet does not make sure that the configuration stays in this desired state.

Note

There is an actual risk that the configuration might get broken. If the puppetlabs-apache module is in use for a given node, it will usually purge any unmanaged configuration files from the /etc/apache2/ tree. Be very careful when you enable this module for an existing server. Test it in the noop mode. If required, amend the manifest to include the existing configuration.
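If you do need to combine the two modules, the purging behavior can usually be turned off. With puppetlabs-apache, a parameter along the following lines is available; verify the exact name and default against the version of the module you use:

```puppet
# Sketch: keep puppetlabs-apache from purging unmanaged
# files under /etc/apache2/ (such as the cacti snippet).
class { 'apache':
  purge_configs => false,
}
```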

It is prudent to add a file resource to the manifest that keeps the configuration snippet in its post-installation state. Usually with Puppet, this will require you to copy the config file contents to the module, just like the redirect configuration is in a file on the master. However, since the Debian package for Cacti includes a copy of the snippet in /usr/share/doc/cacti/cacti.apache.conf, you can instruct the agent to sync the actual configuration with that. Perform this in yet another de facto standard for modules - the config class:

# …/modules/cacti/manifests/config.pp
class cacti::config { 
  file { '/etc/apache2/conf.d/cacti.conf': 
    mode   => '0644', 
    source => '/usr/share/doc/cacti/cacti.apache.conf' 
  }
}

This class should be contained by the cacti class as well. Running the agent again will now have no effect, because the configuration is already in place.
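Putting the pieces together, the main class could now look like the following sketch (the redirect parameter from the earlier refinement is kept):

```puppet
# …/modules/cacti/manifests/init.pp
class cacti($redirect = true) {
  contain cacti::install
  contain cacti::config
  if $redirect {
    contain cacti::redirect
  }
}
```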

Creating utilities for derived manifests

You have now created several classes that compartmentalize the basic installation and configuration work for your module. Classes lend themselves very well to implement global settings that are relevant for the managed software as a whole.

However, just installing Cacti and making its web interface available is not an especially powerful capability - after all, the module does little beyond what a user can achieve by installing Cacti through the package manager. The much greater pain point with Cacti is that it usually requires configuration via its web interface; adding servers as well as choosing and configuring graphs for each server can be an arduous task and require dozens of clicks per server, depending on the complexity of your graphing needs.

This is where Puppet can be the most helpful. A textual representation of the desired states allows for quick copy-and-paste repetition and name substitution through regular expressions. Better yet, once there is a Puppet interface, the users can devise their own defined types in order to save themselves from the copy and paste work.

Speaking of defined types, they are what is required for your module to allow this kind of configuration. Each machine in Cacti's configuration should be an instance of a defined type. The graphs can have their own type as well.

As with the implementation of the classes, the first thing you always need to ask yourself is how this task would be done from the command line.

Tip

Actually, the better question can be what API you should use for this, preferably from Ruby. However, this is only important if you intend to write Puppet plugins - resource types and providers. We will look into this later in this very chapter.

Cacti comes with a set of CLI scripts. The Debian package makes these available in /usr/share/cacti/cli. Let's discover these while we step through the implementation of the Puppet interface. The goals are defined types that will effectively wrap the command-line tools so that Puppet can always maintain the defined configuration state through appropriate queries and update commands.

Adding configuration items

While designing more capabilities for the Cacti module, first comes the ability to register a machine for monitoring - or rather, a device, as Cacti itself calls it (network infrastructure such as switches and routers are frequently monitored as well, and not only computers). The name for the first defined type should, therefore, be cacti::device.

Note

The same warnings from the Naming your module subsection apply - don't give in to the temptation of giving names such as create_device or define_domain to your type, unless you have very good reasons, such as the removal being impossible. Even then, it's probably better to skip the verb.

The CLI script used to register a device is named add_device.php. Its help output readily indicates that it requires two parameters, which are description and ip. A custom description of an entity is often a good use for the respective Puppet resource's title. The type almost writes itself now:

# …/modules/cacti/manifests/device.pp
define cacti::device ($ip) {
  $cli = '/usr/share/cacti/cli'
  $options = "--description='${name}' --ip='${ip}'"
  exec { "add-cacti-device-${name}":
    command => "${cli}/add_device.php ${options}",
    require => Class['cacti'],
  }
}

Tip

In practice, it is often unnecessary to use so many variables, but it serves readability with the limited horizontal space of the page.

This exec resource gives Puppet the ability to use the CLI to create a new device in the Cacti configuration. Since PHP is among the Cacti package's requirements, it's sufficient to make the exec resource require the cacti class. Note the use of $name, not only for the --description parameter but in the resource name for the exec resource as well. This ensures that each cacti::device instance declares a unique exec resource in order to create itself.

However, this still lacks an important aspect. Written as in the preceding example, this exec resource will make the Puppet agent run the CLI script always, under any circumstances. This is incorrect though - it should only run if the device has not yet been added.

Every exec resource should have one of the creates, onlyif, or unless parameters. It defines a query for Puppet to determine the current sync state. The add_device call must be made unless the device exists already. The query for the existing devices must be made through the add_graphs.php script (counterintuitively). When called with the --list-hosts option, it prints one header line and a table of devices, with the description in the fourth column. The following unless query will find the resource in question:

$search = "sed 1d | cut -f4- | grep -q '^${name}\$'"
exec { "add-cacti-device-${name}":
  command => "${cli}/add_device.php ${options}",
  path    => '/bin:/usr/bin',
  unless  => "${cli}/add_graphs.php --list-hosts | ${search}",
  require => Class['cacti'],
}

The path parameter is useful as it allows for calling the core utilities without the respective full path.

Tip

It is a good idea to generally set a standard list of search paths, because some tools will not work with an empty PATH environment variable.
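One way to do this is with a resource default for exec. Resource defaults follow Puppet's (dynamic) scoping rules, so site.pp is a common place for them; the path value below is just a typical choice:

```puppet
# in manifests/site.pp – a default for exec resources in scope
Exec {
  path => '/bin:/usr/bin:/sbin:/usr/sbin',
}
```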

The unless command will return 0 if the exact resource title is found among the list of devices. The final $ sign is escaped so that Puppet includes it in the $search command string literally.

You can now test your new define by adding the following resource to the agent machine's manifest:

# in manifests/nodes.pp
node 'agent' {
  include cacti
  cacti::device { 'Puppet test agent (Debian 7)':
    ip => $ipaddress,
  }
}

On the next puppet agent --test run, you will be notified that the command for adding the device has been run. Repeat the run, and Puppet will determine that everything is now already synchronized with the catalog.

Allowing customization

The add_device.php script has a range of optional parameters that allow the user to customize the device. The Puppet module should expose these dials as well. Let's pick one and implement it in the cacti::device type. Each Cacti device has a ping_method that defaults to tcp. With the module, we can even superimpose our own defaults over those of the software:

define cacti::device(
  $ip,
  $ping_method = 'icmp',
) {
  $cli = '/usr/share/cacti/cli'
  $base_opt = "--description='${name}' --ip='${ip}'"
  $ping_opt = "--ping_method=${ping_method}"
  $options = "${base_opt} ${ping_opt}"
  $search = "sed 1d | cut -f4- | grep -q '^${name}\$'"
  exec { "add-cacti-device-${name}":
    command => "${cli}/add_device.php ${options}",
    path    => '/bin:/usr/bin',
    unless  => "${cli}/add_graphs.php --list-hosts | ${search}",
    require => Class[cacti],
  }
}

The module uses a default of icmp instead of tcp. The value is always passed to the CLI script, whether it was passed to the cacti::device instance or not. The parameter default is used in the latter case.
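In a node manifest, the new parameter might be used like this (device name and address are made up for illustration):

```puppet
# Overrides the module's icmp default for this one device.
cacti::device { 'core-switch':
  ip          => '192.168.1.2',
  ping_method => 'udp',
}
```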

Tip

If you plan to publish your module, it is more sensible to try and use the same defaults as the managed software whenever possible.

Once you incorporate all the available CLI switches, you would have successfully created a Puppet API in order to add devices to your Cacti configuration, giving the user the benefits of easy reproduction, sharing, implicit documentation, simple versioning, and more.

Removing unwanted configuration items

There is still one remaining wrinkle. It is atypical for Puppet types to be unable to remove the entities that they create. As it stands, this is a technical limitation of the CLI that powers your module, because it does not implement a remove_device function yet. Such scripts have been made available on the Internet, but are not properly a part of Cacti at the time of writing this.

To give the module more functionality, it would make sense to incorporate additional CLI scripts among the module's files. Put the appropriate file into the right directory under modules/cacti/files/ and add another file resource to the cacti::install class:

file { '/usr/share/cacti/cli/remove_device.php':
  mode    => '0755',
  source  => 'puppet:///modules/cacti/usr/share/cacti/cli/remove_device.php',
  require => Package['cacti'],
}

You can then add an ensure attribute to the cacti::device type:

define cacti::device(
  $ip,
  $ensure = 'present',
  $ping_method = 'icmp',
) {
  $cli = '/usr/share/cacti/cli'
  $search = "sed 1d | cut -f4- | grep -q '^${name}\$'"
  case $ensure {
  'present': {
    # existing cacti::device code goes here
  }
  'absent': {
    $remove = "${cli}/remove_device.php"
    $get_id = "${remove} --list-devices | awk -F'\\t' '\$4==\"${name}\" { print \$1 }'"
    exec { "remove-cacti-device-${name}":
        command => "${remove} --device-id=\$( ${get_id} )",
        path    => '/bin:/usr/bin',
        onlyif  => "${cli}/add_graphs.php --list-hosts | ${search}",
        require => Class[cacti],
      }
    }
  }
}

Note that we took some liberties with the indentation here so as to not break too many lines. This new exec resource is quite a mouthful, because the remove_device.php script requires the numeric ID of the device to be removed. This is retrieved with a --list-devices call that is piped to awk. To impair readability even more, some things such as double quotes, $ signs, and backslashes must be escaped so that Puppet includes a valid awk script in the catalog.

Also note that the query for the sync state of this exec resource is identical to the one for the add resource, except that now it is used with the onlyif parameter: only take action if the device in question is still found in the configuration.
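With the ensure attribute in place, removing a registered device from the configuration is a one-line change in the manifest (the device name and IP are illustrative; note that $ip is still a required parameter of the define):

```puppet
# Removes the device from Cacti if it is still configured.
cacti::device { 'retired-host':
  ensure => 'absent',
  ip     => '192.168.1.99',  # required by the define, even on removal
}
```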

Dealing with complexity

The commands we implemented for the cacti::device define are quite convoluted. At this level of complexity, shell one-liners become unwieldy for powering Puppet's resources. It gets even worse when handling the Cacti graphs; the add_graphs.php CLI script requires numeric IDs of not only the devices, but of the graphs as well. At this point, it makes sense to move the complexity out of the manifest and write wrapper scripts for the actual CLI. I will just sketch the implementation. The wrapper script will follow this general pattern.

#!/bin/bash
DEVICE_DESCR=$1
GRAPH_DESCR=$2
DEVICE_ID=` #scriptlet to retrieve numeric device ID`
GRAPH_ID=`  #scriptlet to retrieve numeric graph ID`
GRAPH_TYPE=`#scriptlet to determine the graph type`
/usr/share/cacti/cli/add_graphs.php \
  --graph-type=$GRAPH_TYPE \
  --graph-template-id=$GRAPH_ID \
  --host-id=$DEVICE_ID

With this, you can add a straightforward graph type:

define cacti::graph($device,$graph=$name) {
  $add = '/usr/local/bin/cacti-add-graph'
  $find = '/usr/local/bin/cacti-find-graph'
  exec { "add-graph-${name}-to-${device}":
    command => "${add} '${device}' '${graph}'",
    path    => '/bin:/usr/bin',
    unless  => "${find} '${device}' '${graph}'",
  }
}
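A resource of this defined type might then be declared as follows; the graph name must match a graph template known to your Cacti installation, so treat these values as placeholders:

```puppet
# Associates a graph template with a registered device.
cacti::graph { 'Load Average':
  device => 'Puppet test agent (Debian 7)',
}
```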

This also requires an additional cacti-find-graph script. Adding this poses an additional challenge as the current CLI has no capabilities for listing configured graphs. There are many more functionalities that can be added to a cacti module, such as the management of Cacti's data sources and the ability to change options of the devices and, possibly, other objects that already exist in the configuration.

Such commodities are beyond the essentials and won't be detailed here. Let's look at some other parts for your exemplary cacti module instead.

Enhancing the agent through plugins

The reusable classes and defines give manifests that use your module much more expressive power. Installing and configuring Cacti now works concisely, and the manifest to do this becomes very readable and maintainable.

It's time to tap into the even more powerful aspect of modules - Puppet plugins. The different types of plugins are custom facts (which were discussed in Chapter 3, A Peek under the Hood – Facts, Types, and Providers), parser functions, resource types, and providers. All these plugins are stored in the modules on the master and get synchronized to all the agents. The agent does not make use of the parser functions (although they become available to users of puppet apply on the agent machine once synchronized); it is the master that invokes them during catalog compilation. The facts and resource types, by contrast, do most of their work on the agent. Let's concentrate on the types and providers for now - the other plugins will be discussed in dedicated sections later.

Note

This section can be considered optional. Many users will never touch the code for any resource type or provider—the manifests give you all the flexibility you will ever need. If you don't care for plugins, do skip ahead to the final sections about finding the Forge modules. On the other hand, if you are confident about your Ruby skills and would like to take advantage of them in your Puppet installations, read on to find the ways in which custom types and providers can help you.

While the custom resource types are functional on both the master and the agent, the provider will do all its work on the agent side. Although the resource types also perform mainly through the agent, they have one effect on the master: they enable manifests to declare resources of the type. The code not only describes what properties and parameters exist, but it can also include the validation and transformation code for the respective values. This part is invoked by the agent. Some resource types even do the synchronization and queries themselves, although there is usually at least one provider that takes care of this.

In the previous section, you implemented a defined type that did all its synchronization by wrapping some exec resources. By installing binaries and scripts through Puppet, you can implement almost any kind of functionality this way and extend Puppet without ever writing one plugin. This does have some disadvantages, however:

  • The output is cryptic in the ideal case and overwhelming in the case of errors
  • Puppet shells out to at least one external process per resource, and in many cases, multiple forks are required

In short, you pay a price, both in terms of usability and performance. Consider the cacti::device type. For each declared resource, Puppet will have to run an exec resource's unless query on each run (or onlyif when ensure => absent is specified). This consists of one call to a PHP script (which can be expensive) as well as several core utilities that have to parse the output. On a Cacti server with dozens or hundreds of managed devices, these calls add up and make the agent spend a lot of time forking off and waiting for these child processes.

Consider a provider, on the other hand. It can implement an instances hook, which will create an internal list of configured Cacti devices once during initialization. This requires only one PHP call in total, and all the processing of the output can be done in the Ruby code directly inside the agent process. These savings alone will make each run much less expensive: resources that are already synchronized will incur no penalty, because no additional external commands need to be run.

Let's take a quick look at the agent output before we go ahead and implement a simple type/provider pair. Following is the output of the cacti::device type when it creates a device:

Notice: /Stage[main]/Main/Node[agent]/Cacti::Device[Agent_VM_Debian_7]/Exec[add-cacti-device-Agent_VM_Debian_7]/returns: executed successfully

The native types express such actions in a much cleaner manner, such as the output from a file resource:

Notice: /Stage[main]/Main/File[/usr/local/bin/cacti-search-graph]/ensure: created

Replacing a defined type with a native type

The process of creating a custom resource type with a matching provider (or several providers) is not easy. Let's go through the steps involved:

  1. Naming your type.
  2. Creating the resource type's interface.
  3. Designing sensible parameter hooks.
  4. Using resource names.
  5. Adding a provider.
  6. Declaring management commands.
  7. Implementing the basic functionality.
  8. Allowing the provider to prefetch existing resources.
  9. Making the type robust during the provisioning.

Naming your type

The first important difference between the native and defined types is the naming. There is no module namespacing for the custom types like you get with the defined types, which are manifest-based. Native types from all the installed modules mingle freely, if you will. They use plain names. It would, therefore, be unwise to call the native implementation of cacti::device just device - this will easily clash with whatever notion of devices another module might have. The obvious choice for naming your first resource type is cacti_device.

The type must be completely implemented in cacti/lib/puppet/type/cacti_device.rb. All hooks and calls will be enclosed in a Type.newtype block:

Puppet::Type.newtype(:cacti_device) do
  @doc = <<-EOD
    Manages Cacti devices.
    EOD
end

The documentation string in @doc should be considered mandatory, and it should be a bit more substantial than this example. Consider including one or more example resource declarations. Put all the further code pieces between the EOD terminator and the final end.

Creating the resource type interface

First of all, the type should have the ensure property. Puppet's resource types have a handy helper method that generates all the necessary type code for it through a simple invocation:

ensurable

With this method call in the body of the type, you add the typical ensure property, including all the necessary hooks. This line is all that is needed in the type code (actual implementation will follow in the provider). Most properties and parameters require more code, just like the ip parameter:

require 'ipaddr'
newparam(:ip) do
  desc "The IP address of the device."
  isrequired
  validate do |value|
    begin
      IPAddr.new(value)
    rescue ArgumentError
      fail "'#{value}' is not a valid IP address"
    end
  end
  munge do |value|
    value.downcase
  end
end

Note

This should usually be an ip property instead, but the provider will rely on the Cacti CLI, which has no capability for changing the already configured devices. If the IP address was a property, such changes would be required in order to perform property-value synchronization.

As you can see, the IP address parameter code consists mostly of validation. Add the require 'ipaddr' line near the top of the file rather than inside the Type.newtype block.

The parameter is now available for the cacti_device resources, and the agent will even refuse to add devices whose IP addresses are not valid. This is helpful for the users, because obvious typos in the addresses will be detected early. Let's implement the next parameter before we look at the munge hook more closely.
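For example, a declaration like the following hypothetical resource would now be rejected with the validation message from above, rather than silently producing a broken Cacti entry:

```puppet
# 256 is out of range, so IPAddr.new raises and validation fails.
cacti_device { 'web01':
  ip => '10.0.0.256',
}
```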

Designing sensible parameter hooks

Moving right along to the ping_method parameter, it accepts values only from a limited set, so validation is easy:

newparam(:ping_method) do
  desc "How the device's reachability is determined.
    One of `tcp` (default), `udp` or `icmp`."
  validate do |value|
    unless [ :tcp, :udp, :icmp ].include?(value.downcase.to_sym)
      fail "'#{value}' is not a valid ping method"
    end
  end
  munge do |value|
    value.downcase.to_sym
  end
  defaultto :tcp
end

Looking at the munge blocks carefully, you will notice that they aim at unifying the input values. This is much less critical for the parameters than the properties, but if either of these parameters is changed to a property in a future release of your Cacti module, it will not try to sync a ping_method of tcp to TCP. The latter might appear if the users prefer uppercase in their manifest. Both values just become :tcp through munging. For the IP address, invoking downcase has an effect only for IPv6.

Note

Beyond the scope of Puppet itself, the munging of a parameter's value is important as well. It allows Puppet to accept more convenient values than the subsystem being managed. For example, Cacti might not accept TCP as a value, but Puppet will, and it will do the right thing with it.

Using resource names

You need to take care of one final requirement: each Puppet resource type must declare a name variable or namevar, for short. This parameter will use the resource title from the manifest as its value, if the parameter itself is not specified for the resource. For example, the exec type has the command parameter for its namevar. You can either put the executable command into the resource title or explicitly declare the parameter:

exec { '/bin/true': }
# same effect:
exec { 'some custom name': command => '/bin/true' }

To mark one of the existing parameters as the name variable, call the isnamevar method in that parameter's body. If a type has a parameter called :name, it automatically becomes the name variable. This is a safe default.

newparam(:name) do
  desc "The name of the device."
  #isnamevar # → commented because automatically assumed
end

Adding a provider

The resource type itself is ready for action, but it lacks a provider to do the actual work of inspecting the system and performing the synchronization. Let's build it step by step, just like the type. The name of the provider need not reflect the resource type it's for. Instead, it should contain a reference to the management approach it implements. Since your provider will rely on the Cacti CLI, name it cli. It's fine for multiple providers to share a name if they provide functionality to different types.

Create the skeleton structure in cacti/lib/puppet/provider/cacti_device/cli.rb:

Puppet::Type.type(:cacti_device).provide(
  :cli,
  :parent => Puppet::Provider
) do
end

Specifying :parent => Puppet::Provider is not necessary, actually. Puppet::Provider is the default base class for the providers. If you write a couple of similar providers for a subsystem (each catering to a different resource type), all of which rely on the same toolchain, you might want to implement a base provider that becomes the parent for all the sibling providers.

For now, let's concentrate on putting together a self-sufficient cli provider for the cacti_device type. First of all, declare the commands that you are going to need.

Declaring management commands

Providers use the commands method to conveniently bind executables to Ruby identifiers:

commands :php        => 'php'
commands :add_device => '/usr/share/cacti/cli/add_device.php'
commands :add_graphs => '/usr/share/cacti/cli/add_graphs.php'
commands :rm_device  => '/usr/share/cacti/cli/remove_device.php'

You won't be invoking php directly. It's included here because declaring commands serves two purposes:

  • You can conveniently call the commands through a generated method
  • The provider will mark itself as valid only if all the commands are found

So, if the php CLI command is not found in Puppet's search path, Puppet will consider the provider to be dysfunctional. The user can determine this error condition quite quickly through Puppet's debug output.

Implementing the basic functionality

The basic functions of the provider can now be implemented in three instance methods. The names of these methods are not magic as such, but these are the methods that the default ensure property expects to be available (remember that you used the ensurable shortcut in the type code).

The first is the method that creates a resource if it does not exist yet. It must gather all the resource parameter's values and build an appropriate call to add_device.php:

def create
  args = []
  args << "--description=#{resource[:name]}"
  args << "--ip=#{resource[:ip]}"
  args << "--ping_method=#{resource[:ping_method]}"
  add_device(*args)
end

Note

Don't quote the parameter values as you would quote them on the command line. Puppet takes care of this for you. It also escapes any quotes that appear within the arguments, so any quotes you add yourself are passed through literally and end up in the Cacti configuration. For example, this will lead to a wrong title:

args << "--description='#{resource[:name]}'"
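A quick plain-Ruby illustration of the difference (the resource hash here is a stand-in for the real type instance):

```ruby
resource = { :name => "Edge Switch 03" }  # stand-in for the type instance

correct = "--description=#{resource[:name]}"
wrong   = "--description='#{resource[:name]}'"

# Each argument is handed to the executable as-is, so the manually
# added quotes in the second form become part of the value:
puts correct  # --description=Edge Switch 03
puts wrong    # --description='Edge Switch 03'
```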

The provider must also be able to remove or destroy an entity:

def destroy
  rm_device("--device-id=#{@property_hash[:id]}")
end

The @property_hash variable is an instance member of the provider. Each resource gets its own specific provider instance. Read on to learn how the hash gets initialized to include the device's ID number.

Before we get to that, let's add the final provider method in order to implement the ensure property. This is a query method that the agent uses to determine whether a resource is already present:

def exists?
  self.class.instances.find do |provider|
    provider.name == resource[:name]
  end
end

The ensure property relies on the provider class method instances in order to get a list of providers for all the entities on the system. It compares each of them with the resource attribute, which is the resource type instance for which this current provider instance is performing the work. If this is rather confusing, please refer to the diagram in the next section.

Allowing the provider to prefetch existing resources

The instances method is truly special - it implements the prefetching of the system resources during the provider initialization. You have to add it to the provider yourself. Some subsystems are not suitable for the mass-fetching of all the existing resources (such as the file type). These providers don't have an instances method. Enumerating the Cacti devices, on the other hand, is quite possible:

def self.instances
  return @instances ||= add_graphs("--list-hosts").
    split("\n").
    drop(1).
    collect do |line|
      fields = line.split(/\t/, 4)
      Puppet.debug "prefetching cacti_device #{fields[3]} " +
                   "with ID #{fields[0]}"
      new(:ensure => :present,
          :name   => fields[3],
          :id => fields[0])
    end
end

The ensure value of the provider instance reflects the current state. The method creates instances for the resources that are found on the system, so for these, the value is always present. Also note that the result of the method is cached in the @instances class member variable. This is important, because the exists? method calls instances, which can happen a lot.
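The parsing itself is plain Ruby, so it is easy to verify in isolation. The following sketch runs the same split/drop/collect pipeline against a made-up sample of the CLI output; the exact header and column layout of add_graphs.php are assumed here for illustration:

```ruby
# Hypothetical sample output of "add_graphs.php --list-hosts";
# the real header and column layout may differ.
output = "Known Hosts: (id, hostname, template, description)\n" \
         "1\tlocalhost\t0\tLocalhost\n" \
         "3\tswitch01.example.net\t0\tEdge Switch 01\n"

instances = output.split("\n").
  drop(1).
  collect do |line|
    fields = line.split(/\t/, 4)
    { :ensure => :present, :name => fields[3], :id => fields[0] }
  end

instances.each { |i| puts "#{i[:id]}: #{i[:name]}" }
# 1: Localhost
# 3: Edge Switch 01
```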

Puppet requires another method to perform proper prefetching. The mass-fetching you implemented through instances supplies the agent with a list of provider instances that represent the entities found on the system. From the master, the agent received a list of the resource type instances. However, Puppet has not yet built a relation between the resources (type instances) and providers. You need to add a prefetch method to the provider class in order to make this happen:

def self.prefetch(resources)
  instances.each do |provider|
    if res = resources[provider.name]
      res.provider = provider
    end
  end
end

The agent passes the cacti_device resources as a hash, with the resource title as the respective key. This makes lookups very simple (and quick).
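The matching logic can be pictured with plain Ruby objects, using OpenStruct stand-ins for the real resource and provider instances:

```ruby
require 'ostruct'

# Simplified stand-ins: providers discovered on the system, and the
# resources hash as the agent passes it (keyed by resource title)
providers = [OpenStruct.new(:name => 'web01'), OpenStruct.new(:name => 'db01')]
resources = {
  'web01'  => OpenStruct.new(:name => 'web01',  :provider => nil),
  'mail01' => OpenStruct.new(:name => 'mail01', :provider => nil),
}

# Same shape as self.prefetch: attach each discovered provider to the
# managed resource with the matching title, if any
providers.each do |provider|
  if res = resources[provider.name]
    res.provider = provider
  end
end

puts resources['web01'].provider.name   # web01
puts resources['mail01'].provider.nil?  # true -- not on the system yet
```

Note that the provider for db01 finds no matching resource here: the entity exists on the system, but it is not managed in the catalog, so prefetching simply skips it.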

This completes the cli provider for the cacti_device type. You can now replace your cacti::device resources with the cacti_device instances to enjoy improved performance and cleaner agent output:

node 'agent' {
  include cacti
  cacti_device { 'Puppet test agent (Debian 7)':
    ensure => present,
    ip     => $ipaddress,
  }
}

Note that unlike your defined type cacti::device, a native type will not assume a default value of present for its ensure property. Therefore, you have to specify it for any cacti_device resource. Otherwise, Puppet will only manage the properties of the resources that already exist and not care about whether the entity exists or not. In the particular case of cacti_device, this will never do anything, because there are no other properties (only parameters).

Note

You can refer to Chapter 6, Leveraging the Full Toolset of the Language, on how to use resource defaults to save you from the repetition of the ensure => present specification.

Making the type robust during provisioning

There is yet another small issue with the cacti module. It is self-sufficient and handles both the installation and configuration of Cacti. However, this means that during Puppet's first run, the cacti package and its CLI will not be available, and the agent will correctly determine that the cli provider is not yet suitable. Since it is the only provider for the cacti_device type, any resource of this type that is synchronized before the cacti package will fail.

In the case of the defined type cacti::device, you just added the require metaparameter to the inner resources. To achieve the same end for the native type instances, you can work with the autorequire feature. Just as files automatically depend on their containing directory, the Cacti resources should depend on the successful synchronization of the cacti package. Add the following block to the cacti_device type:

autorequire :package do
  catalog.resource(:package, 'cacti')
end

Enhancing Puppet's system knowledge through facts

When facts were introduced in Chapter 3, A Peek under the Hood – Facts, Types, and Providers, you got a small tour of the process of creating your own custom facts. We hinted at modules at that point, and now, we can take a closer look at how the fact code is deployed, using the example of the Cacti module. Let's focus on native Ruby facts - they are more portable than the external facts. As the latter are easy to create, there is no need to discuss them in depth here.

Note

For details on external facts, you can refer to the online documentation on custom facts on the Puppet Labs site at http://docs.puppetlabs.com/facter/latest/custom_facts.html#external-facts.

Facts are part of the Puppet plugins that a module can contain, just like the types and providers from the previous sections. They belong in the lib/facter/ subtree. For the users of the cacti module, it might be helpful to learn which graph templates are available on a given Cacti server (once the graph management is implemented, that is). The complete list can be passed through a fact. The following code in cacti/lib/facter/cacti_graph_templates.rb will do just this job:

Facter.add(:cacti_graph_templates) do
  setcode do
    cmd = '/usr/share/cacti/cli/add_graphs.php'
    Facter::Core::Execution.exec("#{cmd} --list-graph-templates").
      split("\n").
      drop(1).
      collect do |line|
        line.split(/\t/)[1]
      end
  end
end

The code calls the CLI script, skips its first line of output, and collects the values from the second column of each remaining line into a list. Manifests can access this list through the global $cacti_graph_templates variable, just like any other fact.

Refining the interface of your module through custom functions

Functions can be of great help in keeping your manifest clean and maintainable, and some tasks cannot even be implemented without resorting to a Ruby function.

A frequent use of the custom functions (especially in Puppet 3) is input validation. You can do this in the manifest itself, but it can be a frustrating exercise because of the limitations of the language. The resulting Puppet DSL code can be hard to read and maintain. The stdlib module comes with the validate_X functions for many basic data types, such as validate_bool. Typed parameters in Puppet 4 and later versions make this more convenient and natural, because for the supported variable types, no validation function is needed anymore.

As with all the plugins, functions need not be specific to the module's domain, and they instantly become available to all manifests. A case in point is the cacti module, which can use validation functions for the cacti::device parameters. Checking whether a string contains a valid IP address is not at all specific to Cacti. On the other hand, checking whether ping_method is one of the values that Cacti recognizes is not that generic.

To see how it works, let's just implement a function that does the job of the validate and munge hooks from the custom cacti_device type for the IP address parameter of cacti::device. This should fail the compilation if the address is invalid; otherwise, it should return the unified address value:

module Puppet::Parser::Functions
  require 'ipaddr'
  newfunction(:cacti_canonical_ip, :type => :rvalue) do |args|
    ip = args[0]
    begin
      IPAddr.new(ip)
    rescue ArgumentError
      raise "#{@resource.ref}: invalid IP address '#{ip}'"
    end
    ip.downcase
  end
end

In the exception message, @resource.ref is expanded to the textual reference of the offending resource type instance, such as Cacti::Device[Edge Switch 03].
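Because the validation logic is plain Ruby, you can also exercise it outside of Puppet. The following hypothetical helper extracts the function's core (same IPAddr check, same downcase unification):

```ruby
require 'ipaddr'

# Hypothetical standalone version of the function's core logic
def canonical_ip(ip)
  IPAddr.new(ip)  # raises an ArgumentError subclass if the address is invalid
  ip.downcase     # unify case; this only affects IPv6 hex digits
rescue ArgumentError
  raise ArgumentError, "invalid IP address '#{ip}'"
end

puts canonical_ip('2001:DB8::1')    # 2001:db8::1
puts canonical_ip('192.168.12.13')  # 192.168.12.13
```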

The following example illustrates the use of the function in the simple version of cacti::device without the ensure parameter:

define cacti::device($ip) {
  $cli = '/usr/share/cacti/cli'
  $c_ip = cacti_canonical_ip($ip)
  $options = "--description='${name}' --ip='${c_ip}'"
  exec { "add-cacti-device-${name}":
    command => "${cli}/add_device.php ${options}",
    require => Class[cacti],
  }
}

The manifest will then fail to compile if an IP address has (conveniently) transposed digits:

ip => '912.168.12.13'

IPv6 addresses will be converted to all lowercase letters.

Note

Puppet 4 introduced a more powerful API for defining the custom functions. Refer to Chapter 7, New Features from Puppet 4, to learn about its advantages.

Making your module portable across platforms

Sadly, our Cacti module is very specific to the Debian package. It expects to find the CLI at a certain place and the Apache configuration snippet at another. These locations are most likely specific to the Debian package. It will be useful for the module to work on the Red Hat derivatives as well.

The first step is to get an overview of the differences by performing a manual installation. I chose to test this with a virtual machine running Fedora 18. The basic installation is identical to Debian, except for using yum instead of apt-get, of course. Puppet will automatically do the right thing here. The cacti::install class also references a CLI file, though, and the Red Hat package installs the CLI in /var/lib/cacti/cli rather than /usr/share/cacti/cli.

If the module is supposed to support both platforms, the target location for the remove_device.php script is no longer fixed. Therefore, it's best to deploy the script from a central location in the module, while the target location on the agent system becomes a module parameter, if you will. Such values are customarily gathered in a params class:

# …/cacti/manifests/params.pp
class cacti::params {
  case $osfamily {
    'Debian': {
      $cli_path = '/usr/share/cacti/cli'
    }
    'RedHat': {
      $cli_path = '/var/lib/cacti/cli'
    }
    default: {
      fail "the cacti module does not yet support the ${osfamily} platform"
    }
  }
}

It is best to fail the compilation for unsupported agent platforms. The users will have to remove the declaration of the cacti class from their module rather than have Puppet try untested installation steps that most likely will not work (this might concern Gentoo or a BSD variant).

Classes that need to access the variable value must include the params class:

class cacti::install {
  include cacti::params
  file { 'remove_device.php':
    ensure => file,
    path   => "${cacti::params::cli_path}/remove_device.php",
    source => 'puppet:///modules/cacti/cli/remove_device.php',
    mode   => '0755',
  }
}

Similar transformations will be required for the cacti::redirect class and the cacti::config class. Just add more variables to the params class. This is not limited to the manifests, either; the facts and providers must behave in accordance with the agent platform as well.

You will often see that the params class is inherited rather than included:

class cacti($redirect = $cacti::params::redirect)
  inherits cacti::params {
  # ...
}

This is done because an include statement in the class body won't allow the use of variable values from the params class as the class parameter's default values, such as the $redirect parameter in this example.

The portability practices are often not required for your own custom modules. In the ideal case, you won't use them on more than one platform. They should be considered mandatory if you intend to share your modules on the Forge, though. For most of your Puppet needs, you will not want to write modules anyway, but download existing solutions from the Forge instead.

Note

In Puppet 4 and the later versions, the params class pattern will no longer be necessary to ship the default parameter values. There is a new data binding mechanism instead. At the time of writing this, that feature was not yet quite functional, so it is not covered in detail.