LukesCfengineTutorial

This tutorial was written by Luke Kanies for a cfengine class in 2004. It has not been updated since then, but should still be useful.

Introduction

This class purports to teach you to use cfengine effectively. We could call cfengine a scripting language and say you will learn that, but this would leave out the most important aspects of cfengine. We could call cfengine a network transport and teach you that, but this would touch only the smallest part of cfengine. The truth is that cfengine is a means to an end; you have to understand cfengine to use it effectively towards that end, but it is this end that's important, and it is working towards this end on which this class concentrates.

Cfengine is a means to managing hundreds, or even thousands, of machines without much more effort than it takes to manage tens of machines. It gives you the ability to perform parallel tasks across your entire network, such as patching an exploitable binary, making key configuration changes, or disabling the account of a user who just left the company. A study by Patterson out of Berkeley found that 90% of the problems that take down ISPs are the result of sysadmin error, but cfengine can drastically reduce that error rate by using the same methods to manage all machines. All this, and most of its functionality is immediately accessible.

The scope of this class is greater than cfengine itself, but it will focus mostly on using cfengine to accomplish your goals. At the end of this class you will:

 * Understand cfengine syntax and structure
 * Be able to use the most important cfengine actions
 * Understand the basic problems of configuration management and how cfengine applies to them
 * Be able to set up a self-maintaining cfengine network
 * Know when and how to integrate external data and programs with cfengine
 * Know how to write external cfengine modules so that they integrate well with cfengine and follow the same basic principles


Intro to Config Mgmt

The ultimate goal of configuration management is to be able to manage the features of a network instead of managing the details. Rather than performing a given change on all systems you change a central store that defines your entire network and the affected machines automatically integrate the change. Although automation using tools like cfengine is critical for successful configuration management, abstraction is even more important. In fact, one could comfortably say that configuration management is the practice of providing and maintaining an abstraction layer between the human-readable functional goals of a network of machines and the explicit configurations that result from those goals.

The first and easiest step towards better configuration management is centralization, both of configuration data itself and of the logic and means for organizing that data. This can range from something as simple as copying configuration files from a central location to something as sophisticated as configuring all services to connect directly to central services for key information, such as using a central LDAP repository for all user authentication.

Centralization itself is never sufficient. Even in the trivial case of copying configuration files around, the services being configured often need to be notified that their configurations have changed, which means that your configuration management system (CMS) must maintain a mapping between a configuration file, the service to which it is related, and how to force the service to reload its configuration. The centralized point itself is not something you can treat trivially, either; you need to worry about single points of failure, performance bottlenecks, and service robustness. You also need some facility like version control for restoring a given configuration file to a known good state -- the only thing worse than no centralization is a centralized system that destroys your network.

As you expand beyond simple centralization and your CMS gets more sophisticated, it enables you to rethink your basic management strategies. Instead of identifying problems and picking a tool to solve them, you can step back and plan solutions that cover the life cycle of machines, services, and even your whole network. Rather than having islands of functionality in your monitoring tools, host install tools, user directory, host database, event manager, etc., you learn the value of integrating those tools to provide a vision of one network with its various facets.

We are only in the beginning phases of developing this level of sophistication -- it currently takes abnormal dedication and the production of significant amounts of glue code. We do, however, have tools that put us squarely on this path, and once we are using these tools, we are much better prepared for the next generation of tools. Cfengine is such a tool, and it can be an important first step in developing your CMS.

Cfengine is not the only configuration management tool out there. Some other examples are LCFG from the University of Edinburgh, BCFG from Argonne National Labs, ISconf originally by Steve Traugott with version 3 by Luke Kanies, and Radmind from the University of Michigan. All of these tools can be easily found online. Cfengine is definitely the easiest of these and has the greatest adoption rate.


Understanding Cfengine

Cfengine is a toolchain, a scripting language, a transport protocol, a parser, and a computer immunology experiment, and that's just the binaries. Cfengine has been developed over ten years by Mark Burgess at Oslo University College in Oslo, Norway; it began life as a means of bringing cohesiveness to a collection of shell scripts used to maintain a network of machines, and it has developed into the most capable tool for configuration management currently available under an Open Source license. It is available for download from http://www.cfengine.org.

To do anything with cfengine you must begin with its main language, understood by both the client and server. Remembering cfengine's origins as a wrapper for many scripts, its main task is to provide a single, consistent structure for roughly 25 different functionalities. These separate functionalities are called 'actions'. Fortunately, cfengine follows the 80/20 rule; you will get 80% of the benefit of cfengine from only 20% of those actions.

Although the cfengine actions themselves cannot really be said to have a consistent syntax, they do have some consistent features. They provide the ability to describe an aspect of the desired state of the system and then attempt to bring the system into that state; cfengine actions are thus declarative. Performing a given action multiple times is functionally equivalent to performing it a single time, because each action understands enough about the tasks it performs to know whether that task is necessary or the system is already in the desired state; cfengine actions are thus idempotent.

The nature of attempting to force a state on a system means that there are limited state transitions available to cfengine; a system cannot go directly from entirely unconfigured to entirely configured in one atomic action but instead must go through a relatively linear set of state transitions. Cfengine often has to be run multiple times in order for all state descriptions to be implemented. In fact, the cfengine parser automatically performs two passes over each script to mitigate this effect.

An unfortunate but important aspect of cfengine is that it is almost entirely the product of one person, Mark Burgess, and Mark has maintained strict control over what is acceptable for cfengine to do and what is not, as well as the development process itself, which has resulted in a codebase that only its author could love. In your learning experience with cfengine, you will often be bewildered, sometimes confused, and occasionally disappointed. Many of these problems are the result of inherent limitations within cfengine, stemming from either how it has been designed or how it has been programmed. Mark is always willing to discuss patches and bug reports, but he does not always agree that something is a bug.

Given that, cfengine is still the best thing we've got going by a long shot. We've gotten all of the background out of the way, so let's step more deeply into cfengine itself.


Using Cfengine

Cfengine has three basic syntactical objects and one basic function that can be used throughout a cfengine configuration. The three objects are actions, variables, and classes (which are essentially boolean variables), and the function is the equivalent of an 'if' statement operating only on classes. Cfengine's actions and classes form the structure of a cfengine configuration; actions divide the configurations by function and classes divide the configurations similarly to how 'case' statements do in a procedural language.

Use of actions and classes is straightforward, other than a wrinkle related to the 'actionsequence', which we'll get to soon. Cfengine configurations all follow this basic structure:

 <action>:
   <class test>::
     <text>

If the class test returns a 'true' result then the text of the action is performed. Once you get past this basic structure, each action gets control of the parser, which means that there is a lot of variety. Cfengine largely ignores whitespace, but there are cases where it matters, and many parts of cfengine are case insensitive but enough of them are case sensitive that it is usually best to follow the case used in the reference. Also, cfengine's parser leaves interpolation of variables up to the actions, which means that the exact same value in a variable can be treated dramatically differently by different actions.
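
To make the skeleton concrete, here is a minimal sketch of one action with two class tests. The file paths and permissions are made up for illustration; 'any' and 'linux' are classes cfengine defines on its own, while everything else is an assumption about your systems.

```cfengine
files:
   # 'any' is always true, so this check applies to every host
   any::
      /etc/motd mode=644 owner=root action=fixall

   # this check applies only to hosts in the 'linux' class
   linux::
      /etc/issue mode=644 owner=root action=fixall
```

Each entry describes a state (mode and owner) rather than a command to run, so running it a second time should change nothing.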

Three actions stand above the rest because they don't necessarily affect the system but instead affect the operation of cfengine; because of this, I call them 'meta' actions, and I call the rest of the actions 'functional' because they do directly affect the state of the system.

The most important action is 'control'. This action provides the only way to set variables inside a cfengine configuration. Of special note is the fact that cfengine has many special variables that control how cfengine itself operates, all of which can only be set within the 'control' action. Unfortunately, there is no visible difference between these variables and normal user-defined variables, and in many cases the variables perform like built-in functions but are syntactically equivalent to variables. Speaking of functions, the control action also has some functions available to collect data from outside of cfengine, but these functions can only be used to assign data to variables.

The only required variable in every cfengine configuration is the 'actionsequence' (although lack of the 'sysadm' variable will give a warning). This variable determines which actions actually get executed; if this variable is not set cfengine will complain about that fact and then exit. However, as your configuration matures this variable will come to mean less and less, because you will be reusing the same actions in different files, and those actions will have already been specified as being in the action sequence. One of the quirks of the actionsequence is that cfengine will basically never perform actions in the order specified in the configuration except in the case of especially small configurations. If you specify the 'files' action throughout your configuration, all of those 'files' sections will likely be combined into one large execution. This is usually not a problem, and it helps move you out of procedural programming and into thinking declaratively, but it's an important point.

Once you have a mature configuration and you are no longer worrying about the actionsequence, it takes on its more functional role: It is used to specify external modules. If you find that cfengine's builtin actions are not sufficient, you can write your own modules and call them from within any cfengine configuration. You have to set the 'moduledirectory' variable so cfengine knows where to find the modules (you cannot specify a fully qualified path in the actionsequence), and all of your modules' names must begin with 'module:'. Although you can get seemingly equivalent function by executing your scripts with the ExecResult function in the control action, the big benefit of modules is that they can set classes and variables. You can pass data to a module, let it do some work (for instance, rebuilding /etc/inetd.conf), and then set a class (e.g., noting that inetd needs to be restarted). Note that 'actionsequence' behaves like a stack rather than a variable; you can only add actions to the action sequence, you cannot replace the existing list or remove items from the list.
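
The inetd case described above might be wired up roughly like this. This is an untested sketch: the module directory, the module name 'module:inetd', and the class 'restart_inetd' are all invented for illustration. It relies on the module protocol in which a module defines a class by printing a line beginning with '+' on its standard output, and on the 'AddInstallable' variable to declare that class ahead of time.

```cfengine
control:
   # where cfagent looks for modules (path is an assumption)
   moduledirectory = ( /var/cfengine/modules )

   # the script itself would live at
   # /var/cfengine/modules/module:inetd
   actionsequence = ( module:inetd shellcommands )

   # declare the class the module may set
   AddInstallable = ( restart_inetd )

shellcommands:
   # the module is expected to print "+restart_inetd" when it
   # has rebuilt /etc/inetd.conf; otherwise nothing happens here
   restart_inetd::
      "/usr/bin/pkill -HUP inetd"
```

The module does the work that is awkward to express in cfengine, and cfengine keeps the decision about what to do with the result.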

Other than 'actionsequence' and 'moduledirectory', there are only a few special variables that nearly every configuration will use. An optional variable but one which nearly everyone uses is 'SplayTime' (notice the capitalization; it matters). This is a randomized sleep statement and is useful when you have many machines using cfengine. If you have one central cfengine server and 200 clients running cfengine out of cron, you don't want them all contacting the server at the same time. Setting 'SplayTime' will cause each machine to sleep a random number of seconds, with the maximum being what you set SplayTime to in minutes. Thus, setting it to 5 will cause each machine to sleep up to 5 * 60 = 300 seconds. This is a limited but effective load balancing system.

If you need SplayTime set, then you are using cfengine for networking, which means that you need the 'domain' variable set. Cfengine's networking protocol is entirely dependent on this variable. If the cfengine server and client disagree on what the client's domain is, then the client cannot connect, no matter how trusting the server might be. The server retrieves the client's domain through a normal 'gethostbyaddr' call, but the client has to specify the domain in its configuration. If you only have one domain then this is a simple problem because you can just hardcode it, but if you have multiple domains or subdomains then this can get pretty tricky. Even if you only have one domain you may want to retrieve it from an external location. I nearly always retrieve my domain name with a call to ExecResult, but this can also get complicated if you are depending on cfengine to set a machine up in the first place.

Another important setup variable is 'sysadm'. This should be set to the email address of your system administrator or a list of sysadmins. This variable is not actually used by cfagent but is rather used by cfexecd, which is the wrapper for cfengine when you run it out of cron; it parses your configuration specifically looking for this variable so that it knows to whom it should send cfengine's output.

We will cover other special variables as we need to, but these are the ones you will all likely use in your configurations. The cfengine reference covers all of the variables in sufficient detail, although it is not exactly explicit about any limitations (such as when a variable name is case sensitive). Again, always follow the reference's example case for each variable unless you have specifically tested a different case.
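
Pulled together, the special variables discussed so far might look something like the following in a 'control' section. The domain, the email address, and the helper script named in the comment are placeholders, not anything cfengine provides.

```cfengine
control:
   actionsequence = ( files copy processes )

   # hard-coded here; with several domains you might instead fetch
   # it from a (hypothetical) script of your own, for example:
   #   domain = ( ExecResult(/usr/local/sbin/get-domain) )
   domain = ( example.org )

   # read by cfexecd to decide where to mail cfagent's output
   sysadm = ( sysadmin@example.org )

   # sleep a random interval of up to 5 minutes before running
   SplayTime = ( 5 )
```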


Managing the cfengine configuration files

Because the cfengine reference, available from www.cfengine.org, is so complete, we are spared from having to discuss the basics of each of the important actions. Instead, we will focus on moving beyond technically understanding how to use an action to a more holistic understanding of when and how to use an action so that it fits better into your overall management ideology.

Cfengine configurations often follow a similar path to maturity: They begin life as a couple of simple checks using the 'files' and 'processes' actions to verify that important files have the correct permissions and that important processes are running. Soon after this the 'copy' action is brought to bear because of how important it is to run against the most current configuration. From here the configuration grows organically to describe more of each machine. Unfortunately, few people take the time to step back and reassess the entire configuration, so we're going to do that before you get too far into your configuration. It is nearly impossible to create from whole cloth a mature configuration of any kind, and cfengine is no different, but hopefully with appropriate warnings and pointers you can get the right perspective on the issues.

Before you actually begin creating your configuration, do one thing: Name all of your files with suffixes instead of prefixes. I do not know why all of Mark Burgess's examples are named 'cf.<file>', but that doesn't mean you have to do the same. In fact, I highly recommend you do not do so. You can name your files however you like, and using the more standard suffix truly will be less confusing. Think of alphabetical sorting.

The first principle of cfengine configurations is that all information gathering should happen as early as possible, for two reasons. First, cfengine's parser runs through the given file first and then runs through imported files, which means that facts gathered in an imported file are not available to the file doing the importing. Second, the configuration will be much easier to maintain if there is a conceptual break between gathering information and acting on that information. You may have to just take this on faith for now, but it really is important.

Create a groups.cf or roles.cf file that defines all of the different roles or classes each of your machines should have. If you need to collect information from external sources, do it in this file. This should be one of probably only two files imported by cfagent.conf; the second should be a file listing all of the different files to import. This is important because if you import groups.cf in cfagent.conf and then try to use the information collected in groups.cf to decide which files to import, you can't. Cfagent doesn't know anything about imported files until it has finished parsing the file at hand.
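
As a rough sketch of that layout, the three files might look like this. The host names, the 'webservers' class, and every file name other than cfagent.conf are invented for the example.

```cfengine
# --- cfagent.conf: imports two files and does nothing else ---
import:
   any::
      groups.cf
      imports.cf

# --- groups.cf: all role definitions and fact gathering ---
groups:
   webservers = ( www1 www2 )

# --- imports.cf: decides which task files to pull in ---
import:
   any::
      schedule.cf
      sshd.cf
   # usable here because groups.cf has already been parsed
   webservers::
      apache.cf
```

The point of the extra imports.cf is purely ordering: the class tests in it are evaluated after groups.cf has been read, which would not be true if the same tests sat in cfagent.conf itself.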

It is also a good idea to create a schedule.cf file which defines all of the different scheduling options you want to use. You'll probably want somewhat longer tasks to run hourly, some daily, maybe even weekly; define the appropriate 'Hourly', 'Daily', and 'Weekly' classes, and then as you collect these lengthy tasks you can turn them on or off selectively based on these schedules. Also, create a master scheduling class called something like 'DoEverything' which can be set on the command line and will do absolutely everything. Once you have sophisticated scheduling in place it can be difficult to force everything to happen without this master class, but it is sometimes very important (such as when you are bootstrapping a new machine).
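
One way such a schedule.cf could be put together is sketched below, using cfengine's built-in time classes ('Min00_05', 'Hr03', 'Sunday'). I have not tested this exact file; the choice of 03:00 and Sunday, and the 'tidy' targets, are arbitrary, and it assumes cfagent actually runs during the first five minutes of each hour.

```cfengine
# --- schedule.cf ---
groups:
   # true during the first five minutes of every hour
   Hourly = ( Min00_05 )

# --- in some task file ---
tidy:
   # hourly job, or whenever everything is forced
   Hourly|DoEverything::
      /tmp pattern=core age=0 recurse=inf

   # daily job: the hourly window during the 03:00 hour
   Hr03.Hourly|DoEverything::
      /tmp pattern=* age=7 recurse=inf

   # weekly job: the same window, but only on Sundays
   Sunday.Hr03.Hourly|DoEverything::
      /var/tmp pattern=* age=30 recurse=inf
```

Running 'cfagent -D DoEverything' defines the master class from the command line, so every branch above fires regardless of the time.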

The second principle is that your files should be as small as possible. Do not create one giant configuration file with everything in it. Even worse, don't create three giant files that you selectively import based on some simple criteria such as operating system. This severely limits your code reuse, and is just generally confusing and difficult to maintain.

The third key principle is that related information should be together, preferably in its own file. If your cfengine configuration describes how to stop sshd, restart sshd, get the configuration for sshd from a central server, and do all the little things necessary to make sshd work, then put all that into its own file. You can make any changes you need in that file to support sshd across multiple platforms, and you'll know that if you have a question about how you've got sshd configured, that file has the answer.

If you instead put all of your 'processes' actions together and all of your 'copy' actions together, then when you get confused about why cfengine is behaving a certain way you'll have to hunt through many of your configurations. Even worse, even though your system may make perfect sense to you, when it comes time to hand that system off to someone else because your elite configuration management skills have caused you to be promoted, the hand-off is going to be so difficult that you're going to continue getting called. This is definitely one of those cases where simple really is better.

One note of warning about putting everything in its own little file: Remember that actions of a similar type are grouped together by cfengine and are seldom run in the order they appear in the configuration file, so you have no guarantee that putting declarations in a certain order in a file will result in work being done in that order.

Enough with making up principles. These cover the basics on how to organize your configuration itself, but there's one more key aspect of managing the configuration files themselves: Version control them. It doesn't matter if you use CVS, Subversion, Bitkeeper, RCS, SCCS, or whatever, but use something. Create a system that allows you to commit changes to your version control repository and will automatically pull those changes down to the cfengine server where all of the clients can retrieve them. Cfengine can do the updates on the server, so this is not very difficult, and it will drastically simplify management and auditing of your files, especially if more than one person is modifying them. We'll walk through an example of this later.


Beginning to use it

Now that we have a good system for organizing our work across multiple files, it's time to get started. Just kidding! It's time for testing! This is pretty important: Just because the manual says something is true, and it is reasonable for that thing to be true, does not mean you can just depend on it being true. The manual covers most things that are positively true and that the author of the manual uses (note the 'and' there), but there are plenty of details about the function of cfengine that are either not documented or not used often enough for the documentation to be kept up to date. When you go to try something new with cfengine, test it first in a separate configuration file.

As you begin using this little test configuration file, you'll run into the fact that cfengine generally won't perform a specific task very frequently. By default, cfengine will not perform a task more often than every 5 minutes, although this can be tuned using the 'IfElapsed' variable. The mechanism cfengine uses for preventing this repetition involves storing timeout information in /var/cfengine/cfengine_lock_db, and when you are testing something these lock files really get in the way. It's basically harmless to just delete the lock database itself, but you can also run cfagent with a '-K' flag to have it ignore the locks.
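
A throwaway test file in this spirit might look like the following. The file name test.cf and the target path are placeholders, and the run command in the comment assumes your cfagent accepts a './'-relative path with '-f'.

```cfengine
# test.cf: try out one feature in isolation
# run as:  cfagent -K -f ./test.cf
control:
   actionsequence = ( files )

   # shorten the lock interval (in minutes) while experimenting
   IfElapsed = ( 1 )

files:
   # harmless target: create the file if it is missing
   /tmp/cftest mode=600 action=touch
```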

Since you are testing some aspect of cfengine that you're not familiar with, you want to know exactly what it's doing so you can verify it is correct. Fortunately cfengine supports a large range of reporting, from parsing and runtime debugging to a verbose mode to almost no output.

You generally won't use debugging output unless cfengine is behaving in a way that just really confuses you and you want to understand what's going on behind the scenes. Other than debugging, cfengine has four basic levels of output: Errors, Warnings, Inform, and Verbose, in that order. Cfengine defaults to just reporting errors, and the other three levels can basically be controlled either via a command-line flag or a variable (although controlling via flag and controlling via a variable aren't necessarily exactly equivalent). Play around with the varying levels of output and what you get with those levels. One form of output may be more or less appropriate depending on the specifics of what you are testing. See the reference for the differences between the different levels.
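
For example, the inform level can be switched on from inside a test configuration rather than on the command line. This is a sketch based on my reading of the reference; check there for the exact spelling and for how it differs from the corresponding flag.

```cfengine
control:
   actionsequence = ( files )

   # report every change cfagent makes, similar to running
   # cfagent with the -I flag
   Inform = ( on )

   # uncomment for much chattier output, similar to -v
   # Verbose = ( on )

files:
   /tmp/cftest mode=600 action=touch
```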


Our first steps

Now we are operating cautiously, verifying that what we put into the configuration actually results in what we want. With this caution, we can begin building our configuration. As mentioned above, you will likely start with 'files' and 'processes', with maybe a bit of 'shellcommands' thrown in for good measure. Fortunately the cfengine reference is sufficient to get you through this. Once you begin transitioning from independent actions to organizing multiple actions around a single goal (e.g., making sure ssh is functioning properly) you need more than the reference.

Cfengine is about providing a language in which we can describe the desired state of our systems and then trying to bring our systems into that state. Few details about a system's state are more important than the state of the services it is providing. Most services share a few traits: They've got configuration files (which are usually not unique to the machine they are running on), some setup that needs to be done for the service to function properly, processes that should always be running, and specific means of starting and stopping those processes. If you're very lucky, there are also defined mechanisms for testing whether a service is actually functional.

Even though this combination is a very common idiom in cfengine configurations and you are likely to use it with most of the services your servers provide, there is no established mechanism for implementing this idiom for a given service. Part of that is a result of problems with abstraction; it's difficult to abstract the differences in platform (e.g., how you start sshd on AIX vs. how you start it on HP-UX) and service (heck, even abstracting the difference between versions of OpenSSH that use Privilege Separation and those that don't is difficult). Part of the blame also lies on the cfengine community; there are ongoing efforts to build a community of cfengine users who share their code and their solutions, and as this community matures the better idioms will rise to the top and hopefully be well documented with good examples. And finally, part of the blame lies with cfengine itself in that it does not exactly make idioms like this easy. In fact, you could go so far as to say that cfengine makes abstraction surprisingly difficult.

Let's look at what a simple implementation of this idiom might look like for OpenSSH. This is too simplified to sufficiently set up a server the first time, but it should be sufficient to keep OpenSSH behaving properly.

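control:
  # cfengine_server is assumed to be defined elsewhere in 'control'
  actionsequence = ( files copy shellcommands processes )
  AddInstallable = ( restart_sshd )

files:
  # host keys: the private key is readable by root only
  /etc/ssh/ssh_host_dsa_key m=400 o=root g=root action=fixall
  /etc/ssh/ssh_host_dsa_key.pub m=644 o=root g=root action=fixall

copy:
  # pull the config from the central store; flag a restart on change
  /cfengine/config/openssh/. dest=/etc/ssh/. server=${cfengine_server}
      ignore=CVS exclude=.#* define=restart_sshd

shellcommands:
  # only runs when the copy above changed something
  restart_sshd::
    "/usr/bin/pkill sshd"

processes:
  # start sshd if it is not running (including after the kill above)
  "sshd" restart "/usr/local/sbin/sshd"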

This snippet runs the gamut. It makes sure the keyfiles have the correct permissions, gets the configuration from a central store, kills the OpenSSH server if the configuration has changed, and then verifies the daemon is running, which will restart our server if a changed configuration caused it to be killed. Notice, of course, that even though we're always verifying that sshd is running, we're never actually verifying that it is working properly.

This snippet introduces another special variable: AddInstallable. This variable is needed because when cfengine encounters a test on a class that is not currently set, it assumes that class is a hostname and discards the test itself. If you then later set that class through something like the definition used above, cfengine will usually not notice and will never execute the results of the test. A good rule of thumb is to mark as an AddInstallable any class that you set in any action other than 'control' and 'groups'.

As simple and reasonable as our snippet is, reality quickly causes even this small snippet to come crashing down on us. If you only use one platform and all of your machines have the same version of OpenSSH, then a mildly modified form of this snippet might do the trick for you, as long as this is your entire configuration. What if instead this is part of a larger configuration and earlier in your configuration you include 'processes' in the actionsequence before 'shellcommands'? If this is the case, then cfengine will correctly kill the sshd process but the 'processes' action will have already passed so sshd will not get restarted until the next time cfengine runs.

Yes, we could change the command that restarts sshd to include both killing the process and starting it, but then you have the exact same information (how to start sshd) in your configuration file twice. Then you have the issue of how to deal with the fact that different platforms might start sshd differently. There are ways around all of these problems, but there is no perfect solution, and discussing the issues you are likely to run into is frankly more useful right now than finding solutions.


A Note on Editfiles

One of the things that immediately catches most people's eye about cfengine is its 'editfiles' action, and it is the action that people spend the most time on. I personally almost never use this action, and I don't ever use the complicated features like looping. Yes, you can do simple operations with 'editfiles' to make your life easier, but the vast majority of the time you would be better served by just using a module to generate the file you're editing. Slower? Yes. Better? You betcha. Do it right the first time.

I'm not going to say that there are no valid uses for 'editfiles', and I cannot say that your environment will not have perfectly valid long-term uses for it, but I have found that it is pretty easy to spend a lot of effort trying to get editfiles to work correctly when you could have invested the same amount or less time writing a module to generate the file and ended up with better functionality and more clarity.

Nonetheless, there are many cfengine users who have done some amazing things with 'editfiles', and the cfengine users mailing list is a great resource for some of these.


Networking

Now that we've got a taste for how to use cfengine we can delve into what makes it so powerful but which can also be the biggest headache: Cfengine networking.

Cfengine is only capable of pulling files down to the host that cfagent is running on; it is not capable of pushing files to one or many hosts. This is generally accepted as the best form of file distribution because you don't have to operate iteratively, which can take significant extra time, and you don't have to worry about whether a host is up when you do your updates -- as long as each host runs cfagent periodically while it is up, you know that each host is either up and updated or not available.

The protocol that cfengine uses to pull files down is similar to that of rsync in that it only pulls down files that have changed, which means that it operates idempotently. It operates more slowly than rsync, sometimes significantly (although 2.0.6 saw a large speed increase in file copying), but it provides some extra assurances that rsync doesn't, and the fact that it is integrated with the rest of cfengine easily puts it past rsync in terms of usefulness.

Cfengine networking uses the standard three layers of verification for connections: Access, authentication, and authorization. It provides tcp_wrapper-like access control, but it only works with IP addresses and IP address ranges. For small networks this might not be much of a problem, and very secure networks can just leave access control turned off, but if you have hundreds of hosts on a relatively open network, maintaining this list of IP addresses can be onerous. Fortunately, you can either generate the list of IP addresses (from some central store like LDAP) and then import it into your cfservd.conf, or as of around 2.1.2 you can use ExecResult in cfservd.conf to retrieve the IP addresses directly from an external source.
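
On the server side, the access and authorization layers might be expressed in cfservd.conf along these lines. The address prefixes, the domain, and the exported path are examples only; consult the reference for the address formats your version accepts.

```cfengine
# --- cfservd.conf ---
control:
   domain = ( example.org )

   # access: only these address prefixes may connect at all
   AllowConnectionsFrom = ( 192.168.10 192.168.20 )

admit:
   # authorization: which hosts may read which paths
   /cfengine/config *.example.org
```

Authentication, the middle layer, is not configured here; it depends on the key exchange.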

Once an IP address is allowed to connect it must authenticate. Cfengine relies on public-key cryptography for its networking, similar to how SSH works except that with cfengine both the client and server always have a key pair. By default the client and server do not trust each other, and (of course) by default they do not have each other's key. SSH solves this problem by prompting the user for whether s/he really wants to connect, but cfengine nearly always runs non-interactively which makes this not an option. There are three basic ways to get the public key exchange to take place: Copy them manually, initially trust the other end of the connection, or perform some kind of interactive exchange.

It is usually at least straightforward to copy the server key manually, because you can just distribute it with the cfengine package. This won't work in all situations, such as when you must create a new server but already have cfengine installed or when you are relying on vendor packages, but it's a good start. It is usually less straightforward to manually copy the client's key to the server; I've had the best success with using a script to push the key into a central repository such as LDAP and then automatically pulling the key down onto the server.

Initially trusting connections is usually the best method in practice, as long as you are not on a completely open network. Once a machine's key has been retrieved trust is no longer relied on, so this is not a terribly large security hole. Because trust is configured with each copy statement, you have the option of always trusting certain connections or temporarily configuring a given copy to use trust and then reconfiguring back to an untrusted system once the new servers are working properly. It is usually possible to get networking functional without trust, but doing so can often make the job significantly harder.
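The reason initial trust is not a large hole is easier to see in code. The following Python sketch models the behaviour described above: trust only matters for a host whose key has not been seen yet, and a stored key always wins afterwards. The class and method names are hypothetical, and real key handling in cfengine is more involved than a dictionary lookup.

```python
class KeyStore:
    """Toy model of trust-on-first-use key handling."""

    def __init__(self):
        self._keys = {}  # host name -> public key seen on first contact

    def verify(self, host, presented_key, trust=False):
        """Return True if the connection should be accepted.

        An unknown host is accepted (and its key stored) only when trust
        is enabled. A known host must present the stored key; the trust
        flag is ignored once a key is on file.
        """
        known = self._keys.get(host)
        if known is None:
            if trust:
                self._keys[host] = presented_key
                return True
            return False
        return known == presented_key
```

Note that a host presenting a different key after the first exchange is rejected even with trust turned on, which is why leaving trust enabled temporarily is a bounded risk.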

Interactive key exchange nearly always takes place on the server in order to retrieve a new client's public key. This works well if the client already has the server's key, but it requires that the client be running cfservd. You have the option of either writing a short cfengine script to connect to the client, which automatically stores the client's public key on a successful connection, or using cfrun to perform the connection, which will prompt you for whether you want to store the key. Ultimately either of these methods is sufficient, but if you also already have sshd running on the client it is very likely easier to ssh to the system and retrieve the key that way, although you will have to configure the client to copy the public key to a world-readable location (strangely, by default cfengine's public key is only readable by root).

Now that your clients and servers can talk... Just kidding, they can't actually talk yet. We haven't fixed the domain yet. As mentioned earlier, if the client and server don't agree on the client's domain, then the client can't communicate with the server. Even better, just about any communication problem results in a message warning you that you may have forgotten the domain. Of course, mere presence of the domain is not enough; it must be correct as defined by the server. The client gets to determine its own domain, and it passes that information on to the server. The server does a reverse lookup on the IP address of the connecting host and verifies that the hostname and domain it comes up with are the same as the ones passed by the client. As mentioned earlier, this can be a bear of a problem to solve, but once it is working it usually continues working flawlessly. Keep in mind, though, that just because cfengine claims you forgot your domain does not mean it is a domain issue; more often I've found that incorrect key exchanges are the culprit.
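When debugging a domain mismatch it can help to reproduce the server's check by hand. The Python sketch below mirrors the check as described above (reverse lookup, then compare against what the client claims); it is an approximation for troubleshooting and not cfservd's actual code. The function name and the injectable resolve parameter are assumptions made so the logic can be exercised without DNS.

```python
import socket


def domain_matches(client_ip, claimed_host, claimed_domain,
                   resolve=socket.gethostbyaddr):
    """Return True if the reverse lookup of client_ip agrees with the client.

    resolve must behave like socket.gethostbyaddr, returning a tuple whose
    first element is the primary host name for the address.
    """
    try:
        looked_up = resolve(client_ip)[0]
    except OSError:
        # No reverse record at all: the server cannot confirm the claim.
        return False
    claimed = "{}.{}".format(claimed_host, claimed_domain)
    return looked_up.lower() == claimed.lower()
```

Running this on the server with the client's address and the domain the client is configured with shows quickly whether DNS and the client disagree.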


Stepping outside

Now that we're successfully distributing files with cfengine, we need to have a plan for how to get the files there for cfservd to serve. For some files this is an easy answer; if you've got a static file that will never change but is important, such as the login banner for your servers, you plop it somewhere and tell cfservd to serve it out. Most of the files you'll be serving are configuration files, though, and these files will get changed periodically, often by different people.

For this level of dynamism you need version control. It doesn't matter what version control system you use as long as your cfengine server can do automated updates from it, but I'll use CVS for all of my examples because it is currently a de facto standard, and I'm lazy and I already know CVS well.

You basically need to set up a simple flow where a user commits a new revision of a file to the CVS repository, the cfengine server automatically pulls that new revision to a location at which the clients can reach it, the clients all pull the new revisions down via cfengine, and then they react appropriately to the new revisions (restarting servers or whatever). See Figure 1 for an example of the flow.

The first step is obviously to set up a CVS repository and get your files into it. Although the process of building and maintaining a version control repository is a bit outside our scope, there are some hints I can give that might make it a bit easier. One of the most important points is that because cfengine is functioning as a proxy between the repository and your clients, you don't have to organize your repository anything like the files might appear on disk. I usually create a single module for all files that cfengine will serve (maybe call it 'config'), and then I have two subdirectories under it, one for code that my site maintains (libraries for different languages, scripts, etc.) and one for configuration files I'll be pulling down. This code is specifically code that's going to be pulled down as is, not compiled, so it really only makes sense for scripting languages like perl and ruby.

In the 'code' subdirectory, make a subdirectory for the libraries for each of the languages you use, and then maybe a master 'scripts' subdirectory which will contain scripts you use to maintain the systems locally. In the files directory, I generally have a single subdirectory for all files that are independent, and then a subdirectory for each app that has multiple configuration files (like apache and SSH).

Notice that we're completely ignoring where these files will eventually reside and instead are focusing on making them easy to maintain. Let the computer do the work of putting them in the right place, and make your repository make sense to the people who will be using it.

If you need to have multiple versions of a file, the above setup might be inefficient, but if you have too many different versions of a file you may want to consider generating that file anyway, as it's very difficult for humans to maintain more than even one version of a given file, much less many different versions. It's a bit of a different story if you have the same application serving decidedly different purposes, such as different types of web servers using apache, but for standard configuration management purposes you should try to have as much consistency as possible across all of your machines. For those cases where you need only a couple of different versions of a file, such as sendmail.cf for both clients and servers, remember that you can name the files whatever you want in the repository, so you could copy 'sendmail.cf.client' to clients and 'sendmail.cf.server' to servers.

One other point: This setup is far easier if the CVS server and cfengine server are the same box, but it's not necessary. It's not especially complicated to set the CVS server up to allow automated pulls. Google can quickly find a tutorial on anonymous CVS access over SSH, and that tutorial can easily be adapted to provide your cfengine server with automated access to your CVS repository.

Now that you've got your repository in order it is time to start pulling updates with our server. I'll take the easy way for now, and then later on we'll come back and do it right. I'm also going to assume that your cfengine and CVS servers are the same machine, since it makes my life easier.

On your cfengine server, just pick where you want to store the files (I'll use '/cfengine') and check them out as root in that directory. Then, add the following snippet to your cfengine configuration:

shellcommands:
  "/bin/sh -c 'cd /cfengine/config; cvs up -d -P >/dev/null 2>&1'"

This just causes the server to do a simple CVS update every time it runs. Because this is updating the cfengine configuration itself for all of the other clients to retrieve, it's important that your cfengine server run without a 'SplayTime', so that it updates the configuration before any of the other clients connect. Your cfengine server itself might still take an extra iteration to be running against the latest configuration, but at least most of your clients will be as up to date as possible. If you feel like risking it, you could put this command in 'update.conf' so that it happens before the normal cfengine server run starts, but it's probably not worth it.

Now we have a very straightforward process that abstracts the steps between committing changes to our repository and those changes appearing on the client. It's all automatic, even if it takes a few minutes. Even better, if we set up our configurations correctly we can be assured that the configurations will not only appear on the clients but that the clients will react appropriately to these new configurations. In essence, we've created a process whereby you commit a change to a configuration file and a few minutes later the appropriate service has taken that change into account. This is a very powerful mechanism.


Really stepping outside

Now we have stepped outside of cfengine just a little bit, but there's much more to do. Cfengine's builtin actions provide a great base level of utility, but you'll quickly find that you want to do things that cfengine doesn't easily enable. The first stages of this are retrieving data that cfengine does not automatically retrieve for you.

Starting with a simple example, we have already discussed how critical the value of the domain variable is, but cfengine has no facility for retrieving this value. If you already have your domain set in the system such that /bin/domainname returns the correct domain, then you can easily just run 'domain = ( ExecResult(/bin/domainname) )' to get the domain. But if you were hoping to use cfengine to initialize your machines in the first place, you cannot depend on this preexisting information. Again, of course, if you only have one domain this will not be a problem for you, but for multiple domains or subdomains, the following is a more complicated example of how to step outside of cfengine to get the domain name:

control:
  domain = ( ExecResult(/bin/sh -c "domain=`${nslookup} \`${hostname}\`
    2>/dev/null| grep Name | awk '{print $2}' | sed 's/[^.][^.]*\.//'`; echo
    ${domain:=default.domain.com}") )

You cannot put this into an external script, because you would plan on using cfengine to pull this script down and cfengine's networking is not yet set up correctly. This is a common problem in bootstrapping; you can easily put yourself into a catch-22 if you are not cognizant of your assumptions. The above example provides a default domain, so that if a host does not find itself it can still try to contact the server, and as long as the server is set up to support that default domain (such as having it as an alias in /etc/hosts), you should get functional networking. One unfortunate aspect of the separation between update.conf and cfagent.conf is that the above setting will have to be in both files.
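The shell pipeline is dense, so here is the same string manipulation spelled out in Python purely as a reading aid. It is not a replacement for the inline command, which has to stay inline for the bootstrapping reasons just given. The function name is made up; the default value is the one used in the example above.

```python
def domain_from_fqdn(fqdn, default="default.domain.com"):
    """Strip the first label from a fully qualified name.

    Mirrors the sed expression above, which removes everything up to and
    including the first dot. If there is no dot, or nothing follows it,
    fall back to the default domain so the host can still try to reach
    the server.
    """
    _host, dot, domain = fqdn.strip().partition(".")
    if dot and domain:
        return domain
    return default
```

A lookup that fails and yields an empty string therefore produces the default domain rather than an empty one.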

As your cfengine configuration matures you will begin describing hosts based on the roles they hold in the network. You definitely should already have a 'cfengine_server' class or something analogous, but you probably want most of your servers to be treated specially, because most of your servers perform specific functions on the network. Obviously you can specifically configure that role information in cfengine, but if the information is already available elsewhere then you can instead configure cfengine to retrieve it. If you name your hosts by their role (which is usually a bad idea) then you can parse that role out of the hostname. It is straightforward to write a small script that parses the roles out of the hostname and echoes them on standard out, and once you've got that script, you can do the following:

control:
  AddClasses = ( ExecResult(/usr/local/scripts/nameinfo) )

It is acceptable to put this into a script because we can assume that the system is functional at this point. As long as the script in question is pulled down by all hosts and thus does not depend on the role information, you will be fine.

However, here's where we encounter a wrinkle. If you are using an external script to add more than one class, when the script prints the classes it has to separate them with a specific character, and nope, that character is not a space. In the case of AddClasses it is ':', but the character can change depending on the variable you are setting. A patch accepted around 2.1.3 enabled most cfengine variables, including those in cfservd, to accept multiple terms and split them appropriately based on commas or spaces, but for variables like AddClasses and AddInstallable (which uses '%' to split results) I don't believe this functionality was added. When you pull data in from outside, just make sure you test it all first; the variables all behave pretty differently, as you can see.
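As an illustration of such a script, here is a minimal Python sketch of the role-parsing logic. The naming convention it assumes (roles separated by '-', with a trailing host number, as in 'web-db01') is entirely hypothetical, as are the function names; the only detail taken from the text is that the output for AddClasses is joined with ':'.

```python
def roles_from_hostname(hostname):
    """Extract role names from a host name such as 'web-db01.example.com'.

    Assumed convention: the short host name is a '-'-separated list of
    roles, and any trailing digits are a host number rather than part of
    the role.
    """
    short = hostname.split(".", 1)[0]
    roles = [part.rstrip("0123456789") for part in short.split("-")]
    return [role for role in roles if role]


def format_for_addclasses(roles):
    """Join class names with ':', the separator AddClasses expects."""
    return ":".join(roles)
```

A real script would print format_for_addclasses(roles_from_hostname(name)) for the local host name on standard out, so that ExecResult can pick it up.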

Cfengine supports more methods of retrieving data from external sources, such as reading directly from files. Basically, if you have an existing data store with information you want to use to make decisions in cfengine, it is not that difficult to pull that data into cfengine. It may take some testing and playing around, but once you get it working it will be very stable.


No, Really, We're Really Going to Step Outside This Time

You can step outside of cfengine for much more than decision making. We've already seen that you can use the 'shellcommands' action to perform actions that cfengine does not support, but cfengine also supports a module-like interface through which you can perform work outside of cfengine and then change the cfengine state based on that work by setting classes or variables. There are a lot of ways this is useful, but I'm just going to demonstrate my favorite idiom, one which I use for many different external functions.

For a number of different reasons it is very difficult to use cfengine to manage the 'inetd' configuration file, /etc/inetd.conf. Sure, you can use 'editfiles' to make sure certain things are or are not uncommented, but it's very difficult to get a clear picture of which hosts have which services enabled; the more you use editfiles for this, the more text you have obscuring what's actually happening. If you instead write an external module capable of managing the contents of the file and just use cfengine to determine what goes in that file, you can get a much less verbose and thus much more clear idea of what is configured where.

For this to work you need to create your inetd module and set the value of 'moduledirectory' to point to it. I always create it as a subdirectory under /var/cfengine and make sure update.conf is copying that whole tree, not just /var/cfengine/inputs. You also need to have a data file, which I store in /var/cfengine/data, that contains all of the data that could go in an inetd.conf file. You can pick whatever format you want for the data file -- only your module is going to parse the file -- but I usually just create a perl hash of hashes in the data file. My module, which is written in perl, can just source this directly and have all of the data available. The top level of the hash is the names of each of the potential configuration lines with the values being a hash of the actual data. With inetd, this is a small sample of what that data might look like:

$VAR1 = {
    'telnet_hp' => { # telnet stream tcp nowait root /usr/lbin/telnetd telnetd -b /etc/issue
        'wait' => 'nowait',
        'endpoint' => 'stream',
        'command' => 'telnetd -b /etc/issue',
        'binary' => '/usr/lbin/telnetd',
        'protocol' => 'tcp',
        'uid' => 'root',
        'name' => 'telnet'
    },
};

The name we're using is meaningful only to humans, so you can give it whatever name you want. Now you've got a module that can parse this file, which means that you've got a module with every potential configuration line for inetd; now you just use cfengine to tell the module which lines to configure:

control:
  AllowRedefinitionOf = ( inetd_svcs )
  AddInstallable = ( restart_inetd )
  inetd_svcs = ( "" )

  sun::
    inetd_svcs = ( "${inetd_svcs} rstat_sun metad" )

  hpux::
    inetd_svcs = ( "${inetd_svcs} rstat_hp" )

  any::
    actionsequence = ( "module:inetd ${inetd_svcs}" shellcommands )

shellcommands:
  restart_inetd::
    "/bin/pkill -HUP inetd"

Very simple. You are just collecting the list of services in a variable, and then passing that variable on to your module. Your module then parses the data file, retrieves the configurations for each of the named services, and then builds the new /etc/inetd.conf file. Up until this point you could have done this as well with a 'shellcommands' action as with a module, but you obviously want to force inetd to reread its configuration if you change the configuration. Note the initialization of the 'inetd_svcs' variable; you must always initialize variables which you later reference in their own values. Behaviour resulting from this lack of initialization is currently undefined and is likely to change.
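My own module is perl, but the core of it is small enough to sketch in Python. The sketch assumes the data has already been loaded into a dictionary shaped like the sample above, and that the module receives the service names as its arguments. The field order is taken from the comment in the sample data; the function names are made up for the example.

```python
# Order of fields in an inetd.conf line, per the comment in the sample data.
FIELD_ORDER = ("name", "endpoint", "protocol", "wait", "uid", "binary", "command")

# Same record as the sample data file, expressed as a Python dictionary.
SERVICES = {
    "telnet_hp": {
        "wait": "nowait",
        "endpoint": "stream",
        "command": "telnetd -b /etc/issue",
        "binary": "/usr/lbin/telnetd",
        "protocol": "tcp",
        "uid": "root",
        "name": "telnet",
    },
}


def render_line(record):
    """Turn one service record into a single inetd.conf line."""
    return " ".join(record[field] for field in FIELD_ORDER)


def build_inetd_conf(names, services=SERVICES):
    """Build the full file contents for the named services, one per line.

    An unknown name raises KeyError, which is preferable to silently
    writing a file that is missing a service.
    """
    return "".join(render_line(services[name]) + "\n" for name in names)
```

Because the variable passed from cfengine starts out as an empty string, splitting the argument string on whitespace before calling build_inetd_conf conveniently discards the empty leading term.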

Successful use of this module requires two important additions to it: First, the module itself should operate idempotently, meaning that it should only write the file out if the contents have changed. This is easily accomplished by getting the MD5 hash of the old file and the newly configured file; if they're the same don't do anything and if they're different then write the new file. The second addition is that if the file is rewritten because of a change then the module needs to set a class that results in cfengine restarting inetd. If the module prints '+restart_inetd' on STDOUT, then cfengine will set that class, and then you can just have a test that restarts inetd if that class is set. Notice that our snippet above already lists that class as an installable class so the parser won't pass by it.
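Both additions fit in a short Python sketch. The '+restart_inetd' line on standard out is the mechanism described above; everything else (the function names, the use of SHA-256 where the text suggests MD5, and writing the file in place rather than via a temporary file) is an assumption made to keep the example small.

```python
import hashlib


def write_if_changed(path, new_contents):
    """Write new_contents to path only if it differs from what is there.

    Returns True if the file was (re)written, False if it was already
    up to date. A missing file counts as changed.
    """
    data = new_contents.encode("utf-8")
    new_hash = hashlib.sha256(data).hexdigest()
    try:
        with open(path, "rb") as handle:
            old_hash = hashlib.sha256(handle.read()).hexdigest()
    except FileNotFoundError:
        old_hash = None
    if old_hash == new_hash:
        return False
    with open(path, "wb") as handle:
        handle.write(data)
    return True


def run_module(path, new_contents):
    """Update the file and tell cfengine whether inetd needs a restart."""
    changed = write_if_changed(path, new_contents)
    if changed:
        # A line starting with '+' on standard out asks cfengine to
        # define that class for the rest of the run.
        print("+restart_inetd")
    return changed
```

Run twice with the same contents, the module writes once and stays silent the second time, so inetd is only signalled when something actually changed.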


Generating Cfengine Code

If the above module has to operate idempotently anyway, could we not reuse cfengine's builtin idempotent operations by having the module generate a cfengine script that writes the file, rather than writing the file ourselves? In the general case, yes, although it might be a tad overkill in the above case, as cfengine's file editing capabilities focus more on chunks of files and are not well suited to creating entire files from scratch. To do so you will have to use editfiles, which (you may have noticed) I find to be more trouble than it is worth in most cases, and especially in the case of generating files from scratch.

There are many cases where you will want to generate cfengine code, though. One of the most common cases is where you need some kind of iteration. Cfengine has a limited form of implicit iteration available in some actions, but it is not available everywhere and does not always operate as you want. When you want better iteration than cfengine provides, you are limited to creating an external module that accepts the list of information to iterate across and then turns that list into cfengine code with one copy of the task for each piece of data.
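A generator for this kind of iteration can be very small. The Python sketch below turns a list of (source, destination) pairs into one 'copy' entry per pair. Treat the emitted stanza as an assumption about what you want to generate rather than a reference for cfengine syntax; the server name, the paths, and the function name are all placeholders, and the output should be checked against your own working configuration.

```python
def generate_copy_stanzas(files, server="cfserver.example.com"):
    """Return cfengine text with one copy entry per (source, dest) pair.

    The list is the data to iterate across; the loop here replaces the
    iteration that cfengine cannot do for us.
    """
    lines = ["copy:", "  any::"]
    for source, dest in files:
        lines.append(
            "    {} dest={} server={} type=checksum".format(source, dest, server)
        )
    return "\n".join(lines) + "\n"
```

The resulting text is what gets written to disk for cfagent to import; how and when it is imported is a separate decision.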

Another good reason for generating cfengine code is that it can provide much-needed abstraction, almost like writing a subroutine although a bit more complicated and less functional. Cfengine does now provide a method interface, but it's an interface only a mother could love and I still find it easier to generate my code in most cases.

Once you've decided to generate your cfengine code, limitations of the cfengine parser force you into one of two situations: Either execute the code with a completely separate cfagent instance, thus losing the context developed in the main instance, or write the code to disk and import the code the next time cfagent runs normally. Because cfagent cannot do run-time importing of files and has no 'eval' function, these are your only options. Which one you choose is largely dependent on how independent your code is and how time-sensitive it is. It's relatively straightforward to write a module that creates some code that it writes to disk, and then have cfengine only import the code if it exists.

It is a completely qualitative judgement as to when to generate cfengine code versus when to create your own idempotent operations in your script. Sometimes it is not too difficult to turn your work into a cfengine script, but sometimes you can only easily use cfengine for 75% and you have to shoehorn the other 25%. Either way it is important to have this ability in your arsenal, because there are plenty of tasks you have to give up on if you can't generate the code for them.


Moving Towards the Big Picture

We've added quite a bit of functionality. We're automatically distributing any new revisions of files to all of our systems and those systems are responding correctly to the changes. We're pulling in any information necessary to decide how to configure our systems. We're also using cfengine to make decisions about work it can't do on its own so that at the least cfengine is the single configuration store. Now we can begin seeing our network as a whole, rather than just a bunch of hosts. You can begin managing your machines by their roles, and you can perform large scale changes with the flip of a switch. You can see that the above system makes it trivial to add or remove an inetd service on any machine or class of machines. Decide that you aren't going to use rstat any more? Just remove that from the list above and commit the changes; a bit later, you won't have rstat anywhere. Building an ftp server? Just add ftp to the list of services on the ftp servers and you're now running ftp.

This is the point at which an almost magical transition occurs; you cease thinking about the details of the changes you make and instead begin thinking about the meaning behind those details. You are not worrying about where the inetd configuration file is or how you change it to configure ftp to run; you know you need ftp and you know how to refer to it within your inetd data file, and the system takes care of the rest.

You can also begin using cfengine to asynchronously tie information sources to information consumers. If you don't have an authoritative source for system data, such as an LDAP directory, now is a good time to create it. You can use cfengine modules to populate this directory, and then use more cfengine modules to source system data from this directory. Without cfengine it is relatively difficult to trawl all of your systems collecting facts, but with cfengine you can do this pretty easily, and this puts much more information at your fingertips.

This is probably near the limit of cfengine's ability. To get much more functionality you need to begin generating cfengine code, and to cross the horizon to automatically managing all aspects of your network you will likely have to rely on (and possibly write) the next generation of tools. There is a panoply of tools and languages meant to manage a greater extent of your network than cfengine is capable of, but no tool or language has stepped forward as the front-runner.


Security, New Servers, and Disaster Recovery

Now that we have cfengine managing our network, we can use it for three crucial but time consuming duties: Managing security, building new servers, and planning disaster recovery.

Most organizations these days have security policies, but these policies are usually just text documents that someone is required to walk through and enforce on each machine. With cfengine you can usually make executable security policies based off the original text; these are just cfengine scripts which enforce the policy described in the documentation. It takes more work than just writing some text, but once you've done that work you now know that all of your machines accord with your security policy and you don't have to spend time doing it yourself.

We have mostly discussed managing your network with cfengine as though you are managing a network of preexisting machines. Your cfengine configuration should excel at building machines from scratch, too; set up your host install tool (you are using a host install tool, aren't you?) to install cfengine and maybe do some basic configuration (such as exchanging keys), and then let cfengine bring that machine into the correct state.

This ability to build machines from scratch plays beautifully into disaster recovery planning. There are basically two disaster recovery scenarios -- you maintain an equivalent network somewhere else, or you plan to build a new one as quickly as you can. Cfengine can help you with both scenarios, because you can just copy your configuration to the parallel network and know that cfengine will keep it in sync with the main network, and you can use your integration between cfengine and your host install tools to better prepare to quickly reinstall most of your network in the event of a disaster. If you are planning for a full reinstall, though, be aware that you should maintain backup servers for cfengine and your host install tools in the second data center; it's quite impossible to use cfengine and your host install tools to rebuild a network without servers running cfengine and your host install tools.


Computer Immunology

As mentioned earlier, in addition to all that we have covered cfengine is also a computer immunology experiment. In fact, this aspect of cfengine has come to be cfengine's author's primary focus. This experiment is definitely in its early stages and so far provides some basic system trending data and the ability to respond to different states that data can reflect.

Cfenvd is the collector and processor of this data, and cfagent automatically sets classes based on the data in cfenvd. Cfenvd is essentially not configurable; all of the data it collects is hard-coded and the daemon collects everything it can if it is running. The theory behind this experiment is that cfenvd comes to have a good picture of what a system should look like, as defined by the different traffic and load patterns it experiences, and then cfagent sets classes based on those patterns and thus enables you to respond to the different states.

For instance, cfagent might set the class 'www_in_high_dev3' based on data in cfenvd. This means that the web traffic coming in is three standard deviations above normal. If you've got some automated process that might correct the situation, you can kick it off based on this data, but the normal case is instead that you emit an alarm of some kind. Cfengine makes it easy to have an alarm sent out through cfengine's normal warning email using the 'alerts' action, but you could also execute a script (possibly tied into your monitoring system) or generate a syslog message.
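To show what a class name like this encodes, here is a Python sketch that derives one from a current value, a learned mean, and a standard deviation. This is only an illustration of the arithmetic; how cfenvd actually computes its averages and thresholds is not covered here, and the 'low' direction, the cap at three deviations, and the function name are assumptions of the example.

```python
def deviation_class(metric, value, mean, stddev, max_dev=3):
    """Return a class name such as 'www_in_high_dev3', or None if normal.

    The number in the name is how many whole standard deviations the
    value lies from the mean, capped at max_dev. Values within one
    standard deviation are treated as normal and yield no class.
    """
    if stddev <= 0:
        return None
    delta = value - mean
    devs = int(abs(delta) // stddev)
    if devs < 1:
        return None
    direction = "high" if delta > 0 else "low"
    return "{}_{}_dev{}".format(metric, direction, min(devs, max_dev))
```

With a mean of 100 and a standard deviation of 20, a reading of 160 is three deviations high and produces the class from the example above.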

Unfortunately, cfenvd breaks one of the cardinal rules of configuration management: It collects extremely useful data but then makes it difficult to integrate that data into existing services that might be able to use it. Many companies already have applications like Cricket that graph system data over time, and it would be great to be able to integrate cfenvd into these applications, but not only is it not easy to get the data out of cfenvd, it's not even possible to get the raw data -- it's only possible to get the already-munged average data.


Conclusion

Configuration management has only recently been recognized as a single problem domain, one which might be solved by one suite of tools. Given that recent acknowledgement, it is no surprise that the tools are not yet mature and there are not yet standard solutions.

Cfengine was one of the first tools to make it possible to approach the problem of configuration management and it continues to stand out as one of the most capable tools available. It has some serious limitations, but those limitations are mitigated by its support of external modules. It is reasonable to expect that the next generation of configuration management tools will clearly surpass cfengine, but in the meantime it belongs in the toolbox of every sysadmin.

Copyleft Luke Kanies, 2004-2006. Some rights reserved. Feel free to republish/remix/rethink as long as you retain a notice that the original version was published by Luke Kanies.
