March 2010 Archives

[Howto] PostgreSQL and Linux Memory Management


The OOM-Killer can cause nasty surprises on machines with a heavy memory load; processes are cancelled or terminated without warning. Fortunately, this behaviour can be adjusted with some clever kernel tweaks.

Administrators of Linux machines with very high RAM usage are sometimes faced with a terrifying scenario: the Linux OOM-Killer (OOM = Out Of Memory). After a PostgreSQL instance has crashed, for example, an entry like the following can typically be found in the system log:

Out of Memory: Killed process PID (process name)

Why is this?

Virtual Memory and Overcommit

Linux can allocate virtual memory in a number of ways: malloc(), mmap(), swap and shared memory, to name a few. It is possible to overcommit virtual memory, that is, to allocate more than is actually available in the system. If that memory is then really used, a so-called "OOM condition" occurs: the system has no space left in the virtual memory area and cannot allocate any more. This is when the OOM-Killer is activated, and it does what its name suggests: it kills processes which meet certain conditions in order to free memory.

If you run PostgreSQL in parallel with other memory-intensive processes on the same machine, it's likely that the OOM-Killer will kill certain PostgreSQL processes. Because of the amount of allocated shared memory and the memory usage of each backend, the OOM-Killer targets PostgreSQL by preference: it adds the complete shared memory area addressed by all backends to its total.

The amount of committed memory of your system at a given time can be examined with the /proc-Filesystem:

$ grep Commit /proc/meminfo 
CommitLimit:    376176 kB
Committed_AS:   265476 kB

This example shows the current amount of committed memory at 265476 kB (Committed_AS). If this value is equal to or even larger than CommitLimit, the OOM-Killer is likely to be woken up.
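To keep an eye on this, the two values can be compared with a small shell function. This is only a sketch: the function name commit_headroom_kb is our own invention, and it assumes the /proc/meminfo layout shown above.

```shell
# Print how many kB remain before Committed_AS reaches CommitLimit.
# $1: meminfo-style file to read (defaults to /proc/meminfo).
commit_headroom_kb() {
    awk '
        $1 == "CommitLimit:"  { limit = $2 }
        $1 == "Committed_AS:" { used  = $2 }
        END { print limit - used }
    ' "${1:-/proc/meminfo}"
}
```

Called without an argument the function reads /proc/meminfo. For the example output above it would print 110700, and a value near or below zero means the machine is at its limit.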

However, the kernel provides some interfaces to adjust the behaviour of the OOM-Killer and Overcommit with regard to PostgreSQL installations.

Turn off Overcommit

A radical method is to turn overcommit off entirely, although this is only recommended on systems dedicated to PostgreSQL. Overcommit behaviour is controlled by the following kernel parameter:

vm.overcommit_memory = 0

It accepts one of three values:

  • 0: Allow a careful strategy of overcommitting memory: small and reasonable amounts of overcommitting allocations are allowed, but heavy and wild allocations will be denied. In this mode, root can allocate more space than unprivileged users. This is also the kernel default setting.
  • 1: Allow overcommit without any constraints.
  • 2: Turn off overcommit. The effective allocatable memory space cannot be larger than swap + a configurable percentage of physical RAM.

The percentage of physical RAM used when the value is 2 is defined by the parameter:

vm.overcommit_ratio = 50

While vm.overcommit_memory=1 is useful when tuning certain applications, the values 0 and 2 are the best ones to use most of the time. If you turn off overcommit with vm.overcommit_memory=2, a process that tries to allocate memory beyond the limit (which depends on vm.overcommit_ratio) gets an "out of memory" error instead of the allocation. We recommend saving these settings in the configuration file /etc/sysctl.conf (the location may vary with the distribution you are using) to ensure that they are applied again after a server reboot.

# echo "vm.overcommit_memory=2" >> /etc/sysctl.conf
# echo "vm.overcommit_ratio=60" >> /etc/sysctl.conf
# sysctl -p /etc/sysctl.conf

Changes to those parameters are activated immediately. You can recheck this by consulting /proc/meminfo:

$ grep Commit /proc/meminfo 
CommitLimit:    401440 kB
Committed_AS:   266456 kB

The machine has 249848 kB of swap and 252656 kB of physical RAM.
According to the formula swap + (vm.overcommit_ratio / 100) * RAM, this results in a CommitLimit of 401440 kB.
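The arithmetic can be checked in the shell. The following is a sketch under two assumptions of ours: a page size of 4 kB and no reserved huge pages. The kernel appears to round the RAM share down to whole pages, which is why the calculation is done in pages; the function name commit_limit_kb is hypothetical.

```shell
# Estimate CommitLimit in kB for vm.overcommit_memory=2.
# $1: swap in kB, $2: physical RAM in kB, $3: vm.overcommit_ratio (percent)
# Assumption: 4 kB pages, no huge pages reserved.
commit_limit_kb() (
    page_kb=4
    ram_pages=$(( $2 / page_kb ))
    # Integer arithmetic: the RAM share is rounded down to whole pages.
    echo $(( ram_pages * $3 / 100 * page_kb + $1 ))
)

commit_limit_kb 249848 252656 60   # prints 401440
commit_limit_kb 249848 252656 50   # prints 376176
```

The second call uses the default ratio of 50 and reproduces the CommitLimit from the first /proc/meminfo listing.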

Configure OOM-Killer per process

Where PostgreSQL is running without dedicated server hardware and in parallel with memory-intensive middleware (e.g. JBoss or Tomcat installations), most admins would prefer to control the OOM-Killer on a per-process basis and still allow overcommitting of memory allocations. Since kernel 2.6.11, Linux has provided an interface for tuning the OOM-Score of a process, which in turn increases or decreases the likelihood of the process being killed in an OOM situation. This interface allows a very flexible configuration of processes in such environments according to their memory requirements. It is exposed through the /proc filesystem, shown here for a PostgreSQL installation on Debian:

$ cat /proc/$(cat /var/run/postgresql/8.4-main.pid)/oom_adj
0

Allowed values range from -17 to +15: a negative value decreases and a positive value increases the likelihood of being killed by the OOM-Killer. -17 is a special value which exempts the process from being killed in an OOM situation altogether.
The setting is inherited from parent to child processes; for PostgreSQL you therefore have to set it on the PostgreSQL master process:

# echo -17 > /proc/$(cat /var/run/postgresql/8.4-main.pid)/oom_adj
$ psql -q test
test=# SELECT pg_backend_pid();
 pg_backend_pid 
----------------
           3429
(1 row)

test=# 
[1]+  Stopped                 psql -q test
$ cat /proc/3429/oom_adj
-17

The disadvantage of this method is that all child processes are now excluded from the OOM-Killer, which is not generally what DBAs prefer. You might, for example, want to protect the PostgreSQL system processes (like the background writer or autovacuum) from being killed by the OOM-Killer, but still have ordinary database connections killed when the machine runs out of memory.

Setting the OOM-Score requires root privileges, so the best way to implement this setting is to put it into your PostgreSQL start script.
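A start script could wrap the echo from above in a small helper. The sketch below is not taken from any distribution's script: the function name set_oom_adj and the optional third argument (an alternative /proc root for dry runs) are hypothetical.

```shell
# Write an oom_adj value for the process named in a PID file.
# $1: PID file, $2: value (-17..15),
# $3: proc root (defaults to /proc; hypothetical, for dry runs only)
set_oom_adj() (
    pidfile=$1 value=$2 procroot=${3:-/proc}
    # Only plain integers are accepted.
    case $value in
        ''|*[!0-9-]*) echo "not a number: $value" >&2; exit 1 ;;
    esac
    # Reject anything outside the range described above.
    if [ "$value" -lt -17 ] || [ "$value" -gt 15 ]; then
        echo "oom_adj must be between -17 and 15" >&2
        exit 1
    fi
    # The first line of the PID file holds the PID of the master process.
    pid=$(head -n 1 "$pidfile") || exit 1
    printf '%s\n' "$value" > "$procroot/$pid/oom_adj"
)
```

In a start script this would be called as root right after PostgreSQL has come up, for example set_oom_adj /var/run/postgresql/8.4-main.pid -17, so that all backends forked later inherit the value.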

Enhancements in PostgreSQL 9.0

PostgreSQL 9.0 will have additional support for the /proc interface described above. On the one hand, PostgreSQL 9.0 will come with a new Linux start script which supports setting the oom_adj value before starting up PostgreSQL; on the other hand, it will be possible to build PostgreSQL with the special C macro LINUX_OOM_ADJ defined, which allows DBAs to stop backend child processes from inheriting the OOM-Score of the master, as shown in this example:

$ ./configure CC="ccache gcc" CFLAGS="-DLINUX_OOM_ADJ=0"

This method protects the PostgreSQL system processes but still allows the OOM-Killer to kill database backend processes that run amok.

Alternatives

An alternative solution is available through an additional kernel patch. It extends the existing /proc filesystem with a list of process names which should be excluded from the OOM-Killer. However, this patch is an unofficial extension to the Linux kernel, so you may have to maintain your own kernel builds. In addition, it is not nearly as flexible as adjusting the OOM-Score, and process names are not useful for uniquely identifying processes (e.g. Java- or Perl-based processes).

Summary

The Linux kernel provides a comprehensive interface to adjust how processes are treated with regard to their memory usage and the OOM-Killer. The most flexible method is the oom_adj interface in the /proc filesystem introduced above. PostgreSQL 9.0 will have additional support for this interface. Dedicated PostgreSQL systems can be configured to avoid overcommit entirely, but this requires a deeper understanding of the amount of memory the database system demands and of the requirements of the kernel's VM.

This week, credativ launches its Open Source Support Card. With this card Open Source Support can be bought at a fixed price - without a binding contract.

After a long preparation phase we are now offering our trusted services in a new, simple format; with the Open Source Support Card you get a fixed contingent of project-specific, pre-paid services.


Customers using the Open Source Support Card have the unique advantage of full cost control; the card can be purchased as a product, without any obligation to sign an agreement for a specific length of time. This may be of particular benefit to larger companies, where new contracts have to be reviewed and cross-reviewed before they can be authorised. The advantages of the new pre-paid support format include:

  • Open Source Support for a specific project
  • Support not restricted to a specific number of desktops and servers within a company
  • A tempting price model, starting at just £480
  • Full control of costs
  • Support available via telephone, e-mail and remote access
  • Bilingual support - help given in English or even German, if required! ;-)
  • Cost of support NOT determined by the number of CPUs or users
  • NO binding contract - easy way to purchase
  • NO call centre - direct access to the experts
  • Support units can be used for the following services:
    • administration
    • installation (remote)
    • consultancy

All support is provided to the usual credativ standard. Just as you would expect from our usual contracts, the cost of the service is not determined by the number of CPUs, users, or DB entries. Support units purchased through the Support Card can be used for all related problems within a company - no matter which workstation or server they come up on. The support itself is provided by our Open Source Support Centre: you won't have to deal with non-technical staff or battle through FAQ scripts - our Linux experts and Open Source specialists are on hand to take calls directly. Many of us are actively involved in contributing to a number of Open Source projects - as regular readers will already be aware. ;-)

The new Open Source Support Card is also an exciting development for the wider Open Source community. By offering yet another attractive support option for free distributions, we hope to prove that there is now no reason not to consider Debian and CentOS as viable alternatives to commercial distributions.

The Open Source Support Card is designed and marketed in such a way that resellers can also get on board, making access to support that bit easier for consumers: imagine purchasing your server online and while you're at it being able to drop a Support Card into the shopping basket as well - Open Source Support with just one click!

Currently the Support Card is just available for Debian and CentOS in the UK and in Germany, although we will soon be offering it in the US and Canada too. If you have any questions or comments we'd be pleased to hear from you - we've put a lot of effort into this new product, and are looking forward to the response from our customers and the wider community.

Many digital cameras today do not just save an image, but also save various metadata in the Exif standard. This data includes the orientation of the camera when the image was taken (vertical or horizontal). However, some image programs use this data to rotate the image when displaying it while others don't, leaving the user to face inconsistent behaviour.

This can be fixed with the tool exiftran; it automatically rotates all images according to the Exif data, which it discards afterwards. It is also very easy to use for mass conversion:

# apt-get install exiftran
$ find . -type f -print0 | xargs -0 exifautotran

Other distributions might ship this tool in a package with a different name. Fedora, for example, calls the package fbida.
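The find call above passes on everything it finds, JPEG or not. A slightly more careful variant, sketched here with a hypothetical function name rotate_jpegs, restricts the run to JPEG files; the second argument exists only so that the command can be swapped out for a dry run. The -r option of xargs is assumed to be available, as it is with GNU xargs.

```shell
# Rotate every JPEG below a directory according to its Exif orientation.
# $1: directory (defaults to .)
# $2: rotation command (defaults to exifautotran)
rotate_jpegs() {
    find "${1:-.}" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -print0 |
        xargs -0 -r "${2:-exifautotran}"
}
```

Running rotate_jpegs ~/photos echo first lists the files that would be touched without changing them.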

The tool chain of a sys admin should always consist of effective tools. Today we are introducing the package sysstat.

Sysstat is a collection of command line tools dedicated to providing the system administrator with a quick overview of the performance of the system. They work as front-ends to the Kernel and therefore can never provide more data than the Kernel itself gathers, although the interface is much more user-friendly than querying Kernel parameters manually.

iostat

iostat is the way to go if there are problems with the throughput of a disk, NFS storages or the CPUs. For example, if your system is behaving strangely, iostat can be used to identify I/O waits:

Linux 2.6.31-19-generic (mymachine)         04.03.2010      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11,82    0,29    3,44    1,25    0,00   83,20

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               9,39       161,19       168,44    4264806    4456696

There are many options available for iostat but these are the most interesting, and they deal with specific outputs:

  • -d: Just show the hard disk data.
  • -c: Just show the CPU data.
  • -p: Show the I/O data for each partition.
  • -n: Show the I/O data for the NFS partitions.
  • -x: Extended information for the hard disks.
  • $NUM: A number given after the options tells the program after how many seconds the result should be refreshed.

mpstat

mpstat is the next tool in the chain: it helps when analysing the CPU load. If you call it with no options, the default information will be shown, in the same way that the iostat results are.

Linux 2.6.31-19-generic (mymachine)         04.03.2010      _x86_64_        (2 CPU)

17:01:52     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
17:01:52     all   11,96    0,29    3,26    1,23    0,10    0,11    0,00    0,00   83,06

In contrast to iostat, you see the actual load on hard and software interrupts. The option -A extends this information further: for each processor, the statistics and interrupts per second are shown.

If you add an integer $NUM after the command, the process runs without end and refreshes the output every $NUM seconds.

pidstat

pidstat concentrates on the processes themselves: it shows a list of all processes. The option -C enables you to filter these by a given string:

Linux 2.6.31-19-generic (mymachine)         04.03.2010      _x86_64_        (2 CPU)

17:02:32          PID    %usr %system  %guest    %CPU   CPU  Command
17:02:32            1    0,00    0,00    0,00    0,00     1  init
17:02:32         2888    0,00    0,00    0,00    0,00     0  start_kdeinit
17:02:32         2889    0,00    0,00    0,00    0,00     0  kdeinit4

The additional option -d shows I/O statistics about the given processes, while -p takes a PID as an argument to focus on known processes. Finally, -r brings up an overview of the memory load.

Again, an integer $NUM after the command lets the process run continuously, refreshing the output every $NUM seconds.

sar

All sysstat tools so far have had one flaw: they only show a snapshot of the current state and cannot look into the behaviour of the system in the past or during load time. Such information must be collected in the background, which is exactly what sar and its tools are all about: they collect the performance data of the system every ten minutes via a cron job. If you call the tool with the default values you get a first impression:

Linux 2.6.31-19-generic (mymachine)         04.03.2010      _x86_64_        (2 CPU)

09:30:30          LINUX RESTART

09:35:02        CPU     %user     %nice   %system   %iowait    %steal     %idle
09:45:01        all     17,38      1,02      5,10      3,87      0,00     72,63
09:55:01        all     11,90      0,27      2,86      0,75      0,00     84,23
10:05:01        all     10,20      3,52      3,46      2,55      0,00     80,27
10:15:02        all     12,96      0,32      3,18      0,65      0,00     82,89
10:25:01        all      7,94      0,18      3,17      2,42      0,00     86,30
10:35:01        all     12,41      0,89      4,55      0,56      0,00     81,60
10:45:02        all      8,97      0,09      3,51      0,89      0,00     86,55

All possible information can be collected with sar -A although the amount of output will be too much for any screen size. There are too many options involved in decreasing the output with sar to cover here, but they are discussed in detail on the man page.
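Because sar prints plain columns, its output is easy to post-process. As a sketch, the following function averages the %iowait column of the default CPU report. The name avg_iowait is hypothetical, and the function assumes the column layout shown above with 24-hour timestamps; it accepts both "," and "." as the decimal separator.

```shell
# Average the %iowait column of sar's CPU report read from stdin.
# Assumes the columns: time, CPU, %user, %nice, %system, %iowait, ...
avg_iowait() {
    LC_ALL=C awk '
        # Only sample rows: a timestamp in column 1 and "all" in column 2.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $2 == "all" {
            gsub(",", ".", $6)   # normalise locale-specific decimals
            sum += $6
            n++
        }
        END { if (n) printf "%.2f\n", sum / n }
    '
}
```

It is used as a filter, for example sar | avg_iowait. For the seven samples listed above the result is 1.67.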

OpenGL 4.0 released


During the Game Developers Conference the Khronos Group released new versions of its widely used 2D and 3D graphics API, OpenGL. Besides the expected version 3.3, they also released OpenGL 4.0.

From 2004 to 2008 there was so little movement from OpenGL that it was practically declared dead, falling badly behind its competitor DirectX. However, since 2008 the responsible Khronos Group has regained its footing and focused on the further development of the graphics specification. In 2008 version 3.0 was released and in 2009 two more releases followed. Version 3.3 was on the cards for this year's Game Developers Conference, and OpenGL did not disappoint. However, the audience was caught by surprise when the immediate availability of OpenGL version 4.0 was also announced.

According to the Khronos Group, OpenGL 4.0 offers an API to access the most modern hardware features available - putting it on an equal footing with DirectX 11. Up until now DirectX 11 has been far ahead of OpenGL - 3.x was focused on older hardware that was consistent with DirectX 10 (no tessellation, etc.). Additionally, the press announcement mentions explicitly that the new version was designed with developers in mind, to enable them to create OpenGL-based programs and games significantly faster and more efficiently.

However, the new API is only relevant if there are graphics drivers implementing it: AMD and Nvidia have both promised new drivers soon, and since in both drivers the OpenGL stack for Windows and Linux is almost the same, Linux support for OpenGL 4.0 shouldn't be far behind. It will take longer for open source graphics drivers; the OpenGL implementation Mesa3D does not yet even offer OpenGL 3.x support. However, after a long pause this project started moving quickly again two years ago; at the moment we are seeing new releases on a monthly basis so let's hope support will follow in the near future.

We at credativ would like to see a more transparent process in the Khronos Group - after all, Open Source Services and Support are our main business, and OpenGL is an important part of the Open Source ecosystem. But nevertheless, congratulations on the new release!

Phoronix has used its Test Suite to compare the memory and power consumption of different desktop environments. However, the results should be handled with care.

The "Power & Memory Usage" test was done to evaluate whether XFCE and LXDE consume less power and memory compared with their "big" siblings KDE and Gnome. The tests were done on a stock Ubuntu installation. At first glance the results suggest that KDE consumes much more power than the others. However, these results are misleading.

For a start, measuring memory consumption is anything but an easy task and requires a lot of thought. The problem is that many applications share a certain amount of memory and this is especially true for KDE. Very few programs out there can handle this shared memory properly while still showing the memory consumption of a process. "Top" and "free" are worthless in this regard; if you need to use any tool, take "exmap"! A more detailed analysis of this problem was done here.

Yet even if the memory consumption could be measured perfectly, there is still the matter of what this memory is being used for. Gnome and KDE do need quite a chunk of memory right at the start, but that is mainly due to the larger libraries, which offer a number of functions for other programs as well. It is for this reason that a memory measurement done with a set of programs shows different results, indicating that a desktop with larger libraries has a much slower growing footprint for each additional application compared to a desktop with smaller libraries.

However, even after taking this point into account, you might still ask, "what do you get?" KDE and Gnome usually offer a file indexer and tagger by default - whether these are running or sleeping can massively influence the measurements in tests such as these. Also, both desktops come with some heavy 3D effects. If you turn these off, the memory footprint and the power consumption are of course much lower.

Comparing a full-featured desktop with a desktop offering much less functionality is pointless - you could also add a turned off machine to the test and declare it the winner because it has the smallest memory footprint and power consumption!

To clarify: in general the Phoronix Test Suite can be used to evaluate certain data or at least trends in certain data - but data acquisition requires fixed surrounding conditions as well as a detailed discussion of the compared objects and their features, not to mention a detailed (read: scientific) analysis of the results. Throwing numbers around is not enough - if you are going to do so, you should at least include accurate details!

The Power and Memory Usage test is lacking in all these factors, so it is unfortunately worthless. :/

New Blog engine


This week we moved our blog to new blogging software: Movable Type. Besides the technical advantages it is now much easier for us to display the same article in different languages.

credativ is a multinational corporation with employees of different nationalities. This became problematic when we first started experimenting with our company blog: articles were published in English as well as in German and most of them translated into both languages... the site quickly became a mess.

To get around this problem a new workflow was designed - but unfortunately the blog engine we were using at the time, Wordpress, couldn't really accommodate it. After a short evaluation we decided to move over to the internally well known Movable Type, which efficiently supports setting up several blogs side by side with a central configuration management.

So now the new system is in place on the old address blog.credativ.com. But there are two new addresses you, our readers, should keep in mind:


Due to the new structure you can also access feeds according to topic without having to be irritated by double posts in different languages.


So, have fun with the new site. :-)

And, of course: if you need support for Movable Type, Wordpress or a migration just ask, we are ready to help!

The Red Hat Cluster Suite is a framework to bind two or more machines together to jointly handle one task. The following article gives an introduction to RHCS in terms of service failover.

Linux is used daily in mission-critical environments all over the world. It follows that Linux can be required to fulfil a range of needs with relation to availability and stability. The Red Hat Cluster Suite (RHCS) is designed with these needs in mind; it enables the admin to set up a cluster of machines which all handle the same task or provide the same service. If the machine providing the service goes down, another machine then steps in and takes over.

Core elements of RHCS

RHCS consists of four core components:

  • cluster infrastructure
  • high availability service management
  • tools for the cluster administration
  • Linux virtual server routing

The cluster infrastructure includes all the core components necessary for the set up and running of a cluster of several nodes. These components manage the integration of nodes, shutting them down where problems occur (fencing), replicating the configuration and so on.

After the cluster has been set up the next step is to define the high availability service management. This is a service running on one node with other nodes configured for failover. The HA service management includes defining the service, start/stop scripts, ports, storage places and other resources as well as the priority of the different failover nodes.

The next core component is not so much a necessary key element but more a set of helpful tools: the cluster administration tools. In theory they are not critical to the running of RHCS, although in practice it would be unwise to run RHCS without them. They incorporate GUI tools, web pages for accessing cluster data and tools for status queries, among other things.

The situation is similar for the Linux virtual server routing: although the RHCS documentation lists it as a core component, this functionality is not always needed, as it "only" provides load balancing at the IP level and re-routes the traffic when a node breaks down.

Besides these official core components, RHCS can incorporate other services where they are available: GFS (Global File System) and the Cluster Logical Volume Manager. They help with mounting network block devices, making storage management much easier.

Structure of a RHCS Cluster

To create an initial RHCS cluster a substantial set of machines is needed:

  1. Shared storage like iSCSI or Fibre Channel.
  2. For each node a method to detach it from the cluster (fencing), either by network or by a controllable power switch.
  3. At least two nodes with a network connection.
  4. A switch.

It is important that the shared storage is not running on one of the nodes itself - that would render the idea of fencing useless. Also keep in mind that the machines listed here only describe the minimum hardware configuration - a larger cluster would of course require many more nodes.

Closing words

RHCS offers a well thought out framework for managing a cluster, especially when it comes to service failover. Using RHCS makes securing your mission-critical systems easy, and makes them highly available with standard hardware.

The R in RHCS implies that this method only runs on RHEL machines - but this is not the case, as we will demonstrate in one of our upcoming articles.

The administration of a large number of servers can be quite tiresome without a central configuration management. This article gives a first introduction to the configuration management tool Puppet.

Introduction

In our daily work at the Open Source Support Center we maintain a large number of servers. Managing larger clusters or setups means maintaining dozens of machines with an almost identical configuration and only slight variations, if any. Without central configuration management, making small changes to the configuration would mean repeating the same step on all machines. This is where Puppet comes into play.

Like many configuration management tools, Puppet uses a central server which manages the configuration. The clients query the server on a regular basis for new configuration via an encrypted connection. If a new configuration is found, it is applied as the server instructs: the client imports new files, modifies permissions, starts services and executes commands, whatever the server says. The advantages are obvious:

  • Each configuration change is done only once, regardless of the actual number of maintained servers. Unnecessary - and pretty boring - repetition is avoided, lucky us!
  • The configuration is streamlined for all machines, which makes it much easier to maintain.
  • A central infrastructure makes it easier to quickly get an overview of the setup - "running around" is not necessary anymore.
  • Last but not least, a central configuration tree enables you to incorporate a simple version control of your configuration: for example, playing back the configuration "PRE-UPDATE" on all machines of an entire setup only takes a couple of commands!

Technical workflow

Puppet consists of a central server, called "Puppet Master", and the clients, called "Nodes". The nodes query the master for the current configuration. The master responds with a list of configuration and management items: files, services which have to be running, commands which need to be executed, and so on - the possibilities are practically endless:

  • The master can hand over files which the node copies to a defined place - if they do not already exist there.
  • The node is asked to check certain file and directory permissions and to correct them if necessary.
  • Depending upon the operating system, the node checks the state of services and starts or stops them. It can also check for installed packages and if they are up to date.
  • The master can force the node to execute arbitrary commands.

Of course, in general all tasks can be fulfilled by handing over files from the master to the client. However, in more complex setups this kind of behaviour is not easily arranged, nor does it simplify the setup. Puppet's strength is that it facilitates abstract system tasks (restart services, ensure installed packages, add users, etc.), regardless of the actual changed files in the background. You can even use the same statement in Puppet to configure different versions of Linux or Unix.

Installation

First, you need the master, the center of all the configuration you want to manage:

# apt-get install puppetmaster

Puppet expects that all machines in the network have FQDNs - but that should be the case anyway in a well maintained network.

Other machines become a node by installing the Puppet client:

# apt-get install puppet

Puppet, main configuration

The Puppet nodes do not need to be configured - they will look for a machine called "puppet" in the local network. As long as that name points to the master you do not have to do anything else.

Since the master provides files to the nodes, the internal file server must be configured accordingly. There are different solutions for the internal file server, depending on the needs of your setup. For example, it might be better for your setup to store all files you provide to the nodes in one place, and the actual configuration you provide to the nodes somewhere else. However, in our example we keep the files and the configuration for the nodes close together, as outlined in Puppet's Best Practice Guide and in the Module Configuration part of the Puppet documentation. Thus, it is enough to change the file /etc/puppet/fileserver.conf to:

[modules]
allow 192.168.0.1/24
allow *.credativ.de

Configuration of the configuration - Modules

Puppet's way of managing configuration is to use sets of tasks grouped by topic. For example, all tasks related to SSH should go into the module "ssh", while all tasks related to apache should be placed in the module "apache" and so on. These sets of tasks are called "Modules" and are the core of Puppet - in a perfect Puppet setup everything is defined in modules! We will explain the structure of an SSH module to highlight the basics and ideas behind Puppet's modules. We will also try to stay close to the Best Practice Guide to make it easier to check back against the Puppet documentation.

Please note, however, that this is only an example: in a real world setup the SSH configuration would be a bit more dynamic, but we focused on simple and easy-to-understand methods.

The SSH module

We have the following requirements:

  1. The package openssh-server must be installed and be the newest version.
  2. Each node's sshd_config file has to be the same as the one saved on the master.
  3. In the event that the sshd_config is changed on any node, the sshd service should be restarted.
  4. The user credativ needs to have certain files in his/her directory $HOME/.ssh.

To comply with these requirements we start by creating some necessary paths:

# mkdir -p /etc/puppet/modules/ssh/manifests
# mkdir -p /etc/puppet/modules/ssh/files

The directory "manifests" contains the actual configuration instructions of the module and the directory "files" provides the files we hand over to the clients.

The instructions themselves are written down in init.pp in the "manifests" directory. The set of instructions to fulfil aims 1 - 4 are grouped in a so called "class". For each task a "class" has one subsection, a type. So in our case we have four types, one for each aim:
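class ssh {
        # Aim 1: keep the SSH server package installed and up to date.
        package { "openssh-server":
                ensure => latest,
        }
        # Aim 2: keep the node's sshd_config identical to the master's copy.
        file { "/etc/ssh/sshd_config":
                owner   => root,
                group   => root,
                mode    => 644,
                source  => "puppet:///ssh/sshd_config",
        }
        # Aim 3: keep the service running and restart it when sshd_config changes.
        service { ssh:
                ensure          => running,
                hasrestart      => true,
                subscribe       => File["/etc/ssh/sshd_config"],
        }
        # Aim 4: copy the directory files/ssh of the module into the user's .ssh directory.
        file { "/home/credativ/.ssh":
                path    => "/home/credativ/.ssh",
                owner   => "credativ",
                group   => "credativ",
                mode    => 600,
                recurse => true,
                source  => "puppet:///ssh/ssh",
                ensure  => [directory, present],
        }
}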
Each type represents a different task and triggers a different action on the node:
package
Here we make sure that the package openssh-server is installed in the newest version.
file
A file on the node is compared with the version on the server and overwritten if necessary. The owner, group and permissions are adjusted as well.
service
Well, as the name says, this deals with services: in our case the SSH service (named "ssh" in the manifest) must be running on the node. Also, in case the file /etc/ssh/sshd_config is modified, the service is restarted automatically.
file
Here we have the file type again, but this time we compare an entire directory rather than a single file.
As mentioned above, the files and directories that the server provides to the nodes must be available in the directory /etc/puppet/modules/ssh/files/.
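A minimal sketch of how the module tree could be populated on the master. It assumes that the master's own /etc/ssh/sshd_config serves as the template, which is not stated in this article. PUPPET_DIR and SSHD_TEMPLATE are hypothetical variables; PUPPET_DIR defaults to a scratch directory so that the sketch can be tried without root, and would be set to /etc/puppet on a real master.

```shell
#!/bin/sh
# PUPPET_DIR and SSHD_TEMPLATE are hypothetical names used only in this sketch.
PUPPET_DIR="${PUPPET_DIR:-$(mktemp -d)}"
SSHD_TEMPLATE="${SSHD_TEMPLATE:-/etc/ssh/sshd_config}"
MODULE_DIR="$PUPPET_DIR/modules/ssh"

# Module layout: manifests/ holds init.pp, files/ holds what the nodes receive.
mkdir -p "$MODULE_DIR/manifests" "$MODULE_DIR/files/ssh"

# This copy is what the manifest requests as puppet:///ssh/sshd_config.
if [ -r "$SSHD_TEMPLATE" ]; then
    cp "$SSHD_TEMPLATE" "$MODULE_DIR/files/sshd_config"
else
    echo "no template at $SSHD_TEMPLATE, copy one by hand" >&2
fi

# files/ssh is requested as puppet:///ssh/ssh and ends up in
# /home/credativ/.ssh on the node; the user's files belong there.
find "$MODULE_DIR" -print
```

The final find call simply lists the resulting tree, which makes it easy to compare the layout with the source paths used in init.pp.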

Nodes and modules

We now have three parts: the master, the nodes and the modules. The next step is to tell the master which nodes are related to which modules. First, you must tell the master that this module exists by adding it to /etc/puppet/manifests/modules.pp:
import "ssh"
Next, you need to modify /etc/puppet/manifests/nodes.pp. This specifies which module is loaded for which node, and which modules should be loaded by default in the event that a node does not have a special entry. The entries for the nodes support inheritance.

So, for example, to have the module "rsyslog" ready for all nodes but the module "ssh" only ready for the node "external" you need the following entry:
node default {
    include rsyslog
}
node 'external' inherits default {
    include ssh
}
Puppet is now configured!

Certificates - secured communication between nodes and master

As mentioned above, the communication between master and node is encrypted. This implies that you have to verify the partners at least once, which can be done after a node queries the master for the first time. Whenever the master is queried by an unknown node, it does not provide the default configuration but instead puts the node on a waiting list. You can check the waiting list with the command:
# puppetca --list

To incorporate a node into the Puppet system you need to sign its certificate request:
# puppetca --sign external.example.com
The entire process is explained in more detail in the Puppet documentation.
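The two puppetca calls above can be strung together in a small script. This is only a sketch: NODE is the example host name used in this article, while DRY_RUN and the helper run are hypothetical names introduced here. With DRY_RUN=1 (the default) the commands are only printed, so the script can be read through safely on a machine without Puppet.

```shell
#!/bin/sh
# NODE is the example host from the text; DRY_RUN and run() are
# hypothetical helpers for this sketch, not part of Puppet.
NODE="${NODE:-external.example.com}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command and execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# 1. Show the nodes that are waiting on the master.
run puppetca --list

# 2. Sign the request of the node you have verified.
run puppetca --sign "$NODE"
```

On the master itself, run it as root with DRY_RUN=0 once you are sure that the node on the waiting list really is the one you expect.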

Closing words

The example introduced in this article is very simple - as I noted, a real world example would be more complex and dynamic. However, it is a good way to start with Puppet, and the documentation linked throughout this article will help the willing reader to dive deeper into the components of Puppet.

We here at credativ's Open Source Support Center have gained considerable experience with Puppet in recent years and really like the framework. Also, in our day-to-day support and consulting work we see the market growing as more and more customers are interested in the framework. Right now, Puppet is in the fast lane and it will be interesting to see how more established solutions like cfengine will react to this competition.
Currently I am testing the browser rekonq, a WebKit-based KDE browser which some regard as the next default KDE browser. Whether that will happen is a question best left to the future, but the current version 0.4 beta already has some nice features: KWallet integration, adblock, plugin support, etc.

However, with the current git checkout the favorites management is not working properly: on about:favorites you can easily delete favorites, but you cannot add new ones or manage them at all. There is still the config file $HOME/.kde/share/config/rekonqrc, though: all you have to do is change the entries below "NewTabPage". A comma-separated list shows first the URLs and then the comments for the URLs, and is fairly simple to edit:
[NewTabPage]
previewNames=http://www.heise.de,http://www.spiegel.de,http://www.tagesschau.de,,,,,
previewUrls=heise.de,spiegel.de,tagesschau.de,,,,,
This is one of the things that rekonq still has some problems with, and it shows that rekonq is still in heavy development. But it is already promising and I do wonder whether it might be a real alternative to other browsers for KDE users.
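
If you would rather not edit the lists by hand, the change can be scripted. The following is a rough sketch, not a rekonq feature: fill_slot is a hypothetical helper that puts a value into the first empty slot of one comma-separated key and prints the result. It works on a scratch copy of the listing above, and the added entry (credativ.de) is just an example value.

```shell
#!/bin/sh
# fill_slot FILE KEY VALUE
# Hypothetical helper: prints FILE with VALUE placed in the first empty
# slot of the comma-separated list stored under KEY.
fill_slot() {
    awk -v key="$2" -v value="$3" '
        index($0, key "=") == 1 && !filled {
            n = split(substr($0, length(key) + 2), slot, ",")
            for (i = 1; i <= n; i++) {
                if (slot[i] == "" && !filled) { slot[i] = value; filled = 1 }
            }
            line = key "=" slot[1]
            for (i = 2; i <= n; i++) line = line "," slot[i]
            $0 = line
        }
        { print }
    ' "$1"
}

# Scratch copy with the entries shown above; the real rekonqrc stays untouched.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[NewTabPage]
previewNames=http://www.heise.de,http://www.spiegel.de,http://www.tagesschau.de,,,,,
previewUrls=heise.de,spiegel.de,tagesschau.de,,,,,
EOF

# Fill the same slot in both lists so that they stay aligned.
fill_slot "$sample" previewNames "http://www.credativ.de" > "$sample.new"
fill_slot "$sample.new" previewUrls "credativ.de"
```

The number of slots stays the same, only the first empty one is filled. Close rekonq and keep a backup of rekonqrc before trying this on the real file.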