July 2009 Archives

The Problem

Both as part of our Open Security Filter and as a standalone service, we offer checking of emails for viruses using Clam AntiVirus. As the threat from viruses is constantly evolving, the ClamAV project provides ongoing updates to its virus definitions, to ensure that users are as well-protected as is possible.

There are a number of different ways in which these updates can be installed on Debian. We chose to use freshclam, a daemon that checks for definition updates at a defined interval (48 times a day, or every 30 minutes, in our case). By and large, freshclam does what we need.  The virus definitions for our clients are kept up-to-date.

We know that the virus definitions are up-to-date because we have deployed Nagios, an enterprise-class monitoring solution for hosts, services, and networks.  As part of our OSSC service, any of our clients can receive Nagios support, or make use of our Nagios hosting.  We have Nagios configured such that it checks that the virus definitions on the local machine are synchronised with those that the ClamAV project provides.  If they aren't, we receive an email notification.

Now there are two problems with this solution. Firstly, Nagios checks for new virus definitions every 5 minutes, whereas freshclam only updates the definitions every 30 minutes, so Nagios can raise an alert for definitions that freshclam simply has not had a chance to fetch yet.  Secondly, freshclam has a habit of getting itself into a bind (normally as a result of a problem with mirrors.dat, the mirror cache) which means that updates stop altogether until we step in to fix the problem.

We can fix the first problem by setting Nagios to only check every 30 minutes.  That's not a problem. However, the second problem is harder to solve. Having to step in whenever there is a problem is fine (if inconvenient) during office hours, but what happens if the definition for a nasty new virus is released on Friday night, and freshclam is having a nutty? The client is unprotected for the weekend. What happens if there is also a problem with email, so we don't receive the notifications? We could end up with an unprotected network, perhaps even without ever knowing that this is the case.

The Solution

What we need is a way of updating the virus definitions as soon as we know that there are new ones, without needing to rely on freshclam to be working 100%. Nagios tells us when there are new definitions, so it would be really helpful if Nagios could do something to fix the problem.

The Nagios developers, in their wisdom, foresaw this exact use case, and included Event Handlers within the Nagios system.  These are commands, defined exactly as you would define a normal command, that you can set to be called when the state of a host or service changes. So our 'freshen_clamav' command definition looks something like:

define command {
        command_name    freshen_clamav
        command_line    sudo /root/nagios-plugins/update_clamav $SERVICESTATE$
}

There are a couple of interesting things here.  Firstly, the $SERVICESTATE$ variable contains the state of the service that has changed (i.e. 'OK', 'WARNING', 'CRITICAL' or 'UNKNOWN').  Secondly, we are using 'sudo'.  The commands that we use to manipulate the ClamAV definitions (in /root/nagios-plugins/update_clamav) require root permissions and include 'rm'. Granting passwordless sudo access to 'rm' isn't a good idea, so granting it for this script is more secure.  Obviously this means that access to edit the script needs to be carefully limited.  For reference, the relevant line in the sudo configuration is:

nagios  ALL= NOPASSWD: /root/nagios-plugins/update_clamav
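Since the sudo rule is only as safe as the script it points at, it may be worth checking the script's permissions from time to time. The following is a minimal sketch rather than part of our actual setup: the function name is made up for illustration, and it assumes GNU stat (as shipped on Debian). It only looks at the write bits; checking that the file is owned by root is left out.

```shell
# Hypothetical helper (not part of the original setup): succeed only if
# the given file is NOT writable by group or by others.
# Assumes GNU stat, which prints the octal mode with -c '%a'.
not_group_or_world_writable() {
        local path=$1 mode
        mode=$(stat -c '%a' "$path") || return 2
        case "$mode" in
        # An octal digit of 2, 3, 6 or 7 has the write bit set.
        # First pattern: group digit; second pattern: others digit.
        *[2367]?|*[2367])
                return 1
                ;;
        esac
        return 0
}
```

For example, `not_group_or_world_writable /root/nagios-plugins/update_clamav || echo "permissions too loose"` could be run as root from a cron job.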

We use the above command definition in a service definition as follows:

define service {
        use                             generic-service
        host_name                       localhost
        service_description             ClamAV database freshness
        check_command                   check_clamav_freshness
        event_handler                   freshen_clamav
}


The check_clamav_freshness command uses the check_clamav plugin.  The important line here is, of course, the event_handler line.  This is all that's needed to start using /root/nagios-plugins/update_clamav whenever the state of the service changes.
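As an aside, we don't show the internals of the check_clamav plugin here, but the core idea of a freshness check can be sketched as a comparison of two version numbers that is mapped onto a Nagios state. Everything in this sketch is an assumption made for illustration: the function name, the arguments, and the thresholds (one version behind is a warning, five or more is critical) are ours, not the plugin's.

```shell
# Hypothetical helper: map the local and published definition versions
# onto a Nagios service state. Thresholds are illustrative assumptions.
#   $1 = local definition version     $2 = published definition version
#   $3 = versions behind for WARNING  (default 1)
#   $4 = versions behind for CRITICAL (default 5)
freshness_state() {
        local local_ver=$1 remote_ver=$2
        local warn_behind=${3:-1} crit_behind=${4:-5}
        local behind=$(( remote_ver - local_ver ))

        if [ "$behind" -ge "$crit_behind" ]; then
                echo CRITICAL
        elif [ "$behind" -ge "$warn_behind" ]; then
                echo WARNING
        else
                echo OK
        fi
}
```

A real plugin would also have to fetch both version numbers and exit with the matching Nagios return code; that plumbing is deliberately left out.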

So now let's look at /root/nagios-plugins/update_clamav:

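#!/bin/bash
# Nagios event handler: refresh the ClamAV definitions when the
# freshness check is not OK. $1 is the value of $SERVICESTATE$.

set -eu

case "$1" in
WARNING|CRITICAL)
        # Stop the daemon so that we can run freshclam by hand.
        /etc/init.d/clamav-freshclam stop
        # The mirror cache is often the source of the problem.
        rm -f /var/lib/clamav/mirrors.dat
        # Do not let a failed update prevent the daemon restart.
        freshclam || true
        /etc/init.d/clamav-freshclam start
        ;;
OK|UNKNOWN)
        # Nothing to do.
        ;;
esac
exit 0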

This script is fairly simple. We use a bash case statement on the first argument (which you will recall is the value of $SERVICESTATE$) to do nothing if the service is OK or UNKNOWN.  If the service is WARNING or CRITICAL, we stop the freshclam daemon (so we can run freshclam ourselves), remove the mirrors database (as it is often the source of the problem), run freshclam ourselves and then restart the freshclam daemon.

And now we have Nagios checking that the ClamAV definitions are up-to-date and, if they are not, updating them itself. This ensures that the client's machines are as secure as possible, with the latest virus definitions, whilst not requiring the attention of a credativ technician.
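One refinement that may be worth considering: Nagios also runs event handlers for soft state changes, that is, while it is still retrying a failed check. If you would rather act only once the problem is confirmed, the handler can look at the $SERVICESTATETYPE$ macro as well. The sketch below is an assumption on our part and not what the script above does: the function name is hypothetical, and it presumes the command_line is extended to pass $SERVICESTATETYPE$ as a second argument.

```shell
# Hypothetical decision helper for an event handler.
#   $1 = $SERVICESTATE$      (OK, WARNING, CRITICAL or UNKNOWN)
#   $2 = $SERVICESTATETYPE$  (SOFT or HARD)
# Succeeds only for a confirmed (HARD) WARNING or CRITICAL state.
should_update() {
        local state=$1 state_type=$2
        case "$state" in
        WARNING|CRITICAL)
                [ "$state_type" = "HARD" ]
                ;;
        *)
                return 1
                ;;
        esac
}
```

Inside the handler this could be used as `should_update "$1" "$2" || exit 0` before any of the freshclam commands are run.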