Here are patches and instructions for configuring various system loggers (syslog-ng, dsyslog, etc.) so that they strip personally identifying information from log messages before the messages are written to disk. This gives you centralized, fine-grained control over all system logging. It is one of the most effective broad measures you can take to anonymize many disparate parts of your system at once.

The following sysloggers are the only ones for which we know of techniques to do this. If you know of others, we are very eager to hear about them!

syslog-ng

We have been maintaining a patch for older releases of syslog-ng which adds the capability to strip out any given regular expression or all IP addresses from log messages before they are written to disk. The goal is to give the system administrator the means to implement site logging policies by allowing them easy control over exactly what data they retain in their logfiles, regardless of what a particular daemon might think is best.

As of syslog-ng 3.0, a “Rewrite” capability is built into the code. It can rewrite parts of log messages by searching and replacing text, and by setting specific fields to specific values. How to use this functionality to properly anonymize your logs still needs to be researched and written up here.
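
As a possible starting point for that research, here is a minimal, untested sketch of a rewrite rule for syslog-ng 3.x that replaces anything shaped like an IPv4 address in the message text with 0.0.0.0. The names r_anon_ip, s_local, f_mail and d_mail and the log file path are hypothetical, and the regular expression is an assumption that does not cover IPv6 addresses. Check it against the documentation for your syslog-ng version before relying on it.

```
# Fragment for an existing syslog-ng 3.x configuration (untested sketch).
# All object names and the file path below are hypothetical.

source s_local { unix-stream("/dev/log"); internal(); };

filter f_mail { facility(mail); };

# Replace every IPv4-looking string in the message text with 0.0.0.0.
# flags("global") rewrites all matches, not only the first one.
rewrite r_anon_ip {
    subst("[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+", "0.0.0.0",
          value("MESSAGE"), flags("global"));
};

destination d_mail { file("/var/log/mail.log"); };

# The rewrite rule runs before the message reaches the destination.
log { source(s_local); filter(f_mail); rewrite(r_anon_ip); destination(d_mail); };
```

This sketch only touches the message text (the MESSAGE field); any other fields are written out unchanged.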

For example, when enabled for a particular log file, the anonymizing patch could be used to convert:

imaplogin: LOGIN, user=myuser, ip=[69.90.134.200], protocol=IMAP

into this:

imaplogin: LOGIN, user=myuser, ip=[0.0.0.0], protocol=IMAP

Data retention has become a hot legal topic for ISPs and other Online Service Providers (OSPs). There are many instances where it is preferable to keep less information on users than is collected by default on many systems. In the United States, there is currently no requirement to retain data on users of a server, but you may be required to provide all data on a user which you have retained. OSPs can protect themselves from legal hassles and added work by choosing what data they wish to retain.

installing the package

This patch is currently included in the Debian package (in Sarge since June 9, 2005, and in Etch, Lenny, Squeeze, and Sid). In Ubuntu, the patch was included in Hardy and Intrepid (and potentially other releases). If you are running one of these, simply run “apt-get install syslog-ng”. If the available package is version 3 or later of syslog-ng, this method will not work.

applying the patch

If you wish to compile your own version of syslog-ng with this patch, follow these instructions.

This patch has been tested against the following versions of syslog-ng:

    * version 1.6.5
    * version 1.6.7
    * version 2.0.0
    * version 2.0.5
    * version 2.0.6
    * Debian package syslog-ng_1.6.5-2
    * Debian package syslog-ng_1.6.7-1
    * Debian package syslog-ng_2.0.0-1etch1
    * Debian package syslog-ng_2.0.5-1
    * Debian package syslog-ng_2.0.9-4.1

To use this patch, obtain the source for syslog-ng, and the latest syslog-ng anonymizing patch. Uncompress the syslog-ng source and then apply the patch:

% tar -zxvf syslog-ng.tar.gz
% cd syslog-ng
% patch -p1 < syslog-ng-anon.diff

Then compile and install syslog-ng as normal.

how to use it

This patch adds the filter “strip”. For example:

filter f_strip { strip(<regexp>); };

This strips every match of the regular expression from the log messages to which the filter is applied, replacing each match with a fixed-length string of four dashes (“----”).

In place of a regular expression, you can put “ips”, which will replace all internet addresses with 0.0.0.0. For example:

filter f_mail { facility(mail) and strip(ips); };

You can change the replacement string by using replace instead:

replace(ips,"0.0.0.0") <--- this is the same as strip(ips)
replace(<regex>,"----") <--- this is the same as strip(regex)
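
As with any other syslog-ng filter, a filter that uses strip or replace only takes effect once it is referenced from a log statement. The following minimal sketch for a patched syslog-ng 1.6 or 2.0 shows one way to wire this up. The names s_local, f_mail_anon and d_mail and the log file path are hypothetical; only the strip(ips) form comes from the patch as described above.

```
# Sketch for a syslog-ng 1.6/2.0 built with the anonymizing patch.
# Object names and the file path are hypothetical.

source s_local { unix-stream("/dev/log"); internal(); };

# Select mail messages and replace every IP address in them with 0.0.0.0.
filter f_mail_anon { facility(mail) and strip(ips); };

destination d_mail { file("/var/log/mail.log"); };

# Only messages that pass through f_mail_anon are written to d_mail.
log { source(s_local); filter(f_mail_anon); destination(d_mail); };
```

Going by the syntax shown above, swapping strip(ips) for a replace(ips,"...") call with a string of your choice should write that string in place of each address instead.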

For a complete example, see our sample syslog-ng.conf file.

dsyslog

This section still needs to be written.