Denying comment spam bots
by Sebastien Mirolo on Sat, 23 Apr 2011
It is kind of fun to look through your application logs and find traces of a hacker trying to break in. It might even be intellectually stimulating to play this game of hide and seek with another human being. Unfortunately most malicious attempts hitting your server will come from bots. Those don't get discouraged. Those don't change tactics. They keep trying to brute force passwords, even when you only allow private key login in your ssh daemon. They keep trying to access PHP scripts, even when you do not have any PHP stack running on your web server. Worse, if you allow people to leave comments on your web site, you are almost guaranteed to attract spam bots that will waste precious bandwidth and mess up the statistics you use to learn about your audience.
You could hire an army of private investigators traveling around the world to unplug those bot machines. It might actually be a very cool job (I would definitely apply for it). You might even think of writing a good screenplay à la "Blade Runner" to sell to Hollywood to offset the investigation cost.
For practical matters (or just because you are not huge on adventure around the world), you might want to try setting up iptables, fail2ban and spamassassin. The idea is to use spamassassin to categorize comments as spam or not, use fail2ban to dynamically insert rules into the firewall when an IP clearly generates too much spam, and of course use iptables to prevent those machines from reaching your application stack.
iptables
Iptables comes pre-installed on all official Ubuntu distributions but unfortunately it does not come with logging enabled by default. Since we will want to verify that our setup works and drops packets from banned addresses, the first thing to do is enable iptables logging. I also enjoyed reading "Linux Firewalls Using iptables" for generic information.
$ ls /etc/iptables.*
/etc/iptables.conf
$ grep -r 'iptables.conf' /etc
/etc/network/if-up.d/load-iptables:iptables-restore < /etc/iptables.conf
$ diff -U 1 /etc/iptables.conf.prev /etc/iptables.conf
--- /etc/iptables.conf.prev 2011-03-15 23:48:03.000000000 +0000
+++ /etc/iptables.conf 2011-03-16 00:27:30.000000000 +0000
@@ -3,2 +3,3 @@
 :FORWARD DROP [0:0]
+:LOGNDROP - [0:0]
 :OUTPUT DROP [0:0]
@@ -14,2 +15,7 @@
 -A OUTPUT -m state --state NEW,RELATED,ESTABLISHED -j ACCEPT
+-A INPUT -j LOGNDROP
+-A LOGNDROP -p tcp -m limit --limit 5/min -j LOG\
    --log-prefix "Denied TCP: " --log-level 7
+-A LOGNDROP -p udp -m limit --limit 5/min -j LOG\
    --log-prefix "Denied UDP: " --log-level 7
+-A LOGNDROP -p icmp -m limit --limit 5/min -j LOG\
    --log-prefix "Denied ICMP: " --log-level 7
+-A LOGNDROP -j DROP
 COMMIT
$ iptables -N LOGNDROP
$ iptables -A INPUT -j LOGNDROP
$ iptables -A LOGNDROP -p tcp -m limit --limit 5/min -j LOG\
    --log-prefix "Denied TCP: " --log-level 7
$ iptables -A LOGNDROP -p udp -m limit --limit 5/min -j LOG\
    --log-prefix "Denied UDP: " --log-level 7
$ iptables -A LOGNDROP -p icmp -m limit --limit 5/min -j LOG\
    --log-prefix "Denied ICMP: " --log-level 7
$ iptables -A LOGNDROP -j DROP
$ iptables -L
Chain INPUT (policy DROP)
...
LOGNDROP all -- anywhere anywhere
...
Chain LOGNDROP (1 references)
target prot opt source destination
LOG tcp -- anywhere anywhere\
    limit: avg 5/min burst 5 LOG level debug prefix `Denied TCP: '
LOG udp -- anywhere anywhere\
    limit: avg 5/min burst 5 LOG level debug prefix `Denied UDP: '
LOG icmp -- anywhere anywhere\
    limit: avg 5/min burst 5 LOG level debug prefix `Denied ICMP: '
DROP all -- anywhere anywhere
...
From now on, iptables will log dropped packets through syslog. Since iptables rules are evaluated inside the kernel, these messages are kernel messages, so we first figure out how syslog is configured by looking for kern in /etc/syslog.conf.
$ grep kern /etc/syslog.conf
kern.* -/var/log/kern.log
OK, so all kernel messages are going into the /var/log/kern.log file. We will later look there to correlate the IPs banned by fail2ban with the packets dropped by iptables.
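Once logging is enabled, a quick tally of logged packets per source address helps confirm that the LOGNDROP chain is doing its job. The following is a minimal sketch and not part of the original setup: the function name denied_by_source is made up for illustration, and it assumes the usual kernel LOG line layout where the source address appears as a SRC= field.

```shell
#!/bin/sh
# Hypothetical helper: count packets logged by the LOGNDROP chain,
# grouped by source address, most frequent first.
# Assumes each logged line carries the "Denied " prefix configured
# above and a " SRC=<address>" field, as kernel LOG lines usually do.
denied_by_source() {
    logfile="${1:-/var/log/kern.log}"
    grep 'Denied ' "$logfile" \
        | sed -n 's/.* SRC=\([^ ]*\).*/\1/p' \
        | sort \
        | uniq -c \
        | sort -rn
}
```

Calling denied_by_source with no argument reads /var/log/kern.log; since the rules above are rate limited to 5 log entries per minute, the counts are a lower bound rather than an exact number of dropped packets.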
spamassassin
Spamassassin is a very well regarded spam filter for e-mails. We plan to route all comments to the web site through spamassassin as well. There does not seem to be any reason to think comment spam is any different from e-mail spam, and relying on a single spam filter daemon will reduce complexity and maintenance cost.
$ aptitude install spamassassin
$ useradd -m -s /bin/false spamassassin
$ diff -U 1 /etc/postfix/master.cf.prev postfix/master.cf
--- postfix/master.cf.prev 2011-03-16 01:00:46.000000000 +0000
+++ /etc/postfix/master.cf 2011-03-12 22:36:43.000000000 +0000
@@ -10,3 +10,3 @@
 # ==========================================================================
-smtp inet n - - - - smtpd
-submission inet n - - - - smtpd
+smtp inet n - - - - smtpd
+  -o content_filter=spamassassin
+submission inet n - - - - smtpd
+  -o content_filter=spamassassin
 # -o smtpd_tls_security_level=encrypt
@@ -81,2 +81,4 @@
 ${nexthop} ${user}
+spamassassin unix - n n - - pipe
+  user=spamassassin argv=/usr/bin/spamc -e /usr/sbin/sendmail -oi\
    -f ${sender} ${recipient}
Postfix is a very versatile Mail Transfer Agent (MTA) that can be configured in many different ways to achieve similar results. Documentation related to spamassassin and filtering that is worth reading includes "Integrating SpamAssassin with Postfix", "Postfix Virtual Domain Hosting Howto" and "Postfix After-Queue Content Filter".
We created a special user account for spamassassin and used the content_filter= method on both smtp (for incoming e-mail from outside the network) and submission (for local e-mails). As described earlier, the semilla web application submits comments as e-mails through a local account on the mail server. I would have preferred to put the spamassassin filter later, i.e. just before delivery to the local agent, but I haven't managed to do that successfully yet. Right now, spamassassin will scan all outgoing e-mails as well (content_filter on the submission agent).
At this point, we can see in /var/log/mail.log that messages are filtered through spamassassin. A little bit of testing can be done by sending something like the following e-mail:
$ echo "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"\
    | sendmail info
$ tail -f /var/log/mail.log
Mar 16 01:19:51 hostname spamd[26879]: spamd: identified spam\
    (1000.0/5.0) for spamassassin:1003 in 0.3 seconds, 1491 bytes.
Mar 16 01:19:51 hostname spamd[26879]: spamd: result: Y 1000 \
    - GTUBE,HTML_MESSAGE scantime=0.3,size=1491,user=spamassassin,uid=1003,\
    required_score=5.0,rhost=ip6-localhost,raddr=127.0.0.1,rport=55454,mid=\
    <COL117-W56538D5CED9DE43AC69E3BA6CE0@phx.gbl>,autolearn=no
fail2ban
We will now get fail2ban to dynamically insert rules for bots trying to break into ssh or obviously referencing pages that do not exist on the web site (such as PHP scripts).
$ aptitude install fail2ban
$ diff -U 3 /etc/fail2ban/jail.conf.prev /etc/fail2ban/jail.conf
--- /etc/fail2ban/jail.conf.prev 2011-03-10 15:49:53.000000000 +0000
+++ /etc/fail2ban/jail.conf 2011-03-12 23:23:34.000000000 +0000
@@ -133,7 +133,7 @@
 [apache]
-enabled = false
+enabled = true
 port = http,https
 filter = apache-auth
 logpath = /var/log/apache*/*error.log
@@ -151,7 +151,7 @@
 [apache-noscript]
-enabled = false
+enabled = true
 port = http,https
 filter = apache-noscript
 logpath = /var/log/apache*/*error.log
@@ -159,7 +159,7 @@
 [apache-overflows]
-enabled = false
+enabled = true
 port = http,https
 filter = apache-overflows
 logpath = /var/log/apache*/*error.log
$ /etc/init.d/fail2ban restart
At this point, if you do see errors like "fail2ban.server : ERROR Unexpected communication error" in /var/log/fail2ban.log, you will need to apply the following patch to /usr/bin/fail2ban-server.
$ diff -U 1 /usr/bin/fail2ban-server.prev /usr/bin/fail2ban-server
--- fail2ban-server 2011-03-16 00:48:29.000000000 +0000
+++ /usr/bin/fail2ban-server 2011-03-15 19:55:13.000000000 +0000
@@ -1,2 +1,2 @@
-#!/usr/bin/python
+#!/usr/bin/python2.5
 # This file is part of Fail2Ban.
$ /etc/init.d/fail2ban restart
We now want to insert a new jail in fail2ban for hosts that are identified as sending spam, but if we look into /var/log/mail.log for spamd messages, we can see there is no IP address associated with the originator of a mail identified as spam. A little patch to /usr/sbin/spamd that prints the first IP found in the "Received" header fields of a mail will do. At the same time, I modified the semilla web application to send mail with a specially crafted "Received" header containing the REMOTE_ADDR environment variable.
$ diff -u spamd.org /usr/sbin/spamd
--- spamd.org 2011-04-21 23:35:10.000000000 +0000
+++ /usr/sbin/spamd 2011-04-22 00:11:17.000000000 +0000
@@ -1593,7 +1593,10 @@
   my $scantime = sprintf( "%.1f", time - $start_time );
-  info("spamd: $was_it_spam ($msg_score/$msg_threshold) for\
     $current_user:$> in"
+  my @from_addrs = $mail->get_pristine_header("Received");
+  join("\n",@from_addrs) =~ m/(\[\d+\.\d+\.\d+\.\d+\])/;
+  my $from_addr = $1;
+  info("spamd: $was_it_spam ($msg_score/$msg_threshold) from\
     $from_addr for $current_user:$> in"
   . " $scantime seconds, $actual_length bytes." );
   # add a summary "result:" line, based on mass-check format
The spamd related lines in /var/log/mail.log thus now look like:
Apr 22 21:20:23 hostname spamd[17844]: spamd: identified spam\
    (999.0/5.0) from [remoteaddr] for spamassassin:1003\
    in 0.2 seconds, 2152 bytes.
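The crafted header only needs to carry the client address inside square brackets, since that is the form the regular expression in the spamd patch looks for. A minimal sketch of how a web application could build such a message follows. This is not the semilla code: the function name comment_as_mail, the web-comment label and the exact header layout are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: wrap a comment body into a mail message whose
# "Received" header carries the client address in square brackets,
# the form matched by the patched spamd (\[\d+\.\d+\.\d+\.\d+\]).
# REMOTE_ADDR is the standard CGI variable set by the web server.
comment_as_mail() {
    sender="$1"
    recipient="$2"
    printf 'Received: from web-comment ([%s])\n' "$REMOTE_ADDR"
    printf 'From: %s\n' "$sender"
    printf 'To: %s\n' "$recipient"
    printf 'Subject: comment\n'
    printf '\n'
    cat    # the comment body is read from standard input
}
```

In a CGI context the output could then be piped into something like "sendmail -oi info", the same local delivery used for the GTUBE test earlier.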
It is now trivial to add the following filter in /etc/fail2ban/filter.d/spamassassin.conf
[Definition]
failregex = spamd: identified spam .* from [[]<HOST>[]]
ignoreregex =
and the following jail in /etc/fail2ban/jail.conf
[spamassassin]
enabled = true
port = http,https,smtp,ssmtp
filter = spamassassin
logpath = /var/log/mail.log
The script fail2ban-regex is very convenient for checking that your filter expression does what you expect. Later, while the system is up and running, you can use fail2ban-client to check the status of the jail.
$ fail2ban-regex "Apr 22 21:20:23 hostname spamd[17844]: \
    spamd: identified spam (999.0/5.0) from [remoteaddr] for\
    spamassassin:1003 in 0.2 seconds, 2152 bytes." \
    "spamd: identified spam .* from [[]<HOST>[]]"
$ sudo fail2ban-client status spamassassin
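Before enabling the jail, it can also be useful to see which addresses in an existing mail.log would be caught and how often. The following is a rough sketch under the assumption that spamd has been patched as described above, so that each "identified spam" line carries the bracketed address; the function name spam_sources is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the addresses found in "identified spam"
# lines written by the patched spamd, with a hit count for each,
# most frequent first. Lines for clean messages are ignored.
spam_sources() {
    logfile="${1:-/var/log/mail.log}"
    sed -n 's/.*spamd: identified spam .* from \[\([0-9.]*\)].*/\1/p' "$logfile" \
        | sort \
        | uniq -c \
        | sort -rn
}
```

Comparing this list with the output of "fail2ban-client status spamassassin" gives a quick sanity check that the addresses being banned are the ones that actually sent spam.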
lire
At this point, iptables, spamassassin and fail2ban are configured to ban spam bots from hitting our application stack. That is all great, but without statistics and reports there is no easy way to find out how effective the solution is. So I started to investigate log reporting tools. Lire seemed the most promising, so I started there. Since lire is present in the Ubuntu repository, it is a breeze to install.
$ sudo aptitude install lire
lr_log2report seems to be the major command to generate reports.
$ lr_log2report --help dlf-converters
...
iptables Iptables firewall log
...
postfix postfix log file
...
spamassassin spamassassin log file
...
If you run into the following error while running your first report, you will have to apply a little patch to /usr/share/perl5/Lire/DlfStore.pm
$ lr_log2report postfix /var/log/mail.log
Parsing log file using postfix DLF Converter...
lr_log2report: ERROR store doesn't contain a 'lire_import_log'\
    stream at /usr/share/perl5/Lire/DlfConverterProcess.pm line 170
$ diff -u DlfStore.pm.prev /usr/share/perl5/Lire/DlfStore.pm
 sub dlf_streams {
     my $self = $_[0];
     my @streams = ();
-    my $sth = $self->{'_dbh'}->table_info( "", "", "dlf_%", "TABLE" );
-    $sth->execute();
-    while ( my $table_info = $sth->fetchrow_hashref() ) {
-        next unless $table_info->{'TABLE_NAME'} =~ /^dlf_(.*)/;
-        next if $table_info->{'TABLE_NAME'} =~ /_links$/;
-        push @streams, $1;
-    }
-    $sth->finish();
+    # JB : table_info seems to fail
+    my @table_list = $self->{'_dbh'}->tables;
+    foreach my $table ( @table_list) {
+        next unless $table =~ /dlf_(.*)"/;
+        next if $table =~ /_links$/;
+        push @streams, $1;
+    }
     return @streams;
 }
$ lr_log2report iptables /var/log/kern.log
$ lr_log2report postfix /var/log/mail.log
$ lr_log2report spamassassin /var/log/mail.log
$ lr_log2report combined /var/log/apache2/domainname-access.log
We also want to add a converter for fail2ban logs so that we can correlate fail2ban actions with packets dropped by iptables. Since fail2ban adds rules into the firewall through iptables, we will base its lire schema on the firewall schema (/usr/share/lire/schemas/firewall.xml). We then add a Fail2BanConverter.pm perl script based on one of the existing converters (for example /usr/share/perl5/Lire/Firewall/IpfilterDlfConverter.pm) and a fail2ban_init to load our converter into the lire executable. The relevant lines are:
$ cat /usr/share/perl5/Lire/Firewall/Fail2BanConverter.pm
...
sub process_log_line {
    my ( $self, $process, $line ) = @_;

    # Fields: date time name level [jail] action source
    my($date, $time, $name, $warning, $jail, $action, $source)
        = split / /, $line, 7;
    if ( $@ ) {
        $process->error( $@, $line );
        return;
    } elsif ( $action ne 'Ban' ) {
        $process->ignore_log_line( $line, "not a Ban record" );
        return;
    } else {
        use Time::Local;
        my $dlf_rec = {};
        my ($year, $month, $day, $hours, $min, $sec);
        if( "$date $time" =~
            /(\d\d\d\d)-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d),(\d\d\d)$/) {
            ($year, $month, $day, $hours, $min, $sec)
                = ($1, $2, $3, $4, $5, $6);
        } else {
            $process->error( "invalid timestamp", $line );
            return;
        }
        # timelocal expects a zero-based month (0 = January).
        my $timestamp
            = timelocal($sec,$min,$hours,$day,$month - 1,$year);
        # replace 'timelocal' with 'timegm' if your input date is GMT/UTC
        $dlf_rec->{time} = $timestamp;
        $dlf_rec->{action} = "denied";
        $dlf_rec->{protocol} = "TCP";
        $dlf_rec->{rule} = $jail;
        $dlf_rec->{from_ip} = $source;
        $dlf_rec->{count} = 1;
        $process->write_dlf( "firewall", $dlf_rec );
    }
}
...
$ cat /etc/lire/plugins/fail2ban_init
use Lire::PluginManager;
use Fail2BanConverter;
Lire::PluginManager->register_plugin( Fail2BanConverter->new() );
$ lr_log2report fail2ban /var/log/fail2ban.log
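To preview which records the converter would hand to lire, the same field split can be reproduced from the command line. This is only a sketch: the function name ban_records is hypothetical, and the log layout (date, time, logger name, level, bracketed jail, action, source) is inferred from the split in process_log_line above rather than from fail2ban documentation.

```shell
#!/bin/sh
# Hypothetical helper mirroring the converter's logic: split each
# fail2ban log line on whitespace, keep only "Ban" records, and print
# date, time, jail (without brackets) and source address.
ban_records() {
    logfile="${1:-/var/log/fail2ban.log}"
    awk '$6 == "Ban" {
        jail = $5
        gsub(/\[|\]/, "", jail)   # "[spamassassin]" becomes "spamassassin"
        print $1, $2, jail, $7
    }' "$logfile"
}
```

Unlike the converter, this strips the brackets around the jail name; the Perl code above stores the jail field exactly as it appears in the log.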
On Ubuntu, "aptitude install lire" will set up the appropriate cron jobs to send e-mail reports by running /usr/sbin/lr_vendor_cron.
$ find /etc -name '*lire*'
$ cat /etc/cron.weekly/lire
LIREUSER='lire' /usr/sbin/lr_vendor_cron weekly
$ less /usr/sbin/lr_vendor_cron
...
for d in /etc/sysconfig/lire.d /etc/default/lire.d
do
    test -d $d && CONFDIR=$d && break
done
...
for f in $CONFDIR/*.cfg
do
...
So we will add a few more .cfg files for spamassassin, iptables and fail2ban.
Be aware that testing /usr/sbin/lr_vendor_cron from the command line is a little tricky. You will most likely run into cryptic su errors because of the following line in /usr/sbin/lr_vendor_cron.
eval "$filter" < $logfile | \
    su - $LIREUSER -c \
    "lr_log2mail -s '$rotateperiod $service report from $logfile'\
    $extraopts $service root" 2>&1 | logger -p $PRIORITY -t lire
Conclusion
Voilà, we are now running spamassassin on all comments posted through the web interface. Traffic from remote machines dynamically identified as spam originators is actively dropped before reaching our application stack. One last word: we had to set up a second aliases database so that the comment archiver writes files as the correct owner.