Logrotate, S3 storage, and AWStats

by Sebastien Mirolo on Wed, 24 Aug 2016

Today we are going to push the rotated logs to an S3 bucket, then download those logs and process them with awstats.

Uploading rotated logs to S3

Nginx, in the open-source version, writes access requests to log files. The nginx logs are currently rotated with logrotate using the default settings that come out of the box on Fedora. That is, rotated log files are suffixed with the date of rotation and gzip compressed, and the nginx process is sent a USR1 signal once (i.e. the sharedscripts option) to re-open the log files.

Searching for how to access the output file in postrotate, we find that the postrotate script is passed the names of the log files (i.e. the names matching the patterns in the logrotate configuration) as $1 - in most cases.

$ diff -u prev /etc/logrotate.d/nginx
 /var/log/nginx/*log {
    create 0644 nginx nginx
    daily
    rotate 10
    missingok
    notifempty
    compress
    sharedscripts
    postrotate
+       LOGS=$1
        /bin/kill -USR1 `cat /run/nginx.pid 2>/dev/null` 2>/dev/null || true
+       echo "Upload $LOGS"
    endscript
 }

The description of sharedscripts specifies:

However, if none of the logs in the pattern require rotating, the scripts will not be run at all.

To test our setup we need a reliable way to trigger rotation. Furthermore, in debug mode (-d) no changes are made to the logs, which means our rotated log won't be compressed and we would not be accurately testing that things work as expected. Hence we back up the last rotated log, remove it, restore it uncompressed as the current log, then force a rotation with logrotate -f, adding -v for verbose output.

$ sudo cp /var/log/nginx/mysite-access.log-20160820.gz /var/log/nginx/mysite-access.log-20160820.gz~
$ sudo rm /var/log/nginx/mysite-access.log-20160820.gz
$ sudo sh -c 'cat /var/log/nginx/mysite-access.log-20160820.gz~ | gzip -d > /var/log/nginx/mysite-access.log'
$ sudo logrotate -v -f /etc/logrotate.conf
renaming /var/log/nginx/mysite-access.log to /var/log/nginx/mysite-access.log-20160820
creating new /var/log/nginx/mysite-access.log mode = 0664 uid = 995 gid = 992
running postrotate script
Upload /var/log/nginx/mysite-access.log
compressing log with: /bin/gzip

Many logrotate-to-S3 posts suggest adding the upload commands to the postrotate script, but we see here that this won't work: gzip runs after postrotate is done, so the compressed file does not exist yet. There is a good reason for this ordering. Since gzip might take a long time to complete, we want nginx to re-open its log files as soon as possible and delay compression until afterwards. Fortunately, logrotate provides a lastaction script. That's where we will put our upload code.

$ diff -u prev /etc/logrotate.d/nginx
 /var/log/nginx/*log {
    create 0644 nginx nginx
    daily
    rotate 10
    missingok
    notifempty
    compress
    sharedscripts
    postrotate
        /bin/kill -USR1 `cat /run/nginx.pid 2>/dev/null` 2>/dev/null || true
    endscript
+   lastaction
+       LOGS=$1
+       echo "Upload $LOGS"
+   endscript
 }

As the echo command shows, we are passed the list of file names matching the logrotate trigger. What we really want is the list of files created by the logrotate command. There does not seem to be any readily-available way to get that list of newly rotated files, so we wrote a Python script (dcopylogs) that does the work for us.

We assume here the EC2 instance has write access to the mybucket S3 bucket (see setup access control to S3 resources).

$ dcopylogs --location s3://mybucket /var/log/nginx/mysite-access.log
Upload /var/log/nginx/mysite-access.log-20160818.gz to s3://mybucket/var/log/nginx/mysite-access.log-20160818.gz
Upload /var/log/nginx/mysite-access.log-20160819.gz to s3://mybucket/var/log/nginx/mysite-access.log-20160819.gz
Upload /var/log/nginx/mysite-access.log-20160820.gz to s3://mybucket/var/log/nginx/mysite-access.log-20160820.gz
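
For reference, the gist of the upload logic is to find the compressed, rotated files next to a log and copy them to S3 under a key that mirrors the local path, as in the output above. Below is only a minimal sketch, assuming boto3 and credentials provided by the EC2 instance profile; the actual dcopylogs implementation is more involved (a real script would typically also skip files already present in the bucket).

#!/usr/bin/env python
# Minimal sketch of the upload step (the real dcopylogs differs).
import glob, sys

import boto3


def upload_rotated_logs(log_path, bucket, logsuffix=''):
    """Upload compressed, rotated copies of *log_path* to *bucket*,
    using the local path (minus the leading '/') as the S3 key."""
    s3_client = boto3.client('s3')
    for name in sorted(glob.glob(log_path + '-*.gz')):
        key = name
        if logsuffix:
            # Insert the suffix before the rotation date, e.g.
            # mysite-access.log-20160820.gz -> mysite-access.log-1234abcd-20160820.gz
            # (exact naming in dcopylogs may differ).
            base, date_gz = name.rsplit('-', 1)
            key = '%s%s-%s' % (base, logsuffix, date_gz)
        key = key.lstrip('/')
        print("Upload %s to s3://%s/%s" % (name, bucket, key))
        s3_client.upload_file(name, bucket, key)


if __name__ == '__main__':
    upload_rotated_logs(sys.argv[1], 'mybucket')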

We run more than one server behind a load balancer, so we also added an option to insert a suffix in the log names on upload, passing it the EC2 instance-id metadata.

$ diff -u prev /etc/logrotate.d/nginx
 /var/log/nginx/www.*log {
    create 0644 nginx nginx
    daily
    rotate 10
    missingok
    notifempty
    compress
    sharedscripts
    postrotate
        /bin/kill -USR1 `cat /run/nginx.pid 2>/dev/null` 2>/dev/null || true
    endscript
+   lastaction
+       LOGS=$1
+       INSTANCE_ID=`wget -q -O - http://instance-data/latest/meta-data/instance-id | sed -e s/i-/-/`
+       /usr/local/bin/dcopylogs --location s3://mybucket --logsuffix=$INSTANCE_ID $LOGS
+   endscript
 }

One last step: we need to ensure that our upload script can connect to S3 within the logrotate SELinux context. If we do not, everything might work fine when run from the command line, and yet we get weird Permission denied errors when cron runs our upload script through logrotate.

$ sudo ausearch -ts today -i | grep 'denied.*' | audit2why
type=AVC msg=audit(09/09/2016 22:02:26.727:20284) : avc:  denied  { name_connect } for  pid=*** comm=dcopylogs dest=80 scontext=system_u:system_r:logrotate_t:s0-s0:c0.c tcontext=system_u:object_r:http_port_t:s0 tclass=tcp_socket permissive=0
    Was caused by:
    The boolean nis_enabled was set incorrectly.
    Description:
    Allow nis to enabled

    Allow access by executing:
    # setsebool -P nis_enabled 1

$ sudo yum install setools-console
$ sesearch -A -s logrotate_t -b nis_enabled -p name_connect
Found 5 semantic av rules:
   allow nsswitch_domain portmap_port_t : tcp_socket name_connect ;
   allow nsswitch_domain reserved_port_type : tcp_socket name_connect ;
   allow nsswitch_domain port_t : tcp_socket { name_bind name_connect } ;
   allow nsswitch_domain ephemeral_port_t : tcp_socket { name_bind name_connect } ;
   allow nsswitch_domain unreserved_port_t : tcp_socket { name_bind name_connect } ;

$ sudo setsebool -P nis_enabled 1

Analyzing the nginx logs stored on S3

We assume here the analyzer EC2 instance has read access to the mybucket S3 bucket.

After installing awstats, we complete the setup by changing the permissions on /var/lib/awstats (we don't want to run the scripts as root) and copying the icon assets to the web server htdocs directory.

$ sudo useradd awstats
$ sudo chown awstats:awstats /var/lib/awstats
$ sudo mkdir -p /var/www/html/awstatsicons
$ sudo cp -rf /usr/share/awstats/wwwroot/css /usr/share/awstats/wwwroot/icon/* /var/www/html/awstatsicons

Both webalizer and awstats use incremental processing and store state on the filesystem between runs. Both tools rely on the last timestamp processed to discard "already processed" records. This is unfortunate because, with two servers behind the load balancer, we have two log files covering the same timeframe.

That's where awstats' logresolvemerge.pl comes in handy, as explained in FAQ-COM400. logresolvemerge.pl also has the advantage of working with gzipped logs, while awstats.pl does not.
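
Before wiring it into the awstats configuration, we can sanity-check the merge from the command line (the script path below is the one installed by the awstats package on Fedora):

$ perl /usr/share/awstats/tools/logresolvemerge.pl /var/downloads/var/log/nginx/*.gz | head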

$ diff -u /etc/awstats/awstats.model.conf /etc/awstats/awstats.mysite.conf
# AWSTATS CONFIGURE FILE 7.3

-LogFile="/var/log/httpd/access_log"
+LogFile="/usr/share/awstats/tools/logresolvemerge.pl /var/downloads/var/log/nginx/*.gz |"

Once we download the logs from S3, we run the awstats update process to generate the statistics database (i.e. text files in /var/lib/awstats), then multiple output commands to generate the various html reports.

$ perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -update
$ perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -output -staticlinks > /var/www/html/awstats.mysite.html
$ perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -output=errors404 -staticlinks > /var/www/html/awstats.mysite.errors404.html

The log files are stored on S3, so we only want to keep a local copy for the time it takes to run the awstats update. We would also prefer not to download every file each time we run an update, nor to have awstats spend time discovering that previous files were already processed.

We thus updated dcopylogs to accept a --last-run filename argument, which stores the last (by date) rotated file downloaded for a specific log.

$ dcopylogs --download --last-run /var/lib/awstats/mysite.json --location s3://mybucket /var/log/nginx/mysite-access.log
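
The gist of --last-run is to remember the last rotated file downloaded and skip anything older on the next run. Below is only a rough sketch of the idea, assuming boto3, a simple JSON state file and S3 keys that sort by rotation date; the actual dcopylogs implementation may differ.

#!/usr/bin/env python
# Rough sketch of the --last-run download step (the real dcopylogs differs).
import json, os

import boto3


def download_new_logs(bucket, prefix, last_run_path, dest_dir='.'):
    """Download rotated logs under *prefix* that are newer than the one
    recorded in *last_run_path*, then update that record."""
    last_seen = ''
    if os.path.exists(last_run_path):
        with open(last_run_path) as state_file:
            last_seen = json.load(state_file).get('last', '')
    s3_client = boto3.client('s3')
    resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    for key in sorted(obj['Key'] for obj in resp.get('Contents', [])):
        if key <= last_seen:
            continue  # already downloaded on a previous run
        local_path = os.path.join(dest_dir, key)
        local_dir = os.path.dirname(local_path)
        if local_dir and not os.path.isdir(local_dir):
            os.makedirs(local_dir)
        s3_client.download_file(bucket, key, local_path)
        last_seen = key
    with open(last_run_path, 'w') as state_file:
        json.dump({'last': last_seen}, state_file)


if __name__ == '__main__':
    download_new_logs('mybucket', 'var/log/nginx/mysite-access.log',
                      '/var/lib/awstats/mysite.json')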

The final cron job will thus look like:

$ cat /etc/cron.daily/awstats
#!/bin/sh

LOGS_DIR=/var/downloads
cd $LOGS_DIR
/usr/local/bin/dcopylogs --download --last-run /var/lib/awstats/mysite.json --location s3://mybucket /var/log/nginx/mysite-access.log

perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -update
perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -output -staticlinks > /var/www/html/awstats.mysite.html
perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -output=errors404 -staticlinks > /var/www/html/awstats.mysite.errors404.html

rm $LOGS_DIR/var/log/nginx/*.gz

Expiring logs on S3

Now remains the task of removing logs from S3 after a period of time. We do that by adding an S3 object lifecycle policy.

$ cat mybucket-lifecycle-policy.json
{
 "Rules": [
 {
    "ID": "expire-logs",
    "Prefix": "var/log/",
    "Status": "Enabled",
    "Transition": {
      "Days": 90,
      "StorageClass": "GLACIER"
    },
    "Expiration" : {
      "Days": 365
    }
 }]
}

$ aws s3api put-bucket-lifecycle --bucket mybucket  \
    --lifecycle-configuration file://mybucket-lifecycle-policy.json

Et voilà, a basic log analytics pipeline!

More to read

You might also like to read Django, Gunicorn and Syslog-ng next.

More technical posts are also available on the DjaoDjin blog, as well as business lessons we learned running a subscription hosting platform.
