Logrotate, S3 storage, and AWStats
by Sebastien Mirolo on Wed, 24 Aug 2016
Today we are going to push the rotated logs to an S3 bucket, then download those logs and process them with awstats.
Uploading rotated logs to S3
Nginx, in the open-source version, writes access requests to log files. The nginx logs are currently rotated with logrotate using the default settings that come out of the box on Fedora. That is, rotated log files are suffixed with the date of rotation and gzip compressed, and the nginx process is sent a USR1 signal once (i.e. the sharedscripts option) to re-open its log files.
Searching for how to access the output file in postrotate, we find that the postrotate script is passed the names of the log files (i.e. the names matching the patterns of the logrotate trigger) as $1 - in most cases.
$ diff -u prev /etc/logrotate.d/nginx
/var/log/nginx/*log {
    create 0644 nginx nginx
    daily
    rotate 10
    missingok
    notifempty
    compress
    sharedscripts
    postrotate
+       LOGS=$1
        /bin/kill -USR1 `cat /run/nginx.pid 2>/dev/null` 2>/dev/null || true
+       echo "Upload $LOGS"
    endscript
}
The description of sharedscripts specifies:

    However, if none of the logs in the pattern require rotating, the scripts will not be run at all.
To test our setup we need a reliable way to trigger rotation. Furthermore, in debug mode (-d), no changes are made to the logs, which means our rotated log won't be compressed and we are not accurately testing that things work as expected. Hence we back up the last rotated log, remove it, restore its uncompressed content as the live log, then force a run of logrotate with the -v option.
$ sudo cp /var/log/nginx/mysite-access.log-20160820.gz /var/log/nginx/mysite-access.log-20160820.gz~
$ sudo rm /var/log/nginx/mysite-access.log-20160820.gz
$ sudo sh -c 'cat /var/log/nginx/mysite-access.log-20160820.gz~ | gzip -d > /var/log/nginx/mysite-access.log'
$ sudo logrotate -v -f /etc/logrotate.conf
renaming /var/log/nginx/mysite-access.log to /var/log/nginx/mysite-access.log-20160820
creating new /var/log/nginx/mysite-access.log mode = 0664 uid = 995 gid = 992
running postrotate script
Upload /var/log/nginx/mysite-access.log
compressing log with: /bin/gzip
Many of the logrotate-to-S3 posts suggest adding your upload commands to the postrotate script, but we see here that it won't work since gzip is run after postrotate is done. There is a good reason for this: since gzip might take a long time to complete, we want nginx to re-open its log files as soon as possible and defer the compression. Fortunately, logrotate provides a lastaction script. That's where we will put our upload code.
$ diff -u prev /etc/logrotate.d/nginx
/var/log/nginx/*log {
    create 0644 nginx nginx
    daily
    rotate 10
    missingok
    notifempty
    compress
    sharedscripts
    postrotate
        /bin/kill -USR1 `cat /run/nginx.pid 2>/dev/null` 2>/dev/null || true
    endscript
+   lastaction
+       LOGS=$1
+       echo "Upload $LOGS"
+   endscript
}
As the echo command shows, we will be passed the list of file names matching the logrotate trigger. What we really want is the list of files created by the logrotate command. There does not seem to be any readily-available way to get that list of newly rotated files, so we wrote a Python script (dcopylogs) that does the work for us.
We assume here the EC2 instance has write access to the mybucket S3 bucket (see setup access control to S3 resources).
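Before wiring anything into logrotate, it is worth confirming that the instance role can actually write to the bucket. A minimal check, assuming boto3 is installed and using the mybucket name from above (check-s3-write.py is just a hypothetical name):

$ cat check-s3-write.py
#!/usr/bin/env python
# Hypothetical check: verify the instance role can write (and delete)
# under the var/log/ prefix of mybucket before logrotate relies on it.
import boto3

s3 = boto3.client('s3')
s3.put_object(Bucket='mybucket', Key='var/log/nginx/.write-test', Body=b'')
s3.delete_object(Bucket='mybucket', Key='var/log/nginx/.write-test')
print('write access to s3://mybucket OK')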
$ dcopylogs --location s3://mybucket /var/log/nginx/mysite-access.log
Upload /var/log/nginx/mysite-access.log-20160818.gz to s3://mybucket/var/log/nginx/mysite-access.log-20160818.gz
Upload /var/log/nginx/mysite-access.log-20160819.gz to s3://mybucket/var/log/nginx/mysite-access.log-20160819.gz
Upload /var/log/nginx/mysite-access.log-20160820.gz to s3://mybucket/var/log/nginx/mysite-access.log-20160820.gz
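For reference, the core of such an upload step could look like the following sketch, assuming boto3 and the path layout used above. This is only an illustration of the idea, not the actual dcopylogs source, and upload-rotated.py is a hypothetical name:

$ cat upload-rotated.py
#!/usr/bin/env python
# Hypothetical sketch: push rotated, gzip-compressed nginx logs to S3,
# mirroring the local path as the S3 key. Credentials are expected to
# come from the EC2 instance role. (The real script would also skip
# keys already present in the bucket.)
import glob
import sys

import boto3


def upload_rotated(bucket, log_path):
    s3 = boto3.client('s3')
    # Rotated files look like /var/log/nginx/mysite-access.log-20160820.gz
    for path in sorted(glob.glob(log_path + '-*.gz')):
        key = path.lstrip('/')
        print('Upload %s to s3://%s/%s' % (path, bucket, key))
        s3.upload_file(path, bucket, key)


if __name__ == '__main__':
    # Usage: upload-rotated.py mybucket /var/log/nginx/mysite-access.log
    upload_rotated(sys.argv[1], sys.argv[2])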
We run more than one server behind a load balancer, so we also added an option to insert a suffix in the uploaded log names, based on the EC2 instance-id metadata.
$ diff -u prev /etc/logrotate.d/nginx
/var/log/nginx/www.*log {
    create 0644 nginx nginx
    daily
    rotate 10
    missingok
    notifempty
    compress
    sharedscripts
    postrotate
        /bin/kill -USR1 `cat /run/nginx.pid 2>/dev/null` 2>/dev/null || true
    endscript
+   lastaction
+       LOGS=$1
+       INSTANCE_ID=`wget -q -O - http://instance-data/latest/meta-data/instance-id | sed -e s/i-/-/`
+       /usr/local/bin/dcopylogs --location s3://mybucket --logsuffix=$INSTANCE_ID $LOGS
+   endscript
}
One last step: we need to ensure that our uploading script can connect to S3 within the logrotate SELinux context. If we do not, everything might work fine when run from the command line, and yet we get weird Permission Denied errors when cron runs our upload script through logrotate.
$ sudo ausearch -ts today -i | grep 'denied.*' | audit2why
type=AVC msg=audit(09/09/2016 22:02:26.727:20284) : avc: denied { name_connect } for pid=*** comm=dcopylogs dest=80 scontext=system_u:system_r:logrotate_t:s0-s0:c0.c tcontext=system_u:object_r:http_port_t:s0 tclass=tcp_socket permissive=0

    Was caused by:
    The boolean nis_enabled was set incorrectly.
    Description:
    Allow nis to enabled

    Allow access by executing:
    # setsebool -P nis_enabled 1

$ sudo yum install setools-console
$ sesearch -A -s logrotate_t -b nis_enabled -p name_connect
Found 5 semantic av rules:
   allow nsswitch_domain portmap_port_t : tcp_socket name_connect ;
   allow nsswitch_domain reserved_port_type : tcp_socket name_connect ;
   allow nsswitch_domain port_t : tcp_socket { name_bind name_connect } ;
   allow nsswitch_domain ephemeral_port_t : tcp_socket { name_bind name_connect } ;
   allow nsswitch_domain unreserved_port_t : tcp_socket { name_bind name_connect } ;

$ sudo setsebool -P nis_enabled 1
Analyzing the nginx logs stored on S3
We assume here the analyzer EC2 instance has read access to the mybucket S3 bucket.
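A quick way to sanity check that read access from the analyzer instance, again assuming boto3 and the bucket layout above (check-s3-read.py is a hypothetical name):

$ cat check-s3-read.py
#!/usr/bin/env python
# Hypothetical check: list a few log objects to confirm the analyzer
# instance role can read from the bucket.
import boto3

s3 = boto3.client('s3')
resp = s3.list_objects_v2(
    Bucket='mybucket', Prefix='var/log/nginx/', MaxKeys=5)
for item in resp.get('Contents', []):
    print(item['Key'], item['LastModified'])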
After installing awstats, we complete the setup by changing the permissions on /var/lib/awstats (we don't want to run the scripts as root) and copying the icon assets to the web server htdocs directory.
$ sudo useradd awstats
$ sudo chown awstats:awstats /var/lib/awstats
$ mkdir -p /var/www/html/awstatsicons
$ cp -rf /usr/share/awstats/wwwroot/css /usr/share/awstats/wwwroot/icon/* /var/www/html/awstatsicons
Both webalizer and awstats use incremental processing and store state on the filesystem between runs. Both tools rely on the last timestamp processed to discard "already processed" records. This is unfortunate because our load-balanced setup produces two log files covering the same timeframe.
That's where awstats' logresolvemerge.pl comes in handy, as explained in FAQ-COM400. logresolvemerge.pl also has the advantage of working with gzipped logs, while awstats.pl does not.
$ diff -u /etc/awstats/awstats.model.conf /etc/awstats/awstats.mysite.conf
 # AWSTATS CONFIGURE FILE 7.3
-LogFile="/var/log/httpd/access_log"
+LogFile="/usr/share/awstats/tools/logresolvemerge.pl /var/downloads/var/log/nginx/*.gz |"
Once we download the logs from S3, we run the awstats update process to generate the statistics database (i.e. text files in /var/lib/awstats), then multiple output commands to generate the various html reports.
$ perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -update
$ perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -output -staticlinks > /var/www/html/awstats.mysite.html
$ perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -output=errors404 -staticlinks > /var/www/html/awstats.mysite.errors404.html
The log files are stored on S3, so we only want to keep a local copy for the time it takes to run the awstats update. We would also prefer not to download all files every time we run an update, nor to have awstats spend time realizing that previous files were already processed.
We thus updated dcopylogs to accept a --last-run filename argument, in which it stores the last (by date) rotated file downloaded for a specific log.
$ dcopylogs --download --last-run /var/lib/awstats/mysite.json --location s3://mybucket /var/log/nginx/mysite-access.log
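The idea behind --last-run is simple enough to sketch. The following is a hypothetical illustration, not the actual dcopylogs code; it assumes boto3, ignores the per-instance log suffix, and relies on keys sorting lexically by rotation date (download-new-logs.py is a made-up name):

$ cat download-new-logs.py
#!/usr/bin/env python
# Hypothetical sketch of an incremental download: remember the last
# rotated key fetched in a small JSON state file and only download
# keys that come after it.
import json
import os

import boto3


def download_new(bucket, prefix, last_run_path, dest_dir='.'):
    state = {}
    if os.path.exists(last_run_path):
        with open(last_run_path) as state_file:
            state = json.load(state_file)
    last_key = state.get('last_key', '')
    s3 = boto3.client('s3')
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # Keys end with a YYYYMMDD date, so lexical order tracks rotation date.
    for key in sorted(item['Key'] for item in resp.get('Contents', [])):
        if key <= last_key:
            continue
        local_path = os.path.join(dest_dir, key)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        print('Download s3://%s/%s to %s' % (bucket, key, local_path))
        s3.download_file(bucket, key, local_path)
        last_key = key
    with open(last_run_path, 'w') as state_file:
        json.dump({'last_key': last_key}, state_file)


if __name__ == '__main__':
    download_new('mybucket', 'var/log/nginx/mysite-access.log-',
                 '/var/lib/awstats/mysite.json', '/var/downloads')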
The final cron job will thus look like:
$ cat /etc/cron.daily/awstats
#!/bin/sh

LOGS_DIR=/var/downloads

cd $LOGS_DIR
/usr/local/bin/dcopylogs --download --last-run /var/lib/awstats/mysite.json --location s3://mybucket /var/log/nginx/mysite-access.log
perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -update
perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -output -staticlinks > /var/www/html/awstats.mysite.html
perl /usr/share/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -output=errors404 -staticlinks > /var/www/html/awstats.mysite.errors404.html
rm $LOGS_DIR/var/log/nginx/*.gz
Expire logs on S3
Now remains the task of removing logs from S3 after a period of time. We do that by adding an S3 object lifecycle policy.
$ cat mybucket-lifecycle-policy.json
{
    "Rules": [{
        "ID": "expire-logs",
        "Prefix": "var/log/",
        "Status": "Enabled",
        "Transition": {
            "Days": 90,
            "StorageClass": "GLACIER"
        },
        "Expiration" : {
            "Days": 365
        }
    }]
}
$ aws s3api put-bucket-lifecycle --bucket mybucket \
      --lifecycle-configuration file://mybucket-lifecycle-policy.json
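To confirm the policy took effect, the rules attached to the bucket can be read back, either with aws s3api get-bucket-lifecycle-configuration or, sticking with Python, with something like this hypothetical snippet:

$ cat check-lifecycle.py
#!/usr/bin/env python
# Hypothetical check: print the lifecycle rules attached to mybucket.
import boto3

s3 = boto3.client('s3')
for rule in s3.get_bucket_lifecycle_configuration(Bucket='mybucket')['Rules']:
    print(rule['ID'], rule['Status'])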
Et voila, a basic log analytics pipeline!
More to read
You might also like to read:
- Debugging logrotate scripts
- Triggering a script on uploads to AWS S3
- Fast-tracking server errors to a log aggregator on S3
- Django, Gunicorn and Syslog-ng
More technical posts are also available on the DjaoDjin blog, as well as business lessons we learned running a SaaS application hosting platform.