Visually Analyzing Web Logs with goaccess.io
Background
System administrators often want to analyze HTTP statistics for the sites they manage. When I worked as a consultant at the USDA, one of the dashboards we built analyzed the server logs for QuickStats, a data product built by USDA/NASS. Each morning, they’d push their Nginx server log to our SFTP server, and we’d then transform and load the data into our data lake environment. The data fed a custom Tableau dashboard that we developed.
Getting that dashboard into production was a lengthy process, so I was looking for something faster and open source for analyzing my web logs.
goaccess.io
I came across goaccess.io, an open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser. Some of its features include:
- Completely real time
- Support for nearly all web log formats (Apache, Nginx, S3, …)
- Application response time tracking
- Visitor statistics
Here’s a preview of the HTML report that it generates:
For a more comprehensive list of its features see here.
My Use Case
My website doesn’t get much traffic, so I didn’t feel the need for real-time reporting. Instead, I chose to generate an HTML report nightly. Here’s my crontab entry:
59 23 * * * goaccess /var/log/nginx/access.log -o /var/www/website_stats/index.html --log-format=COMBINED
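One thing to keep in mind with a cron job like this is that goaccess writes directly to the file Nginx is serving, so a failed run could leave a broken page behind. Here is a minimal sketch of a wrapper, assuming bash. It writes the report to a temporary file and renames it into place only if goaccess succeeds. The function name nightly_report and the .tmp.html suffix are my own placeholders, not part of goaccess.

```shell
# Hypothetical wrapper around the goaccess call from the crontab entry.
# Usage: nightly_report <access log> <target .html file>
nightly_report() {
    local log_file="$1"   # e.g. /var/log/nginx/access.log
    local out_file="$2"   # e.g. /var/www/website_stats/index.html
    # Keep the .html extension on the temporary file, since goaccess
    # chooses its output format from the extension given to -o.
    local tmp_file="${out_file%.html}.tmp.html"

    if goaccess "$log_file" -o "$tmp_file" --log-format=COMBINED; then
        # Same directory, so mv is a rename and the old report stays
        # in place until the new one is complete.
        mv "$tmp_file" "$out_file"
    else
        # Leave the previous report untouched and discard the partial file.
        rm -f "$tmp_file"
        return 1
    fi
}
```

Saved in a script that ends with a call such as nightly_report /var/log/nginx/access.log /var/www/website_stats/index.html, this could stand in for the direct goaccess command in the crontab entry.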
I’ve configured my Nginx server logs to roll over each month using logrotate. As a result, the report that goaccess generates covers only the current month, since it reads just the active log file.
Aside: you can pass multiple log files to goaccess if desired:
goaccess access.log access.log.1
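Rotated logs are often gzip-compressed, depending on how logrotate is set up. One option in that case is to decompress everything onto a pipe and have goaccess read standard input through the - argument. The sketch below assumes GNU gzip's zcat, where -f passes uncompressed files through unchanged; report_all_logs is a hypothetical helper name of mine, not a goaccess feature.

```shell
# Hypothetical helper: build one report from a mix of plain and
# gzip-compressed log files.
# Usage: report_all_logs <target .html file> <log file>...
report_all_logs() {
    local out_file="$1"
    shift
    # zcat -f decompresses .gz files and copies plain files as-is,
    # so the files are concatenated in the order given.
    # The trailing - tells goaccess to read the log from standard input.
    zcat -f -- "$@" | goaccess --log-format=COMBINED -o "$out_file" -
}
```

For example, report_all_logs report.html access.log.2.gz access.log.1 access.log would cover three months in a single report, oldest first.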
Closing Thoughts
goaccess.io is a great open source analytics and monitoring tool for your web logs. You can simply view the output in your terminal or as an HTML report.