Visually Analyzing Web Logs with goaccess.io

Background

System administrators are often interested in analyzing HTTP statistics for the sites they manage. When I worked as a consultant at the USDA, one of the dashboards we built analyzed the server logs for QuickStats, a data product built by USDA/NASS. Each morning, they’d push their Nginx server log to our SFTP server, and we’d then transform and load the data into our data lake environment. That data fed a custom Tableau dashboard that we developed.

It was a lengthy process to get the dashboard into production, so for my own web logs I was looking for something faster and open source.

goaccess.io

I came across goaccess.io, an open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser. Some of its features include:

  • Completely real-time reporting
  • Support for nearly all web log formats (Apache, Nginx, S3, …)
  • Application response time tracking
  • Visitor statistics

Here’s a preview of the HTML report that it generates:

[Image: Dashboard Preview]

For a more comprehensive list of its features see here.
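
To make the two viewing modes concrete, here is a minimal sketch of a shell helper that starts goaccess either as the interactive terminal viewer or as a live-updating HTML report. The function name view_logs, the default report name report.html, and the GOACCESS_BIN override are my own placeholders for illustration; the goaccess options themselves (-o, --log-format, --real-time-html) are standard ones.

```shell
#!/bin/sh
# Sketch only: view_logs and GOACCESS_BIN are hypothetical names.
# usage: view_logs terminal|live <log file> [report.html]
view_logs() {
    mode=$1
    log_file=$2
    report=${3:-report.html}
    case $mode in
        terminal)
            # Interactive dashboard inside the terminal
            "${GOACCESS_BIN:-goaccess}" "$log_file" --log-format=COMBINED
            ;;
        live)
            # HTML report that keeps updating while goaccess is running
            "${GOACCESS_BIN:-goaccess}" "$log_file" -o "$report" \
                --log-format=COMBINED --real-time-html
            ;;
        *)
            echo "usage: view_logs terminal|live <log file> [report.html]" >&2
            return 2
            ;;
    esac
}
```

In live mode goaccess stays in the foreground and pushes updates to the open report over a WebSocket (port 7890 by default), so that port has to be reachable from the browser.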

My Use Case

My website doesn’t get much traffic, so I didn’t feel the need for real-time reporting. Instead, I chose to generate an HTML report nightly. Here’s my crontab entry:

59 23 * * * goaccess /var/log/nginx/access.log -o /var/www/website_stats/index.html --log-format=COMBINED
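
One possible refinement, not something the setup above requires: because goaccess writes straight into the web root, a failed or interrupted run could leave a partial index.html behind. The sketch below writes to a temporary file first and only moves it into place on success. The function name generate_report and the GOACCESS_BIN override are hypothetical.

```shell
#!/bin/sh
# Sketch only: generate_report and GOACCESS_BIN are hypothetical names.
# usage: generate_report <log file> <report.html>
generate_report() {
    log_file=$1
    report=$2
    # goaccess chooses the output format from the file extension,
    # so the temporary file has to keep its .html suffix.
    tmp="${report%.html}.tmp.$$.html"
    if "${GOACCESS_BIN:-goaccess}" "$log_file" -o "$tmp" --log-format=COMBINED; then
        # Same directory, so the rename replaces the old report in one step
        mv "$tmp" "$report"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Saved as a script that ends with a call such as generate_report /var/log/nginx/access.log /var/www/website_stats/index.html, it could stand in for the goaccess command in the crontab entry.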

I’ve configured my Nginx server logs to roll over each month using logrotate. As a result, the report that goaccess generates will contain usage statistics for the current month.
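
For reference, a monthly rotation policy could look roughly like the following logrotate fragment. This is a sketch under assumptions, not my actual file: the config path, the retention count, and the Nginx pid file location are all placeholders that vary by distribution.

```
# /etc/logrotate.d/nginx (assumed location)
/var/log/nginx/*.log {
    # start a fresh access.log once a month
    monthly
    # keep twelve rotated files (assumed retention)
    rotate 12
    missingok
    notifempty
    compress
    # leave the newest rotated file uncompressed as access.log.1
    delaycompress
    sharedscripts
    postrotate
        # ask Nginx to reopen its log files (pid path is an assumption)
        [ -f /var/run/nginx.pid ] && kill -USR1 "$(cat /var/run/nginx.pid)"
    endscript
}
```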

Aside: you can pass multiple log files to goaccess if desired:

goaccess access.log access.log.1
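
If logrotate also compresses older logs, they can be decompressed on the fly and piped into goaccess, which reads standard input when given - as the file name. The following is a minimal sketch assuming the rotated files sit next to access.log and share its prefix; the function name report_with_archives and the GOACCESS_BIN override are hypothetical.

```shell
#!/bin/sh
# Sketch only: report_with_archives and GOACCESS_BIN are hypothetical names.
# usage: report_with_archives <log directory> <report.html>
report_with_archives() {
    log_dir=$1
    report=$2
    # zcat -f decompresses .gz files and passes plain files through unchanged,
    # so one glob covers access.log, access.log.1 and access.log.*.gz.
    zcat -f "$log_dir"/access.log* |
        "${GOACCESS_BIN:-goaccess}" - -o "$report" --log-format=COMBINED
}
```

For example, report_with_archives /var/log/nginx /var/www/website_stats/index.html would build a single report covering every retained log rather than just the current month.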

Closing Thoughts

goaccess.io is a great open source analytics and monitoring tool for your web logs. You can simply view the output in your terminal or as an HTML report.