Welcome to the deep dive, the knowledge shortcut you need to master a mountain of
sources quickly.
Today we are opening a really fascinating window into internet history, the server
log. Before all
the modern JavaScript-heavy tracking became the norm, all the intelligence about
who was visiting
your site, and how well it was doing, was hidden right there in these cryptic files. We are
diving deep into
the technology that helped make sense of it all. Now, if you're new to web
operations, maybe think
of a server log like this. It's the quiet, meticulous security guard who writes
down every
single interaction at the door, every request, every failure, every successful
handshake.
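To make that "security guard's notebook" concrete, here is a minimal sketch of what one of those entries looks like and how a script might read it. The log line is in Apache's standard combined format; the regex is an illustrative simplification for this example, not AWStats's actual parser.

```python
import re

# One entry in Apache's "combined" log format -- the kind of line AWStats reads.
LINE = ('203.0.113.7 - - [10/Oct/2004:13:55:36 -0700] '
        '"GET /index.html HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I)"')

# Simplified pattern for the combined format (host, timestamp, request,
# status, bytes, referrer, user agent).
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"')

fields = COMBINED.match(LINE).groupdict()
print(fields['host'])    # who knocked on the door
print(fields['status'])  # how the "handshake" went
```

Every metric discussed in this episode -- visitors, errors, referrers, browsers -- is ultimately extracted from fields like these.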
It's dense raw data but it's full of gold. Before we unlock this data though, we
want to thank the
supporter of this deep dive, SafeServer. They specialize in hosting software,
helping you
manage tools just like the one we are discussing today and supporting your digital
transformation.
You can find more information at www.saveserver.de. Okay, so our mission today is
really to unpack
AWStats. That stands for Advanced Web Statistics. This tool is, well, it's
foundational because it
was free, powerful, and it genuinely made web analytics accessible, even for
beginners back
then. It's the thing that took that raw, pretty technical server data and turned it
into graphical,
actionable intelligence. Fundamentally, AWStats is a powerful log analyzer. It was
distributed
under the GNU General Public License, the GPL, and that license detail is actually
key because
it means the software was always free, always community-driven. It didn't just
analyze web
servers either. It could generate advanced stats for streaming services, FTP, even
mail servers,
and deliver all that rich data visually. Right, so let's start with the basics
maybe for those
who haven't actually touched a server log before. If the log is just this massive
text file of
transactions, how did AWStats work its magic? What is a log file analyzer,
practically speaking?
Think of it as a translator. Log data is messy, right? Timestamps, IP addresses,
requested files, status codes. It's almost impossible to read manually and get any
real
insight. AWStats takes that whole stream and, in its own words, transforms it into
understandable
reports in a few graphical web pages. Basically, it turns raw log lines into bar
charts and pie
graphs, makes it visual. And what's fascinating here, reading through the sources,
is just how
compatible it was. This wasn't some niche script just for certain platforms, was it?
Not at all.
It was, you could say, an equal opportunity analyzer. It could handle logs from
pretty much
every major server tool available at the time. We're talking Apache logs, the NCSA
combined or
common formats, IIS logs using the W3C format, WebStar logs, plus logs from specialized
proxy, WAP,
and streaming servers. That kind of breadth, that compatibility, it's really a hallmark
of truly
foundational software. And how did this, you know, relatively small free tool
handle all the heavy
lifting? Running a log analyzer on big servers sounds like it could be pretty
intensive. Yeah,
that's where some clever engineering comes in. AWStats was designed to be
lightweight and flexible.
It was primarily written in Perl, which, okay, itself requires Perl 5.007003 or
higher for the
more modern versions. And you could run it right from the command line, or you
could execute it
dynamically as a CGI script. That flexibility was key. Hold on. If it's running off
a lightweight
language like Perl, how did it possibly manage enterprise scale log files? I mean,
we could be
talking about logs of practically unlimited size on busy servers. Wouldn't that
just grind the system
to a halt? That's an excellent point, and it really hits on the core challenge of
early log analysis.
To process these potentially massive log files quickly and often, it uses something
called a
partial information file. So instead of recalculating every single metric from the
raw log every single
time, which would take hours, it stores intermediate data in that partial file.
This means subsequent
runs are incredibly fast. It saves a ton of server resources. Ah, okay. So it sort
of pre-digested the
information, making the updates almost instantaneous even if the raw log file kept
getting bigger and
bigger. That's pretty clever resource management. Precisely. And that efficiency
was absolutely
crucial because it meant it could handle logs that were split across multiple files
or even logs that
weren't perfectly sorted, which believe me was a common headache when you're
dealing with large
load-balanced systems. And for the newcomer, the setup was famously simple. The
documentation
literally said just one configuration file to edit. That low barrier to entry for
something
so powerful, that's a big reason why it became so popular globally. That simplicity
is remarkable.
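The "partial information file" trick described a moment ago can be sketched in a few lines. This is a toy version of the incremental principle only; the real AWStats on-disk format is different and this assumes nothing about it. The idea: remember how far into the log you got (a byte offset) plus the running totals, and on the next run parse only the new tail.

```python
import json
import os

STATE_FILE = 'stats.partial.json'   # our toy "partial information file"

def update_stats(log_path):
    # Load the totals and resume point from the previous run, if any.
    state = {'offset': 0, 'hits': 0, 'errors': 0}
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            state = json.load(f)
    with open(log_path) as log:
        log.seek(state['offset'])        # skip everything already counted
        for line in log:
            state['hits'] += 1
            if ' 404 ' in line:          # crude status check, just for the sketch
                state['errors'] += 1
        state['offset'] = log.tell()     # remember where to resume next time
    with open(STATE_FILE, 'w') as f:
        json.dump(state, f)              # persist the pre-digested totals
    return state
```

Because each run only touches bytes written since the last run, the cost of an update stays roughly proportional to the new traffic, not to the total size of the log -- which is why subsequent runs felt almost instantaneous.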
But here's where the data gets really juicy, I think. Most people might think of
server logs
as just basic traffic counters, page views, maybe hits. But AWStats extracted this
surprising level
of detail that, well, even modern tag-based analytics sometimes overlooks. Let's
get into
the specifics of what this tool was revealing. Okay, so if we start with traffic
and timing,
it definitely gave you the essentials. Number of visits, unique visitors, average
visit duration,
standard stuff. But it synthesized this data to provide genuine business
intelligence. It showed
your site's rush hours and rush days. It detailed pages, hits, kilobytes
transferred for each hour
of the day or each day of the week. That tells you exactly when to schedule server
maintenance,
right? Or maybe when to launch your most important content. It shifts you from just
counting things
to actually planning. Exactly. But the granularity, it deepens significantly when
we look at visitor
identity and technology. This is where AWStats really, really shined in that fragmented
early
internet era. Using GeoIP detection, it could automatically identify 269 domains and
countries.
And maybe more importantly for web developers back then, it identified the
technical stack of
the visitor. We're talking detecting 35 different operating systems and, in its core set,
97 different
browsers. 97 browsers? Wow. That sounds like a testing nightmare for developers. It
was a
fragmentation nightmare, which is exactly why this tool was so critical. Just the
ability to
automatically identify that fragmentation, that was power. And furthermore, if you
use the specific
browser library it offered, that number jumped to over 450 detected browsers,
including various
phone and mobile clients. That level of detail. It's something modern analytics
often just aggregates
away, you know. That kind of specificity must have been absolutely vital for making
decisions
about feature support, especially back in the early 2000s when tech wasn't nearly
as standardized as
it is now. Absolutely. And that brings us neatly to the technical insights. This is
maybe the key
insight for understanding historical web development challenges. AWStats could
report the ratio of
visitors whose browsers supported critical, often non-standard features like Java,
Flash,
Real G2, QuickTime, WMA, even PDF readers. Knowing this dictated huge development
decisions. I mean,
if only 10% of your audience supported Flash, you knew building your main
navigation in Flash
would basically kill your traffic. This wasn't just a nice-to-have feature. This
was like a
fundamental operational requirement. It told developers whether their audience
could even
consume the content they just spent weeks building. That's a fascinating look at
the
historical constraints of the web, really put things in perspective. Moving beyond
technical
capabilities, what did AWStats tell us about why people found the site in the first
place,
the marketing side? Right, that falls under marketing and search insights. Because
the
log file records the referrer, basically, where the visitor came from, AWStats
could reverse
engineer the search process. It detected search engines, key phrases, and keywords
used to find
the site. And get this, it recognized 115 of the most famous search engines at the
time.
115 search engines. That truly is a historical snapshot, isn't it? You're talking
about a world
where Google was just one player among many alongside giants like Yahoo and the venerable
AltaVista. It captures that moment in time perfectly. And then we have the security
and
maintenance data. This was equally critical for system administrators. It tracked
visits by
319 different automated robots or bots, helping admins separate human traffic from
crawlers,
which is crucial for accurate stats. And really importantly, it detected five
families of worm
attacks, giving you real time, well, almost real time security warnings right from
your logs.
Plus it reported all the HTTP errors like the classic page not found, 404 errors.
And for
maintenance, this was great. It showed the last refer for that bad link. So you
could immediately
go fix broken internal links or, you know, contact an external site that was sending
traffic to a dead page on your site. That's incredibly powerful diagnostics
straight from
the log file. And I love the final little quirk our sources mentioned here. It
actually tracked
the number of times the site was added to favorites bookmarks. That's pure old
school
engagement data, isn't it? Extracted directly from the server transaction. Yeah, it
just shows the
breadth of metadata that was actually available in those raw transactions if you
had the right
tool to pull it out. Okay, so we've established that AWStats was powerful, it was
free,
and it was incredibly detailed for what was essentially a single-config-file Perl
script.
This obviously made it super popular with individual users, site owners. But what
about
the pros? Like web hosting providers managing hundreds, maybe thousands of sites.
Did this
simple tool scale up to meet those kinds of enterprise demands? It absolutely did.
The
tool's flexibility really made it ideal for providers. Crucially, it supported
multi-named
websites or what we usually call virtual servers or virtual hosts. This meant a
hosting company
could run just one instance of AWStats and efficiently analyze the separate log
files
for dozens, even hundreds of their clients. The output flexibility was also key for
integration
purposes. Reports could be generated dynamically via CGI, maybe on demand, or you
could generate
static HTML or XHTML pages. Perfect for just dropping into a client's control panel
or portal.
Our sources even note that experimental PDF export was possible at some point.
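In practice, that multi-site setup worked through named configuration files: one small config per virtual host, all driven by the same scripts. A hedged sketch follows; the directive names (`LogFile`, `SiteDomain`, `DirData`) and the two scripts are standard AWStats ones, while the site names and paths are made up for illustration.

```
# /etc/awstats/awstats.client1.example.conf -- one file per virtual host
LogFile="/var/log/apache2/client1.example-access.log"
SiteDomain="client1.example"
DirData="/var/lib/awstats/client1.example"

# Typical provider workflow: update the stats, then emit static HTML
# pages to drop into the client's control panel:
#   perl awstats.pl -config=client1.example -update
#   perl awstats_buildstaticpages.pl -config=client1.example \
#        -dir=/var/www/stats/client1.example
```

Repeat the pair of commands per config file (usually from cron) and one AWStats installation serves an entire fleet of client sites.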
And given that this tool was digging around in potentially sensitive server data,
I assume security and maybe extensibility were thought about, built in.
Yes, definitely. Security was baked in. Notably, it included protection against
cross-site scripting
attacks, XSS attacks, that was important. And its extensibility was actually
massive. It supported
numerous options, filters, and plugins. Things like reverse DNS lookup to turn IP
addresses into
host names. And for developers who wanted to manipulate the analysis data outside
the tool
itself, it offered the ability to store the results in XML format, which you could
then process with
XSLT or other tools. It truly provided both the raw insights and the tools to
customize how you use
them. So we have this really foundational, highly capable, free tool that was
basically the backbone
of web analytics for many years. But here's the inevitable question, the status
update.
What is the health of the AWStats project today? Where does it stand?
Right. And here is the critical status update for anyone still using it or
considering it.
AWStats is now essentially transitioning into a legacy phase. The original author,
Laurent Destailleur, who interestingly is also the project leader of Dolibarr ERP/CRM,
he's no longer developing new versions himself. Version 8.0, which was actually
released back on
August 26, 2025, is planned to be the last version released by the original author.
So this means that
future maintenance, any bug fixes, any new feature development, it's all going to
rely entirely on
the community stepping up. It's shifting toward that classic open source community
support model now.
Okay, so for our listeners who really rely on this kind of powerful log-based
analysis
and maybe especially value data ownership, you know, the ability to keep their
server data local
under their own control, what's the recommended migration path now that the
original development
is wrapping up? Well, the clearest and probably most recommended migration path,
especially for
those prioritizing open source principles and data control, is Matomo. Specifically,
they should look
at Matomo Log Analytics. This tool really maintains that core principle of
analyzing raw server logs.
It avoids client-side tracking tags, just like AWStats, and it offers a pretty
smooth path for
users looking to transition, potentially bringing their historical data and
analysis methods over
from AWStats. Wow. Okay, we've covered a truly remarkable piece of internet
infrastructure today.
We explored how a relatively simple single config file Perl script called AWStats
managed to transform
these cryptic server logs into clear graphical intelligence. It provided this just
unparalleled
detail for its time, tracking everything from, you know, five families of worm
attacks to the
specific browser capabilities that literally dictated early web design decisions,
all while
being free and open source. And here is the final provocative thought we want to
leave you with today.
AWStats proved years ago that you could extract incredible, really granular data
about user
behavior, their OS, their browser capabilities, their geography, all from the
server-side log file.
This required no third-party JavaScript, no cookies, no client-side tracking tags
whatsoever.
So if all that depth of data was sitting right there in the log file the whole time,
what valuable metrics are we potentially missing today by relying so heavily, maybe
solely,
on JavaScript-based analytics, analytics that are prone to ad blockers, network
issues,
and increasing privacy limitations? It really makes you wonder how much visibility,
maybe even control, we've handed over in the process. A fascinating question to ponder
indeed.
Thank you for joining us for this deep dive. We want to extend one final thanks to
our sponsor,
SafeServer, for supporting this exploration and for assisting with digital
transformation.