Today's Deep-Dive: AWStats
Ep. 304


Episode description

This episode explores AWStats, a foundational log analyzer that transformed raw server log data into accessible graphical insights. Before the dominance of JavaScript-heavy tracking, AWStats provided detailed analytics by processing server logs, which acted as meticulous records of website interactions. Distributed under the GPL, AWStats was free, powerful, and community-driven, making web analytics accessible even to beginners. It could analyze logs from various servers, including Apache, and presented data through user-friendly reports with charts and graphs. AWStats’s efficiency stemmed from using partial information files to store intermediate data, enabling rapid processing of large log files. The tool offered granular insights into visitor identity, technology (detecting hundreds of browsers and operating systems), and technical capabilities like Java or Flash support, which were crucial for early web development decisions. It also provided marketing insights by identifying search engines and keywords used to find a site, alongside security data like bot tracking and worm attack detection. For hosting providers, AWStats supported multi-named websites, allowing efficient analysis of numerous client logs. While the original author is no longer developing new versions, the project is transitioning to a community-supported model. For users prioritizing open source and data control, Matomo’s log analytics is recommended as a migration path. The document concludes by questioning the potential loss of visibility and control in modern analytics due to over-reliance on client-side tracking, highlighting AWStats’s server-side data extraction capabilities as a testament to what was achievable.

Gain digital sovereignty now and save costs

Let’s have a look at your digital challenges together. What tools are you currently using? Are your processes optimal? What is the state of your backups and security updates?

Digital sovereignty is easily achieved with Open Source software (which usually costs far less, too). Our division Safeserver offers hosting, operation, and maintenance for countless Free and Open Source tools.

Try it now!

Download transcript (.srt)
0:00

Welcome to the deep dive, the knowledge shortcut you need to master a mountain of

0:03

sources quickly.

0:04

Today we are opening a really fascinating window into internet history, the server

0:09

log. Before all

0:10

the modern JavaScript-heavy tracking became the norm, all the intelligence about

0:15

who was visiting

0:15

your site, and how, was hidden right there in these cryptic files. We are

0:20

diving deep into

0:21

the technology that helped make sense of it all. Now, if you're new to web

0:24

operations, maybe think

0:26

of a server log like this. It's the quiet, meticulous security guard who writes

0:31

down every

0:31

single interaction at the door, every request, every failure, every successful

0:36

handshake.
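
To make that notebook concrete: a single entry in the widely used combined log format looks roughly like this (host, path, and browser are invented for illustration):

    203.0.113.7 - - [12/Mar/2004:10:15:32 +0100] "GET /index.html HTTP/1.1" 200 5432 "http://www.example.com/start.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

One such line per request: the visitor's IP address, the timestamp, what was asked for, whether it worked (the 200), how many bytes went out, where the visitor came from, and which browser they used.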

0:36

It's dense raw data but it's full of gold. Before we unlock this data though, we

0:40

want to thank the

0:41

supporter of this deep dive, SafeServer. They specialize in hosting software,

0:45

helping you

0:45

manage tools just like the one we are discussing today and supporting your digital

0:49

transformation.

0:50

You can find more information at www.safeserver.de. Okay, so our mission today is

0:54

really to unpack

0:56

AWStats. That stands for Advanced Web Statistics. This tool is, well, it's

1:01

foundational because it

1:02

was free, powerful, and it genuinely made web analytics accessible, even for

1:06

beginners back

1:07

then. It's the thing that took that raw, pretty technical server data and turned it

1:12

into graphical,

1:13

actionable intelligence. Fundamentally, AWStats is a powerful log analyzer. It was

1:18

distributed

1:19

under the GNU general public license, the GPL, and that license detail is actually

1:24

key because

1:24

it means the software was always free, always community-driven. It didn't just

1:29

analyze web

1:30

servers either. It could generate advanced stats for streaming services, FTP, even

1:34

mail servers,

1:35

and deliver all that rich data visually. Right, so let's start with the basics

1:38

maybe for those

1:39

who haven't actually touched a server log before. If the log is just this massive

1:42

text file of

1:43

transactions, how did AWStats work its magic? What is a log file analyzer,

1:47

practically speaking?

1:49

Think of it as a translator. Log data is messy, right? Timestamps, IP addresses,

1:55

requested files, status codes. It's almost impossible to read manually and get any

2:00

real

2:00

insight. AWStats takes that whole stream and, in its own words, transforms it into

2:06

understandable

2:07

reports using few graphical web pages. Basically, it turns lines of code into bar

2:12

charts and pie

2:13

graphs, makes it visual. And what's fascinating here, reading through the sources,

2:16

is just how

2:16

compatible it was. This wasn't some niche script just for certain platforms, was it?

2:21

Not at all.

2:21

It was, you could say, an equal opportunity analyzer. It could handle logs from

2:25

pretty much

2:26

every major server tool available at the time. We're talking Apache logs, the NCSA

2:30

combined or

2:30

common formats, IIS logs using the W3C format, WebStar, plus logs from specialized

2:36

proxy, WAP,

2:37

and streaming servers. That kind of breadth, that compatibility, it's really a hallmark

2:41

of truly

2:41

foundational software. And how did this, you know, relatively small free tool

2:45

handle all the heavy

2:46

lifting? Running a log analyzer on big servers sounds like it could be pretty

2:50

intensive. Yeah,

2:51

that's where some clever engineering comes in. AWStats was designed to be

2:55

lightweight and flexible.

2:56

It was primarily written in Perl, which, okay, itself requires Perl 5.007003 or

3:03

higher for the

3:04

more modern versions. And you could run it right from the command line, or you

3:07

could execute it

3:08

dynamically as a CGI script. That flexibility was key. Hold on. If it's running off

3:12

a lightweight

3:13

language like Perl, how did it possibly manage enterprise scale log files? I mean,

3:18

we could be

3:18

talking about logs of practically unlimited size on busy servers. Wouldn't that

3:23

just grind the system

3:24

to a halt? That's an excellent point, and it really hits on the core challenge of

3:27

early log analysis.

3:28

To process these potentially massive log files quickly and often, it uses something

3:33

called a

3:34

partial information file. So instead of recalculating every single metric from the

3:38

raw log every single

3:38

time, which would take hours, it stores intermediate data in that partial file.

3:42

This means subsequent

3:43

runs are incredibly fast. It saves a ton of server resources. Ah, okay. So it sort

3:48

of pre-digested the

3:49

information, making the updates almost instantaneous even if the raw log file kept

3:54

getting bigger and

3:54

bigger. That's pretty clever resource management. Precisely. And that efficiency

3:58

was absolutely

3:59

crucial because it meant it could handle logs that were split across multiple files

4:03

or even logs that

4:04

weren't perfectly sorted, which believe me was a common headache when you're

4:08

dealing with large

4:09

load-balanced systems. And for the newcomer, the setup was famously simple. The

4:14

documentation

4:15

literally said just one configuration file to edit. That low barrier to entry for

4:21

something

4:21

so powerful, that's a big reason why it became so popular globally. That simplicity

4:25

is remarkable.
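
For a sense of what that setup looked like in practice, here is a rough sketch (file names, paths, and values are illustrative, not quoted from the episode): you pointed the one config file at your log, told AWStats the log format and where to keep its intermediate data, and then ran the update.

    # /etc/awstats/awstats.www.example.com.conf  (illustrative values)
    LogFile="/var/log/apache2/access.log"
    LogFormat=1                      # 1 = NCSA combined log format
    SiteDomain="www.example.com"
    DirData="/var/lib/awstats"       # the "partial information" files are stored here

    # parse any new log lines and update the stored statistics
    perl awstats.pl -config=www.example.com -update

Because the update only processes lines it hasn't seen yet and folds them into the data files in DirData, re-running it stays fast even as the raw log keeps growing.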

4:27

But here's where the data gets really juicy, I think. Most people might think of

4:31

server logs

4:31

as just basic traffic counters, page views, maybe hits. But AWStats extracted this

4:36

surprising level

4:37

of detail that, well, even modern tag-based analytics sometimes overlooks. Let's

4:42

get into

4:43

the specifics of what this tool was revealing. Okay, so if we start with traffic

4:47

and timing,

4:48

it definitely gave you the essentials. Number of visits, unique visitors, average

4:51

visit duration,

4:52

standard stuff. But it synthesized this data to provide genuine business

4:57

intelligence. It showed

4:58

your site's rush hours and rush days. It detailed pages, hits, kilobytes

5:05

transferred for each hour

5:07

of the day or each day of the week. That tells you exactly when to schedule server

5:10

maintenance,

5:10

right? Or maybe when to launch your most important content. It shifts you from just

5:14

counting things

5:15

to actually planning. Exactly. But the granularity, it deepens significantly when

5:20

we look at visitor

5:21

identity and technology. This is where AWStats really, really shined in that fragmented

5:27

early

5:28

internet era. Using GeoIP detection, it could automatically identify 269 domains and

5:33

countries.
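
For the curious, that country lookup was typically switched on with a plugin line in the config file, along these lines (the database path is just an example):

    # resolve visitor IP addresses to countries via the GeoIP Perl module
    LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"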

5:34

And maybe more importantly for web developers back then, it identified the

5:38

technical stack of

5:39

the visitor. We're talking detecting 35 different operating systems and a core set of

5:43

97 different

5:44

browsers. 97 browsers? Wow. That sounds like a testing nightmare for developers. It

5:49

was a

5:49

fragmentation nightmare, which is exactly why this tool was so critical. Just the

5:54

ability to

5:54

automatically identify that fragmentation, that was power. And furthermore, if you

5:59

use the specific

6:00

browser library it offered, that number jumped to over 450 detected browsers,

6:05

including various

6:06

phone and mobile clients. That level of detail. It's something modern analytics

6:11

often just aggregates

6:12

away, you know. That kind of specificity must have been absolutely vital for making

6:16

decisions

6:16

about feature support, especially back in the early 2000s when tech wasn't nearly

6:21

as standardized as

6:22

it is now. Absolutely. And that brings us neatly to the technical insights. This is

6:28

maybe the key

6:28

insight for understanding historical web development challenges. AWStats could

6:33

report the ratio of

6:34

visitors whose browsers supported critical, often non-standard features like Java,

6:39

Flash,

6:39

Real G2, QuickTime, WMA, even PDF readers. Knowing this dictated huge development

6:46

decisions. I mean,

6:46

if only 10% of your audience supported Flash, you knew building your main

6:50

navigation in Flash

6:51

would basically kill your traffic. This wasn't just a nice-to-have feature. This

6:55

was like a

6:55

fundamental operational requirement. It told developers whether their audience

6:59

could even

6:59

consume the content they just spent weeks building. That's a fascinating look at

7:03

the

7:03

historical constraints of the web, really put things in perspective. Moving beyond

7:07

technical

7:08

capabilities, what did AWStats tell us about why people found the site in the first

7:11

place,

7:12

the marketing side? Right, that falls under marketing and search insights. Because

7:16

the

7:16

log file records the referrer, basically, where the visitor came from, AWStats

7:21

could reverse

7:22

engineer the search process. It detected search engines, key phrases, and keywords

7:26

used to find

7:26

the site. And get this, it recognized 115 of the most famous search engines at the

7:31

time.

7:32

115 search engines. That truly is a historical snapshot, isn't it? You're talking

7:36

about a world

7:37

where Google was just one player among many alongside giants like Yahoo and the venerable

7:42

AltaVista. It captures that moment in time perfectly. And then we have the security

7:47

and

7:47

maintenance data. This was equally critical for system administrators. It tracked

7:51

visits by

7:52

319 different automated robots or bots, helping admins separate human traffic from

7:58

crawlers,

7:59

which is crucial for accurate stats. And really importantly, it detected five

8:03

families of worm

8:04

attacks, giving you real time, well, almost real time security warnings right from

8:09

your logs.

8:10

Plus it reported all the HTTP errors like the classic page not found, 404 errors.

8:17

And for

8:18

maintenance, this was great. It showed the last referrer for that bad link. So you

8:21

could immediately

8:22

go fix broken internal links or, you know, contact an external site that was sending

8:27

traffic to a dead page on your site. That's incredibly powerful diagnostics

8:30

straight from

8:31

the log file. And I love the final little quirk our sources mentioned here. It

8:35

actually tracked

8:36

the number of times the site was added to favorites or bookmarks. That's pure old

8:41

school

8:41

engagement data, isn't it? Extracted directly from the server transaction. Yeah, it

8:44

just shows the

8:45

breadth of metadata that was actually available in those raw transactions if you

8:48

had the right

8:48

tool to pull it out. Okay, so we've established that AWStats was powerful, it was

8:53

free,

8:54

and it was incredibly detailed for what was essentially a single-config-file Perl

8:58

script.

8:59

This obviously made it super popular with individual users, site owners. But what

9:04

about

9:04

the pros? Like web hosting providers managing hundreds, maybe thousands of sites.

9:09

Did this

9:10

simple tool scale up to meet those kinds of enterprise demands? It absolutely did.

9:14

The

9:14

tool's flexibility really made it ideal for providers. Crucially, it supported

9:19

multi-named

9:19

websites or what we usually call virtual servers or virtual hosts. This meant a

9:24

hosting company

9:25

could run just one instance of AWStats and efficiently analyze the separate log

9:29

files

9:30

for dozens, even hundreds of their clients. The output flexibility was also key for

9:34

integration

9:35

purposes. Reports could be generated dynamically via CGI, maybe on demand, or you

9:39

could generate

9:40

static HTML or XHTML pages. Perfect for just dropping into a client's control panel

9:45

or portal.

9:46

Our sources even note that experimental PDF export was possible at some point.
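
As a rough sketch of how a provider might have wired that up (config names and paths are illustrative, not from the episode): one config file per client site, updated and rendered to static report pages on a schedule.

    # one config per client, e.g. /etc/awstats/awstats.client1.example.com.conf
    perl awstats.pl -config=client1.example.com -update

    # render the full set of static HTML report pages for that client
    perl awstats_buildstaticpages.pl -config=client1.example.com -dir=/var/www/stats/client1.example.com

A cron job looping over all client configs was a common pattern, so a single AWStats installation could serve an entire hosting fleet.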

9:50

And given that this tool was digging around in potentially sensitive server data,

9:54

I assume security and maybe extensibility were thought about, built in.

9:58

Yes, definitely. Security was baked in. Notably, it included protection against

10:03

cross-site scripting

10:04

attacks, XSS attacks, which was important. And its extensibility was actually

10:08

massive. It supported

10:09

numerous options, filters, and plugins. Things like reverse DNS lookup to turn IP

10:14

addresses into

10:15

host names. And for developers who wanted to manipulate the analysis data outside

10:19

the tool

10:20

itself, it offered the ability to store the results in XML format, which you could

10:24

then process with

10:24

XSLT or other tools. It truly provided both the raw insights and the tools to

10:29

customize how you use

10:31

them. So we have this really foundational, highly capable, free tool that was

10:34

basically the backbone

10:35

of web analytics for many years. But here's the inevitable question, the status

10:39

update.

10:40

What is the health of the AWStats project today? Where does it stand?

10:43

Right. And here is the critical status update for anyone still using it or

10:47

considering it.

10:48

AWStats is now essentially transitioning into a legacy phase. The original author,

10:53

Laurent Destailleur, who interestingly is also the project leader of Dolibarr ERP/CRM,

10:58

he's no longer developing new versions himself. Version 8.0, which was actually

11:03

released back on

11:04

August 26, 2025, is planned to be the last version released by the original author.

11:09

So this means that

11:10

future maintenance, any bug fixes, any new feature development, it's all going to

11:14

rely entirely on

11:15

the community stepping up. It's shifting toward that classic open source community

11:19

support model now.

11:20

Okay, so for our listeners who really rely on this kind of powerful log-based

11:25

analysis

11:25

and maybe especially value data ownership, you know, the ability to keep their

11:29

server data local

11:30

under their own control, what's the recommended migration path now that the

11:33

original development

11:34

is wrapping up? Well, the clearest and probably most recommended migration path,

11:38

especially for

11:39

those prioritizing open source principles and data control, is Matomo. Specifically,

11:43

they should look

11:44

at Matomo log analytics. This tool really maintains that core principle of

11:48

analyzing raw server logs.
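
If you want to see what that path looks like in practice, Matomo ships a log import script; a minimal invocation is roughly this (the URL, site ID, and log path are placeholders):

    # feed an existing Apache access log into a Matomo instance
    python import_logs.py --url=https://matomo.example.org --idsite=1 /var/log/apache2/access.log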

11:49

It avoids client-side tracking tags, just like AWStats, and it offers a pretty

11:54

smooth path for

11:54

users looking to transition, potentially bringing their historical data and

11:58

analysis methods over

11:59

from AWStats. Wow. Okay, we've covered a truly remarkable piece of internet

12:04

infrastructure today.

12:05

We explored how a relatively simple, single-config-file Perl script called AWStats

12:12

managed to transform

12:13

these cryptic server logs into clear graphical intelligence. It provided this just

12:18

unparalleled

12:18

detail for its time, tracking everything from, you know, five families of worm

12:22

attacks to the

12:23

specific browser capabilities that literally dictated early web design decisions,

12:27

all while

12:27

being free and open source. And here is the final provocative thought we want to

12:31

leave you with today.

12:33

AWStats proved years ago that you could extract incredible, really granular data

12:39

about user

12:40

behavior, their OS, their browser capabilities, their geography, all from the

12:44

server-side log file.

12:45

This required no third-party JavaScript, no cookies, no client-side tracking tags

12:50

whatsoever.

12:51

So if all that depth of data was sitting right there in the log file the whole time,

12:55

what valuable metrics are we potentially missing today by relying so heavily, maybe

13:00

solely,

13:00

on JavaScript-based analytics, analytics that are prone to ad blockers, network

13:05

issues,

13:05

and increasing privacy limitations? It really makes you wonder how much visibility,

13:10

maybe even control, we've handed over in the process. A fascinating question to ponder

13:14

indeed.

13:15

Thank you for joining us for this deep dive. We want to extend one final thanks to

13:18

our sponsor,

13:19

SafeServer, for supporting this exploration and for assisting with digital

13:22

transformation.
