Today's Deep-Dive: AWStats
Ep. 304


Episode description

This episode explores AWStats, a foundational log analyzer that transformed raw server log data into accessible graphical insights. Before the dominance of JavaScript-heavy tracking, AWStats provided detailed analytics by processing server logs, which acted as meticulous records of website interactions. Distributed under the GPL, AWStats was free, powerful, and community-driven, making web analytics accessible even to beginners. It could analyze logs from various servers, including Apache, and presented data through user-friendly reports with charts and graphs. AWStats’s efficiency stemmed from using partial information files to store intermediate data, enabling rapid processing of large log files. The tool offered granular insights into visitor identity, technology (detecting hundreds of browsers and operating systems), and technical capabilities like Java or Flash support, which were crucial for early web development decisions. It also provided marketing insights by identifying search engines and keywords used to find a site, alongside security data like bot tracking and worm attack detection. For hosting providers, AWStats supported multi-named websites, allowing efficient analysis of numerous client logs. While the original author is no longer developing new versions, the project is transitioning to a community-supported model. For users prioritizing open source and data control, Matomo’s log analytics is recommended as a migration path. The document concludes by questioning the potential loss of visibility and control in modern analytics due to over-reliance on client-side tracking, highlighting AWStats’s server-side data extraction capabilities as a testament to what was achievable.

Gain digital sovereignty now and save costs

Let’s have a look at your digital challenges together. What tools are you currently using? Are your processes optimal? What is the state of your backups and security updates?

Digital sovereignty is easily achieved with Open Source software (which usually costs far less, too). Our division Safeserver offers hosting, operation, and maintenance for countless Free and Open Source tools.

Try it now!

Download transcript (.srt)
0:00

Welcome to the deep dive, the knowledge shortcut you need to master a mountain of

0:03

sources quickly.

0:04

Today we are opening a really fascinating window into internet history, the server

0:09

log. Before all

0:10

the modern JavaScript-heavy tracking became the norm, all the intelligence about

0:15

who was visiting

0:15

your site, and how, was hidden right there in these cryptic files. We are

0:20

diving deep into

0:21

the technology that helped make sense of it all. Now, if you're new to web

0:24

operations, maybe think

0:26

of a server log like this. It's the quiet, meticulous security guard who writes

0:31

down every

0:31

single interaction at the door, every request, every failure, every successful

0:36

handshake.
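
To make that notebook concrete: a single entry in the widely used combined log format looks roughly like this (host, path, and browser are invented for illustration):

    203.0.113.7 - - [12/Mar/2004:10:15:32 +0100] "GET /index.html HTTP/1.1" 200 5432 "http://www.example.com/start.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

One such line per request: the visitor's IP address, the timestamp, what was asked for, whether it worked (the 200), how many bytes went out, where the visitor came from, and which browser they used.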

0:36

It's dense raw data but it's full of gold. Before we unlock this data though, we

0:40

want to thank the

0:41

supporter of this deep dive, SafeServer. They specialize in hosting software,

0:45

helping you

0:45

manage tools just like the one we are discussing today and supporting your digital

0:49

transformation.

0:50

You can find more information at www.safeserver.de. Okay, so our mission today is

0:54

really to unpack

0:56

AWStats. That stands for Advanced Web Statistics. This tool is, well, it's

1:01

foundational because it

1:02

was free, powerful, and it genuinely made web analytics accessible, even for

1:06

beginners back

1:07

then. It's the thing that took that raw, pretty technical server data and turned it

1:12

into graphical,

1:13

actionable intelligence. Fundamentally, AWStats is a powerful log analyzer. It was

1:18

distributed

1:19

under the GNU general public license, the GPL, and that license detail is actually

1:24

key because

1:24

it means the software was always free, always community-driven. It didn't just

1:29

analyze web

1:30

servers either. It could generate advanced stats for streaming services, FTP, even

1:34

mail servers,

1:35

and deliver all that rich data visually. Right, so let's start with the basics

1:38

maybe for those

1:39

who haven't actually touched a server log before. If the log is just this massive

1:42

text file of

1:43

transactions, how did AWStats work its magic? What is a log file analyzer,

1:47

practically speaking?

1:49

Think of it as a translator. Log data is messy, right? Timestamps, IP addresses,

1:55

requested files, status codes. It's almost impossible to read manually and get any

2:00

real

2:00

insight. AWStats takes that whole stream and, in its own words, transforms it into

2:06

understandable

2:07

reports using few graphical web pages. Basically, it turns lines of code into bar

2:12

charts and pie

2:13

graphs, makes it visual. And what's fascinating here, reading through the sources,

2:16

is just how

2:16

compatible it was. This wasn't some niche script just for certain platforms, was it?

2:21

Not at all.

2:21

It was, you could say, an equal opportunity analyzer. It could handle logs from

2:25

pretty much

2:26

every major server tool available at the time. We're talking Apache logs, the NCSA

2:30

combined or

2:30

common formats, IIS logs using the W3C format, WebStar, plus logs from specialized

2:36

proxy, WAP,

2:37

and streaming servers. That kind of breadth, that compatibility, it's really a hallmark

2:41

of truly

2:41

foundational software. And how did this, you know, relatively small free tool

2:45

handle all the heavy

2:46

lifting? Running a log analyzer on big servers sounds like it could be pretty

2:50

intensive. Yeah,

2:51

that's where some clever engineering comes in. AWStats was designed to be

2:55

lightweight and flexible.

2:56

It was primarily written in Perl, which, okay, itself requires Perl 5.007003 or

3:03

higher for the

3:04

more modern versions. And you could run it right from the command line, or you

3:07

could execute it

3:08

dynamically as a CGI script. That flexibility was key. Hold on. If it's running off

3:12

a lightweight

3:13

language like Perl, how did it possibly manage enterprise scale log files? I mean,

3:18

we could be

3:18

talking about logs of practically unlimited size on busy servers. Wouldn't that

3:23

just grind the system

3:24

to a halt? That's an excellent point, and it really hits on the core challenge of

3:27

early log analysis.

3:28

To process these potentially massive log files quickly and often, it uses something

3:33

called a

3:34

partial information file. So instead of recalculating every single metric from the

3:38

raw log every single

3:38

time, which would take hours, it stores intermediate data in that partial file.

3:42

This means subsequent

3:43

runs are incredibly fast. It saves a ton of server resources. Ah, okay. So it sort

3:48

of pre-digested the

3:49

information, making the updates almost instantaneous even if the raw log file kept

3:54

getting bigger and

3:54

bigger. That's pretty clever resource management. Precisely. And that efficiency

3:58

was absolutely

3:59

crucial because it meant it could handle logs that were split across multiple files

4:03

or even logs that

4:04

weren't perfectly sorted, which believe me was a common headache when you're

4:08

dealing with large

4:09

load-balanced systems. And for the newcomer, the setup was famously simple. The

4:14

documentation

4:15

literally said just one configuration file to edit. That low barrier to entry for

4:21

something

4:21

so powerful, that's a big reason why it became so popular globally. That simplicity

4:25

is remarkable.
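
For a sense of what that setup looked like in practice, here is a rough sketch (file names, paths, and values are illustrative, not quoted from the episode): you pointed the one config file at your log, told AWStats the log format and where to keep its intermediate data, and then ran the update.

    # /etc/awstats/awstats.www.example.com.conf  (illustrative values)
    LogFile="/var/log/apache2/access.log"
    LogFormat=1                      # 1 = NCSA combined log format
    SiteDomain="www.example.com"
    DirData="/var/lib/awstats"       # the "partial information" files are stored here

    # parse any new log lines and update the stored statistics
    perl awstats.pl -config=www.example.com -update

Because the update only processes lines it hasn't seen yet and folds them into the data files in DirData, re-running it stays fast even as the raw log keeps growing.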

4:27

But here's where the data gets really juicy, I think. Most people might think of

4:31

server logs

4:31

as just basic traffic counters, page views, maybe hits. But AWStats extracted this

4:36

surprising level

4:37

of detail that, well, even modern tag-based analytics sometimes overlooks. Let's

4:42

get into

4:43

the specifics of what this tool was revealing. Okay, so if we start with traffic

4:47

and timing,

4:48

it definitely gave you the essentials. Number of visits, unique visitors, average

4:51

visit duration,

4:52

standard stuff. But it synthesized this data to provide genuine business

4:57

intelligence. It showed

4:58

your site's rush hours and rush days. It detailed pages, hits, kilobytes

5:05

transferred for each hour

5:07

of the day or each day of the week. That tells you exactly when to schedule server

5:10

maintenance,

5:10

right? Or maybe when to launch your most important content. It shifts you from just

5:14

counting things

5:15

to actually planning. Exactly. But the granularity, it deepens significantly when

5:20

we look at visitor

5:21

identity and technology. This is where AWStats really, really shined in that fragmented

5:27

early

5:28

internet era. Using GeoIP detection, it could automatically identify 269 domains and

5:33

countries.
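
For the curious, that country lookup was typically switched on with a plugin line in the config file, along these lines (the database path is just an example):

    # resolve visitor IP addresses to countries via the GeoIP Perl module
    LoadPlugin="geoip GEOIP_STANDARD /usr/share/GeoIP/GeoIP.dat"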

5:34

And maybe more importantly for web developers back then, it identified the

5:38

technical stack of

5:39

the visitor. We're talking detecting 35 different operating systems and a core set of

5:43

97 different

5:44

browsers. 97 browsers? Wow. That sounds like a testing nightmare for developers. It

5:49

was a

5:49

fragmentation nightmare, which is exactly why this tool was so critical. Just the

5:54

ability to

5:54

automatically identify that fragmentation, that was power. And furthermore, if you

5:59

use the specific

6:00

browser library it offered, that number jumped to over 450 detected browsers,

6:05

including various

6:06

phone and mobile clients. That level of detail. It's something modern analytics

6:11

often just aggregates

6:12

away, you know. That kind of specificity must have been absolutely vital for making

6:16

decisions

6:16

about feature support, especially back in the early 2000s when tech wasn't nearly

6:21

as standardized as

6:22

it is now. Absolutely. And that brings us neatly to the technical insights. This is

6:28

maybe the key

6:28

insight for understanding historical web development challenges. AWStats could

6:33

report the ratio of

6:34

visitors whose browsers supported critical, often non-standard features like Java,

6:39

Flash,

6:39

Real G2, QuickTime, WMA, even PDF readers. Knowing this dictated huge development

6:46

decisions. I mean,

6:46

if only 10% of your audience supported Flash, you knew building your main

6:50

navigation in Flash

6:51

would basically kill your traffic. This wasn't just a nice-to-have feature. This

6:55

was like a

6:55

fundamental operational requirement. It told developers whether their audience

6:59

could even

6:59

consume the content they just spent weeks building. That's a fascinating look at

7:03

the

7:03

historical constraints of the web, really put things in perspective. Moving beyond

7:07

technical

7:08

capabilities, what did AWStats tell us about why people found the site in the first

7:11

place,

7:12

the marketing side? Right, that falls under marketing and search insights. Because

7:16

the

7:16

log file records the referrer, basically, where the visitor came from, AWStats

7:21

could reverse

7:22

engineer the search process. It detected search engines, key phrases, and keywords

7:26

used to find

7:26

the site. And get this, it recognized 115 of the most famous search engines at the

7:31

time.

7:32

115 search engines. That truly is a historical snapshot, isn't it? You're talking

7:36

about a world

7:37

where Google was just one player among many alongside giants like Yahoo and the venerable

7:42

AltaVista. It captures that moment in time perfectly. And then we have the security

7:47

and

7:47

maintenance data. This was equally critical for system administrators. It tracked

7:51

visits by

7:52

319 different automated robots or bots, helping admins separate human traffic from

7:58

crawlers,

7:59

which is crucial for accurate stats. And really importantly, it detected five

8:03

families of worm

8:04

attacks, giving you real time, well, almost real time security warnings right from

8:09

your logs.

8:10

Plus it reported all the HTTP errors like the classic page not found, 404 errors.

8:17

And for

8:18

maintenance, this was great. It showed the last referrer for that bad link. So you

8:21

could immediately

8:22

go fix broken internal links or, you know, contact an external site that was sending

8:27

traffic to a dead page on your site. That's incredibly powerful diagnostics

8:30

straight from

8:31

the log file. And I love the final little quirk our sources mentioned here. It

8:35

actually tracked

8:36

the number of times the site was added to favorites or bookmarks. That's pure old

8:41

school

8:41

engagement data, isn't it? Extracted directly from the server transaction. Yeah, it

8:44

just shows the

8:45

breadth of metadata that was actually available in those raw transactions if you

8:48

had the right

8:48

tool to pull it out. Okay, so we've established that AWStats was powerful, it was

8:53

free,

8:54

and it was incredibly detailed for what was essentially a single-config-file Perl

8:58

script.

8:59

This obviously made it super popular with individual users, site owners. But what

9:04

about

9:04

the pros? Like web hosting providers managing hundreds, maybe thousands of sites.

9:09

Did this

9:10

simple tool scale up to meet those kinds of enterprise demands? It absolutely did.

9:14

The

9:14

tool's flexibility really made it ideal for providers. Crucially, it supported

9:19

multi-named

9:19

websites or what we usually call virtual servers or virtual hosts. This meant a

9:24

hosting company

9:25

could run just one instance of AWStats and efficiently analyze the separate log

9:29

files

9:30

for dozens, even hundreds of their clients. The output flexibility was also key for

9:34

integration

9:35

purposes. Reports could be generated dynamically via CGI, maybe on demand, or you

9:39

could generate

9:40

static HTML or XHTML pages. Perfect for just dropping into a client's control panel

9:45

or portal.

9:46

Our sources even note that experimental PDF export was possible at some point.
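
As a rough sketch of how a provider might have wired that up (config names and paths are illustrative, not from the episode): one config file per client site, updated and rendered to static report pages on a schedule.

    # one config per client, e.g. /etc/awstats/awstats.client1.example.com.conf
    perl awstats.pl -config=client1.example.com -update

    # render the full set of static HTML report pages for that client
    perl awstats_buildstaticpages.pl -config=client1.example.com -dir=/var/www/stats/client1.example.com

A cron job looping over all client configs was a common pattern, so a single AWStats installation could serve an entire hosting fleet.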

9:50

And given that this tool was digging around in potentially sensitive server data,

9:54

I assume security and maybe extensibility were thought about, built in.

9:58

Yes, definitely. Security was baked in. Notably, it included protection against

10:03

cross-site scripting

10:04

attacks, XSS attacks, which was important. And its extensibility was actually

10:08

massive. It supported

10:09

numerous options, filters, and plugins. Things like reverse DNS lookup to turn IP

10:14

addresses into

10:15

host names. And for developers who wanted to manipulate the analysis data outside

10:19

the tool

10:20

itself, it offered the ability to store the results in XML format, which you could

10:24

then process with

10:24

XSLT or other tools. It truly provided both the raw insights and the tools to

10:29

customize how you use

10:31

them. So we have this really foundational, highly capable, free tool that was

10:34

basically the backbone

10:35

of web analytics for many years. But here's the inevitable question, the status

10:39

update.

10:40

What is the health of the AWStats project today? Where does it stand?

10:43

Right. And here is the critical status update for anyone still using it or

10:47

considering it.

10:48

AWStats is now essentially transitioning into a legacy phase. The original author,

10:53

Laurent Destailleur, who interestingly is also the project leader of Dolibarr ERP/CRM,

10:58

he's no longer developing new versions himself. Version 8.0, which was actually

11:03

released back on

11:04

August 26, 2025, is planned to be the last version released by the original author.

11:09

So this means that

11:10

future maintenance, any bug fixes, any new feature development, it's all going to

11:14

rely entirely on

11:15

the community stepping up. It's shifting toward that classic open source community

11:19

support model now.

11:20

Okay, so for our listeners who really rely on this kind of powerful log-based

11:25

analysis

11:25

and maybe especially value data ownership, you know, the ability to keep their

11:29

server data local

11:30

under their own control, what's the recommended migration path now that the

11:33

original development

11:34

is wrapping up? Well, the clearest and probably most recommended migration path,

11:38

especially for

11:39

those prioritizing open source principles and data control, is Matomo. Specifically,

11:43

they should look

11:44

at Matomo log analytics. This tool really maintains that core principle of

11:48

analyzing raw server logs.
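
If you want to see what that path looks like in practice, Matomo ships a log import script; a minimal invocation is roughly this (the URL, site ID, and log path are placeholders):

    # feed an existing Apache access log into a Matomo instance
    python import_logs.py --url=https://matomo.example.org --idsite=1 /var/log/apache2/access.log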

11:49

It avoids client-side tracking tags, just like AWStats, and it offers a pretty

11:54

smooth path for

11:54

users looking to transition, potentially bringing their historical data and

11:58

analysis methods over

11:59

from AWStats. Wow. Okay, we've covered a truly remarkable piece of internet

12:04

infrastructure today.

12:05

We explored how a relatively simple, single-config-file Perl script called AWStats

12:12

managed to transform

12:13

these cryptic server logs into clear graphical intelligence. It provided this just

12:18

unparalleled

12:18

detail for its time, tracking everything from, you know, five families of worm

12:22

attacks to the

12:23

specific browser capabilities that literally dictated early web design decisions,

12:27

all while

12:27

being free and open source. And here is the final provocative thought we want to

12:31

leave you with today.

12:33

AWStats proved years ago that you could extract incredible, really granular data

12:39

about user

12:40

behavior, their OS, their browser capabilities, their geography, all from the

12:44

server-side log file.

12:45

This required no third-party JavaScript, no cookies, no client-side tracking tags

12:50

whatsoever.

12:51

So if all that depth of data was sitting right there in the log file the whole time,

12:55

what valuable metrics are we potentially missing today by relying so heavily, maybe

13:00

solely,

13:00

on JavaScript-based analytics, analytics that are prone to ad blockers, network

13:05

issues,

13:05

and increasing privacy limitations? It really makes you wonder how much visibility,

13:10

maybe even control, we've handed over in the process. A fascinating question to ponder

13:14

indeed.

13:15

Thank you for joining us for this deep dive. We want to extend one final thanks to

13:18

our sponsor,

13:19

SafeServer, for supporting this exploration and for assisting with digital

13:22

transformation.
