Today's Deep-Dive: PostHog
Ep. 283

Episode description

PostHog is an open-source platform aiming to be a single source of truth for product development, consolidating tools like product analytics, session recording, feature flagging, and A-B testing into one bundle. It operates on a philosophy of being a ‘product OS,’ providing developers with a comprehensive view of their product data by integrating information from various sources, including financial and support data, not just in-app events. This unified approach aims to reduce the risks associated with managing multiple separate tools and data silos.

The platform’s synergy lies in how these tools share underlying event data immediately, allowing for seamless transitions between tasks like setting up feature flags and running A-B tests, or debugging errors by directly linking them to session replays. PostHog also offers data warehouse capabilities to ingest and transform external data, sending it to over 25 downstream tools, ensuring flexibility. For AI-driven applications, it provides specialized metrics like API call traces, token counts, and cost per query, directly linking engineering costs to user behavior.

PostHog is accessible through a generous free tier on its cloud-hosted version, with usage-based pricing kicking in only after limits are exceeded, and a self-hosted option available for smaller-scale use. The company emphasizes radical transparency, open-sourcing its company handbook and strategy documents, which builds trust with developers and extends to its clear, usage-based pricing with no sales calls. Future developments focus on AI automation for analysis tasks and potentially integrating AI into the development workflow, acting as a ‘product co-pilot.’ This transparent, unified ecosystem is designed to provide product engineers with a single source of truth, reducing guesswork and potential security risks.

Gain digital sovereignty now and save costs

Let’s have a look at your digital challenges together. What tools are you currently using? Are your processes optimal? What is the state of your backups and security updates?

Digital sovereignty is easily achieved with Open Source software (which usually costs way less, too). Our division Safeserver offers hosting, operation and maintenance for countless Free and Open Source tools.

Try it now for 1 Euro - 30 days free!

Download transcript (.srt)
0:00

Okay, let's unpack this. We are diving into a toolkit today that, well, it seems to

0:05

have

0:05

really shifted how product engineers think about building software. And actually,

0:10

before we really

0:10

get started, we absolutely want to thank the supporter of this deep dive.

0:15

SafeServer. SafeServer

0:16

supports this exploration, handling the hosting side of things for software

0:20

like this,

0:21

and really aiding in your digital transformation. You can find more info at www.safeserver.de.

0:28

So our mission today is to explore PostHog. The goal is, well, pretty simple.

0:33

Explain what this

0:34

heavily starred open source platform actually is, why it makes this bold claim of

0:39

being an all-in-one

0:40

solution, and importantly, how it simplifies life for product teams. Especially,

0:44

you know,

0:44

if you're maybe just starting out in the world of product analytics. And you can

0:47

tell it's gaining

0:48

serious momentum. I mean, our sources today are straight from their main marketing

0:52

pages, sure,

0:52

but also their GitHub repo, and that shows, what, 29.8 thousand stars and 2,000 forks.

0:58

Wow. Yeah, that kind of traction doesn't just happen by accident. It really signals

1:02

they're

1:02

solving a, well, a pretty crucial pain point for developers. Absolutely. So the big

1:06

picture,

1:07

the core idea, is that PostHog aims to be the single source of truth for building

1:12

successful

1:12

products. It rolls together product analytics, session recording, feature flagging,

1:17

A-B testing,

1:17

all those separate tools that product engineers usually have to kind of cobble

1:22

together.

1:22

Right. And one open source bundle.

1:24

Exactly. Okay, so here's where it gets really interesting for me. You know, every

1:29

tech stack

1:29

eventually hits that problem of tool sprawl, right? One service for errors,

1:34

another for feature flags, maybe a third for user analytics. Why did PostHog decide

1:40

to consolidate

1:40

everything? What's the philosophy behind that? Well, what's fascinating here is it

1:45

feels like

1:46

a core philosophical shift. They actually define PostHog as a product OS, like an

1:51

operating system,

1:52

but for your product data. A product OS, okay. Yeah, and they recognize that if you're

1:55

a developer,

1:56

pretty much every decision you make, you know, should I fix this bug? Should we

1:59

launch that

2:00

feature? It all requires context. The key insight driving this whole thing seems to

2:04

be that developers

2:05

should operate from the full set of data, not just like a siloed slice of what's

2:09

happening inside

2:10

their app. Right, we've all felt that pain of disconnected data silos. You know, if

2:14

I'm trying

2:14

to debug something critical, I need the error log, sure, but I also need to see the

2:19

user's click path

2:20

and maybe know if they're a paying customer. It all matters. Exactly. So what does

2:25

this full set of data

2:27

actually mean in PostHog's world? Technically speaking, I mean. It means basically

2:32

breaking

2:32

down those walls between the different operational bits of the business and the

2:36

product engineering

2:37

side. So for instance, PostHog is designed to integrate data that happens outside

2:41

the application

2:42

itself, but still really defines that customer experience. Okay, like what? Things

2:47

like financial

2:48

data, maybe payments tracked in Stripe, or exceptions captured by a separate error

2:52

tracking tool,

2:53

or even support tickets logged in something like Zendesk or Salesforce. Okay, wait

2:57

a second. If I'm

2:59

a developer, that sounds fantastic for making decisions, definitely. But I have to

3:03

ask,

3:03

isn't consolidating financial data, customer service history, and every single user

3:10

click

3:10

into one potentially open source platform? Isn't that a pretty significant security

3:15

and

3:15

maintenance challenge? That's a lot of critical data in one basket. That's a really

3:20

important

3:20

question, yeah. And it speaks directly to their engineering approach, I think.

3:24

Their argument

3:24

seems to be that managing one highly secure integrated system is actually less

3:30

risky,

3:30

potentially, than constantly syncing sensitive data between, say, half a dozen

3:35

separate vendors.

3:36

Oh, okay. Each with its own authentication, its own compliance quirks. By

3:39

integrating these pipes,

3:40

they have this data pipelines and warehouse component we can talk about. They

3:43

create a kind

3:44

of unified identity across the data. I see. Which lets you do things like say, okay,

3:48

show me all

3:49

users who clicked feature X, had error Y last Tuesday, and submitted a high

3:52

priority support

3:53

ticket last week. That kind of visibility, well, it allows for much more informed

3:58

decisions.
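That cross-source lookup boils down to joining three data sets on one shared user identity. A toy sketch (all ids and data here are invented purely for illustration, not PostHog's query API):

```python
# Toy illustration of the "unified identity" idea: once clicks, errors, and
# support tickets all key on the same user id, the cross-source query is a
# trivial set intersection.
clicks = {"u1", "u2", "u3"}    # users who clicked feature X
errors = {"u2", "u3", "u5"}    # users who hit error Y last Tuesday
tickets = {"u3", "u5", "u6"}   # users with a high-priority ticket last week

# Users matching all three conditions:
affected = clicks & errors & tickets
print(sorted(affected))  # ['u3']
```

Without the shared id, each of those three sets would live in a different vendor's database, and this one-liner becomes a multi-system export-and-reconcile job.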

3:58

It sounds like the real value isn't just having the tools in one place, but it's

4:03

that single,

4:04

unfragmented view of the entire user journey from payment to frustration to

4:10

actually using a feature.

4:12

Precisely. It shifts the focus from just reporting abstract numbers to directly

4:17

informing

4:18

actual product strategy changes. So we get the philosophy,

4:22

single source of truth, unified view. Since it is an all-in-one platform,

4:27

let's look at the toolkit itself. But maybe instead of just defining basic terms,

4:31

because our listeners probably know what A-B testing is, let's focus on the synergy.

4:35

How do these tools actually work together in this unified environment?

4:38

That's the critical difference, right? Because in a traditional stack, you might

4:41

run an A-B test in,

4:42

say, tool A, but then you've got to manually pipe those results over to tool B,

4:47

your main analytics

4:48

platform, just to see the full impact. Yeah, that's always a pain.

4:51

Here, the idea is the tools share the same underlying event data immediately. So

4:56

take

4:56

product analytics combined with feature flags. Let's say you set up a feature flag.

5:00

Maybe you're

5:00

rolling out a new checkout button to just 10% of your users. Since the flag is

5:04

native to the

5:05

platform, that 10% cohort is just automatically tracked by the analytics engine. No

5:10

extra setup.
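The percentage rollout described here can be sketched as deterministic bucketing: hash the flag and user together so each user gets a stable answer. This is a conceptual illustration only, not PostHog's actual bucketing algorithm:

```python
import hashlib

def in_rollout(flag_key: str, user_id: str, percent: int) -> bool:
    """Conceptual sketch of a percentage rollout: hash flag+user to a
    stable bucket in [0, 100), so the same user always lands in the
    same cohort across sessions. (Not PostHog's real implementation.)"""
    digest = hashlib.sha1(f"{flag_key}.{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

# A given user is either always in or always out of the 10% cohort:
print(in_rollout("new-checkout-button", "user-42", 10))
```

Because the bucketing is deterministic, the analytics side can recompute the same cohort later without any extra instrumentation, which is what makes the flag-to-experiment handoff described above cheap.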

5:11

Then, maybe the next day, you decide, hey, let's turn this into a proper experiment,

5:16

an A-B test.

5:17

You don't have to redefine the cohort. You don't have to re-instrument any code.

5:21

The data is already

5:21

flowing right into the experiments tool. That allows for immediate statistical

5:25

analysis on, say,

5:27

conversion rate changes. Okay, that dramatically cuts down on the

5:30

operational lag and the engineering overhead, I imagine. What about the tools

5:35

focused more on

5:36

like product stability, errors and replays? Yeah, look at session replays and error

5:42

tracking.

5:42

Session replays, for those who haven't used them, are like watching a screen

5:46

recording of a real

5:46

user session. Right, super useful.

5:48

In the traditional model, if a user reports some weird issue, you might see the

5:52

error log in tool

5:53

A, but you have zero visual context. You're just guessing. With PostHog, the idea

5:58

is if an error

5:59

pops up in the error tracking section, the developer can immediately click and jump

6:03

straight to the

6:04

session replay that's tied to that exact moment the error occurred. Oh, wow. Okay,

6:08

so instead of

6:08

trying to reproduce a bug based on just a stack trace, which can be impossible

6:13

sometimes, I can

6:14

actually watch the user encounter the bug live, see exactly what steps they took,

6:18

maybe check the

6:19

corresponding network logs right there alongside the replay. That sounds like, well,

6:23

like a huge

6:24

improvement in debugging time. It absolutely can be. Yeah. And kind of feeding into

6:28

all of this

6:29

is the data warehouse pipelines capability. This is sort of the heavy lifting

6:33

engine behind the

6:33

scenes. It's what ingests that external data, we talked about Stripe, HubSpot,

6:37

whatever,

6:38

and lets you run custom transformations on it. Okay. And critically, it then lets

6:42

you send that

6:43

enriched unified data stream out to, I think they say 25 plus other downstream

6:49

tools. So it ensures

6:50

PostHog remains flexible, even if your stack evolves later. That makes sense. Keeps

6:55

it from

6:55

being too much of a closed box. Finally, we should probably talk about LLM

6:59

analytics. You know,

7:00

AI powered applications are becoming huge, but they bring this whole new layer of

7:04

operational

7:05

complexity. How does PostHog handle that? This is where they really seem to be

7:08

looking ahead.

7:09

For applications using large language models, they're capturing specialized metrics

7:14

that are

7:14

pretty vital for operations. Things like API call traces, token generation counts,

7:21

latency for the model responses, and even the actual cost per query. Oh,

7:25

interesting. Tracking

7:27

the cost directly. Yeah. It's an essential, quite specialized layer for measuring

7:32

the efficiency

7:32

and performance of these AI driven products. And it connects that raw engineering

7:37

cost

7:37

directly to the user behavior metrics you're seeing in the main analytics dashboard.
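As a back-of-envelope illustration of cost-per-query tracking: given the token counts from an API call trace, the cost is simple arithmetic. The per-token prices below are invented placeholders, not real rates from any provider:

```python
# Illustrative sketch: derive cost per query from token counts in a trace.
# Both prices are assumed placeholder values, not real provider rates.
PRICE_PER_1K_INPUT = 0.0005   # assumed $ per 1,000 prompt tokens
PRICE_PER_1K_OUTPUT = 0.0015  # assumed $ per 1,000 completion tokens

def cost_per_query(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single LLM call, given token counts from the API trace."""
    return ((input_tokens / 1000) * PRICE_PER_1K_INPUT
            + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT)

# e.g. a call with a 1,200-token prompt and a 300-token completion:
print(round(cost_per_query(1200, 300), 6))  # 0.00105
```

The point of capturing this per event is that the dollar figure can then sit next to conversion or retention metrics for the same users, rather than living only in a billing dashboard.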

7:42

It feels like a key feature that frankly, a lot of other consolidated platforms are

7:47

probably only

7:47

just starting to think about. Yeah, that definitely feels ahead of the curve. Okay,

7:50

that is a seriously

7:52

ambitious suite of tools all bundled together. It almost sounds like the kind of

7:55

infrastructure

7:55

only, you know, major enterprises could typically afford or manage. So what does

8:00

this all mean for

8:01

someone just wanting to get started? Is this actually accessible to small teams or

8:05

is there

8:05

a steep barrier to entry? This is where their open source roots and their pricing

8:10

philosophy really

8:11

come into play and make a difference. They basically offer two main paths. The

8:15

recommended

8:16

option is PostHog Cloud. Right, the hosted version. Exactly. It's engineered for

8:20

speed,

8:21

reliability, basically zero maintenance for you. And crucially, they offer an

8:25

extremely generous

8:26

free tier. How generous are we talking here? Like for a startup or maybe just an

8:31

individual

8:31

developer exploring the platform? We're talking generous to the point where they

8:36

state that 98%

8:37

of their customers currently use it entirely for free. Wow, 98%. Yep. Every month

8:42

you get,

8:43

let's see, 1 million events for product analytics, 5,000 session recordings, 1

8:47

million feature flag

8:48

requests, 100,000 exceptions tracked for errors, and 1,500 survey responses. All

8:53

free every month.

8:54

Okay. That is a huge operational allowance for a startup or a small team. That's

8:58

pretty impressive.

9:00

Pricing only kicks in after you hit those limits. Correct. Usage based after the

9:04

free tier. Got it.

9:06

And what about the pure open source experience? You know, for teams who really

9:11

insist on absolute

9:13

data sovereignty and want to self-host everything? That option is definitely there.

9:18

The ability to

9:18

deploy a kind of hobby instance with just a single line of code using Docker on

9:24

Linux exists. However,

9:26

and they're quite transparent about this too, the sources generally advise that

9:29

this setup is

9:30

really only recommended up to maybe about 100,000 events per month. Ah, okay. So it

9:35

has limitations

9:36

at scale. Yeah, if you scale beyond that, they strongly suggest migrating to their

9:40

cloud

9:40

infrastructure, simply because managing the underlying data warehouse, ClickHouse,

9:45

I believe for massive scale, is a really non-trivial engineering task. And well,

9:50

they handle that best. Makes sense. Now let's talk about their culture for a second

9:53

because this is

9:54

often overlooked, but I think it's pretty central to their whole appeal, especially

9:58

to developers.

9:59

They don't just open source their code. Right. They seem to open source their

10:02

entire company.

10:03

Wait, what do you mean? Like their internal processes? What does that actually

10:05

involve?

10:06

Well, it's pretty extreme transparency. They open source their entire company handbook.

10:11

It details their strategy, how they make decisions, their ways of working, internal

10:16

processes.

10:16

Seriously, their strategy docs are public. Yeah. And look, it's probably not just a

10:22

gimmick. It

10:23

seems to build this profound trust with the developer community who aren't just

10:27

users,

10:27

but often contributors too. It's like a powerful statement. Look, if we're this

10:31

transparent about

10:32

how we operate internally, you could probably trust us to handle your mission

10:35

critical data

10:36

responsibly. That's bold. And I guess that transparency extends directly to their

10:41

pricing

10:42

philosophy too, which sounds like a pretty sharp contrast to standard SaaS

10:46

practices.

10:47

Absolutely. Their pricing is completely usage based, like we said, and they have an

10:51

explicit

10:51

no sales call approach. You don't have to talk to anyone to get started or figure

10:55

out costs.

10:57

Nice. You can calculate your exact cost based on totally transparent per unit

11:02

pricing, like that

11:03

$0.00005-per-event number after the free tier limit.
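That usage-based model is simple enough to sketch in a few lines. The 1-million-event free tier comes straight from the episode; the per-event price is the illustrative figure quoted just above:

```python
# Sketch of usage-based billing past a free tier: 1M free events per month
# (as stated in the episode), then an assumed $0.00005 per extra event.
FREE_EVENTS = 1_000_000
PRICE_PER_EVENT = 0.00005

def monthly_event_bill(events: int) -> float:
    """Bill only the events beyond the monthly free allowance."""
    billable = max(0, events - FREE_EVENTS)
    return billable * PRICE_PER_EVENT

print(monthly_event_bill(900_000))    # 0.0 (still inside the free tier)
print(monthly_event_bill(3_000_000))  # 2M billable events, roughly $100
```

No tiers, no negotiation: the whole pricing function fits in one `max()` and one multiplication, which is presumably the point of the "no sales call" pitch.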

11:09

There are no hidden

11:09

tiers, no complex enterprise negotiations just to figure out what you'll owe. They

11:14

explicitly aim

11:15

to be the cheapest option at scale. And this radically transparent model is kind of

11:19

how they

11:19

try to prove it. Okay. That's refreshing. Finally, let's just circle back quickly

11:23

to the future.

11:24

Beyond the LLM analytics tool they already offer, what are their teams apparently

11:28

focused on in

11:29

terms of integrating AI within the platform itself? Well, the roadmap seems to

11:33

suggest they're

11:34

shifting the goal beyond just measurement towards more active automation. They're

11:38

apparently working

11:39

on AI features to automate some of the more monotonous analysis tasks, maybe

11:43

summarize

11:44

complex user journey information automatically. Okay. And perhaps more ambitiously

11:48

integrating AI

11:50

into the actual development workflow. Like having the platform eventually suggest

11:54

or maybe even make

11:55

code changes to fix bugs it identifies automatically. Yeah, definitely an

11:59

evolution from just being a

12:00

data collector to potentially being more like a product co-pilot. That's quite a

12:04

vision. So I

12:05

guess the key takeaway here is that this isn't really just one tool or even just a

12:09

collection

12:09

of tools. It's more like a unified, very open and incredibly transparent ecosystem

12:15

designed to give

12:16

product engineers that single source of truth, hopefully eliminating some of the

12:21

guesswork and

12:22

maybe even the security risks that come from having all these disparate data

12:26

sources. Indeed.

12:27

And maybe for a final provocative thought, the sources mentioned their unique

12:33

culture,

12:34

including this one time they apparently sent customers a floppy disk that turned

12:38

out to be

12:38

a rickroll. Okay. Which is fun, sure. But underlying that is that transparency we

12:42

talked about sharing

12:43

their strategy, their sales manual, their inner workings. That's what feels really

12:47

foundational.

12:48

And the question for you, the listener might be, if a company is that open about

12:52

how they operate

12:53

internally, how much does that influence your trust when you're deciding whether to

12:57

build your

12:57

own mission critical product on their platform? That openness, maybe it isn't just

13:01

a feature,

13:02

maybe it's a kind of security and confidence guarantee in itself. That's a great

13:05

question

13:06

to leave you with. Thank you for joining us for this deep dive into PostHog. And

13:10

once again,

13:10

a big thank you to SafeServer for the support of this deep dive that helps you with

13:14

digital transformation.

13:15

We'll see you next time.
