Welcome back to The Deep Dive.
Today, we are tearing apart the invisible infrastructure
that powers everything you expect to be instant online.
Think about watching a live event
and seeing that comment stream just update perfectly.
Or maybe a financial ticker that never, ever misses a beat.
We're really talking about the engineering challenge
behind managing millions of those persistent,
always-on connections.
A huge challenge.
It really is.
Now, before we jump into the architecture
that makes all of that possible,
we want to give a massive shout out to the supporter
who helps us bring these deep dives to you.
That's SafeServer.
They manage high-performance hosting solutions,
you know, exactly the kind of robust setup
you need to run the software we're discussing today.
So if you are serious about hosting
and your digital transformation,
you can find out more at www.safeserver.de.
All right, let's unpack this mission.
Today, we are focusing on Centrifugo.
It's a powerful, scalable, real-time messaging server.
And our goal today is to really get beyond the buzzwords.
We want you to understand not just what Centrifugo is,
but how it solves probably the biggest hurdle
in modern applications.
Which is?
Managing all those massive, persistent online connections
without forcing you to completely rewrite
your existing application structure.
Okay, so let's start at the very beginning.
For anyone new to this,
we throw around the term real-time,
but what does that actually mean mechanically?
That's a great question.
It's really the difference between, say,
a normal web request and an open phone line.
Okay.
A traditional web request is like you ask for data,
the server gives it to you and then hangs up.
You have to call back to check for updates.
Right, the request-response cycle.
Exactly.
Real-time messaging is about creating
these interactive applications
where events or data changes
are delivered to online users instantly
with minimal, often imperceptible delay.
It's like keeping that line open.
And we used to see this in really new applications,
but now it feels like it's absolutely everywhere.
It is everywhere.
It's the backbone of collaborative tools
like Google Docs.
It drives live sports scoreboards, live comments.
So, like multiplayer games?
Multiplayer games.
And I think most relevant for today,
it's what enables the streaming responses
from generative AI.
Ah, okay.
So when you see ChatGPT typing out a response word by word.
That's it.
That low latency delivery is happening
over real-time protocols.
So if the server is keeping that connection open,
how does the information actually flow?
What's the fundamental mechanism here?
The key is a pattern called PubSub, publish-subscribe.
Centrifugo is specifically
what we call a user-facing PubSub server.
Okay, like a newspaper subscription.
That's the perfect analogy.
Your backend application is the publisher,
the newspaper editor.
When an event happens, a score changes, a new message,
it publishes that one update to Centrifugo.
And Centrifugo is the delivery service.
Exactly, it automatically delivers that message
to every single online user who is subscribed to that topic.
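To make that publish step concrete, here's a minimal Python sketch of calling Centrifugo's server HTTP API. The URL, API key, and channel name are placeholders for your own deployment, and the actual network call is left commented out:

```python
import json

# Placeholders: point these at your own Centrifugo node and server API key.
CENTRIFUGO_URL = "http://localhost:8000/api/publish"
API_KEY = "replace-with-your-api-key"

def build_publish_request(channel, data):
    """Build the JSON body for a publish call: one channel name
    plus an arbitrary JSON payload to fan out to subscribers."""
    return json.dumps({"channel": channel, "data": data})

body = build_publish_request("news", {"headline": "Score changed", "score": "2:1"})
# To actually send it (requires a running Centrifugo node):
#   import urllib.request
#   req = urllib.request.Request(
#       CENTRIFUGO_URL, data=body.encode(),
#       headers={"Content-Type": "application/json", "X-API-Key": API_KEY})
#   urllib.request.urlopen(req)
print(body)
```

Your backend fires that one request, and Centrifugo handles delivery to every online subscriber of the channel.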
That makes the logic sound simple for the backend,
but the real hard work, the complexity,
that's all in maintaining the connection, right?
The transport layer.
That's where the magic is.
Maintaining millions of concurrent connections
is incredibly demanding.
Centrifugo handles all that complexity
by supporting multiple transport protocols.
So most people probably know WebSocket.
Right, WebSocket is the big one.
True bi-directional, low-latency communication,
but it also supports things like HTTP streaming,
server-sent events, or SSE.
Which are more for dashboards and things, right?
One-way updates.
Precisely.
And then you have gRPC and the newer
high-performance WebTransport.
It manages connections across all of these
so your main application doesn't have to think about it.
Which brings us directly to why this thing was even created.
The source material says its mission
was to wash away WebSocket scalability issues.
And that's the pain point.
So many developers hit a brick wall trying
to scale persistent connections.
Yeah, you either face huge headaches and costs,
or you get locked into an expensive third-party service.
Right, like Pusher or PubNub or Ably.
Centrifugo positioned itself as the open source,
and this is key, self-hosted alternative.
You get that enterprise performance,
but you control your own infrastructure.
And it's written in Go.
Why is that so important for a server like this?
Oh, it's critical.
Go was literally designed for this kind of work,
for high concurrency networking.
It can handle millions of connections
using these lightweight things called goroutines
without just drowning in memory and CPU usage.
What I find really interesting is the integration model.
It's totally language agnostic.
Yes.
Your main app could be in Python, Java, anything.
Centrifugo just sits next to it as a separate service.
And that architectural decoupling is the whole point.
You're isolating the hardest problem, the real-time transport,
into a dedicated high-performance box.
You just tell it, hey, publish this message,
and it handles the rest.
So you don't have to touch your core business logic at all.
Not at all.
And getting started, the sources make it sound almost trivial.
It's designed to be.
I mean, you can get it running in seconds
with a single Docker command, something
like docker run centrifugo/centrifugo.
That low barrier to entry for such a powerful tool
is a huge advantage.
OK, let's talk raw power, then.
The performance metrics in the sources
are, frankly, pretty eye-opening.
They really are.
This thing shows serious industrial strength.
It's been proven to handle 1 million concurrent WebSocket
connections.
A million connections.
While delivering 30 million messages per minute.
And that's on hardware that's comparable to a single modern
server.
That is just immense.
And its design really focuses on that broadcast capability.
Absolutely.
It excels at that one-to-many scenario.
Think of a breaking news alert.
One piece of info needs to instantly hit millions of users.
Centrifugo is optimized for that.
Of course, one server can't handle a truly global scale.
You need to scale out horizontally.
Right.
So how does Centrifugo achieve that?
How do you coordinate connections
across, say, dozens of servers?
That's where it integrates with external brokers.
You can run multiple Centrifugo nodes,
and they all communicate through a reliable message broker.
So the broker isn't talking to the user.
It's just for the Centrifugo servers to talk to each other.
Exactly.
The broker is the central bulletin board.
One node gets a message, posts it to the broker,
and all the other nodes pick it up and deliver it
to their connected users.
It lets you scale out almost effortlessly.
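That broker fan-out can be sketched with a toy in-process model. Real deployments use Redis or NATS over the network; the Broker and Node classes here are purely illustrative:

```python
from collections import defaultdict

class Broker:
    """Toy stand-in for Redis/NATS: relays every published
    message to every registered node."""
    def __init__(self):
        self.nodes = []

    def publish(self, channel, message):
        for node in self.nodes:
            node.deliver(channel, message)

class Node:
    """Toy Centrifugo node: tracks its own clients' subscriptions
    and forwards broker messages to local subscribers only."""
    def __init__(self, broker):
        self.local = defaultdict(list)  # channel -> list of client inboxes
        broker.nodes.append(self)

    def subscribe(self, channel):
        inbox = []
        self.local[channel].append(inbox)
        return inbox

    def deliver(self, channel, message):
        for inbox in self.local[channel]:
            inbox.append(message)

broker = Broker()
node_a, node_b = Node(broker), Node(broker)
alice = node_a.subscribe("scores")  # clients on different nodes
bob = node_b.subscribe("scores")
broker.publish("scores", "2:1")     # one publish reaches both
```

One message posted to the broker, and subscribers on every node receive it.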
And what brokers does it support?
The big ones.
Redis, of course, and its high-performance compatible alternatives
like AWS ElastiCache, DragonflyDB, Valkey,
and also NATS.
These are all battle-tested systems for this kind of work.
What's fascinating is that they didn't just stop at speed.
They layered on these advanced features
that really solve real-world user experience problems.
Yeah, this shows the maturity of the platform.
You get flexible authentication with JWT, of course,
but the critical features are all about data management.
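As a rough illustration of the JWT side, here is a standard-library-only sketch of building an HS256 connection token carrying the "sub" claim that identifies the user. In practice you'd use a JWT library, and the secret and user ID here are placeholders:

```python
import base64
import hashlib
import hmac
import json

def b64url(raw: bytes) -> str:
    # JWTs use unpadded URL-safe base64 for each segment.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def connection_token(user_id: str, secret: str) -> str:
    """Sketch of an HS256 JWT of the shape Centrifugo accepts for
    client connections; 'sub' identifies the connecting user."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({"sub": user_id}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

token = connection_token("user-42", "my-hmac-secret")
```

The client presents that token when it connects, and Centrifugo verifies the signature against the shared secret.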
Let's talk about that,
because nobody tolerates gaps in their chat feed.
Exactly, and that's where you get features
like hot message history and automatic message recovery.
So if my train goes through a tunnel and I disconnect
for a second.
When you reconnect,
Centrifugo automatically checks the history buffer
it keeps for that channel
and instantly fills in any messages you missed.
It's seamless for the user.
That automatic recovery is a huge win,
prevents so much user frustration.
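That recovery mechanism can be sketched as an offset-based history buffer. This toy class is illustrative only; Centrifugo's real implementation tracks per-channel offsets inside its engine:

```python
class ChannelHistory:
    """Toy per-channel history buffer: each message gets an offset,
    and a reconnecting client asks for everything after the last
    offset it saw."""
    def __init__(self, size=100):
        self.size = size
        self.messages = []  # list of (offset, message)
        self.offset = 0

    def publish(self, message):
        self.offset += 1
        self.messages.append((self.offset, message))
        self.messages = self.messages[-self.size:]  # keep only recent history

    def recover(self, last_seen):
        return [m for off, m in self.messages if off > last_seen]

chan = ChannelHistory()
for msg in ["goal!", "yellow card", "full time"]:
    chan.publish(msg)
# Client disconnected after offset 1 (it only saw "goal!"):
missed = chan.recover(1)
```

On reconnect, the client hands over its last offset and instantly gets the gap filled in.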
Another one is Delta compression.
This is really clever.
Okay.
Imagine you're streaming a dashboard
with hundreds of changing numbers.
Instead of sending the full data payload every second,
it only calculates and sends the changes,
the Delta from the last update.
So you're sending a few hundred bytes
instead of a hundred kilobytes.
That's a massive bandwidth saver.
It's huge.
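Conceptually, a delta update looks like this. Note that the real feature diffs serialized payload bytes with the Fossil delta algorithm rather than JSON keys, so this key-level diff is only a simplified illustration:

```python
def delta(previous, current):
    """Send only the keys whose values changed since the last update
    (a simplified stand-in for byte-level delta compression)."""
    return {k: v for k, v in current.items() if previous.get(k) != v}

prev = {"cpu": 41, "mem": 70, "disk": 88}
curr = {"cpu": 43, "mem": 70, "disk": 88}
update = delta(prev, curr)  # only the changed field crosses the wire
```

For a dashboard with hundreds of mostly-static fields, that shrinks each update to just the handful of values that actually moved.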
And finally, for the application side,
you get online presence information.
Knowing who is currently in a channel
with join and leave notifications
without having to poll your database constantly.
That feature list really proves this is battle tested stuff.
Which brings us to reliability.
Centrifugo started a decade ago.
It's mature.
Oh yeah.
It's adopted by massive companies.
We're talking about VK, Badoo, ManyChat, OpenWeb,
even Grafana.
Services where lag is just not an option.
That has to give developers a lot of confidence.
And there are great testimonials too,
like from Victor Pontus at Luma.
He said it's been incredibly easy to use and reliable.
And if you want to see its speed in action,
there's a demo where it streams telemetry data
from the Assetto Corsa Racing Simulator
to a Grafana dashboard at 60 updates per second.
60 hertz.
A 60 hertz update rate.
That's the kind of responsiveness
modern data apps need.
Now, it's open source, but there's also
an enterprise option, Centrifugo PRO.
What does that bring to the table?
The PRO version basically takes that high-performance engine
and wraps it with the features that large organizations need.
It's all about observability and management.
So things like analytics, tracing.
Exactly.
Analytics with ClickHouse, real-time tracing
for debugging, a push notification API,
and, crucially, SSO integrations for the web UI.
So if the open source version is the high-speed engine,
PRO is like the corporate cockpit
with all the detailed telemetry and compliance tools.
That's a perfect way to put it.
It lets big companies host this critical infrastructure
themselves, but still get the management tools they need.
So after diving deep into all this,
what's the ultimate takeaway for you, the listener?
For me, what's fascinating is that Centrifugo
offers this robust, high-performance,
and critically self-hosted solution.
It lets you decouple that hardest piece of scaling,
the connection management, and add real-time features
to any application using proven, mature software.
So if you're even thinking about scaling real-time features,
a messenger, a data stream, whatever it is,
this shows that you can own that complex transport layer
and gain massive, proven performance.
And let's end on a provocative thought.
The sources mentioned using this for streaming AI responses.
Right.
Just consider the absolute explosion of generative AI.
All of it relies on instant character-by-character output
streams.
That makes these specialized high-throughput message servers
not just a nice-to-have, but absolutely essential
infrastructure for the future of any app using AI.
Infrastructure is always the key.
A big thank you once again to our supporter, SafeServer,
for powering this deep dive.
They really understand the demands of running
complex software like this.
You can learn more about how they can help
manage your next deployment at www.safeserver.de.
See you soon.
See you soon.