Today's Deep-Dive: Centrifugo
Ep. 317

Episode description

This episode discusses Centrifugo, a powerful and scalable real-time messaging server designed to handle the millions of persistent, always-on connections behind modern instant online experiences. It explains real-time messaging as an open phone line, in contrast to the traditional request-response cycle, and highlights its use in collaborative tools, live updates, and generative AI streaming. The core mechanism is the Publish-Subscribe (PubSub) pattern: Centrifugo acts as a user-facing PubSub server, efficiently delivering messages from a backend publisher to subscribed users. It addresses the significant engineering challenge of maintaining millions of concurrent connections by supporting multiple transport protocols such as WebSockets, SSE, and gRPC.

Centrifugo was created to overcome WebSocket scalability issues and offers an open-source, self-hosted alternative to expensive third-party services, letting developers control their own infrastructure. Written in Go for high concurrency, it has a language-agnostic integration model and can be added as a separate service to applications built in any language. The episode also highlights Centrifugo's impressive performance: 1 million concurrent WebSocket connections and 30 million messages per minute on modest hardware, with horizontal scaling through message brokers such as Redis and NATS.

Advanced features such as hot message history, automatic message recovery, delta compression for bandwidth savings, and online presence information showcase its maturity and real-world problem-solving capabilities. Used by major companies like VK and Grafana, Centrifugo's reliability is well established, and a PRO version adds enterprise-grade features like analytics and tracing. Ultimately, Centrifugo provides a robust way to add real-time features to any application, and with the rise of generative AI, such high-throughput message servers are becoming essential infrastructure.

Gain digital sovereignty now and save costs

Let’s have a look at your digital challenges together. What tools are you currently using? Are your processes optimal? What is the state of your backups and security updates?

Digital sovereignty is easily achieved with Open Source software (which usually costs far less, too). Our division Safeserver offers hosting, operation and maintenance for countless Free and Open Source tools.

Try it now!

Download transcript (.srt)
0:00

Welcome back to The Deep Dive.

0:01

Today, we are tearing apart the invisible infrastructure

0:05

that powers everything you expect to be instant online.

0:09

Think about watching a live event

0:10

and seeing that comment stream just update perfectly.

0:13

Or maybe a financial ticker that never, ever misses a beat.

0:18

We're really talking about the engineering challenge

0:20

behind managing millions of those persistent,

0:23

always-on connections.

0:24

A huge challenge.

0:25

It really is.

0:26

Now, before we jump into the architecture

0:28

that makes all of that possible,

0:29

we want to give a massive shout out to the supporter

0:32

who helps us bring these deep dives to you.

0:33

That's SafeServer.

0:34

They manage high-performance hosting solutions,

0:36

you know, exactly the kind of robust setup

0:38

you need to run the software we're discussing today.

0:41

So if you are serious about hosting

0:43

and your digital transformation,

0:44

you can find out more at www.safeserver.de.

0:49

All right, let's unpack this mission.

0:51

Today, we are focusing on Centrifugo.

0:54

It's a powerful, scalable, real-time messaging server.

0:58

And our goal today is to really get beyond the buzzwords.

1:03

We want you to understand not just what Centrifugo is,

1:07

but how it solves probably the biggest hurdle

1:09

in modern applications.

1:11

Which is?

1:11

Managing all those massive, persistent online connections

1:15

without forcing you to completely rewrite

1:17

your existing application structure.

1:19

Okay, so let's start at the very beginning.

1:21

For anyone new to this,

1:22

we throw around the term real-time,

1:24

but what does that actually mean mechanically?

1:26

That's a great question.

1:27

It's really the difference between, say,

1:29

a normal web request and an open phone line.

1:32

Okay.

1:33

A traditional web request is like you ask for data,

1:35

the server gives it to you and then hangs up.

1:37

You have to call back to check for updates.

1:39

Right, the request-response cycle.

1:40

Exactly.

1:41

Real-time messaging is about creating

1:43

these interactive applications

1:44

where events or data changes

1:46

are delivered to online users instantly

1:49

with minimal, often imperceptible delay.

1:53

It's like keeping that line open.
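
To make that "open line" idea concrete, here is a minimal sketch of one such transport, server-sent events, using only Go's standard library. The endpoint path, port, and the one-update-per-second loop are illustrative placeholders, not anything from Centrifugo itself.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// stream keeps the HTTP response open and pushes an event every second,
// instead of answering once and hanging up like a classic request-response call.
func stream(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	for i := 1; ; i++ {
		// SSE frames are plain text: a "data:" line followed by a blank line.
		fmt.Fprintf(w, "data: update %d\n\n", i)
		flusher.Flush()
		select {
		case <-r.Context().Done(): // client went away, stop pushing
			return
		case <-time.After(time.Second):
		}
	}
}

func main() {
	http.HandleFunc("/events", stream)
	http.ListenAndServe(":8080", nil)
}
```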

1:55

And we used to see this only in really niche applications,

1:57

but now it feels like it's absolutely everywhere.

1:59

It is everywhere.

2:01

It's the backbone of collaborative tools

2:03

like Google Docs.

2:04

It drives live sports scoreboards, live comments.

2:07

So it's a multiplayer game.

2:08

Multiplayer games.

2:09

And I think most relevant for today,

2:11

it's what enables the streaming responses

2:14

from generative AI.

2:15

Ah, okay.

2:16

So when you see ChatGPT typing out a response word by word.

2:20

That's it.

2:21

That low latency delivery is happening

2:23

over real-time protocols.

2:24

So if the server is keeping that connection open,

2:26

how does the information actually flow?

2:28

What's the fundamental mechanism here?

2:32

The key is a pattern called PubSub, Publish-Subscribe.

2:36

Centrifugo is specifically

2:37

what we call a user-facing PubSub server.

2:40

Okay, like a newspaper subscription.

2:42

That's the perfect analogy.

2:43

Your backend application is the publisher,

2:46

the newspaper editor.

2:47

When an event happens, a score changes, a new message,

2:50

it publishes that one update to Centrifugo.

2:52

And Centrifugo is the delivery service.

2:55

Exactly, it automatically delivers that message

2:57

to every single online user who is subscribed to that topic.
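
As a rough sketch of that publisher side: a backend hands an event to Centrifugo over its server HTTP API and Centrifugo fans it out to subscribers. The endpoint, header name, port, channel name and API key below assume a default v5-style setup and are placeholders, so check the docs for your version.

```go
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
)

func main() {
	// One update from the "editor" (your backend) for everyone subscribed to "news".
	body, _ := json.Marshal(map[string]any{
		"channel": "news",
		"data":    map[string]any{"headline": "Score changed to 2:1"},
	})
	req, _ := http.NewRequest("POST", "http://localhost:8000/api/publish", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-API-Key", "your-api-key") // placeholder; set in the Centrifugo config

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("publish status:", resp.Status) // Centrifugo delivers the message to subscribers
}
```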

3:00

That makes the logic sound simple for the backend,

3:03

but the real hard work, the complexity,

3:05

that's all in maintaining the connection, right?

3:07

The transport layer.

3:08

That's where the magic is.

3:10

Maintaining millions of concurrent connections

3:13

is incredibly demanding.

3:15

Centrifugo handles all that complexity

3:17

by supporting multiple transport protocols.

3:19

So most people probably know WebSocket.

3:22

Right, WebSocket is the big one.

3:24

True bi-directional, low-latency communication,

3:27

but it also supports things like HTTP streaming,

3:30

server-sent events, or SSE.

3:33

Which are more for dashboards and things, right?

3:35

One-way updates.

3:36

Precisely.

3:37

And then you have gRPC and the newer

3:39

high-performance WebTransport.

3:41

It manages connections across all of these

3:43

so your main application doesn't have to think about it.
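
On the client side, the official SDKs hide those transport details. Here is a minimal subscriber sketch assuming the community centrifuge-go client connecting over WebSocket; method names and options may differ slightly between SDK versions, and the endpoint and channel name are placeholders.

```go
package main

import (
	"log"

	"github.com/centrifugal/centrifuge-go"
)

func main() {
	// Connect over the WebSocket transport; other transports (SSE, HTTP streaming,
	// gRPC, WebTransport) are negotiated the same way from the application's view.
	client := centrifuge.NewJSONClient(
		"ws://localhost:8000/connection/websocket",
		centrifuge.Config{}, // a real setup would put a connection JWT in Config.Token
	)
	defer client.Close()

	sub, err := client.NewSubscription("news", centrifuge.SubscriptionConfig{})
	if err != nil {
		log.Fatal(err)
	}
	sub.OnPublication(func(e centrifuge.PublicationEvent) {
		log.Println("got update:", string(e.Data)) // fires for every published message
	})

	if err := sub.Subscribe(); err != nil {
		log.Fatal(err)
	}
	if err := client.Connect(); err != nil {
		log.Fatal(err)
	}
	select {} // keep the process alive to receive messages
}
```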

3:45

Which brings us directly to why this thing was even created.

3:48

The source material says its mission

3:50

was to wash away WebSocket scalability issues.

3:53

And that's the pain point.

3:54

So many developers hit a brick wall trying

3:56

to scale persistent connections.

3:58

Yeah, you either face huge headaches and costs,

4:01

or you get locked into an expensive third-party service.

4:04

Right, like Pusher or PubNub or Ably.

4:07

Centrifugo positioned itself as the open source,

4:10

and this is key, self-hosted alternative.

4:14

You get that enterprise performance,

4:15

but you control your own infrastructure.

4:17

And it's written in Go.

4:18

Why is that so important for a server like this?

4:21

Oh, it's critical.

4:23

Go was literally designed for this kind of work,

4:25

for high concurrency networking.

4:27

It can handle millions of connections

4:29

using these lightweight things called goroutines

4:32

without just drowning in memory and CPU usage.
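
As a tiny illustration of why that matters, a goroutine starts with only a few kilobytes of stack, so parking huge numbers of idle connections stays cheap. This toy example (not Centrifugo code) just parks a million goroutines and reports the memory the runtime has taken from the OS.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	const n = 1_000_000 // one goroutine standing in for one idle connection
	var wg sync.WaitGroup
	block := make(chan struct{})

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-block // park, like a connection waiting for its next message
		}()
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%d goroutines parked, ~%d MiB obtained from the OS\n", n, m.Sys/1024/1024)

	close(block) // release them all
	wg.Wait()
}
```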

4:35

What I find really interesting is the integration model.

4:38

It's totally language agnostic.

4:40

Yes.

4:40

Your main app could be in Python, Java, anything.

4:44

Centrifugo just sits next to it as a separate service.

4:46

And that architectural decoupling is the whole point.

4:49

You're isolating the hardest problem, the real-time transport,

4:52

into a dedicated high-performance box.

4:55

You just tell it, hey, publish this message,

4:57

and it handles the rest.

4:58

So you don't have to touch your core business logic at all.

5:01

Not at all.

5:01

And getting started, the sources make it sound almost trivial.

5:05

It's designed to be.

5:06

I mean, you can get it running in seconds

5:08

with a single Docker command, something

5:10

like docker run centrifugo/centrifugo.

5:12

That low barrier to entry for such a powerful tool

5:15

is a huge advantage.

5:16

OK, let's talk raw power, then.

5:18

The performance metrics in the sources

5:20

are, frankly, pretty eye-opening.

5:22

They really are.

5:24

This thing shows serious industrial strength.

5:26

It's been proven to handle 1 million concurrent WebSocket

5:30

connections.

5:31

A million connections.

5:32

While delivering 30 million messages per minute.

5:35

And that's on hardware that's comparable to a single modern

5:39

server.

5:39

That is just immense.

5:40

And its design really focuses on that broadcast capability.

5:44

Absolutely.

5:45

It excels at that one-to-many scenario.

5:47

Think of a breaking news alert.

5:49

One piece of info needs to instantly hit millions of users.

5:52

Centrifugo is optimized for that.

5:54

Of course, one server can't handle a truly global scale.

5:57

You need to scale out horizontally.

5:58

Right.

5:59

So how does Centrifugo achieve that?

6:00

How do you coordinate connections

6:02

across, say, dozens of servers?

6:04

That's where it integrates with external brokers.

6:06

You can run multiple Centrifugo nodes,

6:09

and they all communicate through a reliable message broker.

6:12

So the broker isn't talking to the user.

6:14

It's just for the Centrifugo servers to talk to each other.

6:16

Exactly.

6:17

The broker is the central bulletin board.

6:20

One node gets a message, posts it to the broker,

6:23

and all the other nodes pick it up and deliver it

6:25

to their connected users.

6:26

It lets you scale out almost effortlessly.

6:29

And what brokers does it support?

6:31

The big ones.

6:32

Redis, of course, and all its high-performance variants

6:35

like AWS ElastiCache, DragonflyDB, Valkey,

6:38

and also NATS.

6:39

These are all battle-tested systems for this kind of work.
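
Conceptually, that fan-out between nodes looks like plain Redis PubSub: each node subscribes to the broker and re-delivers what it hears to its own connected clients. The sketch below only illustrates that idea with the go-redis client; it is not Centrifugo's actual internal protocol, and the channel name and address are placeholders.

```go
package main

import (
	"context"
	"log"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Every node listens on the shared "bulletin board" channel.
	pubsub := rdb.Subscribe(ctx, "node-fanout")
	defer pubsub.Close()

	go func() {
		for msg := range pubsub.Channel() {
			// A real node would now deliver this to its own connected users.
			log.Println("node received:", msg.Payload)
		}
	}()

	// Whichever node accepts a publish posts it once to the broker;
	// all the other nodes pick it up and deliver it locally.
	if err := rdb.Publish(ctx, "node-fanout", `{"headline":"Goal!"}`).Err(); err != nil {
		log.Fatal(err)
	}

	select {} // keep running to receive broker messages
}
```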

6:42

What's fascinating is that they didn't just stop at speed.

6:45

They layered on these advanced features

6:47

that really solve real-world user experience problems.

6:51

Yeah, this shows the maturity of the platform.

6:53

You get flexible authentication with JWT, of course,

6:56

but the critical features are all about data management.

6:58

Let's talk about that,

6:59

because nobody tolerates gaps in their feed.

7:01

Exactly, and that's where you get features

7:03

like hot message history and automatic message recovery.

7:06

So if my train goes through a tunnel and I disconnect

7:08

for a second.

7:09

When you reconnect,

7:11

Centrifugo automatically checks the history buffer

7:13

it keeps for that channel

7:15

and instantly fills in any messages you missed.

7:18

It's seamless for the user.
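
That same retained history can also be read from the backend over the server API, assuming history is enabled for the channel in the config. A minimal sketch, again assuming a v5-style endpoint and placeholder channel name and API key:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Ask Centrifugo for the retained messages of a channel.
	body, _ := json.Marshal(map[string]any{"channel": "news", "limit": 10})
	req, _ := http.NewRequest("POST", "http://localhost:8000/api/history", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-API-Key", "your-api-key") // placeholder

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var result map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		log.Fatal(err)
	}
	fmt.Println(result) // recent publications plus a stream position used for recovery
}
```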

7:19

That automatic recovery is a huge win,

7:23

prevents so much user frustration.

7:25

Another one is Delta compression.

7:27

This is really clever.

7:28

Okay.

7:29

Imagine you're streaming a dashboard

7:31

with hundreds of changing numbers.

7:33

Instead of sending the full data payload every second,

7:36

it only calculates and sends the changes,

7:38

the Delta from the last update.

7:40

So you're sending a few hundred bytes

7:41

instead of a hundred kilobytes.

7:43

That's a massive bandwidth saver.

7:45

It's huge.

7:45

And finally, for the application side,

7:47

you get online presence information.

7:50

Knowing who is currently in a channel

7:51

with join and leave notifications

7:53

without having to poll your database constantly.
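
The presence query follows the same server-API pattern as the publish and history calls above; a short sketch, with the endpoint, channel, and API key again assumed as placeholders:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Who is connected to the "news" channel right now?
	req, _ := http.NewRequest("POST", "http://localhost:8000/api/presence",
		bytes.NewReader([]byte(`{"channel": "news"}`)))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-API-Key", "your-api-key") // placeholder

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	raw, _ := io.ReadAll(resp.Body)
	fmt.Println(string(raw)) // map of currently connected clients with their user IDs
}
```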

7:56

That feature list really proves this is battle tested stuff.

7:59

Which brings us to reliability.

8:01

Centrifugo started a decade ago.

8:03

It's mature.

8:03

Oh yeah.

8:04

It's adopted by massive companies.

8:06

We're talking about VK, Badoo, ManyChat, OpenWeb,

8:09

even Grafana.

8:11

Services where lag is just not an option.

8:13

That has to give developers a lot of confidence.

8:15

And there are great testimonials too,

8:17

like from Victor Pontis at Luma.

8:19

He said it's been incredibly easy to use and reliable.

8:22

And if you want to see its speed in action,

8:24

there's a demo where it streams telemetry data

8:27

from the Assetto Corsa Racing Simulator

8:29

to a Grafana dashboard at 60 updates per second.

8:34

60 hertz.

8:34

A 60 hertz update rate.

8:36

That's the kind of responsiveness

8:38

modern data apps need.

8:39

Now, it's open source, but there's also

8:42

an enterprise option, Centrifugo PRO.

8:45

What does that bring to the table?

8:46

The PRO version basically takes that high-performance engine

8:50

and wraps it with the features that large organizations need.

8:53

It's all about observability and management.

8:55

So things like analytics, tracing.

8:58

Exactly.

8:58

Analytics with ClickHouse, real-time tracing

9:01

for debugging, a push notification API,

9:04

and crucial SSO integrations for the web UI.

9:06

So if the open source version is the high-speed engine,

9:09

PRO is like the corporate cockpit

9:11

with all the detailed telemetry and compliance tools.

9:14

That's a perfect way to put it.

9:15

It lets big companies host this critical infrastructure

9:18

themselves, but still get the management tools they need.

9:21

So after diving deep into all this,

9:24

what's the ultimate takeaway for you, the listener?

9:26

For me, what's fascinating is that Centrifugo

9:29

offers this robust, high-performance,

9:32

and critically self-hosted solution.

9:35

It lets you decouple that hardest piece of scaling,

9:37

the connection management, and add real-time features

9:40

to any application using proven, mature software.

9:43

So if you're even thinking about scaling real-time features,

9:46

a messenger, a data stream, whatever it is,

9:48

this shows that you can own that complex transport layer

9:51

and gain massive, proven performance.

9:54

And let's end on a provocative thought.

9:55

The sources mentioned using this for streaming AI responses.

9:58

Right.

9:59

Just consider the absolute explosion of generative AI.

10:02

All of it relies on instant character-by-character output

10:05

streams.

10:06

That makes these specialized high-throughput message servers

10:09

not just a nice-to-have, but absolutely essential

10:12

infrastructure for the future of any app using AI.

10:16

Infrastructure is always the key.

10:18

A big thank you once again to our supporter, SafeServer,

10:21

for powering this deep dive.

10:23

They really understand the demands of running

10:25

complex software like this.

10:26

You can learn more about how they can help

10:28

manage your next deployment at www.safeserver.de.

10:33

See you soon.

10:33

See you soon.