Today's Deep-Dive: Garage
Ep. 307

Episode description

The Deep Dive episode explores Garage, an open-source distributed object storage project designed to bring enterprise-level resilience to self-hosted users and small businesses. Garage aims to provide data redundancy and availability comparable to cloud giants like Amazon and Google, but without the massive budget. It achieves this by treating data as self-contained objects with rich metadata, stored in a flat, infinitely scalable structure. A key feature is its S3 compatibility, allowing existing software designed for Amazon’s Simple Storage Service to seamlessly integrate with a Garage cluster. Developed by Deuxfleurs, a self-hosted service provider, Garage is released under the AGPL v3 license, ensuring improvements remain open source. The system prioritizes resilience and geographical distribution, replicating data in three distinct “zones” to withstand failures in up to two locations. Garage’s engineering philosophy embraces high latency and flaky network conditions, common for smaller operators, by leveraging concepts like Amazon’s Dynamo for availability over immediate consistency and Conflict-Free Replicated Data Types (CRDTs) for reliable data merging. It also incorporates principles from Google’s Maglev for efficient traffic routing. The project boasts a low barrier to entry, requiring minimal hardware resources (1GB RAM, 16GB disk) and supporting heterogeneous hardware configurations. Garage is sustained by public funding, notably grants from the European Commission’s Next Generation Internet (NGI) initiative, supporting its development as a public good. This approach democratizes access to advanced distributed storage, offering a decentralized alternative to large, centralized cloud providers and challenging assumptions about data safety and availability.

Gain digital sovereignty now and save costs

Let’s have a look at your digital challenges together. What tools are you currently using? Are your processes optimal? How is the state of backups and security updates?

Digital Sovereignty is easily achieved with Open Source software (which usually costs way less, too). Our division SafeServer offers hosting, operation and maintenance for countless Free and Open Source tools.

Try it now!

Download transcript (.srt)
0:00

Welcome back to the Deep Dive. If you need to get up to speed quickly on complex

0:05

research,

0:06

technical stuff, or what's moving in an industry, you're definitely in the right

0:10

place.

0:10

We go deep into the source material so you get the essential informed perspective.

0:15

Now, before we unpack today's sources, we really want to give a huge thank you to

0:19

our supporter,

0:20

SafeServer. SafeServer handles the hosting for exactly this kind of, you know,

0:24

crucial distributed

0:25

software. They're dedicated to supporting you on your digital transformation

0:28

journey.

0:29

You can find out more info and resources over at www.SafeServer.de.

0:35

So today, we're focusing on something that sounds pretty technical, but it's

0:38

actually

0:38

quite democratizing in the end. Distributed object storage. We're taking a deep

0:44

dive into

0:44

a specific open source project called Garage. Yeah, Garage. It basically aims to

0:48

take the kind

0:48

of resilience you see with, like, the massive cloud giants. Like Amazon, Google.

0:53

Exactly.

0:53

Yeah. And put that power directly into the hands of, you know, self-hosted users,

0:57

small businesses,

0:58

that kind of thing. Okay. So our sources really lay out a fascinating path for how

1:03

you can get

1:03

world-class data redundancy, but without needing a world-class budget. Right. The

1:09

challenge we're

1:10

sort of tackling for you, the listener, is this. How do you get data storage that's,

1:15

well, practically

1:16

indestructible, even if the servers you're running it on are a bit flaky or old or

1:21

even spread out

1:21

across totally different physical locations? That's the mission, then. To demystify

1:26

this

1:26

geo-distributed storage idea, look at the pretty cutting-edge theory Garage is

1:31

built on. And show

1:32

how this specific solution is really tailored for the, let's say, small to medium

1:36

operator,

1:37

not the huge enterprises. Got it. So the key takeaway right from the start. The

1:42

core nugget

1:42

you need is that Garage is open source. It's S3 compatible. We'll get into why that

1:46

matters.

1:47

And it's an object store specifically designed for geo-distributed setups. Right.

1:51

It's fundamentally

1:52

about bringing that sort of enterprise level availability down to environments

1:56

where, frankly,

1:57

failure is kind of expected sometimes, not this rare anomaly. Okay, let's start

2:00

with the basics

2:01

then. Because object storage itself, it's a term people hear but maybe don't fully

2:06

get compared to

2:08

you know, the normal file storage on their computer, or maybe block storage they've

2:12

heard about.

2:12

Yeah, good place to start. If I'm just saving a document or maybe doing a backup,

2:17

why would I even think about object storage? Exactly. Well, it really comes down to

2:20

scale

2:21

and flexibility. Think of your traditional file storage like a very rigid filing

2:27

cabinet. You've

2:28

got folders inside folders, a strict hierarchy. Okay. Block storage is even lower

2:32

level, dealing

2:33

with the actual chunks on a hard drive. Yeah. Object storage though, it's totally

2:36

different.

2:37

It's flat. Flat? Yeah, every piece of data could be a video file, a log entry, a

2:42

database backup,

2:43

whatever. It's treated as its own distinct self-contained object. Ah, okay. And

2:48

each object

2:49

gets tagged with rich metadata information about the object, and it gets a unique

2:54

ID. This flat

2:56

structure makes it almost infinitely scalable. Right. Which is why it's the main

3:00

method used by

3:00

companies like, say, Netflix for streaming video or Amazon S3 itself for just

3:05

massive archives of

3:06

data. And Garage ties into that directly. You said its identity is a distributed

3:11

object storage

3:12

service that emphasizes S3 compatibility. Why is being S3 compatible so critical?

3:18

Oh, it's huge

3:19

because Amazon's simple storage service, S3, is basically the universal language,

3:25

the lingua franca

3:27

of cloud storage. Right. Everyone uses it or integrates with it. Exactly. Yeah. So

3:31

by

3:31

implementing the exact same API, the way software talks to the storage, Garage just

3:36

wipes out the

3:37

biggest hurdle to using it. How so? Well, if you're already using some software for

3:41

say backups or

3:42

monitoring or maybe hosting website assets, and that software talks to AWS S3.

3:47

Which a lot of

3:47

software does. Precisely. You can literally just point that same software at your

3:51

own garage cluster

3:52

instead. No code changes needed. It just works. Wow. Okay. That makes it incredibly

3:57

attractive

3:58

then, especially for sysadmins who don't want to rebuild their entire tool chain.
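Looping back to the flat object model described a moment ago, here's a toy sketch of the idea in Python. It's purely illustrative, not Garage's actual data model; the class and field names are invented for this example.

```python
import uuid

# Toy illustration of the object-storage model discussed above:
# every object is data plus rich metadata, stored under a unique ID
# in one flat namespace, with no folder hierarchy at all.
class FlatObjectStore:
    def __init__(self):
        self.objects = {}  # unique ID -> (data, metadata)

    def put(self, data, **metadata):
        object_id = str(uuid.uuid4())  # the unique ID every object gets
        self.objects[object_id] = (data, metadata)
        return object_id

    def get(self, object_id):
        return self.objects[object_id]

store = FlatObjectStore()
oid = store.put(b"2024-01-01 backup ok", content_type="text/plain", app="backup")
data, meta = store.get(oid)
```

The scalability comes from the fact that a flat key space like this can be partitioned across many machines, which is exactly what distributed object stores do.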

4:01

Absolutely. Now,

4:02

this wasn't built by some corporate giant you mentioned who's actually behind

4:05

garage. And who

4:06

are they building it for? Yeah, it came from a group called Deuxfleurs. They're

4:10

actually an experimental

4:11

small-scale self-hosted service provider themselves. So they built it out of their

4:16

own need.

4:16

Pretty much. They built garage because they needed it and they've been using it in

4:21

production since

4:21

back in 2020. Yeah. So it was really built by people running these kinds of small

4:26

distributed

4:27

setups for people doing the same. Makes sense. And importantly, it's entirely free

4:33

software.

4:34

It's released under the AGPL v3 license. Hold on. AGPL v3. That's a specific choice,

4:39

isn't it?

4:40

What does that license mean for someone thinking about using garage or maybe

4:43

building on it?

4:44

Yeah. It's an important detail. The Affero General Public License, AGPL, is one of the

4:49

stronger copy

4:50

left licenses. Basically, if you modify the garage software and then run that

4:55

modified version as a

4:56

network service, like people connect to it over the internet, then you must make

5:01

the source code

5:01

of your modified version available to those users. Ah, so it prevents someone from

5:06

taking the open

5:06

source code, improving it, and then locking those improvements up in their own

5:10

private cloud service.

5:11

Exactly. It's designed to make sure that improvements made to free software,

5:16

especially when it's run as a service, flow back to the community. It keeps the

5:20

technology free and

5:20

open. That fits perfectly with this whole mission of democratizing resilient

5:25

infrastructure.

5:26

Okay, let's move on to the driving goals. Section two, I mean, why bother with all

5:31

this complexity?

5:32

If I can just plug a huge external hard drive into my server, why build distributed

5:37

storage across

5:38

multiple locations? You basically hit the nail on the head there. The core

5:41

motivations are resilience

5:43

and geographical distribution. Right. That single giant hard drive that's a single

5:47

point of failure,

5:47

power supply dies, fire, flood, whatever, your data's gone. Okay, yeah. Garage tackles

5:54

this head-on.

5:55

It's explicitly designed for clusters where the nodes, the servers are running in

6:00

different

6:00

physical locations. Could be different racks, different buildings, different cities

6:04

even.

6:04

So the idea is if one data center goes dark, maybe a power outage,

6:08

a network cut, the data is still safe and accessible somewhere else.

6:12

Precisely. High availability through geographic redundancy.

6:15

But wait, moving data across the public internet, that sounds like a recipe for

6:20

terrible latency consistency problems, doesn't it? Is dealing with that really

6:24

worth the

6:24

complexity for a small operator? That's the critical question, right?

6:28

Yeah. And it's actually what defines

6:30

Garage's whole engineering philosophy. Most big enterprise storage software assumes

6:35

you

6:36

have these perfect low latency, high bandwidth, dedicated network links between

6:40

data centers.

6:41

Which small operators usually don't have. Exactly. Garage's developers,

6:46

being sysadmins themselves running stuff over the regular internet,

6:49

just accepted that high latency, maybe up to 200 milliseconds even,

6:53

and flaky connections were just reality. So they built around that reality.

6:58

Yes. They focused the design on handling those network issues gracefully.

7:02

The goal isn't just copying data. It's keeping the service available and ensuring

7:06

the data

7:06

becomes eventually consistent, even if entire locations are temporarily offline.

7:10

So they democratized resilience by not requiring perfect, expensive networking.

7:15

That's it. They built the software to deal with the messy reality of non-enterprise

7:19

networks.

7:19

Making it actually usable. Right. They aimed to make it

7:23

operationally manageable, lightweight, prioritizing that availability, that

7:27

survival,

7:28

above almost everything else. Okay. This is where it gets really

7:30

interesting for me. Section three, the how. How does Garage actually achieve this

7:36

high resilience?

7:37

How does it handle data integrity when you've got machines failing, disks dying,

7:42

networks dropping packets, the whole potential mess?

7:46

Yeah. The secret sauce. They rely on, well, structured redundancy and concepts like

7:51

Quorum.

7:52

Garage is built to be highly resilient against pretty much everything.

7:56

Disk errors, yes, but also whole network partitions separating geographic locations.

8:01

And they mentioned something interesting. Sysadmin failures.

8:04

Ah, yeah. They specifically considered that. Because let's be honest, sometimes the

8:08

biggest

8:09

cause of an outage is human error during maintenance or configuration. The design

8:13

tries to minimize the impact of those two. Smart. Okay. Tell us about the data

8:16

replication.

8:17

How many copies of my data exist and where do they live in this system?

8:20

The magic number they landed on is three. Every chunk of data you upload gets

8:24

replicated,

8:25

stored in three distinct zones. Zones. What's a zone?

8:28

Think of a zone as a logical grouping that usually maps to a physical failure

8:33

domain.

8:33

So it can be a specific rack, a specific building, or most powerfully a specific

8:38

data center or geographic location. And importantly, a zone itself usually

8:44

consists of multiple servers for redundancy within the zone.

8:47

So three copies spread across three potentially separate geographical areas.

8:51

Correct. And the logic is classic quorum systems. To survive N-1 failures, you need

8:56

N copies. Here,

8:57

N = 3. So you can lose up to two entire zones simultaneously.

9:02

Two whole data centers could go offline.

9:04

Could be, yeah. One is down for planned maintenance and another has a sudden

9:08

network

9:08

failure. Your data remains fully accessible and readable from that third surviving

9:12

zone.
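As a rough sketch of the "three copies in three zones" rule being described, here is a toy model in Python. It is not Garage's placement algorithm, and the zone names are made up; it just shows why losing two zones still leaves the data readable.

```python
# Toy model of 3-zone replication: each object is written once per zone,
# and a read succeeds as long as at least one zone is still reachable.
class Cluster:
    def __init__(self, zones):
        self.stores = {z: {} for z in zones}  # zone -> {key: data}
        self.down = set()                     # zones currently offline

    def put(self, key, data):
        # Replicate to every zone (here the cluster has exactly three).
        for zone in self.stores:
            if zone not in self.down:
                self.stores[zone][key] = data

    def get(self, key):
        # Any surviving zone can serve the read.
        for zone, store in self.stores.items():
            if zone not in self.down and key in store:
                return store[key]
        raise KeyError(key)

cluster = Cluster(["zone-a", "zone-b", "zone-c"])
cluster.put("backup.tar", b"archive bytes")
cluster.down = {"zone-a", "zone-b"}  # lose two entire zones at once
assert cluster.get("backup.tar") == b"archive bytes"  # still readable
```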

9:13

Wow. That level of fault tolerance with potentially just three locations, that's

9:17

pretty remarkable for something aimed at smaller scale.

9:21

It really is. But achieving that reliability isn't trivial, theoretically speaking.

9:25

The sources mention garages standing on the shoulders of giants drawing from

9:28

decades of distributed systems research. Right. Let's unpack those giants. They

9:33

mention influences like Amazon's Dynamo. What key ideas did Garage borrow from

9:39

massive systems like that? This is really the core of it, I think.

9:42

The Garage team didn't necessarily invent brand new distributed systems math.

9:47

Instead, they very cleverly applied existing, proven, but often complex,

9:52

academic research to their specific scale and problem set.

9:56

Okay, so Dynamo first. What's the key principle there?

9:58

Dynamo, Amazon's highly available key value store.

10:01

Its core idea was prioritizing availability over perfect, immediate consistency.

10:07

Meaning?

10:08

Meaning if there's a network issue, like a slow link or a partition, Dynamo won't

10:11

just block the

10:12

user and say, try again later. It will likely accept a write operation, let the

10:16

user think

10:16

it succeeded, and then promise to sort out any potential conflicts later once

10:20

communication is

10:21

restored.

10:21

Uh, so it keeps working even when parts of the system can't talk to each other

10:25

properly.

10:26

Exactly. It favors being available for reads and writes over ensuring every node

10:30

has the

10:31

absolute latest data right now. This leads to the concept of eventual consistency.

10:36

Okay, eventual consistency. But how do you make sure things do eventually become

10:40

consistent

10:41

without losing data or getting corrupted when servers have seen different things?

10:45

Great question. And that brings us to the second major influence, conflict-free

10:50

replicated data

10:51

types, or CRDTs.

10:53

CRDTs? Sounds complicated.

10:55

The concept is brilliant, actually. CRDTs are special data structures designed

10:59

specifically

11:00

for this eventual consistency model. They allow multiple servers to update the same

11:04

piece of data

11:05

independently, even while disconnected from each other.

11:08

Okay.

11:08

And they have mathematical properties that guarantee, when those servers do

11:12

eventually

11:12

reconnect and share their updates, their states will merge together correctly and

11:16

automatically

11:17

without losing information and without needing slow, complex locking mechanisms to

11:22

coordinate.

11:22

Whoa. So CRDTs are like the algorithmic magic that makes eventual consistency

11:28

safe and reliable, especially over flaky internet links.

11:31

It stops data getting messed up when different nodes are out of sync for a while.

11:35

You got it. It elegantly solves the coordination headache that plagues

11:40

many traditional distributed databases when dealing with network partitions or high

11:45

latency.

11:46

It's a huge enabler for systems like Garage.
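For the curious, the simplest CRDT of all, a grow-only set, shows the merge property being described here. This is a generic textbook example, not code from Garage:

```python
# A grow-only set (G-Set): replicas accept additions independently while
# disconnected; merge() is set union, which is commutative, associative
# and idempotent, so replicas converge no matter how or when they sync.
class GSet:
    def __init__(self):
        self.items = set()

    def add(self, item):
        self.items.add(item)

    def merge(self, other):
        self.items |= other.items

# Two replicas diverge during a network partition...
a, b = GSet(), GSet()
a.add("photo-1.jpg")
b.add("photo-2.jpg")

# ...then reconnect and merge, in either order, any number of times:
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing (idempotent)
assert a.items == b.items == {"photo-1.jpg", "photo-2.jpg"}
```

Real systems use richer CRDTs (counters, maps, registers), but they all rest on this same merge guarantee.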

11:48

Fascinating. And there was a third influence mentioned, Maglev.

11:52

Right. Maglev. That's Google's high-performance software network load balancer.

11:56

Mentioning this shows they weren't just thinking about data storage theory,

12:01

but also the practicalities of efficiently routing requests and managing

12:05

connections within the

12:06

cluster. So handling traffic effectively to make sure requests go to the right

12:11

place quickly,

12:11

even under load. Exactly. Making sure data requests are

12:15

steered efficiently to the nearest or the healthiest node that holds the data.
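To give a flavor of that routing idea, here is a compact Python sketch of Maglev-style lookup-table hashing. It follows the shape of the published algorithm (each backend fills a fixed-size table via its own permutation), but the table size and hash choices are simplified for illustration and are not what Garage or Google actually use.

```python
import hashlib

M = 13  # lookup table size; a small prime for illustration (real Maglev uses e.g. 65537)

def _h(s, salt):
    # Deterministic 64-bit hash used for offsets, skips and key routing.
    return int.from_bytes(hashlib.sha256((salt + s).encode()).digest()[:8], "big")

def build_table(backends):
    # Each backend gets a permutation of table slots: offset + j*skip (mod M).
    # Since M is prime, every skip in 1..M-1 cycles through all slots.
    offset = {b: _h(b, "offset") % M for b in backends}
    skip = {b: _h(b, "skip") % (M - 1) + 1 for b in backends}
    nxt = {b: 0 for b in backends}
    table = [None] * M
    filled = 0
    while filled < M:
        for b in backends:
            # Walk b's permutation until an unclaimed slot is found.
            while True:
                slot = (offset[b] + nxt[b] * skip[b]) % M
                nxt[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == M:
                break
    return table

def route(table, key):
    # A request for `key` always lands on the same backend.
    return table[_h(key, "key") % M]

table = build_table(["node-a", "node-b", "node-c"])
```

The round-robin fill is what gives the even spread; because each backend claims slots in turn, no backend ends up with far more traffic than the others.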

12:19

It's really impressive how they've taken these heavyweight architectural ideas born

12:23

from giants

12:24

like Amazon and Google and managed to translate them into something lightweight

12:28

enough for modest

12:28

infrastructure. That's the key achievement, I think. Which brings us nicely to the

12:32

practicalities.

12:33

Section four, the low barrier to entry. If I actually want to set up this geo-distributed,

12:40

super resilient cluster, do I need to go out and buy three identical brand new

12:44

server racks?

12:45

That's probably the most compelling part for the self-hosting community or small

12:48

businesses.

12:49

The answer is a definite no. An explicit design goal was keeping the barrier to

12:53

entry low.

12:54

They actively encourage using existing or even older machines. You don't need a

12:59

supercomputer cluster, then? Not at all. The minimum requirements are honestly

13:03

quite minimal.

13:03

Like what? Per node, they suggest just one gigabyte of RAM.

13:07

One gig? Seriously? Yep. And at least 16 gigabytes of disk space.

13:11

For the CPU, basically any x86-64 processor from the last decade or so, or an ARMv7

13:17

or ARMv8 chip,

13:19

think Raspberry Pi level or similar is sufficient. That is incredibly low overhead

13:23

for a system

13:24

promising this kind of resilience. It really suggests, like you said, if you've got

13:27

a few

13:27

old office PCs lying around or maybe some cheap virtual servers scattered in

13:31

different regions.

13:32

You could genuinely start building a garage cluster. That's the core economic

13:36

appeal.

13:37

And crucially, they build it specifically to allow mixing and matching different

13:41

types of hardware.

13:42

Ah, so heterogeneous hardware is supported. Explicitly. You can combine servers

13:47

with

13:47

different CPUs, different amounts of RAM, different disk sizes within the same

13:51

cluster.

13:52

That massively simplifies things because you don't need to source expensive

13:56

identical machines. Use

13:58

what you have or what you can get cheap. That's huge for operational reality.

14:03

And the deployment is super simple, too. It ships as a single self-contained binary

14:07

file,

14:08

no complex dependencies to install. It just runs on pretty much any modern Linux

14:13

distribution.
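For a sense of what that single-binary setup looks like, here is a sketch of a minimal garage.toml in the shape of the project's quick-start documentation. Treat the exact keys and values as illustrative; they vary between Garage versions, so check the reference for the one you run.

```toml
# Minimal config sketch (illustrative; key names are version-dependent)
metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"

# Three copies of every object, one per zone
replication_mode = "3"

# Node-to-node RPC; all nodes in the cluster share the same secret
rpc_bind_addr = "[::]:3901"
rpc_secret = "<generate a random 32-byte hex secret>"

[s3_api]
# The S3-compatible endpoint that existing tools can be pointed at
s3_region = "garage"
api_bind_addr = "[::]:3900"
root_domain = ".s3.garage.localhost"
```

After starting the daemon on each node, the cluster layout, meaning which node sits in which zone and with what capacity, is assigned through Garage's CLI.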

14:14

Just copy the file and run it, basically. Pretty much. It really emphasizes that

14:17

focus on ease of operation for the sysadmin. And just circling back quickly to a

14:21

tech detail,

14:22

the main language used is Rust, right? About 95% of the code?

14:25

Correct. And Rust is known for its performance, efficiency, and especially memory

14:30

safety.

14:30

That choice directly contributes to those low resource requirements and overall

14:34

stability.

14:35

That technical excellence seems linked to its sustainability, too. You mentioned it's

14:39

not

14:39

corporate-backed, but projects like this need ongoing work. How is Garage supported?

14:45

Has it managed to find funding? Yes. And that's another critical

14:48

point. In a world often dominated by venture capital, Garage has actually secured

14:53

significant

14:54

public funding, which really signals confidence in it as a public good for the

14:59

internet.

15:00

That's really interesting. Where did this public funding come from?

15:03

It's primarily come via the European Commission's Next Generation Internet,

15:07

or NGI initiative. It's had several grants. Can you detail those?

15:10

Sure. Back in 2021-2022, the NGI Pointer Fund supported three full-time employees

15:17

working on

15:17

Garage for a whole year. Wow, three people full-time.

15:20

Yeah. Then more recently, from 2023 to 2024, the NLnet Foundation through the NGI0

15:26

Entrust Fund

15:27

supported one full-time employee. Okay.

15:29

And it's ongoing. Looking ahead to 2025, the NLnet NGI0 Commons Fund is

15:34

providing support

15:34

for the equivalent of 1.5 full-time employees. So there's a steady stream of grant

15:39

funding,

15:40

keeping this critical piece of decentralized infrastructure alive and evolving,

15:45

driven by

15:45

community needs, not profit. That seems to be the model, yes.

15:49

It keeps it open source, keeps it free, and aligned with that original mission.

15:53

Okay. So let's try and summarize the core takeaway for you, the listener. What

15:57

Garage seems to have

15:58

done is successfully bridge this gap, right, between really advanced complex

16:03

distributed

16:04

systems theory like Dynamo, CRDTs, and the practical, often hardware-constrained

16:10

reality

16:10

of self-hosters and small organizations. Exactly. It essentially offers smaller

16:15

players

16:15

the kind of power, geo-distributed, highly resilient storage that used to be pretty

16:20

much exclusively the domain of tech giants with massive budgets.

16:24

It levels the playing field in a way. It really does. And this whole project,

16:28

this whole approach, it raises a bigger, quite provocative question, I think.

16:32

When you have open-source projects, especially ones backed by public funds like NGI,

16:37

actively working to democratize access to this kind of complex, resilient

16:42

infrastructure,

16:43

what does that really mean for the future? For decentralized data ownership, for

16:48

control,

16:49

it kind of challenges the default assumption that only huge centralized

16:53

corporations

16:54

can truly guarantee the safety and availability of our important data.

16:57

As the cloud continues to consolidate around a few big players,

17:02

projects like Garage offer a different path. It's definitely something worth

17:05

thinking about.

17:06

An excellent point to end on. A powerful thought to mull over, indeed.

17:10

Hopefully, you feel much better informed now about the potential of geo-distributed

17:14

object storage,

17:15

and specifically, the Garage project. We want to once again thank our sponsor,

17:19

Safe Server, for supporting this deep dive. They support the hosting of this very

17:23

type of software,

17:24

helping with digital transformation. You can find out more about how they can

17:27

support your digital transformation journey.

17:28

We'll catch you next time.