Today's Deep-Dive: Garage
Ep. 307

Episode description

The Deep Dive episode explores Garage, an open-source distributed object storage project designed to bring enterprise-level resilience to self-hosted users and small businesses. Garage aims to provide data redundancy and availability comparable to cloud giants like Amazon and Google, but without the massive budget. It achieves this by treating data as self-contained objects with rich metadata, stored in a flat, infinitely scalable structure. A key feature is its S3 compatibility, allowing existing software designed for Amazon’s Simple Storage Service to seamlessly integrate with a Garage cluster. Developed by Deuxfleurs, a self-hosted service provider, Garage is released under the AGPL v3 license, ensuring improvements remain open source. The system prioritizes resilience and geographical distribution, replicating data in three distinct “zones” to withstand failures in up to two locations. Garage’s engineering philosophy embraces high latency and flaky network conditions, common for smaller operators, by leveraging concepts like Amazon’s Dynamo for availability over immediate consistency and Conflict-Free Replicated Data Types (CRDTs) for reliable data merging. It also incorporates principles from Google’s Maglev for efficient traffic routing. The project boasts a low barrier to entry, requiring minimal hardware resources (1GB RAM, 16GB disk) and supporting heterogeneous hardware configurations. Garage is sustained by public funding, notably grants from the European Commission’s Next Generation Internet (NGI) initiative, supporting its development as a public good. This approach democratizes access to advanced distributed storage, offering a decentralized alternative to large, centralized cloud providers and challenging assumptions about data safety and availability.

Gain digital sovereignty now and save costs

Let’s have a look at your digital challenges together. What tools are you currently using? Are your processes optimal? How is the state of backups and security updates?

Digital Sovereignty is easily achieved with Open Source software (which usually costs way less, too). Our division SafeServer offers hosting, operation and maintenance for countless Free and Open Source tools.

Try it now!

Download transcript (.srt)
0:00

Welcome back to the Deep Dive. If you need to get up to speed quickly on complex

0:05

research,

0:06

technical stuff, or what's moving in an industry, you're definitely in the right

0:10

place.

0:10

We go deep into the source material so you get the essential informed perspective.

0:15

Now, before we unpack today's sources, we really want to give a huge thank you to

0:19

our supporter,

0:20

SafeServer. SafeServer handles the hosting for exactly this kind of, you know,

0:24

crucial distributed

0:25

software. They're dedicated to supporting you on your digital transformation

0:28

journey.

0:29

You can find out more info and resources over at www.SafeServer.de.

0:35

So today, we're focusing on something that sounds pretty technical, but it's

0:38

actually

0:38

quite democratizing in the end. Distributed object storage. We're taking a deep

0:44

dive into

0:44

a specific open source project called Garage. Yeah, Garage. It basically aims to

0:48

take the kind

0:48

of resilience you see with, like, the massive cloud giants. Like Amazon, Google.

0:53

Exactly.

0:53

Yeah. And put that power directly into the hands of, you know, self-hosted users,

0:57

small businesses,

0:58

that kind of thing. Okay. So our sources really lay out a fascinating path for how

1:03

you can get

1:03

world-class data redundancy, but without needing a world-class budget. Right. The

1:09

challenge we're

1:10

sort of tackling for you, the listener, is this. How do you get data storage that's,

1:15

well, practically

1:16

indestructible, even if the servers you're running it on are a bit flaky or old or

1:21

even spread out

1:21

across totally different physical locations? That's the mission, then. To demystify

1:26

this

1:26

geo-distributed storage idea, look at the pretty cutting-edge theory Garage is

1:31

built on. And show

1:32

how this specific solution is really tailored for the, let's say, small to medium

1:36

operator,

1:37

not the huge enterprises. Got it. So the key takeaway right from the start. The

1:42

core nugget

1:42

you need is that Garage is open source. It's S3 compatible. We'll get into why that

1:46

matters.

1:47

And it's an object store specifically designed for geo-distributed setups. Right.

1:51

It's fundamentally

1:52

about bringing that sort of enterprise level availability down to environments

1:56

where, frankly,

1:57

failure is kind of expected sometimes, not this rare anomaly. Okay, let's start

2:00

with the basics

2:01

then. Because object storage itself, it's a term people hear but maybe don't fully

2:06

get compared to

2:08

you know, the normal file storage on their computer, or maybe block storage they've

2:12

heard about.

2:12

Yeah, good place to start. If I'm just saving a document or maybe doing a backup,

2:17

why would I even think about object storage? Exactly. Well, it really comes down to

2:20

scale

2:21

and flexibility. Think of your traditional file storage like a very rigid filing

2:27

cabinet. You've

2:28

got folders inside folders, a strict hierarchy. Okay. Block storage is even lower

2:32

level, dealing

2:33

with the actual chunks on a hard drive. Yeah. Object storage though, it's totally

2:36

different.

2:37

It's flat. Flat? Yeah, every piece of data could be a video file, a log entry, a

2:42

database backup,

2:43

whatever. It's treated as its own distinct self-contained object. Ah, okay. And

2:48

each object

2:49

gets tagged with rich metadata information about the object, and it gets a unique

2:54

ID. This flat

2:56

structure makes it almost infinitely scalable. Right. Which is why it's the main

3:00

method used by

3:00

companies like, say, Netflix for streaming video or Amazon S3 itself for just

3:05

massive archives of

3:06

data. And Garage ties into that directly. You said its identity is a distributed

3:11

object storage

3:12

service that emphasizes S3 compatibility. Why is being S3 compatible so critical?

3:18

Oh, it's huge

3:19

because Amazon's simple storage service, S3, is basically the universal language,

3:25

the lingua franca

3:27

of cloud storage. Right. Everyone uses it or integrates with it. Exactly. Yeah. So

3:31

by

3:31

implementing the exact same API, the way software talks to the storage, Garage just

3:36

wipes out the

3:37

biggest hurdle to using it. How so? Well, if you're already using some software for

3:41

say backups or

3:42

monitoring or maybe hosting website assets, and that software talks to AWS S3.

3:47

Which a lot of

3:47

software does. Precisely. You can literally just point that same software at your

3:51

own garage cluster

3:52

instead. No code changes needed. It just works. Wow. Okay. That makes it incredibly

3:57

attractive

3:58

then, especially for sysadmins who don't want to rebuild their entire tool chain.
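Looping back to the flat object model described a moment ago, here's a toy sketch of the idea in Python. It's purely illustrative, not Garage's actual data model; the class and field names are invented for this example.

```python
import uuid

# Toy illustration of the object-storage model discussed above:
# every object is data plus rich metadata, stored under a unique ID
# in one flat namespace, with no folder hierarchy at all.
class FlatObjectStore:
    def __init__(self):
        self.objects = {}  # unique ID -> (data, metadata)

    def put(self, data, **metadata):
        object_id = str(uuid.uuid4())  # the unique ID every object gets
        self.objects[object_id] = (data, metadata)
        return object_id

    def get(self, object_id):
        return self.objects[object_id]

store = FlatObjectStore()
oid = store.put(b"2024-01-01 backup ok", content_type="text/plain", app="backup")
data, meta = store.get(oid)
```

The scalability comes from the fact that a flat key space like this can be partitioned across many machines, which is exactly what distributed object stores do.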

4:01

Absolutely. Now,

4:02

this wasn't built by some corporate giant you mentioned who's actually behind

4:05

garage. And who

4:06

are they building it for? Yeah, it came from a group called Deuxfleurs. They're

4:10

actually an experimental

4:11

small-scale self-hosted service provider themselves. So they built it out of their

4:16

own need.

4:16

Pretty much. They built garage because they needed it and they've been using it in

4:21

production since

4:21

back in 2020. Yeah. So it was really built by people running these kinds of small

4:26

distributed

4:27

setups for people doing the same. Makes sense. And importantly, it's entirely free

4:33

software.

4:34

It's released under the AGPL v3 license. Hold on. AGPL v3. That's a specific choice,

4:39

isn't it?

4:40

What does that license mean for someone thinking about using garage or maybe

4:43

building on it?

4:44

Yeah. It's an important detail. The Affero General Public License, AGPL, is one of the

4:49

stronger copy

4:50

left licenses. Basically, if you modify the garage software and then run that

4:55

modified version as a

4:56

network service, like people connect to it over the internet, then you must make

5:01

the source code

5:01

of your modified version available to those users. Ah, so it prevents someone from

5:06

taking the open

5:06

source code, improving it, and then locking those improvements up in their own

5:10

private cloud service.

5:11

Exactly. It's designed to make sure that improvements made to free software,

5:16

especially when it's run as a service, flow back to the community. It keeps the

5:20

technology free and

5:20

open. That fits perfectly with this whole mission of democratizing resilient

5:25

infrastructure.

5:26

Okay, let's move on to the driving goals. Section two, I mean, why bother with all

5:31

this complexity?

5:32

If I can just plug a huge external hard drive into my server, why build distributed

5:37

storage across

5:38

multiple locations? You basically hit the nail on the head there. The core

5:41

motivations are resilience

5:43

and geographical distribution. Right. That single giant hard drive that's a single

5:47

point of failure,

5:47

power supply dies, fire, flood, whatever, your data's gone. Okay, yeah. Garage tackles

5:54

this head-on.

5:55

It's explicitly designed for clusters where the nodes, the servers are running in

6:00

different

6:00

physical locations. Could be different racks, different buildings, different cities

6:04

even.

6:04

So the idea is if one data center goes dark, maybe a power outage,

6:08

a network cut, the data is still safe and accessible somewhere else.

6:12

Precisely. High availability through geographic redundancy.

6:15

But wait, moving data across the public internet, that sounds like a recipe for

6:20

terrible latency consistency problems, doesn't it? Is dealing with that really

6:24

worth the

6:24

complexity for a small operator? That's the critical question, right?

6:28

Yeah. And it's actually what defines

6:30

Garage's whole engineering philosophy. Most big enterprise storage software assumes

6:35

you

6:36

have these perfect low latency, high bandwidth, dedicated network links between

6:40

data centers.

6:41

Which small operators usually don't have. Exactly. Garage's developers,

6:46

being sysadmins themselves running stuff over the regular internet,

6:49

just accepted that high latency, maybe up to 200 milliseconds even,

6:53

and flaky connections were just reality. So they built around that reality.

6:58

Yes. They focused the design on handling those network issues gracefully.

7:02

The goal isn't just copying data. It's keeping the service available and ensuring

7:06

the data

7:06

becomes eventually consistent, even if entire locations are temporarily offline.

7:10

So they democratized resilience by not requiring perfect, expensive networking.

7:15

That's it. They built the software to deal with the messy reality of non-enterprise

7:19

networks.

7:19

Making it actually usable. Right. They aimed to make it

7:23

operationally manageable, lightweight, prioritizing that availability, that

7:27

survival,

7:28

above almost everything else. Okay. This is where it gets really

7:30

interesting for me. Section three, the how. How does Garage actually achieve this

7:36

high resilience?

7:37

How does it handle data integrity when you've got machines failing, disks dying,

7:42

networks dropping packets, the whole potential mess?

7:46

Yeah. The secret sauce. They rely on, well, structured redundancy and concepts like

7:51

Quorum.

7:52

Garage is built to be highly resilient against pretty much everything.

7:56

Disk errors, yes, but also whole network partitions separating geographic locations.

8:01

And they mentioned something interesting. Sysadmin failures.

8:04

Ah, yeah. They specifically considered that. Because let's be honest, sometimes the

8:08

biggest

8:09

cause of an outage is human error during maintenance or configuration. The design

8:13

tries to minimize the impact of those two. Smart. Okay. Tell us about the data

8:16

replication.

8:17

How many copies of my data exist and where do they live in this system?

8:20

The magic number they landed on is three. Every chunk of data you upload gets

8:24

replicated,

8:25

stored in three distinct zones. Zones. What's a zone?

8:28

Think of a zone as a logical grouping that usually maps to a physical failure

8:33

domain.

8:33

So it can be a specific rack, a specific building, or most powerfully a specific

8:38

data center or geographic location. And importantly, a zone itself usually

8:44

consists of multiple servers for redundancy within the zone.

8:47

So three copies spread across three potentially separate geographical areas.

8:51

Correct. And the logic is classic quorum systems. To survive N-1 failures, you need

8:56

N copies. Here,

8:57

N = 3. So you can lose up to two entire zones simultaneously.

9:02

Two whole data centers could go offline.

9:04

Could be, yeah. One is down for planned maintenance and another has a sudden

9:08

network

9:08

failure. Your data remains fully accessible and readable from that third surviving

9:12

zone.
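As a rough sketch of the "three copies in three zones" rule being described, here is a toy model in Python. It is not Garage's placement algorithm, and the zone names are made up; it just shows why losing two zones still leaves the data readable.

```python
# Toy model of 3-zone replication: each object is written once per zone,
# and a read succeeds as long as at least one zone is still reachable.
class Cluster:
    def __init__(self, zones):
        self.stores = {z: {} for z in zones}  # zone -> {key: data}
        self.down = set()                     # zones currently offline

    def put(self, key, data):
        # Replicate to every zone (here the cluster has exactly three).
        for zone in self.stores:
            if zone not in self.down:
                self.stores[zone][key] = data

    def get(self, key):
        # Any surviving zone can serve the read.
        for zone, store in self.stores.items():
            if zone not in self.down and key in store:
                return store[key]
        raise KeyError(key)

cluster = Cluster(["zone-a", "zone-b", "zone-c"])
cluster.put("backup.tar", b"archive bytes")
cluster.down = {"zone-a", "zone-b"}  # lose two entire zones at once
assert cluster.get("backup.tar") == b"archive bytes"  # still readable
```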

9:13

Wow. That level of fault tolerance with potentially just three locations, that's

9:17

pretty remarkable for something aimed at smaller scale.

9:21

It really is. But achieving that reliability isn't trivial, theoretically speaking.

9:25

The sources mention garages standing on the shoulders of giants drawing from

9:28

decades of distributed systems research. Right. Let's unpack those giants. They

9:33

mention influences like Amazon's Dynamo. What key ideas did Garage borrow from

9:39

massive systems like that? This is really the core of it, I think.

9:42

The Garage team didn't necessarily invent brand new distributed systems math.

9:47

Instead, they very cleverly applied existing, proven, but often complex,

9:52

academic research to their specific scale and problem set.

9:56

Okay, so Dynamo first. What's the key principle there?

9:58

Dynamo, Amazon's highly available key value store.

10:01

Its core idea was prioritizing availability over perfect, immediate consistency.

10:07

Meaning?

10:08

Meaning if there's a network issue, like a slow link or a partition, Dynamo won't

10:11

just block the

10:12

user and say, try again later. It will likely accept a write operation, let the

10:16

user think

10:16

it succeeded, and then promise to sort out any potential conflicts later once

10:20

communication is

10:21

restored.

10:21

Uh, so it keeps working even when parts of the system can't talk to each other

10:25

properly.

10:26

Exactly. It favors being available for reads and writes over ensuring every node

10:30

has the

10:31

absolute latest data right now. This leads to the concept of eventual consistency.

10:36

Okay, eventual consistency. But how do you make sure things do eventually become

10:40

consistent

10:41

without losing data or getting corrupted when servers have seen different things?

10:45

Great question. And that brings us to the second major influence, conflict-free

10:50

replicated data

10:51

types, or CRDTs.

10:53

CRDTs? Sounds complicated.

10:55

The concept is brilliant, actually. CRDTs are special data structures designed

10:59

specifically

11:00

for this eventual consistency model. They allow multiple servers to update the same

11:04

piece of data

11:05

independently, even while disconnected from each other.

11:08

Okay.

11:08

And they have mathematical properties that guarantee, when those servers do

11:12

eventually

11:12

reconnect and share their updates, their states will merge together correctly and

11:16

automatically

11:17

without losing information and without needing slow, complex locking mechanisms to

11:22

coordinate.

11:22

Whoa. So CRDTs are like the algorithmic magic that makes eventual consistency

11:28

safe and reliable, especially over flaky internet links.

11:31

It stops data getting messed up when different nodes are out of sync for a while.

11:35

You got it. It elegantly solves the coordination headache that plagues

11:40

many traditional distributed databases when dealing with network partitions or high

11:45

latency.

11:46

It's a huge enabler for systems like Garage.
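For the curious, the simplest CRDT of all, a grow-only set, shows the merge property being described here. This is a generic textbook example, not code from Garage:

```python
# A grow-only set (G-Set): replicas accept additions independently while
# disconnected; merge() is set union, which is commutative, associative
# and idempotent, so replicas converge no matter how or when they sync.
class GSet:
    def __init__(self):
        self.items = set()

    def add(self, item):
        self.items.add(item)

    def merge(self, other):
        self.items |= other.items

# Two replicas diverge during a network partition...
a, b = GSet(), GSet()
a.add("photo-1.jpg")
b.add("photo-2.jpg")

# ...then reconnect and merge, in either order, any number of times:
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing (idempotent)
assert a.items == b.items == {"photo-1.jpg", "photo-2.jpg"}
```

Real systems use richer CRDTs (counters, maps, registers), but they all rest on this same merge guarantee.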

11:48

Fascinating. And there was a third influence mentioned, Maglev.

11:52

Right. Maglev. That's Google's high-performance software network load balancer.

11:56

Mentioning this shows they weren't just thinking about data storage theory,

12:01

but also the practicalities of efficiently routing requests and managing

12:05

connections within the

12:06

cluster. So handling traffic effectively to make sure requests go to the right

12:11

place quickly,

12:11

even under load. Exactly. Making sure data requests are

12:15

steered efficiently to the nearest or the healthiest node that holds the data.
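To give a flavor of that routing idea, here is a compact Python sketch of Maglev-style lookup-table hashing. It follows the shape of the published algorithm (each backend fills a fixed-size table via its own permutation), but the table size and hash choices are simplified for illustration and are not what Garage or Google actually use.

```python
import hashlib

M = 13  # lookup table size; a small prime for illustration (real Maglev uses e.g. 65537)

def _h(s, salt):
    # Deterministic 64-bit hash used for offsets, skips and key routing.
    return int.from_bytes(hashlib.sha256((salt + s).encode()).digest()[:8], "big")

def build_table(backends):
    # Each backend gets a permutation of table slots: offset + j*skip (mod M).
    # Since M is prime, every skip in 1..M-1 cycles through all slots.
    offset = {b: _h(b, "offset") % M for b in backends}
    skip = {b: _h(b, "skip") % (M - 1) + 1 for b in backends}
    nxt = {b: 0 for b in backends}
    table = [None] * M
    filled = 0
    while filled < M:
        for b in backends:
            # Walk b's permutation until an unclaimed slot is found.
            while True:
                slot = (offset[b] + nxt[b] * skip[b]) % M
                nxt[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == M:
                break
    return table

def route(table, key):
    # A request for `key` always lands on the same backend.
    return table[_h(key, "key") % M]

table = build_table(["node-a", "node-b", "node-c"])
```

The round-robin fill is what gives the even spread; because each backend claims slots in turn, no backend ends up with far more traffic than the others.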

12:19

It's really impressive how they've taken these heavyweight architectural ideas born

12:23

from giants

12:24

like Amazon and Google and managed to translate them into something lightweight

12:28

enough for modest

12:28

infrastructure. That's the key achievement, I think. Which brings us nicely to the

12:32

practicalities.

12:33

Section four, the low barrier to entry. If I actually want to set up this geo-distributed,

12:40

super resilient cluster, do I need to go out and buy three identical brand new

12:44

server racks?

12:45

That's probably the most compelling part for the self-hosting community or small

12:48

businesses.

12:49

The answer is a definite no. An explicit design goal was keeping the barrier to

12:53

entry low.

12:54

They actively encourage using existing or even older machines. You don't need a

12:59

supercomputer cluster, then? Not at all. The minimum requirements are honestly

13:03

quite minimal.

13:03

Like what? Per node, they suggest just one gigabyte of RAM.

13:07

One gig? Seriously? Yep. And at least 16 gigabytes of disk space.

13:11

For the CPU, basically any x86-64 processor from the last decade or so, or an ARMv7

13:17

or ARMv8 chip,

13:19

think Raspberry Pi level or similar is sufficient. That is incredibly low overhead

13:23

for a system

13:24

promising this kind of resilience. It really suggests, like you said, if you've got

13:27

a few

13:27

old office PCs lying around or maybe some cheap virtual servers scattered in

13:31

different regions.

13:32

You could genuinely start building a garage cluster. That's the core economic

13:36

appeal.

13:37

And crucially, they build it specifically to allow mixing and matching different

13:41

types of hardware.

13:42

Ah, so heterogeneous hardware is supported. Explicitly. You can combine servers

13:47

with

13:47

different CPUs, different amounts of RAM, different disk sizes within the same

13:51

cluster.

13:52

That massively simplifies things because you don't need to source expensive

13:56

identical machines. Use

13:58

what you have or what you can get cheap. That's huge for operational reality.

14:03

And the deployment is super simple, too. It ships as a single self-contained binary

14:07

file,

14:08

no complex dependencies to install. It just runs on pretty much any modern Linux

14:13

distribution.
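For a sense of what that single-binary setup looks like, here is a sketch of a minimal garage.toml in the shape of the project's quick-start documentation. Treat the exact keys and values as illustrative; they vary between Garage versions, so check the reference for the one you run.

```toml
# Minimal config sketch (illustrative; key names are version-dependent)
metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"

# Three copies of every object, one per zone
replication_mode = "3"

# Node-to-node RPC; all nodes in the cluster share the same secret
rpc_bind_addr = "[::]:3901"
rpc_secret = "<generate a random 32-byte hex secret>"

[s3_api]
# The S3-compatible endpoint that existing tools can be pointed at
s3_region = "garage"
api_bind_addr = "[::]:3900"
root_domain = ".s3.garage.localhost"
```

After starting the daemon on each node, the cluster layout, meaning which node sits in which zone and with what capacity, is assigned through Garage's CLI.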

14:14

Just copy the file and run it, basically. Pretty much. It really emphasizes that

14:17

focus on ease of operation for the sysadmin. And just circling back quickly to a

14:21

tech detail,

14:22

the main language used is Rust, right? About 95% of the code?

14:25

Correct. And Rust is known for its performance, efficiency, and especially memory

14:30

safety.

14:30

That choice directly contributes to those low resource requirements and overall

14:34

stability.

14:35

That technical excellence seems linked to its sustainability, too. You mentioned it's

14:39

not

14:39

corporate-backed, but projects like this need ongoing work. How is Garage supported?

14:45

Has it managed to find funding? Yes. And that's another critical

14:48

point. In a world often dominated by venture capital, Garage has actually secured

14:53

significant

14:54

public funding, which really signals confidence in it as a public good for the

14:59

internet.

15:00

That's really interesting. Where did this public funding come from?

15:03

It's primarily come via the European Commission's Next Generation Internet,

15:07

or NGI initiative. It's had several grants. Can you detail those?

15:10

Sure. Back in 2021-2022, the NGI Pointer Fund supported three full-time employees

15:17

working on

15:17

Garage for a whole year. Wow, three people full-time.

15:20

Yeah. Then more recently, from 2023 to 2024, the NLnet Foundation through the NGI0

15:26

Entrust Fund

15:27

supported one full-time employee. Okay.

15:29

And it's ongoing. Looking ahead to 2025, the NLnet NGI0 Commons Fund is

15:34

providing support

15:34

for the equivalent of 1.5 full-time employees. So there's a steady stream of grant

15:39

funding,

15:40

keeping this critical piece of decentralized infrastructure alive and evolving,

15:45

driven by

15:45

community needs, not profit. That seems to be the model, yes.

15:49

It keeps it open source, keeps it free, and aligned with that original mission.

15:53

Okay. So let's try and summarize the core takeaway for you, the listener. What

15:57

Garage seems to have

15:58

done is successfully bridge this gap, right, between really advanced complex

16:03

distributed

16:04

systems theory like Dynamo, CRDTs, and the practical, often hardware-constrained

16:10

reality

16:10

of self-hosters and small organizations. Exactly. It essentially offers smaller

16:15

players

16:15

the kind of power, geo-distributed, highly resilient storage that used to be pretty

16:20

much exclusively the domain of tech giants with massive budgets.

16:24

It levels the playing field in a way. It really does. And this whole project,

16:28

this whole approach, it raises a bigger, quite provocative question, I think.

16:32

When you have open-source projects, especially ones backed by public funds like NGI,

16:37

actively working to democratize access to this kind of complex, resilient

16:42

infrastructure,

16:43

what does that really mean for the future? For decentralized data ownership, for

16:48

control,

16:49

it kind of challenges the default assumption that only huge centralized

16:53

corporations

16:54

can truly guarantee the safety and availability of our important data.

16:57

As the cloud continues to consolidate around a few big players,

17:02

projects like Garage offer a different path. It's definitely something worth

17:05

thinking about.

17:06

An excellent point to end on. A powerful thought to mull over, indeed.

17:10

Hopefully, you feel much better informed now about the potential of geo-distributed

17:14

object storage,

17:15

and specifically, the Garage project. We want to once again thank our sponsor,

17:19

Safe Server, for supporting this deep dive. They support the hosting of this very

17:23

type of software,

17:24

helping with digital transformation. You can find out more about how they can

17:27

support your digital transformation journey.

17:28

We'll catch you next time.