1 00:00:00,000 --> 00:00:05,810 Welcome back to the Deep Dive. If you need to get up to speed quickly on complex 2 00:00:05,810 --> 00:00:06,320 research, 3 00:00:06,320 --> 00:00:10,340 technical stuff, or what's moving in an industry, you're definitely in the right 4 00:00:10,340 --> 00:00:10,880 place. 5 00:00:10,880 --> 00:00:15,360 We go deep into the source material so you get the essential informed perspective. 6 00:00:15,360 --> 00:00:19,280 Now, before we unpack today's sources, we really want to give a huge thank you to 7 00:00:19,280 --> 00:00:20,080 our supporter, 8 00:00:20,080 --> 00:00:24,650 SafeServer. SafeServer handles the hosting for exactly this kind of, you know, 9 00:00:24,650 --> 00:00:25,600 crucial distributed 10 00:00:25,600 --> 00:00:28,810 software. They're dedicated to supporting you on your digital transformation 11 00:00:28,810 --> 00:00:29,200 journey. 12 00:00:29,920 --> 00:00:35,520 You can find out more info and resources over at www.SafeServer.de. 13 00:00:35,520 --> 00:00:38,170 So today, we're focusing on something that sounds pretty technical, but it's 14 00:00:38,170 --> 00:00:38,480 actually 15 00:00:38,480 --> 00:00:44,150 quite democratizing in the end. Distributed object storage. We're taking a deep 16 00:00:44,150 --> 00:00:44,560 dive into 17 00:00:44,560 --> 00:00:48,330 a specific open source project called Garage. Yeah, Garage. It basically aims to 18 00:00:48,330 --> 00:00:48,800 take the kind 19 00:00:48,800 --> 00:00:53,040 of resilience you see with, like, the massive cloud giants. Like Amazon, Google. 20 00:00:53,040 --> 00:00:53,760 Exactly. 21 00:00:53,760 --> 00:00:57,280 Yeah. And put that power directly into the hands of, you know, self-hosted users, 22 00:00:57,280 --> 00:00:58,000 small businesses, 23 00:00:58,000 --> 00:01:03,200 that kind of thing. Okay. 
So our sources really lay out a fascinating path for how 24 00:01:03,200 --> 00:01:03,680 you can get 25 00:01:03,680 --> 00:01:09,760 world-class data redundancy, but without needing a world-class budget. Right. The 26 00:01:09,760 --> 00:01:10,400 challenge we're 27 00:01:10,400 --> 00:01:15,860 sort of tackling for you, the listener, is this. How do you get data storage that's, 28 00:01:15,860 --> 00:01:16,560 well, practically 29 00:01:16,560 --> 00:01:21,110 indestructible, even if the servers you're running it on are a bit flaky or old or 30 00:01:21,110 --> 00:01:21,920 even spread out 31 00:01:21,920 --> 00:01:26,240 across totally different physical locations? That's the mission, then. To demystify 32 00:01:26,240 --> 00:01:26,560 this 33 00:01:26,560 --> 00:01:31,430 geo-distributed storage idea, look at the pretty cutting-edge theory Garage is 34 00:01:31,430 --> 00:01:32,560 built on. And show 35 00:01:32,560 --> 00:01:36,970 how this specific solution is really tailored for the, let's say, small to medium 36 00:01:36,970 --> 00:01:37,680 operator, 37 00:01:37,680 --> 00:01:42,320 not the huge enterprises. Got it. So the key takeaway right from the start. The 38 00:01:42,320 --> 00:01:42,880 core nugget 39 00:01:42,880 --> 00:01:46,920 you need is that Garage is open source. It's S3 compatible. We'll get into why that 40 00:01:46,920 --> 00:01:47,440 matters. 41 00:01:47,440 --> 00:01:51,760 And it's an object store specifically designed for geo-distributed setups. Right. 42 00:01:51,760 --> 00:01:52,400 It's fundamentally 43 00:01:52,400 --> 00:01:56,310 about bringing that sort of enterprise level availability down to environments 44 00:01:56,310 --> 00:01:57,200 where, frankly, 45 00:01:57,200 --> 00:02:00,840 failure is kind of expected sometimes, not this rare anomaly. Okay, let's start 46 00:02:00,840 --> 00:02:01,600 with the basics 47 00:02:01,600 --> 00:02:06,720 then. 
Because object storage itself, it's a term people hear but maybe don't fully 48 00:02:06,720 --> 00:02:08,000 get compared to 49 00:02:08,000 --> 00:02:12,080 you know, the normal file storage on their computer, or maybe block storage they've 50 00:02:12,080 --> 00:02:12,880 heard about. 51 00:02:12,880 --> 00:02:17,200 Yeah, good place to start. If I'm just saving a document or maybe doing a backup, 52 00:02:17,200 --> 00:02:20,880 why would I even think about object storage? Exactly. Well, it really comes down to 53 00:02:20,880 --> 00:02:21,440 scale 54 00:02:21,440 --> 00:02:27,200 and flexibility. Think of your traditional file storage like a very rigid filing 55 00:02:27,200 --> 00:02:28,320 cabinet. You've 56 00:02:28,320 --> 00:02:32,820 got folders inside folders, a strict hierarchy. Okay. Block storage is even lower 57 00:02:32,820 --> 00:02:33,520 level, dealing 58 00:02:33,520 --> 00:02:36,890 with the actual chunks on a hard drive. Yeah. Object storage though, it's totally 59 00:02:36,890 --> 00:02:37,440 different. 60 00:02:37,440 --> 00:02:42,240 It's flat. A flat. Yeah, every piece of data could be a video file, a log entry, a 61 00:02:42,240 --> 00:02:43,280 database backup, 62 00:02:43,280 --> 00:02:48,480 whatever. It's treated as its own distinct self-contained object. Ah, okay. And 63 00:02:48,480 --> 00:02:49,360 each object 64 00:02:49,360 --> 00:02:54,810 gets tagged with rich metadata information about the object, and it gets a unique 65 00:02:54,810 --> 00:02:56,000 ID. This flat 66 00:02:56,000 --> 00:03:00,130 structure makes it almost infinitely scalable. Right. Which is why it's the main 67 00:03:00,130 --> 00:03:00,880 method used by 68 00:03:00,880 --> 00:03:05,130 companies like, say, Netflix for streaming video or Amazon S3 itself for just 69 00:03:05,130 --> 00:03:06,240 massive archives of 70 00:03:06,240 --> 00:03:11,540 data. And Garage ties into that directly. 
You said its identity is a distributed 71 00:03:11,540 --> 00:03:12,400 object storage 72 00:03:12,400 --> 00:03:18,890 service that emphasizes S3 compatibility. Why is being S3 compatible so critical? 73 00:03:18,890 --> 00:03:19,920 Oh, it's huge 74 00:03:19,920 --> 00:03:25,920 because Amazon's Simple Storage Service, S3, is basically the universal language, 75 00:03:25,920 --> 00:03:27,200 the lingua franca 76 00:03:27,200 --> 00:03:31,600 of cloud storage. Right. Everyone uses it or integrates with it. Exactly. Yeah. So 77 00:03:31,600 --> 00:03:31,920 by 78 00:03:31,920 --> 00:03:36,760 implementing the exact same API, the way software talks to the storage, Garage just 79 00:03:36,760 --> 00:03:37,440 wipes out the 80 00:03:37,440 --> 00:03:41,880 biggest hurdle to using it. How so? Well, if you're already using some software for, 81 00:03:41,880 --> 00:03:42,720 say, backups or 82 00:03:42,720 --> 00:03:47,220 monitoring or maybe hosting website assets, and that software talks to AWS S3. 83 00:03:47,220 --> 00:03:47,920 Which a lot of 84 00:03:47,920 --> 00:03:51,820 software does. Precisely. You can literally just point that same software at your 85 00:03:51,820 --> 00:03:52,800 own Garage cluster 86 00:03:52,800 --> 00:03:57,340 instead. No code changes needed. It just works. Wow. Okay. That makes it incredibly 87 00:03:57,340 --> 00:03:58,000 attractive 88 00:03:58,000 --> 00:04:01,810 then, especially for sysadmins who don't want to rebuild their entire toolchain. 89 00:04:01,810 --> 00:04:02,480 Absolutely. Now, 90 00:04:02,480 --> 00:04:05,980 this wasn't built by some corporate giant, you mentioned. Who's actually behind 91 00:04:05,980 --> 00:04:06,720 Garage? And who 92 00:04:06,720 --> 00:04:10,140 are they building it for? Yeah, it came from a group called Deuxfleurs. They're 93 00:04:10,140 --> 00:04:11,760 actually an experimental 94 00:04:11,760 --> 00:04:16,480 small-scale self-hosted service provider themselves. 
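
To make the "just point that same software at your own cluster" remark concrete: in practice this usually means changing the endpoint URL and credentials in the client's configuration rather than its code. The sketch below is illustrative only. The endpoint `http://localhost:3900`, the region name `garage`, the bucket name, and the credentials are assumptions for the example, not values given in the episode, and both helper functions are hypothetical. The keyword names match the parameters accepted by `boto3.client("s3", ...)`, but the sketch itself uses only the standard library.

```python
from urllib.parse import quote


def s3_client_settings(endpoint_url, access_key, secret_key, region):
    """Collect the settings an S3 client needs to talk to a non-AWS endpoint.

    Hypothetical helper: the keys mirror boto3.client("s3", ...) keyword
    arguments, so with boto3 installed you could pass them as **settings.
    """
    return {
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": region,
    }


def path_style_url(endpoint_url, bucket, key):
    """Build a path-style object URL: <endpoint>/<bucket>/<key>."""
    return f"{endpoint_url.rstrip('/')}/{quote(bucket)}/{quote(key)}"


# Assumed values for illustration only.
settings = s3_client_settings(
    endpoint_url="http://localhost:3900",
    access_key="EXAMPLE_KEY_ID",
    secret_key="EXAMPLE_SECRET",
    region="garage",
)
object_url = path_style_url(settings["endpoint_url"], "backups", "db/2024.sql.gz")
print(object_url)
```

The only thing that differs from an AWS setup here is the endpoint and the credentials; the application logic that lists, uploads, or downloads objects stays the same.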
So they built it out of their 95 00:04:16,480 --> 00:04:16,880 own need. 96 00:04:16,880 --> 00:04:21,120 Pretty much. They built Garage because they needed it and they've been using it in 97 00:04:21,120 --> 00:04:21,840 production since 98 00:04:21,840 --> 00:04:26,510 back in 2020. Yeah. So it was really built by people running these kinds of small 99 00:04:26,510 --> 00:04:27,280 distributed 100 00:04:27,280 --> 00:04:33,560 setups for people doing the same. Makes sense. And importantly, it's entirely free 101 00:04:33,560 --> 00:04:34,160 software. 102 00:04:34,160 --> 00:04:39,500 It's released under the AGPL v3 license. Hold on. AGPL v3. That's a specific choice, 103 00:04:39,500 --> 00:04:40,160 isn't it? 104 00:04:40,160 --> 00:04:43,900 What does that license mean for someone thinking about using Garage or maybe 105 00:04:43,900 --> 00:04:44,880 building on it? 106 00:04:44,880 --> 00:04:49,680 Yeah. It's an important detail. The Affero General Public License, AGPL, is one of the 107 00:04:49,680 --> 00:04:50,480 stronger copyleft 108 00:04:50,480 --> 00:04:55,230 licenses. Basically, if you modify the Garage software and then run that 109 00:04:55,230 --> 00:04:56,480 modified version as a 110 00:04:56,480 --> 00:05:01,000 network service, like people connect to it over the internet, then you must make 111 00:05:01,000 --> 00:05:01,840 the source code 112 00:05:01,840 --> 00:05:06,180 of your modified version available to those users. Ah, so it prevents someone from 113 00:05:06,180 --> 00:05:06,720 taking the open 114 00:05:06,720 --> 00:05:10,240 source code, improving it, and then locking those improvements up in their own 115 00:05:10,240 --> 00:05:11,920 private cloud service. 116 00:05:11,920 --> 00:05:16,400 Exactly. It's designed to make sure that improvements made to free software, 117 00:05:16,400 --> 00:05:20,120 especially when it's run as a service, flow back to the community. 
It keeps the 118 00:05:20,120 --> 00:05:20,960 technology free and 119 00:05:20,960 --> 00:05:25,060 open. That fits perfectly with this whole mission of democratizing resilient 120 00:05:25,060 --> 00:05:26,080 infrastructure. 121 00:05:26,720 --> 00:05:31,390 Okay, let's move on to the driving goals. Section two, I mean, why bother with all 122 00:05:31,390 --> 00:05:32,720 this complexity? 123 00:05:32,720 --> 00:05:37,600 If I can just plug a huge external hard drive into my server, why build distributed 124 00:05:37,600 --> 00:05:38,560 storage across 125 00:05:38,560 --> 00:05:41,820 multiple locations? You basically hit the nail on the head there. The core 126 00:05:41,820 --> 00:05:43,280 motivations are resilience 127 00:05:43,280 --> 00:05:47,360 and geographical distribution. Right. That single giant hard drive that's a single 128 00:05:47,360 --> 00:05:47,920 point of failure, 129 00:05:47,920 --> 00:05:54,160 power supply dies, fire, flood, whatever, your data's gone. Okay, yeah. Garage tackles 130 00:05:54,160 --> 00:05:54,800 this head-on. 131 00:05:55,360 --> 00:06:00,080 It's explicitly designed for clusters where the nodes, the servers are running in 132 00:06:00,080 --> 00:06:00,320 different 133 00:06:00,320 --> 00:06:04,010 physical locations. Could be different racks, different buildings, different cities 134 00:06:04,010 --> 00:06:04,400 even. 135 00:06:04,400 --> 00:06:08,400 So the idea is if one data center goes dark, maybe a power outage, 136 00:06:08,400 --> 00:06:12,320 a network cut, the data is still safe and accessible somewhere else. 137 00:06:12,320 --> 00:06:15,760 Precisely. High availability through geographic redundancy. 138 00:06:15,760 --> 00:06:20,400 But wait, moving data across the public internet, that sounds like a recipe for 139 00:06:20,400 --> 00:06:24,640 terrible latency consistency problems, doesn't it? 
Is dealing with that really 140 00:06:24,640 --> 00:06:24,960 worth the 141 00:06:24,960 --> 00:06:28,560 complexity for a small operator? That's the critical question, right? 142 00:06:28,560 --> 00:06:30,480 Yeah. And it's actually what defines 143 00:06:30,480 --> 00:06:35,920 Garage's whole engineering philosophy. Most big enterprise storage software assumes 144 00:06:35,920 --> 00:06:36,080 you 145 00:06:36,080 --> 00:06:40,880 have these perfect low latency, high bandwidth, dedicated network links between 146 00:06:40,880 --> 00:06:41,680 data centers. 147 00:06:41,680 --> 00:06:46,240 Which small operators usually don't have. Exactly. Garage's developers, 148 00:06:46,240 --> 00:06:49,760 being sysadmins themselves running stuff over the regular internet, 149 00:06:49,760 --> 00:06:53,840 just accepted that high latency, maybe up to 200 milliseconds even, 150 00:06:53,840 --> 00:06:58,000 and flaky connections were just reality. So they built around that reality. 151 00:06:58,000 --> 00:07:02,560 Yes. They focused the design on handling those network issues gracefully. 152 00:07:02,560 --> 00:07:06,320 The goal isn't just copying data. It's keeping the service available and ensuring 153 00:07:06,320 --> 00:07:06,800 the data 154 00:07:06,800 --> 00:07:10,960 becomes eventually consistent, even if entire locations are temporarily offline. 155 00:07:10,960 --> 00:07:15,440 So they democratized resilience by not requiring perfect, expensive networking. 156 00:07:15,440 --> 00:07:19,440 That's it. They built the software to deal with the messy reality of non-enterprise 157 00:07:19,440 --> 00:07:19,920 networks. 158 00:07:19,920 --> 00:07:23,040 Making it actually usable. Right. They aimed to make it 159 00:07:23,040 --> 00:07:27,120 operationally manageable, lightweight, prioritizing that availability, that 160 00:07:27,120 --> 00:07:28,160 survival, 161 00:07:28,160 --> 00:07:30,400 above almost everything else. Okay. 
This is where it gets really 162 00:07:30,400 --> 00:07:36,940 interesting for me. Section three, the how. How does Garage actually achieve this 163 00:07:36,940 --> 00:07:37,840 high resilience? 164 00:07:37,840 --> 00:07:42,880 How does it handle data integrity when you've got machines failing, disks dying, 165 00:07:42,880 --> 00:07:46,480 networks dropping packets, the whole potential mess? 166 00:07:46,480 --> 00:07:51,450 Yeah. The secret sauce. They rely on, well, structured redundancy and concepts like 167 00:07:51,450 --> 00:07:51,840 quorum. 168 00:07:52,480 --> 00:07:56,160 Garage is built to be highly resilient against pretty much everything. 169 00:07:56,160 --> 00:08:01,600 Disk errors, yes, but also whole network partitions separating geographic locations. 170 00:08:01,600 --> 00:08:04,320 And they mentioned something interesting. Sysadmin failures. 171 00:08:04,320 --> 00:08:08,800 Ah, yeah. They specifically considered that. Because let's be honest, sometimes the 172 00:08:08,800 --> 00:08:09,120 biggest 173 00:08:09,120 --> 00:08:13,520 cause of an outage is human error during maintenance or configuration. The design 174 00:08:13,520 --> 00:08:16,880 tries to minimize the impact of those too. Smart. Okay. Tell us about the data 175 00:08:16,880 --> 00:08:17,600 replication. 176 00:08:17,600 --> 00:08:20,960 How many copies of my data exist and where do they live in this system? 177 00:08:20,960 --> 00:08:24,900 The magic number they landed on is three. Every chunk of data you upload gets 178 00:08:24,900 --> 00:08:25,600 replicated, 179 00:08:25,600 --> 00:08:28,800 stored in three distinct zones. Zones. What's a zone? 180 00:08:28,800 --> 00:08:33,010 Think of a zone as a logical grouping that usually maps to a physical failure 181 00:08:33,010 --> 00:08:33,680 domain. 
182 00:08:33,680 --> 00:08:38,720 So it can be a specific rack, a specific building, or most powerfully a specific 183 00:08:38,720 --> 00:08:44,080 data center or geographic location. And importantly, a zone itself usually 184 00:08:44,080 --> 00:08:47,360 consists of multiple servers for redundancy within the zone. 185 00:08:47,360 --> 00:08:51,920 So three copies spread across three potentially separate geographical areas. 186 00:08:51,920 --> 00:08:56,560 Correct. And the logic is classic quorum systems. To survive N-1 failures, you need 187 00:08:56,560 --> 00:08:57,600 N copies. Here, 188 00:08:57,600 --> 00:09:02,400 N = 3. So you can lose up to two entire zones simultaneously. 189 00:09:02,400 --> 00:09:04,240 Two whole data centers could go offline. 190 00:09:04,240 --> 00:09:08,160 Could be, yeah. One is down for planned maintenance and another has a sudden 191 00:09:08,160 --> 00:09:08,480 network 192 00:09:08,480 --> 00:09:12,800 failure. Your data remains fully accessible and readable from that third surviving 193 00:09:12,800 --> 00:09:13,280 zone. 194 00:09:13,280 --> 00:09:17,520 Wow. That level of fault tolerance with potentially just three locations, that's 195 00:09:17,520 --> 00:09:21,040 pretty remarkable for something aimed at smaller scale. 196 00:09:21,040 --> 00:09:25,440 It really is. But achieving that reliability isn't trivial, theoretically speaking. 197 00:09:25,440 --> 00:09:28,960 The sources mention Garage is standing on the shoulders of giants, drawing from 198 00:09:28,960 --> 00:09:33,520 decades of distributed systems research. Right. Let's unpack those giants. They 199 00:09:33,520 --> 00:09:39,120 mention influences like Amazon's Dynamo. What key ideas did Garage borrow from 200 00:09:39,120 --> 00:09:42,640 massive systems like that? This is really the core of it, I think. 201 00:09:42,640 --> 00:09:47,280 The Garage team didn't necessarily invent brand new distributed systems math. 
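
The "three copies in three distinct zones" arithmetic described above can be sketched in a few lines. This is a toy model, not Garage's actual layout algorithm: the hash-based placement rule, the zone names, and the node names are all assumptions made for illustration. It only checks durability (at least one copy survives when any two zones are lost). Whether reads and writes are still served in that state depends on the quorum settings in use, which the episode does not spell out.

```python
import hashlib
from itertools import combinations

REPLICATION_FACTOR = 3  # "the magic number they landed on is three"


def place_replicas(object_key, nodes_by_zone):
    """Pick one node in each of REPLICATION_FACTOR distinct zones.

    Hypothetical placement rule: rank zones (and nodes within a zone) by a
    hash of the object key and the zone/node name, then take the lowest.
    """
    def score(label):
        return hashlib.sha256(f"{object_key}:{label}".encode()).hexdigest()

    zones = sorted(nodes_by_zone, key=score)[:REPLICATION_FACTOR]
    return {zone: min(nodes_by_zone[zone], key=score) for zone in zones}


def surviving_copies(placement, failed_zones):
    """Return the nodes that still hold a copy after the given zones fail."""
    return [node for zone, node in placement.items() if zone not in failed_zones]


# Assumed example cluster: three zones with uneven node counts.
cluster = {
    "paris": ["paris-1", "paris-2"],
    "lyon": ["lyon-1"],
    "berlin": ["berlin-1", "berlin-2"],
}

placement = place_replicas("backups/db.sql.gz", cluster)

# With N = 3 copies in 3 zones, any N - 1 = 2 zone failures leave one copy.
for failed in combinations(cluster, 2):
    remaining = surviving_copies(placement, set(failed))
    print(sorted(failed), "down ->", remaining)
```

Because each copy lives in a different zone, losing a whole zone removes at most one copy, which is the point of mapping zones to physical failure domains.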
202 00:09:47,280 --> 00:09:52,640 Instead, they very cleverly applied existing, proven, but often complex, 203 00:09:52,640 --> 00:09:56,240 academic research to their specific scale and problem set. 204 00:09:56,240 --> 00:09:58,880 Okay, so Dynamo first. What's the key principle there? 205 00:09:58,880 --> 00:10:01,840 Dynamo, Amazon's highly available key-value store. 206 00:10:01,840 --> 00:10:07,600 Its core idea was prioritizing availability over perfect, immediate consistency. 207 00:10:07,600 --> 00:10:08,080 Meaning? 208 00:10:08,080 --> 00:10:11,680 Meaning if there's a network issue, like a slow link or a partition, Dynamo won't 209 00:10:11,680 --> 00:10:12,160 just block the 210 00:10:12,160 --> 00:10:16,150 user and say, try again later. It will likely accept a write operation, let the 211 00:10:16,150 --> 00:10:16,560 user think 212 00:10:16,560 --> 00:10:20,720 it succeeded, and then promise to sort out any potential conflicts later once 213 00:10:20,720 --> 00:10:21,440 communication is 214 00:10:21,440 --> 00:10:21,920 restored. 215 00:10:21,920 --> 00:10:25,980 Ah, so it keeps working even when parts of the system can't talk to each other 216 00:10:25,980 --> 00:10:26,480 properly. 217 00:10:26,480 --> 00:10:30,960 Exactly. It favors being available for reads and writes over ensuring every node 218 00:10:30,960 --> 00:10:31,360 has the 219 00:10:31,360 --> 00:10:36,400 absolute latest data right now. This leads to the concept of eventual consistency. 220 00:10:36,400 --> 00:10:40,880 Okay, eventual consistency. But how do you make sure things do eventually become 221 00:10:40,880 --> 00:10:41,280 consistent 222 00:10:41,280 --> 00:10:45,440 without losing data or getting corrupted when servers have seen different things? 223 00:10:45,440 --> 00:10:50,320 Great question. And that brings us to the second major influence, conflict-free 224 00:10:50,320 --> 00:10:51,200 replicated data 225 00:10:51,200 --> 00:10:53,040 types, or CRDTs. 
226 00:10:53,040 --> 00:10:55,520 CRDTs? Sounds complicated. 227 00:10:55,520 --> 00:10:59,840 The concept is brilliant, actually. CRDTs are special data structures designed 228 00:10:59,840 --> 00:11:00,800 specifically 229 00:11:00,800 --> 00:11:04,800 for this eventual consistency model. They allow multiple servers to update the same 230 00:11:04,800 --> 00:11:05,360 piece of data 231 00:11:05,360 --> 00:11:08,000 independently, even while disconnected from each other. 232 00:11:08,000 --> 00:11:08,640 Okay. 233 00:11:08,640 --> 00:11:12,320 And they have mathematical properties that guarantee, when those servers do 234 00:11:12,320 --> 00:11:12,720 eventually 235 00:11:12,720 --> 00:11:16,560 reconnect and share their updates, their states will merge together correctly and 236 00:11:16,560 --> 00:11:17,680 automatically 237 00:11:17,680 --> 00:11:22,080 without losing information and without needing slow, complex locking mechanisms to 238 00:11:22,080 --> 00:11:22,880 coordinate. 239 00:11:22,880 --> 00:11:28,400 Whoa. So CRDTs are like the algorithmic magic that makes eventual consistency 240 00:11:28,400 --> 00:11:31,680 safe and reliable, especially over flaky internet links. 241 00:11:31,680 --> 00:11:35,600 It stops data getting messed up when different nodes are out of sync for a while. 242 00:11:35,600 --> 00:11:40,720 You got it. It elegantly solves the coordination headache that plagues 243 00:11:40,720 --> 00:11:45,200 many traditional distributed databases when dealing with network partitions or high 244 00:11:45,200 --> 00:11:46,000 latency. 245 00:11:46,000 --> 00:11:48,960 It's a huge enabler for systems like Garage. 246 00:11:48,960 --> 00:11:52,640 Fascinating. And there was a third influence mentioned, Maglev. 247 00:11:52,640 --> 00:11:56,960 Right. Maglev. That's Google's high-performance software network load balancer. 
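
The CRDT merge behaviour described above is easier to see with a small example. The sketch below uses a grow-only counter (G-Counter), one of the simplest textbook CRDTs. It is chosen for illustration only and is not claimed to be the data type Garage uses internally; the class and node names are hypothetical. Each node only increments its own slot, and merging takes the per-node maximum, which makes the merge commutative, associative, and idempotent, so replicas converge regardless of the order in which they exchange state.

```python
class GCounter:
    """Grow-only counter CRDT: one non-decreasing slot per node."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, node_id, amount=1):
        # A node only ever bumps its own slot, so slots never conflict.
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-node maximum: order-independent and safe to repeat.
        merged = dict(self.counts)
        for node_id, count in other.counts.items():
            merged[node_id] = max(merged.get(node_id, 0), count)
        return GCounter(merged)


# Two replicas accept updates independently while disconnected.
replica_a = GCounter()
replica_b = GCounter()
replica_a.increment("node-a", 2)
replica_b.increment("node-b", 3)

# After reconnecting, merging in either direction gives the same state.
merged_ab = replica_a.merge(replica_b)
merged_ba = replica_b.merge(replica_a)
print(merged_ab.counts == merged_ba.counts, merged_ab.value())
```

No locks or coordination rounds are needed: each replica keeps accepting writes during a partition, and the merge rule alone guarantees that nothing is lost when the replicas sync up again.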
248 00:11:56,960 --> 00:12:01,360 Mentioning this shows they weren't just thinking about data storage theory, 249 00:12:01,360 --> 00:12:05,760 but also the practicalities of efficiently routing requests and managing 250 00:12:05,760 --> 00:12:06,640 connections within the 251 00:12:06,640 --> 00:12:11,180 cluster. So handling traffic effectively to make sure requests go to the right 252 00:12:11,180 --> 00:12:11,840 place quickly, 253 00:12:11,840 --> 00:12:15,120 even under load. Exactly. Making sure data requests are 254 00:12:15,120 --> 00:12:19,760 steered efficiently to the nearest or the healthiest node that holds the data. 255 00:12:19,760 --> 00:12:23,520 It's really impressive how they've taken these heavyweight architectural ideas born 256 00:12:23,520 --> 00:12:24,160 from giants 257 00:12:24,160 --> 00:12:28,000 like Amazon and Google and managed to translate them into something lightweight 258 00:12:28,000 --> 00:12:28,880 enough for modest 259 00:12:28,880 --> 00:12:32,400 infrastructure. That's the key achievement, I think. Which brings us nicely to the 260 00:12:32,400 --> 00:12:33,520 practicalities. 261 00:12:33,520 --> 00:12:40,320 Section four, the low barrier to entry. If I actually want to set up this geo-distributed, 262 00:12:40,320 --> 00:12:44,870 super resilient cluster, do I need to go out and buy three identical brand new 263 00:12:44,870 --> 00:12:45,920 server racks? 264 00:12:45,920 --> 00:12:48,960 That's probably the most compelling part for the self-hosting community or small 265 00:12:48,960 --> 00:12:49,840 businesses. 266 00:12:49,840 --> 00:12:53,920 The answer is a definite no. An explicit design goal was keeping the barrier to 267 00:12:53,920 --> 00:12:54,960 entry low. 268 00:12:54,960 --> 00:12:59,040 They actively encourage using existing or even older machines. You don't need a 269 00:12:59,040 --> 00:13:03,260 supercomputer cluster, then? Not at all. 
The minimum requirements are honestly 270 00:13:03,260 --> 00:13:03,840 quite minimal. 271 00:13:03,840 --> 00:13:07,040 Like what? Per node, they suggest just one gigabyte of RAM. 272 00:13:07,040 --> 00:13:11,360 One gig? Seriously? Yep. And at least 16 gigabytes of disk space. 273 00:13:11,360 --> 00:13:17,680 For the CPU, basically any x86_64 processor from the last decade or so, or an ARMv7 274 00:13:17,680 --> 00:13:19,040 or ARMv8 chip, 275 00:13:19,040 --> 00:13:23,440 think Raspberry Pi level or similar, is sufficient. That is incredibly low overhead 276 00:13:23,440 --> 00:13:24,080 for a system 277 00:13:24,080 --> 00:13:27,360 promising this kind of resilience. It really suggests, like you said, if you've got 278 00:13:27,360 --> 00:13:27,600 a few 279 00:13:27,600 --> 00:13:31,340 old office PCs lying around or maybe some cheap virtual servers scattered in 280 00:13:31,340 --> 00:13:32,320 different regions. 281 00:13:32,320 --> 00:13:36,880 You could genuinely start building a Garage cluster. That's the core economic 282 00:13:36,880 --> 00:13:37,440 appeal. 283 00:13:37,440 --> 00:13:41,920 And crucially, they built it specifically to allow mixing and matching different 284 00:13:41,920 --> 00:13:42,800 types of hardware. 285 00:13:42,800 --> 00:13:47,520 Ah, so heterogeneous hardware is supported. Explicitly. You can combine servers 286 00:13:47,520 --> 00:13:47,680 with 287 00:13:47,680 --> 00:13:51,170 different CPUs, different amounts of RAM, different disk sizes within the same 288 00:13:51,170 --> 00:13:51,760 cluster. 289 00:13:52,320 --> 00:13:56,720 That massively simplifies things because you don't need to source expensive 290 00:13:56,720 --> 00:13:58,480 identical machines. Use 291 00:13:58,480 --> 00:14:03,440 what you have or what you can get cheap. That's huge for operational reality. 292 00:14:03,440 --> 00:14:07,840 And the deployment is super simple, too. 
It ships as a single self-contained binary 293 00:14:07,840 --> 00:14:08,640 file, 294 00:14:08,640 --> 00:14:13,430 no complex dependencies to install. It just runs on pretty much any modern Linux 295 00:14:13,430 --> 00:14:14,000 distribution. 296 00:14:14,000 --> 00:14:17,360 Just copy the file and run it, basically. Pretty much. It really emphasizes that 297 00:14:17,360 --> 00:14:21,600 focus on ease of operation for the sysadmin. And just circling back quickly to a 298 00:14:21,600 --> 00:14:22,320 tech detail, 299 00:14:22,320 --> 00:14:25,680 the main language used is Rust, right? About 95% of the code? 300 00:14:25,680 --> 00:14:30,130 Correct. And Rust is known for its performance, efficiency, and especially memory 301 00:14:30,130 --> 00:14:30,720 safety. 302 00:14:30,720 --> 00:14:34,610 That choice directly contributes to those low resource requirements and overall 303 00:14:34,610 --> 00:14:35,360 stability. 304 00:14:35,360 --> 00:14:39,400 That technical excellence seems linked to its sustainability, too. You mentioned it's 305 00:14:39,400 --> 00:14:39,520 not 306 00:14:39,520 --> 00:14:45,200 corporate-backed, but projects like this need ongoing work. How is Garage supported? 307 00:14:45,200 --> 00:14:48,560 Has it managed to find funding? Yes. And that's another critical 308 00:14:48,560 --> 00:14:53,440 point. In a world often dominated by venture capital, Garage has actually secured 309 00:14:53,440 --> 00:14:54,080 significant 310 00:14:54,080 --> 00:14:59,660 public funding, which really signals confidence in it as a public good for the 311 00:14:59,660 --> 00:15:00,640 internet. 312 00:15:00,640 --> 00:15:03,760 That's really interesting. Where did this public funding come from? 313 00:15:03,760 --> 00:15:07,120 It's primarily come via the European Commission's Next Generation Internet, 314 00:15:07,120 --> 00:15:10,720 or NGI initiative. It's had several grants. Can you detail those? 315 00:15:10,720 --> 00:15:17,280 Sure. 
Back in 2021-2022, the NGI Pointer Fund supported three full-time employees 316 00:15:17,280 --> 00:15:17,760 working on 317 00:15:17,760 --> 00:15:20,800 Garage for a whole year. Wow, three people full-time. 318 00:15:20,800 --> 00:15:26,240 Yeah. Then more recently, from 2023 to 2024, the NLnet Foundation through the NGI0 319 00:15:26,240 --> 00:15:27,200 Entrust Fund 320 00:15:27,200 --> 00:15:29,360 supported one full-time employee. Okay. 321 00:15:29,360 --> 00:15:34,320 And it's ongoing. Looking ahead to 2025, the NLnet NGI0 Commons Fund is 322 00:15:34,320 --> 00:15:34,960 providing support 323 00:15:34,960 --> 00:15:39,810 for the equivalent of 1.5 full-time employees. So there's a steady stream of grant 324 00:15:39,810 --> 00:15:40,320 funding, 325 00:15:40,320 --> 00:15:45,500 keeping this critical piece of decentralized infrastructure alive and evolving, 326 00:15:45,500 --> 00:15:45,840 driven by 327 00:15:45,840 --> 00:15:49,280 community needs, not profit. That seems to be the model, yes. 328 00:15:49,280 --> 00:15:53,520 It keeps it open source, keeps it free, and aligned with that original mission. 329 00:15:53,520 --> 00:15:57,680 Okay. So let's try and summarize the core takeaway for you, the listener. What 330 00:15:57,680 --> 00:15:58,640 Garage seems to have 331 00:15:58,640 --> 00:16:03,590 done is successfully bridge this gap, right, between really advanced complex 332 00:16:03,590 --> 00:16:04,480 distributed 333 00:16:04,480 --> 00:16:10,320 systems theory like Dynamo, CRDTs, and the practical, often hardware-constrained 334 00:16:10,320 --> 00:16:10,880 reality 335 00:16:10,880 --> 00:16:15,200 of self-hosters and small organizations. Exactly. It essentially offers smaller 336 00:16:15,200 --> 00:16:15,600 players 337 00:16:15,600 --> 00:16:20,800 the kind of power, geo-distributed, highly resilient storage that used to be pretty 338 00:16:20,800 --> 00:16:24,480 much exclusively the domain of tech giants with massive budgets. 
339 00:16:24,480 --> 00:16:28,160 It levels the playing field in a way. It really does. And this whole project, 340 00:16:28,160 --> 00:16:32,720 this whole approach, it raises a bigger, quite provocative question, I think. 341 00:16:32,720 --> 00:16:37,840 When you have open-source projects, especially ones backed by public funds like NGI, 342 00:16:37,840 --> 00:16:42,620 actively working to democratize access to this kind of complex, resilient 343 00:16:42,620 --> 00:16:43,200 infrastructure, 344 00:16:43,200 --> 00:16:48,620 what does that really mean for the future? For decentralized data ownership, for 345 00:16:48,620 --> 00:16:49,280 control, 346 00:16:49,280 --> 00:16:53,680 it kind of challenges the default assumption that only huge centralized 347 00:16:53,680 --> 00:16:54,560 corporations 348 00:16:54,560 --> 00:16:57,760 can truly guarantee the safety and availability of our important data. 349 00:16:57,760 --> 00:17:02,080 As the cloud continues to consolidate around a few big players, 350 00:17:02,080 --> 00:17:05,480 projects like Garage offer a different path. It's definitely something worth 351 00:17:05,480 --> 00:17:06,320 thinking about. 352 00:17:06,320 --> 00:17:10,800 An excellent point to end on. A powerful thought to mull over, indeed. 353 00:17:10,800 --> 00:17:14,380 Hopefully, you feel much better informed now about the potential of geo-distributed 354 00:17:14,380 --> 00:17:15,280 object storage, 355 00:17:15,280 --> 00:17:19,920 and specifically, the Garage project. We want to once again thank our sponsor, 356 00:17:19,920 --> 00:17:23,600 Safe Server, for supporting this deep dive. They support the hosting of this very 357 00:17:23,600 --> 00:17:24,640 type of software, 358 00:17:24,640 --> 00:17:27,860 helping with digital transformation. You can find out more about how they can 359 00:17:27,860 --> 00:17:28,240 support your 360 00:17:28,240 --> 00:17:33,680 infrastructure needs at www.safeserver.de. 
Thanks for joining us for the deep dive. 361 00:17:33,680 --> 00:17:34,080 We'll catch you