1 00:00:00,000 --> 00:00:05,000 Okay, so get this. Imagine you're so deep into Kubernetes. 2 00:00:05,000 --> 00:00:09,040 Like you're giving a KubeCon talk about your setup. 3 00:00:09,040 --> 00:00:11,640 You've got it handling millions of users. 4 00:00:11,640 --> 00:00:13,720 But then you ditch it. 5 00:00:13,720 --> 00:00:15,520 That's what we're diving into today. 6 00:00:15,520 --> 00:00:18,560 Why Gitpod decided Kubernetes, 7 00:00:18,560 --> 00:00:21,480 specifically for their developer environments, 8 00:00:21,480 --> 00:00:22,600 wasn't working. 9 00:00:22,600 --> 00:00:24,080 Yeah, it's really interesting 10 00:00:24,080 --> 00:00:26,680 because they're not saying Kubernetes is bad, right? 11 00:00:26,680 --> 00:00:28,640 They're saying it's not the right tool 12 00:00:28,640 --> 00:00:30,040 when it comes to developer environments. 13 00:00:30,040 --> 00:00:31,960 Exactly, we're looking at their blog post 14 00:00:31,960 --> 00:00:34,800 from October 31st, 2024. 15 00:00:34,800 --> 00:00:37,720 It's this like six-year saga of them trying to make it work, 16 00:00:37,720 --> 00:00:39,240 hitting all these roadblocks. 17 00:00:39,240 --> 00:00:41,320 They came up with some pretty interesting workarounds. 18 00:00:41,320 --> 00:00:42,520 Oh yeah. 19 00:00:42,520 --> 00:00:43,600 You almost feel bad for them, 20 00:00:43,600 --> 00:00:46,240 but you learn a lot whether you're deep into Kubernetes 21 00:00:46,240 --> 00:00:48,280 or just curious about developer tools. 22 00:00:48,280 --> 00:00:49,920 Yeah, and it just shows that even teams 23 00:00:49,920 --> 00:00:53,520 with tons of experience, even huge teams, 24 00:00:53,520 --> 00:00:56,000 sometimes have to take a step back and look at their tools. 25 00:00:56,000 --> 00:00:56,840 Right. 26 00:00:56,840 --> 00:00:58,640 You've got to pick the right tool for the job. 27 00:00:58,640 --> 00:00:59,480 Right. 28 00:00:59,480 --> 00:01:00,520 It doesn't have to be the popular one. 
29 00:01:00,520 --> 00:01:01,360 Yeah, okay. 30 00:01:01,360 --> 00:01:04,040 So Gitpod's main argument is running applications 31 00:01:04,040 --> 00:01:05,840 in production, that's where Kubernetes shines. 32 00:01:05,840 --> 00:01:06,680 Yeah. 33 00:01:06,680 --> 00:01:10,020 But developer environments, that's a whole different beast. 34 00:01:10,020 --> 00:01:10,920 Totally. 35 00:01:10,920 --> 00:01:13,840 And the blog post breaks down why they highlight 36 00:01:13,840 --> 00:01:16,420 these four characteristics of developer environments. 37 00:01:16,420 --> 00:01:18,400 The first being that they're super stateful 38 00:01:18,400 --> 00:01:19,240 and interactive. 39 00:01:19,240 --> 00:01:21,640 So you've got gigabytes of source code, 40 00:01:21,640 --> 00:01:24,280 you've got build caches, you've got containers running. 41 00:01:24,280 --> 00:01:25,920 All that is constantly changing. 42 00:01:25,920 --> 00:01:28,440 It's not like a stateless app. 43 00:01:28,440 --> 00:01:31,920 Your developer environment is basically an extension of you. 44 00:01:31,920 --> 00:01:34,480 Yeah, it's like the difference between a pristine server room 45 00:01:34,480 --> 00:01:36,340 and your desk. 46 00:01:36,340 --> 00:01:38,640 Your desk has projects everywhere and coffee mugs. 47 00:01:38,640 --> 00:01:39,480 Exactly. 48 00:01:39,480 --> 00:01:42,840 And that mess is really valuable to developers. 49 00:01:42,840 --> 00:01:45,560 So you can imagine it's a huge pain 50 00:01:45,560 --> 00:01:47,720 if they lose changes or get interrupted. 51 00:01:47,720 --> 00:01:50,400 And that leads us to the second characteristic, 52 00:01:50,400 --> 00:01:52,840 unpredictable resource usage. 53 00:01:52,840 --> 00:01:55,920 So you might be coding along, and suddenly, bam, 54 00:01:55,920 --> 00:01:58,720 you need tons of CPU for compilation. 55 00:01:58,720 --> 00:02:00,560 Or memory usage might spike. 
56 00:02:00,560 --> 00:02:02,080 Yeah, and Kubernetes isn't really 57 00:02:02,080 --> 00:02:03,880 known for loving surprises. 58 00:02:03,880 --> 00:02:04,920 Not really, no. 59 00:02:04,920 --> 00:02:06,360 Gitpod talks about all the struggles 60 00:02:06,360 --> 00:02:08,200 they had with CPU throttling. 61 00:02:08,200 --> 00:02:10,680 Your terminal's lagging because your IDE is fighting 62 00:02:10,680 --> 00:02:12,920 some random process for resources. 63 00:02:12,920 --> 00:02:14,520 They did all kinds of stuff. 64 00:02:14,520 --> 00:02:17,440 Custom controllers, messing with process priorities, 65 00:02:17,440 --> 00:02:19,840 even tweaking cgroups v2. 66 00:02:19,840 --> 00:02:21,320 Yeah, and for those who don't know, 67 00:02:21,320 --> 00:02:26,520 cgroups v2 is how the Linux kernel organizes processes 68 00:02:26,520 --> 00:02:29,480 into these hierarchical groups. 69 00:02:29,480 --> 00:02:31,040 It's for controlling and monitoring 70 00:02:31,040 --> 00:02:34,920 things like CPU and memory and disk I/O. 71 00:02:34,920 --> 00:02:37,680 It's very fine-grained control, but it's complex. 72 00:02:37,680 --> 00:02:39,560 Yeah, it sounds like they went really deep. 73 00:02:39,560 --> 00:02:41,120 Deep down the rabbit hole. 74 00:02:41,120 --> 00:02:42,560 And remember, this is all happening 75 00:02:42,560 --> 00:02:44,280 inside a single container because that's 76 00:02:44,280 --> 00:02:45,680 the way Kubernetes works. 77 00:02:45,680 --> 00:02:47,920 So all these processes crammed together, 78 00:02:47,920 --> 00:02:51,080 it just makes resource usage a total guessing game. 79 00:02:51,080 --> 00:02:52,400 Right. 80 00:02:52,400 --> 00:02:54,160 OK, so then there's memory management. 81 00:02:54,160 --> 00:02:56,960 Apparently, until swap space was available in Kubernetes 82 00:02:56,960 --> 00:03:01,120 version 1.22, overbooking memory was a pretty big risk. 
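A minimal sketch of the cgroups v2 CPU control mentioned above, assuming only the kernel's `cpu.max` interface file, whose content is a quota and a period in microseconds (or `max` for no limit). The helper names `cpu_max_line` and `parse_cpu_max` are hypothetical, and this is not Gitpod's controller, which the episode does not describe in detail.

```python
def cpu_max_line(cpu_limit, period_us=100_000):
    """Format a value for a cgroup v2 `cpu.max` file.

    cpu_limit is a number of CPUs (for example 1.5), or None for no limit.
    The file holds "<quota> <period>" in microseconds, or "max <period>".
    """
    if cpu_limit is None:
        return f"max {period_us}"
    quota_us = int(cpu_limit * period_us)
    return f"{quota_us} {period_us}"


def parse_cpu_max(line):
    """Return the CPU limit encoded in a `cpu.max` line, or None if unlimited."""
    quota, period = line.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


if __name__ == "__main__":
    # Allow a workspace to burn at most one and a half CPUs per period.
    print(cpu_max_line(1.5))
    print(parse_cpu_max("max 100000"))
```

Writing such a line into a group's `cpu.max` file is what caps that group's CPU time; a controller that adjusts limits dynamically would be rewriting values like these as load changes.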
83 00:03:01,120 --> 00:03:03,680 Like, you could end up killing an essential process, 84 00:03:03,680 --> 00:03:04,720 can you imagine? 85 00:03:04,720 --> 00:03:05,880 Developer rage. 86 00:03:05,880 --> 00:03:06,480 Yeah. 87 00:03:06,480 --> 00:03:08,780 I mean, this just shows that even mature technologies 88 00:03:08,780 --> 00:03:11,320 like Kubernetes can have limitations, especially 89 00:03:11,320 --> 00:03:13,760 for specific use cases, right? 90 00:03:13,760 --> 00:03:15,680 It's really important to evaluate 91 00:03:15,680 --> 00:03:20,240 whether a tool's strengths really fit what you need it for. 92 00:03:20,240 --> 00:03:21,600 They must've been, I mean, can you imagine? 93 00:03:21,600 --> 00:03:22,440 Pulling their hair out. 94 00:03:22,440 --> 00:03:23,160 Yeah. 95 00:03:23,160 --> 00:03:23,960 Yeah. 96 00:03:23,960 --> 00:03:26,400 OK, so then we have storage performance. 97 00:03:26,400 --> 00:03:29,680 Gitpod really hammers on about how much this matters, 98 00:03:29,680 --> 00:03:32,440 not just for how fast your environment starts up, 99 00:03:32,440 --> 00:03:35,320 but your whole experience inside the environment. 100 00:03:35,320 --> 00:03:37,600 Yeah, because if you're waiting for files to load 101 00:03:37,600 --> 00:03:40,680 or for builds to finish, it just kills your flow. 102 00:03:40,680 --> 00:03:41,280 Totally. 103 00:03:41,280 --> 00:03:42,320 And they tried everything. 104 00:03:42,320 --> 00:03:45,200 SSD RAID 0 for speed, a little risky. 105 00:03:45,200 --> 00:03:47,360 Then block storage for availability, 106 00:03:47,360 --> 00:03:52,040 but they hit a wall with persistent volume claims, or PVCs. 107 00:03:52,040 --> 00:03:53,840 For those who aren't deep into Kubernetes, 108 00:03:53,840 --> 00:03:55,880 explain why PVCs were such a pain. 109 00:03:55,880 --> 00:03:56,380 Sure. 110 00:03:56,380 --> 00:03:58,760 So PVCs, it's like this abstraction layer 111 00:03:58,760 --> 00:03:59,920 that lets you use storage. 
112 00:03:59,920 --> 00:04:02,160 You don't have to worry about the underlying hardware, 113 00:04:02,160 --> 00:04:03,480 so it's flexible. 114 00:04:03,480 --> 00:04:07,400 But in practice, when these PVCs would attach or detach, 115 00:04:07,400 --> 00:04:10,160 it was unpredictable, and that messed with their attempts 116 00:04:10,160 --> 00:04:12,760 to make workspace startups super fast. 117 00:04:12,760 --> 00:04:15,000 They also ran into some reliability issues, 118 00:04:15,000 --> 00:04:16,400 especially on Google Cloud. 119 00:04:16,400 --> 00:04:18,320 So you're a developer, you're ready to code, 120 00:04:18,320 --> 00:04:19,960 and your whole environment just crashes. 121 00:04:19,960 --> 00:04:20,420 Yeah. 122 00:04:20,420 --> 00:04:21,000 Not a good look. 123 00:04:21,000 --> 00:04:22,400 Talk about a buzzkill. 124 00:04:22,400 --> 00:04:25,080 And then there's backing up and restoring these environments. 125 00:04:25,080 --> 00:04:26,320 They can get huge, right? 126 00:04:26,320 --> 00:04:27,040 Right. 127 00:04:27,040 --> 00:04:29,480 So moving them around became this balancing act 128 00:04:29,480 --> 00:04:32,920 of I/O, network bandwidth, and CPU. 129 00:04:32,920 --> 00:04:33,640 Wow. 130 00:04:33,640 --> 00:04:36,920 They even had to use cgroup-based I/O limiters 131 00:04:36,920 --> 00:04:40,000 to prevent one workspace from hogging all the resources 132 00:04:40,000 --> 00:04:42,120 and then starving the others. 133 00:04:42,120 --> 00:04:45,880 It's crazy how these things that sound simple get so complex. 134 00:04:45,880 --> 00:04:46,380 Totally. 135 00:04:46,380 --> 00:04:48,480 Speaking of complex, another challenge? 136 00:04:48,480 --> 00:04:49,920 Autoscaling and startup time. 137 00:04:49,920 --> 00:04:50,760 Yeah. 138 00:04:50,760 --> 00:04:52,640 They were obsessed with minimizing 139 00:04:52,640 --> 00:04:53,720 that initial wait time. 140 00:04:53,720 --> 00:04:54,640 Of course, yeah. 
141 00:04:54,640 --> 00:04:56,040 But that clashed with their desire 142 00:04:56,040 --> 00:04:59,760 to use their machines as efficiently as possible. 143 00:04:59,760 --> 00:05:02,400 Yeah, I mean Kubernetes by design 144 00:05:02,400 --> 00:05:05,080 has this inherent lower limit on startup time, right? 145 00:05:05,080 --> 00:05:05,560 Right. 146 00:05:05,560 --> 00:05:07,520 Because of all the steps involved, 147 00:05:07,520 --> 00:05:10,240 moving content around, spinning up containers. 148 00:05:10,240 --> 00:05:12,120 So they started off thinking, let's just 149 00:05:12,120 --> 00:05:14,640 run multiple workspaces on one node 150 00:05:14,640 --> 00:05:17,240 to leverage shared caches. 151 00:05:17,240 --> 00:05:19,320 But that didn't really work out. 152 00:05:19,320 --> 00:05:20,600 Didn't quite work out, no. 153 00:05:20,600 --> 00:05:22,440 So they tried some creative solutions. 154 00:05:22,440 --> 00:05:25,040 They tried something they called ghost workspaces. 155 00:05:25,040 --> 00:05:26,080 Ghost workspaces. 156 00:05:26,080 --> 00:05:26,600 Yeah. 157 00:05:26,600 --> 00:05:28,920 So these were preemptible pods that would just 158 00:05:28,920 --> 00:05:32,400 sit there to hold space so they could scale in advance. 159 00:05:32,400 --> 00:05:34,440 They're like phantom developers taking up space. 160 00:05:34,440 --> 00:05:35,640 That's a good way to put it. 161 00:05:35,640 --> 00:05:38,920 Clever, but too slow and unreliable. 162 00:05:38,920 --> 00:05:41,420 Then they tried ballast pods. 163 00:05:41,420 --> 00:05:44,680 So these were entire nodes filled with dummy pods 164 00:05:44,680 --> 00:05:46,600 just to ensure that they had enough capacity. 165 00:05:46,600 --> 00:05:48,760 Kind of like renting out an empty apartment building 166 00:05:48,760 --> 00:05:50,300 just in case you might need it later. 167 00:05:50,300 --> 00:05:52,080 Pretty much not efficient. 
168 00:05:52,080 --> 00:05:55,560 Finally, they landed on cluster autoscaler plugins, 169 00:05:55,560 --> 00:05:57,680 which is a much more elegant solution. 170 00:05:57,680 --> 00:05:59,640 But it took a while to get there. 171 00:05:59,640 --> 00:06:02,220 They even implemented proportional autoscaling, 172 00:06:02,220 --> 00:06:05,160 which basically controls the rate of scale up. 173 00:06:05,160 --> 00:06:08,520 It's based on how quickly devs are starting new environments. 174 00:06:08,520 --> 00:06:09,860 So if there's a sudden rush, they 175 00:06:09,860 --> 00:06:12,660 can add capacity quickly without overshooting. 176 00:06:12,660 --> 00:06:14,120 It's all about finding that balance 177 00:06:14,120 --> 00:06:15,920 between being responsive and making 178 00:06:15,920 --> 00:06:17,040 the most of your resources. 179 00:06:17,040 --> 00:06:18,040 My brain's hurting. 180 00:06:18,040 --> 00:06:18,560 Anyone else? 181 00:06:18,560 --> 00:06:19,520 OK. 182 00:06:19,520 --> 00:06:21,840 Image pulls, another headache. 183 00:06:21,840 --> 00:06:24,320 Workspace container images can be huge. 184 00:06:24,320 --> 00:06:26,400 We're talking like 10 gigabytes or more. 185 00:06:26,400 --> 00:06:29,320 And that impacts performance when you have to download and extract 186 00:06:29,320 --> 00:06:31,440 that much data for every workspace. 187 00:06:31,440 --> 00:06:34,040 Yeah, it's like downloading the entire Library of Congress 188 00:06:34,040 --> 00:06:35,320 every time you want to read a book. 189 00:06:35,320 --> 00:06:35,820 Right. 190 00:06:35,820 --> 00:06:38,600 So they tried pre-pulling images with DaemonSets, which 191 00:06:38,600 --> 00:06:41,880 are basically agents on every node making sure the images are ready. 192 00:06:41,880 --> 00:06:43,960 Then they tried building their own custom images 193 00:06:43,960 --> 00:06:47,760 to maximize layer reuse, even baking images directly 194 00:06:47,760 --> 00:06:49,520 into the node disk image. 
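A rough sketch of the proportional autoscaling idea described earlier in this stretch, where scale-up is tied to how quickly new environments are being started. Everything here is an assumption made for illustration: the function name `nodes_to_add`, the parameters, and the defaults are hypothetical and do not come from Gitpod's implementation.

```python
import math


def nodes_to_add(starts_per_minute, free_slots, slots_per_node=10,
                 lead_time_minutes=5.0, max_step=20):
    """Estimate how many nodes to request in one scaling step.

    starts_per_minute: observed rate of new workspace starts
    free_slots:        workspace slots currently unused in the cluster
    slots_per_node:    hypothetical number of workspaces one node can host
    lead_time_minutes: hypothetical time a new node needs to become usable
    max_step:          cap per step, so a burst does not overshoot
    """
    # Keep enough spare slots to absorb the starts expected while
    # freshly requested nodes are still booting.
    wanted_free = math.ceil(starts_per_minute * lead_time_minutes)
    shortfall = max(0, wanted_free - free_slots)
    return min(max_step, math.ceil(shortfall / slots_per_node))


if __name__ == "__main__":
    print(nodes_to_add(starts_per_minute=2, free_slots=10))   # quiet period
    print(nodes_to_add(starts_per_minute=12, free_slots=10))  # sudden rush
```

The cap on the step size is one simple way to express "add capacity quickly without overshooting"; a real controller would likely also smooth the start rate over time.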
195 00:06:49,520 --> 00:06:51,720 Yeah, each of those came with their own trade-offs, right? 196 00:06:51,720 --> 00:06:54,280 Increased complexity, higher costs, limits 197 00:06:54,280 --> 00:06:55,780 on what images devs could use. 198 00:06:55,780 --> 00:06:58,840 Again, another example of how something seemingly simple 199 00:06:58,840 --> 00:07:00,880 can get really complicated at scale. 200 00:07:00,880 --> 00:07:03,680 Yeah, and they even built their own registry facade. 201 00:07:03,680 --> 00:07:07,440 They integrated it with IPFS, the Interplanetary File System, 202 00:07:07,440 --> 00:07:10,760 that decentralized way to store and share files. 203 00:07:10,760 --> 00:07:11,800 They were so proud of it. 204 00:07:11,800 --> 00:07:14,160 They gave a whole KubeCon talk about it. 205 00:07:14,160 --> 00:07:16,240 But in the end, the best solution 206 00:07:16,240 --> 00:07:19,600 was just encouraging everyone to use similar base images, 207 00:07:19,600 --> 00:07:21,720 making caching a lot more effective. 208 00:07:21,720 --> 00:07:24,120 Sometimes the simplest answer really is the best one. 209 00:07:24,120 --> 00:07:26,320 But getting there takes some effort. 210 00:07:26,320 --> 00:07:27,280 OK, buckle up. 211 00:07:27,280 --> 00:07:30,320 We're going into the world of networking in Kubernetes. 212 00:07:30,320 --> 00:07:32,240 And this is where it gets a little technical. 213 00:07:32,240 --> 00:07:36,960 This is where the conflict between what Kubernetes assumes 214 00:07:36,960 --> 00:07:40,880 and what developer environments need becomes really clear. 215 00:07:40,880 --> 00:07:43,240 Yeah, you've got the issue of access control. 216 00:07:43,240 --> 00:07:45,640 You want each environment to be its own little island. 217 00:07:45,640 --> 00:07:47,480 So walled gardens for every developer. 218 00:07:47,480 --> 00:07:48,480 Exactly. 219 00:07:48,480 --> 00:07:50,760 So no peeking at your neighbor's code. 
220 00:07:50,760 --> 00:07:53,360 And you need to control who can access what. 221 00:07:53,360 --> 00:07:56,240 Kubernetes has these things called network policies. 222 00:07:56,240 --> 00:07:58,840 They're for defining fine-grained rules 223 00:07:58,840 --> 00:08:01,560 about what traffic can flow within the cluster. 224 00:08:01,560 --> 00:08:04,920 Sounds great, but even those cause headaches for Gitpod. 225 00:08:04,920 --> 00:08:06,040 Of course they did. 226 00:08:06,040 --> 00:08:07,960 So what was their initial approach? 227 00:08:07,960 --> 00:08:12,840 So they started using Kubernetes services and an ingress proxy. 228 00:08:12,840 --> 00:08:17,160 It's to manage access to individual environment ports. 229 00:08:17,160 --> 00:08:20,880 Think your IDE or services running within the workspace. 230 00:08:20,880 --> 00:08:24,200 But as they scaled, this approach became unreliable. 231 00:08:24,200 --> 00:08:27,160 Because more users equals more complexity equals more things 232 00:08:27,160 --> 00:08:27,920 that can go wrong. 233 00:08:27,920 --> 00:08:28,560 Exactly. 234 00:08:28,560 --> 00:08:31,200 With thousands of environments running simultaneously, 235 00:08:31,200 --> 00:08:33,680 name resolution started failing. 236 00:08:33,680 --> 00:08:36,880 Sometimes, it even crashed entire workspaces. 237 00:08:36,880 --> 00:08:38,760 Even established Kubernetes features 238 00:08:38,760 --> 00:08:41,040 have their limits when you push them to the extreme. 239 00:08:41,040 --> 00:08:42,960 It's a good reminder that scaling isn't just 240 00:08:42,960 --> 00:08:44,280 about making things bigger. 241 00:08:44,280 --> 00:08:44,760 No. 242 00:08:44,760 --> 00:08:47,420 It's about making sure they can handle all the complexity that 243 00:08:47,420 --> 00:08:49,120 comes with size. 244 00:08:49,120 --> 00:08:51,760 OK, so resource constraints, another area 245 00:08:51,760 --> 00:08:55,240 where Gitpod faced challenges, network bandwidth sharing. 
246 00:08:55,240 --> 00:08:57,120 It's like having multiple apartments sharing 247 00:08:57,120 --> 00:08:58,960 the same internet connection, and everyone 248 00:08:58,960 --> 00:09:01,000 wants to stream movies at the same time. 249 00:09:01,000 --> 00:09:03,080 Yeah, just like CPU and memory, you've 250 00:09:03,080 --> 00:09:05,760 got multiple workspaces on a node, all competing 251 00:09:05,760 --> 00:09:07,700 for that same network pipe. 252 00:09:07,700 --> 00:09:10,720 Some container network interfaces, or CNIs, 253 00:09:10,720 --> 00:09:12,440 have features for network shaping, 254 00:09:12,440 --> 00:09:14,520 but that adds even more complexity. 255 00:09:14,520 --> 00:09:16,860 And then there's the question of fairness. 256 00:09:16,860 --> 00:09:18,360 How do you divide up that bandwidth 257 00:09:18,360 --> 00:09:20,240 so everyone gets a decent slice? 258 00:09:20,240 --> 00:09:23,240 It's a never-ending battle. 259 00:09:23,240 --> 00:09:26,160 Balancing performance, security, making 260 00:09:26,160 --> 00:09:27,920 the most of your resources. 261 00:09:27,920 --> 00:09:32,600 And that brings us to, I think, one of the hairiest topics. 262 00:09:32,600 --> 00:09:33,640 Security. 263 00:09:33,640 --> 00:09:36,060 Specifically in the context of developer environments. 264 00:09:36,060 --> 00:09:37,840 How do you give developers the freedom 265 00:09:37,840 --> 00:09:41,080 they need without creating a security nightmare? 266 00:09:41,080 --> 00:09:43,680 This is where the tension between flexibility and control 267 00:09:43,680 --> 00:09:44,520 really comes in. 268 00:09:44,520 --> 00:09:45,440 It gets complicated. 269 00:09:45,440 --> 00:09:48,120 So they start by highlighting this naive approach. 270 00:09:48,120 --> 00:09:50,480 Just give everyone root access to their containers. 271 00:09:50,480 --> 00:09:51,400 Seems simple, right? 272 00:09:51,400 --> 00:09:53,400 Yeah, just give everyone the keys to the kingdom. 
273 00:09:53,400 --> 00:09:54,440 What could go wrong? 274 00:09:54,440 --> 00:09:56,720 Well, aside from being a security disaster waiting 275 00:09:56,720 --> 00:10:00,400 to happen, giving users root in their containers 276 00:10:00,400 --> 00:10:02,920 basically gives them root on the node itself. 277 00:10:02,920 --> 00:10:04,960 That means they can potentially snoop around 278 00:10:04,960 --> 00:10:07,440 in other environments that are running on the same node. 279 00:10:07,440 --> 00:10:09,280 They could mess with the infrastructure. 280 00:10:09,280 --> 00:10:09,920 Yeah, not good. 281 00:10:09,920 --> 00:10:11,320 Not exactly what you want. 282 00:10:11,320 --> 00:10:12,080 Not stable. 283 00:10:12,080 --> 00:10:14,520 So they needed something more sophisticated. 284 00:10:14,520 --> 00:10:17,440 Enter user namespaces. 285 00:10:17,440 --> 00:10:19,200 So this is a Linux kernel feature 286 00:10:19,200 --> 00:10:23,600 that lets you map user and group IDs inside containers. 287 00:10:23,600 --> 00:10:26,000 So you can basically make a user feel 288 00:10:26,000 --> 00:10:29,160 like they have root privileges within their environment, 289 00:10:29,160 --> 00:10:32,340 but without actually giving them control over the host system. 290 00:10:32,340 --> 00:10:34,800 OK, that sounds clever, but I bet it wasn't easy to set up. 291 00:10:34,800 --> 00:10:36,080 You bet it wasn't. 292 00:10:36,080 --> 00:10:39,080 Kubernetes did eventually add support for user namespaces 293 00:10:39,080 --> 00:10:42,120 in version 1.25, but Gitpod had already 294 00:10:42,120 --> 00:10:45,460 started their own implementation with version 1.22. 295 00:10:45,460 --> 00:10:47,360 And let me tell you, their solution 296 00:10:47,360 --> 00:10:50,280 involves some serious technical gymnastics. 297 00:10:50,280 --> 00:10:51,240 Give us the highlights. 298 00:10:51,240 --> 00:10:52,440 What kind of gymnastics? 
299 00:10:52,440 --> 00:10:54,560 Well, for starters, they had to implement something 300 00:10:54,560 --> 00:10:57,680 called file system UID shifting. 301 00:10:57,680 --> 00:11:00,840 This ensures that files that are created inside the container 302 00:11:00,840 --> 00:11:04,200 are mapped correctly to user IDs on the host system. 303 00:11:04,200 --> 00:11:07,360 So it prevents any security bypasses. 304 00:11:07,360 --> 00:11:09,320 They tried a bunch of different approaches, 305 00:11:09,320 --> 00:11:13,320 like shiftfs, fuse-overlayfs, even idmapped mounts. 306 00:11:13,320 --> 00:11:14,760 Each of those had their own quirks 307 00:11:14,760 --> 00:11:16,880 in terms of performance and compatibility. 308 00:11:16,880 --> 00:11:19,320 It sounds like they were really pushing the limits of what 309 00:11:19,320 --> 00:11:21,800 Kubernetes could do, trying to fit a square peg 310 00:11:21,800 --> 00:11:22,640 into a round hole. 311 00:11:22,640 --> 00:11:23,920 Exactly. 312 00:11:23,920 --> 00:11:25,800 And then there was the challenge of mounting 313 00:11:25,800 --> 00:11:28,640 what they call a masked proc file system. 314 00:11:28,640 --> 00:11:32,040 So usually when a container starts up, it mounts proc. 315 00:11:32,040 --> 00:11:33,600 This gives it access to information 316 00:11:33,600 --> 00:11:35,080 about the host system. 317 00:11:35,080 --> 00:11:37,720 But for Gitpod's security model, proc 318 00:11:37,720 --> 00:11:40,520 had to be hidden to prevent vulnerabilities. 319 00:11:40,520 --> 00:11:43,680 So they had to create this custom masked proc 320 00:11:43,680 --> 00:11:46,120 and then carefully move it into the right mount 321 00:11:46,120 --> 00:11:48,240 namespace for each container. 322 00:11:48,240 --> 00:11:50,480 And they did this using seccomp notify, 323 00:11:50,480 --> 00:11:53,560 which is like a super low level way to intercept and modify 324 00:11:53,560 --> 00:11:54,520 system calls. 
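To make the ID mapping behind user namespaces and UID shifting concrete, here is a small sketch, assuming the three-column format the Linux kernel uses in a process's `uid_map` file (start of the range inside the namespace, start of the range outside it, and the range length). The helper names `parse_id_map` and `to_host_id` are hypothetical, and the example range is made up; this only shows the arithmetic, not how shiftfs, fuse-overlayfs, or idmapped mounts apply it.

```python
def parse_id_map(text):
    """Parse uid_map-style text into (inside_start, outside_start, length) tuples."""
    return [tuple(int(field) for field in line.split())
            for line in text.splitlines() if line.strip()]


def to_host_id(container_id, id_map):
    """Translate an ID seen inside the namespace to the ID used outside it.

    Returns None when the ID falls outside every mapped range.
    """
    for inside_start, outside_start, length in id_map:
        if inside_start <= container_id < inside_start + length:
            return outside_start + (container_id - inside_start)
    return None


if __name__ == "__main__":
    # Hypothetical mapping: IDs 0..65535 inside map to 100000..165535 outside.
    id_map = parse_id_map("0 100000 65536\n")
    print(to_host_id(0, id_map))     # "root" inside is an unprivileged ID outside
    print(to_host_id(1000, id_map))
```

This is why a user can appear to be root inside the workspace without holding root on the node: ID 0 inside corresponds to an ordinary, unprivileged ID on the host, and files on disk need their ownership shifted by the same offset to stay consistent.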
325 00:11:54,520 --> 00:11:55,720 Pretty hardcore stuff. 326 00:11:55,720 --> 00:11:58,200 Wow, it's like they're doing brain surgery on Kubernetes 327 00:11:58,200 --> 00:11:59,160 to make it work. 328 00:11:59,160 --> 00:12:00,160 Pretty much. 329 00:12:00,160 --> 00:12:02,280 But wait, there's more. 330 00:12:02,280 --> 00:12:05,720 They also needed to add support for FUSE, Filesystem 331 00:12:05,720 --> 00:12:06,960 in Userspace. 332 00:12:06,960 --> 00:12:07,520 Yeah. 333 00:12:07,520 --> 00:12:09,720 A lot of developer tools rely on that. 334 00:12:09,720 --> 00:12:13,080 So this involved messing with the container's eBPF device 335 00:12:13,080 --> 00:12:15,040 filter, another low level tweak. 336 00:12:15,040 --> 00:12:17,560 And then there's the issue of network capabilities. 337 00:12:17,560 --> 00:12:18,060 Right. 338 00:12:18,060 --> 00:12:21,360 So as root, you have these powerful capabilities 339 00:12:21,360 --> 00:12:23,440 like CAP_NET_ADMIN and CAP_NET_RAW. 340 00:12:23,440 --> 00:12:25,040 They let you control networking. 341 00:12:25,040 --> 00:12:25,540 Right. 342 00:12:25,540 --> 00:12:27,700 So giving those to a container would totally 343 00:12:27,700 --> 00:12:29,120 break their security model. 344 00:12:29,120 --> 00:12:29,680 Yeah. 345 00:12:29,680 --> 00:12:31,080 So how did they get around that? 346 00:12:31,080 --> 00:12:33,720 Well, they ended up creating another network namespace, 347 00:12:33,720 --> 00:12:37,000 but this time inside the Kubernetes container. 348 00:12:37,000 --> 00:12:39,240 Initially, they used slirp4netns. 349 00:12:39,240 --> 00:12:42,080 And then they switched to veth pairs and custom nftables 350 00:12:42,080 --> 00:12:43,200 rules. 351 00:12:43,200 --> 00:12:45,480 It's like they were building a secure little networking 352 00:12:45,480 --> 00:12:48,160 sandbox within another sandbox. 353 00:12:48,160 --> 00:12:51,040 It's amazing how much work they put into making this all work. 
354 00:12:51,040 --> 00:12:51,680 It really is. 355 00:12:51,680 --> 00:12:54,180 But all this complexity comes with a price, right? 356 00:12:54,180 --> 00:12:56,340 You've got performance hits, especially 357 00:12:56,340 --> 00:12:58,040 with the earlier solutions. 358 00:12:58,040 --> 00:13:00,920 You've got compatibility issues with certain tools. 359 00:13:00,920 --> 00:13:02,760 And then the never-ending struggle 360 00:13:02,760 --> 00:13:05,000 to keep up with Kubernetes updates. 361 00:13:05,000 --> 00:13:08,280 So you can see why they started looking for alternatives. 362 00:13:08,280 --> 00:13:10,360 And that's where their exploration of micro VMs comes 363 00:13:10,360 --> 00:13:10,860 in. 364 00:13:10,860 --> 00:13:12,520 But we're going to save that for part two. 365 00:13:12,520 --> 00:13:13,360 Stay tuned, folks. 366 00:13:13,360 --> 00:13:15,200 Things get really interesting. 367 00:13:15,200 --> 00:13:15,880 Welcome back. 368 00:13:15,880 --> 00:13:17,480 If you're just tuning in, we're talking 369 00:13:17,480 --> 00:13:20,520 about Gitpod's journey, how they went from Kubernetes fans 370 00:13:20,520 --> 00:13:24,200 to creating their own system for developer environments. 371 00:13:24,200 --> 00:13:26,480 Yeah, it got to the point where they were willing to try 372 00:13:26,480 --> 00:13:28,360 anything, even something completely 373 00:13:28,360 --> 00:13:29,560 different from Kubernetes. 374 00:13:29,560 --> 00:13:30,060 Right. 375 00:13:30,060 --> 00:13:31,800 So that's where micro VMs come in. 376 00:13:31,800 --> 00:13:34,440 Now, for those of us who aren't living 377 00:13:34,440 --> 00:13:38,040 in the infrastructure world, can you give us a micro VMs 101? 378 00:13:38,040 --> 00:13:38,880 What are they? 379 00:13:38,880 --> 00:13:40,840 And why was Gitpod so interested? 380 00:13:40,840 --> 00:13:41,340 Sure. 
381 00:13:41,340 --> 00:13:46,160 So think of micro VMs like tiny specialized virtual machines, 382 00:13:46,160 --> 00:13:46,880 right? 383 00:13:46,880 --> 00:13:49,320 Stripped down to just the essentials. 384 00:13:49,320 --> 00:13:52,040 They boot up super fast, small footprint, 385 00:13:52,040 --> 00:13:54,880 and security is kind of baked into their design. 386 00:13:54,880 --> 00:13:56,500 Gitpod was looking at technologies 387 00:13:56,500 --> 00:14:00,880 like Firecracker, Cloud Hypervisor, QEMU. 388 00:14:00,880 --> 00:14:03,720 So what was it about micro VMs that they were so excited about? 389 00:14:03,720 --> 00:14:06,520 What problems were they hoping to solve that Kubernetes just 390 00:14:06,520 --> 00:14:07,840 wasn't cutting it for? 391 00:14:07,840 --> 00:14:11,440 Well, first and foremost, better resource isolation. 392 00:14:11,440 --> 00:14:15,480 Unlike containers, which share the host's kernel, micro VMs, 393 00:14:15,480 --> 00:14:17,680 they get their own dedicated kernel. 394 00:14:17,680 --> 00:14:20,200 So that means less chance of one environment interfering 395 00:14:20,200 --> 00:14:23,000 with another, more predictable performance overall. 396 00:14:23,000 --> 00:14:28,720 So no more laggy terminal, because your IDE is fighting 397 00:14:28,720 --> 00:14:31,360 some compiler process for CPU. 398 00:14:31,360 --> 00:14:31,960 Exactly. 399 00:14:31,960 --> 00:14:36,920 Another big plus, memory snapshots, near instant resume. 400 00:14:36,920 --> 00:14:38,420 With something like Firecracker, you 401 00:14:38,420 --> 00:14:40,880 can take a snapshot of the entire VM's memory state, 402 00:14:40,880 --> 00:14:42,640 and that includes everything that's running. 403 00:14:42,640 --> 00:14:44,040 You can restore it in an instant. 
404 00:14:44,040 --> 00:14:46,120 Wait, so you're saying you could literally 405 00:14:46,120 --> 00:14:48,280 pause your whole developer environment, mid-debug 406 00:14:48,280 --> 00:14:50,360 session, coffee break, whatever, and come back to it 407 00:14:50,360 --> 00:14:51,600 exactly as you left it. 408 00:14:51,600 --> 00:14:53,760 That's the power of micro VMs. 409 00:14:53,760 --> 00:14:55,680 Imagine the productivity boost, especially 410 00:14:55,680 --> 00:14:58,000 for large projects, complex projects, 411 00:14:58,000 --> 00:15:00,400 where restarting everything can take forever. 412 00:15:00,400 --> 00:15:02,840 Yeah, that's a feature I think a lot of developers would love. 413 00:15:02,840 --> 00:15:04,000 For sure. 414 00:15:04,000 --> 00:15:06,100 But I'm guessing there were some downsides, right? 415 00:15:06,100 --> 00:15:07,720 Otherwise, Gitpod would have just switched over 416 00:15:07,720 --> 00:15:08,680 and called it a day. 417 00:15:08,680 --> 00:15:10,880 Of course, no technology is perfect. 418 00:15:10,880 --> 00:15:13,140 One challenge was overhead. 419 00:15:13,140 --> 00:15:15,240 Even though micro VMs are lightweight 420 00:15:15,240 --> 00:15:17,020 compared to like traditional VMs, 421 00:15:17,020 --> 00:15:19,920 they still add more overhead than containers. 422 00:15:19,920 --> 00:15:22,960 And that impacts performance, resource utilization, 423 00:15:22,960 --> 00:15:25,360 which for a platform like Gitpod is a huge deal. 424 00:15:25,360 --> 00:15:27,640 Right, because they're running thousands, if not millions, 425 00:15:27,640 --> 00:15:28,520 of these environments. 426 00:15:28,520 --> 00:15:29,080 Exactly. 427 00:15:29,080 --> 00:15:31,560 Every little bit of efficiency matters. 428 00:15:31,560 --> 00:15:33,680 Another hurdle was image conversion. 429 00:15:33,680 --> 00:15:37,320 Most developer tools, they come packaged as container images 430 00:15:37,320 --> 00:15:38,880 using the OCI standard. 
431 00:15:38,880 --> 00:15:40,400 Kubernetes loves that. 432 00:15:40,400 --> 00:15:42,520 But to use those images in a micro VM, 433 00:15:42,520 --> 00:15:44,400 you have to convert them to a format 434 00:15:44,400 --> 00:15:47,840 that the micro VM understands, that adds complexity 435 00:15:47,840 --> 00:15:49,160 and slows down startup. 436 00:15:49,160 --> 00:15:50,560 Right, so it's not just as simple 437 00:15:50,560 --> 00:15:52,920 as swapping out Kubernetes and plugging in micro VMs. 438 00:15:52,920 --> 00:15:55,240 No, it's a whole translation process. 439 00:15:55,240 --> 00:15:56,760 And then there are some limitations 440 00:15:56,760 --> 00:16:00,520 that are specific to micro VM technologies themselves. 441 00:16:00,520 --> 00:16:03,000 For example, Firecracker, which is known for its speed 442 00:16:03,000 --> 00:16:04,240 and its snapshotting. 443 00:16:04,240 --> 00:16:05,800 Well, at the time, it didn't support 444 00:16:05,800 --> 00:16:08,120 GPUs, which is a deal breaker if you're 445 00:16:08,120 --> 00:16:11,920 working on graphics intensive applications. 446 00:16:11,920 --> 00:16:14,000 OK, so even cutting edge technology 447 00:16:14,000 --> 00:16:15,720 has its limitations. 448 00:16:15,720 --> 00:16:17,040 What else did they run into? 449 00:16:17,040 --> 00:16:19,480 Well, data movement became a much bigger problem. 450 00:16:19,480 --> 00:16:23,160 With micro VMs, you're dealing with whole VM images, 451 00:16:23,160 --> 00:16:24,880 including those memory snapshots, which 452 00:16:24,880 --> 00:16:26,280 can be pretty large. 453 00:16:26,280 --> 00:16:29,200 Moving them around, whether it's for backups or scheduling, 454 00:16:29,200 --> 00:16:31,360 gets more complex and it takes more time. 455 00:16:31,360 --> 00:16:33,760 And I bet storage, which was already a pain point, 456 00:16:33,760 --> 00:16:35,440 became even more of a headache. 457 00:16:35,440 --> 00:16:36,560 You got it. 
458 00:16:36,560 --> 00:16:39,240 They tried attaching EBS volumes, 459 00:16:39,240 --> 00:16:43,280 that's Elastic Block Store, from AWS to their micro VMs, 460 00:16:43,280 --> 00:16:45,560 thinking that they could improve startup times 461 00:16:45,560 --> 00:16:48,040 and reduce network strain by keeping the workspace 462 00:16:48,040 --> 00:16:49,400 data local. 463 00:16:49,400 --> 00:16:52,600 But then you run into all these performance quotas, latency 464 00:16:52,600 --> 00:16:55,520 issues, and just the challenge of scaling that approach 465 00:16:55,520 --> 00:16:57,160 across a huge platform. 466 00:16:57,160 --> 00:16:59,480 So kind of swapping one set of problems for another. 467 00:16:59,480 --> 00:17:00,360 In a way. 468 00:17:00,360 --> 00:17:03,360 But the micro VM detour, it wasn't a dead end at all. 469 00:17:03,360 --> 00:17:05,800 It was really a turning point in their thinking. 470 00:17:05,800 --> 00:17:07,960 First, it really solidified their commitment 471 00:17:07,960 --> 00:17:10,380 to things like full workspace backup 472 00:17:10,380 --> 00:17:12,960 and being able to suspend and resume environments. 473 00:17:12,960 --> 00:17:14,200 So that became a must have. 474 00:17:14,200 --> 00:17:14,840 Exactly. 475 00:17:14,840 --> 00:17:16,360 It was non-negotiable. 476 00:17:16,360 --> 00:17:18,680 But maybe more importantly, this experiment 477 00:17:18,680 --> 00:17:23,160 made them really consider moving away from Kubernetes. 478 00:17:23,160 --> 00:17:27,520 Trying to shoehorn these micro VMs into the Kubernetes world 479 00:17:27,520 --> 00:17:30,480 made them realize that there might be a better way. 480 00:17:30,480 --> 00:17:32,560 A way where they weren't constantly fighting 481 00:17:32,560 --> 00:17:34,320 the limitations of the platform. 482 00:17:34,320 --> 00:17:38,120 So it's like, those micro VMs were the gateway drug 483 00:17:38,120 --> 00:17:39,720 to their Kubernetes exodus.
484 00:17:39,720 --> 00:17:40,880 I like that analogy. 485 00:17:40,880 --> 00:17:41,840 It's perfect. 486 00:17:41,840 --> 00:17:43,420 They got a taste of something different. 487 00:17:43,420 --> 00:17:46,000 And they realized maybe they didn't need Kubernetes 488 00:17:46,000 --> 00:17:46,680 after all. 489 00:17:46,680 --> 00:17:48,920 OK, so after all that experimenting, 490 00:17:48,920 --> 00:17:50,440 what was their final move? 491 00:17:50,440 --> 00:17:52,720 Did they find the solution they were searching for? 492 00:17:52,720 --> 00:17:53,280 They did. 493 00:17:53,280 --> 00:17:55,980 They built their own system called Gitpod Flex. 494 00:17:55,980 --> 00:17:59,160 It's designed from the ground up to be like the perfect home 495 00:17:59,160 --> 00:18:00,800 for developer environments. 496 00:18:00,800 --> 00:18:02,240 Taking the best of what they learned 497 00:18:02,240 --> 00:18:04,680 and leaving the Kubernetes baggage behind. 498 00:18:04,680 --> 00:18:07,640 All right, so this is where it gets really interesting. 499 00:18:07,640 --> 00:18:09,360 Tell me more about Gitpod Flex. 500 00:18:09,360 --> 00:18:10,520 What makes it so special? 501 00:18:10,520 --> 00:18:14,080 Well, it's not a complete rejection of Kubernetes, right? 502 00:18:14,080 --> 00:18:15,960 They kept some of the core principles. 503 00:18:15,960 --> 00:18:18,800 For example, declarative APIs are still 504 00:18:18,800 --> 00:18:20,840 a core part of Gitpod Flex. 505 00:18:20,840 --> 00:18:23,320 Remember all those YAML files in Kubernetes? 506 00:18:23,320 --> 00:18:23,840 Yeah. 507 00:18:23,840 --> 00:18:25,760 Defining your infrastructure as code. 508 00:18:25,760 --> 00:18:27,080 Well, that's still there. 509 00:18:27,080 --> 00:18:27,360 OK. 510 00:18:27,360 --> 00:18:29,480 But in a more streamlined and targeted way. 511 00:18:29,480 --> 00:18:33,200 So you still get those benefits of infrastructure as code 512 00:18:33,200 --> 00:18:34,600 without all the complexity. 
513 00:18:34,600 --> 00:18:35,360 Right. 514 00:18:35,360 --> 00:18:38,040 And they also kept the use of control theory 515 00:18:38,040 --> 00:18:39,680 for resource management. 516 00:18:39,680 --> 00:18:42,240 This basically means they're using fancy algorithms 517 00:18:42,240 --> 00:18:45,800 to automatically adjust resource allocation based on what's 518 00:18:45,800 --> 00:18:47,160 happening in real time. 519 00:18:47,160 --> 00:18:47,760 OK. 520 00:18:47,760 --> 00:18:49,720 Kind of like Kubernetes auto-scaling, 521 00:18:49,720 --> 00:18:53,000 but tailored for how developer environments actually behave. 522 00:18:53,000 --> 00:18:53,500 Right. 523 00:18:53,500 --> 00:18:56,040 So even though it sounds complex under the hood, 524 00:18:56,040 --> 00:18:57,600 what does this mean for developers 525 00:18:57,600 --> 00:18:58,960 who are using Gitpod Flex? 526 00:18:58,960 --> 00:19:00,440 What's the experience like? 527 00:19:00,440 --> 00:19:03,000 Well, one big plus is the seamless integration 528 00:19:03,000 --> 00:19:04,640 with dev containers. 529 00:19:04,640 --> 00:19:07,240 These are like pre-configured, self-contained developer 530 00:19:07,240 --> 00:19:10,480 environments, all the tools, libraries, dependencies, 531 00:19:10,480 --> 00:19:12,640 all bundled up for specific projects. 532 00:19:12,640 --> 00:19:15,200 So it's like a recipe for your perfect developer environment, 533 00:19:15,200 --> 00:19:16,120 just add code. 534 00:19:16,120 --> 00:19:17,160 Exactly. 535 00:19:17,160 --> 00:19:20,280 And Gitpod Flex makes it super easy to spin those up. 536 00:19:20,280 --> 00:19:22,620 They've also really doubled down on self-hosting. 537 00:19:22,620 --> 00:19:26,640 So remember, Gitpod used to offer a cloud and a self-managed 538 00:19:26,640 --> 00:19:27,400 version. 539 00:19:27,400 --> 00:19:29,440 And they said that the self-managed version, which 540 00:19:29,440 --> 00:19:32,600 was heavily Kubernetes-based, was a real pain to support.
541 00:19:32,600 --> 00:19:33,240 Right. 542 00:19:33,240 --> 00:19:36,080 Well, with Gitpod Flex, self-hosting is super easy. 543 00:19:36,080 --> 00:19:38,920 You can have it up and running in less than three minutes 544 00:19:38,920 --> 00:19:40,560 on pretty much any infrastructure. 545 00:19:40,560 --> 00:19:42,240 Three minutes? 546 00:19:42,240 --> 00:19:44,720 That's faster than it takes to order a pizza. 547 00:19:44,720 --> 00:19:45,360 It really is. 548 00:19:45,360 --> 00:19:47,920 And that opens up a lot of possibilities. 549 00:19:47,920 --> 00:19:50,440 Companies can now run their developer environments closer 550 00:19:50,440 --> 00:19:53,160 to their data, even on premises if they need to. 551 00:19:53,160 --> 00:19:56,000 Gives them more control over security, compliance, all 552 00:19:56,000 --> 00:19:56,760 that stuff. 553 00:19:56,760 --> 00:20:00,240 So flexibility and control are really key here. 554 00:20:00,240 --> 00:20:02,420 But what about performance? 555 00:20:02,420 --> 00:20:05,780 All those Kubernetes headaches, the CPU throttling, storage 556 00:20:05,780 --> 00:20:07,160 bottlenecks, all those things. 557 00:20:07,160 --> 00:20:09,660 Have they managed to get rid of those with Gitpod Flex? 558 00:20:09,660 --> 00:20:11,520 And that was one of their main goals. 559 00:20:11,520 --> 00:20:13,240 And from what they've said, it seems 560 00:20:13,240 --> 00:20:15,360 like they made a lot of progress. 561 00:20:15,360 --> 00:20:18,960 By moving away from that shared kernel model of containers 562 00:20:18,960 --> 00:20:23,120 and giving each environment its own dedicated resources, 563 00:20:23,120 --> 00:20:25,400 they've managed to smooth out a lot of those performance 564 00:20:25,400 --> 00:20:25,880 hiccups. 565 00:20:25,880 --> 00:20:28,200 So each environment gets its own slice of the pie. 566 00:20:28,200 --> 00:20:29,000 Exactly. 
567 00:20:29,000 --> 00:20:31,240 Now what about that memory snapshot feature 568 00:20:31,240 --> 00:20:33,480 that they were so keen on with micro VMs? 569 00:20:33,480 --> 00:20:35,360 Did that make it into Gitpod Flex? 570 00:20:35,360 --> 00:20:37,800 So they haven't specifically said, 571 00:20:37,800 --> 00:20:40,920 but knowing how much they care about making developer 572 00:20:40,920 --> 00:20:43,080 environments stateful friendly, I 573 00:20:43,080 --> 00:20:45,160 wouldn't be surprised if they're working on it. 574 00:20:45,160 --> 00:20:45,800 Fingers crossed. 575 00:20:45,800 --> 00:20:48,000 Right, because it fits perfectly with their vision. 576 00:20:48,000 --> 00:20:49,960 OK, let's talk about security. 577 00:20:49,960 --> 00:20:52,200 We know they put a ton of effort into securing 578 00:20:52,200 --> 00:20:53,480 their Kubernetes setup. 579 00:20:53,480 --> 00:20:54,040 Oh, yeah. 580 00:20:54,040 --> 00:20:56,320 But it always felt like they were swimming upstream. 581 00:20:56,320 --> 00:20:57,040 Right. 582 00:20:57,040 --> 00:20:58,680 What's the story with Gitpod Flex? 583 00:20:58,680 --> 00:21:02,280 Did they manage to make it simpler but also more secure? 584 00:21:02,280 --> 00:21:05,200 Well, security is kind of baked into Gitpod Flex 585 00:21:05,200 --> 00:21:06,280 from the very beginning. 586 00:21:06,280 --> 00:21:08,880 They went all in on a zero trust architecture. 587 00:21:08,880 --> 00:21:12,440 That basically means no user, no device, no request 588 00:21:12,440 --> 00:21:14,440 is automatically trusted. 589 00:21:14,440 --> 00:21:17,320 Everything has to be authenticated, authorized, 590 00:21:17,320 --> 00:21:18,360 every step of the way. 591 00:21:18,360 --> 00:21:19,600 Fort Knox for code. 592 00:21:19,600 --> 00:21:20,840 Exactly. 593 00:21:20,840 --> 00:21:23,760 This approach kind of avoids a lot of the vulnerabilities 594 00:21:23,760 --> 00:21:25,360 they were dealing with in Kubernetes. 
595 00:21:25,360 --> 00:21:25,860 Right. 596 00:21:25,860 --> 00:21:29,200 No more messing around with user namespaces or containers 597 00:21:29,200 --> 00:21:30,720 breaking out of their isolation. 598 00:21:30,720 --> 00:21:33,560 So more secure and easier to manage. 599 00:21:33,560 --> 00:21:34,320 That's the goal. 600 00:21:34,320 --> 00:21:35,120 That's the dream. 601 00:21:35,120 --> 00:21:35,840 Right. 602 00:21:35,840 --> 00:21:38,200 And they've also made it much easier for companies 603 00:21:38,200 --> 00:21:42,160 to apply their own security policies within Gitpod Flex. 604 00:21:42,160 --> 00:21:44,880 So they can hook it into their existing identity management 605 00:21:44,880 --> 00:21:45,720 systems. 606 00:21:45,720 --> 00:21:48,200 They can really control who has access to what. 607 00:21:48,200 --> 00:21:49,760 And they can monitor everything. 608 00:21:49,760 --> 00:21:51,520 So they really put security front and center 609 00:21:51,520 --> 00:21:52,240 from the beginning. 610 00:21:52,240 --> 00:21:52,760 They did. 611 00:21:52,760 --> 00:21:55,460 And it just shows how Gitpod Flex is really built for this. 612 00:21:55,460 --> 00:21:57,000 It's not just about running code. 613 00:21:57,000 --> 00:21:59,880 It's about creating this space where developers 614 00:21:59,880 --> 00:22:02,360 can be productive, collaborative, and secure. 615 00:22:02,360 --> 00:22:04,240 So after this whole journey, what's 616 00:22:04,240 --> 00:22:05,680 the big takeaway here? 617 00:22:05,680 --> 00:22:08,600 What can we learn from their experience? 618 00:22:08,600 --> 00:22:10,200 Welcome back to the Deep Dive. 619 00:22:10,200 --> 00:22:12,720 We've been talking all about Gitpod's journey, 620 00:22:12,720 --> 00:22:16,680 from Kubernetes lovers to creating Gitpod Flex, 621 00:22:16,680 --> 00:22:18,040 their own custom system.
622 00:22:18,040 --> 00:22:21,040 Yeah, it shows that sometimes the most popular solution 623 00:22:21,040 --> 00:22:22,840 isn't always the right one. 624 00:22:22,840 --> 00:22:25,000 They realized Kubernetes just wasn't the right tool 625 00:22:25,000 --> 00:22:25,920 for what they needed. 626 00:22:25,920 --> 00:22:29,240 And they had the guts to go and do their own thing. 627 00:22:29,240 --> 00:22:30,440 Exactly. 628 00:22:30,440 --> 00:22:31,960 So in this final part, let's kind 629 00:22:31,960 --> 00:22:35,880 of dig into what makes Gitpod Flex tick. 630 00:22:35,880 --> 00:22:37,960 What were some of the architectural decisions 631 00:22:37,960 --> 00:22:38,480 they made? 632 00:22:38,480 --> 00:22:41,240 What are the features that really set it apart? 633 00:22:41,240 --> 00:22:43,160 So one of the first things to understand 634 00:22:43,160 --> 00:22:46,120 is that it's not a total rejection of Kubernetes. 635 00:22:46,120 --> 00:22:47,920 They kept some of the core principles. 636 00:22:47,920 --> 00:22:50,320 For example, declarative APIs are still 637 00:22:50,320 --> 00:22:52,440 a big part of Gitpod Flex. 638 00:22:52,440 --> 00:22:54,000 Remember all that YAML configuration 639 00:22:54,000 --> 00:22:55,640 we talked about in Kubernetes? 640 00:22:55,640 --> 00:22:58,440 That approach is still there, but it's a lot more streamlined, 641 00:22:58,440 --> 00:22:59,440 more focused. 642 00:22:59,440 --> 00:23:02,800 So you're still defining your infrastructure as code 643 00:23:02,800 --> 00:23:04,720 without all that Kubernetes baggage. 644 00:23:04,720 --> 00:23:05,560 Exactly. 645 00:23:05,560 --> 00:23:07,840 And they also kept the use of control theory 646 00:23:07,840 --> 00:23:09,440 for resource management. 
647 00:23:09,440 --> 00:23:11,960 Basically, this means that they're using these smart 648 00:23:11,960 --> 00:23:16,000 algorithms to automatically adjust resource allocation 649 00:23:16,000 --> 00:23:18,600 based on what's needed in real time, 650 00:23:18,600 --> 00:23:20,800 kind of like Kubernetes auto-scaling, but again, 651 00:23:20,800 --> 00:23:22,640 tailored for developer environments. 652 00:23:22,640 --> 00:23:23,360 Right. 653 00:23:23,360 --> 00:23:27,040 So even though it might sound kind of complex under the hood, 654 00:23:27,040 --> 00:23:30,320 what does it mean for developers who are actually using Gitpod 655 00:23:30,320 --> 00:23:31,040 Flex? 656 00:23:31,040 --> 00:23:33,800 Well, one big benefit is the seamless integration 657 00:23:33,800 --> 00:23:34,920 with dev containers. 658 00:23:34,920 --> 00:23:37,760 These are basically like pre-configured, self-contained 659 00:23:37,760 --> 00:23:38,760 developer environments. 660 00:23:38,760 --> 00:23:41,400 You've got all your tools, libraries, dependencies, 661 00:23:41,400 --> 00:23:44,120 all bundled together for specific projects. 662 00:23:44,120 --> 00:23:46,880 So it's like a recipe for your perfect developer environment. 663 00:23:46,880 --> 00:23:47,820 You just add code. 664 00:23:47,820 --> 00:23:48,800 Exactly. 665 00:23:48,800 --> 00:23:51,520 And Gitpod Flex makes it super easy to just spin those up. 666 00:23:51,520 --> 00:23:53,080 And remember how they were struggling 667 00:23:53,080 --> 00:23:55,640 with self-hosting their platform on Kubernetes? 668 00:23:55,640 --> 00:23:56,200 Yeah. 669 00:23:56,200 --> 00:23:59,800 With Gitpod Flex, self-hosting is incredibly easy. 670 00:23:59,800 --> 00:24:01,840 You can have it up and running in under three minutes 671 00:24:01,840 --> 00:24:03,400 on pretty much any infrastructure. 672 00:24:03,400 --> 00:24:04,480 Three minutes. 673 00:24:04,480 --> 00:24:06,400 That's faster than making a cup of coffee. 
674 00:24:06,400 --> 00:24:07,360 Pretty much. 675 00:24:07,360 --> 00:24:09,960 And that opens up a lot of possibilities. 676 00:24:09,960 --> 00:24:11,960 Companies can run their developer environments 677 00:24:11,960 --> 00:24:14,880 closer to their data, even on premises, if they need to. 678 00:24:14,880 --> 00:24:17,740 Gives them more control over security, compliance, 679 00:24:17,740 --> 00:24:19,280 all that good stuff. 680 00:24:19,280 --> 00:24:21,840 So flexibility and control are key here. 681 00:24:21,840 --> 00:24:23,400 What about performance? 682 00:24:23,400 --> 00:24:25,880 They had all those struggles with Kubernetes, CPU 683 00:24:25,880 --> 00:24:29,320 throttling, storage bottlenecks, all those things. 684 00:24:29,320 --> 00:24:33,080 Did they manage to fix those with Gitpod Flex? 685 00:24:33,080 --> 00:24:35,000 That was definitely a top priority for them. 686 00:24:35,000 --> 00:24:37,440 And it seems like they've made some major progress. 687 00:24:37,440 --> 00:24:41,120 By ditching the whole shared kernel model of containers 688 00:24:41,120 --> 00:24:44,240 and giving each environment its own dedicated resources, 689 00:24:44,240 --> 00:24:47,040 they've managed to smooth out a lot of those performance issues. 690 00:24:47,040 --> 00:24:48,540 So no more fighting over resources. 691 00:24:48,540 --> 00:24:49,040 Right. 692 00:24:49,040 --> 00:24:51,840 Every environment gets its own slice of the pie. 693 00:24:51,840 --> 00:24:54,240 Now, what about that memory snapshot feature 694 00:24:54,240 --> 00:24:57,480 that they were so excited about during the micro VM phase? 695 00:24:57,480 --> 00:24:59,640 You know, the one where you could just pause and resume 696 00:24:59,640 --> 00:25:01,520 your entire environment in a snap? 697 00:25:01,520 --> 00:25:03,440 Did that make it into Gitpod Flex? 
698 00:25:03,440 --> 00:25:05,040 They haven't explicitly said, but I 699 00:25:05,040 --> 00:25:07,940 wouldn't be surprised if they found a way to make it work. 700 00:25:07,940 --> 00:25:10,360 It really aligns with their goal of making a system that's 701 00:25:10,360 --> 00:25:12,160 truly developer friendly. 702 00:25:12,160 --> 00:25:13,200 Fingers crossed. 703 00:25:13,200 --> 00:25:14,800 OK, let's talk about security. 704 00:25:14,800 --> 00:25:16,440 We know that they put a ton of effort 705 00:25:16,440 --> 00:25:19,040 into securing their Kubernetes setup, 706 00:25:19,040 --> 00:25:20,660 but it felt like they were constantly 707 00:25:20,660 --> 00:25:22,520 fighting an uphill battle. 708 00:25:22,520 --> 00:25:25,520 What's the security story with Gitpod Flex? 709 00:25:25,520 --> 00:25:28,880 Well, security is a core part of Gitpod Flex. 710 00:25:28,880 --> 00:25:33,400 They decided to go all in on a zero trust architecture, which 711 00:25:33,400 --> 00:25:36,080 means that nothing is automatically trusted. 712 00:25:36,080 --> 00:25:38,780 Every user, every device, every request 713 00:25:38,780 --> 00:25:41,360 has to be authenticated and authorized 714 00:25:41,360 --> 00:25:42,440 every step of the way. 715 00:25:42,440 --> 00:25:43,840 So it's like Fort Knox for your code. 716 00:25:43,840 --> 00:25:44,640 Exactly. 717 00:25:44,640 --> 00:25:46,240 And this approach kind of eliminates 718 00:25:46,240 --> 00:25:48,440 a lot of those vulnerabilities that they were always 719 00:25:48,440 --> 00:25:49,960 struggling with in Kubernetes. 720 00:25:49,960 --> 00:25:53,840 No more complex user namespaces or containers breaking out 721 00:25:53,840 --> 00:25:55,000 of their isolation. 722 00:25:55,000 --> 00:25:57,600 So more secure and easier to manage. 723 00:25:57,600 --> 00:25:59,160 It sounds almost too good to be true. 
724 00:25:59,160 --> 00:26:00,840 Well, it shows what's possible when 725 00:26:00,840 --> 00:26:03,480 you build a system that's designed for these requirements 726 00:26:03,480 --> 00:26:05,200 from the ground up. 727 00:26:05,200 --> 00:26:07,360 They've also made it a lot easier for companies 728 00:26:07,360 --> 00:26:11,800 to integrate their own security policies into Gitpod Flex, 729 00:26:11,800 --> 00:26:14,000 connecting it with their existing identity management 730 00:26:14,000 --> 00:26:17,160 systems, setting fine grained access controls, 731 00:26:17,160 --> 00:26:19,280 monitoring everything in real time. 732 00:26:19,280 --> 00:26:20,800 So they're giving companies the tools 733 00:26:20,800 --> 00:26:22,960 they need to make sure that everything's locked down. 734 00:26:22,960 --> 00:26:23,880 Exactly. 735 00:26:23,880 --> 00:26:27,400 And this really highlights what Gitpod Flex is all about. 736 00:26:27,400 --> 00:26:29,120 It's not just a platform to run code. 737 00:26:29,120 --> 00:26:32,000 It's an environment that's built to support developers. 738 00:26:32,000 --> 00:26:33,680 A place where they can be productive, 739 00:26:33,680 --> 00:26:37,760 they can be collaborative, and most importantly, secure. 740 00:26:37,760 --> 00:26:40,880 So after this whole journey, what's the big takeaway? 741 00:26:40,880 --> 00:26:42,920 What can we learn from their experience? 742 00:26:42,920 --> 00:26:44,640 I think it's a reminder that sometimes you 743 00:26:44,640 --> 00:26:45,960 have to go against the grain. 744 00:26:45,960 --> 00:26:49,400 The most popular solution isn't always the best, right? 745 00:26:49,400 --> 00:26:52,600 It's about understanding what you need, what your goals are, 746 00:26:52,600 --> 00:26:54,720 and then finding the tools that fit, 747 00:26:54,720 --> 00:26:56,800 even if it means building something yourself. 
748 00:26:56,800 --> 00:26:59,240 It's a story about challenging assumptions 749 00:26:59,240 --> 00:27:02,040 and being willing to experiment and having the courage 750 00:27:02,040 --> 00:27:05,200 to try something new when the old way just isn't working. 751 00:27:05,200 --> 00:27:05,840 It really is. 752 00:27:05,840 --> 00:27:09,840 And it makes you wonder, in our own work, 753 00:27:09,840 --> 00:27:13,160 are we forcing tools into roles they weren't meant for? 754 00:27:13,160 --> 00:27:14,580 Are there other systems out there 755 00:27:14,580 --> 00:27:17,000 that could benefit from a similar rethink, 756 00:27:17,000 --> 00:27:18,560 like what Gitpod did? 757 00:27:18,560 --> 00:27:20,920 That's a great question for all of us to think about. 758 00:27:20,920 --> 00:27:23,220 This has been a really interesting deep dive exploring 759 00:27:23,220 --> 00:27:25,720 developer environments and how Gitpod 760 00:27:25,720 --> 00:27:28,400 built this innovative solution. 761 00:27:28,400 --> 00:27:31,240 In this world of technology that's always changing, 762 00:27:31,240 --> 00:27:32,960 being willing to adapt, to experiment, 763 00:27:32,960 --> 00:27:34,520 to break away from the norm, well, 764 00:27:34,520 --> 00:27:36,600 that can lead to some amazing breakthroughs. 765 00:27:36,600 --> 00:27:37,800 It's been a great discussion. 766 00:27:37,800 --> 00:27:40,480 Thanks for joining us on the deep dive.