Today's Deep-Dive: gitpod - we-are-leaving-kubernetes
Ep. 27

Episode description

Gitpod recently published a blog post detailing their six-year journey of trying to use Kubernetes for developer environments, only to eventually conclude that it’s not the best fit for the job. While Kubernetes is excellent for running production applications, they found that developer environments present unique challenges that clash with Kubernetes’ design.

https://www.gitpod.io/blog/we-are-leaving-kubernetes

0:00

Okay, so get this. Imagine you're so deep into Kubernetes.

0:05

Like you're giving a KubeCon talk about your setup.

0:09

You've got it handling millions of users.

0:11

But then you ditch it.

0:13

That's what we're diving into today.

0:15

Why Gitpod decided Kubernetes,

0:18

specifically for their developer environments,

0:21

wasn't working.

0:22

Yeah, it's really interesting

0:24

because they're not saying Kubernetes is bad, right?

0:26

They're saying it's not the right tool

0:28

when it comes to developer environments.

0:30

Exactly, we're looking at their blog post

0:31

from October 31st, 2024.

0:34

It's this like six year saga of them trying to make it work,

0:37

hitting all these roadblocks.

0:39

They came up with some pretty interesting workarounds.

0:41

Oh yeah.

0:42

You almost feel bad for them,

0:43

but you learn a lot whether you're deep into Kubernetes

0:46

or just curious about developer tools.

0:48

Yeah, and it just shows that even teams

0:49

with tons of experience, even huge teams,

0:53

sometimes have to take a step back and look at their tools.

0:56

Right.

0:56

You've got to pick the right tool for the job.

0:58

Right.

0:59

It doesn't have to be the popular one.

1:00

Yeah, okay.

1:01

So Gitpod's main argument is running applications

1:04

in production, that's where Kubernetes shines.

1:05

Yeah.

1:06

But developer environments, that's a whole different beast.

1:10

Totally.

1:10

And the blog post breaks down why. They highlight

1:13

these four characteristics of developer environments.

1:16

The first being that they're super stateful

1:18

and interactive.

1:19

So you've got gigabytes of source code,

1:21

you've got build caches, you've got containers running.

1:24

All that is constantly changing.

1:25

It's not like a stateless app.

1:28

Your developer environment is basically an extension of you.

1:31

Yeah, it's like the difference between a pristine server room

1:34

and your desk.

1:36

Your desk has projects everywhere and coffee mugs.

1:38

Exactly.

1:39

And that mess is really valuable to developers.

1:42

So you can imagine it's a huge pain

1:45

if they lose changes or get interrupted.

1:47

And that leads us to the second characteristic,

1:50

unpredictable resource usage.

1:52

So you might be coding along, and suddenly, bam,

1:55

you need tons of CPU for compilation.

1:58

Or memory usage might spike.

2:00

Yeah, and Kubernetes isn't really

2:02

known for loving surprises.

2:03

Not really, no.

2:04

Gitpod talks about all the struggles

2:06

they had with CPU throttling.

2:08

Your terminal's lagging because your IDE is fighting

2:10

some random process for resources.

2:12

They did all kinds of stuff.

2:14

Custom controllers, messing with process priorities,

2:17

even tweaking cgroups v2.

2:19

Yeah, and for those who don't know,

2:21

cgroups v2 is how the Linux kernel organizes processes

2:26

into these hierarchical groups.

2:29

It's for controlling and monitoring

2:31

things like CPU and memory and disk I/O.

2:34

It's very fine-grained control, but it's complex.
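
To make the throttling discussion concrete, here is a minimal Python sketch of reading two cgroup v2 interface files, cpu.max and cpu.stat. The sample file contents are made up for illustration, and nothing here is taken from Gitpod's own tooling.

```python
from typing import Dict, Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Return the CPU limit in cores from a cgroup v2 cpu.max line, or None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the key/value lines of a cgroup v2 cpu.stat file."""
    stats = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats


def throttled_fraction(stats: Dict[str, int]) -> float:
    """Fraction of scheduler periods in which the group hit its quota."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Sample contents, made up for illustration. On a real host these would be
# read from the cpu.max and cpu.stat files of a group under /sys/fs/cgroup.
SAMPLE_CPU_MAX = "200000 100000"
SAMPLE_CPU_STAT = """usage_usec 9000000
user_usec 7000000
system_usec 2000000
nr_periods 400
nr_throttled 100
throttled_usec 1500000
"""

if __name__ == "__main__":
    print("limit in cores:", parse_cpu_max(SAMPLE_CPU_MAX))
    print("throttled fraction:", throttled_fraction(parse_cpu_stat(SAMPLE_CPU_STAT)))
```

A high throttled fraction is one rough signal that a workspace is hitting its CPU quota, which is the kind of thing that shows up as a lagging terminal.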

2:37

Yeah, it sounds like they went really deep.

2:39

Deep down the rabbit hole.

2:41

And remember, this is all happening

2:42

inside a single container because that's

2:44

the way Kubernetes works.

2:45

So all these processes crammed together,

2:47

it just makes resource usage a total guessing game.

2:51

Right.

2:52

OK, so then there's memory management.

2:54

Apparently, until swap space was available in Kubernetes

2:56

version 1.22, overbooking memory was a pretty big risk.

3:01

Like, you could end up killing an essential process,

3:03

can you imagine?

3:04

Developer rage.

3:05

Yeah.

3:06

I mean, this just shows that even mature technologies

3:08

like Kubernetes can have limitations, especially

3:11

for specific use cases, right?

3:13

It's really important to evaluate

3:15

whether a tool's strengths really fit what you need it for.

3:20

They must've been, I mean, can you imagine?

3:21

Pulling their hair out.

3:22

Yeah.

3:23

Yeah.

3:23

OK, so then we have storage performance.

3:26

Gitpod really hammers on about how much this matters,

3:29

not just for how fast your environment starts up,

3:32

but your whole experience inside the environment.

3:35

Yeah, because if you're waiting for files to load

3:37

or for builds to finish, it just kills your flow.

3:40

Totally.

3:41

And they tried everything.

3:42

SSD RAID 0 for speed, a little risky.

3:45

Then block storage for availability,

3:47

but they hit a wall with persistent volume claims, or PVCs.

3:52

For those who aren't deep into Kubernetes,

3:53

explain why PVCs were such a pain.

3:55

Sure.

3:56

So PVCs, it's like this abstraction layer

3:58

that lets you use storage.

3:59

You don't have to worry about the underlying hardware,

4:02

so it's flexible.

4:03

But in practice, when these PVCs would attach or detach,

4:07

it was unpredictable, and that messed with their attempts

4:10

to make workspace startups super fast.

4:12

They also ran into some reliability issues,

4:15

especially on Google Cloud.

4:16

So you're a developer, you're ready to code,

4:18

and your whole environment just crashes.

4:19

Yeah.

4:20

Not a good look.

4:21

Talk about a buzzkill.

4:22

And then there's backing up and restoring these environments.

4:25

They can get huge, right?

4:26

Right.

4:27

So moving them around became this balancing act

4:29

of I/O, network bandwidth, and CPU.

4:32

Wow.

4:33

They even had to use cgroup-based I/O limiters

4:36

to prevent one workspace from hogging all the resources

4:40

and then starving the others.
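
As a rough illustration of what a cgroup-based I/O limit looks like, the Python sketch below builds one line in the format of the cgroup v2 io.max file. The device numbers and limits are hypothetical, and this is not Gitpod's actual limiter.

```python
from typing import Optional


def io_max_line(major: int, minor: int,
                rbps: Optional[int] = None, wbps: Optional[int] = None,
                riops: Optional[int] = None, wiops: Optional[int] = None) -> str:
    """Build one line for a cgroup v2 io.max file; None means no limit ("max")."""
    limits = {"rbps": rbps, "wbps": wbps, "riops": riops, "wiops": wiops}
    fields = [f"{key}={'max' if value is None else value}" for key, value in limits.items()]
    return f"{major}:{minor} " + " ".join(fields)


if __name__ == "__main__":
    # Hypothetical block device 8:16, capped at 2 MiB/s reads and 120 write IOPS.
    print(io_max_line(8, 16, rbps=2 * 1024 * 1024, wiops=120))
```

Writing such a line into a workspace's io.max file is how the kernel would be told to cap that workspace, so that one heavy backup cannot starve its neighbors.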

4:42

It's crazy how these things that sound simple get so complex.

4:45

Totally.

4:46

Speaking of complex, another challenge?

4:48

Autoscaling and startup time.

4:49

Yeah.

4:50

They were obsessed with minimizing

4:52

that initial wait time.

4:53

Of course, yeah.

4:54

But that clashed with their desire

4:56

to use their machines as efficiently as possible.

4:59

Yeah, I mean Kubernetes by design

5:02

has this inherent lower limit on startup time, right?

5:05

Right.

5:05

Because of all the steps involved,

5:07

moving content around, spinning up containers.

5:10

So they started off thinking, let's just

5:12

run multiple workspaces on one node

5:14

to leverage shared caches.

5:17

But that didn't really work out.

5:19

Didn't quite work out, no.

5:20

So they tried some creative solutions.

5:22

They tried something they called ghost workspaces.

5:25

Ghost workspaces.

5:26

Yeah.

5:26

So these were preemptible pods that would just

5:28

sit there to hold space so they could scale in advance.

5:32

They're like phantom developers taking up space.

5:34

That's a good way to put it.

5:35

Clever, but too slow and unreliable.

5:38

Then they tried ballast pods.

5:41

So these were entire nodes filled with dummy pods

5:44

just to ensure that they had enough capacity.

5:46

Kind of like renting out an empty apartment building

5:48

just in case you might need it later.

5:50

Pretty much not efficient.

5:52

Finally, they landed on cluster autoscaler plugins,

5:55

which is a much more elegant solution.

5:57

But it took a while to get there.

5:59

They even implemented proportional autoscaling,

6:02

which basically controls the rate of scale up.

6:05

It's based on how quickly devs are starting new environments.

6:08

So if there's a sudden rush, they

6:09

can add capacity quickly without overshooting.

6:12

It's all about finding that balance

6:14

between being responsive and making

6:15

the most of your resources.
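
The idea of scaling in proportion to the start rate can be sketched in a few lines of Python. The gain, the clamps, and the function names are assumptions chosen for illustration; the blog post does not publish the actual controller.

```python
import math


def spare_slots_target(starts_per_minute: float, gain: float = 2.0,
                       min_spare: int = 2, max_spare: int = 50) -> int:
    """Spare workspace slots to keep warm, proportional to the recent start rate."""
    wanted = math.ceil(gain * starts_per_minute)
    # Clamp so a quiet period keeps a small buffer and a spike cannot overshoot.
    return max(min_spare, min(max_spare, wanted))


def nodes_to_add(starts_per_minute: float, free_slots: int, slots_per_node: int) -> int:
    """Nodes to request so free capacity reaches the proportional target."""
    shortfall = spare_slots_target(starts_per_minute) - free_slots
    return max(0, math.ceil(shortfall / slots_per_node))


if __name__ == "__main__":
    # Hypothetical numbers: 10 starts per minute, 5 free slots, 4 slots per node.
    print(nodes_to_add(starts_per_minute=10, free_slots=5, slots_per_node=4))
```

The upper clamp is what keeps a sudden rush from overshooting, and the lower clamp keeps a little headroom when things are quiet.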

6:17

My brain's hurting.

6:18

Anyone else?

6:18

OK.

6:19

Image pulls, another headache.

6:21

Workspace container images can be huge.

6:24

We're talking like 10 gigabytes or more.

6:26

And that impacts performance when you have to download and extract

6:29

that much data for every workspace.

6:31

Yeah, it's like downloading the entire Library of Congress

6:34

every time you want to read a book.

6:35

Right.

6:35

So they tried pre-pulling images with DaemonSets, which

6:38

are basically agents on every node making sure the images are ready.

6:41

Then they tried building their own custom images

6:43

to maximize layer reuse, even baking images directly

6:47

into the node disk image.

6:49

Yeah, each of those came with their own trade-offs, right?

6:51

Increased complexity, higher costs, limits

6:54

on what images devs could use.

6:55

Again, another example of how something seemingly simple

6:58

can get really complicated at scale.

7:00

Yeah, and they even built their own registry facade.

7:03

They integrated it with IPFS, the InterPlanetary File System,

7:07

that decentralized way to store and share files.

7:10

They were so proud of it.

7:11

They gave a whole KubeCon talk about it.

7:14

But in the end, the best solution

7:16

was just encouraging everyone to use similar base images,

7:19

making caching a lot more effective.
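
Why shared base images help can be shown with a small Python sketch: a node only downloads the layers it does not already have. The layer names and sizes are hypothetical.

```python
from typing import Dict, Set

GB = 10**9


def bytes_to_pull(image_layers: Dict[str, int], cached: Set[str]) -> int:
    """Sum the sizes of layers that are not already in the node's cache."""
    return sum(size for digest, size in image_layers.items() if digest not in cached)


# Two hypothetical workspace images that share the same large base layer.
IMAGE_A = {"base": 8 * GB, "tools-a": 1 * GB}
IMAGE_B = {"base": 8 * GB, "tools-b": 2 * GB}

if __name__ == "__main__":
    print("cold node:", bytes_to_pull(IMAGE_B, set()) // GB, "GB")
    print("after image A was pulled:", bytes_to_pull(IMAGE_B, set(IMAGE_A)) // GB, "GB")
```

If the two images did not share a base, the second pull would cost the full image size again.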

7:21

Sometimes the simplest answer really is the best one.

7:24

But getting there takes some effort.

7:26

OK, buckle up.

7:27

We're going into the world of networking in Kubernetes.

7:30

And this is where it gets a little technical.

7:32

This is where the conflict between what Kubernetes assumes

7:36

and what developer environments need becomes really clear.

7:40

Yeah, you've got the issue of access control.

7:43

You want each environment to be its own little island.

7:45

So walled gardens for every developer.

7:47

Exactly.

7:48

So no peeking at your neighbor's code.

7:50

And you need to control who can access what.

7:53

Kubernetes has these things called network policies.

7:56

They're for defining fine-grained rules

7:58

about what traffic can flow within the cluster.

8:01

Sounds great, but even those cause headaches for Gitpod.

8:04

Of course they did.

8:06

So what was their initial approach?

8:07

So they started using Kubernetes services and an ingress proxy.

8:12

It's to manage access to individual environment ports.

8:17

Think your IDE or services running within the workspace.

8:20

But as they scaled, this approach became unreliable.

8:24

Because more users equals more complexity equals more things

8:27

that can go wrong.

8:27

Exactly.

8:28

With thousands of environments running simultaneously,

8:31

name resolution started failing.

8:33

Sometimes, it even crashed entire workspaces.

8:36

Even established Kubernetes features

8:38

have their limits when you push them to the extreme.

8:41

It's a good reminder that scaling isn't just

8:42

about making things bigger.

8:44

No.

8:44

It's about making sure they can handle all the complexity that

8:47

comes with size.

8:49

OK, so resource constraints, another area

8:51

where Gitpod faced challenges: network bandwidth sharing.

8:55

It's like having multiple apartments sharing

8:57

the same internet connection, and everyone

8:58

wants to stream movies at the same time.

9:01

Yeah, just like CPU and memory, you've

9:03

got multiple workspaces on a node, all competing

9:05

for that same network pipe.

9:07

Some container network interfaces, or CNIs,

9:10

have features for network shaping,

9:12

but that adds even more complexity.

9:14

And then there's the question of fairness.

9:16

How do you divide up that bandwidth

9:18

so everyone gets a decent slice?
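
One textbook answer to the fairness question is max-min fair sharing. The Python sketch below is only an illustration of that general technique; the source does not say which policy Gitpod used, and the workspace names and numbers are made up.

```python
from typing import Dict


def max_min_fair_share(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Give small demands what they ask for and split the rest equally."""
    allocation: Dict[str, float] = {}
    remaining = capacity
    pending = dict(demands)
    while pending:
        share = remaining / len(pending)
        fits = [name for name, demand in pending.items() if demand <= share]
        if not fits:
            # Everyone left wants more than an equal share, so cap them at it.
            for name in pending:
                allocation[name] = share
            break
        for name in fits:
            allocation[name] = pending.pop(name)
            remaining -= allocation[name]
    return allocation


if __name__ == "__main__":
    # Hypothetical node with 10 Gbit/s and three workspaces competing for it.
    print(max_min_fair_share(10, {"ws-a": 2, "ws-b": 4, "ws-c": 10}))
```

In this example the two light workspaces get everything they ask for, and the heavy one gets what is left rather than crowding them out.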

9:20

It's a never-ending battle.

9:23

Balancing performance, security, making

9:26

the most of your resources.

9:27

And that brings us to, I think, one of the hairiest topics.

9:32

Security.

9:33

Specifically in the context of developer environments.

9:36

How do you give developers the freedom

9:37

they need without creating a security nightmare?

9:41

This is where the tension between flexibility and control

9:43

really comes in.

9:44

It gets complicated.

9:45

So they start by highlighting this naive approach.

9:48

Just give everyone root access to their containers.

9:50

Seems simple, right?

9:51

Yeah, just give everyone the keys to the kingdom.

9:53

What could go wrong?

9:54

Well, aside from being a security disaster waiting

9:56

to happen, giving users root in their containers

10:00

basically gives them root on the node itself.

10:02

That means they can potentially snoop around

10:04

in other environments that are running on the same node.

10:07

They could mess with the infrastructure.

10:09

Yeah, not good.

10:09

Not exactly what you want.

10:11

Not stable.

10:12

So they needed something more sophisticated.

10:14

Enter user namespaces.

10:17

So this is a Linux kernel feature

10:19

that lets you map user and group IDs inside containers.

10:23

So you can basically make a user feel

10:26

like they have root privileges within their environment,

10:29

but without actually giving them control over the host system.
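
The ID mapping can be illustrated with a short Python sketch that translates a UID inside a user namespace to the UID the host sees, using the three-column format of /proc/<pid>/uid_map (first ID inside, first ID outside, range length). The sample mapping is hypothetical.

```python
from typing import List, Optional, Tuple

# Each entry: (first id inside the namespace, first id on the host, length)
UidMap = List[Tuple[int, int, int]]


def parse_uid_map(text: str) -> UidMap:
    """Parse the contents of a uid_map file into a list of ranges."""
    entries = []
    for line in text.strip().splitlines():
        inside, outside, length = (int(field) for field in line.split())
        entries.append((inside, outside, length))
    return entries


def to_host_uid(container_uid: int, uid_map: UidMap) -> Optional[int]:
    """Return the host UID for a UID inside the namespace, or None if unmapped."""
    for inside, outside, length in uid_map:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Hypothetical mapping: root inside the container is UID 100000 on the host.
SAMPLE_UID_MAP = "0 100000 65536"

if __name__ == "__main__":
    uid_map = parse_uid_map(SAMPLE_UID_MAP)
    print("container root on host:", to_host_uid(0, uid_map))
```

With a mapping like this, a process that sees itself as UID 0 is just an unprivileged high-numbered user from the host's point of view.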

10:32

OK, that sounds clever, but I bet it wasn't easy to set up.

10:34

You bet it wasn't.

10:36

Kubernetes did eventually add support for user namespaces

10:39

in version 1.25, but Gitpod had already

10:42

started their own implementation with version 1.22.

10:45

And let me tell you, their solution

10:47

involves some serious technical gymnastics.

10:50

Give us the highlights.

10:51

What kind of gymnastics?

10:52

Well, for starters, they had to implement something

10:54

called file system UID shifting.

10:57

This ensures that files that are created inside the container

11:00

are mapped correctly to user IDs on the host system.

11:04

So it prevents any security bypasses.

11:07

They tried a bunch of different approaches,

11:09

like shiftfs, fuse-overlayfs, even idmapped mounts.

11:13

Each of those had their own quirks

11:14

in terms of performance and compatibility.

11:16

It sounds like they were really pushing the limits of what

11:19

Kubernetes could do, trying to fit a square peg

11:21

into a round hole.

11:22

Exactly.

11:23

And then there was a challenge of mounting

11:25

what they call a masked proc file system.

11:28

So usually when a container starts up, it mounts proc.

11:32

This gives it access to information

11:33

about the host system.

11:35

But for Gitpod's security model, proc

11:37

had to be hidden to prevent vulnerabilities.

11:40

So they had to create this custom masked proc

11:43

and then carefully move it into the right mount

11:46

namespace for each container.

11:48

And they did this using seccomp notify,

11:50

which is like a super low level way to intercept and modify

11:53

system calls.

11:54

Pretty hardcore stuff.

11:55

Wow, it's like they're doing brain surgery on Kubernetes

11:58

to make it work.

11:59

Pretty much.

12:00

But wait, there's more.

12:02

They also needed to add support for FUSE,

12:05

Filesystem in Userspace.

12:06

Yeah.

12:07

A lot of developer tools rely on that.

12:09

So this involved messing with the container's eBPF device

12:13

filter, another low level tweak.

12:15

And then there's the issue of network capabilities.

12:17

Right.

12:18

So as root, you have these powerful capabilities

12:21

like CAP_NET_ADMIN and CAP_NET_RAW.

12:23

They let you control networking.
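
Linux capabilities are stored as a bitmask, which a process can inspect through fields such as CapEff in /proc/<pid>/status. Here is a minimal Python sketch of checking for the two networking capabilities mentioned; the sample masks are made up for illustration.

```python
# Bit positions as defined in the kernel's linux/capability.h header.
CAP_NET_ADMIN = 12
CAP_NET_RAW = 13


def has_capability(cap_mask_hex: str, cap_bit: int) -> bool:
    """Check one capability bit in a hex mask like the CapEff field."""
    return bool((int(cap_mask_hex, 16) >> cap_bit) & 1)


if __name__ == "__main__":
    # Made-up masks: one with both networking capabilities set, one with none.
    for mask in ("0000000000003000", "0000000000000000"):
        print(mask,
              "NET_ADMIN:", has_capability(mask, CAP_NET_ADMIN),
              "NET_RAW:", has_capability(mask, CAP_NET_RAW))
```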

12:25

Right.

12:25

So giving those to a container would totally

12:27

break their security model.

12:29

Yeah.

12:29

So how did they get around that?

12:31

Well, they ended up creating another network namespace,

12:33

but this time inside the Kubernetes container.

12:37

Initially, they used slirp4netns.

12:39

And then they switched to veth pairs and custom nftables

12:42

rules.

12:43

It's like they were building a secure little networking

12:45

sandbox within another sandbox.

12:48

It's amazing how much work they put into making this all work.

12:51

It really is.

12:51

But all this complexity comes with a price, right?

12:54

You've got performance hits, especially

12:56

with the earlier solutions.

12:58

You've got compatibility issues with certain tools.

13:00

And then the never-ending struggle

13:02

to keep up with Kubernetes updates.

13:05

So you can see why they started looking for alternatives.

13:08

And that's where their exploration of micro VMs comes

13:10

in.

13:10

But we're going to save that for part two.

13:12

Stay tuned, folks.

13:13

Things get really interesting.

13:15

Welcome back.

13:15

If you're just tuning in, we're talking

13:17

about Gitpod's journey, how they went from Kubernetes fans

13:20

to creating their own system for developer environments.

13:24

Yeah, it got to the point where they were willing to try

13:26

anything, even something completely

13:28

different from Kubernetes.

13:29

Right.

13:30

So that's where micro VMs come in.

13:31

Now, for those of us who aren't living

13:34

in the infrastructure world, can you give us a micro VMs 101?

13:38

What are they?

13:38

And why was Gitpod so interested?

13:40

Sure.

13:41

So think of micro VMs like tiny specialized virtual machines,

13:46

right?

13:46

Stripped down to just the essentials.

13:49

They boot up super fast, small footprint,

13:52

and security is kind of baked into their design.

13:54

Gitpod was looking at technologies

13:56

like Firecracker, Cloud Hypervisor, QEMU.

14:00

So what was it about micro VMs that they were so excited about?

14:03

What problems were they hoping to solve that Kubernetes just

14:06

wasn't cutting it for?

14:07

Well, first and foremost, better resource isolation.

14:11

Unlike containers, which share the host's kernel, micro VMs,

14:15

they get their own dedicated kernel.

14:17

So that means less chance of one environment interfering

14:20

with another, more predictable performance overall.

14:23

So no more laggy terminal, because your IDE is fighting

14:28

some compiler process for CPU.

14:31

Exactly.

14:31

Another big plus, memory snapshots, near instant resume.

14:36

With something like Firecracker, you

14:38

can take a snapshot of the entire VM's memory state,

14:40

and that includes everything that's running.

14:42

You can restore it in an instant.

14:44

Wait, so you're saying you could literally

14:46

pause your whole developer environment, mid-debug

14:48

session, coffee break, whatever, and come back to it

14:50

exactly as you left it.

14:51

That's the power of micro VMs.

14:53

Imagine the productivity boost, especially

14:55

for large projects, complex projects,

14:58

where restarting everything can take forever.

15:00

Yeah, that's a feature I think a lot of developers would love.

15:02

For sure.

15:04

But I'm guessing there were some downsides, right?

15:06

Otherwise, Gitpod would have just switched over

15:07

and called it a day.

15:08

Of course, no technology is perfect.

15:10

One challenge was overhead.

15:13

Even though micro VMs are lightweight

15:15

compared to like traditional VMs,

15:17

they still add more overhead than containers.

15:19

And that impacts performance, resource utilization,

15:22

which for a platform like Gitpod is a huge deal.

15:25

Right, because they're running thousands, if not millions,

15:27

of these environments.

15:28

Exactly.

15:29

Every little bit of efficiency matters.

15:31

Another hurdle was image conversion.

15:33

Most developer tools, they come packaged as container images

15:37

using the OCI standard.

15:38

Kubernetes loves that.

15:40

But to use those images in a micro VM,

15:42

you have to convert them to a format

15:44

that the micro VM understands, that adds complexity

15:47

and slows down startup.

15:49

Right, so it's not just as simple

15:50

as swapping out Kubernetes and plugging in micro VMs.

15:52

No, it's a whole translation process.

15:55

And then there are some limitations

15:56

that are specific to micro VM technologies themselves.

16:00

For example, Firecracker, which is known for its speed

16:03

and its snapshotting.

16:04

Well, at the time, it didn't support

16:05

GPUs, which is a deal breaker if you're

16:08

working on graphics intensive applications.

16:11

OK, so even cutting edge technology

16:14

has its limitations.

16:15

What else did they run into?

16:17

Well, data movement became a much bigger problem.

16:19

With micro VMs, you're dealing with whole VM images,

16:23

including those memory snapshots, which

16:24

can be pretty large.

16:26

Moving them around, whether it's for backups or scheduling,

16:29

gets more complex and it takes more time.

16:31

And I bet storage, which was already a pain point,

16:33

became even more of a headache.

16:35

You got it.

16:36

They tried attaching EBS volumes,

16:39

that's Elastic Block Store, from AWS, to their micro VMs,

16:43

thinking that they could improve startup times

16:45

and reduce network strain by keeping the workspace

16:48

data local.

16:49

But then you run into all these performance quotas, latency

16:52

issues, and just the challenge of scaling that approach

16:55

across a huge platform.

16:57

So kind of swapping one set of problems for another.

16:59

In a way.

17:00

But the micro VM detour, it wasn't a dead end at all.

17:03

It was really a turning point in their thinking.

17:05

First, it really solidified their commitment

17:07

to things like full workspace backup

17:10

and being able to suspend and resume environments.

17:12

So that became a must have.

17:14

Exactly.

17:14

It was non-negotiable.

17:16

But maybe more importantly, this experiment

17:18

made them really consider moving away from Kubernetes.

17:23

Trying to shoehorn these micro VMs into the Kubernetes world

17:27

made them realize that there might be a better way.

17:30

A way where they weren't constantly fighting

17:32

the limitations of the platform.

17:34

So it's like, those micro VMs were the gateway drug

17:38

to their Kubernetes exodus.

17:39

I like that analogy.

17:40

It's perfect.

17:41

They got a taste of something different.

17:43

And they realized maybe they didn't need Kubernetes

17:46

after all.

17:46

OK, so after all that experimenting,

17:48

what was their final move?

17:50

Did they find the solution they were searching for?

17:52

They did.

17:53

They built their own system called Gitpod Flex.

17:55

It's designed from the ground up to be like the perfect home

17:59

for developer environments.

18:00

Taking the best of what they learned

18:02

and leaving the Kubernetes baggage behind.

18:04

All right, so this is where it gets really interesting.

18:07

Tell me more about Gitpod Flex.

18:09

What makes it so special?

18:10

Well, it's not a complete rejection of Kubernetes, right?

18:14

They kept some of the core principles.

18:15

For example, declarative APIs are still

18:18

a core part of Gitpod Flex.

18:20

Remember all those YAML files in Kubernetes?

18:23

Yeah.

18:23

Defining your infrastructure as code.

18:25

Well, that's still there.

18:27

OK.

18:27

But in a more streamlined and targeted way.

18:29

So you still get those benefits of infrastructure as code

18:33

without all the complexity.

18:34

Right.

18:35

And they also kept the use of control theory

18:38

for resource management.

18:39

This basically means they're using fancy algorithms

18:42

to automatically adjust resource allocation based on what's

18:45

happening in real time.

18:47

OK.

18:47

Kind of like Kubernetes auto scaling,

18:49

but tailored for how developer environments actually behave.
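
The combination of a declarative API and a control loop can be sketched as a reconcile step that compares desired state with observed state. This Python example is a generic illustration; the field names and actions are hypothetical and do not describe the real Gitpod Flex API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EnvironmentSpec:
    """Desired state, as a user might declare it (hypothetical fields)."""
    running: bool
    cpu_cores: int


@dataclass
class EnvironmentStatus:
    """Observed state reported by the runtime (hypothetical fields)."""
    running: bool
    cpu_cores: int


def reconcile(spec: EnvironmentSpec, status: EnvironmentStatus) -> List[str]:
    """Return the actions needed to move the observed state toward the spec."""
    actions = []
    if spec.running and not status.running:
        actions.append("start")
    if not spec.running and status.running:
        actions.append("stop")
    if spec.running and spec.cpu_cores != status.cpu_cores:
        actions.append(f"resize cpu to {spec.cpu_cores}")
    return actions


if __name__ == "__main__":
    print(reconcile(EnvironmentSpec(True, 4), EnvironmentStatus(False, 2)))
```

A controller would run a step like this repeatedly, so the system keeps converging on what was declared even as conditions change.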

18:53

Right.

18:53

So even though it sounds complex under the hood,

18:56

what does this mean for developers

18:57

who are using Gitpod Flex?

18:58

What's the experience like?

19:00

Well, one big plus is the seamless integration

19:03

with dev containers.

19:04

These are like pre-configured, self-contained developer

19:07

environments, all the tools, libraries, dependencies,

19:10

all bundled up for specific projects.

19:12

So it's like a recipe for your perfect developer environment,

19:15

just add code.

19:16

Exactly.

19:17

And Gitpod Flex makes it super easy to spin those up.

19:20

They've also really doubled down on self-hosting.

19:22

So remember, Gitpod used to offer a cloud and a self-managed

19:26

version.

19:27

And they said that the self-managed version, which

19:29

was heavily Kubernetes-based, was a real pain to support.

19:32

Right.

19:33

Well, with Gitpod Flex, self-hosting is super easy.

19:36

You can have it up and running in less than three minutes

19:38

on pretty much any infrastructure.

19:40

Three minutes?

19:42

That's faster than it takes to order a pizza.

19:44

It really is.

19:45

And that opens up a lot of possibilities.

19:47

Companies can now run their developer environments closer

19:50

to their data, even on premises if they need to.

19:53

Gives them more control over security, compliance, all

19:56

that stuff.

19:56

So flexibility and control are really key here.

20:00

But what about performance?

20:02

All those Kubernetes headaches, the CPU throttling, storage

20:05

bottlenecks, all those things.

20:07

Have they managed to get rid of those with Gitpod Flex?

20:09

And that was one of their main goals.

20:11

And from what they've said, it seems

20:13

like they made a lot of progress.

20:15

By moving away from that shared kernel model of containers

20:18

and giving each environment its own dedicated resources,

20:23

they've managed to smooth out a lot of those performance

20:25

hiccups.

20:25

So each environment gets its own slice of the pie.

20:28

Exactly.

20:29

Now what about that memory snapshot feature

20:31

that they were so keen on with micro VMs?

20:33

Did that make it into Gitpod Flex?

20:35

So they haven't specifically said,

20:37

but knowing how much they care about making developer

20:40

environments stateful friendly, I

20:43

wouldn't be surprised if they're working on it.

20:45

Fingers crossed.

20:45

Right, because it fits perfectly with their vision.

20:48

OK, let's talk about security.

20:49

We know they put a ton of effort into securing

20:52

their Kubernetes setup.

20:53

Oh, yeah.

20:54

But it always felt like they were swimming upstream.

20:56

Right.

20:57

What's the story with Gitpod Flex?

20:58

Did they manage to make it simpler but also more secure?

21:02

Well, security is kind of baked into Gitpod Flex

21:05

from the very beginning.

21:06

They went all in on a zero trust architecture.

21:08

That basically means no user, no device, no request

21:12

is automatically trusted.

21:14

Everything has to be authenticated, authorized,

21:17

every step of the way.
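
The zero trust principle, checking identity and permission on every request, can be sketched in Python with the standard library. The secret, the policy table, and the function names are assumptions for illustration only, and this is not how Gitpod Flex is actually implemented.

```python
import hashlib
import hmac
from typing import Dict, Set

SECRET = b"demo-only-secret"  # hypothetical; a real system would use managed keys
PERMISSIONS: Dict[str, Set[str]] = {"alice": {"env-1"}}  # hypothetical policy table


def sign(user: str) -> str:
    """Issue a signature that proves the caller's identity (simplified)."""
    return hmac.new(SECRET, user.encode(), hashlib.sha256).hexdigest()


def allow(user: str, signature: str, environment: str) -> bool:
    """Authenticate, then authorize, on every single request."""
    authenticated = hmac.compare_digest(sign(user), signature)
    authorized = environment in PERMISSIONS.get(user, set())
    return authenticated and authorized


if __name__ == "__main__":
    print(allow("alice", sign("alice"), "env-1"))  # valid identity, permitted environment
    print(allow("alice", sign("alice"), "env-2"))  # valid identity, no permission
```

The point of the sketch is that neither check is skipped for a request just because an earlier one from the same user succeeded.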

21:18

Fort Knox for code.

21:19

Exactly.

21:20

This approach kind of avoids a lot of the vulnerabilities

21:23

they were dealing with in Kubernetes.

21:25

Right.

21:25

No more messing around with user namespaces or containers

21:29

breaking out of their isolation.

21:30

So more secure and easier to manage.

21:33

That's the goal.

21:34

That's the dream.

21:35

Right.

21:35

And they've also made it much easier for companies

21:38

to apply their own security policies within Gitpod Flex.

21:42

So they can hook it into their existing identity management

21:44

systems.

21:45

They can really control who has access to what.

21:48

And they can monitor everything.

21:49

So they really put security front and center

21:51

from the beginning.

21:52

They did.

21:52

And it just shows how Gitpod Flex is really built for this.

21:55

It's not just about running code.

21:57

It's about creating this space where developers

21:59

can be productive, collaborative, and secure.

22:02

So after this whole journey, what's

22:04

the big takeaway here?

22:05

What can we learn from their experience?

22:08

Welcome back to the Deep Dive.

22:10

We've been talking all about Gitpod's journey,

22:12

from Kubernetes lovers to creating Gitpod Flex,

22:16

their own custom system.

22:18

Yeah, it shows that sometimes the most popular solution

22:21

isn't always the right one.

22:22

They realized Kubernetes just wasn't the right tool

22:25

for what they needed.

22:25

And they had the guts to go and do their own thing.

22:29

Exactly.

22:30

So in this final part, let's kind

22:31

of dig into what makes Gitpod Flex tick.

22:35

What were some of the architectural decisions

22:37

they made?

22:38

What are the features that really set it apart?

22:41

So one of the first things to understand

22:43

is that it's not a total rejection of Kubernetes.

22:46

They kept some of the core principles.

22:47

For example, declarative APIs are still

22:50

a big part of Gitpod Flex.

22:52

Remember all that YAML configuration

22:54

we talked about in Kubernetes?

22:55

That approach is still there, but it's a lot more streamlined,

22:58

more focused.

22:59

So you're still defining your infrastructure as code

23:02

without all that Kubernetes baggage.

23:04

Exactly.

23:05

And they also kept the use of control theory

23:07

for resource management.

23:09

Basically, this means that they're using these smart

23:11

algorithms to automatically adjust resource allocation

23:16

based on what's needed in real time,

23:18

kind of like Kubernetes auto-scaling, but again,

23:20

tailored for developer environments.

23:22

Right.

23:23

So even though it might sound kind of complex under the hood,

23:27

what does it mean for developers who are actually using Gitpod

23:30

Flex?

23:31

Well, one big benefit is the seamless integration

23:33

with dev containers.

23:34

These are basically like pre-configured, self-contained

23:37

developer environments.

23:38

You've got all your tools, libraries, dependencies,

23:41

all bundled together for specific projects.

23:44

So it's like a recipe for your perfect developer environment.

23:46

You just add code.

23:47

Exactly.

23:48

And Gitpod Flex makes it super easy to just spin those up.

23:51

And remember how they were struggling

23:53

with self-hosting their platform on Kubernetes?

23:55

Yeah.

23:56

With Gitpod Flex, self-hosting is incredibly easy.

23:59

You can have it up and running in under three minutes

24:01

on pretty much any infrastructure.

24:03

Three minutes.

24:04

That's faster than making a cup of coffee.

24:06

Pretty much.

24:07

And that opens up a lot of possibilities.

24:09

Companies can run their developer environments

24:11

closer to their data, even on premises, if they need to.

24:14

Gives them more control over security, compliance,

24:17

all that good stuff.

24:19

So flexibility and control are key here.

24:21

What about performance?

24:23

They had all those struggles with Kubernetes, CPU

24:25

throttling, storage bottlenecks, all those things.

24:29

Did they manage to fix those with Gitpod Flex?

24:33

That was definitely a top priority for them.

24:35

And it seems like they've made some major progress.

24:37

By ditching the whole shared kernel model of containers

24:41

and giving each environment its own dedicated resources,

24:44

they've managed to smooth out a lot of those performance issues.

24:47

So no more fighting over resources.

24:48

Right.

24:49

Every environment gets its own slice of the pie.

24:51

Now, what about that memory snapshot feature

24:54

that they were so excited about during the micro VM phase?

24:57

You know, the one where you could just pause and resume

24:59

your entire environment in a snap?

25:01

Did that make it into Gitpod Flex?

25:03

They haven't explicitly said, but I

25:05

wouldn't be surprised if they found a way to make it work.

25:07

It really aligns with their goal of making a system that's

25:10

truly developer friendly.

25:12

Fingers crossed.

25:13

OK, let's talk about security.

25:14

We know that they put a ton of effort

25:16

into securing their Kubernetes setup,

25:19

but it felt like they were constantly

25:20

fighting an uphill battle.

25:22

What's the security story with Gitpod Flex?

25:25

Well, security is a core part of Gitpod Flex.

25:28

They decided to go all in on a zero trust architecture, which

25:33

means that nothing is automatically trusted.

25:36

Every user, every device, every request

25:38

has to be authenticated and authorized

25:41

every step of the way.

25:42

So it's like Fort Knox for your code.

25:43

Exactly.

25:44

And this approach kind of eliminates

25:46

a lot of those vulnerabilities that they were always

25:48

struggling with in Kubernetes.

25:49

No more complex user namespaces or containers breaking out

25:53

of their isolation.

25:55

So more secure and easier to manage.

25:57

It sounds almost too good to be true.

25:59

Well, it shows what's possible when

26:00

you build a system that's designed for these requirements

26:03

from the ground up.

26:05

They've also made it a lot easier for companies

26:07

to integrate their own security policies into Gitpod Flex,

26:11

connecting it with their existing identity management

26:14

systems, setting fine-grained access controls,

26:17

monitoring everything in real time.

26:19

So they're giving companies the tools

26:20

they need to make sure that everything's locked down.

26:22

Exactly.

26:23

And this really highlights what Gitpod Flex is all about.

26:27

It's not just a platform to run code.

26:29

It's an environment that's built to support developers.

26:32

A place where they can be productive,

26:33

they can be collaborative, and most importantly, secure.

26:37

So after this whole journey, what's the big takeaway?

26:40

What can we learn from their experience?

26:42

I think it's a reminder that sometimes you

26:44

have to go against the grain.

26:45

The most popular solution isn't always the best, right?

26:49

It's about understanding what you need, what your goals are,

26:52

and then finding the tools that fit,

26:54

even if it means building something yourself.

26:56

It's a story about challenging assumptions

26:59

and being willing to experiment and having the courage

27:02

to try something new when the old way just isn't working.

27:05

It really is.

27:05

And it makes you wonder, in our own work,

27:09

are we forcing tools into roles they weren't meant for?

27:13

Are there other systems out there

27:14

that could benefit from a similar rethink,

27:17

like what Gitpod did?

27:18

That's a great question for all of us to think about.

27:20

This has been a really interesting deep dive exploring

27:23

developer environments and how Gitpod

27:25

built this innovative solution.

27:28

In this world of technology that's always changing,

27:31

being willing to adapt, to experiment,

27:32

to break away from the norm, well,

27:34

that can lead to some amazing breakthroughs.

27:36

Thanks for joining us on the deep dive.

27:36

Thanks for joining us on the deep dive.