Okay, so get this. Imagine you're so deep into Kubernetes.
Like you're giving a KubeCon talk about your setup.
You've got it handling millions of users.
But then you leave it.
That's what we're diving into today.
Why Gitpod decided Kubernetes,
specifically for their developer environments,
wasn't working.
Yeah, it's really interesting
because they're not saying Kubernetes is bad, right?
They're saying it's not the right tool
when it comes to developer environments.
Exactly, we're looking at their blog post
from October 31st, 2024.
It's this like six year saga of them trying to make it work,
hitting all these roadblocks.
They came up with some pretty interesting workarounds.
Oh yeah.
You almost feel bad for them,
but you learn a lot whether you're deep into Kubernetes
or just curious about developer tools.
Yeah, and it just shows that even teams
with tons of experience, even huge teams,
sometimes have to take a step back and look at their tools.
Right.
You've got to pick the right tool for the job.
Right.
It doesn't have to be the popular one.
Yeah, okay.
So Gitpod's main argument is running applications
in production, that's where Kubernetes shines.
Yeah.
But developer environments, that's a whole different beast.
Totally.
And the blog post breaks down why.
They highlight these four characteristics of developer environments.
The first being that they're super stateful
and interactive.
So you've got gigabytes of source code,
you've got build caches, you've got containers running.
All that is constantly changing.
It's not like a stateless app.
Your developer environment is basically an extension of you.
Yeah, it's like the difference between a pristine server room
and your desk.
Your desk has projects everywhere and coffee mugs.
Exactly.
And that mess is really valuable to developers.
So you can imagine it's a huge pain
if they lose changes or get interrupted.
And that leads us to the second characteristic,
unpredictable resource usage.
So you might be coding along, and suddenly, bam,
you need tons of CPU for compilation.
Or memory usage might spike.
Yeah, and Kubernetes isn't really
known for loving surprises.
Not really, no.
Gitpod talks about all the struggles
they had with CPU throttling.
Your terminal's lagging because your IDE is fighting
some random process for resources.
They did all kinds of stuff.
Custom controllers, messing with process priorities,
even tweaking cgroups v2.
Yeah, and for those who don't know,
cgroups v2 is how the Linux kernel organizes processes
into these hierarchical groups.
It's for controlling and monitoring
things like CPU, memory, and disk I/O.
It's very fine-grained control, but it's complex.
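To make that a bit more concrete, here's a minimal sketch of the kind of knob cgroups v2 exposes, written in Go with just the standard library. It creates a cgroup, writes a CPU quota into cpu.max, and moves a process into it. The paths, names, and values are illustrative assumptions, not Gitpod's actual controller, and it needs a Linux host with cgroup v2 mounted at /sys/fs/cgroup plus root privileges.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Illustrative only: create a cgroup for one workspace and cap its CPU.
	// Assumes cgroup v2 is mounted at /sys/fs/cgroup and we run as root.
	cg := "/sys/fs/cgroup/workspace-demo"
	if err := os.MkdirAll(cg, 0o755); err != nil {
		panic(err)
	}

	// "200000 100000" means: at most 200ms of CPU time per 100ms period,
	// i.e. roughly two CPU cores' worth of compute for this group.
	if err := os.WriteFile(filepath.Join(cg, "cpu.max"), []byte("200000 100000"), 0o644); err != nil {
		panic(err)
	}

	// Move the current process into the new cgroup by writing its PID
	// into cgroup.procs. Anything it spawns (builds, compilers) inherits the cap.
	pid := fmt.Sprintf("%d", os.Getpid())
	if err := os.WriteFile(filepath.Join(cg, "cgroup.procs"), []byte(pid), 0o644); err != nil {
		panic(err)
	}

	fmt.Println("workspace cgroup created with a 2-CPU cap")
}
```

The hard part they describe wasn't writing a value like this once, it was their custom controllers adjusting these limits dynamically based on process priorities, which is where the complexity piles up.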
Yeah, it sounds like they went really deep.
Deep down the rabbit hole.
And remember, this is all happening
inside a single container because that's
the way Kubernetes works.
So all these processes crammed together,
it just makes resource usage a total guessing game.
Right.
OK, so then there's memory management.
Apparently, until swap space became available in Kubernetes
version 1.22, overbooking memory was a pretty big risk.
Like, you could end up killing essential processes, you can imagine.
Developer rage.
Yeah.
I mean, this just shows that even mature technologies
like Kubernetes can have limitations, especially
for specific use cases, right?
It's really important to evaluate
whether a tool's strengths really fit what you need it for.
They must've been, I mean, can you imagine?
Pulling their hair out.
Yeah.
Yeah.
OK, so then we have storage performance.
Gitpod really hammers on about how much this matters,
not just for how fast your environment starts up,
but your whole experience inside the environment.
Yeah, because if you're waiting for files to load
or for builds to finish, it just kills your flow.
Totally.
And they tried everything.
SSD RAID 0 for speed, a little risky.
Then block storage for availability,
but they hit a wall with persistent volume claims, or PVCs.
For those who aren't deep into Kubernetes,
explain why PVCs were such a pain.
Sure.
So PVCs, they're like this abstraction layer
that lets you request storage.
You don't have to worry about the underlying hardware,
so it's flexible.
But in practice, when these PVCs would attach or detach,
it was unpredictable, and that messed with their attempts
to make workspace startups super fast.
They also ran into some reliability issues,
especially on Google Cloud.
So you're a developer, you're ready to code,
and your whole environment just crashes.
Yeah.
Not a good look.
Talk about a buzzkill.
And then there's backing up and restoring these environments.
They can get huge, right?
Right.
So moving them around became this balancing act
of I-O, network bandwidth, and CPU.
Wow.
They even had to use cgroup-based I/O limiters
to prevent one workspace from hogging all the resources
and then starving the others.
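Same idea as the CPU cap above, but for the io controller. Here's a tiny sketch of what a cgroup-based I/O limit boils down to; the device major:minor numbers and byte values are placeholders for illustration, not Gitpod's settings.

```go
package main

import "os"

func main() {
	// Illustrative cgroup v2 I/O limit: cap one workspace's reads and writes
	// to ~100 MiB/s on the block device with major:minor 259:0.
	// Device numbers and limits are placeholders; needs root and cgroup v2.
	rule := "259:0 rbps=104857600 wbps=104857600\n"
	err := os.WriteFile("/sys/fs/cgroup/workspace-demo/io.max", []byte(rule), 0o644)
	if err != nil {
		panic(err)
	}
}
```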
It's crazy how these things that sound simple get so complex.
Totally.
Speaking of complex, another challenge?
Autoscaling and startup time.
Yeah.
They were obsessed with minimizing
that initial wait time.
Of course, yeah.
But that clashed with their desire
to use their machines as efficiently as possible.
Yeah, I mean Kubernetes by design
has this inherent lower limit on startup time, right?
Right.
Because of all the steps involved,
moving content around, spinning up containers.
So they started off thinking, let's just
run multiple workspaces on one node
to leverage shared caches.
But that didn't really work out.
Didn't quite work out, no.
So they tried some creative solutions.
They tried something they called ghost workspaces.
Ghost workspaces.
Yeah.
So these were preemptible pods that would just
sit there to hold space so they could scale in advance.
They're like phantom developers taking up space.
That's a good way to put it.
Clever, but too slow and unreliable.
Then they tried ballast pods.
So these were entire nodes filled with dummy pods
just to ensure that they had enough capacity.
Kind of like renting out an empty apartment building
just in case you might need it later.
Pretty much not efficient.
Finally, they landed on cluster autoscaler plugins,
which is a much more elegant solution.
But it took a while to get there.
They even implemented proportional autoscaling,
which basically controls the rate of scale up.
It's based on how quickly devs are starting new environments.
So if there's a sudden rush, they
can add capacity quickly without overshooting.
It's all about finding that balance
between being responsive and making
the most of your resources.
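Just to give a feel for the idea, here's a toy version of proportional scale-up logic in Go. It is not Gitpod's cluster-autoscaler plugin, just a sketch under assumed numbers: watch how many environments started recently and keep spare capacity proportional to that rate, with a cap so you don't overshoot.

```go
package main

import (
	"fmt"
	"time"
)

// desiredHeadroom returns how many spare workspace slots we want,
// proportional to the observed start rate (starts per minute).
// factor and maxHeadroom are illustrative tuning knobs.
func desiredHeadroom(startsPerMinute float64) int {
	const factor = 1.5     // keep 1.5x the recent start rate in reserve
	const maxHeadroom = 50 // never pre-provision more than 50 slots
	h := int(startsPerMinute * factor)
	if h > maxHeadroom {
		h = maxHeadroom
	}
	return h
}

func main() {
	// Pretend stream of "environments started in the last minute" samples.
	samples := []float64{2, 3, 10, 25, 40, 12, 4}

	for _, rate := range samples {
		headroom := desiredHeadroom(rate)
		// A real plugin would translate headroom into extra nodes and
		// hand that to the cluster autoscaler; here we just print it.
		fmt.Printf("start rate %.0f/min -> keep %d spare slots\n", rate, headroom)
		time.Sleep(100 * time.Millisecond)
	}
}
```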
My brain's hurting.
Anyone else?
OK.
Image pulls, another headache.
Workspace container images can be huge.
We're talking like 10 gigabytes or more.
And that impacts performance when you have to download and extract
that much data for every workspace.
Yeah, it's like downloading the entire Library of Congress
every time you want to read a book.
Right.
So they tried pre-pulling images with DaemonSets, which
are basically agents on every node making sure the images are ready.
Then they tried building their own custom images
to maximize layer reuse, even baking images directly
into the node disk image.
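The pre-pull agent itself doesn't have to be fancy. Here's a minimal sketch of what such an agent could do, just shelling out to docker pull for a list of hot images. The image names are placeholders, and this ignores everything the real approach has to handle, like disk pressure, registry auth, and choosing which images are actually hot.

```go
package main

import (
	"log"
	"os/exec"
)

func main() {
	// Hypothetical list of workspace base images to keep warm on this node.
	images := []string{
		"ubuntu:22.04",
		"example.com/workspace-base:latest", // placeholder registry/image
	}

	for _, img := range images {
		log.Printf("pre-pulling %s", img)
		cmd := exec.Command("docker", "pull", img)
		if out, err := cmd.CombinedOutput(); err != nil {
			// Keep going; a failed pull just means a slower first start.
			log.Printf("pull of %s failed: %v\n%s", img, err, out)
		}
	}
}
```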
Yeah, each of those came with their own trade-offs, right?
Increased complexity, higher costs, limits
on what images devs could use.
Again, another example of how something seemingly simple
can get really complicated at scale.
Yeah, and they even built their own registry facade.
They integrated it with IPFS, the Interplanetary File System,
that decentralized way to store and share files.
They were so proud of it.
They gave a whole KubeCon talk about it.
But in the end, the best solution
was just encouraging everyone to use similar base images,
making caching a lot more effective.
Sometimes the simplest answer really is the best one.
But getting there takes some effort.
OK, buckle up.
We're going into the world of networking in Kubernetes.
And this is where it gets a little technical.
This is where the conflict between what Kubernetes assumes
and what developer environments need becomes really clear.
Yeah, you've got the issue of access control.
You want each environment to be its own little island.
So walled gardens for every developer.
Exactly.
So no peeking at your neighbor's code.
And you need to control who can access what.
Kubernetes has these things called network policies.
They're for defining fine-grained rules
about what traffic can flow within the cluster.
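As a concrete picture of what "fine-grained rules" means here, this is a small Go sketch that builds a NetworkPolicy allowing workspace pods to receive traffic only from a proxy, and prints it as YAML. The labels, names, and namespace are invented for illustration; it shows the shape of the Kubernetes API object, not Gitpod's actual policy, and it assumes the k8s.io/api, k8s.io/apimachinery, and sigs.k8s.io/yaml modules are available.

```go
package main

import (
	"fmt"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Hypothetical policy: workspace pods only accept ingress from the proxy.
	policy := networkingv1.NetworkPolicy{
		TypeMeta:   metav1.TypeMeta{APIVersion: "networking.k8s.io/v1", Kind: "NetworkPolicy"},
		ObjectMeta: metav1.ObjectMeta{Name: "isolate-workspaces", Namespace: "workspaces"},
		Spec: networkingv1.NetworkPolicySpec{
			// Applies to every pod labelled as a workspace.
			PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"component": "workspace"}},
			PolicyTypes: []networkingv1.PolicyType{networkingv1.PolicyTypeIngress},
			Ingress: []networkingv1.NetworkPolicyIngressRule{{
				From: []networkingv1.NetworkPolicyPeer{{
					// Only the ingress proxy may talk to workspace pods.
					PodSelector: &metav1.LabelSelector{MatchLabels: map[string]string{"component": "ws-proxy"}},
				}},
			}},
		},
	}

	out, err := yaml.Marshal(policy)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```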
Sounds great, but even those cause headaches for Gitpod.
Of course they did.
So what was their initial approach?
So they started using Kubernetes services and an ingress proxy.
It's to manage access to individual environment ports.
Think your IDE or services running within the workspace.
But as they scaled, this approach became unreliable.
Because more users equals more complexity equals more things
that can go wrong.
Exactly.
With thousands of environments running simultaneously,
name resolution started failing.
Sometimes, it even crashed entire workspaces.
Even established Kubernetes features
have their limits when you push them to the extreme.
It's a good reminder that scaling isn't just
about making things bigger.
No.
It's about making sure they can handle all the complexity that
comes with size.
OK, so resource constraints, another area
where Gitpod faced challenges: network bandwidth sharing.
It's like having multiple apartments sharing
the same internet connection, and everyone
wants to stream movies at the same time.
Yeah, just like CPU and memory, you've
got multiple workspaces on a node, all competing
for that same network pipe.
Some container network interfaces, or CNIs,
have features for network shaping,
but that adds even more complexity.
And then there's the question of fairness.
How do you divide up that bandwidth
so everyone gets a decent slice?
It's a never-ending battle.
Balancing performance, security, making
the most of your resources.
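To make "network shaping" less abstract, here's the kind of thing a CNI plugin or node agent might do under the hood, sketched as a Go wrapper around the tc command: cap egress on an interface with a token bucket filter. The interface name and the rate, burst, and latency numbers are placeholders, and real CNIs apply this per pod with far more care.

```go
package main

import (
	"log"
	"os/exec"
)

func main() {
	// Illustrative only: cap egress on one interface to 100 Mbit/s using a
	// token bucket filter (tbf). Interface and numbers are placeholders;
	// needs root and the iproute2 tools installed.
	args := []string{
		"qdisc", "add", "dev", "eth0", "root",
		"tbf", "rate", "100mbit", "burst", "64kb", "latency", "400ms",
	}
	if out, err := exec.Command("tc", args...).CombinedOutput(); err != nil {
		log.Fatalf("tc failed: %v\n%s", err, out)
	}
	log.Println("egress on eth0 capped at 100mbit")
}
```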
And that brings us to, I think, one of the hairiest topics.
Security.
Specifically in the context of developer environments.
How do you give developers the freedom
they need without creating a security nightmare?
This is where the tension between flexibility and control
really comes in.
It gets complicated.
So they start by highlighting this naive approach.
Just give everyone root access to their containers.
Seems simple, right?
Yeah, just give everyone the keys to the kingdom.
What could go wrong?
Well, aside from being a security disaster waiting
to happen, giving users root in their containers
basically gives them root on the node itself.
That means they can potentially snoop around
in other environments that are running on the same node.
They could mess with the infrastructure.
Yeah, not good.
Not exactly what you want.
Not stable.
So they needed something more sophisticated.
Enter user namespaces.
So this is a Linux kernel feature
that lets you map user and group IDs inside containers.
So you can basically make a user feel
like they have root privileges within their environment,
but without actually giving them control over the host system.
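Here's a tiny Go illustration of that mapping trick, not Gitpod's implementation: it launches a shell in a new user namespace where your unprivileged host UID shows up as UID 0 inside, which is exactly the "feels like root, isn't root on the host" idea. Linux only.

```go
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Start a shell in a fresh user namespace (Linux only).
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUSER,
		// Map UID/GID 0 inside the namespace to our unprivileged IDs outside.
		UidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getuid(), Size: 1}},
		GidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getgid(), Size: 1}},
	}
	if err := cmd.Run(); err != nil {
		panic(err)
	}
	// Inside the shell, `id` reports uid=0, but on the host you're still you.
}
```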
OK, that sounds clever, but I bet it wasn't easy to set up.
You bet it wasn't.
Kubernetes did eventually add support for user namespaces
in version 1.25, but Gitpod had already
started their own implementation with version 1.22.
And let me tell you, their solution
involves some serious technical gymnastics.
Give us the highlights.
What kind of gymnastics?
Well, for starters, they had to implement something
called file system UID shifting.
This ensures that files that are created inside the container
are mapped correctly to user IDs on the host system.
So it prevents any security bypasses.
They tried a bunch of different approaches,
like shiftfs, FUSE overlayfs, even idmapped mounts.
Each of those had their own quirks
in terms of performance and compatibility.
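To picture what UID shifting is solving, here's the naive, brute-force version of the idea in Go: walk a workspace directory and offset every file's owner by a fixed amount so container UIDs line up with host UIDs. The path and offset are made up, and the mount-time approaches mentioned above (shiftfs, FUSE overlayfs, idmapped mounts) exist precisely because chown-ing gigabytes of source like this is slow.

```go
package main

import (
	"io/fs"
	"log"
	"os"
	"path/filepath"
	"syscall"
)

func main() {
	// Naive illustration only: shift ownership of everything under a
	// workspace by a fixed offset, e.g. container UID 0 -> host UID 100000.
	// Needs root (or CAP_CHOWN) and a Linux host; the path is a placeholder.
	const offset = 100000
	root := "/workspaces/demo"

	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		st, ok := info.Sys().(*syscall.Stat_t)
		if !ok {
			return nil
		}
		// Shift both UID and GID; Lchown so symlinks aren't followed.
		return os.Lchown(path, int(st.Uid)+offset, int(st.Gid)+offset)
	})
	if err != nil {
		log.Fatal(err)
	}
}
```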
It sounds like they were really pushing the limits of what
Kubernetes could do, trying to fit a square peg
into a round hole.
Exactly.
And then there was a challenge of mounting
what they call a masked proc file system.
So usually when a container starts up, it mounts proc.
This gives it access to information
about the host system.
But for Gitpod's security model, proc
had to be hidden to prevent vulnerabilities.
So they had to create this custom masked proc
and then carefully move it into the right mount
namespace for each container.
And they did this using seccomp notify,
which is like a super low level way to intercept and modify
system calls.
Pretty hardcore stuff.
Wow, it's like they're doing brain surgery on Kubernetes
to make it work.
Pretty much.
But wait, there's more.
They also needed to add support for FUSE,
the filesystem-in-userspace interface.
Yeah.
A lot of developer tools rely on that.
So this involved messing with the container's eBPF device
filter, another low level tweak.
And then there's the issue of network capabilities.
Right.
So as root, you have these powerful capabilities
like CAP_NET_ADMIN and CAP_NET_RAW.
They let you control networking.
Right.
So giving those to a container would totally
break their security model.
Yeah.
So how did they get around that?
Well, they ended up creating another network namespace,
but this time inside the Kubernetes container.
Initially, they used slirp4netns.
And then they switched to veth pairs and custom nftables
rules.
It's like they were building a secure little networking
sandbox within another sandbox.
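For a rough idea of what building that inner sandbox looks like, here's a sketch that shells out to iproute2 to create a network namespace and a veth pair bridging into it. The names and addresses are placeholders, and Gitpod's real setup did this inside the container and layered custom nftables rules on top.

```go
package main

import (
	"log"
	"os/exec"
)

func run(args ...string) {
	if out, err := exec.Command(args[0], args[1:]...).CombinedOutput(); err != nil {
		log.Fatalf("%v failed: %v\n%s", args, err, out)
	}
}

func main() {
	// Illustrative only; needs root. Creates namespace "ws0" with a veth
	// pair: veth-host stays outside, veth-ws moves inside the namespace.
	run("ip", "netns", "add", "ws0")
	run("ip", "link", "add", "veth-host", "type", "veth", "peer", "name", "veth-ws")
	run("ip", "link", "set", "veth-ws", "netns", "ws0")

	run("ip", "addr", "add", "10.100.0.1/24", "dev", "veth-host")
	run("ip", "link", "set", "veth-host", "up")

	run("ip", "netns", "exec", "ws0", "ip", "addr", "add", "10.100.0.2/24", "dev", "veth-ws")
	run("ip", "netns", "exec", "ws0", "ip", "link", "set", "veth-ws", "up")
	run("ip", "netns", "exec", "ws0", "ip", "link", "set", "lo", "up")

	log.Println("namespace ws0 wired up; nftables rules would police its traffic")
}
```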
It's amazing how much work they put into making this all work.
It really is.
But all this complexity comes with a price, right?
You've got performance hits, especially
with the earlier solutions.
You've got compatibility issues with certain tools.
And then the never-ending struggle
to keep up with Kubernetes updates.
So you can see why they started looking for alternatives.
And that's where their exploration of micro VMs comes
in.
But we're going to save that for part two.
Stay tuned, folks.
Things get really interesting.
Welcome back.
If you're just tuning in, we're talking
about Gitpod's journey, how they went from Kubernetes fans
to creating their own system for developer environments.
Yeah, it got to the point where they were willing to try
anything, even something completely
different from Kubernetes.
Right.
So that's where micro VMs come in.
Now, for those of us who aren't living
in the infrastructure world, can you give us a micro VMs 101?
What are they?
And why was Gitpod so interested?
Sure.
So think of micro VMs like tiny specialized virtual machines,
right?
Strip down to just the essentials.
They boot up super fast, small footprint,
and security is kind of baked into their design.
Gitpod was looking at technologies
like Firecracker, Cloud Hypervisor, QEMU.
So what was it about micro VMs that they were so excited about?
What problems were they hoping to solve that Kubernetes just
wasn't cutting it for?
Well, first and foremost, better resource isolation.
Unlike containers, which share the host's kernel, micro VMs,
they get their own dedicated kernel.
So that means less chance of one environment interfering
with another, more predictable performance overall.
So no more laggy terminal, because your IDE is fighting
some compiler process for CPU.
Exactly.
Another big plus, memory snapshots, near instant resume.
With something like Firecracker, you
can take a snapshot of the entire VM's memory state,
and that includes everything that's running.
You can restore it in an instant.
Wait, so you're saying you could literally
pause your whole developer environment, mid-debug
session, coffee break, whatever, and come back to it
exactly as you left it.
That's the power of micro VMs.
Imagine the productivity boost, especially
for large projects, complex projects,
where restarting everything can take forever.
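To make the snapshot idea concrete, here's roughly what that looks like against Firecracker's API socket, sketched in Go: pause the microVM, then ask it to write a full snapshot (VM state plus guest memory) to disk. The socket and file paths are placeholders, and this is based on Firecracker's documented snapshot endpoints rather than anything Gitpod published.

```go
package main

import (
	"bytes"
	"context"
	"log"
	"net"
	"net/http"
)

// client speaks HTTP over Firecracker's unix API socket.
func client(socket string) *http.Client {
	return &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			return net.Dial("unix", socket)
		},
	}}
}

func send(c *http.Client, method, path, body string) {
	req, err := http.NewRequest(method, "http://localhost"+path, bytes.NewBufferString(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Printf("%s %s -> %s", method, path, resp.Status)
}

func main() {
	c := client("/tmp/firecracker.sock") // placeholder socket path

	// Pause the microVM, then write a full snapshot: VM state + guest memory.
	send(c, "PATCH", "/vm", `{"state": "Paused"}`)
	send(c, "PUT", "/snapshot/create", `{
		"snapshot_type": "Full",
		"snapshot_path": "/snapshots/ws-demo.vmstate",
		"mem_file_path": "/snapshots/ws-demo.mem"
	}`)
}
```

Resuming later is the mirror image: boot a fresh Firecracker process and load the snapshot, which is what makes the "come back exactly where you left off" experience possible.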
Yeah, that's a feature I think a lot of developers would love.
For sure.
But I'm guessing there were some downsides, right?
Otherwise, Gitpod would have just switched over
and called it a day.
Of course, no technology is perfect.
One challenge was overhead.
Even though micro VMs are lightweight
compared to like traditional VMs,
they still add more overhead than containers.
And that impacts performance, resource utilization,
which for a platform like Gitpod is a huge deal.
Right, because they're running thousands, if not millions,
of these environments.
Exactly.
Every little bit of efficiency matters.
Another hurdle was image conversion.
Most developer tools, they come packaged as container images
using the OCI standard.
Kubernetes loves that.
But to use those images in a micro VM,
you have to convert them to a format
that the micro VM understands, that adds complexity
and slows down startup.
Right, so it's not just as simple
as swapping out Kubernetes and plugging in micro VMs.
No, it's a whole translation process.
And then there are some limitations
that are specific to micro VM technologies themselves.
For example, Firecracker, which is known for its speed
and its snapshotting.
Well, at the time, it didn't support
GPUs, which is a deal breaker if you're
working on graphics intensive applications.
OK, so even cutting edge technology
has its limitations.
What else did they run into?
Well, data movement became a much bigger problem.
With micro VMs, you're dealing with whole VM images,
including those memory snapshots, which
can be pretty large.
Moving them around, whether it's for backups or scheduling,
gets more complex and it takes more time.
And I bet storage, which was already a pain point,
became even more of a headache.
You got it.
They tried attaching EBS volumes,
that's elastic block storage, from AWS to their micro VMs,
thinking that they could improve startup times
and reduce network strain by keeping the workspace
data local.
But then you run into all these performance quotas, latency
issues, and just the challenge of scaling that approach
across a huge platform.
So kind of swapping one set of problems for another.
In a way.
But the micro VM detour, it wasn't a dead end at all.
It was really a turning point in their thinking.
First, it really solidified their commitment
to things like full workspace backup
and being able to suspend and resume environments.
So that became a must have.
Exactly.
It was non-negotiable.
But maybe more importantly, this experiment
made them really consider moving away from Kubernetes.
Trying to shoehorn these micro VMs into the Kubernetes world
made them realize that there might be a better way.
A way where they weren't constantly fighting
the limitations of the platform.
So it's like, those micro VMs were the gateway drug
to their Kubernetes exodus.
I like that analogy.
It's perfect.
They got a taste of something different.
And they realized maybe they didn't need Kubernetes
after all.
OK, so after all that experimenting,
what was their final move?
Did they find the solution they were searching for?
They did.
They built their own system called Gitpod Flex.
It's designed from the ground up to be like the perfect home
for developer environments.
Taking the best of what they learned
and leaving the Kubernetes baggage behind.
All right, so this is where it gets really interesting.
Tell me more about Gitpod Flex.
What makes it so special?
Well, it's not a complete rejection of Kubernetes, right?
They kept some of the core principles.
For example, declarative APIs are still
a core part of Gitpod Flex.
Remember all those YAML files in Kubernetes?
Yeah.
Defining your infrastructure as code.
Well, that's still there.
OK.
But in a more streamlined and targeted way.
So you still get those benefits of infrastructure as code
without all the complexity.
Right.
And they also kept the use of control theory
for resource management.
This basically means they're using fancy algorithms
to automatically adjust resource allocation based on what's
happening in real time.
OK.
Kind of like Kubernetes auto scaling,
but tailored for how developer environments actually behave.
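As a stylized picture of what "control theory for resource management" can mean, here's a tiny proportional controller in Go: watch observed CPU usage and nudge the allocation toward a target utilization each tick. The gains and numbers are invented for illustration; the real system is obviously more involved than this.

```go
package main

import "fmt"

// step nudges the CPU allocation so observed usage sits near the target
// utilization. A pure proportional controller: correction = gain * error.
func step(allocated, used, targetUtil, gain float64) float64 {
	errTerm := used/allocated - targetUtil
	next := allocated * (1 + gain*errTerm)
	if next < 0.5 { // never shrink below half a core (illustrative floor)
		next = 0.5
	}
	return next
}

func main() {
	alloc := 2.0                                // cores currently allocated
	usage := []float64{0.3, 0.5, 1.8, 3.5, 1.0} // observed cores used per tick

	for i, u := range usage {
		alloc = step(alloc, u, 0.7, 0.5)
		fmt.Printf("tick %d: used %.1f cores -> allocate %.2f cores\n", i, u, alloc)
	}
}
```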
Right.
So even though it sounds complex under the hood,
what does this mean for developers
who are using Gitpod Flex?
What's the experience like?
Well, one big plus is the seamless integration
with dev containers.
These are like pre-configured, self-contained developer
environments, all the tools, libraries, dependencies,
all bundled up for specific projects.
So it's like a recipe for your perfect developer environment,
just add code.
Exactly.
And Gitpod Flex makes it super easy to spin those up.
They've also really doubled down on self-hosting.
So remember, Gitpod used to offer a cloud and a self-managed
version.
And they said that the self-managed version, which
was heavily Kubernetes-based, was a real pain to support.
Right.
Well, with Gitpod Flex, self-hosting is super easy.
You can have it up and running in less than three minutes
on pretty much any infrastructure.
Three minutes?
That's faster than it takes to order a pizza.
It really is.
And that opens up a lot of possibilities.
Companies can now run their developer environments closer
to their data, even on premises if they need to.
Gives them more control over security, compliance, all
that stuff.
So flexibility and control are really key here.
But what about performance?
All those Kubernetes headaches, the CPU throttling, storage
bottlenecks, all those things.
Have they managed to get rid of those with Gitpod Flex?
That was one of their main goals.
And from what they've said, it seems
like they made a lot of progress.
By moving away from that shared kernel model of containers
and giving each environment its own dedicated resources,
they've managed to smooth out a lot of those performance
hiccups.
So each environment gets its own slice of the pie.
Exactly.
Now what about that memory snapshot feature
that they were so keen on with micro VMs?
Did that make it into Gitpod Flex?
So they haven't specifically said,
but knowing how much they care about making developer
environments stateful friendly, I
wouldn't be surprised if they're working on it.
Fingers crossed.
Right, because it fits perfectly with their vision.
OK, let's talk about security.
We know they put a ton of effort into securing
their Kubernetes setup.
Oh, yeah.
But it always felt like they were swimming upstream.
Right.
What's the story with Gitpod Flex?
Did they manage to make it simpler but also more secure?
Well, security is kind of baked into Gitpod Flex
from the very beginning.
They went all in on a zero trust architecture.
That basically means no user, no device, no request
is automatically trusted.
Everything has to be authenticated, authorized,
every step of the way.
Fort Knox for code.
Exactly.
This approach kind of avoids a lot of the vulnerabilities
they were dealing with in Kubernetes.
Right.
No more messing around with user namespaces or containers
breaking out of their isolation.
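Here's the flavor of that "nothing is trusted by default" rule as a toy Go HTTP middleware: every single request has to present a valid token before it reaches anything, with no network-location exceptions. It's an illustration of the principle, not Gitpod Flex's actual auth stack, and the token check is a placeholder where a real system would verify a signed identity token against an identity provider.

```go
package main

import (
	"log"
	"net/http"
	"strings"
)

// requireAuth rejects any request without a valid bearer token.
// In a real zero-trust setup this would verify a signed identity token
// against your identity provider; here we check a placeholder value.
func requireAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		if token != "demo-token" { // placeholder check only
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/environments", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("your environments\n"))
	})

	// Every route goes through the auth check; nothing is implicitly trusted.
	log.Fatal(http.ListenAndServe(":8080", requireAuth(mux)))
}
```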
So more secure and easier to manage.
That's the goal.
That's the dream.
Right.
And they've also made it much easier for companies
to apply their own security policies within Gitpod Flex.
So they can hook it into their existing identity management
systems.
They can really control who has access to what.
And they can monitor everything.
So they really put security front and center
from the beginning.
They did.
And it just shows how Gitpod Flex is really built for this.
It's not just about running code.
It's about creating this space where developers
can be productive, collaborative, and secure.
So after this whole journey, what's
the big takeaway here?
What can we learn from their experience?
Welcome back to the Deep Dive.
We've been talking all about Gitpod's journey,
from Kubernetes lovers to creating Gitpod Flex,
their own custom system.
Yeah, it shows that sometimes the most popular solution
isn't always the right one.
They realized Kubernetes just wasn't the right tool
for what they needed.
And they had the guts to go and do their own thing.
Exactly.
So in this final part, let's kind
of dig into what makes Gitpod Flex tick.
What were some of the architectural decisions
they made?
What are the features that really set it apart?
So one of the first things to understand
is that it's not a total rejection of Kubernetes.
They kept some of the core principles.
For example, declarative APIs are still
a big part of Gitpod Flex.
Remember all that YAML configuration
we talked about in Kubernetes?
That approach is still there, but it's a lot more streamlined,
more focused.
So you're still defining your infrastructure as code
without all that Kubernetes baggage.
Exactly.
And they also kept the use of control theory
for resource management.
Basically, this means that they're using these smart
algorithms to automatically adjust resource allocation
based on what's needed in real time,
kind of like Kubernetes auto-scaling, but again,
tailored for developer environments.
Right.
So even though it might sound kind of complex under the hood,
what does it mean for developers who are actually using Gitpod
Flex?
Well, one big benefit is the seamless integration
with dev containers.
These are basically like pre-configured, self-contained
developer environments.
You've got all your tools, libraries, dependencies,
all bundled together for specific projects.
So it's like a recipe for your perfect developer environment.
You just add code.
Exactly.
And Gitpod Flex makes it super easy to just spin those up.
And remember how they were struggling
with self-hosting their platform on Kubernetes?
Yeah.
With Gitpod Flex, self-hosting is incredibly easy.
You can have it up and running in under three minutes
on pretty much any infrastructure.
Three minutes.
That's faster than making a cup of coffee.
Pretty much.
And that opens up a lot of possibilities.
Companies can run their developer environments
closer to their data, even on premises, if they need to.
Gives them more control over security, compliance,
all that good stuff.
So flexibility and control are key here.
What about performance?
They had all those struggles with Kubernetes, CPU
throttling, storage bottlenecks, all those things.
Did they manage to fix those with Gitpod Flex?
That was definitely a top priority for them.
And it seems like they've made some major progress.
By ditching the whole shared kernel model of containers
and giving each environment its own dedicated resources,
they've managed to smooth out a lot of those performance issues.
So no more fighting over resources.
Right.
Every environment gets its own slice of the pie.
Now, what about that memory snapshot feature
that they were so excited about during the micro VM phase?
You know, the one where you could just pause and resume
your entire environment in a snap?
Did that make it into Gitpod Flex?
They haven't explicitly said, but I
wouldn't be surprised if they found a way to make it work.
It really aligns with their goal of making a system that's
truly developer friendly.
Fingers crossed.
OK, let's talk about security.
We know that they put a ton of effort
into securing their Kubernetes setup,
but it felt like they were constantly
fighting an uphill battle.
What's the security story with Gitpod Flex?
Well, security is a core part of Gitpod Flex.
They decided to go all in on a zero trust architecture, which
means that nothing is automatically trusted.
Every user, every device, every request
has to be authenticated and authorized
every step of the way.
So it's like Fort Knox for your code.
Exactly.
And this approach kind of eliminates
a lot of those vulnerabilities that they were always
struggling with in Kubernetes.
No more complex user namespaces or containers breaking out
of their isolation.
So more secure and easier to manage.
It sounds almost too good to be true.
Well, it shows what's possible when
you build a system that's designed for these requirements
from the ground up.
They've also made it a lot easier for companies
to integrate their own security policies into Gitpod Flex,
connecting it with their existing identity management
systems, setting fine grained access controls,
monitoring everything in real time.
So they're giving companies the tools
they need to make sure that everything's locked down.
Exactly.
And this really highlights what Gitpod Flex is all about.
It's not just a platform to run code.
It's an environment that's built to support developers.
A place where they can be productive,
they can be collaborative, and most importantly, secure.
So after this whole journey, what's the big takeaway?
What can we learn from their experience?
I think it's a reminder that sometimes you
have to go against the grain.
The most popular solution isn't always the best, right?
It's about understanding what you need, what your goals are,
and then finding the tools that fit,
even if it means building something yourself.
It's a story about challenging assumptions
and being willing to experiment and having the courage
to try something new when the old way just isn't working.
It really is.
And it makes you wonder, in our own work,
are we forcing tools into roles they weren't meant for?
Are there other systems out there
that could benefit from a similar rethink,
like what Gitpod did?
That's a great question for all of us to think about.
This has been a really interesting deep dive exploring
developer environments and how Gitpod
built this innovative solution.
In this world of technology that's always changing,
being willing to adapt, to experiment,
to break away from the norm, well,
that can lead to some amazing breakthroughs.
Thanks for joining us on the deep dive.