Today's Deep-Dive: Prometheus

0:00

Ever take a look at your car's dashboard?

0:02

You know, just to check that everything's running

0:04

how it should be.

0:05

Yeah, definitely.

0:07

Well, today we're doing kind of the same thing,

0:09

but for the digital world,

0:11

we're peeking under the hood at these systems

0:13

that make sure all our favorite websites and apps

0:15

are running smooth and staying healthy.

0:17

Makes sense.

0:18

Yeah, like, ever had a website just freeze up on you?

0:21

Or maybe an app that just crashes out of nowhere?

0:24

Happens all the time.

0:25

Well, chances are there's a monitoring system

0:28

working behind the scenes,

0:29

trying to figure out what went wrong.

0:30

And that's exactly what we're gonna explore today.

0:32

Sounds interesting.

0:33

Specifically, we're gonna do a deep dive into Prometheus.

0:36

Prometheus.

0:37

Yeah, super popular.

0:38

It's this open source tool that a lot of folks use for,

0:42

well, this kind of monitoring.

0:43

And for this deep dive, we went straight to the source.

0:46

We got all this great info directly

0:48

from the Prometheus project on GitHub

0:51

and their official website too.

0:53

So, straight from the horse's mouth.

0:55

Exactly.

0:56

But before we really get started,

0:58

I want to give a big shout out to Safe Server.

1:00

They're the ones who made this whole deep dive possible.

1:03

They provide amazing hosting for software,

1:06

and they even offer some really expert advice

1:08

on digital transformation too.

1:10

So, if you're interested, check them out.

1:12

www.safe-server.de.

1:15

I'll have to take a look.

1:16

So, today's goal is pretty simple.

1:18

We want to give you a clear, easy-to-understand

1:20

introduction to Prometheus.

1:22

What it is, how it works, and why

1:25

it's become so important for, well, all things software.

1:28

Sounds good to me.

1:29

Yeah.

1:29

We're really aiming to make this accessible to everyone,

1:32

even if you're just kind of curious about what

1:33

goes on behind the scenes in the digital world.

1:35

Right, right.

1:36

OK, so let's jump in.

1:38

Right off the bat, the Prometheus GitHub page calls it,

1:41

and I'm quoting here, a systems and service monitoring system

1:45

and a time series database.

1:50

Yeah, so what does that actually mean in plain English?

1:53

Well, a monitoring system is, well,

1:55

pretty much what it sounds like.

1:56

It's a way to keep a close eye on all those digital tools

1:59

we use, making sure they're running the way they should,

2:01

kind of like a doctor, constantly checking

2:03

a patient's vital signs.

2:05

And then there's the whole time series database part.

2:08

Think of it like a diary, but for your software.

2:10

It's constantly recording different measurements,

2:12

like how busy the software is or how fast it's responding,

2:15

and it does this at specific points in time.

2:18

So you can actually see how things change over days, hours,

2:21

even minutes.

2:22

Oh, interesting.

2:22

So it's like tracking the ups and downs.

2:24

Exactly.

2:25

And the way Prometheus organizes all that info in this diary,

2:29

it's pretty neat.

2:30

Instead of just having simple entries,

2:32

it uses something called a multi-dimensional data model.

2:34

Multi-dimensional?

2:35

Yeah, it basically means that every bit of info

2:38

Prometheus records has a specific name.

2:41

We call that the metric name.

2:42

It could be something like website visits or even

2:45

server temperature.

2:46

OK, so far so good.

2:47

But here's the cool part.

2:49

It also has these extra labels attached to it.

2:52

Those are the key value dimensions.

2:54

Think of it like this.

2:55

Instead of just saying temperature, 25 degrees,

2:59

Prometheus might record something like temperature.

3:01

And then in curly brackets, room equals living room,

3:05

sensor equals one, and then 25 degrees.

3:08

So it's adding context to the numbers.

3:10

Exactly.

3:10

Those extra labels like room and sensor,

3:13

they give you a much richer picture of what's going on.

3:16

Then you can ask some really specific questions like,

3:18

what was the temperature in the living room for the past hour?

3:22

That makes sense.
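
To make the label idea concrete, here is a minimal Python sketch of a metric name plus key/value labels, and of asking for just the living room readings. The `series` dictionary, the `select` helper, and the readings themselves are illustrative assumptions; this is not how Prometheus stores data internally.

```python
# Each time series is identified by a metric name plus a set of key/value labels.
# The readings below are made-up illustration data, not real measurements.
series = {
    ("temperature", (("room", "living_room"), ("sensor", "1"))): [(0, 24.5), (60, 25.0)],
    ("temperature", (("room", "kitchen"), ("sensor", "2"))): [(0, 21.0), (60, 21.5)],
}


def select(name, **matchers):
    """Return the series whose name matches and whose labels include every matcher."""
    result = {}
    for (metric, labels), samples in series.items():
        label_map = dict(labels)
        if metric == name and all(label_map.get(k) == v for k, v in matchers.items()):
            result[(metric, labels)] = samples
    return result


# "What was the temperature in the living room?"
living_room = select("temperature", room="living_room")
for (metric, labels), samples in living_room.items():
    label_text = ",".join(f'{k}="{v}"' for k, v in labels)
    for timestamp, value in samples:
        print(f"{metric}{{{label_text}}} {value} @ {timestamp}s")
```

The labels are what make the question answerable: without `room`, the two sensors would be indistinguishable.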

3:23

Yeah.

3:23

And to actually ask those specific questions,

3:26

Prometheus has this special language.

3:28

It's called PromQL.

3:30

PromQL?

3:31

Stands for Prometheus Query Language.

3:33

OK, another acronym.

3:35

I know, right?

3:35

Query language might sound kind of intimidating,

3:39

but it's really just a way to search and analyze

3:41

all that data in your software's diary.

3:44

So filtering out the noise?

3:46

Exactly.

3:47

Imagine you have a giant spreadsheet

3:49

with all these measurements.

3:50

PromQL is like the super-powered search bar,

3:53

letting you pull out exactly the info you need.

3:56

You can use it to create graphs, spot trends, even

3:59

set up alerts.

4:01

For example, you could use PromQL to say,

4:03

show me a graph of how many people logged

4:05

into the website in the last hour.

4:07

Or maybe something like, tell me if the server's memory usage

4:09

has been too high for more than five minutes.

4:11

That's pretty powerful.

4:12

Oh, yeah.

4:13

That's the power of PromQL.

4:15

It helps you turn all that raw data into actual useful insights.

4:19

Makes sense.
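
The two spoken examples might look roughly like the PromQL expressions in this Python sketch, which only builds URLs for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and sends nothing. The server address and the metric names `website_logins_total`, `server_memory_used_bytes`, and `server_memory_total_bytes` are assumptions made up for illustration.

```python
from urllib.parse import urlencode

# Assumed address of a locally running Prometheus server (9090 is its default port).
PROMETHEUS_URL = "http://localhost:9090"


def instant_query_url(expr):
    """Build the URL for the HTTP API instant-query endpoint."""
    return f"{PROMETHEUS_URL}/api/v1/query?{urlencode({'query': expr})}"


# Hypothetical metric names, chosen only to mirror the two spoken examples.
logins_last_hour = "sum(increase(website_logins_total[1h]))"
memory_too_high = "server_memory_used_bytes / server_memory_total_bytes > 0.9"

print(instant_query_url(logins_last_hour))
print(instant_query_url(memory_too_high))
```

The "for more than five minutes" part of the second example is not expressed in the query itself; in an alerting rule it would typically be a separate duration setting.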

4:21

So Prometheus is recording all this data,

4:23

but how does it actually collect it?

4:25

Well, the way it collects data is pretty interesting, too.

4:28

It uses what's known as an HTTP pull model.

4:31

An HTTP pull model.

4:33

Yeah, so basically, instead of your software

4:35

just sending its measurements to Prometheus automatically,

4:38

Prometheus actually reaches out and asks

4:40

for the latest readings.

4:41

And it does this at regular intervals.

4:43

So it's actively checking in.

4:45

Exactly.

4:45

Kind of like a health inspector.

4:46

They go around to different restaurants

4:48

and check on things instead of waiting for the restaurants

4:50

to report in.

4:51

That's a good analogy.

4:52

Yeah, this whole polling approach,

4:53

it gives you more control and can be way more

4:56

reliable in some cases.

4:57

Especially if the systems being monitored

5:00

have spotty internet connections.

5:03

While Prometheus has to actively go and get the data,

5:06

it's actually more robust than relying

5:08

on each individual system to constantly push

5:11

the data its way.

5:12

I see.
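
A rough Python sketch of one round of that "checking in" is shown below. The target addresses, the `fake_fetch` stand-in, and the `requests_total` metric are assumptions so the example runs without a network; a real server repeats this on a configured interval and handles labels, timeouts, and much more.

```python
def parse_metrics(text):
    """Parse simple 'name value' lines of the text format (no labels, for brevity)."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


def scrape_once(targets, fetch):
    """Ask every target for its current readings; record failures instead of crashing."""
    results = {}
    for target in targets:
        try:
            results[target] = parse_metrics(fetch(f"http://{target}/metrics"))
        except OSError:
            results[target] = None  # target unreachable on this round
    return results


def fake_fetch(url):
    """Stand-in for an HTTP GET so the sketch runs offline."""
    if "app-2" in url:
        raise ConnectionError("target is down")
    return "# HELP requests_total Requests served.\nrequests_total 42\n"


round_result = scrape_once(["app-1:8000", "app-2:8000"], fake_fetch)
print(round_result)
```

Because the collector initiates every request, a failed scrape is itself a signal: the server knows right away that `app-2` did not answer.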

5:12

So Prometheus is actively going out and asking for this info.

5:16

But what happens if you have a process that

5:19

only runs for a short time?

5:21

Like a script that just does something once a day

5:24

and then shuts down.

5:25

It wouldn't even be there for Prometheus to check in with.

5:28

Oh, that's a great point.

5:29

So for those kind of short-lived batch jobs,

5:33

as they're called, Prometheus can also handle pushing data.

5:37

And it does this through something called a gateway.

5:39

A gateway.

5:40

Yeah, so instead of waiting around to be asked,

5:43

the batch job can send its data directly to this gateway

5:46

when it finishes up its work.

5:48

Then Prometheus can come along later

5:50

and pull the data from there.

5:51

Like if you need to send a message,

5:53

but the other person's not always available,

5:55

so you just leave it at a central mailbox

5:56

for them to pick up later.

5:57

Makes sense.
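
The gateway described here is the Prometheus Pushgateway. As a hedged sketch, a batch job could prepare a push like this in Python; the address, the job name `nightly_backup`, and the metric name are assumptions, and the request is built but deliberately not sent.

```python
from urllib.request import Request

# Assumed Pushgateway address; 9091 is its default port.
PUSHGATEWAY_URL = "http://localhost:9091"


def build_push_request(job, metrics):
    """Prepare (but do not send) a request that pushes metrics for one batch job."""
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    return Request(
        url=f"{PUSHGATEWAY_URL}/metrics/job/{job}",
        data=body.encode("utf-8"),
        method="PUT",  # PUT replaces the metrics previously pushed for this job
    )


# Hypothetical nightly job reporting when it last finished successfully.
request = build_push_request(
    "nightly_backup", {"backup_last_success_timestamp_seconds": 1700000000}
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Prometheus then scrapes the gateway like any other target, which is the "central mailbox" in the analogy.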

5:58

So Prometheus is gathering all this data,

6:00

but it needs to know where to go to get it

6:02

in the first place, right?

6:03

How does it figure out what to actually monitor?

6:06

I saw something about service discovery and static configuration.

6:10

Yeah.

6:11

So service discovery is basically Prometheus

6:14

being really smart and automatically finding

6:16

the things it should be keeping an eye on.

6:18

In today's software world, things are constantly changing.

6:21

New servers popping up, old ones shutting down.

6:24

But Prometheus can actually connect

6:26

with systems that manage all these changes,

6:28

like Kubernetes or different cloud platforms,

6:32

so it automatically knows when something new pops up.

6:35

And it just starts monitoring it without you

6:37

having to lift a finger.

6:38

Wow, that's convenient.

6:39

Right.

6:39

And then there's static configuration.

6:41

That one's a bit simpler.

6:43

You basically just give Prometheus

6:44

a list, like a list of all the specific addresses

6:47

of the systems you want it to monitor.

6:49

And you put this list in its configuration file.

6:52

Both methods just make sure Prometheus

6:54

knows where to find the data it needs.
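
As a small illustration of "giving Prometheus a list", this Python sketch writes out a target list as JSON. The `targets` and `labels` field names follow the shape Prometheus uses for target groups (for example in files read by file-based discovery); the host names and the `env` label are made-up assumptions.

```python
import json


def target_groups(addresses, **labels):
    """Describe a list of scrape targets, with optional labels attached to all of them."""
    return [{"targets": list(addresses), "labels": dict(labels)}]


# Hypothetical hosts and label; replace with the systems you actually run.
groups = target_groups(["web-1:9100", "web-2:9100"], env="demo")
print(json.dumps(groups, indent=2))
```

With service discovery, the same kind of list is produced automatically by querying a platform such as Kubernetes instead of being written by hand.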

6:57

So we're collecting data.

6:58

We got this awesome language to ask all sorts of questions

7:01

about it.

7:01

But how do we actually see what's going on?

7:04

Well.

7:04

I saw something in the documentation about graphing

7:06

and dashboarding support, it even

7:08

mentioned a built-in expression browser and integration

7:11

with Grafana.

7:12

Yes.

7:12

Seeing all that data visually is super important.

7:15

It helps you understand trends and spot problems quickly.

7:18

Prometheus actually has a basic tool built right in.

7:21

It's called the Expression Browser.

7:24

You can type in your PromQL queries right there

7:26

and see the results show up as graphs or tables.

7:29

Pretty handy.

7:30

So neat.

7:30

But for something more sophisticated,

7:32

something that'll give you a really good overview,

7:34

Prometheus often works with another open source

7:36

tool called Grafana.

7:38

Grafana can hook right into Prometheus

7:40

and use it as a data source.

7:42

Then you can build these really rich customizable dashboards.

7:46

It's great for visualizing all your important monitoring data

7:49

all in one place.

7:50

That sounds way better for getting a quick overview.

7:52

Now think about it.

7:53

All this time-based data we're talking about,

7:55

it's got to take up a lot of storage, right?

7:57

Oh, for sure.

7:58

Efficient storage is critical for any system that's

8:01

dealing with this much data over time.

8:03

But Prometheus is pretty clever about it.

8:06

It stores its data in this special format

8:08

on the local disk of the server it's running on.

8:11

This format is designed specifically

8:13

for time series data, making it super efficient to store

8:16

and query the info.

8:18

And to speed things up even more,

8:19

it keeps some of the most recent data in memory.

8:22

So it's optimized for speed.

8:24

Exactly.

8:24

And another important thing is that each Prometheus server

8:27

is kind of like its own little island.

8:29

It manages its own data and doesn't really

8:32

rely on other servers.

8:33

This makes it more reliable.

8:35

Because even if one server crashes,

8:37

the others can keep on trucking.

8:39

For larger setups, you might have multiple Prometheus

8:41

instances running, each one keeping

8:43

an eye on a different part of your infrastructure.

8:45

So it's designed for redundancy, too.

8:47

That's great.

8:48

But speaking of things going wrong,

8:50

how does Prometheus actually let you know when there's a problem?

8:53

I think I read something about precise alerting

8:55

based on that PromQL language we talked about.

8:58

Ah, yeah, this is where Prometheus gets really proactive.

9:01

With PromQL, you can set up what are called alerting rules.

9:05

Alerting rules.

9:06

Yeah.

9:07

These rules are basically like instructions.

9:10

They say something like, if this specific thing happens

9:13

in our data, send out an alert.

9:15

Could be something like website response times getting too slow,

9:19

or a server running out of memory,

9:21

whatever you define as a potential problem.

9:23

And because these rules are based on PromQL,

9:26

they can be really specific.

9:28

Exactly.

9:29

They can even factor in those multidimensional labels

9:31

we talked about earlier.

9:32

So you can get super granular with your alerts.
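
The idea of "if this has been true for a while, alert" can be sketched in Python as follows. This is a simplified model of a rule with a hold duration, not the server's actual evaluation logic; the threshold, the sample values, and the function name `alert_state` are assumptions.

```python
def alert_state(samples, threshold, hold_seconds):
    """Return 'inactive', 'pending', or 'firing' for time-ordered (timestamp, value) samples.

    The alert fires only once the value has stayed above the threshold
    for at least hold_seconds without interruption.
    """
    breach_started = None
    state = "inactive"
    for timestamp, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = timestamp  # condition just became true
            held = timestamp - breach_started
            state = "firing" if held >= hold_seconds else "pending"
        else:
            breach_started = None  # condition cleared, start over
            state = "inactive"
    return state


# Made-up memory usage ratios sampled once a minute.
brief_spike = [(0, 0.5), (60, 0.95), (120, 0.6)]
sustained = [(t, 0.95) for t in range(0, 361, 60)]

print(alert_state(brief_spike, threshold=0.9, hold_seconds=300))  # inactive
print(alert_state(sustained, threshold=0.9, hold_seconds=300))    # firing
```

Requiring the condition to hold for a few minutes is what keeps a brief spike from paging anyone.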

9:35

Now when an alert is triggered, Prometheus

9:37

doesn't actually send the notification itself.

9:40

It hands it off to another tool called Alertmanager.

9:42

Alertmanager.

9:43

What's that do?

9:44

Well, Alertmanager is the one that

9:45

handles all the notifications.

9:47

It groups similar alerts together, silences them

9:49

if needed, and makes sure they get to the right people

9:52

through channels like email, Slack, or even text messages.

9:55

So it's like the messenger.

9:56

Exactly.

9:58

This whole system helps teams respond to issues super quickly

10:01

before they turn into major headaches.

10:03

That sounds incredibly valuable.

10:04

Now, for the folks out there who are actually building these software

10:07

services, how easy is it for them to make their apps

10:10

talk to Prometheus and share all these metrics?

10:13

I saw that the documentation mentioned client libraries.

10:17

Oh, yeah, the client libraries.

10:18

This is a huge advantage of using Prometheus.

10:21

It's got these libraries available for instrumenting code

10:24

in over 10 popular programming languages, Python, Java, Go,

10:30

you name it.

10:31

Instrument, what's that mean?

10:32

It basically means add little snippets of code

10:35

to your application so it can expose

10:37

all those internal metrics in a format

10:39

that Prometheus understands.

10:40

So it's like speaking the same language.

10:42

Exactly.

10:43

These libraries make it super easy for developers

10:45

to keep track of all sorts of things,

10:47

like how many requests their app is handling,

10:49

how long those requests are taking to process,

10:51

how much memory the app is using, all that good stuff.

10:53

It's like building sensors right into your software

10:55

so you can get a clear reading of its vital signs.

10:58

That's a great way to put it.
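
To show what "instrumenting" amounts to, here is a hand-rolled Python stand-in for a counter. The official client libraries provide real, more complete versions of this; the `Counter` class below, the metric name `app_requests_total`, and `handle_request` are assumptions written only for illustration.

```python
class Counter:
    """A value that only goes up, tracked separately for each label combination."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self.values = {}

    def inc(self, amount=1, **labels):
        key = tuple(sorted(labels.items()))
        self.values[key] = self.values.get(key, 0) + amount

    def render(self):
        """Produce the text a metrics endpoint would return for this counter."""
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} counter"]
        for key, value in sorted(self.values.items()):
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.name}{{{label_text}}} {value}")
        return "\n".join(lines) + "\n"


REQUESTS = Counter("app_requests_total", "Requests handled by the app.")


def handle_request(path):
    REQUESTS.inc(path=path)  # the instrumentation: one line in the request handler
    return "ok"


for path in ["/", "/login", "/"]:
    handle_request(path)
print(REQUESTS.render())
```

The application only counts; serving the rendered text over HTTP is what lets Prometheus come and collect it.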

10:59

Now, what about systems or applications

11:01

that weren't built using these fancy client libraries,

11:05

like existing third-party software, or maybe even

11:09

hardware devices?

11:10

That's where exporters come in.

11:12

Exporters.

11:13

Yeah, think of them like translators.

11:15

They bridge the gap between different systems.

11:17

There's a ton of exporters out there

11:19

that can collect metrics from all sorts of third-party stuff,

11:22

like your operating system, Docker containers, databases,

11:26

web servers, all that.

11:27

And then they present those metrics in a way

11:29

that Prometheus can understand.

11:30

So basically, you can integrate Prometheus

11:32

with a huge range of technologies

11:34

without having to actually modify those systems directly.

11:37

That's incredibly flexible.
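
The "translator" role can be sketched in a few lines of Python. The status text, the `legacyapp` prefix, and the `export` function are hypothetical; real exporters are separate programs that also serve the result over HTTP for Prometheus to scrape.

```python
# Hypothetical status output from a third-party service that knows nothing about Prometheus.
STATUS_TEXT = """\
connections: 17
uptime_seconds: 3600
version: 2.4.1
"""


def export(status_text, prefix):
    """Translate 'key: value' status lines into Prometheus-style metric lines."""
    lines = []
    for raw in status_text.splitlines():
        key, _, value = raw.partition(":")
        try:
            number = float(value)
        except ValueError:
            continue  # skip non-numeric fields such as version strings
        lines.append(f"{prefix}_{key.strip()} {number}")
    return "\n".join(lines) + "\n"


print(export(STATUS_TEXT, prefix="legacyapp"))
```

The monitored service stays untouched; only the exporter needs to understand both formats.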

11:38

It is.

11:39

So for anyone listening who might

11:41

be interested in trying Prometheus out for themselves,

11:44

what's the best way to get started?

11:45

The GitHub page mentioned it's open source

11:48

and part of the Cloud Native Computing Foundation.

11:50

They also talked about different ways to install it.

11:52

You got it.

11:53

Prometheus is 100% open source, so it's

11:55

free to use and modify to your heart's content.

11:58

That's awesome.

11:59

It is.

12:00

And it's also a graduated project

12:01

under the Cloud Native Computing Foundation.

12:04

That's a pretty big deal, actually.

12:06

It means that it's a mature, stable, and widely used

12:09

technology within the Cloud Native world.

12:11

And getting started is pretty straightforward, too.

12:14

Like you mentioned, you can just download pre-compiled versions

12:16

for different operating systems straight

12:18

from the Prometheus website.

12:20

That's usually the quickest and easiest way

12:22

to get it up and running.

12:23

Makes sense.

12:24

But if you're comfortable with containers,

12:26

there are also official Docker images available.

12:30

And if you're more technically inclined,

12:32

or maybe you want to contribute to the project itself,

12:35

you can even build it directly from the source code.

12:37

The Prometheus website, prometheus.io,

12:40

has detailed instructions for all these methods.

12:43

So there's really an option for everyone.

12:45

Exactly.

12:46

So to quickly recap our deep dive into Prometheus,

12:48

it's this amazing open source monitoring system

12:51

and time series database that helps you understand

12:54

how healthy and how well your software

12:56

and services are performing.

12:58

It uses a multi-dimensional data model,

13:00

has this powerful query language called PromQL,

13:03

and uses a pull-based approach to gather metrics.

13:06

Right.

13:07

And it offers precise alerting,

13:10

integrates with visualization tools like Grafana,

13:13

and has a whole ecosystem of client libraries

13:15

and exporters to make things easier.

13:18

It's honestly a fundamental tool

13:19

for keeping the digital world running smoothly.

13:21

Couldn't agree more.

13:23

And one final thought for you,

13:24

even if you don't work directly with software,

13:27

just think about how much you rely on digital services

13:29

every single day.

13:31

Behind the scenes, tools like Prometheus

13:33

are constantly working hard to make sure those services

13:36

are available and working the way they should.

13:38

True. Yeah.

13:39

Just understanding the basics

13:40

of how these systems are monitored,

13:42

it can really give you a new appreciation

13:44

for the complexity and the effort

13:46

that goes into keeping our connected world running.

13:49

And if you're interested in digging deeper,

13:51

I highly encourage you to visit

13:52

the official Prometheus website, prometheus.io.

13:56

It's a fantastic resource, trust me.

13:58

I'll have to check it out.

13:59

Definitely do.

14:00

Well, that was our deep dive into Prometheus.

14:02

Thanks for joining us.

14:03

It was fun.

14:04

And a big thanks once again to Safe Server

14:06

for making this whole thing possible.

14:08

If you're looking for reliable software hosting

14:10

or expert advice on all things digital transformation,

14:13

be sure to visit their website at www.safeserver.de.

14:18

They're great.

14:18

They really are.

14:19

All right, that's it for today.

14:21

See ya.

14:21

See ya.
