Ever take a look at your car's dashboard?
You know, just to check that everything's running
how it should be.
Yeah, definitely.
Well, today we're doing kind of the same thing,
but for the digital world.
We're peeking under the hood at these systems
that make sure all our favorite websites and apps
are running smoothly and staying healthy.
Makes sense.
Yeah, like, ever had a website just freeze up on you?
Or maybe an app that just crashes out of nowhere?
Happens all the time.
Well, chances are there's a monitoring system
working behind the scenes,
trying to figure out what went wrong.
And that's exactly what we're gonna explore today.
Sounds interesting.
Specifically, we're gonna do a deep dive into Prometheus.
Prometheus.
Yeah, super popular.
It's this open source tool that a lot of folks use for,
well, this kind of monitoring.
And for this deep dive, we went straight to the source.
We got all this great info directly
from the Prometheus project on GitHub
and their official website too.
So, straight from the horse's mouth.
Exactly.
But before we really get started,
I want to give a big shout out to Safe Server.
They're the ones who made this whole deep dive possible.
They provide amazing hosting for software,
and they even offer some really expert advice
on digital transformation too.
So, if you're interested, check them out.
www.safe-server.de.
I'll have to take a look.
So, today's goal is pretty simple.
We want to give you a clear, easy-to-understand
introduction to Prometheus.
What it is, how it works, and why
it's become so important for, well, all things software.
Sounds good to me.
Yeah.
We're really aiming to make this accessible to everyone,
even if you're just kind of curious about what
goes on behind the scenes in the digital world.
Right, right.
OK, so let's jump in.
Right off the bat, the Prometheus GitHub page calls it,
and I'm quoting here, a systems and service monitoring system
and a time series database.
Yeah, so what does that actually mean in plain English?
Well, a monitoring system is, well,
pretty much what it sounds like.
It's a way to keep a close eye on all those digital tools
we use, making sure they're running the way they should,
kind of like a doctor, constantly checking
a patient's vital signs.
And then there's the whole time series database part.
Think of it like a diary, but for your software.
It's constantly recording different measurements,
like how busy the software is or how fast it's responding,
and it does this at specific points in time.
So you can actually see how things change over days, hours,
even minutes.
Oh, interesting.
So it's like tracking the ups and downs.
Exactly.
And the way Prometheus organizes all that info in this diary,
it's pretty neat.
Instead of just having simple entries,
it uses something called a multi-dimensional data model.
Multi-dimension?
Yeah, it basically means that every bit of info
Prometheus records has a specific name.
We call that the metric name.
It could be something like website visits or even
server temperature.
OK, so far so good.
But here's the cool part.
It also has these extra labels attached to it.
Those are the key value dimensions.
Think of it like this.
Instead of just saying temperature, 25 degrees,
Prometheus might record something like temperature,
and then in curly brackets, room equals living room,
sensor equals one, and then 25 degrees.
So it's adding context to the numbers.
Exactly.
Those extra labels like room and sensor,
they give you a much richer picture of what's going on.
Then you can ask some really specific questions like,
what was the temperature in the living room for the past hour?
That makes sense.
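
To make the label idea concrete, here is a minimal Python sketch of a metric name plus key-value labels, and of selecting samples by label. The Sample class, the select helper, and the temperature readings are illustrative assumptions, not Prometheus's own code.

```python
# Hypothetical samples; the metric name and labels are illustrative only.
from dataclasses import dataclass, field


@dataclass
class Sample:
    name: str                                    # metric name
    labels: dict = field(default_factory=dict)   # key-value dimensions
    value: float = 0.0

    def render(self) -> str:
        """Render as metric_name{key="value",...} value."""
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(self.labels.items()))
        return f"{self.name}{{{pairs}}} {self.value}"


def select(samples, name, **matchers):
    """Keep samples with this name whose labels match every given matcher."""
    return [
        s for s in samples
        if s.name == name and all(s.labels.get(k) == v for k, v in matchers.items())
    ]


samples = [
    Sample("temperature", {"room": "living_room", "sensor": "1"}, 25.0),
    Sample("temperature", {"room": "kitchen", "sensor": "2"}, 22.5),
]
living_room = select(samples, "temperature", room="living_room")
print(living_room[0].render())
```

The labels are what let a question like "living room only" be answered without inventing a separate metric name per room.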
Yeah.
And to actually ask those specific questions,
Prometheus has this special language.
It's called PromQL.
PromQL?
Stands for Prometheus Query Language.
OK, another acronym.
I know, right?
But query language might sound kind of intimidating,
but it's really just a way to search and analyze
all that data in your software's diary.
So filtering out the noise?
Exactly.
Imagine you have a giant spreadsheet
with all these measurements.
PromQL is like the super-powered search bar,
letting you pull out exactly the info you need.
You can use it to create graphs, spot trends, even
set up alerts.
For example, you could use PromQL to say,
show me a graph of how many people logged
into the website in the last hour.
Or maybe something like, tell me if the server's memory usage
has been too high for more than five minutes.
That's pretty powerful.
Oh, yeah.
That's the power of PromQL.
It helps you turn all that raw data into actual useful insights.
Makes sense.
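
As a rough illustration of those two questions, the Python sketch below holds two PromQL expressions and builds a URL for the server's instant-query HTTP endpoint. The metric names (logins_total, memory_used_bytes, memory_total_bytes) and the localhost address are assumptions; a real setup would use whatever metrics its applications expose.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed: a Prometheus server on its default port; metric names are hypothetical.
PROMETHEUS = "http://localhost:9090"

# Logins over the last hour, assuming a counter named logins_total exists.
LOGINS_LAST_HOUR = "sum(increase(logins_total[1h]))"
# Memory usage above 90%, assuming two hypothetical gauges.
MEMORY_TOO_HIGH = "memory_used_bytes / memory_total_bytes > 0.9"


def instant_query_url(expr: str, base: str = PROMETHEUS) -> str:
    """Build a URL for the HTTP API's instant-query endpoint."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"


url = instant_query_url(LOGINS_LAST_HOUR)
print(url)
```

The "for more than five minutes" part of the memory question is not expressed in the query itself; it belongs to an alerting rule built around the expression.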
So Prometheus is recording all this data,
but how does it actually collect it?
Well, the way it collects data is pretty interesting, too.
It uses what's known as an HTTP pull model.
An HTTP pull model.
Yeah, so basically, instead of your software
just sending its measurements to Prometheus automatically,
Prometheus actually reaches out and asks
for the latest readings.
And it does this at regular intervals.
So it's actively checking in.
Exactly.
Kind of like a health inspector.
They go around to different restaurants
and check on things instead of waiting for the restaurants
to report in.
That's a good analogy.
Yeah, this whole polling approach,
it gives you more control and can be way more
reliable in some cases.
Especially if the systems being monitored
have spotty internet connections.
While Prometheus has to actively go and get the data,
it's actually more robust than relying
on each individual system to constantly push
the data its way.
I see.
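
The pull idea can be sketched in a few lines of Python: the monitor walks a list of targets, asks each for its current readings, and notes which ones did not answer. This is a simplified model, not Prometheus's implementation; the target URLs and the fake_fetch stand-in are assumptions so the example runs without real servers.

```python
import urllib.request


def http_fetch(url: str) -> str:
    """Fetch a target's metrics page over HTTP (the real-world transport)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> dict:
    """Parse simple 'name value' lines, skipping blanks and # comments."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        metrics[name] = float(value)
    return metrics


def scrape_once(targets, fetch=http_fetch) -> dict:
    """Ask every target for its current readings; record failures as down."""
    results = {}
    for url in targets:
        try:
            results[url] = {"up": 1, **parse_metrics(fetch(url))}
        except OSError:
            results[url] = {"up": 0}  # the monitor notices the target is unreachable
    return results


# Stand-in for the network so the sketch runs without real servers.
def fake_fetch(url: str) -> str:
    if "offline" in url:
        raise OSError("connection refused")
    return "# HELP requests_total Requests served.\nrequests_total 42\n"


targets = ["http://app-1.example:8000/metrics", "http://offline.example:8000/metrics"]
results = scrape_once(targets, fetch=fake_fetch)
print(results)
```

A real monitor would repeat scrape_once at a fixed interval. Because the monitor initiates every request, a target that stops answering is itself a signal, which is one reason the pull approach can be easier to reason about.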
So Prometheus is actively going out and asking for this info.
But what happens if you have a process that
only runs for a short time?
Like a script that just does something once a day
and then shuts down.
It wouldn't even be there for Prometheus to check in with.
Oh, that's a great point.
So for those kind of short-lived batch jobs,
as they're called, Prometheus can also handle pushing data.
And it does this through something called a gateway.
A gateway.
Yeah, so instead of waiting around to be asked,
the batch job can send its data directly to this gateway
when it finishes up its work.
Then Prometheus can come along later
and pull the data from there.
Like if you need to send a message,
but the other person's not always available,
so you just leave it at a central mailbox
for them to pick up later.
Makes sense.
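
In the Prometheus ecosystem that mailbox is a separate component called the Pushgateway. The Python sketch below only prepares the HTTP request a finished batch job might send to it; the localhost address, the job name nightly_backup, and the metric name are assumptions, and nothing is sent over the network.

```python
import urllib.request

# Assumed: a Pushgateway on its default port; job and metric names are hypothetical.
PUSHGATEWAY = "http://localhost:9091"


def build_push_request(job: str, metrics: dict, base: str = PUSHGATEWAY):
    """Prepare (but do not send) a push of a finished batch job's metrics."""
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    return urllib.request.Request(
        f"{base}/metrics/job/{job}",
        data=body.encode("utf-8"),
        method="POST",
    )


push = build_push_request("nightly_backup", {"backup_duration_seconds": 12.5})
print(push.get_method(), push.full_url)
# To actually send it: urllib.request.urlopen(push)
```

Prometheus then scrapes the gateway like any other target, so the short-lived job never has to be running at scrape time.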
So Prometheus is gathering all this data,
but it needs to know where to go to get it
in the first place, right?
How does it figure out what to actually monitor?
I saw something about service discovery and static configuration.
Yeah.
So service discovery is basically Prometheus
being really smart and automatically finding
the things it should be keeping an eye on.
In today's software world, things are constantly changing.
New servers popping up, old ones shutting down.
But Prometheus can actually connect
with systems that manage all these changes,
like Kubernetes or different cloud platforms,
so it automatically knows when something new pops up.
And it just starts monitoring it without you
having to lift a finger.
Wow, that's convenient.
Right.
And then there's static configuration.
That one's a bit simpler.
You basically just give Prometheus
a list, like a list of all the specific addresses
of the systems you want it to monitor.
And you put this list in its configuration file.
Both methods just make sure Prometheus
knows where to find the data it needs.
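
One middle ground between a hand-written list and fully automatic discovery is file-based service discovery, which the conversation does not mention by name: a script writes a JSON file of targets, and Prometheus can be configured to watch that file. The Python sketch below generates such a file under that assumption; the inventory, addresses, and env label are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical inventory; in practice this might come from a cloud API.
inventory = [
    {"address": "10.0.0.5:9100", "env": "prod"},
    {"address": "10.0.0.6:9100", "env": "prod"},
    {"address": "10.0.1.9:9100", "env": "staging"},
]


def to_target_groups(hosts):
    """Group addresses by environment into file-based discovery entries."""
    groups = {}
    for host in hosts:
        groups.setdefault(host["env"], []).append(host["address"])
    return [
        {"targets": addresses, "labels": {"env": env}}
        for env, addresses in sorted(groups.items())
    ]


target_groups = to_target_groups(inventory)
path = Path(tempfile.mkdtemp()) / "targets.json"
path.write_text(json.dumps(target_groups, indent=2))
print(path.read_text())
```

Each entry pairs a list of addresses with labels that get attached to everything scraped from them, which ties discovery back to the multi-dimensional data model.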
So we're collecting data.
We got this awesome language to ask all sorts of questions
about it.
But how do we actually see what's going on?
Well.
And I saw something in the documentation about graphing
and dashboarding support, it even
mentioned a built-in expression browser and integration
with Grafana.
Yes.
Seeing all that data visually is super important.
It helps you understand trends and spot problems quickly.
Prometheus actually has a basic tool built right in.
It's called the Expression Browser.
You can type in your PromQL queries right there
and see the results show up as graphs or tables.
Pretty handy.
So neat.
But for something more sophisticated,
something that'll give you a really good overview,
Prometheus often works with another open source
tool called Grafana.
Grafana can hook right into Prometheus
and use it as a data source.
Then you can build these really rich customizable dashboards.
It's great for visualizing all your important monitoring data
all in one place.
That sounds way better for getting a quick overview.
Now think about it.
All this time-based data we're talking about,
it's got to take up a lot of storage, right?
Oh, for sure.
Efficient storage is critical for any system that's
dealing with this much data over time.
But Prometheus is pretty clever about it.
It stores its data in this special format
on the local disk of the server it's running on.
This format is designed specifically
for time series data, making it super efficient to store
and query the info.
And to speed things up even more,
it keeps some of the most recent data in memory.
So it's optimized for speed.
Exactly.
And another important thing is that each Prometheus server
is kind of like its own little island.
It manages its own data and doesn't really
rely on other servers.
This makes it more reliable.
Because even if one server crashes,
the others can keep on trucking.
For larger setups, you might have multiple Prometheus
instances running, each one keeping
an eye on a different part of your infrastructure.
So it's designed for redundancy, too.
That's great.
But speaking of things going wrong,
how does Prometheus actually let you know when there's a problem?
I think I read something about precise alerting
based on that PromQL language we talked about.
Ah, yeah, this is where Prometheus gets really proactive.
With PromQL, you can set up what are called alerting rules.
Alerting rules.
Yeah.
These rules are basically like instructions.
They say something like, if this specific thing happens
in our data, send out an alert.
Could be something like website response times getting too slow,
or a server running out of memory,
whatever you define as a potential problem.
And because these rules are based on PromQL,
they can be really specific.
Exactly.
They can even factor in those multidimensional labels
we talked about earlier.
So you can get super granular with your alerts.
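
The "too high for more than five minutes" idea can be modeled as a small state machine: an alert is pending while its condition is true and only fires once the condition has held long enough. The Python sketch below is a simplified model of that behavior, not Prometheus's rule engine; the AlertRule class, the threshold, and the readings are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    name: str
    threshold: float          # the condition is "value above this"
    hold_seconds: float       # how long the condition must hold before firing
    pending_since: Optional[float] = None

    def evaluate(self, value: float, now: float) -> str:
        """Return 'inactive', 'pending', or 'firing' for this evaluation."""
        if value <= self.threshold:
            self.pending_since = None   # condition cleared, reset the timer
            return "inactive"
        if self.pending_since is None:
            self.pending_since = now    # condition just became true
        if now - self.pending_since >= self.hold_seconds:
            return "firing"
        return "pending"


rule = AlertRule("HighMemoryUsage", threshold=0.9, hold_seconds=300)
# (timestamp in seconds, memory usage as a fraction)
readings = [(0, 0.95), (60, 0.96), (120, 0.85), (180, 0.93), (300, 0.97), (480, 0.98)]
states = [rule.evaluate(value, now) for now, value in readings]
print(states)
```

The brief dip at 120 seconds resets the timer, so a short spike never pages anyone; only a sustained problem reaches the firing state.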
Now when an alert is triggered, Prometheus
doesn't actually send the notification itself.
It hands it off to another tool called Alertmanager.
Alertmanager.
What's that do?
Well, Alertmanager is the one that
handles all the notifications.
It groups similar alerts together, silences them
if needed, and makes sure they get to the right people
through channels like email, Slack, or even text messages.
So it's like the messenger.
Exactly.
This whole system helps teams respond to issues super quickly
before they turn into major headaches.
That sounds incredibly valuable.
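
Grouping is the easiest of those behaviors to picture in code. The Python sketch below buckets alerts that share the same values for a chosen set of labels, so one notification can cover several instances. It is an illustrative model only; the alerts, label names, and group_alerts helper are assumptions.

```python
from collections import defaultdict

# Hypothetical firing alerts, each described by its labels.
alerts = [
    {"alertname": "HighLatency", "service": "checkout", "instance": "web-1"},
    {"alertname": "HighLatency", "service": "checkout", "instance": "web-2"},
    {"alertname": "DiskFull", "service": "database", "instance": "db-1"},
]


def group_alerts(alerts, group_by):
    """Bucket alerts that share the same values for the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


groups = group_alerts(alerts, group_by=("alertname", "service"))
for key, members in groups.items():
    instances = ", ".join(a["instance"] for a in members)
    print(f"{key}: one notification covering {instances}")
```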
Now, for the folks out there who are actually building these software
services, how easy is it for them to make their apps
talk to Prometheus and share all these metrics?
I saw that the documentation mentioned client libraries.
Oh, yeah, the client libraries.
This is a huge advantage of using Prometheus.
It's got these libraries available for instrumenting code
in over 10 popular programming languages, Python, Java, Go,
you name it.
Instrumenting? What's that mean?
It basically means adding little snippets of code
to your application so it can expose
all those internal metrics in a format
that Prometheus understands.
So it's like speaking the same language.
Exactly.
These libraries make it super easy for developers
to keep track of all sorts of things,
like how many requests their app is handling,
how long those requests are taking to process,
how much memory the app is using, all that good stuff.
It's like building sensors right into your software
so you can get a clear reading of its vital signs.
That's a great way to put it.
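
Here is roughly what such a built-in sensor looks like. The Python sketch below uses a hand-rolled Counter class as a stand-in for what an official client library provides, so it runs with the standard library alone; the class, the metric name app_requests_total, and handle_request are assumptions for illustration.

```python
import threading


class Counter:
    """Hand-rolled stand-in for a client-library counter (only goes up)."""

    def __init__(self, name: str, help_text: str):
        self.name, self.help_text = name, help_text
        self._value = 0.0
        self._lock = threading.Lock()

    def inc(self, amount: float = 1.0) -> None:
        with self._lock:
            self._value += amount

    def expose(self) -> str:
        """Render in the text format a metrics endpoint would return."""
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {self._value}\n"
        )


REQUESTS = Counter("app_requests_total", "Requests handled by the app.")


def handle_request(path: str) -> str:
    REQUESTS.inc()              # the instrumentation: one line per request
    return f"hello from {path}"


for path in ("/", "/login", "/cart"):
    handle_request(path)
print(REQUESTS.expose())
```

The application code barely changes: one extra line per event, and the text returned by expose is what gets served for Prometheus to pull.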
Now, what about systems or applications
that weren't built using these fancy client libraries,
like existing third-party software, or maybe even
hardware devices?
That's where exporters come in.
Exporters.
Yeah, think of them like translators.
They bridge the gap between different systems.
There's a ton of exporters out there
that can collect metrics from all sorts of third-party stuff,
like your operating system, Docker containers, databases,
web servers, all that.
And then they present those metrics in a way
that Prometheus can understand.
So basically, you can integrate Prometheus
with a huge range of technologies
without having to actually modify those systems directly.
That's incredibly flexible.
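
The translator role can be sketched in Python as a function that takes a third-party system's own status output and rewrites the numeric fields as metric lines. The JSON status, its field names, and the legacy_app prefix are invented for this example; real exporters are separate programs tailored to each system.

```python
import json

# Hypothetical status output from a third-party service that knows nothing
# about Prometheus; the field names are invented for this sketch.
THIRD_PARTY_STATUS = '{"connections": 17, "queue_depth": 4, "version": "2.3"}'


def export(status_json: str, prefix: str = "legacy_app") -> str:
    """Translate numeric fields into Prometheus-style 'name value' lines."""
    status = json.loads(status_json)
    lines = []
    for key, value in sorted(status.items()):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue                      # skip non-numeric fields like version
        lines.append(f"{prefix}_{key} {value}")
    return "\n".join(lines) + "\n"


print(export(THIRD_PARTY_STATUS))
```

The third-party service is never modified; the exporter sits beside it and does the translation.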
It is.
So for anyone listening who might
be interested in trying Prometheus out for themselves,
what's the best way to get started?
The GitHub page mentioned it's open source
and part of the Cloud Native Computing Foundation.
They also talked about different ways to install it.
You got it.
Prometheus is 100% open source, so it's
free to use and modify to your heart's content.
That's awesome.
It is.
And it's also a graduated project
under the Cloud Native Computing Foundation.
That's a pretty big deal, actually.
It means that it's a mature, stable, and widely used
technology within the Cloud Native world.
And getting started is pretty straightforward, too.
Like you mentioned, you can just download pre-compiled versions
for different operating systems straight
from the Prometheus website.
That's usually the quickest and easiest way
to get it up and running.
Makes sense.
But if you're comfortable with containers,
there are also official Docker images available.
And if you're more technically inclined,
or maybe you want to contribute to the project itself,
you can even build it directly from the source code.
The Prometheus website, prometheus.io,
has detailed instructions for all these methods.
So there's really an option for everyone.
Exactly.
So to quickly recap our deep dive into Prometheus,
it's this amazing open source monitoring system
and time series database that helps you understand
how healthy your software and services are
and how well they're performing.
It uses a multi-dimensional data model,
has this powerful query language called PromQL,
and uses a pull-based approach to gather metrics.
Right.
And it offers precise alerting,
integrates with visualization tools like Grafana,
and has a whole ecosystem of client libraries
and exporters to make things easier.
It's honestly a fundamental tool
for keeping the digital world running smoothly.
Couldn't agree more.
And one final thought for you,
even if you don't work directly with software,
just think about how much you rely on digital services
every single day.
Behind the scenes, tools like Prometheus
are constantly working hard to make sure those services
are available and working the way they should.
True. Yeah.
Just understanding the basics
of how these systems are monitored,
it can really give you a new appreciation
for the complexity and the effort
that goes into keeping our connected world running.
And if you're interested in digging deeper,
I highly encourage you to visit
the official Prometheus website, prometheus.io.
It's a fantastic resource, trust me.
I'll have to check it out.
Definitely do.
Well, that was our deep dive into Prometheus.
Thanks for joining us.
It was fun.
And a big thanks once again to Safe Server
for making this whole thing possible.
If you're looking for reliable software hosting
or expert advice on all things digital transformation,
be sure to visit their website at www.safeserver.de.
They're great.
They really are.
All right, that's it for today.
See ya.
See ya.