Today's Deep-Dive: bruin
Ep. 256


Episode description

The episode discusses Bruin, an open-source data pipeline tool that integrates data ingestion, transformation, and quality control into a single framework. It aims to simplify the complex process of managing data by combining the functionalities of tools like dbt, Airbyte, and Great Expectations. Bruin helps users ingest data from various sources, transform it using SQL or Python, and ensure data quality through built-in checks. Key features include effortless data ingestion, flexible transformations, structured data management, isolated Python environments, and built-in data quality checks. It also offers code reusability, end-to-end validation, data visualization, data comparison, shared terminology via glossaries, and secure handling of sensitive information. Bruin supports a wide range of platforms and offers an active community for support. The tool is designed to be user-friendly, with easy installation and compatibility with various data systems. Its unified approach aims to reduce complexity, boost reliability, and streamline the data pipeline process, making it suitable for both beginners and experienced users.

Gain digital sovereignty now and save costs

Let’s have a look at your digital challenges together. What tools are you currently using? Are your processes optimal? How is the state of backups and security updates?

Digital sovereignty is easily achieved with Open Source software (which usually costs way less, too). Our division Safeserver offers hosting, operation and maintenance for countless Free and Open Source tools.

Try it now for 1 Euro - 30 days free!

Download transcript (.srt)
0:00

Welcome to the Deep Dive. We take your sources, articles, research, our own notes, and

0:04

really boil them down pulling out the key

0:06

insights to help you get informed fast. Now, before we dive in today, a big

0:10

thank you to our supporter Safeserver. If you need solid hosting for your software

0:14

or a hand with your digital transformation

0:17

Safeserver is there for you. Find out more at

0:20

www.safeserver.de

0:22

Okay, so you've probably felt this pain point: moving and managing data. It often

0:28

feels just messy

0:30

Right, like this huge sprawling thing, pulling stuff from all over, trying to make

0:33

sense of it

0:33

But what if, what if there was one tool to kind of streamline all that? That's what

0:37

we're looking at today

0:38

We're doing a deep dive into Bruin

0:40

It's an open source data pipeline tool, and the promise is pretty big: bringing data

0:43

ingestion

0:43

transformation, and quality control together, all in one framework. We've been

0:47

digging through the GitHub docs, the CLI info.

0:49

There's quite a bit here. Yeah, and our mission today really is to make Bruin clear

0:53

for you

0:53

Maybe you're deep in the data world already

0:56

wrestling with these fragmented pipelines, or maybe you're just starting to ask

1:00

what even is a data pipeline. Mm-hmm

1:02

Either way, we want to give you a solid handle on what Bruin actually does,

1:05

why people are talking about it, and crucially, how it makes a potentially really

1:10

complex process

1:11

well, simpler. We'll try to break it down so, if you're new to this, you can jump

1:16

right in and get it

1:16

Okay, let's start unpacking. One of the first things that jumps out from the

1:20

sources is this really bold claim almost

1:23

catchy. They say Bruin is like if dbt, Airbyte, and Great Expectations had a love

1:28

child

1:29

Uh-huh, that paints quite a picture. For anyone who knows those tools, it suggests

1:33

something really integrated

1:34

It does and what's interesting is how that analogy immediately tells you what Bruin

1:39

is

1:39

For listeners, maybe you're familiar: dbt, that's huge for data transformation,

1:44

modeling data in your warehouse. Airbyte, that's all about data ingestion, moving

1:50

data from A to B, open source. And

1:52

Great Expectations focuses purely on data quality, making sure your data is reliable.

1:57

So calling Bruin their love child

1:59

Well, it means it's trying to bundle these three core jobs, getting the data in,

2:04

changing it and checking it into one single place

2:07

Right, which is a big shift from often having to juggle three separate tools,

2:11

Each with its own way of doing things. It really tries to tackle that problem of

2:15

fragmentation

2:15

You mentioned that juggling act is definitely a headache. So if Bruin brings these

2:19

together

2:20

What are these three pillars really? What does it actually let you do by having

2:24

them unified? Okay

2:25

So let's break it down simply. Bruin helps you fundamentally with three things in

2:29

your data's journey. First, ingest data.

2:32

That's just getting data from wherever it lives, could be an app, a database, a file,

2:36

into your system

2:37

Think of it like turning on the taps. Okay, step one, get the data. Exactly. Second,

2:42

transform data

2:43

Once it's in you need to clean it

2:45

maybe reshape it, combine it, get it ready for analysis, so it actually tells you

2:49

something useful

2:50

Bruin lets you do this using SQL, which many analysts know, or Python for more

2:56

complex stuff

2:57

That's like filtering and preparing the water. Got it, cleaning it up. Mm-hmm. And

3:00

third, and this is really important, ensure data quality.

3:04

This means building checks to make sure the data is accurate. It's complete

3:08

it's trustworthy, before you use it for making decisions. You need to know the water's

3:12

safe to drink, right? Absolutely crucial

3:15

So the big picture here is managing that whole flow: raw data in, clean, trusted

3:20

insights out all within Bruin

3:22

That consistency makes the whole process easier for you to understand and less

3:27

likely to break
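To make those three pillars concrete, here's a toy Python sketch of the ingest, transform, check flow a tool like Bruin manages for you. This is illustrative only, not Bruin's actual API; all the function names and data are made up:

```python
# Illustrative only: a miniature ingest -> transform -> check flow.
# Bruin wires stages like these together declaratively; names here are invented.

def ingest():
    # Stage 1: pull raw records from a source (hard-coded for the demo).
    return [
        {"user": "ada", "amount": "10.5"},
        {"user": "grace", "amount": "7.25"},
    ]

def transform(rows):
    # Stage 2: clean and reshape -- cast amounts to floats, uppercase names.
    return [{"user": r["user"].upper(), "amount": float(r["amount"])} for r in rows]

def check(rows):
    # Stage 3: quality gates -- fail fast if the data isn't trustworthy.
    assert all(r["amount"] >= 0 for r in rows), "negative amount"
    assert len({r["user"] for r in rows}) == len(rows), "duplicate users"
    return rows

result = check(transform(ingest()))
print(result[0]["user"])  # -> ADA
```

The point is simply that all three stages live in one place, so a failure in the quality gate stops bad data before it flows downstream.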

3:28

That makes a lot of sense putting it all together like that. But okay high level

3:31

sounds good. How does Bruin actually do this?

3:33

What are the specific features that make this possible and hopefully easier for

3:37

people? Right, the practical stuff.

3:39

And it seems designed to be packed with features, but still user friendly, which is

3:44

key

3:44

Especially if you're starting out

3:46

For you the listener knowing these features helps you see how Bruin simplifies

3:49

things. First, there's effortless data ingestion.

3:53

It uses something called ingestr, which is like a built-in tool, plus Python, to pull data

3:57

from lots of different places

3:59

The key takeaway for you: you're not really limited on where your data comes from.

4:03

Okay, so flexibility right from the start. Exactly.

4:05

Then flexible transformations: like we said, SQL and Python, on lots of platforms.

4:11

This is great because it doesn't force you into one camp if you're an analyst

4:15

loving SQL,

4:16

fine; if you're a data scientist using Python, that works too. Good to have options.

4:20

Definitely then there's structured data management

4:23

This is about how it actually builds and saves your transformed data like creating

4:28

tables and it specifically mentions

4:30

handling incremental tables really well. Incremental? Okay, what's the benefit there?

4:35

Ah, yeah, good question

4:36

It means instead of rebuilding everything from scratch every time which can be slow

4:40

and expensive

4:41

It just processes the new or changed data

4:43

So much faster much more efficient for you, especially with big data sets. Gotcha

4:49

saves time and resources
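The incremental idea can be sketched in a few lines of Python (illustrative, not Bruin's syntax): keep a watermark recording what you've already processed, and only handle rows newer than it.

```python
# Illustrative incremental load: process only rows newer than a stored watermark.

rows = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-05"},
    {"id": 3, "updated_at": "2024-01-09"},
]

watermark = "2024-01-03"  # last value processed on the previous run

new_rows = [r for r in rows if r["updated_at"] > watermark]
# A full rebuild would touch all 3 rows; this incremental run touches only 2.
print([r["id"] for r in new_rows])  # -> [2, 3]

# Advance the watermark for the next run.
watermark = max(r["updated_at"] for r in new_rows)
print(watermark)  # -> 2024-01-09
```

With millions of rows, skipping everything at or before the watermark is where the time and cost savings come from.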

4:51

That's nice. Another point, maybe a bit technical but important: isolated Python

4:55

environments

4:56

It uses something called uv to run Python code in its own separate space.

5:01

Think of it like giving each project its own clean toolkit. Why does this matter

5:05

for you?

5:06

It stops different projects tools from clashing making your pipelines much more

5:10

stable

5:10

Okay, avoids those weird dependency headaches. Exactly, those. Then a big one: built

5:16

in data quality

5:17

You don't need a whole separate tool just for checking data quality

5:20

Those checks are part of Bruin itself. You define your rules right there

5:23

This helps you trust your data from the get-go. That feels really integrated, quality

5:28

isn't an afterthought
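Built-in quality checks typically look like declarative rules attached to columns. Here's a rough Python analogue; the rule names mirror common checks like not_null and unique, but this is not Bruin's actual configuration format:

```python
# Illustrative: declarative column checks evaluated against a table of dicts.

CHECKS = {
    "id": ["not_null", "unique"],
    "email": ["not_null"],
}

def run_checks(table, checks):
    failures = []
    for column, rules in checks.items():
        values = [row.get(column) for row in table]
        if "not_null" in rules and any(v is None for v in values):
            failures.append(f"{column}: null value found")
        if "unique" in rules and len(set(values)) != len(values):
            failures.append(f"{column}: duplicate values")
    return failures

table = [
    {"id": 1, "email": "a@example.com"},
    {"id": 1, "email": None},  # violates id/unique and email/not_null
]
print(run_checks(table, CHECKS))
```

The appeal of doing this inside the pipeline tool is that a failed check can block the pipeline right there, instead of being discovered later by a separate tool.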

5:28

Right, and for making things quicker to build:

5:30

code reusability

5:33

It uses Jinja templating.

5:35

Basically, it helps you avoid writing the same bits of code over and over

5:38

Less code usually means fewer mistakes and faster development for you. Makes sense.

5:43

Don't repeat yourself
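The templating idea is simply: write a snippet once with placeholders, then reuse it with different values. Bruin's real templates use Jinja syntax (things like {{ var }}); as a stand-in, here's the same flavor using Python's stdlib string.Template:

```python
from string import Template

# One parameterized SQL snippet, reused for many tables instead of copy-pasting.
DEDUP_SQL = Template("SELECT DISTINCT * FROM $schema.$table")

for table in ["orders", "customers"]:
    print(DEDUP_SQL.substitute(schema="raw", table=table))
# -> SELECT DISTINCT * FROM raw.orders
# -> SELECT DISTINCT * FROM raw.customers
```

Change the snippet once and every generated query picks up the fix, which is exactly the "don't repeat yourself" benefit.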

5:44

And finally end-to-end validation. There's a dry run feature. This lets you test

5:49

your entire pipeline before you actually run it for real

5:51

It's like a dress rehearsal to catch problems early before they cause issues

5:55

downstream

5:56

Okay

5:56

That dry run sounds super useful

5:58

Now, you mentioned user-friendly earlier; the sources talk about it being written in

6:02

Golang, easy install, a VS Code extension.

6:05

How developer-friendly does it actually feel in practice, especially getting

6:10

started?

6:10

Yeah, that seems to be a major focus. Golang itself often means good performance and

6:14

reliability, which is obviously important for data pipelines

6:17

Yeah, but for you the user the practical side is key

6:20

They stress this single command installation that really lowers the barrier to

6:25

entry

6:25

You can theoretically get it running very quickly and start building, not spend ages

6:30

configuring the tool itself.

6:31

That's huge if you're new to this, minimizes the setup pain. Exactly. And the VS Code

6:36

extension

6:37

provides that familiar coding environment, hints, maybe

6:40

autocompletion

6:42

Things that make development smoother. Plus the flexibility in where you run it: your

6:47

laptop, a server like EC2, or even automated in GitHub

6:50

Actions means it adapts to your needs whether you're just playing around or

6:54

building something serious

6:56

So it covers the basics really well and seems easy to get going. But what about more

7:00

advanced stuff? Our sources mentioned some other

7:03

capabilities that sound like they add quite a bit more power. Yes, definitely,

7:07

beyond just moving and transforming

7:09

There are features that offer deeper control and insight which become really

7:13

valuable as your data operations get more complex

7:15

One is data lineage visualization.

7:18

This lets you actually see how your data flows like a map from the source through

7:22

all the transformations to the end result

7:23

Why is that useful for you? It's massive for understanding

7:26

what depends on what, for tracking down errors, or even for compliance, proving where

7:29

your data came from.

7:30

Ah, so you can actually trace the path that sounds incredibly helpful for debugging.
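Under the hood, lineage is just a dependency graph you can walk. A minimal Python sketch (the table names and graph shape are invented for illustration, not taken from Bruin):

```python
# Illustrative lineage graph: which upstream sources feed a given table?

LINEAGE = {
    "report": ["orders_clean", "customers_clean"],
    "orders_clean": ["raw_orders"],
    "customers_clean": ["raw_customers"],
}

def upstream(node, graph):
    # Walk the graph to collect every source this node ultimately depends on.
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)

print(upstream("report", LINEAGE))
# -> ['customers_clean', 'orders_clean', 'raw_customers', 'raw_orders']
```

When a number in "report" looks wrong, a walk like this tells you exactly which upstream tables to inspect.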

7:35

It really is. Then there's data comparison, using data diff.

7:38

This lets you compare tables

7:40

maybe across different systems or from different points in time. Think about validating

7:44

that a change you made didn't break anything or

7:46

quickly spotting differences. It helps you answer: is this data the same as that

7:51

data?

7:52

really fast. Okay, like a verification step. Kind of, yeah.

7:55

Yeah
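The core of a table diff fits in a short Python sketch: index both copies by a key, then report rows that exist only on one side or that changed. This is the general idea, not Bruin's data diff implementation:

```python
# Illustrative table comparison ("data diff"): which keys differ between two copies?

def diff_tables(a, b, key="id"):
    a_by_key = {row[key]: row for row in a}
    b_by_key = {row[key]: row for row in b}
    only_in_a = sorted(a_by_key.keys() - b_by_key.keys())
    only_in_b = sorted(b_by_key.keys() - a_by_key.keys())
    changed = sorted(
        k for k in a_by_key.keys() & b_by_key.keys() if a_by_key[k] != b_by_key[k]
    )
    return {"only_in_a": only_in_a, "only_in_b": only_in_b, "changed": changed}

prod = [{"id": 1, "v": 10}, {"id": 2, "v": 20}]
staging = [{"id": 2, "v": 99}, {"id": 3, "v": 30}]
print(diff_tables(prod, staging))
# -> {'only_in_a': [1], 'only_in_b': [3], 'changed': [2]}
```

That's the verification step from the conversation: after a change, an empty diff means nothing broke.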

7:55

And for teamwork: shared terminology via glossaries. If you have multiple people

8:00

working with data, agreeing on what terms mean, like what defines a

8:04

specific metric, is critical. A shared glossary helps everyone stay on the same page,

8:08

reducing confusion. Makes collaboration smoother for you and your team. Standardizing

8:12

the language. Good point. And finally secrets injection

8:16

This is about security. How do you handle sensitive things like database passwords

8:20

or API keys?

8:21

Bruin helps you inject these securely using environment variables rather than

8:25

writing them directly in your code, which is big
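The environment-variable pattern looks like this in plain Python. The connection string, host, and variable name are hypothetical, made up for the example; the point is only that the secret never appears in the source code:

```python
import os

# Illustrative: secrets come from the environment (set by your shell, CI, or a
# tool injecting them at run time), never hard-coded in the pipeline code.
os.environ.setdefault("DB_PASSWORD", "example-only")  # stand-in for a real secret

def connection_string():
    password = os.environ["DB_PASSWORD"]
    # Hypothetical DSN; the host and user names are invented for the example.
    return f"postgres://app:{password}@db.internal:5432/analytics"

print(connection_string())
```

Because the value is read at run time, the same code works in dev and production with different credentials, and nothing sensitive ends up in version control.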

8:27

Right, keeps your credentials safe. That's crucial. Okay, and where can you

8:31

actually use all this? The compatibility list looked huge.

8:34

AWS Athena, BigQuery, Snowflake, Postgres, Oracle...

8:37

Yeah, that breadth of compatibility is definitely one of its big selling points

8:41

The sources say it supports many platforms out of the box and as a first-class

8:46

citizen

8:47

What that means for you is you can likely plug Bruin into whatever data systems

8:52

You're already using or plan to use whether your data is in a modern cloud

8:56

warehouse a traditional database or somewhere else

8:59

Bruin aims to connect to it smoothly. This gives you tremendous flexibility

9:03

You're not locked into one specific vendor's ecosystem just because you chose Bruin.

9:09

That's a huge plus. Helps future-proof things a bit.

9:11

Okay, so it's powerful, flexible, connects to almost anything. For listeners thinking,

9:16

Okay, I'm interested

9:16

Maybe I want to try this. What's the next step is their support a community?

9:20

How easy is it to actually, you know, dip your toes in? Good question.

9:24

And the sources are pretty positive here tying back to that easy-to-use idea. They

9:28

highlight an active slack community

9:30

That's usually a great sign, a place where you can ask questions, get help, share what

9:34

you're learning with other users

9:36

Maybe even the developers. Right, community support is often key.

9:39

Absolutely, and beyond that there's a quick start guide and detailed installation

9:42

docs mentioned

9:43

So it seems they provide the resources to help you get started. It reinforces that

9:47

promise. Yeah

9:48

Powerful yes, but also designed for you to actually succeed in using it with help

9:53

available if you hit a snag

9:54

That's good to hear having that support structure makes trying a new tool much less

9:59

daunting

10:00

So we've covered a lot the core idea the features the advanced bits the

10:04

compatibility

10:05

Let's step back. What does this all really mean for you the listener?

10:09

Whether you're just starting with data pipelines or you're managing complex ones

10:13

already

10:13

Why should Bruin be on your radar? Yeah, the 'so what' question.

10:17

I think the core value proposition the reason a unified tool like Bruin matters

10:21

boils down to simplifying complexity and boosting reliability

10:25

By bringing ingestion, transformation, and quality together, it just reduces the

10:29

number of moving parts

10:30

you have to manage. For you,

10:32

That means less mental juggling less switching between different tools with

10:35

different interfaces and concepts

10:37

That unified approach inherently makes it easier to build pipelines that are

10:40

understandable and hopefully more robust

10:42

Less chance for things to get lost in translation between tools. Exactly. For someone

10:47

just starting it provides a clearer path

10:50

You can learn the whole end-to-end process within one framework

10:53

Which is likely less overwhelming than trying to stitch together three separate

10:56

things right away. A smoother learning curve.

10:59

I think so. And for more experienced folks, it's about productivity.

11:02

Less tool wrangling means more time focusing on the data logic itself

11:07

It promotes consistency, makes pipelines easier to maintain; it basically streamlines

11:12

the workflow

11:13

So it's about confidence, really. Confidence in the data, because the quality checks

11:17

are built in and

11:18

confidence in the process, because it's all managed in one coherent system. Well

11:23

said, that's a great way to put it.

11:24

Okay, let's push this idea a bit further

11:26

Imagine a future where that friction we talked about the friction between all these

11:31

different data tools just melts away

11:34

Where data teams aren't spending half their time just connecting pipes and fixing

11:38

leaks but can focus almost purely on discovery on finding insights

11:42

Bruin with its unified approach kind of hints at that future a single framework

11:47

simplifying the whole journey

11:48

If that approach becomes more common, how might that actually change the nature of

11:53

working with data?

11:54

What new possibilities might open up? Something to think about.

11:58

It's a powerful thought and based on what we've seen Bruin's combination of

12:01

features, the wide reach from ingestion to quality, the

12:04

flexibility of the platform support, all wrapped up with that focus on ease of use,

12:08

It makes it a really compelling tool for anyone aiming to build solid dependable

12:12

data pipelines today

12:13

It feels like it's genuinely trying to empower users

12:16

Well, if you found this deep dive useful just think about what else is out there to

12:19

explore

12:20

Your curiosity is what fuels these discussions. And a final huge

12:24

thank you to Safeserver for supporting the Deep Dive. Remember, for software hosting

12:28

and digital transformation support, check them out at www.safeserver.de.

12:31

Keep exploring.
