Today's Deep-Dive: OpenLineage
Ep. 262


Episode description

This deep dive covers the challenges of data observability and how OpenLineage addresses them. Before OpenLineage, companies had to build custom data lineage tracking, which was time-consuming and tended to break whenever a tool was updated. OpenLineage standardizes lineage tracking by embedding collection inside the data tools themselves, which makes it more reliable and easier to maintain. The standard defines three core entities (runs, jobs, and datasets) and uses facets to attach flexible metadata to them. Lineage information is captured and sent through a standard API, so different tools can exchange lineage data in a consistent format. The standard is widely adopted and supported by major tools such as Apache Spark and dbt. The episode also mentions related projects: Marquez, which visualizes lineage data, and Egeria, which manages metadata across enterprises. OpenLineage is turning data observability from a fragmented, custom-built process into a consistent, shared framework that is built directly into data infrastructure.
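To make the run, job, dataset, and facet model more concrete, here is a minimal sketch of what a single lineage event might look like when built by hand in Python. It is an illustration under stated assumptions, not an official client: the helper `build_run_event`, the namespace `example-namespace`, the producer URL, the dataset names, and the spec version numbers in the schema URLs are all hypothetical and should be checked against the published specification.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical values for illustration; not taken from the episode.
PRODUCER = "https://example.com/my-pipeline"
NAMESPACE = "example-namespace"
# Assumed spec versions; check the published OpenLineage spec for current URLs.
EVENT_SCHEMA = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
FACET_SCHEMA = "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json"


def build_run_event(event_type, run_id, job_name, inputs, outputs):
    """Build one lineage event: a run of a job that reads and writes datasets."""

    def dataset(name):
        # A dataset is identified by a namespace and a name.
        return {"namespace": NAMESPACE, "name": name, "facets": {}}

    out_datasets = [dataset(name) for name in outputs]
    for ds in out_datasets:
        # A facet attaches optional metadata, here a made-up column schema.
        ds["facets"]["schema"] = {
            "_producer": PRODUCER,
            "_schemaURL": FACET_SCHEMA,
            "fields": [{"name": "order_id", "type": "INTEGER"}],
        }

    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": NAMESPACE, "name": job_name},
        "inputs": [dataset(name) for name in inputs],
        "outputs": out_datasets,
        "producer": PRODUCER,
        "schemaURL": EVENT_SCHEMA,
    }


event = build_run_event(
    "COMPLETE",
    str(uuid.uuid4()),
    "daily_orders",
    inputs=["raw.orders"],
    outputs=["analytics.daily_orders"],
)
print(json.dumps(event, indent=2))
```

In practice an integration inside the data tool would build events like this and send them as JSON over HTTP to a lineage backend such as Marquez; the sketch only prints the payload and sends nothing.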
