Today's Deep-Dive: Open Lineage

The deep dive discusses the challenges of data observability and the role of Open Lineage in solving these issues. Before Open Lineage, companies had to build custom tracking solutions for data lineage, which was time-consuming and prone to breaking with tool updates. Open Lineage standardizes data tracking by embedding lineage collection within data tools, making it more reliable and maintainable. The standard defines core entities like runs, jobs, and data sets, and uses facets to add flexible metadata. Open Lineage captures and sends this information through a standard API, allowing different tools to communicate lineage data consistently. The standard is widely adopted and supported by major tools like Apache Spark and DBT. The document also mentions related projects like Marquez, which visualizes lineage data, and Egeria, which manages metadata across enterprises. Open Lineage is transforming data observability from a fragmented, custom-built process into a consistent, shared framework, embedding observability directly into data infrastructure.

https://openlineage.io/

Today's Deep-Dive: Open Lineage

Episode description

Persons