View data lineage on Astro
The Lineage tab in the Astro UI can help you troubleshoot issues with your data pipelines and understand the movement of data across your Organization.
From the Lineage tab on Astro, you can access the following five pages:
- Runs: A real-time overview of all runs that emit data lineage across your Organization. A run can be an Airflow task run or any other process configured to emit lineage metadata to Astronomer, such as a Spark job.
- Datasets: A real-time overview of all recent datasets that your DAGs have read from or written to.
- Issues: A view of potential issues or statistical inconsistencies related to your runs or datasets.
- Lineage: A graph view that visualizes data lineage.
- Integrations: A view of your current data lineage integrations.
Lineage datasets are different from Airflow's datasets feature. Airflow datasets are defined explicitly in your DAG code, whereas lineage datasets are extracted and generated from lineage metadata. The Astro UI currently does not show information about Airflow datasets.
You can use these pages to diagnose issues that may be difficult to troubleshoot in other environments. For example, if an Airflow task failed because a database schema changed, you can use the Lineage page of the Astro UI to determine which run caused the change and which downstream tasks failed as a result.
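The downstream-impact question in this example can be pictured as a graph traversal. The following is a minimal sketch, not how Astro implements it: it assumes a hypothetical lineage graph stored in a plain dictionary, with made-up job and dataset names, and walks it to find which job wrote a changed dataset and which jobs sit downstream of it.

```python
from collections import deque

# Hypothetical lineage graph. Each node maps to the nodes that consume it:
# a job points to the datasets it writes, a dataset points to the jobs that read it.
# All names are illustrative and do not come from a real Deployment.
LINEAGE_EDGES = {
    "job:load_orders": ["dataset:raw.orders"],
    "dataset:raw.orders": ["job:clean_orders", "job:audit_orders"],
    "job:clean_orders": ["dataset:analytics.orders"],
    "dataset:analytics.orders": ["job:daily_report"],
    "job:audit_orders": [],
    "job:daily_report": [],
}


def upstream_writers(edges, dataset):
    """Return the jobs that write directly to `dataset`, sorted by name."""
    return sorted(
        node
        for node, outputs in edges.items()
        if node.startswith("job:") and dataset in outputs
    )


def downstream_jobs(edges, start):
    """Return every job reachable from `start`, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    jobs = []
    while queue:
        node = queue.popleft()
        for consumer in edges.get(node, []):
            if consumer in seen:
                continue
            seen.add(consumer)
            queue.append(consumer)
            if consumer.startswith("job:"):
                jobs.append(consumer)
    return jobs


changed = "dataset:raw.orders"
writers = upstream_writers(LINEAGE_EDGES, changed)
affected = downstream_jobs(LINEAGE_EDGES, changed)
```

Here `writers` holds the candidate jobs that could have changed the schema, and `affected` holds the jobs whose failures might trace back to that change. The Lineage page presents the same relationships visually, so you don't need to build this structure yourself.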
For more information on data lineage and related concepts, see Data lineage concepts.
All members of your Astro Organization can view the Lineage tab regardless of their Workspace permissions. The Lineage tab could contain plain-text SQL and Python code from any system that emits lineage metadata to Astro. If this is a security concern for your organization, reach out to Astronomer support.
Prerequisites
To view lineage metadata for Deployments, you must configure Airflow and your external systems to emit lineage metadata. See Enable data lineage for external systems.
View the lineage graph for a data pipeline
To view the lineage graph for one of your data pipelines, use the search field at the top of the Astro UI to search for a DAG, task, or dataset. You can also search for runs from other tools with lineage integrations, including dbt or Spark.
The search results include the namespace that emitted the matching event. When an Astro Deployment emits the lineage event, the namespace matches the Deployment namespace shown on the Deployments page of the Astro UI. Clicking a search result opens the Lineage page and shows the lineage graph for the selected job or dataset. You can also access the lineage graph for a recent job run on the Runs page below Most Recent Runs.
The Lineage page shows lineage metadata only for the most recent run of a given data pipeline. To explore lineage metadata from previous runs, see Compare lineage graphs from previous runs.
By default, when you access the Lineage page from the left menu, the last lineage graph you viewed is displayed. If you go directly to the Lineage page without first viewing a lineage graph, no lineage graph data is displayed. If this happens, use the search field or the Runs page to open a job run, which populates the Lineage page with data.
A lineage graph with a single node indicates that the run you selected didn't emit any information about input or output datasets. Typically, this occurs when an Airflow task isn't using a supported Airflow operator. You can still view the duration of this run over time.
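To illustrate why such a graph has only one node, here is a minimal sketch that derives graph nodes from a run's lineage metadata. The event shape (a `job` with a `namespace` and `name`, plus `inputs` and `outputs` lists) is a simplified assumption for illustration, and the function and all names are hypothetical rather than part of Astro.

```python
def lineage_graph_nodes(event):
    """Return node labels for one run: the job, then each input and output dataset."""
    job = event["job"]
    nodes = ["job:{}.{}".format(job["namespace"], job["name"])]
    # A run that reports no inputs or outputs contributes only its job node.
    for dataset in event.get("inputs", []) + event.get("outputs", []):
        label = "dataset:{}.{}".format(dataset["namespace"], dataset["name"])
        if label not in nodes:
            nodes.append(label)
    return nodes


# Hypothetical run that reported an input and an output dataset.
run_with_datasets = {
    "job": {"namespace": "example-deployment", "name": "etl.transform"},
    "inputs": [{"namespace": "warehouse", "name": "raw.orders"}],
    "outputs": [{"namespace": "warehouse", "name": "analytics.orders"}],
}

# Hypothetical run that reported no dataset information.
run_without_datasets = {
    "job": {"namespace": "example-deployment", "name": "etl.notify"},
    "inputs": [],
    "outputs": [],
}

full_graph = lineage_graph_nodes(run_with_datasets)
single_node_graph = lineage_graph_nodes(run_without_datasets)
```

In this sketch, `single_node_graph` contains only the job itself, which corresponds to the single-node graph described above.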