
Create DAG documentation in Apache Airflow

One of the more powerful and lesser-known features of Airflow is that you can create Markdown-based DAG documentation that appears in the Airflow UI.

DAG Docs Intro Example

After you complete this tutorial, you'll be able to:

  • Add custom doc strings to an Airflow DAG.
  • Add custom doc strings to an Airflow task.

Time to complete

This tutorial takes approximately 15 minutes to complete.

Assumed knowledge

Prerequisites

Step 1: Create an Astro project

To run Airflow locally, you first need to create an Astro project.

  1. Create a new directory for your Astro project:

    mkdir <your-astro-project-name> && cd <your-astro-project-name>
  2. Run the following Astro CLI command to initialize an Astro project in the directory:

    astro dev init
  3. Start your Airflow instance by running:

    astro dev start

Step 2: Create a new DAG

  1. In your dags folder, create a file named docs_example_dag.py.

  2. Copy and paste the following DAG, which is written with the TaskFlow API, into the file:

from airflow import DAG
from airflow.decorators import task, dag
from pendulum import datetime
import requests

@dag(
    start_date=datetime(2022, 11, 1),
    schedule="@daily",
    catchup=False
)
def docs_example_dag():

    @task()
    def tell_me_what_to_do():
        response = requests.get("https://www.boredapi.com/api/activity")
        return response.json()["activity"]

    tell_me_what_to_do()

docs_example_dag()

This DAG has one task called tell_me_what_to_do, which queries an API for a random activity for the day and returns it. The returned value appears in the task logs.
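The parsing step of this task can be tried without Airflow or network access. The following is a minimal sketch, assuming the API returns a JSON object with an "activity" key, as the DAG code implies. The helper name extract_activity and the sample payload are hypothetical and only serve as illustration.

```python
def extract_activity(payload: dict) -> str:
    """Return the suggested activity from a decoded API response.

    Mirrors the expression response.json()["activity"] used in the task.
    """
    return payload["activity"]


# Made-up payload for illustration; only the "activity" key is taken
# from the tutorial's DAG code.
sample_payload = {"activity": "Water the plants"}
suggestion = extract_activity(sample_payload)
print(suggestion)
```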

Step 3: Add docs to your DAG

You can add Markdown-based documentation to your DAGs that will render in the Grid, Graph and Calendar pages of the Airflow UI.

  1. In your docs_example_dag.py file, add the following doc string above the definition of your DAG:

    doc_md_DAG = """
    ### The Activity DAG

    This DAG will help me decide what to do today. It uses the [BoredAPI](https://www.boredapi.com/) to do so.

    Before I get to do the activity I will have to:

    - Clean up the kitchen.
    - Check on my pipelines.
    - Water the plants.

    Here are some happy plants:

    <img src="https://www.publicdomainpictures.net/pictures/80000/velka/succulent-roses-echeveria.jpg" alt="plants" width="300"/>
    """

    This doc string is written in Markdown. It includes a title, a link to an external website, a bulleted list, as well as an image which has been formatted using HTML. To learn more about Markdown, see The Markdown Guide.

  2. Add the documentation to your DAG by passing doc_md_DAG to the doc_md parameter of the @dag decorator, as shown in the code snippet below:

@dag(
    start_date=datetime(2022, 11, 1),
    schedule="@daily",
    catchup=False,
    doc_md=doc_md_DAG
)
def docs_example_dag():
  3. Go to the Grid view and click on the DAG Docs banner to view the rendered documentation.

    DAG Docs

tip

Airflow automatically picks up a doc string written directly beneath the definition of the DAG context and adds it as DAG Docs. Additionally, when you define a DAG using with DAG():, you can pass the file path of a Markdown file to the doc_md parameter. This is useful if you want to add the same documentation to several of your DAGs.
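As an alternative sketch of reusing one set of docs across DAGs, the Markdown file can be read into a string with plain Python. The example below uses only the standard library. The helper load_doc_md and the file name shared_docs.md are hypothetical, and the temporary directory stands in for wherever you keep the file in your project.

```python
import tempfile
from pathlib import Path


def load_doc_md(path: Path) -> str:
    """Read a Markdown file and return its contents as a string."""
    return path.read_text(encoding="utf-8")


# Demo: write a throwaway Markdown file, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    docs_file = Path(tmp_dir) / "shared_docs.md"  # hypothetical file name
    docs_file.write_text("### The Activity DAG\n\nShared docs.\n", encoding="utf-8")
    shared_doc_md = load_doc_md(docs_file)

print(shared_doc_md)
```

The resulting string can then be passed as doc_md=shared_doc_md in each DAG definition that should show these docs.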

Step 4: Add docs to a task

You can also add docs to specific Airflow tasks using Markdown, Monospace, JSON, YAML or reStructuredText. Note that only Markdown will be rendered and other formats will be displayed as rich content.

To add documentation to your task, follow these steps:

  1. Add the following code with a string in Markdown format:

    doc_md_task = """

    ### Purpose of this task

    This task **boldly** suggests a daily activity.
    """
  2. Add the following code with a string written in monospace format:

    doc_monospace_task = """
    If you don't like the suggested activity you can always just go to the park instead.
    """
  3. Add the following code with a string in JSON format:

    doc_json_task = """
    {
        "previous_suggestions": {
            "go to the gym": {"frequency": 2, "rating": 8},
            "mow your lawn": {"frequency": 1, "rating": 2},
            "read a book": {"frequency": 3, "rating": 10}
        }
    }
    """
  4. Add the following code with a string written in YAML format:

    doc_yaml_task = """
    clothes_to_wear: sports
    gear: |
        - climbing: true
        - swimming: false
    """
  5. Add the following code containing reStructuredText:

    doc_rst_task = """
    ===========================
    This feature is pretty neat
    ===========================

    * there are many ways to add docs
    * luckily Airflow supports a lot of them

    .. note:: `Learn more about rst here! <https://gdal.org/contributing/rst_style.html#>`__
    """
  6. Create a task definition as shown in the following snippet. The task definition includes parameters for specifying each of the documentation strings you created.

@task(
    doc_md=doc_md_task,
    doc=doc_monospace_task,
    doc_json=doc_json_task,
    doc_yaml=doc_yaml_task,
    doc_rst=doc_rst_task
)
def tell_me_what_to_do():
    response = requests.get("https://www.boredapi.com/api/activity")
    return response.json()["activity"]

tell_me_what_to_do()
  7. Go to the Airflow UI and run your DAG.

  8. In the Grid view, click on the green square for your task instance.

  9. Click on Task Instance Details.

    Task Instance Details

  10. See the docs under their respective attribute:

    All Task Docs
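Because only Markdown is rendered, a typo in one of the other formats is easy to miss. As a minimal sketch, a JSON doc string could be checked with Python's standard json module before it is attached to a task. The helper name is_valid_json is hypothetical, and the strings below are shortened, illustrative variants of a JSON doc string.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Illustrative doc string; surrounding whitespace is fine for json.loads.
good_doc = """
{
    "previous_suggestions": {
        "go to the gym": {"frequency": 2, "rating": 8},
        "read a book": {"frequency": 3, "rating": 10}
    }
}
"""

# A trailing comma before the closing brace is not valid JSON.
broken_doc = '{"rating": 8,}'

print(is_valid_json(good_doc))    # True
print(is_valid_json(broken_doc))  # False
```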

Conclusion

Congratulations! You now know how to add fancy documentation to both your DAGs and your Airflow tasks.