Develop your Astro project

An Astro project contains all of the files necessary to test and run DAGs in a local Airflow environment and on Astro. This guide provides information about adding and organizing Astro project files, including:

  • Adding DAGs
  • Adding Python and OS-level packages
  • Setting environment variables
  • Applying changes
  • Running on-build commands

For information about running your Astro project in a local Airflow environment, see Run Airflow locally.

tip

As you add to your Astro project, Astronomer recommends reviewing the Astronomer Registry, a library of Airflow modules, providers, and DAGs that serve as the building blocks for data pipelines.

The Astronomer Registry includes:

Prerequisites

Add DAGs

In Apache Airflow, data pipelines are defined in Python code as Directed Acyclic Graphs (DAGs). A DAG is a collection of tasks and dependencies between tasks that are defined as code. See Introduction to Airflow DAGs.

DAGs are stored in the dags folder of your Astro project. To add a DAG to your project:

  1. Add the .py file to the dags folder.
  2. Save your changes. If you're using a Mac, use Command-S.
  3. Refresh the Airflow UI in your browser.
tip

Use the astro run <dag-id> command to run and debug a DAG from the command line without starting a local Airflow environment. This is an alternative to testing your entire Astro project with the Airflow webserver and scheduler. See Test your Astro project locally.

Add utility files

Airflow DAGs sometimes require utility files to run workflows. This can include:

  • SQL files.
  • Custom Airflow operators.
  • Python functions.

When more than one DAG in your Astro project needs a certain function or query, creating a shared utility file helps make your DAGs idempotent and more readable, and minimizes the amount of code in each DAG.

In most cases, Astronomer recommends storing utility files in the /dags directory of your Astro project and organizing them into sub-directories based on whether they're needed for a single DAG or for multiple DAGs.

In the following example, the dags folder includes both types of utility files:

└── dags
    ├── my_dag
    │   ├── my_dag.py
    │   └── my_dag_utils.py # specific DAG utils
    └── utils
        └── common_utils.py # common utils

  1. To add utility files which are shared between all your DAGs, create a folder named utils in the dags directory of your Astro project. To add utility files only for a specific DAG, create a new folder in dags to store both your DAG file and your utility file.
  2. Add your utility files to the folder you created.
  3. Reference your utility files in your DAG code.
  4. Apply your changes. If you're developing locally, refresh the Airflow UI in your browser.

Utility files in the /dags directory will not be parsed by Airflow, so you don't need to specify them in .airflowignore to prevent parsing. If you're using DAG-only deploys on Astro, changes to this folder are deployed when you run astro deploy --dags and do not require rebuilding your Astro project into a Docker image and restarting your Deployment.
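
As a minimal sketch of the shared-utility pattern described above, the following shows what a file such as dags/utils/common_utils.py might contain. The function name render_sql and its parameters are hypothetical and are not part of any Astro or Airflow API.

```python
"""Hypothetical contents of dags/utils/common_utils.py."""
from string import Template


def render_sql(template_text: str, **params: str) -> str:
    """Fill $placeholders in a SQL template that several DAGs share."""
    # Template.substitute raises KeyError when a placeholder has no value,
    # so a forgotten parameter fails loudly instead of producing bad SQL.
    # Plain substitution is not safe for untrusted input.
    return Template(template_text).substitute(**params)


# Example call, as a DAG task might make it.
query = render_sql("SELECT * FROM $table WHERE ds = '$ds'", table="orders", ds="2024-01-01")
print(query)  # SELECT * FROM orders WHERE ds = '2024-01-01'
```

A DAG file in the dags folder could then load the helper with from utils.common_utils import render_sql, because Airflow adds the dags folder to the Python path.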

Add Airflow connections, pools, variables

Airflow connections connect external applications such as databases and third-party services to Apache Airflow. See Manage connections in Apache Airflow or Apache Airflow documentation.

To add Airflow connections, pools, and variables to your local Airflow environment, you have the following options:

  • Use the Airflow UI. In Admin, click Connections, Variables or Pools, and then add your values. These values are stored in the metadata database and are deleted when you run the astro dev kill command, which can sometimes be used for troubleshooting.
  • Modify the airflow_settings.yaml file of your Astro project. This file is included in every Astro project and permanently stores your values in plain text. To avoid committing sensitive credentials or passwords to your version control tool, Astronomer recommends adding this file to .gitignore.
  • Use a secret backend, such as AWS Secrets Manager, and access the secret backend locally. See Configure an external secrets backend on Astro.

When you add Airflow objects to the Airflow UI of a local environment or to your airflow_settings.yaml file, your values can only be used locally. When you deploy your project to a Deployment on Astro, the values in this file are not included.

Astronomer recommends using the airflow_settings.yaml file so that you don’t have to manually redefine these values in the Airflow UI every time you restart your project. To ensure the security of your data, Astronomer recommends configuring a secrets backend.

Add test data or files for local testing

Use the include folder of your Astro project to store files for testing locally, such as test data or a dbt project file. The files in your include folder are included in your deploys to Astro, but they are not parsed by Airflow. Therefore, you don't need to specify them in .airflowignore to prevent parsing.

If you're running Airflow locally, apply your changes by refreshing the Airflow UI.

Configure airflow_settings.yaml (Local development only)

The airflow_settings.yaml file includes a template with the default values for all possible configurations. To add a connection, variable, or pool, replace the default value with your own.

  1. Open the airflow_settings.yaml file and replace the default value with your own.


    airflow:
      connections: ## conn_id and conn_type are required
        - conn_id: my_new_connection
          conn_type: postgres
          conn_host: 123.0.0.4
          conn_schema: airflow
          conn_login: user
          conn_password: pw
          conn_port: 5432
          conn_extra:
      pools: ## pool_name and pool_slot are required
        - pool_name: my_new_pool
          pool_slot: 5
          pool_description:
      variables: ## variable_name and variable_value are required
        - variable_name: my_variable
          variable_value: my_value

  2. Save the modified airflow_settings.yaml file in your code editor. If you use a Mac computer, for example, use Command-S.

  3. Import these objects to the Airflow UI. Run:

    astro dev object import
  4. In the Airflow UI, click Connections, Pools, or Variables to see your new or modified objects.

  5. Optional. To add another connection, pool, or variable, append it to this file within its corresponding section. For example, to create another variable, add it under the existing variables section of the same file:


    variables:
      - variable_name: <my-variable-1>
        variable_value: <my-variable-value>
      - variable_name: <my-variable-2>
        variable_value: <my-variable-value-2>
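
The required fields noted in the template comments can be checked before you import the objects. The following is a minimal sketch that assumes the file has already been loaded into a Python dictionary, for example with a YAML parser such as PyYAML, which is not included here. The function name find_missing_fields is hypothetical.

```python
# Required keys per section, taken from the comments in the template.
REQUIRED_KEYS = {
    "connections": ("conn_id", "conn_type"),
    "pools": ("pool_name", "pool_slot"),
    "variables": ("variable_name", "variable_value"),
}


def find_missing_fields(settings: dict) -> list[str]:
    """Return one message for every entry that lacks a required value."""
    problems = []
    airflow_section = settings.get("airflow") or {}
    for section, required in REQUIRED_KEYS.items():
        # An empty or absent section parses as None, so fall back to [].
        for index, entry in enumerate(airflow_section.get(section) or []):
            for key in required:
                if entry.get(key) in (None, ""):
                    problems.append(f"{section}[{index}] is missing {key}")
    return problems


# A dictionary shaped like the example file, with conn_type left out.
example = {
    "airflow": {
        "connections": [{"conn_id": "my_new_connection"}],
        "pools": [{"pool_name": "my_new_pool", "pool_slot": 5}],
        "variables": [{"variable_name": "my_variable", "variable_value": "my_value"}],
    }
}
print(find_missing_fields(example))  # ['connections[0] is missing conn_type']
```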

Add Python, OS-level packages, and Airflow providers

Most DAGs need additional OS or Python packages to run. There are two primary kinds of Python packages that you might have to add to your Astro project:

  • Python libraries. If you’re using Airflow for a data science project, for example, you might use a data science library such as pandas or NumPy (numpy).
  • Airflow providers. Airflow providers are Python packages that contain all relevant Airflow modules for a third-party service. For example, apache-airflow-providers-amazon includes the hooks, operators, and integrations you need to access services on Amazon Web Services (AWS) with Airflow. See Provider packages.

Adding the name of a package to the packages.txt or requirements.txt files of your Astro project installs the package to your Airflow environment. Python packages are installed from your requirements.txt file using pip.

  1. Add the package name to your Astro project. If it’s a Python package, add it to requirements.txt. If it’s an OS-level package, add it to packages.txt. The latest version of the package that’s publicly available is installed by default.

    To pin a version of a package, use the following syntax:

    <package-name>==<version>

    For example, to install NumPy version 1.23.0, add the following to your requirements.txt file:

    numpy==1.23.0
  2. Restart your local environment.

  3. Confirm that your package was installed:

    astro dev bash --scheduler "pip freeze | grep <package-name>"

To learn more about the format of the requirements.txt file, see Requirements File Format in pip documentation. To browse Python libraries, see PyPI. To browse Airflow providers, see the Astronomer Registry.
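
Because unpinned packages install the latest available version, it can help to list which requirements are not pinned. The following is a rough sketch that flags lines that do not use the <package-name>==<version> form. The function name unpinned_requirements is hypothetical, and the check is deliberately simple: it does not understand URLs, environment markers, or other parts of the pip requirements format.

```python
import re

# Matches "<package-name>==<version>", optionally with extras such as pkg[extra].
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=\s]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that do not use the name==version form."""
    unpinned = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


sample = "numpy==1.23.0\npandas\n"
print(unpinned_requirements(sample))  # ['pandas']
```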

Set environment variables locally

For local development, Astronomer recommends setting environment variables in your Astro project’s .env file. You can then push your environment variables from the .env file to a Deployment on Astro. To manage environment variables in the Cloud UI, see Environment variables.

If your environment variables contain sensitive information or credentials that you don’t want to expose in plain text, add your .env file to .gitignore so that it isn’t committed to your version control tool.

  1. Open the .env file in your Astro project directory.

  2. Add your environment variables to the .env file or run astro deployment variable list --save to copy environment variables from an existing Deployment to the file.

    Use the following format when you set environment variables in your .env file:

    KEY=VALUE

    Environment variables should be in all-caps and not include spaces.

  3. Restart your local environment.

  4. Run the following command to confirm that your environment variables were applied locally:

    astro dev bash --scheduler "/bin/bash && env"

    This command outputs all environment variables that are set in your local environment. This includes environment variables set on Astro Runtime by default.

  5. Optional. Run astro deployment variable create --load or astro deployment variable update --load to export environment variables from your .env file to a Deployment. You can view and modify the exported environment variables in the Cloud UI page for your Deployment.
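
The KEY=VALUE convention from step 2 can be checked with a short script before you restart. This sketch reads the all-caps, no-spaces rule as applying to the key, and it treats lines that start with # as comments; both are assumptions rather than documented Astro CLI behavior. The function name parse_env is hypothetical.

```python
import re

# KEY=VALUE where KEY uses upper-case letters, digits, and underscores only.
ENV_LINE = re.compile(r"^[A-Z_][A-Z0-9_]*=.*$")


def parse_env(text: str) -> dict[str, str]:
    """Parse .env text into a dict, rejecting keys that break the convention."""
    variables = {}
    for number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if not ENV_LINE.match(line):
            raise ValueError(f"line {number}: expected KEY=VALUE with an upper-case key, got {line!r}")
        # Split on the first "=" only so that values may contain "=".
        key, value = line.split("=", 1)
        variables[key] = value
    return variables


print(parse_env("AIRFLOW__CORE__LOAD_EXAMPLES=False\n"))
# {'AIRFLOW__CORE__LOAD_EXAMPLES': 'False'}
```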

info

For local environments, the Astro CLI generates an airflow.cfg file at runtime based on the environment variables you set in your .env file. You can’t create or modify airflow.cfg in an Astro project.

To view your local environment variables in the context of the generated Airflow configuration, run:

astro dev bash --scheduler "/bin/bash && cat airflow.cfg"

This command outputs the contents of the generated airflow.cfg file, which lists your environment variables as human-readable configurations with inline comments.

Use multiple .env files

The Astro CLI looks for .env by default, but if you want to specify multiple files, make .env a top-level directory and create sub-files within that folder.

A project with multiple .env files might look like the following:

my_project
├── Dockerfile
├── dags
│   └── my_dag
├── plugins
│   └── my_plugin
├── airflow_settings.yaml
└── .env
    ├── dev.env
    └── prod.env

Apply changes to a running project

If you're running your Astro project in a local Airflow environment, you must restart your environment when you make changes to certain files and want to apply them locally.

Specifically, you must restart your environment to apply changes for any of the following files:

  • packages.txt
  • Dockerfile
  • requirements.txt
  • airflow_settings.yaml

To restart your local Airflow environment, run:

astro dev restart
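
The rule above can be expressed as a small check, for example in a local helper script that decides whether to run astro dev restart after a set of edits. This is only a sketch: the function name needs_restart is hypothetical, and it covers just the four files listed in this section, matched at the top level of the project.

```python
from pathlib import PurePosixPath

# Files listed in this section as requiring a restart of the local environment.
RESTART_FILES = {"packages.txt", "Dockerfile", "requirements.txt", "airflow_settings.yaml"}


def needs_restart(changed_paths: list[str]) -> bool:
    """Return True if any changed path is a top-level file that requires a restart."""
    # PurePosixPath normalizes a leading "./", so "./Dockerfile" becomes "Dockerfile".
    return any(str(PurePosixPath(path)) in RESTART_FILES for path in changed_paths)


print(needs_restart(["dags/my_dag.py"]))  # False
print(needs_restart(["dags/my_dag.py", "requirements.txt"]))  # True
```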

Advanced configuration

The following configurations are specific to advanced use cases.

Add Airflow plugins

If you need to build a custom view in the Airflow UI or build an application on top of the Airflow metadata database, you can use Airflow plugins. To use an Airflow plugin, add your plugin files to the plugins folder of your Astro project. To apply changes from this folder to a local Airflow environment, restart your local environment.

To learn more about Airflow plugins and how to build them, see Airflow Plugins in Airflow documentation or the Astronomer Airflow plugins guide.

Use .airflowignore

You can create an .airflowignore file in the dags directory of your Astro project to identify the files to ignore when you deploy to Astro or develop locally. This can be helpful if your team has a single Git repository that contains DAGs for multiple projects.

The .airflowignore file and the files listed in it must be in the same dags directory of your Astro project. Files or directories listed in .airflowignore are not parsed by the Airflow scheduler and the DAGs listed in the file don't appear in the Airflow UI.

For more information about .airflowignore, see .airflowignore in the Airflow documentation. To learn more about the code deploy process, see What happens during a code deploy.

  1. In the dags directory of your Astro project, create a new file named .airflowignore.

  2. List the files or sub-directories you want ignored when you push code to Astro or when you are developing locally. You should list the path for each file or directory relative to the dags directory. For example:

    mydag.py
    data-team-dags
    some-dags/ignore-this-dag.py
  3. Save your changes locally or deploy to Astro.

    Your local Airflow environment is automatically updated as soon as you save your changes to .airflowignore. To apply your change in Astro, you need to deploy. See Deploy code.
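
To preview which files a set of .airflowignore entries would exclude, you can approximate the matching in a few lines. This sketch assumes the regexp syntax, which Airflow 2 uses by default and which searches each pattern against the path relative to the dags directory; if your environment is configured for glob syntax, the matching behaves differently. The function name ignored_paths is hypothetical, and this is not Airflow's own implementation.

```python
import re


def ignored_paths(ignore_text: str, relative_paths: list[str]) -> list[str]:
    """Return the paths that match at least one pattern via re.search."""
    patterns = []
    for raw_line in ignore_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            patterns.append(re.compile(line))
    return [path for path in relative_paths if any(rx.search(path) for rx in patterns)]


rules = "mydag.py\ndata-team-dags\nsome-dags/ignore-this-dag.py\n"
paths = ["mydag.py", "data-team-dags/etl.py", "some-dags/ignore-this-dag.py", "some-dags/keep.py"]
print(ignored_paths(rules, paths))
# ['mydag.py', 'data-team-dags/etl.py', 'some-dags/ignore-this-dag.py']
```

Under this assumption, patterns are searched rather than matched exactly, so an entry such as mydag.py would also exclude a path like team/mydag_py.txt. Escape special characters or anchor the pattern if you need an exact match.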

Run commands on build

To run additional commands as your Astro project is built into a Docker image, add them to your Dockerfile as RUN commands. These commands run as the last step in the image build process.

For example, if you want to run ls when your image builds, your Dockerfile would look like this:

FROM quay.io/astronomer/astro-runtime:9.1.0
RUN ls

This is supported both on Astro and in the context of local development.

Use an alternative Astro Runtime distribution

Starting with Astro Runtime 9, each version of Astro Runtime has a separate distribution for each currently supported Python version. Use an alternative Python distribution if any of your dependencies require a Python version other than the default Runtime Python version.

To use a specific Python distribution, update the first line in your Astro project Dockerfile to reference the required distribution:

FROM quay.io/astronomer/astro-runtime:<runtime-version>-python-<python-version>

For example, to use Python 3.10 with Astro Runtime version 9.0.0, update the first line of your Dockerfile to the following:

FROM quay.io/astronomer/astro-runtime:9.0.0-python-3.10
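
If you generate Dockerfiles from a script, the tag format above can be assembled as follows. This is a small sketch; the function name runtime_image is hypothetical, and it does not verify that a given combination of Runtime and Python versions is actually published.

```python
from typing import Optional


def runtime_image(runtime_version: str, python_version: Optional[str] = None) -> str:
    """Build the image reference used on the FROM line of the Dockerfile."""
    tag = runtime_version
    if python_version:
        # Alternative distributions append "-python-<python-version>" to the tag.
        tag = f"{tag}-python-{python_version}"
    return f"quay.io/astronomer/astro-runtime:{tag}"


print("FROM " + runtime_image("9.0.0", "3.10"))
# FROM quay.io/astronomer/astro-runtime:9.0.0-python-3.10
```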

Add a CA certificate to an Astro Runtime image

If you need your Astro Deployment to communicate securely with a remote service using a certificate signed by an untrusted or internal certificate authority (CA), you need to add the CA certificate to the trust store inside your Astro project's Docker image.

  1. In your Astro project Dockerfile, add the following entry below the existing FROM statement which specifies your Astro Runtime image version:

    USER root
    COPY <internal-ca.crt> /usr/local/share/ca-certificates/<your-company-name>/
    RUN update-ca-certificates
    USER astro
  2. Optional. Add additional COPY statements before the RUN update-ca-certificates stanza for each CA certificate your organization is using for external access.

  3. Restart your local environment or deploy to Astro. See Deploy code.

Install Python packages from private sources

Python packages can be installed into your image from public and private locations. To install packages listed on private PyPI indices or a private git-based repository, you need to complete additional configuration in your project.

Depending on where your private packages are stored, use one of the following setups to install these packages to an Astro project by customizing your Runtime image.

info

Deploying a custom Runtime image with a CI/CD pipeline requires additional configurations. For an example implementation, see GitHub Actions CI/CD templates.

Install Python packages from private GitHub repositories

This topic provides instructions for building your Astro project with Python packages from a private GitHub repository.

Although GitHub is used in this example, you should be able to complete the process with any hosted Git repository.

info

The following setup has been validated only with a single SSH key. You might need to modify this setup when using more than one SSH key per Docker image.

Prerequisites

danger

If your organization enforces SAML single sign-on (SSO), you must first authorize your key to be used with that authentication method. See Authorizing an SSH key for use with SAML single sign-on.

This setup assumes that each custom Python package is hosted within its own private GitHub repository. Installing multiple custom packages from a single private GitHub repository is not supported.

Step 1: Specify the private repository in your project

To add a Python package from a private repository to your Astro project, specify the Secure Shell (SSH) URL for the repository in a new private-requirements.txt file. Use the following format for the SSH URL:

git+ssh://git@github.com/<your-github-organization-name>/<your-private-repository>.git

For example, to install mypackage1 and mypackage2 from myorganization, you would add the following to your private-requirements.txt file:

git+ssh://git@github.com/myorganization/mypackage1.git
git+ssh://git@github.com/myorganization/mypackage2.git

This example assumes that the name of each of your Python packages is identical to the name of its corresponding GitHub repository. In other words, mypackage1 is both the name of the package and the name of the repository.
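
When you maintain several private packages, the file content can be generated from a list of repository names. The following sketch assumes, as in the example above, that each package lives in its own repository on github.com; the function name private_requirements is hypothetical.

```python
def private_requirements(organization: str, repositories: list[str]) -> str:
    """Return private-requirements.txt content, one SSH URL per repository."""
    lines = [f"git+ssh://git@github.com/{organization}/{repo}.git" for repo in repositories]
    return "\n".join(lines) + "\n"


print(private_requirements("myorganization", ["mypackage1", "mypackage2"]), end="")
# git+ssh://git@github.com/myorganization/mypackage1.git
# git+ssh://git@github.com/myorganization/mypackage2.git
```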

Step 2: Update Dockerfile

  1. Optional. Copy and save any existing build steps in your Dockerfile.

  2. Add the following to your packages.txt file:

    openssh-client
    git
  3. In your Dockerfile, add the following instructions:

    USER root
    RUN mkdir -p -m 0700 ~/.ssh && \
    echo "github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl" >> ~/.ssh/known_hosts

    COPY private-requirements.txt .
    RUN --mount=type=ssh,id=github pip install --no-cache-dir --requirement private-requirements.txt
    USER astro

    ENV PATH="/home/astro/.local/bin:$PATH"

    In order, these instructions:

    • Switch to root user for SSH setup and installation from private repo
    • Add the fingerprint for GitHub to known_hosts
    • Copy your private-requirements.txt file into the image
    • Install Python-level packages from your private repository as specified in your private-requirements.txt file. This securely mounts your SSH key at build time, ensuring that the key itself is not stored in the resulting Docker image filesystem or metadata.
    • Switch back to astro user
    • Add the user bin directory to PATH
    info

    See GitHub's documentation for all available SSH key fingerprints.

    If your repository isn't hosted on GitHub, replace the fingerprint with one from where the package is hosted. Use ssh-keyscan to generate the fingerprint.

Step 3: Build a custom Docker image

  1. Run the following command to automatically generate a unique image name:

    image_name=astro-$(date +%Y%m%d%H%M%S)
  2. Run the following command to create a new Docker image from your Dockerfile. Replace <ssh-key> with your SSH private key file name.

    DOCKER_BUILDKIT=1 docker build -f Dockerfile --progress=plain --ssh=github="$HOME/.ssh/<ssh-key>" -t $image_name .
  3. Optional. Test your DAGs locally. See Restart your local environment.

  4. Deploy the image to Astro using the Astro CLI:

    astro deploy --image-name $image_name

Your Astro project can now utilize Python packages from your private GitHub repository.
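
If you prefer to script Step 3, the commands can be assembled in Python. This is a sketch only: the helper names are hypothetical, the script prints the commands instead of running them, and <ssh-key> is left as a placeholder for your own key file name.

```python
import os
from datetime import datetime


def make_image_name(now: datetime) -> str:
    """Mirror image_name=astro-$(date +%Y%m%d%H%M%S)."""
    return now.strftime("astro-%Y%m%d%H%M%S")


def build_command(image_name: str, ssh_key_path: str) -> list[str]:
    """Return the docker build argument list from step 2."""
    return [
        "docker", "build", "-f", "Dockerfile", "--progress=plain",
        f"--ssh=github={ssh_key_path}", "-t", image_name, ".",
    ]


def build_environment() -> dict[str, str]:
    """Copy the current environment and enable BuildKit, as DOCKER_BUILDKIT=1 does."""
    return {**os.environ, "DOCKER_BUILDKIT": "1"}


name = make_image_name(datetime.now())
command = build_command(name, os.path.expanduser("~/.ssh/<ssh-key>"))
print(" ".join(command))
# To run the build: subprocess.run(command, env=build_environment(), check=True)
print(f"astro deploy --image-name {name}")
```

Passing the arguments as a list avoids shell quoting problems if the key path contains spaces.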

Unsupported project configurations

You can't use airflow.cfg or airflow_local_settings.py files in an Astro project. airflow_local_settings.py has no effect on Astro Deployments, and airflow.cfg has no effect on local environments or Astro Deployments.

An alternative to using airflow.cfg is to set Airflow environment variables in your .env file. See Set environment variables locally.
