Develop Your Astro project
Overview
This document explains the various ways you can modify and build your Astro project to fit your team's use case. Specifically, this guide provides instructions on how to:
- Build and run a project
- Deploy changes to a project
- Add dependencies to your project
- Run on-build commands
- Add connections, pools, and environment variables locally
Prerequisites
To develop an Astro project and test it locally, you need:
- An existing Astro project.
- The Astro CLI
- Docker
Build and Run a Project Locally
To run your Astro project locally, run the following command:
astrocloud dev start
This command builds your project and spins up 4 Docker containers on your machine, each for a different Airflow component:
- Postgres: Airflow's metadata database
- Webserver: The Airflow component responsible for rendering the Airflow UI
- Scheduler: The Airflow component responsible for monitoring and triggering tasks
- Triggerer: The Airflow component responsible for running Triggers and signaling tasks to resume when their conditions have been met. The Triggerer is used exclusively for tasks that are run with deferrable operators.
Once the project builds, you can access the Airflow UI by going to http://localhost:8080/ and logging in with admin for both your username and password. You can also access your Postgres database at localhost:5432/postgres.
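If you have a Postgres client installed, you can connect to this database directly. As a quick sketch, assuming the Astro CLI's default postgres username and password for the local metadata database:
psql "postgresql://postgres:postgres@localhost:5432/postgres"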
info
The Astro CLI is a wrapper around Docker Compose, a tool for defining and running multi-container Docker applications. If you're familiar with Docker Compose, you'll recognize that the astrocloud dev start command, for example, is functionally equivalent to docker compose start.
tip
If you see Error: cannot start, project already running when you run this command, it means your local Airflow environment is already running your project. If there are changes you'd like to apply to your project, see Restart Your Local Environment.
Restart Your Local Environment
To restart your local Airflow environment, run the following command:
astrocloud dev restart
This command rebuilds your image and restarts the Docker containers running on your local machine with that new image. Alternatively, you can run just astrocloud dev stop to stop your Docker containers without restarting or rebuilding your project.
Make Changes to Your Project
All Astro projects require you to specify a Debian-based Astro Runtime image in a Dockerfile. When you run your project locally or on Astro, all of your DAG code, packages, and configurations are built into a Docker image based on Astro Runtime.
Depending on the change you're making to your Astro project, you might have to rebuild your image to run your changes locally.
DAG Code Changes
All changes made to files in the following directories will be live in your local Airflow environment as soon as you save them to your code editor:
- dags
- plugins
- include
Once you save your changes, refresh the Airflow UI in your browser to see them render.
Environment Changes
All changes made to the following files require rebuilding your image:
- packages.txt
- Dockerfile
- requirements.txt
- airflow_settings.yaml
To rebuild your project after making a change to any of these files, you must restart your local environment.
Explore Airflow Providers and Modules
As you customize your Astro project and expand your use case for Airflow, we recommend exploring the Astronomer Registry, a library of Airflow modules, providers, and DAGs that serve as the building blocks for data pipelines.
The Astronomer Registry includes:
- Example DAGs for many data sources and destinations. For example, you can build out a data quality use case with Snowflake and Great Expectations based on the Great Expectations Snowflake Example DAG.
- Documentation for Airflow providers, such as Databricks, Snowflake, and Postgres. This documentation is comprehensive and based on Airflow source code.
- Documentation for Airflow modules, such as the PythonOperator, BashOperator, and S3ToRedshiftOperator. These modules include guidance on how to set Airflow connections and their parameters.
As you browse the Astronomer Registry, follow this document for instructions on how to install providers as Python packages and make other changes to your Astro project.
Add Python and OS-level Packages
To build Python and OS-level packages into your Astro project, add Python packages to your requirements.txt file and OS-level packages to your packages.txt file.
To pin a version of a package, use the following syntax:
<package-name>==<version>
To exclusively use Pymongo 3.7.2, for example, add the following line to your requirements.txt file:
pymongo==3.7.2
If you don't pin a package to a version, the latest version of the package that's publicly available is installed by default.
Once you've saved these packages in your project files, restart your local environment.
Confirm your package was installed
If you added pymongo to your requirements.txt file, for example, you can confirm that it was properly installed by running a docker exec command into your Scheduler:
- Run docker ps to identify the Docker containers running on your machine.
- Copy the container ID of the Scheduler container.
- Run the following:
docker exec -it <scheduler-container-id> pip freeze | grep pymongo
pymongo==3.7.2
Add DAGs
DAGs are stored in the dags folder of your Astro project. To add a DAG to your project, add its .py file to this folder.
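For example, the following is a minimal sketch of a DAG file you could save to the dags folder. All names in this example are illustrative:
# dags/example_new_dag.py
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_new_dag",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # A single task that prints a message
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'hello from my new DAG'",
    )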
Add DAG Helper Functions
To build additional helper functions for DAGs into your Astro project, we recommend adding a folder with a set of files that can be used by Airflow DAGs.
To do this:
Add your directory of helper functions to your local project:
.
├── airflow_settings.yaml
├── dags
│   ├── example-dag-basic.py
│   └── example-dag-advanced.py
├── Dockerfile
├── helper_functions
│   └── helper.py
├── include
├── tests
│   └── test_dag_integrity.py
├── packages.txt
├── plugins
│   └── example-plugin.py
└── requirements.txt
In this example, the directory is named helper_functions. You can give it any name.
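Your DAGs can then import from this directory. As a hedged sketch, assuming helper_functions/helper.py defines a function named my_helper_function (a hypothetical name) and that the project root is on the Python path inside your Airflow containers:
# dags/uses_helper.py
# my_helper_function is a hypothetical function defined in helper_functions/helper.py
from helper_functions.helper import my_helper_function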
To confirm that your helper functions were successfully installed:
- Run docker ps to identify the Docker containers running on your machine.
- Copy the container ID of your Scheduler container.
- Run the following command to see your new directory in the container:
$ docker exec -it <scheduler-container-id> /bin/bash
bash-4.4$ ls
Dockerfile airflow_settings.yaml helper_functions logs plugins unittests.cfg
airflow.cfg dags include packages.txt requirements.txt
Configure airflow_settings.yaml (Local Development Only)
When you first initialize a new Astro project, a file called airflow_settings.yaml is automatically generated. With this file, you can configure and programmatically generate Airflow Connections, Pools, and Variables so that you don't have to manually redefine these values in the Airflow UI every time you restart your project.
As a security measure, airflow_settings.yaml works only in local environments. Once you deploy your project to a Deployment on Astro, the values in this file will not be included. To more easily manage Airflow secrets on Astro, we recommend configuring a secrets backend.
caution
If you are storing your project in a public directory or version control tool, we recommend adding this file to your .gitignore file or your tool's equivalent exclusion mechanism.
Add Airflow Connections, Pools, and Variables
By default, the airflow_settings.yaml file includes the following template:
airflow:
  connections: ## conn_id and conn_type are required
    - conn_id: my_new_connection
      conn_type: postgres
      conn_host: 123.0.0.4
      conn_schema: airflow
      conn_login: user
      conn_password: pw
      conn_port: 5432
      conn_extra:
  pools: ## pool_name and pool_slot are required
    - pool_name: my_new_pool
      pool_slot: 5
      pool_description:
  variables: ## variable_name and variable_value are required
    - variable_name: my_variable
      variable_value: my_value
This template includes default values for all possible configurations. Make sure to replace these default values with your own and specify those that are required to avoid errors at build time. To add another Connection, Pool, or Variable, append it to this file within its corresponding section. To create another Variable, for example, add it under the existing variables section of the same file:
variables:
  - variable_name: <my-variable-1>
    variable_value: <my-variable-value-1>
  - variable_name: <my-variable-2>
    variable_value: <my-variable-value-2>
Once you save these values in your airflow_settings.yaml, restart your local environment. When you access the Airflow UI locally, you should see these values in the Connections, Pools, and Variables tabs.
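Your DAG code can then reference these values by name. A short sketch based on the template above:
from airflow.models import Variable

# Returns "my_value", as defined in airflow_settings.yaml
my_value = Variable.get("my_variable")
Operators and hooks can likewise reference the example connection through its ID, my_new_connection, and the example pool through its name, my_new_pool.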
Run Commands on Build
To run additional commands as your Astro project is built into a Docker image, add them to your Dockerfile as RUN commands. These commands run as the last step in the image build process.
For example, if you want to run ls when your image builds, your Dockerfile would look like this:
FROM quay.io/astronomer/astro-runtime:5.0.1
RUN ls
This is supported both on Astro and in the context of local development.
Override the CLI's Docker Compose File (Local Development Only)
The Astro CLI is built on top of Docker Compose, which is a tool for defining and running multi-container Docker applications. You can override the CLI's Docker Compose configurations by adding a docker-compose.override.yml file to your Astro project. Any values in this file override the CLI's default settings whenever you run astrocloud dev start.
To see what values you can override, reference the CLI's Docker Compose file. The linked file is for the original Astro CLI, but the values are identical to those used in the Astro CLI. Common use cases for Docker Compose overrides include:
- Adding extra containers to mimic services that your Airflow environment needs to interact with locally, such as an SFTP server.
- Changing the volumes mounted to any of your local containers.
For example, to add another volume mount for a directory named custom_dependencies, add the following to your docker-compose.override.yml file:
version: "3.1"
services:
  scheduler:
    volumes:
      - /home/astronomer_project/custom_dependencies:/usr/local/airflow/custom_dependencies:ro
Make sure to specify version: "3.1" and follow the format of the source code file linked above.
To see your override file live in your local Airflow environment, run the following command for any container running Airflow:
docker exec -it <container-name> ls -al
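Similarly, to mimic an external service locally, such as the SFTP server mentioned above, you can add an extra container to the same file. The following is a hedged sketch using the public atmoz/sftp image; the credentials and port mapping are illustrative:
version: "3.1"
services:
  sftp:
    image: atmoz/sftp
    command: demo:demo:1001    # user:password:uid, all illustrative values
    ports:
      - "2222:22"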
info
The Astro CLI does not support overrides to environment variables that are required globally. For the list of environment variables that Astro enforces, see Global Environment Variables. To learn more about environment variables, read Environment Variables.
Set Environment Variables via .env (Local Development Only)
For Astro projects deployed on Astro, we generally recommend setting environment variables via the Cloud UI. For local development, you can use the Astro CLI to set environment variables in your project's .env file.
To add Environment Variables locally:
- Open the .env file in your Astro project directory.
- Add your environment variables to the .env file.
- Rebuild your image by running astrocloud dev start --env .env.
When setting environment variables in your .env file, use the following format:
AIRFLOW__CORE__DAG_CONCURRENCY=5
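A .env file can contain multiple variables, mixing Airflow configurations with your own values. For example, where MY_CUSTOM_VAR is a hypothetical name:
AIRFLOW__CORE__DAG_CONCURRENCY=5
AIRFLOW__CORE__LOAD_EXAMPLES=False
MY_CUSTOM_VAR=my-value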
tip
If your environment variables contain sensitive information or credentials that you don't want to expose in plain text, you may want to add your .env file to .gitignore before you push these changes to your version control tool.
Confirm your environment variables were applied
By default, Airflow environment variables are hidden in the Airflow UI for both local environments and Astro Deployments. To confirm your environment variables via the Airflow UI, set AIRFLOW__WEBSERVER__EXPOSE_CONFIG=True in either your Dockerfile or .env file.
Alternatively, you can run:
docker ps
This will output the Docker containers that comprise the Airflow environment on your local machine, including the Scheduler, Webserver, Triggerer, and Postgres metadata database.
Now, create a Bash session in your scheduler container by running:
docker exec -it <scheduler-container-name> /bin/bash
If you run ls -1 following this command, you'll see a list of the files in the container:
bash-5.0$ ls -1
Dockerfile airflow.cfg airflow_settings.yaml dags include logs packages.txt plugins requirements.txt unittests.cfg
Now, run:
env
This should output all environment variables that are set locally, some of which are set by you and some of which are set by Astronomer by default.
tip
You can also run cat airflow.cfg to output all contents of that file.
Use multiple .env files
The Astro CLI will look for .env by default, but if you want to specify multiple files, make .env a top-level directory and create sub-files within that folder.
A project with multiple .env files might look like the following:
my_project
├── Dockerfile
├── dags
│   └── my_dag
├── plugins
│   └── my_plugin
├── airflow_settings.yaml
└── .env
    ├── dev.env
    └── prod.env
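You can then point the CLI at a specific file when starting your project. For example, to load the development values from the structure above:
astrocloud dev start --env .env/dev.env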
Install Python Packages from Private Sources
Python packages can be installed from public and private locations into your image. To install public packages listed on PyPI, follow the steps in Add Python and OS-level Packages. To install packages listed on private PyPI indices or a private git-based repository, you need to complete additional configuration in your project.
Depending on where your private packages are stored, use one of the following setups to install your packages to an Astro project by customizing your Runtime image.
- Private GitHub Repo
- Private PyPI Index
Install Python Packages from Private GitHub Repositories
This topic provides instructions for building your Astro project with Python packages from a private GitHub repository. At a high level, this setup entails specifying your private packages in requirements.txt, creating a custom Docker image that mounts a GitHub SSH key for your private GitHub repositories, and building your project with this Docker image.
Although this setup is based on GitHub, the general steps can be completed with any hosted Git repository.
info
The following setup has been validated only with a single SSH key. Due to the nature of ssh-agent, you might need to modify this setup when using more than one SSH key per Docker image.
Prerequisites
To install Python packages from a private GitHub repository on Astro, you need:
- The Astro CLI.
- An Astro project.
- Custom Python packages that are installable via pip.
- A private GitHub repository for each of your custom Python packages.
- A GitHub SSH Private Key authorized to access your private GitHub repositories.
warning
If your organization enforces SAML single sign-on (SSO), you must first authorize your key to be used with that authentication method. For instructions, see GitHub documentation.
This setup assumes that each custom Python package is hosted within its own private GitHub repository. Installing multiple custom packages from a single private GitHub repository is not supported.
Step 1: Specify the Private Repository in Your Project
To add a Python package from a private repository to your Astro project, specify the repository's SSH URL in your project's requirements.txt file. This URL should be formatted as:
git+ssh://git@github.com/<your-github-organization-name>/<your-private-repository>.git
For example, to install mypackage1 and mypackage2 from myorganization, as well as numpy v1.22.1, you would add the following to your requirements.txt file:
git+ssh://git@github.com/myorganization/mypackage1.git
git+ssh://git@github.com/myorganization/mypackage2.git
numpy==1.22.1
This example assumes that the name of each of your Python packages is identical to the name of its corresponding GitHub repository. In other words, mypackage1 is both the name of the package and the name of the repository.
Step 2: Create Dockerfile.build
1. In your Astro project, create a duplicate of your Dockerfile and name it Dockerfile.build.
2. In Dockerfile.build, add AS stage to the FROM line which specifies your Runtime image. For example, if you use Runtime 5.0.0, your FROM line would be:
FROM quay.io/astronomer/astro-runtime:5.0.0-base AS stage1
info
If you currently use the default distribution of Astro Runtime, replace your existing image with its corresponding -base image as demonstrated in the example above. The -base distribution is built to be customizable and does not include default build logic. For more information on Astro Runtime distributions, see Distributions.
3. In Dockerfile.build, after the FROM line specifying your Runtime image, add the following configuration:
LABEL maintainer="Astronomer <humans@astronomer.io>"
ARG BUILD_NUMBER=-1
LABEL io.astronomer.docker=true
LABEL io.astronomer.docker.build.number=$BUILD_NUMBER
LABEL io.astronomer.docker.airflow.onbuild=true
# Install Python and OS-Level Packages
COPY packages.txt .
RUN apt-get update && cat packages.txt | xargs apt-get install -y
FROM stage1 AS stage2
USER root
RUN apt-get -y install git python3 openssh-client \
&& mkdir -p -m 0600 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
# Install Python Packages
COPY requirements.txt .
RUN --mount=type=ssh,id=github pip install --no-cache-dir -q -r requirements.txt
FROM stage1 AS stage3
# Copy requirements directory
COPY --from=stage2 /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
COPY . .
In order, these commands:
- Install any OS-level packages specified in packages.txt.
- Securely mount your SSH key at build time. This ensures that the key itself is not stored in the resulting Docker image filesystem or metadata.
- Install Python-level packages from your private repository as specified in your requirements.txt file.
tip
This example Dockerfile.build assumes Python 3.9, but some versions of Astro Runtime may be based on a different version of Python. If your image is based on a version of Python that is not 3.9, replace python3.9 in the COPY command under the # Copy requirements directory section of your Dockerfile.build with the correct Python version.
To identify the Python version in your Astro Runtime image, run:
docker run quay.io/astronomer/astro-runtime:<runtime-version>-base python --version
Make sure to replace <runtime-version> with your own.
info
If your repository is hosted somewhere other than GitHub, replace the domain in the ssh-keyscan command with the domain where the package is hosted.
Step 3: Build a Custom Docker Image
1. Run the following command to create a new Docker image from your Dockerfile.build file, making sure to replace <ssh-key> with your SSH private key file name and <astro-runtime-image> with your Astro Runtime image:
DOCKER_BUILDKIT=1 docker build -f Dockerfile.build --progress=plain --ssh=github="$HOME/.ssh/<ssh-key>" -t custom-<astro-runtime-image> .
For example, if you have quay.io/astronomer/astro-runtime:5.0.0-base in your Dockerfile.build, this command would be:
DOCKER_BUILDKIT=1 docker build -f Dockerfile.build --progress=plain --ssh=github="$HOME/.ssh/<authorized-key>" -t custom-astro-runtime-5.0.0-base .
2. Replace the contents of your Astro project's Dockerfile with the following, making sure to replace <astro-runtime-image> with the image you just built:
FROM custom-<astro-runtime-image>
For example, if you tagged your image custom-astro-runtime-5.0.0-base, this line would be:
FROM custom-astro-runtime-5.0.0-base
Your Astro project can now use Python packages from your private GitHub repository. To test your DAGs, you can either run your project locally or deploy to Astro.
Install Python Packages from a Private PyPI Index
This topic provides instructions for building your Astro project with Python packages from a private PyPI index. In some organizations, Python packages are prebuilt and pushed to a hosted private pip server (such as pypiserver or Nexus Repository) or a managed service (such as PackageCloud or GitLab). At a high level, this setup requires specifying your private packages in requirements.txt, creating a custom Docker image that changes where pip looks for packages, and building your project with this Docker image.
Prerequisites
To build from a private repository, you need:
- An Astro project.
- A private PyPI index with username and password authentication.
Step 1: Add privately hosted packages to requirements.txt
Privately hosted packages should already be built and pushed to the private repository. Depending on the repository used, you should be able to browse it to find the package name and version you need. Add the package name and (optional) version to requirements.txt using the same syntax as for publicly listed packages on PyPI. The requirements.txt file can contain a mixture of both publicly accessible and private packages.
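For example, a requirements.txt file that mixes public and private packages might look like the following, where my-internal-package is a hypothetical private package name:
numpy==1.22.1
my-internal-package==1.0.0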
caution
Ensure that the name of the package on the private repository does not clash with any existing Python packages on public indices. The order in which pip searches indices might produce unexpected results.
Step 2: Create Dockerfile.build
1. In your Astro project, create a duplicate of your Dockerfile named Dockerfile.build.
2. In Dockerfile.build, add AS stage to the FROM line which specifies your Runtime image. For example, if you use Runtime 5.0.0, your FROM line would be:
FROM quay.io/astronomer/astro-runtime:5.0.0-base AS stage1
info
If you currently use the default distribution of Astro Runtime, replace your existing image with its corresponding -base image as demonstrated in the example above. The -base distribution is built to be customizable and does not include default build logic. For more information on Astro Runtime distributions, see Distributions.
3. In Dockerfile.build, after the FROM line specifying your Runtime image, add the following configuration. The URL of your private index is supplied through the PIP_EXTRA_INDEX_URL build argument in Step 3:
LABEL maintainer="Astronomer <humans@astronomer.io>"
ARG BUILD_NUMBER=-1
LABEL io.astronomer.docker=true
LABEL io.astronomer.docker.build.number=$BUILD_NUMBER
LABEL io.astronomer.docker.airflow.onbuild=true
# Install Python and OS-Level Packages
COPY packages.txt .
RUN apt-get update && cat packages.txt | xargs apt-get install -y
FROM stage1 AS stage2
# Install Python Packages
ARG PIP_EXTRA_INDEX_URL
ENV PIP_EXTRA_INDEX_URL=${PIP_EXTRA_INDEX_URL}
COPY requirements.txt .
RUN pip install --no-cache-dir -q -r requirements.txt
FROM stage1 AS stage3
# Copy requirements directory
COPY --from=stage2 /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
COPY . .
In order, these commands:
- Complete the standard installation of OS-level packages in packages.txt.
- Add the environment variable PIP_EXTRA_INDEX_URL to instruct pip on where to look for non-public packages.
- Install public and private Python-level packages from your requirements.txt file.
Step 3: Build a Custom Docker Image
1. Run the following command to create a new Docker image from your Dockerfile.build file, making sure to substitute in the pip repository and associated credentials:
DOCKER_BUILDKIT=1 docker build -f Dockerfile.build --progress=plain --build-arg PIP_EXTRA_INDEX_URL=https://${<repo-username>}:${<repo-password>}@<private-pypi-repo-domain-name> -t custom-<astro-runtime-image> .
For example, if you have quay.io/astronomer/astro-runtime:5.0.0 in your Dockerfile.build, this command would be:
DOCKER_BUILDKIT=1 docker build -f Dockerfile.build --progress=plain --build-arg PIP_EXTRA_INDEX_URL=https://${<repo-username>}:${<repo-password>}@<private-pypi-repo-domain-name> -t custom-astro-runtime-5.0.0 .
2. Replace the contents of your Astro project's Dockerfile with the following, making sure to replace <astro-runtime-image> with the image you just built:
FROM custom-<astro-runtime-image>
For example, if you tagged your image custom-astro-runtime-5.0.0, this line would be:
FROM custom-astro-runtime-5.0.0
Your Astro project can now use Python packages from your private PyPI index. To test your DAGs, you can either run your project locally or deploy to Astro.