Test and Troubleshoot Locally
Overview
As you develop data pipelines on Astro, we strongly recommend running and testing your DAGs locally before deploying your project to a Deployment on Astro. This document provides information about testing and troubleshooting DAGs in a local Apache Airflow environment with the Astro CLI.
Run a Project Locally
Whenever you want to test your code, the first step is always to start a local Airflow environment. To run your project in a local Airflow environment, follow the steps in Build and Run a Project.
Test DAGs with the Astro CLI
To enhance the testing experience for data pipelines, Astro enables users to run DAG unit tests with two different Astro CLI commands:
astrocloud dev parse
astrocloud dev pytest
Parse DAGs
To quickly parse your DAGs, you can run:
astrocloud dev parse
This command parses your DAGs to ensure that they don't contain any basic syntax or import errors and that they can successfully render in the Airflow UI.
Generally speaking, astrocloud dev parse is a more convenient but less customizable version of astrocloud dev pytest. If you don't have any specific test files that you want to run on your DAGs, then we recommend using astrocloud dev parse as your primary testing tool. For more information about this command, see the CLI Command Reference.
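For example, a file like the following minimal sketch is the kind of DAG that astrocloud dev parse validates. The file name and DAG are hypothetical; a syntax error or broken import in a file like this would be reported without starting a local Airflow environment.

# dags/example_parse_check.py (hypothetical file for illustration)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A deliberately simple DAG: parsing it only requires that the imports resolve
# and that the DAG object can be constructed.
with DAG(
    dag_id="example_parse_check",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo hello")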
Run Tests with Pytest
To perform unit tests on your Astro project, you can run:
astrocloud dev pytest
This command runs all tests in your project's tests directory with pytest, a testing framework for Python. With pytest, you can test custom Python code and operators locally without having to start a local Airflow environment.
By default, the tests directory in your Astro project includes a DAG integrity test called test_dag_integrity.py. This test checks that:
- All Airflow tasks have required arguments.
- DAG IDs are unique across the Astro project.
- DAGs have no cycles.
- There are no general import or syntax errors.
astrocloud dev pytest runs this default test alongside any other custom tests that you add to the tests directory. For more information about this command, see the CLI Command Reference.
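As an illustration, a custom test that you add to the tests directory could look like the following sketch. The file name and the tag policy it enforces are hypothetical; it assumes your DAG files live in the project's dags directory.

# tests/test_dag_policies.py (hypothetical file for illustration)
import os

from airflow.models import DagBag

# Load every DAG in the project's dags directory, skipping Airflow's example DAGs.
DAG_FOLDER = os.path.join(os.path.dirname(__file__), "..", "dags")
dag_bag = DagBag(dag_folder=DAG_FOLDER, include_examples=False)


def test_no_import_errors():
    # Any DAG file that fails to import is recorded in import_errors.
    assert dag_bag.import_errors == {}


def test_dags_have_tags():
    # Hypothetical team policy: every DAG must declare at least one tag.
    for dag_id, dag in dag_bag.dags.items():
        assert dag.tags, f"{dag_id} has no tags"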
View Airflow Task Logs
You can view logs for individual tasks in the Airflow UI. This is useful if you want to troubleshoot a specific task instance that failed or retried.
To access these logs:
- Access the Airflow UI in your local Airflow environment by going to http://localhost:8080.
- Open the DAG you want to troubleshoot.
- In the Tree View, click the task run you want to see logs for.
- Click Instance Details.
- Click Log.
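If you prefer to pull task logs programmatically instead of through the UI, you can also query the Airflow REST API of your local environment. The sketch below is an assumption-laden example: it assumes the stable REST API is reachable at http://localhost:8080 with the local default admin/admin basic-auth credentials, and the DAG ID, run ID, and task ID are placeholders.

import requests

# Placeholders: replace with a real DAG ID, DAG run ID, and task ID from your environment.
DAG_ID = "example_dag"
RUN_ID = "manual__2022-01-01T00:00:00+00:00"
TASK_ID = "task-one"

response = requests.get(
    f"http://localhost:8080/api/v1/dags/{DAG_ID}/dagRuns/{RUN_ID}"
    f"/taskInstances/{TASK_ID}/logs/1",
    auth=("admin", "admin"),  # local default credentials; adjust if yours differ
    headers={"Accept": "text/plain"},  # return raw log text instead of JSON
)
response.raise_for_status()
print(response.text)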
Access Airflow Component Logs
To show logs for your Airflow Scheduler, Webserver, or metadata database locally, run the following command:
astrocloud dev logs
Once you run this command, the most recent logs for these components appear in your terminal window.
By default, running astrocloud dev logs shows logs for all Airflow components. If you want to see logs for a specific component, add any of the following flags to your command:
--scheduler
--webserver
--triggerer
To continue monitoring logs, run astrocloud dev logs --follow. The --follow flag ensures that the latest logs continue to appear in your terminal window. For more information about this command, see the CLI Command Reference.
Run Airflow CLI Commands
To run Apache Airflow CLI commands locally, run the following:
astrocloud dev run <airflow-cli-command>
For example, the Airflow CLI command for viewing the values of your airflow.cfg file is airflow config list. To run this command with the Astro CLI, you would run astrocloud dev run config list instead.
In practice, running astrocloud dev run is the equivalent of running docker exec in your local containers and then running an Airflow CLI command within those containers.
tip
You can only use astrocloud dev run in a local Airflow environment. To automate Airflow actions on Astro, you can use the Airflow REST API. For example, you can make a request to the dagRuns endpoint to trigger a DAG run programmatically, which is equivalent to running airflow dags trigger via the Airflow CLI.
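For reference, a request to the dagRuns endpoint could look like the following sketch. The Deployment URL, API token, and DAG ID are placeholders, and the Bearer-token authentication shown here is an assumption; use whichever authentication method your Deployment requires.

import requests

# Placeholders: replace with your Deployment's Airflow URL, an API token, and a real DAG ID.
DEPLOYMENT_URL = "https://<your-deployment-url>"
API_TOKEN = "<your-api-token>"
DAG_ID = "example_dag"

response = requests.post(
    f"{DEPLOYMENT_URL}/api/v1/dags/{DAG_ID}/dagRuns",
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
    json={},  # an empty body creates a run for the current logical date
)
response.raise_for_status()
print(response.json()["dag_run_id"])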
Test the KubernetesPodOperator Locally
Testing DAGs with the KubernetesPodOperator locally requires a local Kubernetes environment. Follow the steps in this topic to create a local Kubernetes environment and monitor the status and logs of individual Kubernetes pods running your task.
Step 1: Start Running Kubernetes
To run Kubernetes locally:
- In Docker Desktop, go to Settings > Kubernetes.
- Check the Enable Kubernetes checkbox.
- Save your changes and restart Docker.
Step 2: Get Your Kubernetes Configuration
- Open the $HOME/.kube directory that was created when you enabled Kubernetes in Docker.
- Open the config file in this directory.
- Under clusters, you should see one cluster with server: http://localhost:8080. Change this to server: https://kubernetes.docker.internal:6443. If this doesn't work, try server: https://host.docker.internal:6445.
- In your Astro project, open your include directory and create a new directory called .kube. Copy the config file that you edited into this directory.
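Optionally, you can sanity-check the edited file before moving on. The sketch below assumes the kubernetes Python package is installed on your machine and that the copied file sits at include/.kube/config inside your Astro project.

from kubernetes import client, config

# Load the config file you copied into the project and point it at the
# docker-desktop context used in the steps above.
config.load_kube_config(
    config_file="include/.kube/config",
    context="docker-desktop",
)

# If the server address you set is reachable, this prints the cluster version.
print(client.VersionApi().get_code().git_version)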
Step 3: Instantiate the KubernetesPodOperator
To instantiate the KubernetesPodOperator in a given DAG, update your DAG file to include the following code:
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator
from airflow.configuration import conf

# ...

namespace = conf.get('kubernetes', 'NAMESPACE')

# This will detect the default namespace locally and read the
# environment namespace when deployed to Astronomer.
if namespace == 'default':
    config_file = '/usr/local/airflow/include/.kube/config'
    in_cluster = False
else:
    in_cluster = True
    config_file = None

with dag:
    k = KubernetesPodOperator(
        namespace=namespace,
        image="my-image",
        labels={"foo": "bar"},
        name="airflow-test-pod",
        task_id="task-one",
        in_cluster=in_cluster,  # if True, looks in the cluster for configuration; if False, uses config_file
        cluster_context='docker-desktop',  # ignored when in_cluster is True
        config_file=config_file,
        is_delete_operator_pod=True,
        get_logs=True,
    )
Specifically, your operator must have cluster_context='docker-desktop' and config_file=config_file.
Step 4: Run and Monitor the KubernetesPodOperator
After updating your DAG, run astrocloud dev restart from the Astro CLI to rebuild your image and run your project in a local Airflow environment.
To examine the logs for any pods that were created by the operator, you can use the following kubectl commands:
kubectl get pods -n $namespace
kubectl logs {pod_name} -n $namespace
By default, Docker Desktop runs pods in a namespace called default.
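If you'd rather inspect pods from Python than from kubectl, the following sketch mirrors the two commands above. It assumes the kubernetes Python package is installed and uses the default namespace; the pod name is a placeholder.

from kubernetes import client, config

config.load_kube_config()  # reads $HOME/.kube/config by default
v1 = client.CoreV1Api()

# Equivalent of: kubectl get pods -n default
for pod in v1.list_namespaced_pod(namespace="default").items:
    print(pod.metadata.name, pod.status.phase)

# Equivalent of: kubectl logs {pod_name} -n default
# Replace "{pod_name}" with one of the pod names printed above.
print(v1.read_namespaced_pod_log(name="{pod_name}", namespace="default"))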
Hard Reset Your Local Environment
In most cases, restarting your local project is sufficient for testing and making changes to your project. However, it is sometimes necessary to kill your Docker containers and metadata database for testing purposes. To do so, run the following command:
astrocloud dev kill
This command forces your running containers to stop and deletes all data associated with your local Postgres metadata database, including Airflow Connections, logs, and task history.