An Airflow Deployment on Astronomer is an instance of Apache Airflow that was created either via the Software UI or the Astronomer CLI. Each Airflow Deployment on Astronomer is hosted on a single Kubernetes namespace, has a dedicated set of resources, and operates with an isolated Postgres Metadata Database.
This guide walks you through the process of creating and configuring an Airflow Deployment on Astronomer.
To create an Airflow Deployment, you'll need:
Create a Deployment
To create an Airflow Deployment on Astronomer:
Log in to your Astronomer platform at app.BASEDOMAIN, open your Workspace, and click New Deployment.
Use the New Deployment menu to configure the following:
- Description (Optional)
- Executor: We recommend starting with Local.
Click Create Deployment and give the Deployment a few moments to spin up. Once it's ready, you'll have access to the Settings page of your new Deployment.
The Settings tab is the best place to modify resources for your Deployment. Specifically, you can:
- Select an Airflow Executor
- Allocate resources to your Airflow Scheduler and Webserver
- Add Extra Capacity (Kubernetes only)
- Set Worker Count (Celery only)
- Adjust your Worker Termination Grace Period (Celery only)
The rest of this guide provides additional guidance for configuring each of these settings.
Select an Executor
The Airflow Executor works closely with the Airflow Scheduler to decide what resources will complete tasks as they're queued. The difference between Executors comes down to their available resources and how they utilize those resources to distribute work.
Astronomer supports 3 Executors:
- Local Executor
- Celery Executor
- Kubernetes Executor
Though it largely depends on your use case, we recommend the Local Executor for development environments and the Celery or Kubernetes Executors for production environments operating at scale.
For a detailed breakdown of each Executor, read Astronomer's Airflow Executors Explained.
Scale Core Resources
Apache Airflow requires two primary components:
- The Airflow Webserver
- The Airflow Scheduler
To scale either resource, simply adjust the corresponding slider in the Software UI to increase its available computing power.
Read the following sections to help you determine which core resources to scale and when.
The Airflow Webserver is responsible for rendering the Airflow UI, where users can monitor DAGs, view task logs, and set various non-code configurations.
If a function within the Airflow UI is slow or unavailable, we recommend increasing the AU (Astronomer Units) allocated to the Webserver. The default resource allocation is 5 AU.
Note: Introduced in Airflow 1.10.7, DAG Serialization removes the need for the Webserver to regularly parse all DAG files, making the component significantly more lightweight and performant. DAG Serialization is enabled by default in Airflow 1.10.12+ and is required in Airflow 2.0.
The Airflow Scheduler is responsible for monitoring task execution and triggering downstream tasks once dependencies have been met.
If you experience delays in task execution, which you can track via the Gantt Chart view of the Airflow UI, we recommend increasing the AU allocated to the Scheduler. The default resource allocation is 10 AU.
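To get a feel for what these defaults represent in raw compute, here is a minimal Python sketch that converts AU into CPU and memory. It assumes AU scales linearly, so that 1 AU equals 0.1 CPU and 0.375 GB of memory; that ratio is derived from the Worker Resources maximum of 100 AU (10 CPU, 37.5 GB Memory) stated later in this guide. The `Resources` type and `au_to_resources` helper are hypothetical names for illustration, not part of an Astronomer API.

```python
from dataclasses import dataclass

# Assumption: AU scales linearly, so 1 AU = 0.1 CPU and 0.375 GB memory
# (derived from "100 AU (10 CPU, 37.5 GB Memory)" in this guide).
CPU_PER_AU = 0.1
MEMORY_GB_PER_AU = 0.375


@dataclass(frozen=True)
class Resources:
    """Approximate compute represented by an AU allocation (hypothetical type)."""
    cpu: float
    memory_gb: float


def au_to_resources(au: int) -> Resources:
    """Convert an AU allocation into approximate CPU and memory."""
    if au < 0:
        raise ValueError("AU allocation must be non-negative")
    # Round to hide floating-point noise such as 3.0000000000000004.
    return Resources(
        cpu=round(au * CPU_PER_AU, 3),
        memory_gb=round(au * MEMORY_GB_PER_AU, 3),
    )


# Default core-resource allocations stated in this guide.
DEFAULT_AU = {"webserver": 5, "scheduler": 10}

if __name__ == "__main__":
    for component, au in DEFAULT_AU.items():
        r = au_to_resources(au)
        print(f"{component}: {au} AU -> {r.cpu} CPU, {r.memory_gb} GB memory")
```

Under that assumption, the default Webserver allocation (5 AU) works out to 0.5 CPU and 1.875 GB of memory, and the default Scheduler allocation (10 AU) to 1.0 CPU and 3.75 GB.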
Tip: To set alerts that notify you via email when your Airflow Scheduler is underprovisioned, refer to Airflow Alerts.
Kubernetes Executor: Set Extra Capacity
The Kubernetes Executor and KubernetesPodOperator each spin up an individual Kubernetes pod for each task that needs to be executed, then spin down the pod once that task is completed.
The amount of AU (CPU and Memory) allocated to Extra Capacity maps to resource quotas on the Kubernetes Namespace in which your Airflow Deployment lives on Astronomer. More specifically, Extra Capacity represents the maximum possible resources that could be provisioned to a pod at any given time.
AU allocated to Extra Capacity does not affect Scheduler or Webserver performance and does not represent actual usage. It will not be charged as a fixed resource.
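One way to picture Extra Capacity is as a ceiling that task pods draw against. The Python sketch below assumes the quota caps the combined requests of all task pods running in the namespace at the same time, which is how Kubernetes ResourceQuota objects behave, and it collapses CPU and memory into a single AU figure for simplicity. The function names and the 20 AU quota are hypothetical.

```python
from typing import Sequence


def remaining_extra_capacity(extra_capacity_au: int, running_pod_au: Sequence[int]) -> int:
    """AU still available under the namespace quota, given pods already running."""
    return extra_capacity_au - sum(running_pod_au)


def can_launch_pod(extra_capacity_au: int, running_pod_au: Sequence[int], new_pod_au: int) -> bool:
    """Return True if a new task pod fits under the Extra Capacity quota.

    Models ResourceQuota semantics: the quota limits the sum of requests
    across all pods in the namespace, not each pod on its own.
    """
    return new_pod_au <= remaining_extra_capacity(extra_capacity_au, running_pod_au)


if __name__ == "__main__":
    quota = 20        # hypothetical Extra Capacity setting, in AU
    running = [5, 5]  # hypothetical task pods currently running
    print(can_launch_pod(quota, running, 10))  # fits exactly: True
    print(can_launch_pod(quota, running, 11))  # exceeds the quota: False
```

Kubernetes rejects the creation of a pod that would push a namespace past its ResourceQuota. Headroom that no pod is using is simply unused, which is consistent with Extra Capacity not being charged as a fixed resource.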
Celery Executor: Configure Workers
To optimize for flexibility and availability, the Celery Executor works with a set of independent Celery Workers across which it can delegate tasks. On Astronomer, you're free to configure your Celery Workers to fit your use case.
By adjusting the Worker Count slider, users can provision up to 20 Celery Workers on any Airflow Deployment.
Each individual Worker will be provisioned with the AU specified in Worker Resources. If you set Worker Resources to 10 AU and Worker Count to 3, for example, your Airflow Deployment will run with 3 Celery Workers using 10 AU each for a total of 30 AU. Worker Resources has a maximum of 100 AU (10 CPU, 37.5 GB Memory).
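The Worker arithmetic above can be captured in a short Python sketch that also enforces the limits stated in this guide (up to 20 Workers and up to 100 AU per Worker). The lower bound of 1 for both settings and the linear 1 AU = 0.1 CPU / 0.375 GB conversion are assumptions, and the function names are hypothetical.

```python
# Upper limits stated in this guide; the lower bound of 1 is an assumption.
MAX_WORKER_COUNT = 20
MAX_WORKER_AU = 100

# Assumption: 1 AU = 0.1 CPU and 0.375 GB memory (from 100 AU = 10 CPU, 37.5 GB).
CPU_PER_AU = 0.1
MEMORY_GB_PER_AU = 0.375


def total_worker_au(worker_count: int, worker_au: int) -> int:
    """Total AU used by Celery Workers; each Worker gets `worker_au` AU."""
    if not 1 <= worker_count <= MAX_WORKER_COUNT:
        raise ValueError(f"Worker Count must be between 1 and {MAX_WORKER_COUNT}")
    if not 1 <= worker_au <= MAX_WORKER_AU:
        raise ValueError(f"Worker Resources must be between 1 and {MAX_WORKER_AU} AU")
    return worker_count * worker_au


def describe_workers(worker_count: int, worker_au: int) -> str:
    """Human-readable summary of the total Worker allocation."""
    total = total_worker_au(worker_count, worker_au)
    cpu = round(total * CPU_PER_AU, 3)
    memory_gb = round(total * MEMORY_GB_PER_AU, 3)
    return f"{worker_count} Workers x {worker_au} AU = {total} AU ({cpu} CPU, {memory_gb} GB)"


if __name__ == "__main__":
    # The example from the text: 3 Workers at 10 AU each.
    print(describe_workers(3, 10))
```

For the example in the text, this reports 30 AU in total, which under the assumed conversion is 3.0 CPU and 11.25 GB of memory.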
Worker Termination Grace Period
On Astronomer, Celery Workers restart following every code deploy to your Airflow Deployment. This is to make sure that Workers are executing with the most up-to-date code. To minimize disruption during task execution, however, Astronomer supports the ability to set a Worker Termination Grace Period.
If a deploy is triggered while a Celery Worker is executing a task and Worker Termination Grace Period is set, the Worker will continue to process that task up to a certain number of minutes before restarting itself. By default, the grace period is ten minutes.
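The grace period behavior can be modeled with a simplified Python sketch, assuming a running task either finishes within the grace period or is interrupted when the Worker restarts. Passing a grace period of 0 approximates the immediate restart described in the Local Executor tip that follows. The function name is hypothetical, and real restart mechanics involve more than this model captures.

```python
DEFAULT_GRACE_PERIOD_MINUTES = 10  # default stated in this guide


def outcome_on_deploy(
    remaining_task_minutes: float,
    grace_period_minutes: float = DEFAULT_GRACE_PERIOD_MINUTES,
) -> str:
    """Simplified model of a running task when a deploy restarts its Worker.

    Assumes the Worker keeps processing for up to `grace_period_minutes`
    and then restarts, interrupting anything still running.
    """
    if remaining_task_minutes < 0 or grace_period_minutes < 0:
        raise ValueError("durations must be non-negative")
    if remaining_task_minutes <= grace_period_minutes:
        return "completes before restart"
    return "interrupted by restart"


if __name__ == "__main__":
    print(outcome_on_deploy(4))   # finishes inside the 10-minute default
    print(outcome_on_deploy(25))  # still running when the grace period ends
    # A grace period of 0 approximates an immediate restart (Local Executor).
    print(outcome_on_deploy(4, grace_period_minutes=0))
```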
Tip: The Worker Termination Grace Period is an advantage of the Celery Executor. If your Airflow Deployment runs on the Local Executor, the Scheduler will restart immediately upon every code deploy or configuration change and potentially interrupt task execution.
Set Environment Variables
Environment Variables can be used to set Airflow configurations and custom values, both of which can be applied to your Airflow Deployment either locally or on Astronomer.
These can include setting Airflow Parallelism, an SMTP service for Alerts, or a secrets backend to manage Airflow Connections and Variables.
Environment Variables can be set for your Airflow Deployment either in the Variables tab of the Software UI or in your Dockerfile. If you're developing locally, they can also be added to a local .env file. For more information on configuring Environment Variables, read Environment Variables on Astronomer.
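Airflow reads any configuration option from an environment variable named `AIRFLOW__{SECTION}__{KEY}`. Airflow Parallelism (`parallelism` in the `[core]` section) therefore becomes `AIRFLOW__CORE__PARALLELISM`, and a secrets backend is set with `AIRFLOW__SECRETS__BACKEND`. The Python sketch below builds those names and parses a minimal local `.env` file. The parser is a simplified, hypothetical helper that only handles plain `KEY=VALUE` lines, and the sample values are illustrative.

```python
import os
from typing import Dict


def airflow_env_var(section: str, key: str) -> str:
    """Return the environment variable name Airflow checks for [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical helper: no variable expansion, multi-line values, or escapes.
    """
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


if __name__ == "__main__":
    sample_env = """
# Hypothetical local .env file
AIRFLOW__CORE__PARALLELISM=32
AIRFLOW__SMTP__SMTP_HOST="smtp.example.com"
"""
    parsed = parse_dotenv(sample_env)
    name = airflow_env_var("core", "parallelism")
    print(name, "=", parsed[name])
    # In a running Airflow container, the value comes from the real environment.
    print(os.environ.get(name, "<not set in this shell>"))
```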
Customize Release Names
An Airflow Deployment's release name on Astronomer is a unique, immutable identifier for that Deployment that corresponds to its Kubernetes namespace and that renders in Grafana, Kibana, and other platform-level monitoring tools. By default, release names are randomly generated in the following format: noun-noun-<4-digit-number>.
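If you want to check names against the default format, a regular expression is enough. The Python sketch below assumes each noun is made of lowercase ASCII letters. Because a release name corresponds to a Kubernetes namespace, it also includes a check against the RFC 1123 DNS label rules Kubernetes applies to namespace names (at most 63 characters, lowercase alphanumerics or `-`, starting and ending with an alphanumeric). Treat that as a minimum bar, since Astronomer may apply stricter rules to custom release names. The example names are made up.

```python
import re

# Default format from this guide: noun-noun-<4-digit-number>.
# Assumption: each noun is made of lowercase ASCII letters.
DEFAULT_RELEASE_NAME = re.compile(r"[a-z]+-[a-z]+-[0-9]{4}")

# RFC 1123 DNS label, which Kubernetes requires for namespace names.
DNS_1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def matches_default_format(name: str) -> bool:
    """True if `name` looks like a randomly generated release name."""
    return DEFAULT_RELEASE_NAME.fullmatch(name) is not None


def is_dns_label(name: str) -> bool:
    """True if `name` is a valid RFC 1123 label (max 63 characters)."""
    return len(name) <= 63 and DNS_1123_LABEL.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["comet-harbor-4821", "comet-harbor-482", "my-etl-prod", "My_ETL"]:
        print(candidate, matches_default_format(candidate), is_dns_label(candidate))
```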
To customize the release name for a Deployment as you're creating it, you first need to enable the feature on your Astronomer platform. To do so, set the following value in your config.yaml file:
manualReleaseNames: true # Allows you to set your release names
Then, push the updated config.yaml file to your installation as described in Apply a Config Change.
After applying this change, the Release Name field in the Software UI becomes configurable.
Delete a Deployment
You can delete an Airflow Deployment using the Delete Deployment button at the bottom of the Deployment's Settings tab.
When you delete a Deployment, your Airflow Webserver, Scheduler, metadata database, and deploy history will be deleted, and you will lose any configurations set in the Airflow UI.
In your Astronomer database, the corresponding Deployment record will be given a deletedAt value and continue to persist until permanently deleted.
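Keeping a record around with a deletedAt timestamp is a soft-delete pattern. The Python sketch below illustrates the idea with a hypothetical, simplified `DeploymentRecord`; it does not reflect Astronomer's actual database schema beyond the deletedAt value named above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class DeploymentRecord:
    """Hypothetical, simplified stand-in for a Deployment row."""
    release_name: str
    deleted_at: Optional[datetime] = None  # mirrors the deletedAt value


def soft_delete(record: DeploymentRecord, now: Optional[datetime] = None) -> None:
    """Mark a record deleted without removing it; later calls keep the first timestamp."""
    if record.deleted_at is None:
        record.deleted_at = now if now is not None else datetime.now(timezone.utc)


def active_deployments(records: List[DeploymentRecord]) -> List[DeploymentRecord]:
    """Records that have not been soft-deleted."""
    return [r for r in records if r.deleted_at is None]


if __name__ == "__main__":
    records = [DeploymentRecord("comet-harbor-4821"), DeploymentRecord("quartz-meadow-1093")]
    soft_delete(records[0])
    print([r.release_name for r in active_deployments(records)])  # ['quartz-meadow-1093']
    print(len(records))  # 2: the deleted record still persists
```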