
Migrate to Astro from Amazon MWAA

Migrate your Airflow environment from Amazon Managed Workflows for Apache Airflow (MWAA) to Astro.

To complete the migration process, you will:

  • Prepare your source Airflow environment and create a Deployment on Astro.
  • Migrate metadata from your source Airflow environment.
  • Migrate DAGs and additional Airflow components from your source Airflow environment.
  • Complete the cutover process to Astro.

Prerequisites

Before starting the migration, ensure that the following are true:

On your local machine, make sure you have:

On the cloud service from which you're migrating, ensure that you have:

  • A source Airflow environment on Airflow 2 or later.
  • Read access to the source Airflow environment.
  • Read access to any cloud storage buckets that store your DAGs.
  • Read access to any source control tool that hosts your current Airflow code, such as GitHub.
  • Permission to create new repositories on your source control tool.
  • (Optional) Access to your secrets backend.
  • (Optional) Permission to create new CI/CD pipelines.

All source Airflow environments on 1.x need to be upgraded to at least Airflow 2.0 before you can migrate them. Astronomer professional services can help you with the upgrade process.

(Optional) You can use the AWS CLI to expedite some of the steps in this guide.

Step 1: Install Astronomer Starship

The Astronomer Starship migration utility connects your source Airflow environment to your Astro Deployment and migrates your Airflow connections, Airflow variables, environment variables, and DAGs.

The Starship migration utility works as a plugin with a user interface, or as an Airflow operator if you are migrating from a more restricted Airflow environment.

See the following table for information on which versions of Starship are available, depending on your source Airflow environment:

Source Airflow environment | Starship plugin | Starship operator
Airflow 1.x
MWAA v2.0.2 ✔️
MWAA v2.2.2 ✔️
MWAA v2.4.3 ✔️
  1. Download the requirements.txt file for your source Airflow environment from S3. See AWS documentation.
  2. Add astronomer-starship on a new line to your requirements.txt file.
  3. Reupload the file to your S3 bucket.
  4. Update your Airflow environment to use the new version of this file.
tip

To complete this setup from the command line:

  1. Run the following commands to set environment variables on your local machine:

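    export MWAA_NAME="<your-mwaa-environment-name>"
    export MWAA_BUCKET="<your-mwaa-bucket-name>"
  2. Run the following AWS CLI commands to install Starship:

    # Download the current requirements file from the MWAA bucket
    aws s3 cp "s3://$MWAA_BUCKET/requirements.txt" requirements.txt
    # Append Starship as a new requirement
    echo 'astronomer-starship' >> requirements.txt
    # Upload the updated file, which creates a new object version
    aws s3 cp requirements.txt "s3://$MWAA_BUCKET/requirements.txt"
    # Point the environment at the new object version
    aws mwaa update-environment --name "$MWAA_NAME" --requirements-s3-object-version "$(aws s3api head-object --bucket "$MWAA_BUCKET" --key requirements.txt --query VersionId --output text)"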
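
If you rerun the setup, appending with `echo` adds a duplicate entry, and it can also join the new package onto the last line when the file has no trailing newline. The following is a minimal sketch of a safer append. The function name `add_starship_requirement` is hypothetical and not part of any Astronomer or AWS tooling.

```shell
# Hypothetical helper: append astronomer-starship to a requirements file once.
add_starship_requirement() {
  file="$1"
  touch "$file"
  # Skip if the package is already listed, with or without a version specifier.
  if grep -Eq '^astronomer-starship([=<>~! ].*)?$' "$file"; then
    return 0
  fi
  # Add a newline first if the file is non-empty and does not end with one.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf 'astronomer-starship\n' >> "$file"
}

# Usage: add_starship_requirement requirements.txt
```

Run the helper on the downloaded file in place of the `echo` command, then upload the file as described above.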

Step 2: Create an Astro Workspace (Optional)

In your Astro Organization, you can create Workspaces. A Workspace is a collection of users that have access to the same Deployments. Workspaces are typically owned by a single team.

You can choose to use an existing Workspace, or create a new one. However, you must have at least one Workspace to complete your migration.

  1. Follow the steps in Manage Workspaces to create a Workspace in the Cloud UI for your migrated Airflow environments. Astronomer recommends naming your first Workspace after your data team or initial business use case with Airflow. You can update these names in the Cloud UI after you finish the migration.

  2. Follow the steps in Manage Astro users to add users from your team to the Workspace. See Astro user permissions for details about each available Workspace user role.

You can also add users to a Workspace or an Organization using the Astro CLI.

You can automate adding batches of users to Astro with shell scripts. See Add a group of users to Astro using the Astro CLI.

Step 3: Create an Astro Deployment (Optional)

A Deployment is an Astro Runtime environment that is powered by the core components of Apache Airflow. In a Deployment, you can deploy and run DAGs, configure worker resources, and view metrics.

You can choose to use an existing Deployment, or create a new one. However, you must have at least one Deployment to complete your migration.

Before you create your Deployment, copy the following information from your source Airflow environment:

  • Environment name
  • Airflow version
  • Environment class or size
  • Number of schedulers
  • Minimum number of workers
  • Maximum number of workers
  • Execution role permissions
  • Airflow configurations
  • Environment variables
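
If you have the AWS CLI configured, most of the values in the preceding list can be read in one call. The following sketch wraps `aws mwaa get-environment` in a hypothetical function named `describe_mwaa_environment`. The field names come from the MWAA `Environment` response object. The call returns only the ARN of the execution role, so the role's permissions and any environment variables still need to be collected separately.

```shell
# Sketch: print the MWAA settings that map onto Astro Deployment fields.
# Assumes the AWS CLI is installed and configured for the right account and Region.
describe_mwaa_environment() {
  # $1: name of the MWAA environment
  query='Environment.{Name:Name,AirflowVersion:AirflowVersion,EnvironmentClass:EnvironmentClass,Schedulers:Schedulers,MinWorkers:MinWorkers,MaxWorkers:MaxWorkers,ExecutionRoleArn:ExecutionRoleArn,AirflowConfigurationOptions:AirflowConfigurationOptions}'
  aws mwaa get-environment --name "$1" --output json --query "$query"
}

# Usage: describe_mwaa_environment "$MWAA_NAME"
```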
Alternative setup for Astro Hybrid

This setup varies slightly for Astro Hybrid users. See Deployment settings for all configurations related to Astro Hybrid Deployments.

  1. In the Cloud UI, select a Workspace.

  2. On the Deployments page, click Deployment.

  3. Complete the following fields:

    • Name: Enter the name of your source Airflow environment.
    • Astro Runtime: Select the Runtime version that's based on the Airflow version in your source Airflow environment. See the following table to determine which version of Runtime to use. Where exact version matches are not available, the nearest Runtime version is provided with its supported Airflow version in parentheses.
    Airflow Version | Runtime Version
    2.0 | 3.0.4 (Airflow 2.1.1)¹
    2.2 | 4.2.9 (Airflow 2.2.5)
    2.4 | 6.3.0 (Airflow 2.4.3)
info

¹The earliest available Airflow version on Astro Runtime is 2.1.1. There are no known risks for upgrading directly from Airflow 2.0 to Airflow 2.1.1 during migration. For a complete list of supported Airflow versions, see Astro Runtime release and lifecycle schedule.

  • Description: (Optional) Enter a description for your Deployment.
  • Cluster: Choose whether you want to run your Deployment in a Standard cluster or Dedicated cluster. If you don't have specific networking or cloud requirements, Astronomer recommends using the default Standard cluster configurations.

To configure and use dedicated clusters, see Create a dedicated cluster. If you don't have the option of choosing between standard or dedicated, that means you are an Astro Hybrid user and must choose a cluster that has been configured for your Organization. See Manage Hybrid clusters.

  • Executor: Choose the same executor as in your source Airflow environment.
  • Scheduler: Set your scheduler size in Astronomer Units (AU). An AU is a unit of CPU and memory allocated to each scheduler in a Deployment. Use the following table to determine how many AUs you need based on the size of your source Airflow environment.
Environment size | Scheduler size | CPU / memory
Small (Up to ~50 DAGs) | Small | 1 vCPU, 2 GiB
Medium (Up to ~250 DAGs) | Medium | 2 vCPU, 4 GiB²
Large (Up to ~1000 DAGs) | Large | 4 vCPU, 8 GiB²
info

²Some of the preceding recommendations for CPU and memory might be lower than what you currently allocate to Airflow components in your source environment. If you notice significant performance differences, or your Deployment on Astro parses DAGs more slowly than your source Airflow environment, adjust your resource use on Astro. See Configure Deployment resources.

  • Worker Type: Select the worker type for your default worker queue. See Worker queues.
  • Min / Max # Workers: Set the same minimum and maximum worker counts as in your source Airflow environment.
  • KPO Pods: (Optional) If you use the KubernetesPodOperator or Kubernetes Executor, set limits on how many resources your tasks can request.
  4. Click Create Deployment.
  5. Specify any system-level environment variables as Astro environment variables. See Environment variables.
  6. Set an email to receive alerts from Astronomer support about your Deployments. See Configure Deployment contact emails.
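
If you migrate several environments, the Airflow-to-Runtime version table in this step can be expressed as a small lookup. The following is a sketch only: the function name `runtime_for_airflow` is hypothetical, and it covers just the three mappings listed in this guide.

```shell
# Hypothetical lookup: map a source Airflow version to the Runtime version
# listed in this guide. Versions not in the table return a non-zero status.
runtime_for_airflow() {
  case "$1" in
    2.0|2.0.*) echo "3.0.4" ;;  # Runtime 3.0.4 is based on Airflow 2.1.1
    2.2|2.2.*) echo "4.2.9" ;;  # Runtime 4.2.9 is based on Airflow 2.2.5
    2.4|2.4.*) echo "6.3.0" ;;  # Runtime 6.3.0 is based on Airflow 2.4.3
    *) echo "No Runtime mapping listed for Airflow $1" >&2; return 1 ;;
  esac
}

# Usage: runtime_for_airflow 2.4.3
```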

Step 4: Use Starship to Migrate Airflow Connections and Variables

You might have defined Airflow connections and variables in the following places on your source Airflow environment:

  • The Airflow UI (stored in the Airflow metadata database).
  • Environment variables
  • A secrets backend.

If you defined your Airflow variables and connections in the Airflow UI, you can migrate them to Astro with Starship. To check which resources will be migrated, go to Admin > Variables and Admin > Connections in the Airflow UI of your source Airflow environment.

warning

Some environment variables or Airflow Settings, like global environment variable values, can't be migrated to Astro. See Global environment variables for a list of variables that you can't migrate to Astro.

  1. In the Airflow UI for your source Airflow environment, go to Astronomer > Migration Tool 🚀.

  2. Click Get Token.

  3. If required, log in to cloud.astronomer.io.

  4. Copy the access token that appears after logging in.

  5. In the Migration Tool 🚀 page of the Airflow UI, paste the access token into the Authentication Token field.

  6. Click Sign in.

  7. In the Target Deployment menu, select the Deployment where you want to migrate your Airflow variables and connections, then click Select.

  8. Click Connections. In the table that appears, click Migrate for each connection that you want to migrate to Astro. After the migration is complete, the status Migrated ✅ appears.

  9. Click Variables.

  10. In the table that appears, click Migrate for each variable that you want to migrate to Astro. After the migration is complete, the status Migrated ✅ appears.

  11. Click Environment variables.

  12. In the table that appears, check the box for each environment variable that you want to migrate to Astro, then click Migrate. After the migration is complete, the status Migrated ✅ appears.

Step 5: Create an Astro project

  1. Create a new directory for your Astro project:

    mkdir <your-astro-project-name>
  2. Open the directory:

    cd <your-astro-project-name>
  3. Run the following Astro CLI command to initialize an Astro project in the directory:

    astro dev init

    This command generates a set of files that will build into a Docker image that you can both run on your local machine and deploy to Astro.

  4. (Optional) Run the following command to initialize a new git repository for your Astro project:

    git init

Step 6: Migrate project code and dependencies to your Astro project

  1. Open your Astro project Dockerfile. Update the Runtime version in the first line to the version you selected for your Deployment in Step 3. For example, if your Runtime version is 6.3.0, your Dockerfile looks like the following:

    FROM quay.io/astronomer/astro-runtime:6.3.0

    The Dockerfile defines the environment that all your Airflow components run in. You can modify it to make certain resources available to your Airflow environment like certificates or keys. For this migration, you only need to update your Runtime version.

  2. Open your Astro project requirements.txt file and add all Python packages from your source Airflow environment's requirements.txt file. See AWS documentation to find this file in your S3 bucket.

    warning

    To avoid breaking dependency upgrades, Astronomer recommends pinning your packages to the versions running in your source Airflow environment. For example, if you're running apache-airflow-providers-snowflake version 3.3.0 on MWAA, you would add apache-airflow-providers-snowflake==3.3.0 to your Astro requirements.txt file.

  3. Open your Astro project dags folder. Add your DAG files from either your source control platform or S3.

  4. If you used the plugins folder in your MWAA project, copy the contents of this folder from your source control platform or S3 to the /plugins folder of your Astro project.
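
The copy and pinning steps above can be partly scripted. The following sketch defines two hypothetical helpers: `sync_mwaa_dags` copies DAG files with `aws s3 sync`, and `list_unpinned_requirements` prints requirement lines that lack an exact `==` pin. It assumes your DAGs are stored under the `dags` prefix of the bucket, which might differ in your environment.

```shell
# Copy DAG files from the MWAA bucket into an Astro project's dags folder.
# $1: bucket name, $2: Astro project directory (defaults to the current directory).
# Assumes DAGs are stored under the "dags" prefix of the bucket.
sync_mwaa_dags() {
  aws s3 sync "s3://$1/dags" "${2:-.}/dags"
}

# Print every requirement line that is not pinned with "==".
# Blank lines, comments, and pip options such as "--constraint" are ignored.
list_unpinned_requirements() {
  grep -Ev '^[[:space:]]*($|#|-)' "$1" | grep -v '==' || true
}

# Usage:
#   sync_mwaa_dags "$MWAA_BUCKET" .
#   list_unpinned_requirements requirements.txt
```

Any package that the second helper prints is a candidate for pinning to the version that runs in your source environment.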

Step 7: Configure additional data pipeline infrastructure

The core migration of your project is now complete. Read the following to decide whether you need to set up any additional infrastructure on Astro before you cut over your DAGs.

Set up CI/CD

If you used CI/CD to deploy code to your source Airflow environment, read the following documentation to learn about setting up a similar CI/CD pipeline for your Astro project:

Similarly to MWAA, you can deploy DAGs to Astro directly from an S3 bucket. See Deploy DAGs from an AWS S3 bucket to Astro using AWS Lambda.

Set up a secrets backend

If you currently store Airflow variables or connections in a secrets backend, you also need to integrate your secrets backend with Astro to access those objects from your migrated DAGs. See Configure a Secrets Backend for setup steps.

Step 8: Test locally and check for import errors

Depending on how thoroughly you want to test your Airflow environment, you can test your project locally before deploying to Astro.

  • In your Astro project directory, run astro dev parse to check for any parsing errors in your DAGs.
  • Run astro run <dag-id> to test a specific DAG. This command compiles your DAG and runs it in a single Airflow worker container based on your Astro project configurations.
  • Run astro dev start to start a complete Airflow environment on your local machine. After your project starts up, you can access the Airflow UI at localhost:8080. See Troubleshoot your local Airflow environment.
info

Your migrated Airflow variables and connections are not available locally. You must deploy your project to Astro to test these Airflow objects.
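
The first two local checks can be chained so that a parse failure stops the run early. The following is a sketch with a hypothetical function name, `test_dags_locally`. DAGs that depend on migrated connections or variables are likely to fail here, because those objects are not available locally.

```shell
# Parse the whole project, then run each listed DAG once with "astro run".
# Stops at the first failure and reports which DAG failed.
test_dags_locally() {
  astro dev parse || return 1
  for dag_id in "$@"; do
    astro run "$dag_id" || { echo "DAG failed locally: $dag_id" >&2; return 1; }
  done
}

# Usage (from your Astro project directory): test_dags_locally <dag-id> <another-dag-id>
```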

Step 9: Deploy to Astro

  1. Run the following command to authenticate to Astro:

    astro login
  2. Run the following command to deploy your project:

    astro deploy

    This command returns a list of Deployments available in your Workspace and prompts you to pick one.

  3. In the Cloud UI, open your Deployment and click Open Airflow. Confirm that you can see your deployed DAGs in the Airflow UI.

Step 10: Cut over from your source Airflow environment to Astro

After you successfully deploy your code to Astro, you need to migrate your workloads from your source Airflow environment to Astro on a DAG-by-DAG basis. Depending on how your workloads are set up, Astronomer recommends letting DAG owners determine the order to migrate and test DAGs.

You can complete the following steps in the days or weeks following your migration setup. Provide updates to your Astronomer Data Engineer as they continue to assist you through the process and solve any difficulties that arise.

Continue to validate and move your DAGs until you have fully cut over your source Airflow instance. After you finish migrating from your source Airflow environment, repeat the complete migration process for any other Airflow instances in your source Airflow environment.

Confirm connections and variables

In the Airflow UI for your Deployment, test all connections that you migrated from your source Airflow environment.

Additionally, check Airflow variable values in Admin > Variables.

Test and validate DAGs in Astro

To create a strategy for testing DAGs, determine which DAGs need the most care when running and testing them.

If your DAG workflow is idempotent and can run twice or more without negative effects, you can run and test these DAGs with minimal risk. If your DAG workflow is non-idempotent and can become invalid when you rerun it, you should test the DAG with more caution and downtime.

Cut over DAGs to Astro using Starship

Starship includes features for simultaneously pausing DAGs in your source Airflow environment and starting them on Astro. This allows you to cut over your production workflows without downtime.

For each DAG in your Astro Deployment:

  1. Confirm that the DAG ID in your Deployment is the same as the DAG ID in your source Airflow environment.

  2. In the Airflow UI for your source Airflow environment, go to Astronomer > Migration Tool 🚀.

  3. Click DAGs cutover. In the table that appears, click the Pause icon in the Local column for the DAG you're cutting over.

  4. Click the Start icon in the Remote column for the DAG you're cutting over.

  5. After completing this cutover, the Start and Pause icons switch. If there's an issue after cutting over, click the Remote pause button and then the Local start button to move your workflow back to your source Airflow environment.
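
Before cutting over many DAGs, it can help to check the DAG ID requirement from the first step in bulk. The following sketch assumes you have two plain-text files with one DAG ID per line, one for the source environment and one for the Astro Deployment. How you produce those files is up to you, and the function name `missing_dag_ids` is hypothetical.

```shell
# Print DAG IDs that appear in the source list but not in the Astro list.
# $1: file of source DAG IDs, $2: file of Astro DAG IDs (one ID per line).
missing_dag_ids() {
  src_sorted="$(mktemp)"
  dst_sorted="$(mktemp)"
  sort -u "$1" > "$src_sorted"
  sort -u "$2" > "$dst_sorted"
  # comm -23 keeps only the lines that are unique to the first file.
  comm -23 "$src_sorted" "$dst_sorted"
  rm -f "$src_sorted" "$dst_sorted"
}

# Usage: missing_dag_ids source_dag_ids.txt astro_dag_ids.txt
```

Empty output means that every source DAG ID has a match on Astro.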

Optimize Deployment resource usage

Monitor analytics

As you cut over DAGs, view Deployment metrics to get a sense of how many resources your Deployment is using. Use this information to adjust your worker queues and resource usage accordingly, or to tell when a DAG isn't running as expected.

Modify instance types or use worker queues

If your current worker type doesn't have the right amount of resources for your workflows, see Deployment settings to learn about configuring worker types on your Deployments.

You can additionally configure worker queues to assign each of your tasks to different worker instance types. View your Deployment metrics to help you determine what changes are required.

Enable DAG-only deploys

Deploying to Astro with DAG-only deploys enabled can make deploys faster in cases where you've only modified your dags directory. To enable the DAG-only deploy feature, see Deploy DAGs only.
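
From the command line, the workflow might look like the following sketch. The function names are hypothetical, and the `--dag-deploy` and `--dags` flags reflect the Astro CLI as I understand it, so confirm them with `astro deployment update --help` and `astro deploy --help` for your CLI version.

```shell
# Turn on DAG-only deploys for a Deployment.
# $1: Deployment ID, as shown in the Cloud UI.
enable_dag_only_deploys() {
  astro deployment update "$1" --dag-deploy enable
}

# Push only the contents of the dags directory to the Deployment.
deploy_dags_only() {
  astro deploy "$1" --dags
}

# Usage:
#   enable_dag_only_deploys <deployment-id>
#   deploy_dags_only <deployment-id>
```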
