Configure an External Secrets Backend on Astronomer Software
As of Airflow 1.10.10, users can manage and sync Airflow Connections and Variables from a variety of external secrets backend tools, including HashiCorp Vault, AWS SSM Parameter Store, and GCP Secret Manager.
This guide will walk you through how to leverage this feature on Astronomer, with specific instructions for the following tools:
- HashiCorp Vault
- AWS SSM Parameter Store
- Google Cloud Secret Manager
HashiCorp Vault
In this section, we'll walk through how to use Vault as a secrets backend both locally and for your Airflow Deployment running on Astronomer.
Prerequisites
To use this feature, you'll need the following:
- "Admin" access to an Airflow Deployment on Astronomer
- The Astronomer CLI
- A running HashiCorp Vault server
- The Vault CLI
- Your Vault server's Root Token
- Your Vault server's URL (if you're planning on using a local server, this should be `http://127.0.0.1:8200/`)
If you do not already have a Vault server deployed but would like to test this feature, we'd recommend either:
- Deploying a lightweight server using this Heroku Element
- Deploying a local server via the instructions in our Airflow and Vault guide (a quick dev-mode option is sketched below)
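If you just want a server to test against, Vault's dev mode is the quickest local option. Here's a minimal sketch, assuming you have the Vault CLI installed; dev mode runs in memory, starts unsealed, and prints its Root Token on startup, so it's suitable for testing only:

```
vault server -dev
```

Copy the Root Token printed on startup, and in a second terminal export `VAULT_ADDR=http://127.0.0.1:8200` before running the `vault kv` commands below.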
Write a Connection to Vault
To start, you'll need to write an Airflow connection URI to your Vault server.
To write the connection to your Vault server as a key/value pair, run:
```
vault kv put secret/connections/<your-connection> conn_uri=my-conn-type://my-login:my-password@my-host:5432
```
This methodology should apply to any secret that is expressed as a Connection URI. For an SMTP connection, for example, this would look like the following:
```
vault kv put secret/connections/smtp_default conn_uri=smtps://user:host@relay.example.com:465
```
Note: We recommend setting the path to `secret/connections/<your-connection>` to keep all of your Airflow connections organized in the `connections` directory of the mount point.
Confirm your secret was written successfully
To confirm that the secret was written successfully, run the following:

```
vault kv get secret/connections/<your-connection>
```

For the SMTP example above, the output of `vault kv get secret/connections/smtp_default` would be:

```
====== Metadata ======
Key              Value
---              -----
created_time     2020-03-26T14:43:50.819791Z
deletion_time    n/a
destroyed        false
version          1

====== Data ======
Key         Value
---         -----
conn_uri    smtps://user:host@relay.example.com:465
```
Set your Connection Locally
Now that you have a running Vault server with your secret injected, the next step is to configure your `Dockerfile` to use this Vault server as the default secrets backend for your local Airflow instance. Following this section, we'll set these variables on an Airflow Deployment running on Astronomer.
To set this configuration up, follow the steps below.
- Add the following Environment Variables to the `.env` file in your project directory to be used locally:

  ```
  VAULT__ROOT_TOKEN="YOUR-ROOT-TOKEN"
  VAULT__URL="YOUR-VAULT-URL"
  ```

  This will keep your connection to the Vault server secure. Be sure to keep your `.env` file in your `.gitignore` when you deploy these changes to your project if it contains sensitive credentials. We'll set these variables separately on Astronomer.

- Add the following lines to your `Dockerfile` to instantiate those Vault-specific Environment Variables and use them in the context of Airflow's `AIRFLOW__SECRETS__BACKEND` configuration:

  ```
  ENV VAULT__ROOT_TOKEN=$VAULT__ROOT_TOKEN
  ENV VAULT__URL=$VAULT__URL
  ENV AIRFLOW__SECRETS__BACKEND="airflow.contrib.secrets.hashicorp_vault.VaultBackend"
  ENV AIRFLOW__SECRETS__BACKEND_KWARGS='{"url": "$VAULT__URL", "token": "$VAULT__ROOT_TOKEN", "connections_path": "connections", "variables_path": "variables", "kv_engine_version": 1}'
  ```

  This tells Airflow to look for connection information at the `secret/connections/*` path in your Vault server.
Note: By default, Airflow uses `"kv_engine_version": 2`, but we've written this secret using v1. You're welcome to change this setting to accommodate how you write and read your secrets.
If you'd like to further customize the interaction between Airflow and your Vault server, you can read the full list of available kwargs for this integration.
Test Your Connection
To test your connection to Vault locally, add the code below as a new DAG in your `/dags` directory. This DAG will print your connection information to the task logs and confirm that your connection has been established successfully.
```python
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import datetime
from airflow.hooks.base_hook import BaseHook


def get_secrets(**kwargs):
    conn = BaseHook.get_connection(kwargs['my_conn_id'])
    print(f"Password: {conn.password}, Login: {conn.login}, URI: {conn.get_uri()}, Host: {conn.host}")


with DAG('example_secrets_dags', start_date=datetime(2020, 1, 1), schedule_interval=None) as dag:
    test_task = PythonOperator(
        task_id='test-task',
        python_callable=get_secrets,
        op_kwargs={'my_conn_id': 'smtp_default'},
    )
```
Once you've added this DAG to your project:
- Run `$ astro dev stop` and `$ astro dev start` to rebuild your image locally
- Trigger your new DAG (`example_secrets_dags`) via the Airflow UI at http://localhost:8080/admin/
- Click on `test-task` > `View Logs`
- Confirm that your `smtp_default` connection information is printed in the task logs
This works because we set the `connections_path` in our `AIRFLOW__SECRETS__BACKEND_KWARGS` to be `connections`. You're welcome to change this path name if you'd prefer to access connections from a different Vault directory.
Note: If you've written other secrets to your Vault server's `/connections` path, you should be able to test those here as well just by changing the `my_conn_id` value in the DAG code above.
Use Vault Secrets in your DAGs
Once you've established that your connection is working, you can leverage the `BaseHook.get_connection()` method, as in the test DAG above, to use your connections in your own DAG code.
You can also pass the `conn_id` of your secret directly to most Airflow operators, which will take care of instantiating the connection using a pre-built hook, as shown in the sketch below.
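For example, here's a minimal sketch of the operator pattern, assuming you've written a Postgres connection URI to `secret/connections/my_postgres` in Vault (`my_postgres` is a hypothetical conn_id) and that the operator is declared inside a `with DAG(...)` block like the test DAG above:

```python
# A minimal sketch, assuming a Postgres connection URI was written to
# secret/connections/my_postgres in Vault (hypothetical conn_id).
from airflow.operators.postgres_operator import PostgresOperator

run_query = PostgresOperator(
    task_id='run_query',
    postgres_conn_id='my_postgres',  # resolved through the Vault secrets backend
    sql='SELECT 1;',
)
```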
Deploy to Astronomer
Once you've confirmed that your connections are being imported correctly locally, you're ready to set your Environment Variables on Astronomer.
- Navigate to the `Deployment > Configure > Environment Variables` section of the Software UI
- Set your `VAULT__ROOT_TOKEN` and `VAULT__URL` as you did above
- Click `Deploy Changes` to save and publish your changes
- Deploy to Astronomer by running `$ astro deploy`
You should now be able to see your connection information being pulled from Vault on Astronomer! If you're interested in pulling Airflow Variables, read on.
Pulling Variables from Vault
While the section above required that you add your connection information to Vault as a `conn_uri`, you can also pull Airflow Variables from Vault. To do so, write your variable to Vault using the following syntax:

```
vault kv put secret/variables/hello value=world
```
This works because we set the `variables_path` in our `AIRFLOW__SECRETS__BACKEND_KWARGS` to be `variables`. You're welcome to change this path name if you'd prefer to access variables from a different Vault directory.
Now, if you call `Variable.get('hello')` in Airflow, it will return the value of that variable. In our case, that's `world`.
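For example, here's a minimal sketch of reading that variable from task code:

```python
# A minimal sketch: reading the Vault-backed variable from task code.
from airflow.models import Variable


def read_variable():
    # Resolved from secret/variables/hello via the Vault backend
    print(Variable.get('hello'))  # prints "world" for the secret written above
```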
AWS SSM Parameter Store
In this section, we'll walk through how to use AWS SSM Parameter Store as a secrets backend for your Airflow Deployment running on Astronomer.
For the purpose of this doc, we'll assume you already have an SSM Parameter Store instance running.
Prerequisites
To use this feature, you'll need the following:
- "Admin" access to an Airflow Deployment on Astronomer
- The Astronomer CLI
- A running AWS SSM Parameter Store instance
- A valid AWS Access Key ID and Secret Access Key
Add Secrets to Parameter Store
To start, ensure that you have an Airflow Connection or Variable added to Parameter Store for testing.
Those should live at `/airflow/connections` and `/airflow/variables`, respectively. If you're using a secret with a connection ID of `smtp_default`, for example, it should exist at `/airflow/connections/smtp_default`.
Note: The connection you add must exist as a string representing an Airflow `connection_uri`. You can read up on generating a `connection_uri` in the Airflow docs.
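For example, here's a sketch of writing the SMTP connection from the Vault section above to Parameter Store with the AWS CLI, assuming your CLI is already authenticated:

```
aws ssm put-parameter \
    --name "/airflow/connections/smtp_default" \
    --value "smtps://user:host@relay.example.com:465" \
    --type "SecureString"
```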
Set your Connection Locally
Now that you have a Connection saved, the next step is to configure your `Dockerfile` to use your AWS SSM Parameter Store instance as the default secrets backend for your local Airflow instance.
To do so, follow the steps below.
- Add the following Environment Variables to the `.env` file in your project directory to be used in your local Airflow instance:

  ```
  AWS_ACCESS_KEY_ID="YOUR-AWS-KEY"
  AWS_SECRET_ACCESS_KEY="YOUR-AWS-SECRET-KEY"
  ```

  This will keep your connection to the SSM server secure, though make sure to keep your `.env` file in `.gitignore` when you deploy these changes to your project. We'll set these variables separately on Astronomer.

- Add the following lines to your `Dockerfile` to instantiate those SSM-specific Environment Variables and use them in the context of Airflow's `AIRFLOW__SECRETS__BACKEND` configuration:

  ```
  ENV AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
  ENV AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
  ENV AIRFLOW__SECRETS__BACKEND="airflow.contrib.secrets.aws_systems_manager.SystemsManagerParameterStoreBackend"
  ENV AIRFLOW__SECRETS__BACKEND_KWARGS='{"connections_prefix": "/airflow/connections", "variables_prefix": "/airflow/variables"}'
  ```
This tells Airflow to look for connection information at the `/airflow/connections/*` path in your SSM instance.
If you're interested in further customizing what the interaction between Airflow and your SSM server looks like, learn more by reading the full list of available kwargs for this integration.
Note: If you'd like to reference an AWS profile instead of connecting via Environment Variables, you can also add the `profile_name` param to your kwargs.
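For example, a sketch of the same kwargs with a hypothetical local AWS profile named `airflow`:

```
ENV AIRFLOW__SECRETS__BACKEND_KWARGS='{"connections_prefix": "/airflow/connections", "variables_prefix": "/airflow/variables", "profile_name": "airflow"}'
```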
Test Your Connection
Test the connection to AWS SSM Parameter Store by calling the `get_conn_uri` method and passing it the `conn_id` of the secret you uploaded to your Parameter Store. If you followed our example, this would be `smtp_default`.
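For example, here's a minimal sketch you could run from a `python` shell inside your local Airflow environment, assuming the `Dockerfile` configuration above is in place and your AWS credentials are available:

```python
# A minimal sketch: resolve a connection URI straight from the SSM backend.
from airflow.contrib.secrets.aws_systems_manager import (
    SystemsManagerParameterStoreBackend,
)

backend = SystemsManagerParameterStoreBackend(
    connections_prefix='/airflow/connections',
    variables_prefix='/airflow/variables',
)
# Prints the conn_uri stored at /airflow/connections/smtp_default
print(backend.get_conn_uri(conn_id='smtp_default'))
```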
Deploy to Astronomer
Once you've confirmed that your connections are being imported correctly locally, you're ready to set your Environment Variables on Astronomer.
- Navigate to the `Deployment > Configure > Environment Variables` section of the Software UI
- Set your `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` as you did above in your `.env` file
- Click `Deploy Changes` to save and publish your changes
- Deploy to Astronomer by running `$ astro deploy`
You should now be able to see your connection information being pulled from AWS SSM Parameter Store on Astronomer!
Google Cloud Secret Manager
This setup assumes that you already have a Google Cloud project with Secret Manager configured. To use Google Cloud Secret Manager as your secrets backend for an Airflow Deployment on Astronomer:
- Create a service account with the appropriate permissions on Google Cloud.
- Add the Secret Manager Secret Accessor role to the service account.
- Create and download a JSON service account key for the service account.
- In the Software UI, set the following Environment Variables in your Airflow Deployment, making sure to paste the entire JSON key file in place of `<your-key-file>`:

  ```
  AIRFLOW__SECRETS__BACKEND=airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
  AIRFLOW__SECRETS__BACKEND_KWARGS='{"connections_prefix": "airflow-connections", "variables_prefix": "airflow-variables", "gcp_keyfile_dict": <your-key-file>}'
  ```

  We recommend marking the `AIRFLOW__SECRETS__BACKEND_KWARGS` Environment Variable as "Secret" because it contains your sensitive service account key.
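As with the other backends, your secrets must follow the prefix naming scheme: the backend looks up a secret whose name joins the prefix and the conn_id with a hyphen. For example, here's a sketch of storing the SMTP connection from earlier with the gcloud CLI, assuming your gcloud CLI is authenticated against the correct project:

```
echo -n "smtps://user:host@relay.example.com:465" | \
    gcloud secrets create airflow-connections-smtp_default \
    --data-file=- \
    --replication-policy="automatic"
```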