Connect Astro to AWS data sources
Use the information provided here to learn how you can securely connect your Astro data plane to your existing AWS instance. A connection to AWS allows Astro to access data stored on your AWS instance and is a necessary step to running pipelines in a production environment.
Connection options
The connection option that you choose is determined by the requirements of your organization and your existing infrastructure. You can choose a straightforward implementation, or a more complex implementation that provides enhanced data security. Astronomer recommends that you review all of the available connection options before selecting one for your organization.
- Public endpoints
- VPC peering
- Transit Gateways
- AWS PrivateLink
Publicly accessible endpoints allow you to quickly connect Astro to AWS. To configure these endpoints, you can use one of the following methods:
- Set environment variables on Astro with your endpoint information. See Set environment variables on Astro.
- Create an Airflow connection with your endpoint information. See Managing Connections.
When you use publicly accessible endpoints to connect Astro and AWS, traffic moves directly between your Astro data plane and the AWS API endpoint. Data in this traffic never reaches the control plane, which is managed by Astronomer.
Every Astro cluster runs in a dedicated Virtual Private Network (VPC). To set up a private connection between an Astro VPC and an AWS VPC, you can create a VPC peering connection. VPC peering ensures private and secure connectivity, reduces network transit costs, and simplifies network layouts.
To create a VPC peering connection between an Astro VPC and an AWS VPC, contact Astronomer support and provide the following information:
- Astro cluster ID and name
- AWS Account ID of the target VPC
- Region of the target VPC
- VPC ID of the target VPC
- Classless Inter-Domain Routing (CIDR) block of the target VPC
After receiving your request, Astronomer support initiates a peering request and creates the routing table entries in the Astro VPC. To allow multidirectional traffic between Airflow and your organization's data sources, the owner of the target VPC needs to accept the peering request and create the routing table entries in the target VPC.
DNS considerations for VPC peering
To resolve DNS hostnames from your target VPC, every Astro VPC has DNS Hostnames, DNS Resolutions, and Requester DNS Resolution enabled. See AWS Peering Connection settings.
If your target VPC resolves DNS hostnames using DNS Hostnames and DNS Resolution, you must also enable the Accepter DNS Resolution setting on AWS. This allows the data plane to resolve the public DNS hostnames of the target VPC to its private IP addresses. To configure this option, see AWS Documentation.
If your target VPC resolves DNS hostnames using private hosted zones, then you must associate your Route53 private hosted zone with the Astro VPC using instructions provided in AWS Documentation.
To retrieve the ID of any Astro VPC, contact Astronomer support. If you have more than one Astro cluster, request the VPC ID of each cluster.
Use AWS Transit Gateway to connect one or more Astro clusters to other VPCs, AWS accounts, and on-premises networks supported by your organization.
AWS Transit Gateway is an alternative to VPC Peering on AWS. Instead of having to establish a direct connection between two VPCs, you can attach over 5,000 networks to a central transit gateway that has a single VPN connection to your corporate network.
While it can be more costly, AWS Transit Gateway requires less configuration and is often recommended for organizations connecting a larger number of VPCs. For more information, see AWS Transit Gateway.
AWS Transit Gateway doesn't provide built-in support for DNS resolution. If you need DNS integration, Astronomer recommends that you use the Route 53 Resolver service. For assistance integrating the Route 53 Resolver service with your Astronomer VPC, contact Astronomer support.
If your transit gateway is in a different region than your Astro cluster, contact Astronomer support. Astronomer support can create a new transit gateway in your AWS account for Astro and set up a cross-region peering connection with your existing transit gateway.
If Astronomer creates a new transit gateway in your AWS account for Astro, keep in mind that your organization will incur additional AWS charges for the new transit gateway as well as the inter-region transfer costs.
Prerequisites
- An Astro cluster
- An existing transit gateway in the same region as your Astro cluster
- Permission to share resources using AWS Resource Access Manager (RAM)
Setup
- In the Cloud UI, click the Clusters tab and copy both the ID and the Account ID for your Astro cluster. ID is your cluster ID, while Account ID is your AWS account ID.
- Create a resource share in AWS RAM with the account ID from step 1. See Creating a resource share in AWS RAM.
- Contact Astronomer support and provide the cluster ID that you copied in Step 1, as well as the CIDR block of the target VPC or on-premises network that you want to connect your Astro cluster with. From here, Astronomer support approves the resource sharing request and creates a transit gateway peering attachment request to your network.
- Accept the transit gateway peering attachment request from your network. See Accept or reject a peering attachment request.
- Create a static route from your CIDR block to the transit gateway and a static route from the transit gateway to the Astro VPC. See Add a route to the transit gateway route table.
- Contact Astronomer support to confirm that you have created the static route. Astronomer support will update the Astro VPC routing table to send traffic from your CIDR block through the transit gateway.
- Optional. Repeat the steps for each Astro cluster that you want to connect to your transit gateway.
Use AWS PrivateLink to create private connections from Astro to your AWS services without exposing your data to the public internet. If your AWS services are located in a different region than Astro, contact Astronomer support.
Astro clusters are pre-configured with the following AWS PrivateLink endpoint services:
- Amazon S3 - Gateway Endpoint
- Amazon Elastic Compute Cloud (Amazon EC2) Autoscaling - Interface Endpoint
- Amazon Elastic Container Registry (ECR) - Interface Endpoints for ECR API and Docker Registry API
- Elastic Load Balancing (ELB) - Interface Endpoint
- AWS Security Token Service (AWS STS) - Interface Endpoint
To request additional endpoints, or assistance connecting to other AWS services, contact Astronomer support.
To access a service in a different region using PrivateLink endpoints, you must use an inter-Region VPC peering connection. You create an intermediate PrivateLink endpoint in the same region as the targeted service, then connect to that endpoint through an inter-Region VPC peering connection.
By default, Astronomer support activates the Enable DNS Name option on supported AWS PrivateLink endpoint services. With this option enabled, you can make requests to the default public DNS service name instead of the public DNS name that is automatically generated by the VPC endpoint service. For example, *.notebook.us-east-1.sagemaker.aws
instead of vpce-xxx.notebook.us-east-1.vpce.sagemaker.aws
. For more information about AWS DNS hostnames, see DNS hostnames.
You'll incur additional AWS infrastructure costs for every AWS PrivateLink endpoint service that you use. See AWS PrivateLink pricing.
Authorization options
Authorization is the process of verifying a user or service's permissions before allowing them access to organizational applications and resources. Astro clusters must be authorized to access external resources from your cloud. Which authorization option that you choose is determined by the requirements of your organization and your existing infrastructure. Astronomer recommends that you review all of the available authorization options before selecting one for your organization.
- AWS IAM roles
- AWS access keys
To grant an Astro cluster access to a service that is running in an AWS account not managed by Astronomer, use AWS IAM roles. IAM roles on AWS are often used to manage the level of access a specific user, object, or group of users has to a resource. This includes an Amazon S3 bucket, Redshift instance, or secrets backend.
In the Cloud UI, click Clusters and then copy the value displayed in the Cluster ID column for the Astro cluster that needs access to AWS service resources.
Create an IAM role in the AWS account that contains your AWS service. See Creating a role to delegate permissions to an AWS service.
In the AWS Management Console, go to the Identity and Access Management (IAM) dashboard.
Click Roles and in the Role name column, select the role you created in step 2.
Click the Trust relationships tab.
Click Edit trust policy and update the
arn
value:{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::<dataplane-AWS-account-ID>:role/AirflowS3Logs-<cluster-ID>"
]
},
"Action": "sts:AssumeRole"
}
]
}To locate your
<dataplane-AWS-account-ID>
and<cluster-ID>
, in the Cloud UI click Clusters. The<dataplane-AWS-account-ID>
is located in the Account ID column and the cluster ID is located in the ID column.The Astro cluster data plane account includes the
AirflowLogsS3-<clusterid>
role. When you configure an Airflow connection for a Deployment, specify this role when you create the AWS Airflow Connection.Click Update policy.
In the Airflow UI or as an environment variable on Astro, create an Airflow connection to AWS for each Deployment that requires the resources you connected. See Managing connections to Apache Airflow.
Optional. Repeat these steps for each Astro cluster that requires access to external data services on AWS.
Astro supports all Airflow AWS connection types. For more information about the available AWS connection types, see Amazon Web Services Connection. When you create your Airflow AWS connection, you'll need your AWS access key ID and secret access key.
Astronomer recommends using an external secrets backend to store your AWS access key ID and secret access key. See Configure an external secrets backend on Astro.
Hostname resolution options
Securely connect your Astro data plane to resources running in other VPCs or on-premises through a resolving service.
Using Route 53 requires sharing a resolver rule with your Astro account. If this is a security concern, Astronomer recommends using Domain Name System (DNS) forwarding.
- Shared resolver rule
- Domain Name System forwarding
Use Route 53 Resolver rules to allow Astro to resolve DNS queries for resources running in other VPCs or on-premises.
Prerequisites
- An Amazon Route 53 Resolver rule. See Managing forwarding rules.
- Permission to share resources using the AWS Resource Access Manager (RAM)
Share the Amazon Route 53 Resolver rule
To allow Astro to access a private hosted zone, you need to share your Amazon Route 53 Resolver rule with your Astro AWS account.
In the Route 53 Dashboard, click Rules below Resolver in the navigation menu.
Select a Resolver rule and then click Details.
Click Share and enter
Astro
in the Name field.In the Resources - optional section, select Resolver Rules in the Select resource type list and then select one or more rules.
On the Associate permissions page, accept the default settings and then click Next.
On the Grant access to principals page, select Allow sharing only within your organization, and then enter your Astro AWS account ID for your organization in the Enter an AWS account ID field. To get the Astro AWS account ID, go to the Cloud UI, click Settings, and then copy the value in the ID column for the Astro AWS account you want to share the Resolver rule with.
Click Create resource share.
Contact Astronomer support for rule verification
To verify that the Amazon Route 53 Resolver rule was shared correctly, submit a request to Astronomer support. With your request, include the Amazon Route 53 Resolver rule ID. To locate the Resolver rule ID, open the Route 53 Dashboard, and in the left menu click Rules below Resolver. Copy the value in the Resolver ID column.
Create a connection to confirm connectivity (optional)
When Astronomer support confirms that the Amazon Route 53 Resolver rule was successfully associated with the Astro VPC, you can create a connection to the resource that is resolved by the shared rule. See Managing Connections.
Use Domain Name System (DNS) forwarding to allow Astro to resolve DNS queries for resources running in other VPCs or on-premises. Unlike Route 53, you don't need to share sensitive configuration data with your Astro account. To learn more about DNS forwarding, see Forwarding outbound DNS queries to your network.
To use this solution, make sure Astro can connect to the DNS server using a VPC peering or transit gateway connection and then submit a request to Astronomer support. With your request, include the following information:
- The domain name for forwarding requests
- The IP address of the DNS server where requests are forwarded
Create an Airflow connection to confirm connectivity (optional)
When Astronomer support confirms that DNS forwarding was successfully implemented, you can confirm that it works by creating an Airflow connection to a resource running in a VPC or on-premises. See Managing Connections.