Connect Astro to GCP data sources
Use this document to learn how to securely connect Astro to your existing Google Cloud Platform (GCP) instance. A connection to GCP allows Astro to access data stored on your GCP instance and is a necessary step for running pipelines in a production environment.
Connection options
The connection option that you choose is determined by the requirements of your organization and your existing infrastructure. You can choose a straightforward implementation, or a more complex implementation that provides enhanced data security. Astronomer recommends that you review all of the available connection options before selecting one for your organization.
- Public endpoints
- VPC peering
- Private Service Connect
Public endpoints
Publicly accessible endpoints allow you to quickly connect Astro to GCP. To configure these endpoints, use one of the following methods:
- Set environment variables on Astro with your endpoint information. See Set environment variables on Astro.
- Create an Airflow connection with your endpoint information. See Managing Connections.
When you use publicly accessible endpoints to connect Astro and GCP, traffic moves directly between your Astro clusters and the GCP API endpoint. Data in this traffic never reaches the control plane, which is managed by Astronomer.
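The environment variable method can be sketched as follows. This is a minimal example, assuming Airflow 2.3 or later (which accepts JSON-formatted connection values) and a recent release of the Google provider package. The connection ID `gcp_public` and the project ID `my-gcp-project` are hypothetical placeholders. On Astro, you would set the resulting variable on the Deployment; the `export` here is only useful for local testing.

```shell
# Hypothetical project ID: replace with your own.
GCP_PROJECT_ID="my-gcp-project"

# Airflow treats any variable named AIRFLOW_CONN_<CONN_ID> as a connection.
# The JSON value format assumes Airflow 2.3 or later, and the short "project"
# extra key assumes a recent Google provider release.
AIRFLOW_CONN_GCP_PUBLIC="$(printf '{"conn_type": "google_cloud_platform", "extra": {"project": "%s"}}' "$GCP_PROJECT_ID")"
export AIRFLOW_CONN_GCP_PUBLIC

# Show the value that Airflow would read.
echo "$AIRFLOW_CONN_GCP_PUBLIC"
```

With this variable set, tasks can refer to the connection by the ID `gcp_public`, because Airflow derives the connection ID from the lowercased suffix of the variable name.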
VPC peering
This connection option is only available for dedicated Astro Hosted clusters and Astro Hybrid.
VPC peering ensures private and secure connectivity, reduces network transit costs, and simplifies network layouts.
To create a VPC peering connection between an Astro VPC and a GCP VPC, contact Astronomer support and provide the following information:
- Astro cluster ID and name
- Google Cloud project ID of the target VPC
- VPC name of the target VPC
- Classless Inter-Domain Routing (CIDR) block of the target VPC
After receiving your request, Astronomer support initiates a peering request and creates the routing table entries in the Astro VPC. To allow bidirectional traffic between Airflow and your organization's data sources, the owner of the target VPC needs to accept the peering request and create the routing table entries in the target VPC.
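The target-side work can be sketched with the Google Cloud CLI. This is a minimal example under stated assumptions: the project ID, VPC name, peering name, and both function names are hypothetical, and the peer project and peer network arguments stand for values that you would obtain from Astronomer support. In GCP, a peering becomes active only after both networks have created a peering configuration toward each other, so creating the matching peering is one way the target VPC owner can accept the request.

```shell
# Hypothetical values: replace with your own project and VPC.
TARGET_PROJECT_ID="my-gcp-project"
TARGET_VPC_NAME="my-vpc"

# List the subnet CIDR ranges of the target VPC, which is the kind of
# information that the support request asks for.
list_target_cidrs() {
  gcloud compute networks subnets list \
    --project="$TARGET_PROJECT_ID" \
    --network="$TARGET_VPC_NAME" \
    --format="table(name,region,ipCidrRange)"
}

# Create the matching peering from the target VPC to the Astro VPC.
# $1 = Astro project ID, $2 = Astro VPC name (both assumed to come from
# Astronomer support). The peering name "astro-peering" is hypothetical.
accept_astro_peering() {
  gcloud compute networks peerings create "astro-peering" \
    --project="$TARGET_PROJECT_ID" \
    --network="$TARGET_VPC_NAME" \
    --peer-project="$1" \
    --peer-network="$2"
}

# Example calls (require an authenticated Google Cloud CLI):
# list_target_cidrs
# accept_astro_peering "<astro-project-id>" "<astro-vpc-name>"
```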
Private Service Connect
Use Private Service Connect (PSC) to create private connections from Astro to GCP services without connecting over the public internet. See Private Service Connect to learn more.
By default, Astro clusters are configured with a PSC endpoint that targets All Google APIs. To provide a secure-by-default configuration, a DNS zone is created with a resource record that routes all requests made to *.googleapis.com through this PSC endpoint. This ensures that requests to these services are made over PSC without any additional user configuration. For example, requests to storage.googleapis.com are routed through this PSC endpoint.
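To illustrate which hostnames the wildcard record covers, the following sketch checks a hostname against the `*.googleapis.com` pattern. It is a string comparison only and does not query DNS; the function name `matches_googleapis_wildcard` is hypothetical.

```shell
# Return success (0) when a hostname falls under the *.googleapis.com
# wildcard record. This is a string check only; it performs no DNS lookup.
matches_googleapis_wildcard() {
  case "$1" in
    *.googleapis.com) return 0 ;;
    *) return 1 ;;
  esac
}

for host in storage.googleapis.com bigquery.googleapis.com example.com; do
  if matches_googleapis_wildcard "$host"; then
    echo "$host: covered by the wildcard record"
  else
    echo "$host: not covered by the wildcard record"
  fi
done
```

To see the address that a hostname actually resolves to, you could run a DNS lookup tool from inside the environment where your tasks run.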
A list of Google services and their associated service names is provided in the Google APIs Explorer Directory. Alternatively, you can run the following command in the Google Cloud CLI to return a list of Google services and their associated service names:
gcloud services list --available --filter="name:googleapis.com"
Authorization options
Authorization is the process of verifying the permissions of a user or service before allowing them access to organizational applications and resources. Astro clusters must be authorized to access external resources from your cloud. The authorization option that you choose is determined by the requirements of your organization and your existing infrastructure. Astronomer recommends that you review all of the available authorization options before selecting one for your organization.
- Workload Identity
- Service account keys
Workload Identity
To allow data pipelines running on GCP to access Google Cloud services in a secure and manageable way, Google recommends using Workload Identity. All Astro clusters on GCP have Workload Identity enabled by default. Each Astro Deployment is associated with a Google service account that's created by Astronomer and is bound to an identity from your Google Cloud project's fixed workload identity pool.
To grant a Deployment on Astro access to external data services on GCP, such as BigQuery:
1. In the Cloud UI, select your Deployment, then click Details.
2. Copy the service account shown under Workload Identity.
3. Grant the Google service account for your Astro Deployment an IAM role that has access to your external data service. With the Google Cloud CLI, run:
   gcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT --member=serviceAccount:<your-astro-service-account> --role=roles/viewer
   For instructions on how to grant your service account an IAM role in the Google Cloud console, see Grant an IAM role.
4. Optional. Repeat these steps for every Astro Deployment that requires access to external data services on GCP.
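After granting a role, you may want to confirm that the binding exists. The following is a minimal sketch that lists the roles bound to a service account in a project's IAM policy. The project ID, the service account address, and the function name are hypothetical placeholders.

```shell
# Hypothetical values: replace with your project ID and the service account
# copied from the Deployment's Details page.
GCP_PROJECT_ID="my-gcp-project"
ASTRO_SERVICE_ACCOUNT="astro-deployment@example-project.iam.gserviceaccount.com"

# Print every IAM role bound to the given service account in the project.
# $1 = project ID, $2 = service account email address.
list_roles_for_service_account() {
  gcloud projects get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:$2" \
    --format="value(bindings.role)"
}

# Example call (requires an authenticated Google Cloud CLI):
# list_roles_for_service_account "$GCP_PROJECT_ID" "$ASTRO_SERVICE_ACCOUNT"
```

If the grant succeeded, the output should include the role that you bound to the service account, such as `roles/viewer`.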
Service account keys
When you create a connection from Astro to GCP, you can specify the service account key in JSON format, or you can create a secret to hold the service account key. For more information about creating and managing GCP service account keys, see Create and manage service account keys and Creating and accessing secrets.
Astronomer recommends using Google Cloud Secret Manager to store your GCP service account keys and other secrets. See Google Cloud Secret Manager.
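As a minimal sketch of the Secret Manager approach, the following example sets the two Airflow configuration variables that enable the Secret Manager secrets backend from the Google provider package, and defines a helper for storing a connection as a secret. The project ID, the prefixes, and the helper function name are assumptions for illustration. On Astro, you would set these variables on the Deployment instead of exporting them in a shell.

```shell
# Hypothetical project ID: replace with the project that hosts your secrets.
GCP_PROJECT_ID="my-gcp-project"

# Tell Airflow to look up connections and variables in Secret Manager.
export AIRFLOW__SECRETS__BACKEND="airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend"
AIRFLOW__SECRETS__BACKEND_KWARGS="$(printf '{"connections_prefix": "airflow-connections", "variables_prefix": "airflow-variables", "project_id": "%s"}' "$GCP_PROJECT_ID")"
export AIRFLOW__SECRETS__BACKEND_KWARGS

# Store a connection value as a secret named <connections_prefix>-<conn_id>.
# $1 = connection ID, $2 = path to a file that holds the connection value.
create_connection_secret() {
  gcloud secrets create "airflow-connections-$1" \
    --project="$GCP_PROJECT_ID" \
    --replication-policy="automatic" \
    --data-file="$2"
}

# Example call (requires an authenticated Google Cloud CLI):
# create_connection_secret "my_gcp_conn" "./connection.txt"
```

The identity that Airflow runs as also needs permission to read the secrets, which you can grant through IAM in the same way as other roles.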