A SQL cell contains a SQL query that you can run in isolation or as a dependency in your pipeline. Create SQL cells and Warehouse SQL cells to execute SQL as part of your data pipeline.
- An IDE project and pipeline. See Step 2: Create a pipeline.
- For Warehouse SQL cells, you need:
- A database connection. See Configure Airflow connections.
- Write permissions to the data warehouse in your database connection.
Choose a SQL cell type
A SQL cell runs a SQL query against a database and, by default, stores the results of the query in an XCom value for use in other cells. A Warehouse SQL cell runs a SQL query against a database connection and stores the results in your data warehouse. For more information about how these cells work, see How SQL cells work.
The Astro Cloud IDE includes a local DuckDB instance that stores the XCom outputs for all Python and SQL cells without an external database storage connections. Through this instance, SQL cells can run queries against the dataframe or table outputs of other cells.
Both cell types execute SQL queries, but there are some scenarios where the use of one cell type is preferable. The following table lists the scenarios where Astronomer recommends using a specific type of SQL cell.
|I don't have write access to an external database.||SQL cell|
|I'm going to use the output of the query outside of my external database, such as in a Python cell.||SQL cell|
|I want to query a small amount of data from the output of another IDE cell.||SQL cell|
|I'm going to use the output of the query only in my external database.||Warehouse SQL cell|
|I'm querying a large amount of data.||Warehouse SQL cell|
Regardless of the cell type you choose, you can use your query results in downstream cells. However, using the output of a Warehouse cell in a downstream cell requires fetching the output from your external database, which can take longer than fetching the output of a SQL cell with XComs.
Create a SQL cell
In the Cloud UI, select a Workspace and then select Cloud IDE.
Select a project.
On the Pipelines page click a pipeline name to open the pipeline editor.
Click Add Cell and select one of the following options:
- SQL: Runs a SQL query against a provided database connection and stores the results of the query in an XCom for use in other cells.
- Warehouse SQL: Runs a SQL query against a provided database connection and stores the results in your data warehouse.
Click the cell name and enter a name for the cell.
Add your SQL query to the cell body.
In the Select Connection list, select the connection for the database you want to query against. For SQL cells, select In-memory SQL if you want to query the output of another cell that was stored locally in XComs.
(Optional) If you're using Warehouse SQL cells, configure the following additional fields to define an output table. If you don't define these values, your data pipeline will store the outputs of your cell in a temporary table.
- Database: The database for your output table
- Schema: The schema for your output table
- Table name: The name of your output table
By default, your input is formatted as a literal string and is defined in your DAG with quotation marks. To format your input as a Python expression, click Python expression next to the value name. If you already entered a literal string before changing this setting, remove the quotation marks that the IDE adds to your Python expression.
After you switch to Python expression, your input for the value is defined in the DAG as a Python expression without quotation marks. You can use your input to call any Python value in your pipeline, such as a global import, an Airflow variable, or an environment variable.
Run a SQL cell
Create explicit dependencies for a SQL cell
In your SQL cell, click Dependencies and select a cell to be upstream of your SQL cell. When you run your entire pipeline, the SQL cell cannot begin running until the selected upstream cell finishes running.
To make your SQL cell an upstream dependency for another cell, click Dependencies for the other cell and select the name of your SQL cell.
View complete code for SQL cells
To view your SQL cell within the context of an Airflow DAG, click Code. The Airflow DAG includes your SQL query as well as all of the code required to run it on Airflow.
How SQL cells work
SQL cells use the Astro SDK to execute your queries. A standard SQL cell executes your query using
aql.run_raw_sql and stores the results in XComs. A Warehouse SQL cell runs your SQL query using
aql.transform and loads the results in your data warehouse. See Astro SDK documentation for more information about these operators.