Write Python in the Astro Cloud IDE
A Python cell contains a Python function that you can run in isolation or as a dependency in your pipeline. Create Python cells to execute Python as part of your data pipeline.
Prerequisites
- An IDE project and pipeline. See Step 2: Create a pipeline.
Create a Python cell
In the Cloud UI, select a Workspace and then select Cloud IDE.
Select a project.
On the Pipelines page, click a pipeline name to open the pipeline editor.
Click Add Cell > Python.
Click the cell name and enter a name for the cell.
Add your Python code to the cell body.
Run a Python cell
See Run cells in the Astro Cloud IDE.
Create explicit dependencies for a Python cell
In a Python cell, click Dependencies and select a cell to make it an explicit upstream dependency of your Python cell. When you run your entire pipeline, the Python cell cannot begin running until the selected upstream cell finishes running.
To make a Python cell an upstream dependency for another cell, click Dependencies for the other cell and select the name of your Python cell.
Create data dependencies for a Python cell
You can use the output of other cells in your project within a Python function. You define these dependencies in Python, and the Cloud IDE automatically renders the dependencies in your project code and in the Pipeline view of your project.
Pass a value from one Python cell to another Python cell
Use the value of a Python cell's return
statement in another Python cell by calling the name of the Python cell containing the return
statement. Doing this automatically creates a dependency between the cells.
For example, consider two Python cells. One cell is named hello_world
and includes the following code:
return "Hello, world!"
Another cell is named data_dependency
and includes the following code:
my_string = hello_world
return my_string
The Pipeline view in the Cloud IDE shows the newly created dependency between these two cells.
Pass a value from a SQL cell to a Python cell
Use the results of a SQL cell in your Python cell by calling the name of the SQL cell. The SQL cell must contain a SELECT
statement.
The table created by the SELECT
statement is automatically converted to pandas DataFrame and passed to the Python cell.
The following Python cell is dependent on a SQL cell named my_sql_cell
.
df = my_sql_cell # my_sql_cell is a SQL cell which gets converted to a pandas DataFrame by default
df['col_a'] = df['col_a'] + 1
return df
View complete code for Python cells
To view your Python cell within the context of an Airflow DAG, click Code. The Airflow DAG includes your Python function as well as all of the code required to run it on Airflow.
All Python cells execute Python using aql.dataframe
, which is a function available in the Astro SDK. See Astro SDK documentation.