Pipe
All the SQL queries in Airfold are written inside pipes
.
Pipes are a sequence of one or more SQL queries (Nodes) that are executed in order, each of those SQL queries is called a Node, and each Node is simply a SELECT
statement.
Pipes result in either a published pipe or a materialized pipe.
Use pipes to build features over your data!
What is a Node?
A node is a container for a single SQL SELECT
statement.
Nodes live within pipes and are executed sequentially in the order they are defined in the pipe.
Each node can read data from sources, published pipes, or preceding nodes in the same pipe. It cannot read data from draft pipes or nodes defined later in the pipe.
The schema of the last node in the pipe determines the pipe’s schema.
Best Practices for Nodes
Use nodes to break down complex queries into smaller, more manageable, sequential steps. Then chain nodes together to build the logic incrementally.
Each node can be developed & tested individually. This makes it much easier to build complex query logic in Airfold as you avoid creating large monolithic queries with many sub-queries.
The 3 Types of Pipes
Draft pipes
These are the starting point for all pipes, similar to database views in functionality.
They cannot be referenced by other pipes using the FROM
clause.
Draft pipes are temporary and primarily used for development. Once finalized, a draft pipe can be transitioned into either a published or a materialized pipe, but not both.
Published pipes
This creates an API endpoint that serves the result of your pipe, the result is accessible in JSON, NDJSON, CSV, and Parquet formats.
To publish a draft pipe, assign an endpoint name to the publish
field.
Published pipes can parameterize their SQL using Jinja2 via the params
field.
Unlike draft pipes, published pipes can be accessed (via FROM
) by other pipes.
Materialized pipes
Differing from the batch nature of draft and published pipes, materialized pipes stand out for their ability to incrementally transform data as it is ingested and append their results to a designated source.
To materialize a draft pipe, set the to
field to the name of the target source.
Although materialized pipes themselves cannot be accessed (via FROM
) by other pipes, the source they write to is accessible like any other source.
Supercharge your pipes!
Published pipes are batch and compute their results on read, thus they’re not suited for intensive data transformations.
Instead, a common pattern is to front a materialized pipe with a published pipe. The materialized pipe incrementally transforms the data as it is ingested and writes it to its source, allowing the published pipe to instantly access the already transformed data from the source on read.
Sample Pipes
Note the materialized pipe example uses countState()
instead of count()
.
This is because Airfold uses ClickHouse under the hood which requires appending State
to aggregate functions when used in a materialized pipe.
Fortunately, Airfold does it automatically when materializing a draft pipe using af materialize <pipe_name>
.
Learn more about Materialization in ClickHouse, about the *State
combinator.
Push a Pipe
Push pipes to your workspace using the CLI af push
or the /push
API.
For example, to push the draft pipe above:
Properties
A brief description of the pipe’s purpose or functionality. Optional.
The sequence of nodes within the pipe. Each node represents a stage in data processing.
Specifies the target source for appending the incremental results. When set, it converts the pipe into a materialized pipe. This option cannot be used with publish
and requires the final node’s schema to match the provided source schema. Optional.
The endpoint name where the results are published, turning the pipe into a published pipe. This option cannot be used with to
. Optional.
A list of parameters for parameterizing node SQL queries through Jinja2 templating e.g. {{ param }}
. These parameters are passed via the API and can be used to dynamically alter the pipe’s behavior. Optional.