Query Your Data
Once you have ingested your data, the next step is to query, analyze, and derive insights from it. This is where Airfold Pipes come into play.
Airfold Pipes are the go-to component for interacting with your data. They provide an SQL interface that allows you to write powerful, modular queries to explore and transform your datasets.
What Are Pipes?
A Pipe is a data processing workflow that takes your data, processes it through one or more steps, and returns a meaningful output. Pipes are designed to be:
- Flexible – Transform and aggregate data with ease
- Powerful – Build complex analytics workflows
- Modular – Break down logic into manageable, testable units
Each Pipe is made up of one or more Nodes.
Nodes: The Building Blocks
Nodes are the fundamental units within a Pipe. Each Node contains a single SELECT query representing a specific step in the data processing workflow. Nodes are:
- Standalone or Connected: Nodes can operate independently or as part of a chain
- Like CTEs, but Better: They resemble Common Table Expressions, but can be executed and debugged individually
- Composable: Output from one Node can be referenced by another downstream Node
In their FROM clauses, Nodes can reference:
- Airfold Sources
- Previous nodes
- Other Published Pipes
When a Pipe is triggered, the order in which the Nodes execute is determined by the lineage defined in your FROM statements: a Node runs only after all of its dependencies have been resolved, ensuring a logical and efficient flow of data through the pipeline.
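As a sketch of that chaining, the second node below reads the first one by name in its FROM clause; the node, column, and source names are illustrative, and the exact YAML layout may differ from what your Airfold version expects (see the CLI section below):

```yaml
nodes:
  - daily_events:        # first node: aggregates the raw "events" source
      sql: |
        SELECT toDate(timestamp) AS day, count() AS total
        FROM events
        GROUP BY day
  - weekly_rollup:       # second node: depends on daily_events, so it runs after it
      sql: |
        SELECT toStartOfWeek(day) AS week, sum(total) AS total
        FROM daily_events
        GROUP BY week
```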
Types of Pipes
Pipes can be created and managed using either the UI or the CLI. This page outlines how to work with each type of Pipe through both interfaces.
Draft Pipes
Draft pipes are temporary pipelines used for development and testing:
- They do not persist results
- Cannot be referenced by other pipes using FROM
- Serve as the foundation for published or materialized pipes
Use Case: Iterative development and testing of SQL queries
UI
Navigate to “Pipes” in the left-hand menu and click “+”. Enter a name for this pipe, then press “Create”.
Here, you have a node where you can enter a single SQL SELECT statement to query your tables directly.
These SQL statements can interact with your sources/tables in the same way as standard SQL tables.
For example, we can reproduce the top-referrers query from the quickstart and run it by clicking “Run”:
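A minimal sketch of that query, assuming an events source with a referrer column as in the quickstart (your source and column names may differ):

```sql
-- Count visits per referrer and keep the ten most common.
-- "events" and "referrer" are assumed names from the quickstart example.
SELECT referrer, count() AS visits
FROM events
GROUP BY referrer
ORDER BY visits DESC
LIMIT 10
```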
This is your draft pipe; you can use it to test SQL queries against your database.
CLI
Pipes are created and managed through YAML files. To create a draft pipe using the CLI, create a YAML file as below:
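A minimal sketch of a draft pipe definition; the pipe, node, and table names are illustrative, and the exact node layout may vary across Airfold versions:

```yaml
name: top_referrers                                  # optional
description: Top referrers from the events source    # optional
nodes:                             # each node holds a single SELECT statement
  - referrer_counts:
      sql: |
        SELECT referrer, count() AS visits
        FROM events
        GROUP BY referrer
        ORDER BY visits DESC
        LIMIT 10
```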
The supported fields are:
- name: The name of the pipe. Optional.
- description: A brief description of the pipe’s purpose or functionality. Optional.
- nodes: The sequence of nodes within the pipe. Each node represents a stage in data processing.
Once you have created your YAML file, push to your workspace by running:
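Assuming the standard Airfold CLI invocation (the exact command and flags may differ by version; see the CLI Quickstart):

```bash
# Push the pipe definition in pipe.yaml to your workspace.
# "af" is the Airfold CLI; the file name here is illustrative.
af push pipe.yaml
```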
Check out the CLI Quickstart page for more info on using the Airfold CLI tool.
Published Pipes
Published pipes create API endpoints that expose query results in multiple formats, such as JSON and CSV:
- Accessible via endpoints and usable in other pipes
- Support parameterized SQL using Jinja2 templates
- Results are computed on read
Use Case: Serve your transformed analytics data to dashboards, reports, or downstream tools.
UI
Click “Save” to save the SQL in your pipe, then click “Publish” to publish it as an endpoint.
That’s it: your draft pipe just became a published pipe. You can go to view the endpoints and use any of the JavaScript, Python, or curl snippets to call the endpoint.
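For example, a hypothetical curl call; the real URL, format suffix, and auth header come from the endpoints view, so everything below is illustrative:

```bash
# Illustrative only: copy the actual endpoint URL and token from the endpoints view.
curl -H "Authorization: Bearer $AIRFOLD_TOKEN" \
  "https://api.airfold.co/v1/pipes/top_referrers.json"
```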
CLI
Publishing creates an API endpoint that serves the result of your pipe; the result is accessible in JSON, NDJSON, CSV, and Parquet formats.
To publish a draft pipe, assign an endpoint name to the publish field.
Published pipes can parameterize their SQL using Jinja2 via the params field.
Unlike draft pipes, published pipes can be accessed (via FROM) by other pipes.
The supported fields are:
- name: The name of the pipe. Optional.
- description: A brief description of the pipe’s purpose or functionality. Optional.
- nodes: The sequence of nodes within the pipe. Each node represents a stage in data processing.
- publish: The endpoint name where the results are published, turning the pipe into a published pipe. This option cannot be used with to. Optional.
- params: A list of parameters for parameterizing node SQL queries through Jinja2 templating, e.g. {{ param }}. These parameters are passed via the API and can be used to dynamically alter the pipe’s behavior. Optional.
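A minimal sketch of a published, parameterized pipe; all names are illustrative, and the exact params entry format (typed or defaulted parameters, for example) may differ:

```yaml
name: top_referrers_api
description: Top referrers, filterable by a minimum visit count
publish: top_referrers      # endpoint name; this is what makes the pipe published
params:
  - min_visits              # passed via the API; the entry format is assumed
nodes:
  - referrer_counts:
      sql: |
        SELECT referrer, count() AS visits
        FROM events
        GROUP BY referrer
        HAVING visits >= {{ min_visits }}
        ORDER BY visits DESC
```

A caller would then supply the parameter when hitting the endpoint, e.g. as a ?min_visits=10 query argument, assuming query-string parameters as is typical for such APIs.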
Materialized Pipes
Materialized pipes transform and store results in a target source:
- Results are appended to a designated source for faster access
- Designed for incremental updates
- Results persist, unlike published pipes
Use Case: Real-time pipelines that write pre-aggregated results to a source
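As a sketch, assuming the target source is named by a to field (inferred from the publish description above, which notes that publish cannot be combined with to), with all other names illustrative:

```yaml
name: referrer_rollup
description: Pre-aggregated daily referrer counts
to: referrer_counts_daily   # assumed field naming the target source
nodes:
  - rollup:
      sql: |
        SELECT toDate(timestamp) AS day, referrer, count() AS visits
        FROM events
        GROUP BY day, referrer
```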
Incremental Materialization
Airfold’s materialized pipes support incremental data transformation by automatically ingesting new data as it arrives. This model ensures your pipeline only processes fresh data, which is ideal for time-series logs, user activity streams, or continuously growing datasets, and enables near-real-time aggregations and rollups.
Use Case: Building real-time dashboards or time-partitioned metrics that stay current with incoming data.
Note: Incremental Materialized Pipes in Airfold are designed to process only newly arriving data. If historical data already exists in your target tables, you must perform a backfill to ensure completeness.
Backfilling refers to the process of reprocessing or re-ingesting historical data so that it is captured by the incremental materialization logic. Without this step, pre-existing data will be excluded from downstream views and analytics.
To learn how to perform a backfill, see Backfill Your Data.
See Optimize Your Pipe for more details on how to implement incremental materialized pipes.
Refreshable Materialization
Refreshable pipes recompute transformations across the entire dataset:
- Can run on a scheduled refresh interval
- Replace existing materialized data with updated results
- Are suited for dimension tables, slowly changing datasets, or recalculated KPIs
Use Case: Perform a full recomputation of a view over the entire dataset to support complex join logic that cannot be handled incrementally.
See Optimize Your Pipe for more details on how to implement refreshable materialized pipes.