Although endpoints alone can handle high volumes, they may slow down at larger scales.

Optimizing queries is therefore a crucial part of managing production workloads in Airfold, chiefly by precomputing and storing results incrementally with materialized pipes.

What are Materialized Pipes?

Materialized pipes precompute and store results as data is ingested. They offer a more efficient way to handle repetitive or resource-intensive queries compared to draft or published pipes.

How they work:

  • Historical Data Pre-Aggregation: When creating a materialized pipe, you have the option to pre-aggregate all historical data to create a baseline of results
  • Incremental Processing: Materialized pipes compute results as new data flows in, maintaining up-to-date outputs without reprocessing the entire dataset
  • Precomputed Results: Instead of executing a query from scratch every time, results are precomputed and stored in a target source
  • Fast Queries: Precomputed results are served instantly, avoiding recomputation (see the sketch below)
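
For example, an endpoint that counts log lines by level would otherwise rescan the whole logs table on every request; with a materialized pipe, the counts are maintained as rows arrive and the serving query reads only the precomputed rows. A minimal sketch of the difference, assuming the logs table and logs_by_level target source used in the YAML examples later in this guide (the stored counts are partial aggregate states that are merged at read time):

-- without materialization: recomputed on every request, scanning all of logs
select level, count() as count from logs group by level

-- with materialization: served from the precomputed target source
select level, countMerge(count) as count from logs_by_level group by level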

Use Cases

Frequent Queries:

  • Ideal for scenarios where the same query is executed repeatedly, such as dashboards or real-time monitoring

Expected Aggregations:

  • Ideal when you know ahead of time which aggregations are needed, such as sales metrics or user activity trends (see the sketch below)

Note: Materialized pipes are not suitable for ad hoc queries where the structure of the data or aggregations is determined dynamically
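
For instance, a daily revenue rollup is exactly the kind of aggregation that is known ahead of time. A hypothetical sketch of such a node's SQL, following the same -State pattern as the log aggregation examples later in this guide (the orders table and its ordered_at and amount columns are assumptions for illustration only):

select
    toDate(ordered_at) as day,
    sumState(amount) as revenue
from orders
group by day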

Recent Data Analysis:

  • Perfect for use cases where the focus is on recently ingested data
  • Materialized pipes maintain results incrementally as new data arrives

Caveats

Immutable Results:

  • Once data is materialized, you cannot change how its results are computed
  • Any change in logic requires creating a new materialized pipe

No Retroactive Updates:

  • Materialized pipes work with newly ingested data and cannot retroactively apply changes to older data
  • This makes them unsuitable for scenarios where historical recalculations are needed

UI

After creating a pipe, click the three dots (...) in the top-right corner and select “materialize”:

Confirm your pipe, then select a compatible target source or “+ Add Source”:

You can also make the pipe refreshable, meaning it will automatically update at a set time interval. Check “Refreshable” to enable this and customize the interval:

Click “Materialize” to finish.

Your materialized pipe will now store and incrementally update results.

YAML

Draft to Materialized Pipe

Define a Draft Pipe

Start by creating a draft pipe to develop and test your pipeline.

logs_aggregation.yaml
description: 'Aggregate logs by level'
nodes:
  - load:
      description: Load raw logs
      sql: select timestamp, level, file, message from logs
  - aggregate:
      description: Aggregate log counts by level
      sql: |
          select
              level,
              countState() as count
          from load
          group by level
to: logs_by_level

Push this draft pipe by running:

af push logs_aggregation.yaml

Materialize the Draft Pipe

To materialize the pipe, run the af pipe materialize command:

af pipe materialize logs_aggregation.yaml

Refreshable Materialized Pipe

You may also make the materialized pipe refreshable by adding a refresh attribute to your YAML file:

logs_aggregation_refreshable.yaml
description: 'Aggregate logs by level'
nodes:
  - load:
      description: Loading logs
      sql: select timestamp, level, file, message from logs
  - aggregate:
      description: Aggregating logs by level
      sql: |
          select
              level,
              countState() as count
          from load
          group by level
to: logs_by_level
refresh:
  strategy: replace
  interval: EVERY 1 MINUTE

Query Materialized Data

After materializing the pipe, the results are stored in the target source logs_by_level.

You can query it directly by running:

af pipe query logs_by_level
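
Because the aggregate node stores counts with countState(), the count column in logs_by_level holds a partial aggregate state rather than a finished number. A minimal sketch of how a downstream pipe or query might finalize it, assuming ClickHouse-style -Merge combinators (implied by the countState() call above):

select
    level,
    countMerge(count) as count
from logs_by_level
group by level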