Sources make it easy to bring your data into Airfold. Sources are akin to tables in a database: they store data according to a defined schema.

All workspace data lives in sources, which can be queried, filtered, aggregated, and joined using SQL.

Data is ingested into sources either directly through Airfold’s Ingest API or through a materialized pipe.

Sources can also be extended into AI sources, which transform unstructured data into structured columns that can then be analyzed with SQL.

Sample Sources


The added_cols field specifies which columns the AI source needs to fill, and the message field specifies how to fill them. The message references columns from the source in curly braces, e.g. {col}.
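To make the curly-brace convention concrete, here is a minimal Python sketch of how a message template could be filled from one row of a source. It is an illustration of the placeholder idea only, not Airfold's implementation; the helper names (`placeholders`, `render_message`) and the example columns (`product`, `review_text`) are assumptions made for this example.

```python
import string


def placeholders(message: str) -> list[str]:
    """Return the column names referenced as {col} in a message template."""
    return [field for _, field, _, _ in string.Formatter().parse(message) if field]


def render_message(message: str, row: dict) -> str:
    """Substitute each {col} placeholder with the matching value from one row."""
    missing = [col for col in placeholders(message) if col not in row]
    if missing:
        raise KeyError(f"message references unknown columns: {missing}")
    return message.format(**row)


# Hypothetical source columns and message, for illustration only.
message = "Classify the sentiment of this review of {product}: {review_text}"
row = {"product": "headphones", "review_text": "Great sound, poor battery."}
prompt = render_message(message, row)
```

Checking the placeholders against the row before formatting surfaces a misspelled column name as a clear error rather than a bare formatting failure.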

The _errors Table

Every workspace has an associated _errors source, which records all errors that occur during ingestion, for example a row that fails to be ingested because of a schema mismatch.

This way, the whole ingestion process is not interrupted by a single error, and the error can be reviewed and fixed later.
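The pattern described here, diverting bad rows instead of aborting the batch, can be sketched in a few lines of Python. This is a simplified illustration and not how Airfold ingests data: the `ingest` function, the Python-type schema, and the shape of the error records (apart from the `source` field, which the _errors source exposes) are all assumptions.

```python
def ingest(rows, schema, source_name):
    """Split rows into accepted rows and error records without stopping on a bad row."""
    accepted, errors = [], []
    for row in rows:
        problem = None
        if set(row) != set(schema):
            problem = "column mismatch"
        else:
            for col, typ in schema.items():
                if not isinstance(row[col], typ):
                    problem = f"column {col!r} is not {typ.__name__}"
                    break
        if problem is None:
            accepted.append(row)
        else:
            # Record the failure and keep going; later rows are still ingested.
            errors.append({"source": source_name, "error": problem, "row": row})
    return accepted, errors


# Hypothetical schema and rows; the second row has a wrongly typed user_id.
schema = {"user_id": int, "event": str}
rows = [
    {"user_id": 1, "event": "click"},
    {"user_id": "oops", "event": "view"},
    {"user_id": 2, "event": "view"},
]
accepted, errors = ingest(rows, schema, "web_events")
```

The third row is still accepted even though the second one failed, which is the behavior the paragraph above describes.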

The _errors source is treated like any other source and can be queried, filtered, and joined with other sources. For example, to find the most common errors by source:

SELECT source, COUNT(*) AS count
FROM _errors
GROUP BY source
ORDER BY count DESC

Push a Source

Push sources to your workspace using the af push CLI command or the /push API.
For example, to push a source defined in source.yaml:

af push source.yaml

Properties

description
string

A brief overview of the source’s content or purpose. Optional.

cols
{name: type}
required

Defines the schema of the source as a mapping of columns, where each key is a column name and its value is the column's data type (see Data Types).

settings
string | {key: value} | array

ClickHouse table configuration for the source, which can include ORDER BY, PARTITION BY, the table engine, etc. The settings can be a string, a key-value map, or an array containing either or both. Optional.

message
string

A prompt for the AI, from which it derives the columns in added_cols. The message should reference the source’s cols within curly braces, e.g. {col}. This field is required if added_cols is present; together, the two fields turn the source into an AI source.

added_cols
{name: type}

Specifies additional columns to be filled by the AI, based on the message field’s instructions. This field is required if message is present; together, the two fields turn the source into an AI source.
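The rules in this property list (cols is required, and message and added_cols must appear together) can be expressed as a small check. The Python sketch below is only an illustration of those rules, assuming a source definition loaded into a dictionary; `check_source`, the example columns, and the ClickHouse-style type names are hypothetical and not part of Airfold's tooling.

```python
def check_source(definition: dict) -> str:
    """Validate the property rules and report whether this is a plain or AI source."""
    if not definition.get("cols"):
        raise ValueError("cols is required")
    has_message = "message" in definition
    has_added_cols = "added_cols" in definition
    # Each of the two AI fields is mandatory when the other is present.
    if has_message != has_added_cols:
        raise ValueError("message and added_cols must be provided together")
    return "ai_source" if has_message else "source"


# Hypothetical definitions, for illustration only.
plain = {"description": "Product reviews", "cols": {"review_text": "String"}}
ai = {
    "cols": {"review_text": "String"},
    "added_cols": {"sentiment": "String"},
    "message": "Classify the sentiment of: {review_text}",
}
incomplete = {"cols": {"review_text": "String"}, "message": "Summarize {review_text}"}
```

Here `check_source(plain)` returns `"source"`, `check_source(ai)` returns `"ai_source"`, and `check_source(incomplete)` raises a `ValueError` because message is set without added_cols.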