Sources make it easy to bring your data into Airfold. They are akin to tables in a database: each source stores data according to a defined schema.

All workspace data lives in sources, which can be queried, filtered, aggregated, and joined using SQL.

Data is ingested into sources directly through Airfold’s Ingest API or by a materialized pipe.

Sources can also be enhanced into AI sources, which transform unstructured data into structured columns that can be analyzed with SQL like the rest of the source.

Sample Sources


The ai_cols field specifies which columns the AI source needs to fill. The using field specifies the columns to use in the AI transformation.
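
As a rough illustration of how these two fields might fit together, here is a minimal sketch of an AI source definition. It uses only keys named on this page (description, cols, ai_cols, using). The column names, the data types, the placement of using as a top-level list, and the choice to declare the AI-filled columns inside cols are all assumptions made for illustration, and should be verified before use.

```yaml
# Hypothetical AI source: every column name and type below is illustrative.
description: Support tickets with AI-derived fields

# Schema: column name -> data type (ClickHouse-style types assumed)
cols:
  ticket_id: String
  created_at: DateTime
  body: String        # unstructured text supplied at ingestion
  category: String    # assumed to be filled by the AI source
  sentiment: String   # assumed to be filled by the AI source

# Columns the AI source needs to fill
ai_cols:
  - category
  - sentiment

# Columns used as input to the AI transformation (top-level placement assumed)
using:
  - body
```

Saved as source.yaml, a definition along these lines would be the file passed to af push.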

The _errors Table

Every workspace has an associated _errors source, which records all errors that occur during ingestion, for example a row that failed to be ingested because of a schema mismatch.

This way, the whole ingestion process is not interrupted by a single error, and the error can be reviewed and fixed later.

The _errors source is treated like any other source: it can be queried, filtered, and joined with other sources. For example, to count errors per source and list the sources with the most errors first:

SELECT source, COUNT(*) AS count
FROM _errors
GROUP BY source
ORDER BY count DESC

Push a Source

Push sources to your workspace using the af push CLI command or the /push API.
For example, to push a source defined in source.yaml:

af push source.yaml

Properties

description
string

A brief overview of the source’s content or purpose. Optional.

cols
{name: type}
required

Defines the schema of the source as a mapping of columns, where each key is a column name and each value is that column's data type (see Data Types).

settings
string | {key: value} | array

ClickHouse table configuration for the source, which can include ORDER BY, PARTITION BY, the table engine, etc. The value can be a string, a key-value mapping, or an array containing either or both. Optional.

ai_cols
array

Columns to be filled by the AI source.