Source
Bring your data into Airfold.
Sources make it easy to bring your data into Airfold. Sources are akin to tables in a database: they store data according to a defined schema.
All workspace data lives in sources, which can be queried, filtered, aggregated, and joined using SQL.
Data is ingested into sources directly through Airfold’s Ingest API or by a materialized pipe.
Sources can also be extended into AI sources, which transform unstructured data into structured columns that can then be analyzed like any other data.
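The Ingest API is an HTTP endpoint, so sending rows to a source comes down to building a POST request. The following is a minimal sketch using only Python's standard library. The base URL, the /v1/events/<source> path, the bearer-token header, and the newline-delimited JSON body are all assumptions made for illustration, not a confirmed description of Airfold's API, so check the Ingest API reference before relying on them. The function only builds the request; it does not send it.

```python
import json
import urllib.request

# ASSUMPTION: hypothetical base URL and path layout; verify against the
# Ingest API reference before use.
ASSUMED_BASE_URL = "https://api.airfold.co/v1/events"


def build_ingest_request(source, rows, token, base_url=ASSUMED_BASE_URL):
    """Build (but do not send) a POST request that ingests `rows` into `source`.

    Rows are serialized as newline-delimited JSON, one object per line.
    This body format is an assumption made for the sketch.
    """
    body = "\n".join(json.dumps(row, separators=(",", ":")) for row in rows)
    return urllib.request.Request(
        url=f"{base_url}/{source}",
        data=body.encode("utf-8"),
        headers={
            # ASSUMPTION: bearer-token authentication with a workspace token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-ndjson",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_ingest_request(
        "web_events",  # hypothetical source name
        [{"user_id": "u1", "action": "click"}, {"user_id": "u2", "action": "view"}],
        token="YOUR_TOKEN",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to inspect or test before any data leaves your machine.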
Sample Sources
The ai_cols field specifies which columns the AI source needs to fill. The using field specifies the columns to use in the AI transformation.
The _errors Table
Every workspace has an associated _errors source table, which contains all errors that occurred during ingestion, for example, when a row fails to be ingested because of a schema mismatch.
This way, a single error does not interrupt the whole ingestion process, and the error can be reviewed and fixed later.
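The behavior described above, where a bad row is set aside instead of aborting the batch, can be modeled as a simple routing step. The sketch below is a conceptual illustration, not Airfold's implementation. It checks each row against a hypothetical schema (column name mapped to a Python type standing in for a column data type) and sends failing rows, together with a reason, to an in-memory list that plays the role of _errors.

```python
from typing import Any, Optional

# Hypothetical schema: column name -> Python type standing in for a data type.
SCHEMA: dict = {"user_id": str, "amount": float}


def check_row(row: dict, schema: dict) -> Optional[str]:
    """Return an error message if `row` does not match `schema`, else None."""
    missing = [col for col in schema if col not in row]
    if missing:
        return f"missing columns: {', '.join(missing)}"
    for col, expected in schema.items():
        if not isinstance(row[col], expected):
            return f"type mismatch on {col}: expected {expected.__name__}"
    return None


def ingest(rows: list, schema: dict) -> tuple:
    """Split rows into accepted rows and error records without aborting the batch."""
    accepted: list = []
    errors: list = []
    for row in rows:
        problem = check_row(row, schema)
        if problem is None:
            accepted.append(row)
        else:
            # The bad row is recorded with its reason and ingestion continues.
            errors.append({"row": row, "error": problem})
    return accepted, errors


if __name__ == "__main__":
    ok, bad = ingest(
        [
            {"user_id": "u1", "amount": 9.5},
            {"user_id": "u2", "amount": "oops"},  # wrong type
            {"user_id": "u3"},                    # missing column
            {"user_id": "u4", "amount": 1.0},
        ],
        SCHEMA,
    )
    print(len(ok), "accepted;", len(bad), "sent to errors")
    for err in bad:
        print(err["error"])
```

Rows u1 and u4 are accepted, while u2 and u3 end up in the error list with their reasons, which is the kind of record the _errors source lets you review later.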
The _errors table is treated like any other source: it can be queried, filtered, and joined with other sources. For example, to find the most common errors by source:
SELECT source, COUNT(*) AS count
FROM _errors
GROUP BY source
ORDER BY count DESC
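To see the shape of the result this query produces, the sketch below runs the same SQL against a local SQLite stand-in using Python's built-in sqlite3 module. This is only an illustration: Airfold sources are backed by ClickHouse, not SQLite, and the _errors columns other than source (here a hypothetical error column) are assumptions. The query itself only relies on source.

```python
import sqlite3


def most_common_errors(rows):
    """Run the GROUP BY query above against an in-memory stand-in for _errors.

    rows: list of (source, error) tuples. The `error` column is hypothetical;
    the query only uses `source`.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE _errors (source TEXT, error TEXT)")
    conn.executemany("INSERT INTO _errors VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT source, COUNT(*) AS count FROM _errors "
        "GROUP BY source ORDER BY count DESC"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("web_events", "type mismatch on column ts"),
        ("web_events", "missing column user_id"),
        ("orders", "type mismatch on column amount"),
    ]
    print(most_common_errors(sample))
```

Each result row pairs a source name with its error count, so the sources producing the most ingestion errors appear first.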
Push a Source
Push sources to your workspace using the CLI command af push or the /push API.
For example, to push a source defined in source.yaml:
af push source.yaml
Properties
A brief overview of the source’s content or purpose. Optional.
Defines the schema of the source as a list of columns where each column name is a key and its data type is the value (see Data Types).
ClickHouse table configuration for the source, which can include ORDER BY, PARTITION BY, the table engine, and so on. The settings can be a string, a key-value pair, or an array containing either or both. Optional.
Columns to be filled by the AI source.
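Two of the properties above have specific shapes: the schema is a mapping from column name to data type, and the table configuration may be a string, a key-value pair, or an array of either or both. The sketch below checks those two shapes in Python. It is an illustrative validator written for this page, not part of the Airfold CLI; the function names and the example keys order_by and partition_by are hypothetical.

```python
def check_columns(columns) -> None:
    """Raise TypeError unless `columns` maps column names (str) to type names (str)."""
    if not isinstance(columns, dict):
        raise TypeError("columns must be a mapping of column name to data type")
    for name, dtype in columns.items():
        if not isinstance(name, str) or not isinstance(dtype, str):
            raise TypeError(f"bad column entry: {name!r}: {dtype!r}")


def normalize_settings(settings) -> list:
    """Return table settings as a flat list of strings and/or key-value dicts.

    Accepts the three documented shapes: a string, a key-value mapping,
    or a list whose items are strings and/or mappings.
    """
    if isinstance(settings, (str, dict)):
        return [settings]
    if isinstance(settings, list):
        for item in settings:
            if not isinstance(item, (str, dict)):
                raise TypeError(f"unsupported settings item: {item!r}")
        return list(settings)
    raise TypeError(f"unsupported settings value: {settings!r}")


if __name__ == "__main__":
    check_columns({"timestamp": "DateTime", "user_id": "String", "amount": "Float64"})
    print(normalize_settings("ORDER BY timestamp"))
    print(normalize_settings({"order_by": "timestamp"}))  # hypothetical key
    print(normalize_settings(["ORDER BY timestamp", {"partition_by": "toYYYYMM(timestamp)"}]))
```

Normalizing every accepted form to a list means later code only has to handle one shape.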