Sources
Bring your data into Airfold
Data is ingested into sources directly through Airfold’s Ingest API or by a pipe.
Creation
Define the schema for your source in a YAML file:
Properties
Name of source
A brief overview of the source’s content or purpose (optional)
Defines the schema of the source as a list of columns where each column name is a key and its data type is the value (see Data Types)
ClickHouse table configuration for the source, which can include ORDER BY, PARTITION BY, Table Engine, etc. The settings can be a string, a key-value pair, or an array comprising either or both. Optional.
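As a rough illustration of how these properties fit together, the sketch below renders a source definition as YAML text with plain Python. The key names `name`, `description`, `cols`, and `settings`, the column names, and the example settings string are assumptions made for illustration, not confirmed Airfold syntax; check them against the Data Types and source reference before use.

```python
# Minimal sketch: render a source definition as YAML text (standard library only).
# ASSUMPTION: the keys `name`, `description`, `cols`, and `settings` are
# illustrative names for the four properties described above.

def render_source_yaml(name, cols, description=None, settings=None):
    """Return YAML text for a source; handles simple scalar values only."""
    lines = [f"name: {name}"]
    if description:
        lines.append(f"description: {description}")
    lines.append("cols:")
    for column, data_type in cols.items():
        # Each column name is a key and its data type is the value.
        lines.append(f"  {column}: {data_type}")
    if settings:
        lines.append(f"settings: {settings}")
    return "\n".join(lines) + "\n"


source_yaml = render_source_yaml(
    name="sales_calls",
    description="One row per recorded sales call",  # hypothetical description
    cols={"call_id": "UInt64", "started_at": "DateTime", "rep": "String"},
    settings="MergeTree() ORDER BY (started_at, call_id)",  # hypothetical settings
)
print(source_yaml)
```

The helper does no quoting or escaping, so it is only suitable for simple values like the ones shown.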
Push
Push sources to your workspace using the CLI command or API.
CLI
For example, to push sales_calls.yaml, run:
Ingest API
We can also use the Ingest API:
The Ingest API supports the following formats:
- NDJSON
- CSV
- Parquet
For example, ingesting NDJSON is as simple as:
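A minimal Python sketch of preparing an NDJSON payload follows. NDJSON is one JSON object per line, each terminated by a newline. The endpoint URL, the bearer-token header, the content type, and the row fields are placeholders assumed for illustration; substitute the values from your workspace and the Ingest API reference.

```python
import json
import urllib.request

# ASSUMPTIONS: hypothetical endpoint and token, not real Airfold values.
INGEST_URL = "https://api.example.com/ingest/sales_calls"
API_TOKEN = "YOUR_TOKEN"


def to_ndjson(rows):
    """Serialize an iterable of dicts as NDJSON: one JSON object per line."""
    return "".join(json.dumps(row, separators=(",", ":")) + "\n" for row in rows)


def build_ingest_request(rows, url=INGEST_URL, token=API_TOKEN):
    """Build (but do not send) a POST request carrying the NDJSON body."""
    return urllib.request.Request(
        url,
        data=to_ndjson(rows).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/x-ndjson",  # assumed content type
        },
    )


rows = [
    {"call_id": 1, "rep": "ana"},
    {"call_id": 2, "rep": "bo"},
]
request = build_ingest_request(rows)
# Passing `request` to urllib.request.urlopen would send it; that call is
# omitted so the sketch runs offline.
```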
Rate Limits
The Ingest API is rate limited to 1000 events/sec per source. If you need to ingest more events, please contact us.
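One way to stay under this limit on the client side is to split events into batches of at most 1000 and send no more than one batch per second. The sketch below assumes a caller-supplied `send` function (hypothetical) that delivers one batch to a single source; it is a simple pacing approach, not an official client.

```python
import time

MAX_EVENTS_PER_SECOND = 1000  # limit stated above, per source


def batches(events, size=MAX_EVENTS_PER_SECOND):
    """Yield consecutive lists of at most `size` events."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_paced(events, send, sleep=time.sleep, clock=time.monotonic):
    """Send batches so that consecutive sends start at least one second apart."""
    sent = 0
    for batch in batches(events):
        started = clock()
        send(batch)  # `send` is a hypothetical callable supplied by the caller
        sent += len(batch)
        elapsed = clock() - started
        if elapsed < 1.0:
            sleep(1.0 - elapsed)  # wait out the remainder of the second
    return sent


# Usage with a stand-in sender and a no-op sleep, so the example runs instantly.
delivered = []
total = send_paced(range(2500), delivered.append, sleep=lambda seconds: None)
```

Injecting `sleep` and `clock` keeps the pacing logic easy to test without real delays.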
_errors Table
Every workspace has an associated _errors source table, which contains all errors that occurred during ingestion (e.g., a row that failed to be ingested due to a schema mismatch).
This way, ingestion continues uninterrupted and the errors can be fixed later.
_errors is treated like any other source, so it can be queried, filtered, and joined with other sources.
For example, we can find the most common errors by source:
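The same grouping can be sketched client-side in Python for rows already fetched from `_errors`. The field names `source` and `error` and the sample rows are hypothetical; the actual columns of `_errors` may differ.

```python
from collections import Counter


def most_common_errors(error_rows, top=3):
    """Count (source, error) pairs and return the most frequent ones."""
    # ASSUMPTION: each row is a dict with hypothetical `source` and `error` keys.
    counts = Counter((row["source"], row["error"]) for row in error_rows)
    return counts.most_common(top)


# Made-up sample rows, for illustration only.
sample = [
    {"source": "sales_calls", "error": "schema mismatch"},
    {"source": "sales_calls", "error": "schema mismatch"},
    {"source": "web_events", "error": "invalid JSON"},
]
top_errors = most_common_errors(sample)
```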