Data is ingested into sources either directly through Airfold’s Ingest API or by a pipe.

Creation

Define the schema for your source in a YAML file:
sales_calls.yaml
name: sales_calls
description: sales call transcripts
cols:
  ID: UUID
  Transcript: String
settings:
  engine: MergeTree()
  order_by: '`ID`'
  partition_by: tuple()

Properties

type
string
Type of the source: Table or AITable (see AI Tables for details). Optional.
name
string
Name of the source.
description
string
A brief overview of the source’s content or purpose. Optional.
cols
{name: type}
required
Defines the schema of the source as a map of columns, where each key is a column name and its value is the column’s data type (see Data Types).
settings
string | {key: value} | array
ClickHouse table configuration for the source, such as ORDER BY, PARTITION BY, and the table engine. The value can be a string, a map of key-value pairs, or an array combining both. Optional.
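As an illustration of the three accepted shapes of settings (the string values below are only a sketch; the exact string grammar is not specified here):

```yaml
# As key-value pairs (as in the example above):
settings:
  engine: MergeTree()
  order_by: '`ID`'
  partition_by: tuple()
---
# As a single string (illustrative):
settings: 'ENGINE = MergeTree() ORDER BY `ID` PARTITION BY tuple()'
---
# As an array mixing strings and key-value pairs:
settings:
  - 'ENGINE = MergeTree()'
  - order_by: '`ID`'
```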

Push

Push sources to your workspace using the CLI command or API.

CLI

For example, to push sales_calls.yaml, run:
af push sales_calls.yaml

Ingest API

You can also use the Ingest API, which supports the following formats:
  • NDJSON
  • CSV
  • Parquet
For example, ingesting NDJSON (one JSON object per line) is as simple as:
curl --request POST \
  --url https://api.airfold.co/v1/sales_analysis/{source} \
  --header 'Authorization: Bearer <key>' \
  --header 'Content-Type: application/x-ndjson' \
  --data '{ "id": 123, "name": "Sarah" }
{ "id": 456, "name": "Alex" }'
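NDJSON is simply one JSON object per line. As a hedged sketch (not part of the Airfold API itself), here is how a payload like the one above could be serialized before posting it:

```python
import json

# Serialize records as NDJSON: one JSON object per line. Actually posting
# the payload (with curl, as above, or an HTTP client) is omitted so the
# sketch stays self-contained.
def to_ndjson(rows):
    return "\n".join(json.dumps(row) for row in rows)

rows = [
    {"id": 123, "name": "Sarah"},
    {"id": 456, "name": "Alex"},
]
payload = to_ndjson(rows)
print(payload)
# {"id": 123, "name": "Sarah"}
# {"id": 456, "name": "Alex"}
```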

Rate Limits

The Ingest API is rate limited to 1000 events/sec per source. If you need to ingest more events, please contact us.
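One way to stay under the limit on the client side is to batch events and pace requests. This is a hypothetical sketch, not an Airfold SDK feature; the send callback stands in for whatever performs the Ingest API call:

```python
import time

RATE_LIMIT = 1000  # documented limit: events per second, per source

def batches(events, size=RATE_LIMIT):
    """Split events into chunks of at most `size`."""
    for i in range(0, len(events), size):
        yield events[i:i + size]

def send_throttled(events, send, pause=1.0):
    """Call send(batch) for each chunk, pausing between requests so the
    aggregate rate stays at or below RATE_LIMIT events per second."""
    for batch in batches(events):
        send(batch)
        time.sleep(pause)  # simplistic pacing; a token bucket is smoother

# Demo with pause=0.0 so it runs instantly: 2500 events -> 3 requests.
sent = []
send_throttled(list(range(2500)), sent.append, pause=0.0)
print([len(b) for b in sent])  # [1000, 1000, 500]
```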

_errors Table

Every workspace has an associated _errors source table, which contains all errors that occurred during ingestion (e.g., a row that failed to ingest due to a schema mismatch). This keeps the ingestion process uninterrupted, and the error can be fixed later. _errors is treated like any other source, so it can be queried, filtered, and joined with other sources. For example, we can find the most common errors by source:
SELECT source, COUNT(*) AS count
FROM _errors
GROUP BY source
ORDER BY count DESC