Data is ingested into sources directly through Airfold’s Ingest API or via a pipe.

Creation

Define the schema for your source in a YAML file:

sales_calls.yaml
name: sales_calls
description: sales call transcripts
cols:
  ID: UUID
  Transcript: String
settings:
  engine: MergeTree()
  order_by: '`ID`'
  partition_by: tuple()

Properties

type
string

Type of source: Table or AITable; see the AI Tables documentation for details (optional)

name
string

Name of source

description
string

A brief overview of the source’s content or purpose (optional)

cols
{name: type}
required

Defines the schema of the source as a mapping of columns, where each column name is a key and its data type is the value (see Data Types)

settings
string | {key: value} | array

ClickHouse table configuration for the source, such as ORDER BY, PARTITION BY, or the table engine. The settings can be a string, a key-value mapping, or an array comprising either or both (optional)
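For instance, the array form could mix key-value entries with raw ClickHouse clauses given as strings. The sketch below is illustrative only; the TTL clause is a hypothetical example, not a recommended default:

```yaml
settings:
  - engine: MergeTree()
  - order_by: '`ID`'
  - 'TTL created_at + INTERVAL 90 DAY'
```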

Push

Push sources to your workspace using the CLI command or API.

CLI

For example, to push sales_calls.yaml, run:

af push sales_calls.yaml

Ingest API

You can also ingest data directly using the Ingest API.

The Ingest API supports the following formats:

  • NDJSON
  • CSV
  • Parquet

For example, ingesting NDJSON is as simple as:

curl --request POST \
  --url https://api.airfold.co/v1/sales_analysis/{source} \
  --header 'Authorization: Bearer <key>' \
  --header 'Content-Type: application/x-ndjson' \
  --data-binary '{ "id": 123, "name": "Sarah" }
{ "id": 456, "name": "Alex" }'
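The same request can be scripted. The sketch below builds an NDJSON payload (one JSON object per line) and posts it using only the Python standard library; the workspace name, source name, API key, and the `application/x-ndjson` content type are placeholder assumptions you would substitute with your own values:

```python
import json
import urllib.request

def to_ndjson(rows):
    """Serialize a list of dicts as newline-delimited JSON."""
    return "\n".join(json.dumps(row) for row in rows)

def ingest(workspace, source, api_key, rows):
    """POST rows to the Ingest API as NDJSON (not executed here)."""
    req = urllib.request.Request(
        f"https://api.airfold.co/v1/{workspace}/{source}",
        data=to_ndjson(rows).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/x-ndjson",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)

rows = [{"id": 123, "name": "Sarah"}, {"id": 456, "name": "Alex"}]
payload = to_ndjson(rows)  # two lines, one JSON object per line
```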

Rate Limits

The Ingest API is rate limited to 1000 events/sec per source. If you need to ingest more events, please contact us.
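A simple client-side guard against the per-source limit is to send events in fixed-size batches with a pause between them. In the sketch below, batches of at most 500 events sent every half second stay under 1000 events/sec; the batch size, interval, and `send` callback are illustrative assumptions, not Airfold APIs:

```python
import time

def batched(events, batch_size=500):
    """Yield successive fixed-size batches from a list of events."""
    for i in range(0, len(events), batch_size):
        yield events[i:i + batch_size]

def send_throttled(events, send, batch_size=500, interval=0.5):
    """Send events in batches, sleeping between batches so the rate
    stays at or below batch_size / interval events per second."""
    for batch in batched(events, batch_size):
        send(batch)           # e.g. a POST to the Ingest API
        time.sleep(interval)  # throttle before the next batch

# Example with a no-op interval, collecting batches instead of POSTing:
sent = []
send_throttled(list(range(1200)), sent.append, batch_size=500, interval=0.0)
# sent now holds 3 batches of 500, 500, and 200 events
```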

_errors Table

Every workspace has an associated _errors source table, which contains all errors that occurred during ingestion (e.g., a row that failed to be ingested due to a schema mismatch).

This way, ingestion continues uninterrupted, and the errors can be fixed later.

_errors is treated like any other source, so it can be queried, filtered, and joined with other sources.

For example, we can find the most common errors by source:

SELECT source, COUNT(*) AS count
FROM _errors
GROUP BY source
ORDER BY count DESC