Akin to tables in a database, sources store data according to a defined schema.

All the workspace data lives inside sources, which can be queried, filtered, aggregated, and joined using SQL.

UI

Ingesting Data

Navigate to “Sources” on the left menu bar, and click on ”+”:

Here you can either create a source by inferring a schema from Text, File or URL.

Or you can use Connectors to ingest data from external sources.

Inferred sources will be empty.
You can put some data into them using the Ingest API

Select Type

In this example, we will upload a csv file through File Upload:

Then, click Next.

Confirm Schema

Name your source and confirm the schema:

Click Create.

Metrics

Once your source is successfully created, you can view its ingestion metrics and sample of the data.

You may use the table sample features:

  • Filters allows you to filter out certain rows based on a condition
  • Sort can be used to order your rows in ascending/descending order based on a specified column
  • Group by is for organizing your rows by a specified column, grouping related rows together based on a shared value in that column

There are also tabs on each Source page:

Schema provides the table schema and ClickHouse settings.

Data Graph shows all the dependencies for the Source

Logs provides error logs. All the ingestion errors can be seen there.

CLI

Creating a YAML file

Define the schema for your source in a YAML file:

product_catalog.yaml
name: product_catalog
description: catalog of store products
  cols:
  ProductID: String
  Name: String
  Category: String
  Description: String
  Price: Float64
settings:
  engine: MergeTree()
  order_by: "`ProductID`"
  partition_by: tuple()

Properties

type
string

Type of source: Table or AITable, see more on AI table (Optional)

name
string

Name of source

description
string

A brief overview of the source’s content or purpose (optional)

cols
{name: type}
required

Defines the schema of the source as a list of columns where each column name is a key and its data type is the value (see Data Types)

settings
string | {key: value} | array

ClickHouse table configuration for the source, which can include ORDER BY, PARTITION BY, Table Engine, etc. The settings can be a String, a key-value pair, or an array comprising either or both. Optional.

Push

Push sources to your workspace using the CLI command af push.

For example, to push sales_calls.yaml, run:

af push sales_calls.yaml

Unstructured Data

So far, we’ve seen examples of ingesting structured data such as product_catalog shown above.

However, one of Airfold’s key strengths is how it allows you to work with unstructured data.

Examples of unstructured data:

  • Sales calls transcripts
  • Customer Reviews
  • Customer Service Tickets

The process for ingesting unstructured data remains the same as with structured data demonstrated above. However, upon ingestion, you can extract meaningful insights from your unstructured data using AI Tables.