Sources
Bring your data into Airfold
Akin to tables in a database, sources store data according to a defined schema.
All workspace data lives in sources, which can be queried, filtered, aggregated, and joined using SQL.
UI
Ingesting Data
Navigate to “Sources” in the left menu bar and click “+”:
Here you can either create a source by inferring a schema from Text, a File, or a URL, or use Connectors to ingest data from external sources.
Inferred sources start out empty.
You can add data to them using the Ingest API.
Select Type
In this example, we will upload a CSV file through File Upload.
Then, click Next.
Confirm Schema
Name your source and confirm the schema:
Click Create.
Metrics
Once your source is successfully created, you can view its ingestion metrics and a sample of its data.
You may use the table sample features:
- Filters: allows you to filter out certain rows based on a condition
- Sort: can be used to order your rows in ascending or descending order based on a specified column
- Group by: organizes your rows by a specified column, grouping related rows together based on a shared value in that column
There are also tabs on each Source page:
- Schema: provides the table schema and ClickHouse settings.
- Data Graph: shows all the dependencies for the Source.
- Logs: provides error logs. All ingestion errors can be seen there.
CLI
Creating a YAML file
Define the schema for your source in a YAML file:
Properties
Name of source
A brief overview of the source’s content or purpose (optional)
Defines the schema of the source as a list of columns where each column name is a key and its data type is the value (see Data Types)
ClickHouse table configuration for the source, which can include ORDER BY, PARTITION BY, Table Engine, etc. The settings can be a String, a key-value pair, or an array comprising either or both. Optional.
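As a rough sketch of how the properties described above might fit together, a source definition could look something like the following. The key names (name, description, cols, settings), the column names and types, and the settings string are all assumptions made for illustration and are not confirmed by this page; check the Data Types reference for the exact type names.

```yaml
# sales_calls.yaml (hypothetical example; key names are assumed, not confirmed)
name: sales_calls
description: Transcripts of recorded sales calls   # optional
cols:
  # column name: data type (types shown are assumed ClickHouse-style types)
  call_id: String
  started_at: DateTime
  transcript: String
# optional ClickHouse table configuration, given here in its string form
settings: MergeTree() ORDER BY (started_at)
```

The settings value is written as a single string here; per the description above, a key-value pair or an array would be equally acceptable forms.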
Push
Push sources to your workspace using the CLI command af push.
For example, to push sales_calls.yaml, run af push sales_calls.yaml.
Unstructured Data
So far, we’ve seen examples of ingesting structured data, such as the product_catalog source shown above.
However, one of Airfold’s key strengths is how it allows you to work with unstructured data.
Examples of unstructured data:
- Sales call transcripts
- Customer reviews
- Customer service tickets
The process for ingesting unstructured data remains the same as with structured data demonstrated above. However, upon ingestion, you can extract meaningful insights from your unstructured data using AI Tables.