Quickstart (CLI)
This section covers ingesting, creating AI columns, querying, and publishing data using the Airfold CLI.
Overview
In this guide, we’ll develop an API to identify the top mentioned features in sales calls.
We’ll process sales call transcripts to extract insights such as the competitors mentioned, the features mentioned, and whether the call was successful. Our dataset is a 70-row CSV file of these transcripts.
Our steps include:
- Ingesting the dataset
- Creating AI columns
- Executing queries to filter, aggregate, and refine the results into a top-two list of features
- Exposing the top features through an API
Create a Workspace
Before we begin, we need to create a workspace to store our data and resources, as well as a token to authenticate our CLI.
- Go to Airfold and create a new workspace.
- Copy an Admin token from the workspace’s Token page.
The token should look like this: `aft_6eab8fcd902e4cbfb63ba174469989cd.Ds1PME5dQsJKosKQWVcZiBSlRFBbmhzIocvHg8KQddV`.
Set up the CLI
The CLI requires Python 3.10 or higher.
- Install the CLI using `pip install airfold-cli`.
- Run `af config` and paste your token when prompted.
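Together, that is:

```bash
pip install airfold-cli   # requires Python 3.10 or higher
af config                 # paste your Admin token when prompted
```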
Create a Source
First, download `sales_calls.csv`.
To create a source, a YAML file defining the source is required.
Let’s generate a source by inferring the schema from the CSV file (replace `/path/to/sales_calls.csv` with the actual file path):
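The exact subcommand for this is an assumption here — check `af --help` for how your CLI version creates a source from a data file:

```bash
# Hypothetical invocation; the real subcommand and flags may differ
af source create /path/to/sales_calls.csv
```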
The CLI will infer the schema from the CSV file and generate a YAML file with the following contents:
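A sketch of what the generated definition might look like (the column types and field names here are assumptions — the file the CLI generates is authoritative):

```yaml
# sales_calls.yaml — illustrative sketch of an inferred source definition
name: sales_calls
cols:
  ID: String
  Transcript: String
```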
Push the source definition to your workspace:
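Assuming the standard `af push` command:

```bash
af push sales_calls.yaml
```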
At this point, our source has been created but no data has been ingested yet.
Verify creation by listing sources:
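For example (assuming an `af source ls` subcommand):

```bash
af source ls
```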
Ingest Data
With the source set up, ingest the CSV data:
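One possible way to do this — confirm the exact ingestion subcommand with `af source --help`:

```bash
# Hypothetical; appends the CSV rows to the sales_calls source
af source append sales_calls /path/to/sales_calls.csv
```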
Create an AI Table
To derive insights, we can create AI columns that automatically extract competitor mentions, feature mentions, and success indicators.
In your workspace, create an `analysis.yaml` file with this configuration:
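A minimal sketch of what such a configuration might look like; the exact `ai_cols` parameters (`type`, `prompt`, and so on) are assumptions, so consult the source reference for the real schema:

```yaml
# analysis.yaml — illustrative sketch; see the ai_cols reference for exact parameters
name: analysis
cols:
  ID: String
  Transcript: String
ai_cols:
  successful:
    type: Boolean
    prompt: Was this sales call successful?
  competitors_mentioned:
    type: Array(String)
    prompt: List any competitor names mentioned in the transcript.
  features_mentioned:
    type: Array(String)
    prompt: List any product features discussed in the transcript.
```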
`ai_cols` defines our AI-generated columns:
- `successful`: Identifies whether the call was successful based on the transcript content
- `competitors_mentioned`: Extracts any competitor names mentioned in the transcript
- `features_mentioned`: Lists any product features that were discussed

To see more details on the parameters of `ai_cols`, check out source.
Push this configuration to create the AI Table:
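Again assuming `af push`:

```bash
af push analysis.yaml
```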
Ingest Data from sales_calls to analysis
Create a pipe to transfer data from sales_calls to analysis.
Pipes in Airfold are used to run sequences of SQL queries that can transform, filter, and aggregate data across tables.
Each step in a pipe is called a “node,” and each node runs a single SQL statement, allowing for more complex data analysis workflows.
Define it in a new `pipe_to_analysis.yaml` file:
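A sketch of what this pipe might look like, assuming a `nodes` list of named SQL steps and a `to` field that writes the result into the `analysis` table (these field names are assumptions — check the pipes reference):

```yaml
# pipe_to_analysis.yaml — illustrative sketch
name: pipe_to_analysis
nodes:
  - copy_calls:
      sql: SELECT ID, Transcript FROM sales_calls
to: analysis
```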
Push the pipe configuration:
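As before:

```bash
af push pipe_to_analysis.yaml
```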
Now, ingest data using:
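The exact command here depends on how the pipe materializes data; one hedged possibility is to re-append the CSV so the new pipe processes the rows into `analysis`:

```bash
# Hypothetical; re-ingest the CSV so the pipe populates the analysis table
af source append sales_calls /path/to/sales_calls.csv
```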
At this point, we have two populated tables:
- `sales_calls`: containing the original data, `ID` and `Transcript`
- `analysis`: containing the data from `sales_calls` plus the additional AI columns
Further Analysis with Pipes
To identify top-mentioned features in successful calls, create an `insights.yaml` file:
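A sketch of what this pipe might contain, combining the pieces explained below (the node names and the exact `publish` syntax are assumptions):

```yaml
# insights.yaml — illustrative sketch
name: insights
nodes:
  - successful_features:
      sql: |
        SELECT arrayJoin(features_mentioned) AS feature
        FROM analysis
        WHERE successful = true
  - top_features:
      sql: |
        SELECT count() AS successful_calls, topK(2)(feature) AS top_features
        FROM successful_features
publish: top_features
```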
- `arrayJoin(features_mentioned) AS feature`: This function “explodes” the `features_mentioned` array, creating a row for each feature mentioned
- `count()`: Counts the total successful calls
- `topK(2)(feature)`: Extracts the two most frequently mentioned features
- `WHERE successful = true`: Filters the results to include only successful calls
- `publish`: Publishes this as an endpoint; this line may be excluded if you do not wish to publish an endpoint
Push the insights pipe:
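Once more with `af push`:

```bash
af push insights.yaml
```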
Query Results
Use the API:
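For example with curl — the endpoint URL shape shown here is an assumption, so copy the real URL for your published pipe from the workspace:

```bash
# Hypothetical endpoint URL; replace with the one shown for your published pipe
curl -H "Authorization: Bearer $AIRFOLD_TOKEN" \
  "https://api.airfold.co/v1/pipes/insights.json"
```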
Or the CLI:
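The exact subcommand for querying a published pipe from the CLI is an assumption here — check `af --help`:

```bash
# Hypothetical subcommand
af pipe query insights
```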
Either way, the response should contain the two most frequently mentioned features across successful calls.
Next Steps
You’ve successfully ingested, analyzed, and published data using Airfold in a few simple steps! This workflow enables intuitive interaction with unstructured data, transforming raw call transcripts into actionable insights.
Feel free to dive deeper into specific concepts, such as workspaces, sources, and more!