Quickstart (CLI)
This section covers ingesting, querying and publishing data using the Airfold CLI.
Use-case
In this guide, we’ll develop an API to identify the top 10 most searched products on an e-commerce platform.
We’ll process e-commerce events to track user interactions like item views, cart additions, and checkout completions. Our dataset, a 50-million-row CSV file, captures these activities.
Our steps include:
- Ingesting the dataset.
- Executing queries to filter, aggregate, and refine the data into a top 10 list.
- Exposing the top 10 results through an HTTP API.
Create a Workspace
Before we begin, we need to create a workspace to store our data and resources, as well as a token to authenticate our CLI.
- Go to Airfold and create a new workspace.
- Copy an Admin token from the workspace’s Token page.
The token should look like this: aft_6eab8fcd902e4cbfb63ba174469989cd.Ds1PME5dQsJKosKQWVcZiBSlRFBbmhzIocvHg8KQddV.
Set up the CLI
The CLI requires Python 3.10 or higher.
- Install the CLI using pip install airfold-cli.
- Run af config and paste your token when prompted.
$ af config
Configuring for API URL: https://api.airfold.co
? Api key: <your token>
To see a list of available commands, just run af.
Create a Source
A source is defined by a YAML file.
Let’s generate one by inferring the schema from a CSV file:
$ af source create https://airfold-demo-data.s3.us-east-2.amazonaws.com/quickstart/events.csv .
The CLI will infer the schema from the CSV file and generate a YAML file with the following contents:
name: events
cols:
  acceptcharset: String
  acceptencoding: String
  acceptlanguage: String
  browsername: String
  browserversion: String
  cachecontrol: String
  city: String
  connection: String
  contentlength: String
  contenttype: String
  country: String
  cpuarchitecture: String
  devicemodel: String
  devicetype: String
  devicevendor: String
  enginename: String
  engineversion: String
  event_ts: DateTime64(3)
  from: String
  headers: String
  host: String
  ip_address: String
  is_bot: Int64
  latitude: Float64
  log_level: String
  log_message: String
  longitude: Float64
  method: String
  origin: String
  osname: String
  osversion: String
  protocol: String
  referer: String
  region: String
  url: String
  useragent: String
  via: String
  xforwaredforip: String
settings:
  engine: MergeTree()
  order_by: (`event_ts`, `ip_address`)
  partition_by: toYYYYMM(event_ts)
Push the source to your workspace:
$ af push ./sources/events.yaml
List the sources in your workspace to confirm that the source was created:
$ af source ls
- events: 0 rows | 0 bytes | 0 errors | created 1 min ago | updated 1 min ago
Ingest Data
Now that we have a source, we can ingest data into it.
Let’s ingest the file we used to generate the source:
$ af source append events https://airfold-demo-data.s3.us-east-2.amazonaws.com/quickstart/events.csv
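Once the append finishes, you can list the sources again; the events source should now report a non-zero row count and size instead of the empty source we saw before:
$ af source ls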
Create a Pipe
All the SQL queries in Airfold are written inside pipes.
A pipe is a sequence of one or more SQL queries (Nodes) executed in order; each Node is simply a SELECT statement.
Use pipes to build features over your data!
Let’s create a pipe, levels.yaml, that counts events by log level:
nodes:
  - node1:
      sql: |
        SELECT
          log_level,
          count()
        FROM
          events
        GROUP BY
          log_level
Push the pipe to your workspace:
$ af push levels.yaml
List the pipes in your workspace to confirm that the pipe was created:
$ af pipe ls
- levels: draft | created 1 min ago | updated 1 min ago
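You can also try the pipe’s query directly from the CLI at this point; this assumes a draft pipe can be queried the same way as a published one (the same command appears again at the end of this guide):
$ af pipe query levels --format json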
Publish the Results
We’ve pushed the pipe to our workspace, but we have not yet published an endpoint from it.
To do so, set the publish field to the name of the endpoint we want to expose:
nodes:
  - node1:
      sql: |
        SELECT
          log_level,
          count()
        FROM
          events
        GROUP BY
          log_level
publish: levels
Push the pipe to your workspace:
$ af push levels.yaml
List the pipes in your workspace to confirm that the pipe is now published:
$ af pipe ls
- levels: published | created 1 min ago | updated 1 min ago
Query the results using the API:
$ curl --request GET \
--url https://api.airfold.co/v1/pipes/levels.json \
  --header 'Authorization: Bearer <your token>'
{
  "meta":
  [
    {
      "name": "log_level",
      "type": "String"
    },
    {
      "name": "count()",
      "type": "UInt64"
    }
  ],
  "data":
  [
    {
      "log_level": "WARN",
      "count()": "2"
    },
    {
      "log_level": "INFO",
      "count()": "8"
    }
  ],
  "rows": 2,
  "statistics":
  {
    "elapsed": 0.001772939,
    "rows_read": 10,
    "bytes_read": 130
  }
}
Or using the CLI:
$ af pipe query levels --format json
[...]
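The same pattern covers the top-10 use case from the start of this guide. As a sketch (the node name, endpoint name, and views alias here are illustrative; the url column comes from the inferred schema above), a pipe counting the most-viewed URLs could look like this:
nodes:
  - top_products:
      sql: |
        SELECT
          url,
          count() AS views
        FROM
          events
        GROUP BY
          url
        ORDER BY
          views DESC
        LIMIT 10
publish: top10
Pushed with af push and published, it would be reachable the same way as the levels endpoint, following the https://api.airfold.co/v1/pipes/<endpoint>.json pattern.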
Next Steps
Congratulations! You’ve successfully ingested, queried, and published data in Airfold.
Now that you’ve learned the basics, you can explore workspaces, sources, pipes, and tokens.
Or, check the API and CLI references to learn more about interacting with Airfold.