Quickstart (CLI)
This section covers ingesting, querying, and publishing data using the Airfold CLI.
Use case
In this guide, we’ll develop an API to identify the top 10 most searched products on an e-commerce platform.
We’ll process e-commerce events to track user interactions like item views, cart additions, and checkout completions. Our dataset, a 50-million-row CSV file, captures these activities.
Our steps include:
- Ingesting the dataset.
- Executing queries to filter, aggregate, and refine the data into a top 10 list.
- Exposing the top 10 results through an HTTP API.
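The steps above can be sketched on a small scale in plain Python. This only illustrates the filter, aggregate, and top-N logic, not how Airfold processes data. The column names event_type and product_id, the "search" event value, and the sample rows are all assumptions, since the dataset's schema is not shown in this guide.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the real dataset's columns are not shown in this guide.
SAMPLE_CSV = """event_type,product_id
view,p1
search,p2
search,p1
cart,p2
search,p2
checkout,p3
search,p3
search,p2
search,p1
"""


def top_searched(rows, n=10):
    """Keep only 'search' events, count them per product, return the n most frequent."""
    counts = Counter(
        row["product_id"] for row in rows if row["event_type"] == "search"
    )
    return counts.most_common(n)


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
top = top_searched(reader, n=2)
print(top)
```

On the full dataset, the same logic would be written as SQL inside a pipe and served over HTTP rather than run locally.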
Create a Workspace
Before we begin, we need to create a workspace to store our data and resources, as well as a token to authenticate our CLI.
- Go to Airfold and create a new workspace.
- Copy an Admin token from the workspace’s Token page.
The token should look like this: aft_6eab8fcd902e4cbfb63ba174469989cd.Ds1PME5dQsJKosKQWVcZiBSlRFBbmhzIocvHg8KQddV
Set up the CLI
The CLI requires Python 3.10 or higher.
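If you are unsure which interpreter is active, a quick check like the following can confirm the version before installing. It is a minimal sketch using only the standard library; the helper name meets_minimum is made up for this example.

```python
import sys

MIN_VERSION = (3, 10)  # minimum Python version stated in this guide


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


current = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if meets_minimum() else "too old for the CLI"
print(f"Python {current}: {status}")
```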
- Run pip install airfold-cli to install the CLI.
- Run af config and paste your token when prompted.
To see a list of available commands, run af with no arguments.
Create a Source
Creating a source requires a YAML file that defines it.
Let’s generate a source by inferring the schema from a CSV file:
The CLI will infer the schema from the CSV file and generate a YAML file with the following contents:
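To give a feel for what schema inference involves, here is a minimal Python sketch. It is not the CLI's actual algorithm and it does not produce Airfold's YAML format. The type labels (Int64, Float64, DateTime, String), the column names, and the sample rows are all assumptions chosen for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows; not taken from the real dataset.
SAMPLE_CSV = """event_time,event_type,product_id,price
2024-01-01 00:00:00,view,1001,35.79
2024-01-01 00:00:01,cart,1002,33.20
"""


def _all_parse(values, parse):
    """Return True if `parse` accepts every value without raising ValueError."""
    try:
        for value in values:
            parse(value)
    except ValueError:
        return False
    return True


def infer_type(values):
    """Pick the narrowest illustrative type label that fits all non-empty values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "String"
    if _all_parse(non_empty, int):
        return "Int64"
    if _all_parse(non_empty, float):
        return "Float64"
    if _all_parse(non_empty, datetime.fromisoformat):
        return "DateTime"
    return "String"


def infer_schema(csv_text):
    """Map each column in the CSV header to an inferred type label."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    return {col: infer_type([row[col] for row in rows]) for col in columns}


schema = infer_schema(SAMPLE_CSV)
print(schema)
```

A real inference step would typically sample only part of a large file and handle more types, so review the generated YAML before pushing it.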
Push the source to your workspace:
List the sources in your workspace to confirm that the source was created:
Ingest Data
Now that we have a source, we can ingest data into it.
Let’s ingest the file we used to generate the source:
Create a Pipe
All SQL queries in Airfold are written inside pipes.
A pipe is a sequence of one or more SQL queries that are executed in order. Each query is called a Node, and each Node is simply a SELECT statement.
Use pipes to build features over your data!
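The idea of chaining SELECT statements, where each node reads from the one before it, can be tried locally. The sketch below uses Python's built-in sqlite3 module as a stand-in and models each node as a view. This is only an analogy: it is not how Airfold executes pipes, Airfold's SQL dialect may differ from SQLite's, and the table, column, and node names are assumptions.

```python
import sqlite3

# Hypothetical events table; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type TEXT, product_id TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("search", "p1"), ("view", "p1"), ("search", "p2"),
        ("search", "p2"), ("cart", "p3"), ("search", "p2"),
        ("search", "p1"),
    ],
)

# Each "node" is a single SELECT; later nodes refer to earlier ones by name.
nodes = [
    ("searches",
     "SELECT product_id FROM events WHERE event_type = 'search'"),
    ("counts",
     "SELECT product_id, COUNT(*) AS search_count "
     "FROM searches GROUP BY product_id"),
    ("top_products",
     "SELECT product_id, search_count FROM counts "
     "ORDER BY search_count DESC, product_id LIMIT 10"),
]

# Register the nodes in order, modelling each one as a view.
for name, sql in nodes:
    conn.execute(f"CREATE VIEW {name} AS {sql}")

# Reading from the last node evaluates the whole chain.
result = conn.execute("SELECT * FROM top_products").fetchall()
print(result)
```

Splitting the query into small named steps keeps each SELECT easy to read and lets intermediate results be inspected on their own.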
Let’s create a pipe to identify the top 10 most searched products:
Push the pipe to your workspace:
List the pipes in your workspace to confirm that the pipe was created:
Publish the Results
We’ve pushed the pipe to our workspace, but we have not yet published an endpoint from it.
To publish the pipe, set its publish field to the name of the endpoint we want to expose:
Push the updated pipe to your workspace:
List the pipes in your workspace to confirm that the pipe is now published:
Query the results using the API:
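As one way to call a published endpoint from code, the Python sketch below builds an authenticated GET request with the standard library. The base URL, the path layout, the endpoint name top_products, and the Bearer authorization scheme are all placeholders and assumptions, not taken from the Airfold API reference; substitute the real values for your workspace.

```python
import json
import urllib.request

# PLACEHOLDERS: base URL, path layout, and endpoint name are assumptions,
# not values from the Airfold API reference.
BASE_URL = "https://api.example.com"
ENDPOINT = "top_products"


def build_request(token, base_url=BASE_URL, endpoint=ENDPOINT):
    """Build a GET request that sends the token as a Bearer credential (assumed scheme)."""
    return urllib.request.Request(
        f"{base_url}/{endpoint}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only builds the request; call fetch_json(request) once the URL is real.
request = build_request(token="aft_your_token_here")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before any network call is made.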
Or using the CLI:
Next Steps
Congratulations! You’ve successfully ingested, queried, and published data in Airfold.
Now that you’ve learned the basics, you can explore workspaces, sources, pipes, and tokens.
Or check the API and CLI references to learn more about how to interact with Airfold.