Overview

In this guide, we’ll develop an API to identify the top-mentioned features in sales calls.

We’ll process web events data to extract top referrers using the sample web events CSV file.

Our steps include:

  • Ingesting the dataset
  • Executing queries to get the top 3 referrers
  • Exposing the features through an API

Create a Workspace

Before we begin, we need to create a workspace to store our data and resources, as well as a token to authenticate our CLI.

  1. Go to Airfold and create a new workspace.
  2. Copy an admin token from the workspace’s Keys page.

The token should look like this: aft_6eab8fcd902e4cbfb63ba174469989cd.Ds1PME5dQsJKosKQWVcZiBSlRFBbmhzIocvHg8KQddV.

Set up the CLI.

The CLI requires Python 3.10 or higher.
  1. Install the CLI using pip install airfold-cli.
  2. Run af config and paste your token when prompted.
pip install airfold-cli
af config

Create a Source

First, download web_events_sample.csv

To create a source, a YAML file defining the source is required.

Let’s generate a source by inferring the schema from a CSV file.
(Replace /path/to/web_events_sample.csv with the actual path):

af source create path/to/web_events_sample.csv .

The CLI will infer the schema from the CSV file and generate a YAML file with the following contents:

./sources/web_events_sample.yaml
name: web_events_sample
cols:
  event_id: Int64
  user_id: Int64
  event_type: String
  page_url: String
  timestamp: DateTime
  referrer: String
settings:
  engine: MergeTree()
  order_by: "`event_id`"
  partition_by: toYYYYMM(timestamp)

Push the source definition to your workspace:

af push path/to/web_events_sample.yaml

At this point, our source has been created, but no data has been ingested yet.

Verify creation by listing sources:

af source ls

Ingest Data

With the source set up, ingest the CSV data:

af source append web_events_sample path/to/web_events_sample.csv

Further Analysis with Pipes

To identify top referrers in web events data, create an insights.yaml file:

insights.yaml
nodes:
  - node1:
      sql: |
        SELECT
          referrer,
          COUNT() AS num_referrers
        FROM web_events_sample
        GROUP BY referrer
        ORDER BY num_referrers DESC
        LIMIT 3
publish: insights

Push the insights pipe:

af push path/to/insights.yaml

Query Results

Use the API:

curl --request GET \
--url https://api.airfold.co/v1/pipes/insights.json \
--header 'Authorization: Bearer <use auth token here>'

Or the CLI:

af pipe query insights --format json

Which should output:

{
  "meta": [
    {
      "name": "referrer",
      "type": "String"
    },
    {
      "name": "num_referrers",
      "type": "UInt64"
    }
  ],
  "data": [
    {
      "referrer": "twitter.com",
      "num_referrers": "24"
    },
    {
      "referrer": "direct",
      "num_referrers": "21"
    },
    {
      "referrer": "facebook.com",
      "num_referrers": "18"
    }
  ],
  "rows": 3,
  "rows_before_limit_at_least": 6,
  "statistics": {
    "elapsed": 0.007243332,
    "rows_read": 100,
    "bytes_read": 1830
  }
}

Next Steps

You’ve successfully ingested, analyzed, and published data using Airfold in a few simple steps! This workflow enables intuitive interaction with data, transforming raw web events data into actionable insights.

Feel free to dive deeper into specific concepts, such as workspaces, sources, and more!

Overview

In this guide, we’ll develop an API to identify the top-mentioned features in sales calls.

We’ll process web events data to extract top referrers using the sample web events CSV file.

Our steps include:

  • Ingesting the dataset
  • Executing queries to get the top 3 referrers
  • Exposing the features through an API

Create a Workspace

Before we begin, we need to create a workspace to store our data and resources, as well as a token to authenticate our CLI.

  1. Go to Airfold and create a new workspace.
  2. Copy an admin token from the workspace’s Keys page.

The token should look like this: aft_6eab8fcd902e4cbfb63ba174469989cd.Ds1PME5dQsJKosKQWVcZiBSlRFBbmhzIocvHg8KQddV.

Set up the CLI.

The CLI requires Python 3.10 or higher.
  1. Install the CLI using pip install airfold-cli.
  2. Run af config and paste your token when prompted.
pip install airfold-cli
af config

Create a Source

First, download web_events_sample.csv

To create a source, a YAML file defining the source is required.

Let’s generate a source by inferring the schema from a CSV file.
(Replace /path/to/web_events_sample.csv with the actual path):

af source create path/to/web_events_sample.csv .

The CLI will infer the schema from the CSV file and generate a YAML file with the following contents:

./sources/web_events_sample.yaml
name: web_events_sample
cols:
  event_id: Int64
  user_id: Int64
  event_type: String
  page_url: String
  timestamp: DateTime
  referrer: String
settings:
  engine: MergeTree()
  order_by: "`event_id`"
  partition_by: toYYYYMM(timestamp)

Push the source definition to your workspace:

af push path/to/web_events_sample.yaml

At this point, our source has been created, but no data has been ingested yet.

Verify creation by listing sources:

af source ls

Ingest Data

With the source set up, ingest the CSV data:

af source append web_events_sample path/to/web_events_sample.csv

Further Analysis with Pipes

To identify top referrers in web events data, create an insights.yaml file:

insights.yaml
nodes:
  - node1:
      sql: |
        SELECT
          referrer,
          COUNT() AS num_referrers
        FROM web_events_sample
        GROUP BY referrer
        ORDER BY num_referrers DESC
        LIMIT 3
publish: insights

Push the insights pipe:

af push path/to/insights.yaml

Query Results

Use the API:

curl --request GET \
--url https://api.airfold.co/v1/pipes/insights.json \
--header 'Authorization: Bearer <use auth token here>'

Or the CLI:

af pipe query insights --format json

Which should output:

{
  "meta": [
    {
      "name": "referrer",
      "type": "String"
    },
    {
      "name": "num_referrers",
      "type": "UInt64"
    }
  ],
  "data": [
    {
      "referrer": "twitter.com",
      "num_referrers": "24"
    },
    {
      "referrer": "direct",
      "num_referrers": "21"
    },
    {
      "referrer": "facebook.com",
      "num_referrers": "18"
    }
  ],
  "rows": 3,
  "rows_before_limit_at_least": 6,
  "statistics": {
    "elapsed": 0.007243332,
    "rows_read": 100,
    "bytes_read": 1830
  }
}

Next Steps

You’ve successfully ingested, analyzed, and published data using Airfold in a few simple steps! This workflow enables intuitive interaction with data, transforming raw web events data into actionable insights.

Feel free to dive deeper into specific concepts, such as workspaces, sources, and more!