Bring your data into Airfold
In Airfold, a Source is the foundational storage unit, comparable to a table in a traditional database. It stores structured data in rows and columns based on a predefined schema and serves as the entry point for data ingestion into the platform.
Once data is ingested, it’s stored in a Source and made immediately available for querying with SQL.
Sources power all downstream transformations and API endpoints in Airfold, making them a critical part of your analytics workflow. They are designed for speed, schema clarity, and reliability, helping ensure your data remains fresh, queryable, and production-ready.
To ingest your data into a Source, you have two options:

- Connectors: seamlessly sync data from your existing systems into Airfold using built-in connectors for platforms like S3, Postgres, or Snowflake. Ideal for scheduled or high-volume ingestion.
- API: push data directly to Airfold in real time using our API. Instantly populate your Source and begin querying; perfect for syncing event data as it happens, with no batching required (see the sketch below).

Whether you ingest data with a connector or through the Airfold API, once your data lands in a Source, it is instantly queryable and ready to power your analytics pipelines.
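As a rough illustration of the API route, the sketch below pushes a single JSON event with curl. This is a hypothetical sketch: the endpoint URL, auth header, and payload shape are assumptions, so check the Airfold API reference for the exact contract.

```bash
# Hypothetical sketch: endpoint path, header, and payload shape are
# assumptions; consult the Airfold API reference for the real contract.
curl -X POST "https://api.airfold.co/v1/events/web_events" \
  -H "Authorization: Bearer $AIRFOLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "u_123", "event": "page_view", "ts": "2025-01-01T00:00:00Z"}'
```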
Each Source has its own schema that defines the table’s columns, data types, and table settings.
Defining a clear, well-structured schema for your Source is key to ensuring fast, efficient queries with minimal latency and resource usage. There are several Airfold-specific things to consider when finalizing your schema, such as:
- Table settings
  - Engine type (required)
  - Primary Key: defines how data is stored on disk (required)
  - Partitions: used mainly for data management (optional)
- Data types: there are many data types in Airfold that you might not find in other platforms; choosing them carefully can cut storage costs and improve query efficiency (illustrated in the sketch below).
Having an optimized schema is crucial, especially when your Source is storing data at scale. What’s more, your schema settings cannot be changed once the table is created. We strongly recommend reading the schema section in the docs!
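To make this concrete, here is a minimal schema fragment in the YAML style used by the CLI (shown in full later on this page). It assumes a cols key mapping column names to types and uses ClickHouse-style types; treat the key name and exact type support as assumptions and confirm them against the Data Types docs.

```yaml
# Hypothetical schema fragment: the `cols` key name is an assumption
# based on the CLI example later on this page.
cols:
  event_time: DateTime             # compact timestamp instead of a raw string
  country: LowCardinality(String)  # dictionary-encodes a low-cardinality column
  status_code: UInt16              # narrow integer instead of a generic Int64
```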
To create a Source, you can either use the Airfold UI (the simplest and quickest method) or define your Source in YAML files and push them to your workspace using the Airfold CLI. Below is an example of each method.
Navigate to “Sources” on the left menu bar, and click on “+”:
From this window, you can either choose to create a Source that uses a connector or to create a Source that will infer a schema from a text snippet, a file upload, or an external URL.
💡 Tip: If you plan on ingesting data using the Airfold API, you need to create a Source without a connector. Selecting the Text option in this window will allow you to create an unconnected Source while still defining its schema, ready for data to be ingested into.
Below is an example of creating a Source using a file upload. This will create an unconnected Source that infers its schema from the uploaded file. You can choose whether to actually ingest the data from the file or to use the file only as a means of inferring a schema.
After clicking File Upload, you will be asked to specify the file.
On the next page, you can confirm the Name of your Source. The Source name is how you will reference your data in your SQL queries, so make sure to choose a SQL-friendly name!
You can also choose whether or not to actually ingest the data from the file (or just infer its schema). Check the box if you want to ingest the data from the uploaded file into your Source.
In this window, you can also modify the schema and the table settings: you can update column names, data types, engine type, etc. Refer to the schema page of these docs for best practices.
Click Create to finalize the Source.
Once your Source is successfully created, the UI has several tabs that show useful information. The graphs at the top of the page show Usage metrics so that you can monitor your key Source metrics. The tabs below the graphs are:
- Data: gives you a preview of your data. You can explore the data using:
  - Filters: filter out certain rows based on a condition
  - Sort: order your rows in ascending/descending order based on a specified column
  - Group by: organize your rows by a specified column, grouping related rows together based on a shared value in that column
- Schema: provides the table schema and ClickHouse settings.
- Data Graph: shows all the dependencies for the Source. As you build downstream analytics from the Source using Pipes, the dependency chart will populate, showing all nodes that reference this Source.
- Logs: provides error logs. Any ingestion errors can be seen there.
If you are using the CLI, Sources are defined with YAML files, usually in your /Sources directory. For a Source that isn’t using a connector, your YAML should look like this:
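Below is a minimal sketch of an unconnected Source definition, assembled from the field descriptions that follow; the exact key names (type, name, description, cols, settings) are assumptions, so verify them against the CLI reference.

```yaml
# web_events.yaml - hypothetical unconnected Source definition.
# Key names are assumptions based on the field descriptions below.
type: Table                        # no connector, so a plain table
name: web_events
description: Raw page-view events pushed via the Airfold API
cols:                              # column name -> data type
  event_time: DateTime
  user_id: String
  path: String
settings:                          # engine settings: table engine, ORDER BY, etc.
  - ENGINE MergeTree()
  - ORDER BY event_time
```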
- Type: the type of table that gets created. For Sources that don’t use connectors, this value should be Table.
- Name: the name given to the Source.
- Description: a brief overview of the Source’s content or purpose (optional).
- Columns: defines the schema of the Source as a list of columns, where each column name is a key and its data type is the value (see Data Types).
- Settings: engine settings for the Source, which can include ORDER BY, PARTITION BY, the table engine, etc. The settings can be a string, a key-value pair, or an array comprising either or both (optional).
Push Sources to your workspace using the CLI command af push. For example, to push web_events.yaml, run:
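A minimal invocation, assuming you run it from the directory containing the file (adjust the path if it lives in /Sources):

```bash
af push web_events.yaml
```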
Once you have successfully pushed your new Source, it should be visible in the UI and ready to ingest data!
For your next steps, try writing some SQL queries against your new Source in a Pipe.
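For instance, a starter query might count events per day, assuming a Source named web_events with an event_time column (both are hypothetical names carried over from the examples above):

```sql
-- Starter query: daily event counts, assuming a hypothetical
-- web_events Source with an event_time column.
SELECT
  toDate(event_time) AS day,
  count() AS events
FROM web_events
GROUP BY day
ORDER BY day DESC
```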