Key Data Types You’ll Encounter in Airfold
ClickHouse has many unique data types, combinations of types, and specialized wrappers. We recommend exploring the official ClickHouse documentation for a full reference. This page highlights a few data types you might not have used before but will likely encounter when working with Airfold.AggregateFunction Type
AggregateFunction
might sound like a function but it is actually a data type, most likely used in an incremental materilized view. This type doesn’t store a final value like a number or string. Instead, it stores the intermediate state of an aggregation—something that can be merged later.
For example, say you have an incrmental view with a function like:
messages_sent
is an Int64 column. This function doesn’t store the result as an Int64 like you might expect. Instead, it produces a column of type:
An AggregateFunction requires two positional arguments:
- The aggregation function (e.g.,
sum
,uniqExact
) - The data type of the column being aggregated (e.g.,
Int64
,String
).
simpleAggregateFunction
The younger brother toAggregateFunction
, SimpleAggregateFunction
stores the final result of an aggregation, rather than its intermediate state. It’s a simpler alternative to AggregateFunction, and while less powerful, it can be useful when you don’t need to merge partial states across batches.
Use SimpleAggregateFunction when:
- You only need the final result of an aggregation (like a total count or sum)
- You don’t need to merge partial results across batches
- You want to store an aggregated value that behaves like a regular column in your queries
LowCardinality(T)
LowCardinality()
is a wrapper type that can be applied to certain data types to optimize performance and reduce storage. For example, LowCardinality(String)
behaves like a regular string—but under the hood, it’s optimized for columns with repeated values.
Instead of storing the full string value in every row, ClickHouse builds a dictionary of unique values and stores small integer references in the actual column. This is similar to dictionary encoding and can significantly reduce memory usage and speed up operations like filtering and grouping.
If you have a column with fewer than ~10,000 distinct values and an eligible base type, using LowCardinality can radically improve query performance and reduce storage costs compared to the equivalent non-encoded type.
You can wrap the following types with LowCardinality(…):
- String
- FixedString(N)
- UUID
- Any Integer type (Int8 to Int64, UInt8 to UInt64)
- Date
- DateTime
- Float32, Float64
- Array(…)
- Tuple(…)
- Map(…)
- Nullable(…)
enum8/enum16
At first glance, LowCardinality(String) might seem similar to ClickHouse’s Enum8 or Enum16, since both are optimized for repeated categorical values. They serve different purposes however: The Enum data type is great for when you have a small, fixed set of valid string values that won’t change - likestatus
, role
, or priority
. It offers excellent performance and enforces strict value validation at insert time:
Nullable(…)
In ClickHouse, nullability is explicit. If a column can contain NULL, its type must be wrapped in Nullable(…). For example:- The underlying value (e.g., an Int64)
- A parallel bitmap that tracks which rows are NULL
UUID
ClickHouse includes a native UUID type for storing universally unique identifiers. It’s ideal for identifying entities like user_id, message_id, or org_id with high uniqueness guarantees and efficient performance. Unlike some databases that store UUIDs as strings, ClickHouse treats UUID as a dedicated binary type, offering several advantages such as compact storage (16 bytes instead of 36+ for a string) and more performant querying (faster filtering, joining, and sorting)UUID = UUID
, not UUID = String
.
Nested and Semi-Structured Data
ClickHouse is primarily a columnar database, but it also supports working with nested and semi-structured data. This is especially useful when dealing with grouped values, JSON objects, or event logs where one row may represent multiple related values. The most common types you’ll see in Airfold for handling this kind of data are:Array(T)
Array(T) is a container type that stores a list of values, where all elements share the same type T. Arrays are commonly used in ClickHouse to represent grouped data, batched events, or multi-valued fields.- First,
groupArray(message_id)
collects all message IDs per user into an array. - Then,
arrayJoin()
explodes the array so that each message_id gets its own row—flattening the structure.
Tip: This is especially useful when working with grouped metrics, multi-event logs, or compact API responses that store arrays in a single row.
Tuple(T1, T2, …)
Stores a fixed set of values of different types. Kind of like a mini-row inside a column.- String is for the event type (e.g., “click”, “open”)
- Int64 is the count of how many times it occurred
- UInt8 is used to represent a boolean (0 = false, 1 = true) — Type bool in ClickHouse is internally stored as UInt8. Possible values are true (1), false (0).