Table Engines

ClickHouse provides a variety of table engines that define how data is physically stored, merged, deduplicated, and aggregated. Choosing the appropriate engine is crucial for designing scalable, performant data pipelines — especially in analytics platforms where raw event ingestion and pre-aggregation coexist. Airfold runs Clickhouse in the backend, for more deep dive details make sure to check Clickhouse documentation on Table Engines. In Airfold, the following are the most commonly used engines:

`MergeTree`

MergeTree is the foundational engine in ClickHouse, optimized for high-volume data ingestion and fast analytical queries. It supports background merging of data parts, primary key indexing, and partitioning. Key Features:

Stores data sorted by a primary key, enabling fast range scans
Breaks data into granules (~8192 rows) for efficient index pruning
Supports partitioning (e.g., by day/month) for retention and query optimization
Merges data parts in the background to reduce small parts and reclaim space
Ideal for append-only data like web events, logs, and telemetry

Common Use Cases:

Event tracking (e.g., web_events, page_views)
Session reconstruction from time-series data
Time-partitioned storage with retention logic

`ReplacingMergeTree`

ReplacingMergeTree extends MergeTree by adding deduplication capabilities during background merges. It is useful when duplicate or outdated rows are expected, such as from retries or updates. Key Features:

Removes duplicates based on the ORDER BY key during merges
If a version column is specified, retains only the most recent version
Deduplication happens only during background merges, not on insert

Common Use Cases:

Cleaning up duplicates from at-least-once ingestion
Storing dimension tables that receive frequent corrections
Handling data updates with soft upserts

Deduplication works on the sorting key, not the primary key. Requires correct ordering and optional versioning to work effectively.

`AggregatingMergeTree`

AggregatingMergeTree is designed to store pre-aggregated data using aggregate function states. It is tailored for analytics workloads where aggregation results are computed once and queried frequently. Key Features:

Stores aggregate function states like sumState(), uniqState(), etc.
Merges rows with the same key using merge functions (e.g., sumMerge())
Reduces query latency and compute cost for recurring aggregations

Common Use Cases:

Materialized views with pre-aggregated metrics (e.g., DAU, conversions)
Storing rollup tables for dashboards or reports
Efficient support for time-bucketed aggregations

️ Not suitable for raw event storage. Must use aggregate functions when inserting data.

Choosing the Right Engine

Use Case	Recommended Engine	Rationale
Raw event ingestion	`MergeTree`	Fast inserts, no deduplication, full event fidelity
Deduplicated or updated records	`ReplacingMergeTree`	Handles retries and corrections via background deduplication
Pre-aggregated summaries	`AggregatingMergeTree`	Efficient storage and querying of aggregate function states

By selecting the correct engine per use case, your queries will run much more faster utilizing the mechanics behind each of the engine type, there are many more engine types which you can see at clickhouse docs but these are the main ones you will usually use.

Get Started

Manage your Workspace

Ingest your data

Transform your data

Production Best Practices

CLI Config Reference

Legal

`MergeTree`

`ReplacingMergeTree`

`AggregatingMergeTree`

Choosing the Right Engine

Get Started

Manage your Workspace

Ingest your data

Transform your data

Production Best Practices

CLI Config Reference

Legal

​MergeTree

​ReplacingMergeTree

​AggregatingMergeTree

​Choosing the Right Engine

`MergeTree`

`ReplacingMergeTree`

`AggregatingMergeTree`

Choosing the Right Engine