ClickHouse provides a variety of table engines that define how data is physically stored, merged, deduplicated, and aggregated. Choosing the appropriate engine is crucial for designing scalable, performant data pipelines — especially in analytics platforms where raw event ingestion and pre-aggregation coexist. Airfold runs Clickhouse in the backend, for more deep dive details make sure to check Clickhouse documentation on Table Engines. In Airfold, the following are the most commonly used engines:

MergeTree

MergeTree is the foundational engine in ClickHouse, optimized for high-volume data ingestion and fast analytical queries. It supports background merging of data parts, primary key indexing, and partitioning.

Key Features:

  • Stores data sorted by a primary key, enabling fast range scans
  • Breaks data into granules (~8192 rows) for efficient index pruning
  • Supports partitioning (e.g., by day/month) for retention and query optimization
  • Merges data parts in the background to reduce small parts and reclaim space
  • Ideal for append-only data like web events, logs, and telemetry

Common Use Cases:

  • Event tracking (e.g., web_events, page_views)
  • Session reconstruction from time-series data
  • Time-partitioned storage with retention logic

ReplacingMergeTree

ReplacingMergeTree extends MergeTree by adding deduplication capabilities during background merges. It is useful when duplicate or outdated rows are expected, such as from retries or updates.

Key Features:

  • Removes duplicates based on the ORDER BY key during merges
  • If a version column is specified, retains only the most recent version
  • Deduplication happens only during background merges, not on insert

Common Use Cases:

  • Cleaning up duplicates from at-least-once ingestion
  • Storing dimension tables that receive frequent corrections
  • Handling data updates with soft upserts

Deduplication works on the sorting key, not the primary key. Requires correct ordering and optional versioning to work effectively.

AggregatingMergeTree

AggregatingMergeTree is designed to store pre-aggregated data using aggregate function states. It is tailored for analytics workloads where aggregation results are computed once and queried frequently.

Key Features:

  • Stores aggregate function states like sumState(), uniqState(), etc.
  • Merges rows with the same key using merge functions (e.g., sumMerge())
  • Reduces query latency and compute cost for recurring aggregations

Common Use Cases:

  • Materialized views with pre-aggregated metrics (e.g., DAU, conversions)
  • Storing rollup tables for dashboards or reports
  • Efficient support for time-bucketed aggregations

️ Not suitable for raw event storage. Must use aggregate functions when inserting data.

Choosing the Right Engine

Use CaseRecommended EngineRationale
Raw event ingestionMergeTreeFast inserts, no deduplication, full event fidelity
Deduplicated or updated recordsReplacingMergeTreeHandles retries and corrections via background deduplication
Pre-aggregated summariesAggregatingMergeTreeEfficient storage and querying of aggregate function states

By selecting the correct engine per use case, your queries will run much more faster utilizing the mechanics behind each of the engine type, there are many more engine types which you can see at clickhouse docs but these are the main ones you will usually use.