Overview of MergeTree-based engines and their use cases in ClickHouse
MergeTree
`MergeTree` is the foundational engine in ClickHouse, optimized for high-volume data ingestion and fast analytical queries. It supports background merging of data parts, primary key indexing, and partitioning.
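
As an illustration of partitioning and primary key indexing, here is a minimal sketch of a raw event table. The column names and the choice of partition and sorting keys are assumptions made for this example, not something the overview prescribes.

```sql
-- Hypothetical raw event table; column names are illustrative only.
CREATE TABLE web_events
(
    event_time DateTime,
    user_id    UInt64,
    event_type LowCardinality(String),
    url        String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)   -- one partition per calendar month
ORDER BY (event_type, event_time);  -- sorting key; also the primary key unless PRIMARY KEY is given

-- A filter on the sorting key columns lets ClickHouse skip unrelated partitions and granules.
SELECT event_type, count() AS events
FROM web_events
WHERE event_type = 'click'
  AND event_time >= now() - INTERVAL 1 DAY
GROUP BY event_type;
```

Putting the columns that queries filter on most often first in `ORDER BY` is what makes the primary key index effective for those queries.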
Key Features:
- Well suited to raw event tables (e.g., `web_events`, `page_views`)

ReplacingMergeTree
`ReplacingMergeTree` extends `MergeTree` by adding deduplication during background merges. It is useful when duplicate or outdated rows are expected, such as from retries or updates.
Key Features:
- Removes rows that share the same `ORDER BY` key during merges
- If a `version` column is specified, retains only the most recent version

Deduplication works on the sorting key, not the primary key, and it takes effect only when parts are merged. It therefore depends on a correctly chosen sorting key and, optionally, a version column.
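
The behavior described above can be sketched as follows. The table, its columns, and the use of `updated_at` as the version column are assumptions made for illustration.

```sql
-- Hypothetical table: one logical row per user_id, newest updated_at wins.
CREATE TABLE user_profiles
(
    user_id    UInt64,
    email      String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)  -- updated_at acts as the version column
ORDER BY user_id;                        -- rows with the same user_id are duplicates

-- Two inserts for the same key, for example an original row and a later correction.
INSERT INTO user_profiles VALUES (1, 'old@example.com', '2024-01-01 00:00:00');
INSERT INTO user_profiles VALUES (1, 'new@example.com', '2024-02-01 00:00:00');

-- Until a background merge runs, a plain SELECT may return both rows.
SELECT * FROM user_profiles;

-- FINAL applies the deduplication at query time, leaving only the newest version per key.
SELECT * FROM user_profiles FINAL;
```

`FINAL` adds work at query time, so whether to use it or to tolerate temporary duplicates is a trade-off to weigh for each workload.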
AggregatingMergeTree
`AggregatingMergeTree` is designed to store pre-aggregated data using aggregate function states. It is tailored for analytics workloads where aggregation results are computed once and queried frequently.
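Key Features:
- Stores intermediate aggregate function states written with `sumState()`, `uniqState()`, etc.
- States are finalized at query time with the matching `-Merge` functions (e.g., `sumMerge()`)

Not suitable for raw event storage. Data must be inserted as aggregate function states, not as plain values.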
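
A minimal sketch of the state and merge pattern follows. The summary table `page_views_daily`, the source table definition, and all column names are hypothetical and chosen only for this example.

```sql
-- Hypothetical source of raw events, defined here so the example stands alone.
CREATE TABLE IF NOT EXISTS web_events
(
    event_time DateTime,
    user_id    UInt64,
    event_type LowCardinality(String),
    url        String
)
ENGINE = MergeTree
ORDER BY (event_type, event_time);

-- Summary table: the columns hold aggregate function states, not final numbers.
CREATE TABLE page_views_daily
(
    day         Date,
    url         String,
    views_state AggregateFunction(sum, UInt64),
    users_state AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY (day, url);

-- Inserts must produce states with the -State combinators.
INSERT INTO page_views_daily
SELECT
    toDate(event_time)    AS day,
    url,
    sumState(toUInt64(1)) AS views_state,
    uniqState(user_id)    AS users_state
FROM web_events
GROUP BY day, url;

-- Reads finalize the states with the matching -Merge combinators.
SELECT
    day,
    url,
    sumMerge(views_state)  AS views,
    uniqMerge(users_state) AS unique_users
FROM page_views_daily
GROUP BY day, url;
```

The final `GROUP BY` is still needed because rows with the same key may sit in different parts until a background merge combines their states.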
Use Case | Recommended Engine | Rationale |
---|---|---|
Raw event ingestion | MergeTree | Fast inserts, no deduplication, full event fidelity |
Deduplicated or updated records | ReplacingMergeTree | Handles retries and corrections via background deduplication |
Pre-aggregated summaries | AggregatingMergeTree | Efficient storage and querying of aggregate function states |