HLL Sketch Functions
HyperLogLog (HLL) sketches provide fast, space-efficient cardinality estimation (count distinct). They are the most commonly used sketch type for approximate distinct counting.
Table of contents
Aggregation Functions
ds_hll_sketch
Builds an HLL sketch from input values.
ds_hll_sketch(value) -> varbinary
ds_hll_sketch(value, lgK) -> varbinary
| Parameter | Type | Description |
|---|---|---|
value | VARCHAR, BIGINT, or DOUBLE | Values to count |
lgK | INTEGER | Log2 of sketch size (4-21, default 12). Higher = more accurate, larger sketch |
Example:
-- Basic usage
SELECT ds_hll_sketch(user_id) AS sketch FROM events;
-- With higher accuracy
SELECT ds_hll_sketch(user_id, 16) AS sketch FROM events;
ds_hll_union
Computes the union of multiple HLL sketches (aggregation).
ds_hll_union(sketch) -> varbinary
ds_hll_union(sketch, lgK) -> varbinary
| Parameter | Type | Description |
|---|---|---|
sketch | VARBINARY | HLL sketch to union |
lgK | INTEGER | Log2 of result sketch size |
Example:
-- Union daily sketches into a weekly estimate
SELECT ds_hll_estimate(ds_hll_union(user_sketch)) AS weekly_users
FROM daily_sketches
WHERE event_date BETWEEN DATE '2024-01-08' AND DATE '2024-01-14';
Scalar Functions
ds_hll_estimate
Returns the cardinality estimate from an HLL sketch.
ds_hll_estimate(sketch) -> double
Example:
SELECT ds_hll_estimate(ds_hll_sketch(user_id)) FROM events;
-- Returns: 9987.3
ds_hll_estimate_bounds
Returns the estimate with upper and lower error bounds.
ds_hll_estimate_bounds(sketch, numStdDevs) -> array(double)
| Parameter | Type | Description |
|---|---|---|
sketch | VARBINARY | HLL sketch |
numStdDevs | INTEGER | Number of standard deviations (1, 2, or 3) |
Returns an array of 3 doubles: [estimate, lower_bound, upper_bound].
Example:
SELECT ds_hll_estimate_bounds(ds_hll_sketch(user_id), 2) FROM events;
-- Returns: [9987.3, 9712.1, 10268.5]
ds_hll_merge
Pairwise merge of two HLL sketches (scalar function, not aggregation).
ds_hll_merge(sketch1, sketch2) -> varbinary
Returns NULL if both inputs are NULL. Returns the non-null sketch if one input is NULL.
Example:
SELECT ds_hll_estimate(ds_hll_merge(
(SELECT ds_hll_sketch(user_id) FROM events WHERE region = 'US'),
(SELECT ds_hll_sketch(user_id) FROM events WHERE region = 'EU')
));
ds_hll_stringify
Returns a human-readable string representation of the sketch.
ds_hll_stringify(sketch) -> varchar
Example:
SELECT ds_hll_stringify(ds_hll_sketch(user_id)) FROM events;
Accuracy
| lgK | Relative Error (68% confidence) | Sketch Size |
|---|---|---|
| 10 | ~3.1% | ~1 KB |
| 12 (default) | ~1.6% | ~4 KB |
| 14 | ~0.8% | ~16 KB |
| 16 | ~0.4% | ~64 KB |