HLL Sketch Functions

HyperLogLog (HLL) sketches provide fast, space-efficient cardinality estimation (count distinct). They are the most commonly used sketch type for approximate distinct counting.

Table of contents

  1. Aggregation Functions
    1. ds_hll_sketch
    2. ds_hll_union
  2. Scalar Functions
    1. ds_hll_estimate
    2. ds_hll_estimate_bounds
    3. ds_hll_merge
    4. ds_hll_stringify
  3. Accuracy

Aggregation Functions

ds_hll_sketch

Builds an HLL sketch from input values.

ds_hll_sketch(value) -> varbinary
ds_hll_sketch(value, lgK) -> varbinary
Parameter Type Description
value VARCHAR, BIGINT, or DOUBLE Values to count
lgK INTEGER Log2 of sketch size (4-21, default 12). Higher = more accurate, larger sketch

Example:

-- Basic usage
SELECT ds_hll_sketch(user_id) AS sketch FROM events;

-- With higher accuracy
SELECT ds_hll_sketch(user_id, 16) AS sketch FROM events;

ds_hll_union

Computes the union of multiple HLL sketches (aggregation).

ds_hll_union(sketch) -> varbinary
ds_hll_union(sketch, lgK) -> varbinary
Parameter Type Description
sketch VARBINARY HLL sketch to union
lgK INTEGER Log2 of result sketch size

Example:

-- Union daily sketches into a weekly estimate
SELECT ds_hll_estimate(ds_hll_union(user_sketch)) AS weekly_users
FROM daily_sketches
WHERE event_date BETWEEN DATE '2024-01-08' AND DATE '2024-01-14';

Scalar Functions

ds_hll_estimate

Returns the cardinality estimate from an HLL sketch.

ds_hll_estimate(sketch) -> double

Example:

SELECT ds_hll_estimate(ds_hll_sketch(user_id)) FROM events;
-- Returns: 9987.3

ds_hll_estimate_bounds

Returns the estimate with upper and lower error bounds.

ds_hll_estimate_bounds(sketch, numStdDevs) -> array(double)
Parameter Type Description
sketch VARBINARY HLL sketch
numStdDevs INTEGER Number of standard deviations (1, 2, or 3)

Returns an array of 3 doubles: [estimate, lower_bound, upper_bound].

Example:

SELECT ds_hll_estimate_bounds(ds_hll_sketch(user_id), 2) FROM events;
-- Returns: [9987.3, 9712.1, 10268.5]

ds_hll_merge

Pairwise merge of two HLL sketches (scalar function, not aggregation).

ds_hll_merge(sketch1, sketch2) -> varbinary

Returns NULL if both inputs are NULL. Returns the non-null sketch if one input is NULL.

Example:

SELECT ds_hll_estimate(ds_hll_merge(
    (SELECT ds_hll_sketch(user_id) FROM events WHERE region = 'US'),
    (SELECT ds_hll_sketch(user_id) FROM events WHERE region = 'EU')
));

ds_hll_stringify

Returns a human-readable string representation of the sketch.

ds_hll_stringify(sketch) -> varchar

Example:

SELECT ds_hll_stringify(ds_hll_sketch(user_id)) FROM events;

Accuracy

lgK Relative Error (68% confidence) Sketch Size
10 ~3.1% ~1 KB
12 (default) ~1.6% ~4 KB
14 ~0.8% ~16 KB
16 ~0.4% ~64 KB

Back to top

Trino DataSketches Plugin — Apache License 2.0