KLL Sketch Functions

KLL (Karnin-Lang-Liberty) sketches provide quantile approximation — compute percentiles, ranks, CDF, and PMF over massive datasets in a single pass. This is the recommended quantile sketch.

Table of contents

  1. Aggregation Functions
    1. ds_kll_sketch
    2. ds_kll_union
  2. Scalar Functions
    1. ds_kll_quantile
    2. ds_kll_quantiles
    3. ds_kll_rank
    4. ds_kll_cdf
    5. ds_kll_pmf
    6. ds_kll_n
    7. ds_kll_k
    8. ds_kll_stringify
  3. Accuracy

Aggregation Functions

ds_kll_sketch

Builds a KLL sketch from numeric values.

ds_kll_sketch(value) -> varbinary
ds_kll_sketch(value, k) -> varbinary
Parameter Type Description
value DOUBLE, REAL, or BIGINT Numeric values
k INTEGER Accuracy parameter (default 200). Higher = more accurate

ds_kll_union

Computes the union of multiple KLL sketches.

ds_kll_union(sketch) -> varbinary

Scalar Functions

ds_kll_quantile

Returns the value at a given normalized rank (0.0 to 1.0).

ds_kll_quantile(sketch, rank) -> double
Parameter Type Description
sketch VARBINARY KLL sketch
rank DOUBLE Normalized rank (0.0 = min, 0.5 = median, 1.0 = max)

Example:

SELECT ds_kll_quantile(ds_kll_sketch(response_ms), 0.5)  AS p50,
       ds_kll_quantile(ds_kll_sketch(response_ms), 0.95) AS p95,
       ds_kll_quantile(ds_kll_sketch(response_ms), 0.99) AS p99
FROM requests;

ds_kll_quantiles

Returns values at multiple normalized ranks.

ds_kll_quantiles(sketch, ranks) -> array(double)
Parameter Type Description
sketch VARBINARY KLL sketch
ranks ARRAY(DOUBLE) Array of normalized ranks

Example:

SELECT ds_kll_quantiles(ds_kll_sketch(response_ms), ARRAY[0.5, 0.95, 0.99])
FROM requests;
-- Returns: [42.1, 187.3, 512.8]

ds_kll_rank

Returns the normalized rank of a given value (inverse of quantile).

ds_kll_rank(sketch, value) -> double

Example:

-- What fraction of requests are under 100ms?
SELECT ds_kll_rank(ds_kll_sketch(response_ms), 100.0)
FROM requests;
-- Returns: 0.73 (73% of requests are under 100ms)

ds_kll_cdf

Cumulative distribution function. Returns cumulative probabilities at split points.

ds_kll_cdf(sketch, splitPoints) -> array(double)

Returns an array of length splitPoints.length + 1 with cumulative probabilities.

Example:

SELECT ds_kll_cdf(ds_kll_sketch(response_ms), ARRAY[50.0, 100.0, 200.0])
FROM requests;
-- Returns: [0.45, 0.73, 0.91, 1.0]
-- 45% < 50ms, 73% < 100ms, 91% < 200ms

ds_kll_pmf

Probability mass function. Returns the fraction of values in each bin.

ds_kll_pmf(sketch, splitPoints) -> array(double)

Returns an array of length splitPoints.length + 1 with bin probabilities that sum to 1.0.

Example:

SELECT ds_kll_pmf(ds_kll_sketch(response_ms), ARRAY[50.0, 100.0, 200.0])
FROM requests;
-- Returns: [0.45, 0.28, 0.18, 0.09]
-- 45% in [min,50), 28% in [50,100), 18% in [100,200), 9% in [200,max]

ds_kll_n

Returns the total number of values inserted into the sketch.

ds_kll_n(sketch) -> bigint

ds_kll_k

Returns the k parameter of the sketch.

ds_kll_k(sketch) -> bigint

ds_kll_stringify

Returns a human-readable string representation.

ds_kll_stringify(sketch) -> varchar

Accuracy

k Normalized Rank Error (99% confidence)
100 ~3.3%
200 (default) ~1.65%
400 ~0.83%
800 ~0.41%

The rank error means: if you ask for the 95th percentile, the true percentile of the returned value is between 93.35% and 96.65% (with k=200 at 99% confidence).


Back to top

Trino DataSketches Plugin — Apache License 2.0