KLL Sketch Functions
KLL (Karnin-Lang-Liberty) sketches provide quantile approximation — compute percentiles, ranks, CDF, and PMF over massive datasets in a single pass. This is the recommended quantile sketch.
Table of contents
Aggregation Functions
ds_kll_sketch
Builds a KLL sketch from numeric values.
ds_kll_sketch(value) -> varbinary
ds_kll_sketch(value, k) -> varbinary
| Parameter | Type | Description |
|---|---|---|
value | DOUBLE, REAL, or BIGINT | Numeric values |
k | INTEGER | Accuracy parameter (default 200). Higher = more accurate |
ds_kll_union
Computes the union of multiple KLL sketches.
ds_kll_union(sketch) -> varbinary
Scalar Functions
ds_kll_quantile
Returns the value at a given normalized rank (0.0 to 1.0).
ds_kll_quantile(sketch, rank) -> double
| Parameter | Type | Description |
|---|---|---|
sketch | VARBINARY | KLL sketch |
rank | DOUBLE | Normalized rank (0.0 = min, 0.5 = median, 1.0 = max) |
Example:
SELECT ds_kll_quantile(ds_kll_sketch(response_ms), 0.5) AS p50,
ds_kll_quantile(ds_kll_sketch(response_ms), 0.95) AS p95,
ds_kll_quantile(ds_kll_sketch(response_ms), 0.99) AS p99
FROM requests;
ds_kll_quantiles
Returns values at multiple normalized ranks.
ds_kll_quantiles(sketch, ranks) -> array(double)
| Parameter | Type | Description |
|---|---|---|
sketch | VARBINARY | KLL sketch |
ranks | ARRAY(DOUBLE) | Array of normalized ranks |
Example:
SELECT ds_kll_quantiles(ds_kll_sketch(response_ms), ARRAY[0.5, 0.95, 0.99])
FROM requests;
-- Returns: [42.1, 187.3, 512.8]
ds_kll_rank
Returns the normalized rank of a given value (inverse of quantile).
ds_kll_rank(sketch, value) -> double
Example:
-- What fraction of requests are under 100ms?
SELECT ds_kll_rank(ds_kll_sketch(response_ms), 100.0)
FROM requests;
-- Returns: 0.73 (73% of requests are under 100ms)
ds_kll_cdf
Cumulative distribution function. Returns cumulative probabilities at split points.
ds_kll_cdf(sketch, splitPoints) -> array(double)
Returns an array of length splitPoints.length + 1 with cumulative probabilities.
Example:
SELECT ds_kll_cdf(ds_kll_sketch(response_ms), ARRAY[50.0, 100.0, 200.0])
FROM requests;
-- Returns: [0.45, 0.73, 0.91, 1.0]
-- 45% < 50ms, 73% < 100ms, 91% < 200ms
ds_kll_pmf
Probability mass function. Returns the fraction of values in each bin.
ds_kll_pmf(sketch, splitPoints) -> array(double)
Returns an array of length splitPoints.length + 1 with bin probabilities that sum to 1.0.
Example:
SELECT ds_kll_pmf(ds_kll_sketch(response_ms), ARRAY[50.0, 100.0, 200.0])
FROM requests;
-- Returns: [0.45, 0.28, 0.18, 0.09]
-- 45% in [min,50), 28% in [50,100), 18% in [100,200), 9% in [200,max]
ds_kll_n
Returns the total number of values inserted into the sketch.
ds_kll_n(sketch) -> bigint
ds_kll_k
Returns the k parameter of the sketch.
ds_kll_k(sketch) -> bigint
ds_kll_stringify
Returns a human-readable string representation.
ds_kll_stringify(sketch) -> varchar
Accuracy
| k | Normalized Rank Error (99% confidence) |
|---|---|
| 100 | ~3.3% |
| 200 (default) | ~1.65% |
| 400 | ~0.83% |
| 800 | ~0.41% |
The rank error means: if you ask for the 95th percentile, the true percentile of the returned value is between 93.35% and 96.65% (with k=200 at 99% confidence).