Trino DataSketches Plugin

Apache DataSketches probabilistic data structures as Trino SQL functions.

Overview

This plugin provides 65 SQL functions across 8 sketch families for Trino 479, enabling approximate computations on massive datasets:

Sketch Family         Use Case
--------------------  -----------------------------------------------------------
HLL                   Cardinality estimation (count distinct)
Theta                 Cardinality with set operations (union, intersect, exclude)
CPC                   Compact cardinality estimation
KLL                   Quantile approximation (percentiles, ranks, CDF, PMF)
Quantiles             Classic quantile approximation (DoublesSketch)
Frequencies           Frequent items / heavy hitters
Tuple ArrayOfDoubles  Cardinality with associated numeric values
Tuple DoubleSummary   Cardinality with summary statistics

Why Sketches?

Probabilistic data structures (sketches) let you compute approximate answers to queries like “count distinct” or “what’s the 99th percentile” in a single pass over the data, using a fraction of the memory that exact computation requires. They are:

  • Mergeable — pre-aggregate sketches, then combine them for any time range or dimension
  • Fast — single-pass, no sorting or shuffling needed
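  • Compact — size is bounded by the sketch's configuration rather than by the input, so a sketch of 10 billion values fits in a few KB
  • Accurate — typical error is 1-3% for cardinality, tighter for quantiles; larger sketches trade memory for lower error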
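
To make the mergeable, single-pass, and compact properties concrete, here is a minimal sketch in Python of a K-Minimum-Values (KMV) distinct counter, a simple relative of the Theta family. It is an illustration only, not the plugin's or datasketches-java's implementation: the class name `KMVSketch`, the SHA-256 based hash, and the default `k=2048` are all assumptions made for this example.

```python
import hashlib
import heapq


class KMVSketch:
    """Toy K-Minimum-Values distinct-count sketch (hypothetical, for illustration)."""

    def __init__(self, k=2048):
        self.k = k              # capacity: larger k means lower error, more memory
        self._heap = []         # max-heap (negated values) of the k smallest hashes
        self._members = set()   # the same hashes, for fast duplicate checks

    @staticmethod
    def _hash(item):
        # Map the item to a roughly uniform float between 0 and 1.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def _add_hash(self, h):
        if h in self._members:
            return  # duplicate item: the sketch does not change
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # The new hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def update(self, item):
        """Single-pass update: one hash and at most one heap operation per item."""
        self._add_hash(self._hash(item))

    def merge(self, other):
        """Return a new sketch that summarizes the union of both inputs."""
        merged = KMVSketch(min(self.k, other.k))
        for h in self._members | other._members:
            merged._add_hash(h)
        return merged

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))    # fewer than k distinct items: exact
        return (self.k - 1) / -self._heap[0]  # (k - 1) / k-th smallest hash


# Pre-aggregate one sketch per day, then combine them for a two-day range.
monday, tuesday = KMVSketch(), KMVSketch()
for user_id in range(0, 30_000):
    monday.update(user_id)
for user_id in range(20_000, 50_000):
    tuesday.update(user_id)

both_days = monday.merge(tuesday)
print(round(both_days.estimate()))  # approximate; the true distinct count is 50,000
```

The merged sketch holds exactly the hashes that a single sketch fed both days directly would hold, which is why pre-aggregated sketches can be combined for any time range without rescanning the raw data. Each sketch stores at most `k` hash values no matter how many items it has seen, and the relative error of this estimator shrinks on the order of 1/sqrt(k).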

Compatibility

Component          Version
-----------------  -------
Trino              479
datasketches-java  9.0.0
Java               21+

Trino DataSketches Plugin — Apache License 2.0