Frequent Items Sketch Functions

Frequency sketches identify heavy hitters — items that appear frequently in a data stream. They estimate item frequencies and can enumerate the most frequent items.

Table of contents

  1. Aggregation Functions
    1. ds_freq_sketch
    2. ds_freq_union
  2. Scalar Functions
    1. ds_freq_estimate
    2. ds_freq_lower_bound
    3. ds_freq_upper_bound
    4. ds_freq_frequent_items
    5. ds_freq_stringify
  3. When to Use Frequency Sketches

Aggregation Functions

ds_freq_sketch

Builds a frequency sketch from string values.

ds_freq_sketch(value) -> varbinary
ds_freq_sketch(value, maxMapSize) -> varbinary
Parameter Type Description
value VARCHAR String values to track
maxMapSize INTEGER Maximum internal map size (default 64). Higher = tracks more items

ds_freq_union

Computes the union of multiple frequency sketches.

ds_freq_union(sketch) -> varbinary

Scalar Functions

ds_freq_estimate

Returns the estimated frequency of a specific item.

ds_freq_estimate(sketch, item) -> bigint

Example:

SELECT ds_freq_estimate(ds_freq_sketch(search_term), 'trino') AS trino_count
FROM searches;

ds_freq_lower_bound

Returns the lower bound on the frequency of an item.

ds_freq_lower_bound(sketch, item) -> bigint

ds_freq_upper_bound

Returns the upper bound on the frequency of an item.

ds_freq_upper_bound(sketch, item) -> bigint

The true frequency is always between ds_freq_lower_bound and ds_freq_upper_bound. Use lower bound when false positives are costly; use upper bound when false negatives are costly.

ds_freq_frequent_items

Returns the frequent items as a JSON array.

ds_freq_frequent_items(sketch, errorType) -> varchar
Parameter Type Description
sketch VARBINARY Frequency sketch
errorType VARCHAR 'NO_FALSE_POSITIVES' or 'NO_FALSE_NEGATIVES'
  • NO_FALSE_POSITIVES: every returned item is truly frequent (may miss some)
  • NO_FALSE_NEGATIVES: every truly frequent item is returned (may include extras)

Example:

SELECT ds_freq_frequent_items(ds_freq_sketch(search_term), 'NO_FALSE_POSITIVES')
FROM searches;
-- Returns: [{"item":"trino","estimate":15234,"lower":15200,"upper":15268}, ...]

ds_freq_stringify

ds_freq_stringify(sketch) -> varchar

When to Use Frequency Sketches

  • Top-K queries: find the most popular products, search terms, or error codes
  • Anomaly detection: track item frequencies over time, flag items with unusual spikes
  • Data quality: identify duplicate or over-represented values in a dataset

Back to top

Trino DataSketches Plugin — Apache License 2.0