Frequent Items Sketch Functions
Frequency sketches identify heavy hitters — items that appear frequently in a data stream. They estimate item frequencies and can enumerate the most frequent items.
Aggregation Functions
ds_freq_sketch
Builds a frequency sketch from string values.
```sql
ds_freq_sketch(value) -> varbinary
ds_freq_sketch(value, maxMapSize) -> varbinary
```
| Parameter | Type | Description |
|---|---|---|
| value | VARCHAR | String values to track |
| maxMapSize | INTEGER | Maximum internal map size (default 64). Larger values let the sketch track more distinct items, at the cost of memory. |
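For example, a sketch can be built per group, such as one sketch of search terms per day (the `day` column is illustrative; `searches` and `search_term` are reused from the examples below):

```sql
-- One frequency sketch of search terms per day;
-- a map size of 256 tracks more distinct terms than the default 64.
SELECT day, ds_freq_sketch(search_term, 256) AS term_sketch
FROM searches
GROUP BY day;
```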
ds_freq_union
Computes the union of multiple frequency sketches.
```sql
ds_freq_union(sketch) -> varbinary
```
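For example, pre-aggregated per-group sketches can be merged into a single overall sketch (the `daily_term_sketches` table and `term_sketch` column are hypothetical):

```sql
-- Merge previously stored daily sketches into one sketch
SELECT ds_freq_union(term_sketch) AS merged_sketch
FROM daily_term_sketches;
```

Unioning stored sketches avoids rescanning the raw data when rolling daily sketches up to weekly or monthly totals.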
Scalar Functions
ds_freq_estimate
Returns the estimated frequency of a specific item.
```sql
ds_freq_estimate(sketch, item) -> bigint
```
Example:

```sql
SELECT ds_freq_estimate(ds_freq_sketch(search_term), 'trino') AS trino_count
FROM searches;
```
ds_freq_lower_bound
Returns the lower bound on the frequency of an item.
```sql
ds_freq_lower_bound(sketch, item) -> bigint
```
ds_freq_upper_bound
Returns the upper bound on the frequency of an item.
```sql
ds_freq_upper_bound(sketch, item) -> bigint
```
The true frequency always lies between ds_freq_lower_bound and ds_freq_upper_bound. Use the lower bound when false positives are costly; use the upper bound when false negatives are costly.
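For example, to report a count for 'trino' that is guaranteed not to exceed its true frequency:

```sql
-- Conservative count: never higher than the true frequency
SELECT ds_freq_lower_bound(ds_freq_sketch(search_term), 'trino') AS min_trino_count
FROM searches;
```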
ds_freq_frequent_items
Returns the frequent items as a JSON array.
```sql
ds_freq_frequent_items(sketch, errorType) -> varchar
```
| Parameter | Type | Description |
|---|---|---|
| sketch | VARBINARY | Frequency sketch |
| errorType | VARCHAR | 'NO_FALSE_POSITIVES' or 'NO_FALSE_NEGATIVES' |
- NO_FALSE_POSITIVES: every returned item is truly frequent (may miss some)
- NO_FALSE_NEGATIVES: every truly frequent item is returned (may include extras)
Example:

```sql
SELECT ds_freq_frequent_items(ds_freq_sketch(search_term), 'NO_FALSE_POSITIVES')
FROM searches;
-- Returns: [{"item":"trino","estimate":15234,"lower":15200,"upper":15268}, ...]
```
ds_freq_stringify
Returns a human-readable summary of the sketch contents, useful for debugging and inspection.

```sql
ds_freq_stringify(sketch) -> varchar
```
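Example (the exact text of the summary depends on the sketch implementation):

```sql
-- Print a human-readable summary of the sketch for inspection
SELECT ds_freq_stringify(ds_freq_sketch(search_term))
FROM searches;
```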
When to Use Frequency Sketches
- Top-K queries: find the most popular products, search terms, or error codes
- Anomaly detection: track item frequencies over time, flag items with unusual spikes
- Data quality: identify duplicate or over-represented values in a dataset