CPC Sketch Functions

Compressed Probabilistic Counting (CPC) sketches estimate the number of distinct values (cardinality) with better accuracy per byte than HyperLogLog (HLL) sketches. They are a good choice when storage efficiency is critical.

Table of contents

  1. Aggregation Functions
    1. ds_cpc_sketch
    2. ds_cpc_union
  2. Scalar Functions
    1. ds_cpc_estimate
    2. ds_cpc_estimate_bounds
    3. ds_cpc_merge
    4. ds_cpc_stringify
  3. CPC vs HLL

Aggregation Functions

ds_cpc_sketch

Builds a CPC sketch from input values.

ds_cpc_sketch(value) -> varbinary
ds_cpc_sketch(value, lgK) -> varbinary
| Parameter | Type | Description |
|-----------|------|-------------|
| value | VARCHAR, BIGINT, or DOUBLE | Values to count |
| lgK | INTEGER | Log2 of sketch size (4-26, default 11) |
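
As a minimal sketch of how the two signatures above might be used, assume a hypothetical table `page_views` with a `user_id` column (the table and column names are illustrative and not part of the plugin):

```sql
-- Hypothetical table: page_views(user_id BIGINT, view_date DATE)

-- Approximate distinct users using the default lgK
SELECT ds_cpc_estimate(ds_cpc_sketch(user_id)) AS approx_users
FROM page_views;

-- Same count with an explicit lgK of 12 (must be within 4-26)
SELECT ds_cpc_estimate(ds_cpc_sketch(user_id, 12)) AS approx_users
FROM page_views;
```

A larger lgK generally yields a larger sketch with a lower estimation error, so the right value depends on how much storage you can spend per sketch.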

ds_cpc_union

Computes the union of multiple CPC sketches.

ds_cpc_union(sketch) -> varbinary
ds_cpc_union(sketch, lgK) -> varbinary
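
One common pattern is to store one sketch per day and roll them up later. The example below is a sketch under the assumption that a hypothetical table `daily_user_sketches` already holds sketches produced by ds_cpc_sketch:

```sql
-- Hypothetical table: daily_user_sketches(view_date DATE, user_sketch VARBINARY)
-- Each row is assumed to hold one day's sketch built with ds_cpc_sketch.

SELECT
  date_trunc('month', view_date) AS view_month,
  -- Union the daily sketches, then estimate distinct users for the month
  ds_cpc_estimate(ds_cpc_union(user_sketch)) AS approx_monthly_users
FROM daily_user_sketches
GROUP BY date_trunc('month', view_date);
```

Because the union works on sketches rather than raw rows, users who appear on several days are still counted once in the monthly estimate.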

Scalar Functions

ds_cpc_estimate

Returns the cardinality estimate.

ds_cpc_estimate(sketch) -> double

ds_cpc_estimate_bounds

Returns the cardinality estimate together with lower and upper error bounds.

ds_cpc_estimate_bounds(sketch, kappa) -> array(double)
| Parameter | Type | Description |
|-----------|------|-------------|
| sketch | VARBINARY | CPC sketch |
| kappa | INTEGER | Number of standard deviations (1, 2, or 3) |

Returns [estimate, lower_bound, upper_bound].
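
To illustrate how the returned array might be unpacked, the following sketch reuses the hypothetical `page_views` table and relies on Trino's 1-based array subscripts:

```sql
-- Hypothetical table: page_views(user_id BIGINT, view_date DATE)

WITH s AS (
  SELECT ds_cpc_sketch(user_id) AS sketch
  FROM page_views
),
b AS (
  -- kappa = 2 requests bounds at two standard deviations
  SELECT ds_cpc_estimate_bounds(sketch, 2) AS bounds
  FROM s
)
SELECT
  bounds[1] AS estimate,
  bounds[2] AS lower_bound,
  bounds[3] AS upper_bound
FROM b;
```

Higher kappa values widen the interval between the lower and upper bound.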

ds_cpc_merge

Merges two CPC sketches into a single sketch (a pairwise union).

ds_cpc_merge(sketch1, sketch2) -> varbinary
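
Since ds_cpc_merge is a scalar function, it combines two sketches that sit in the same row, whereas ds_cpc_union aggregates across rows. A minimal example, assuming two hypothetical tables that each store one sketch per day:

```sql
-- Hypothetical tables:
--   web_sketches(view_date DATE, user_sketch VARBINARY)
--   app_sketches(view_date DATE, user_sketch VARBINARY)

SELECT
  w.view_date,
  -- Distinct users seen on either platform that day
  ds_cpc_estimate(ds_cpc_merge(w.user_sketch, a.user_sketch)) AS approx_users_either
FROM web_sketches w
JOIN app_sketches a
  ON a.view_date = w.view_date;
```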

ds_cpc_stringify

Returns a human-readable string representation.

ds_cpc_stringify(sketch) -> varchar
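
This can be handy when inspecting a sketch during debugging. A short example, again assuming the hypothetical `page_views` table:

```sql
-- Hypothetical table: page_views(user_id BIGINT, view_date DATE)

-- Print a readable summary of the sketch instead of its raw bytes
SELECT ds_cpc_stringify(ds_cpc_sketch(user_id)) AS sketch_summary
FROM page_views;
```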

CPC vs HLL

| Property | CPC | HLL |
|----------|-----|-----|
| Accuracy per byte | Better | Good |
| Merge speed | Slower | Faster |
| Set operations | Union only | Union only |
| Maturity | Newer | Well-established |

Use CPC when you need the most compact sketches. Use HLL when merge performance matters or when you need compatibility with existing systems.



Trino DataSketches Plugin — Apache License 2.0