Theta Sketch Functions

Theta sketches provide cardinality estimation with full set operations: union, intersection, and set difference (A-NOT-B). Use Theta when you need to compute the overlap or difference between groups.

Table of contents

  1. Aggregation Functions
    1. ds_theta_sketch
    2. ds_theta_union
    3. ds_theta_intersect
  2. Scalar Functions
    1. ds_theta_estimate
    2. ds_theta_estimate_bounds
    3. ds_theta_merge
    4. ds_theta_intersection
    5. ds_theta_exclude
    6. ds_theta_similarity
    7. ds_theta_stringify
  3. Set Algebra

Aggregation Functions

ds_theta_sketch

Builds a Theta sketch from input values.

ds_theta_sketch(value) -> varbinary
ds_theta_sketch(value, nominalEntries) -> varbinary
Parameter Type Description
value VARCHAR, BIGINT, or DOUBLE Values to count
nominalEntries INTEGER Nominal entries (default 4096). Must be a power of 2

ds_theta_union

Computes the union of multiple Theta sketches.

ds_theta_union(sketch) -> varbinary

ds_theta_intersect

Computes the intersection of multiple Theta sketches (aggregation).

ds_theta_intersect(sketch) -> varbinary

Example:

-- Users who visited BOTH page A and page B
SELECT ds_theta_estimate(ds_theta_intersect(sketch)) FROM (
    SELECT ds_theta_sketch(user_id) AS sketch FROM visits WHERE page = 'A'
    UNION ALL
    SELECT ds_theta_sketch(user_id) FROM visits WHERE page = 'B'
);

Scalar Functions

ds_theta_estimate

Returns the cardinality estimate.

ds_theta_estimate(sketch) -> double

ds_theta_estimate_bounds

Returns estimate with error bounds.

ds_theta_estimate_bounds(sketch, numStdDevs) -> array(double)

Returns [estimate, lower_bound, upper_bound].

ds_theta_merge

Pairwise union of two Theta sketches.

ds_theta_merge(sketch1, sketch2) -> varbinary

ds_theta_intersection

Pairwise intersection of two Theta sketches.

ds_theta_intersection(sketch1, sketch2) -> varbinary

ds_theta_exclude

Set difference: returns items in sketchA that are NOT in sketchB.

ds_theta_exclude(sketchA, sketchB) -> varbinary

Example:

-- Users who visited page A but NOT page B
SELECT ds_theta_estimate(ds_theta_exclude(
    (SELECT ds_theta_sketch(user_id) FROM visits WHERE page = 'A'),
    (SELECT ds_theta_sketch(user_id) FROM visits WHERE page = 'B')
)) AS unique_to_a;

ds_theta_similarity

Computes Jaccard similarity between two sketches.

ds_theta_similarity(sketch1, sketch2) -> array(double)

Returns [lower_bound, estimate, upper_bound] where the estimate is between 0.0 (disjoint) and 1.0 (identical).

Example:

-- How similar are the user sets of two pages?
SELECT ds_theta_similarity(
    (SELECT ds_theta_sketch(user_id) FROM visits WHERE page = 'A'),
    (SELECT ds_theta_sketch(user_id) FROM visits WHERE page = 'B')
);
-- Returns: [0.28, 0.32, 0.36]  (32% overlap)

ds_theta_stringify

Returns a human-readable string representation.

ds_theta_stringify(sketch) -> varchar

Set Algebra

Theta sketches support the full set algebra. For sets A and B:

Operation Function Result
A ∪ B ds_theta_merge(A, B) All items in either set
A ∩ B ds_theta_intersection(A, B) Items in both sets
A − B ds_theta_exclude(A, B) Items only in A
J(A, B) ds_theta_similarity(A, B) Jaccard similarity

These satisfy the inclusion-exclusion identity: |A ∪ B| = |A| + |B| - |A ∩ B|


Back to top

Trino DataSketches Plugin — Apache License 2.0