Theta Sketch Functions
Theta sketches provide cardinality estimation with full set operations: union, intersection, and set difference (A-NOT-B). Use Theta when you need to compute the overlap or difference between groups.
Table of contents
Aggregation Functions
ds_theta_sketch
Builds a Theta sketch from input values.
ds_theta_sketch(value) -> varbinary
ds_theta_sketch(value, nominalEntries) -> varbinary
| Parameter | Type | Description |
|---|---|---|
value | VARCHAR, BIGINT, or DOUBLE | Values to count |
nominalEntries | INTEGER | Nominal entries (default 4096). Must be a power of 2 |
ds_theta_union
Computes the union of multiple Theta sketches.
ds_theta_union(sketch) -> varbinary
ds_theta_intersect
Computes the intersection of multiple Theta sketches (aggregation).
ds_theta_intersect(sketch) -> varbinary
Example:
-- Users who visited BOTH page A and page B
SELECT ds_theta_estimate(ds_theta_intersect(sketch)) FROM (
SELECT ds_theta_sketch(user_id) AS sketch FROM visits WHERE page = 'A'
UNION ALL
SELECT ds_theta_sketch(user_id) FROM visits WHERE page = 'B'
);
Scalar Functions
ds_theta_estimate
Returns the cardinality estimate.
ds_theta_estimate(sketch) -> double
ds_theta_estimate_bounds
Returns estimate with error bounds.
ds_theta_estimate_bounds(sketch, numStdDevs) -> array(double)
Returns [estimate, lower_bound, upper_bound].
ds_theta_merge
Pairwise union of two Theta sketches.
ds_theta_merge(sketch1, sketch2) -> varbinary
ds_theta_intersection
Pairwise intersection of two Theta sketches.
ds_theta_intersection(sketch1, sketch2) -> varbinary
ds_theta_exclude
Set difference: returns items in sketchA that are NOT in sketchB.
ds_theta_exclude(sketchA, sketchB) -> varbinary
Example:
-- Users who visited page A but NOT page B
SELECT ds_theta_estimate(ds_theta_exclude(
(SELECT ds_theta_sketch(user_id) FROM visits WHERE page = 'A'),
(SELECT ds_theta_sketch(user_id) FROM visits WHERE page = 'B')
)) AS unique_to_a;
ds_theta_similarity
Computes Jaccard similarity between two sketches.
ds_theta_similarity(sketch1, sketch2) -> array(double)
Returns [lower_bound, estimate, upper_bound] where the estimate is between 0.0 (disjoint) and 1.0 (identical).
Example:
-- How similar are the user sets of two pages?
SELECT ds_theta_similarity(
(SELECT ds_theta_sketch(user_id) FROM visits WHERE page = 'A'),
(SELECT ds_theta_sketch(user_id) FROM visits WHERE page = 'B')
);
-- Returns: [0.28, 0.32, 0.36] (32% overlap)
ds_theta_stringify
Returns a human-readable string representation.
ds_theta_stringify(sketch) -> varchar
Set Algebra
Theta sketches support the full set algebra. For sets A and B:
| Operation | Function | Result |
|---|---|---|
| A ∪ B | ds_theta_merge(A, B) | All items in either set |
| A ∩ B | ds_theta_intersection(A, B) | Items in both sets |
| A − B | ds_theta_exclude(A, B) | Items only in A |
| J(A, B) | ds_theta_similarity(A, B) | Jaccard similarity |
These satisfy the inclusion-exclusion identity: |A ∪ B| = |A| + |B| - |A ∩ B|