GRRM (Generalized Randomized Response)
anon.ldp_grrm(value, epsilon, d) releases a categorical value under
ε-local DP. Each row independently flips a biased coin: with probability
$q = e^{\varepsilon} / (e^{\varepsilon} + d - 1)$ it returns the true value,
otherwise a uniformly random other value from the domain. The
perturbation happens per row, at the trust boundary — either on insert
or on release.
GRRM is the categorical companion to the Adding Noise category in the masking-functions catalog, which only covers numeric and range data.
Use as a masking rule
Attach ldp_grrm to a categorical column with a security label, as
shown in Declare Masking Rules:
SECURITY LABEL FOR anon ON COLUMN responses.rating
IS 'MASKED WITH FUNCTION anon.ldp_grrm(rating, 1.0, 5)';
Apply via static masking or an anonymous dump:
SELECT anon.anonymize_table('public.responses');
See Security & limitations before applying this through dynamic masking.
When to use it
Use GRRM for a categorical column with a known, public domain: ratings, choices, bin indices, demographic codes. Categories are 1-indexed integers in $\{1, \dots, d\}$. For numeric columns use Laplace.
GRRM works best for binary or low-cardinality domains. At fixed ε the truth-telling probability $q$ decreases as $d$ grows: for large $d$ it approaches $e^{\varepsilon}/d$, only a constant factor above the $1/d$ of uniform random output. For wide categorical domains the one-hot variants usually recover the distribution more accurately at the same ε.
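To make the effect of domain size concrete, the query below tabulates the truth-telling probability at ε = 1.0 for a few values of $d$. It is a plain-PostgreSQL sketch that assumes the standard GRR form $q = e^{\varepsilon}/(e^{\varepsilon} + d - 1)$; it does not call the extension, and the chosen values of $d$ are only illustrative.

```sql
-- Truth-telling probability q = e^eps / (e^eps + d - 1) at fixed eps = 1.0,
-- next to the 1/d rate of a uniformly random answer.
-- Assumption: standard GRR formula; the d values are illustrative only.
SELECT d,
       round((exp(1.0) / (exp(1.0) + d - 1))::numeric, 3) AS q,
       round((1.0 / d)::numeric, 3)                       AS uniform_rate
FROM unnest(ARRAY[2, 5, 10, 100]) AS t(d)
ORDER BY d;
-- For d = 5 this gives q = 0.405 against a uniform rate of 0.200.
```

The gap between `q` and `uniform_rate` is the signal the frequency estimator has to work with, and it narrows quickly as $d$ grows.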
Call forms
anon.ldp_grrm (value int, epsilon float8, d int) -> int
anon.ldp_grrm_pttt(value int, pttt float8, d int) -> int
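Both return a value in $\{1, \dots, d\}$. ldp_grrm takes ε directly;
ldp_grrm_pttt takes the truth-telling probability $q$
and back-solves ε. The two parameterizations
are interchangeable:
$q = e^{\varepsilon} / (e^{\varepsilon} + d - 1)$, equivalently $\varepsilon = \ln\big(q\,(d - 1) / (1 - q)\big)$.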
-- Inspect the function on a few rows:
SELECT rating,
anon.ldp_grrm (rating, 1.0, 5) AS noisy_at_eps,
anon.ldp_grrm_pttt(rating, 0.6, 5) AS noisy_at_pttt
FROM responses
LIMIT 5;
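The conversion between ε and the truth-telling probability can be checked with plain arithmetic. The following sketch assumes the standard GRR relation $q = e^{\varepsilon}/(e^{\varepsilon} + d - 1)$ and its inverse $\varepsilon = \ln(q(d-1)/(1-q))$; it uses only built-in PostgreSQL functions, not the extension's own helpers.

```sql
-- Round trip eps -> q -> eps for eps = 1.0, d = 5 (assumed standard GRR formulas).
WITH params AS (
    SELECT 1.0::float8 AS eps, 5 AS d
),
fwd AS (
    SELECT eps, d,
           exp(eps) / (exp(eps) + d - 1) AS q   -- truth-telling probability
    FROM params
)
SELECT eps,
       q,
       ln(q * (d - 1) / (1 - q)) AS eps_recovered  -- should match eps up to float error
FROM fwd;

-- The epsilon implied by a truth-telling probability of 0.6 at d = 5 is ln(6), about 1.79.
SELECT ln(0.6 * (5 - 1) / (1 - 0.6)) AS eps_for_pttt;
```

Under these assumptions, the two calls in the inspection query above are not equivalent: a `pttt` of 0.6 at `d = 5` corresponds to a noticeably larger ε than 1.0.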
Parameter helpers
Pure post-processing, no privacy cost:
anon.ldp_truth_probability(epsilon float8, d int) -> float8 -- returns q
anon.ldp_lie_probability (epsilon float8, d int) -> float8 -- returns p
Useful when picking ε for a known $d$, or when displaying the truth-telling rate to users.
Choosing parameters
- d. Must be public and fixed before any release. Computing it as COUNT(DISTINCT col) on the data leaks information about which categories are present.
- epsilon. Typically 0.5 to 3 for GRRM. ε = 1.0 with $d = 5$ gives about 40% truth-telling ($q = e/(e + 4) \approx 0.405$). Below ε ≈ 0.5 the truth rate falls toward $1/d$ and signal recovery requires very large $n$.
- pttt. Strictly between $1/d$ and $1$. Common choices range from 0.6 to 0.9. pttt $= 1/d$ means uniform random output (no signal); pttt $= 1$ means no privacy.
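One way to keep `d` independent of the sensitive data is to read it from a public list of valid codes. The sketch below uses a hypothetical reference table `rating_codes` (the table name, columns, and labels are made up for illustration) and checks that the codes form the contiguous range 1 to `d` that GRRM expects.

```sql
-- Hypothetical public reference table of valid category codes (illustrative only).
CREATE TEMP TABLE rating_codes (
    code  int PRIMARY KEY,
    label text NOT NULL
);

INSERT INTO rating_codes (code, label) VALUES
    (1, 'very poor'), (2, 'poor'), (3, 'fair'), (4, 'good'), (5, 'very good');

-- d comes from the public code list, never from the sensitive column.
-- With a unique key, min = 1 and max = count imply the codes are exactly 1..d.
SELECT count(*)::int                               AS d,
       (min(code) = 1 AND max(code) = count(*))    AS codes_are_1_to_d
FROM rating_codes;
```

The resulting value would then be written as a literal into the masking rule, so that the rule itself never depends on which categories happen to occur in the data.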
Estimating frequencies from GRRM output
A naïve GROUP BY on perturbed values is biased: every category absorbs
some of the noise from every other. The closed-form estimator inverts
the GRRM transition matrix.
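For reference, the standard inversion for generalized randomized response can be written out as follows. This is a sketch of the textbook estimator, assuming $n_k$ true members of category $k$, $c_k$ observed outputs equal to $k$, and $n$ rows in total; the source does not spell out whether the extension's functions use exactly this form or post-process the result (for example by clamping negative estimates).

```latex
\[
\mathbb{E}[c_k] = n_k\,q + (n - n_k)\,p
\quad\Longrightarrow\quad
\hat{n}_k = \frac{c_k - n\,p}{q - p},
\qquad
q = \frac{e^{\varepsilon}}{e^{\varepsilon} + d - 1},
\quad
p = \frac{1}{e^{\varepsilon} + d - 1}.
\]
```

Each true member of category $k$ is reported as $k$ with probability $q$, and each non-member with probability $p$, which is where the bias of the naïve count comes from.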
Per-category, given the observed count $c_k$ in a sample of $n$ rows:
SELECT anon.ldp_frequency_estimate(
observed_count => COUNT(*) FILTER (WHERE noisy_rating = 3),
n => COUNT(*),
epsilon => 1.0,
d => 5
) AS unbiased_count
FROM responses_anonymized;
For the full distribution at once:
SELECT anon.ldp_correct_distribution(
counts => ARRAY[c1, c2, c3, c4, c5],
epsilon => 1.0,
d => 5
) AS unbiased_counts
FROM (
SELECT COUNT(*) FILTER (WHERE noisy_rating = 1) AS c1,
COUNT(*) FILTER (WHERE noisy_rating = 2) AS c2,
COUNT(*) FILTER (WHERE noisy_rating = 3) AS c3,
COUNT(*) FILTER (WHERE noisy_rating = 4) AS c4,
COUNT(*) FILTER (WHERE noisy_rating = 5) AS c5
FROM responses_anonymized
) s;
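To see what the correction does numerically, the same inversion can be carried out with plain arithmetic. The sketch below assumes the textbook estimator $(c_k - n p)/(q - p)$ and made-up observed counts; it is not the extension's implementation and may differ from `ldp_correct_distribution` in details such as rounding or clamping.

```sql
-- Plain-SQL sketch of the GRR inversion for eps = 1.0, d = 5.
-- The observed counts are hypothetical and sum to n = 1000.
WITH params AS (
    SELECT 1.0::float8 AS eps, 5 AS d,
           ARRAY[150, 180, 300, 200, 170] AS counts
),
probs AS (
    SELECT counts,
           (SELECT sum(c) FROM unnest(counts) AS c) AS n,
           exp(eps) / (exp(eps) + d - 1)            AS q,  -- truth probability
           1.0      / (exp(eps) + d - 1)            AS p   -- per-lie probability
    FROM params
)
SELECT k AS category,
       c AS observed,
       round(((c - n * p) / (q - p))::numeric, 1) AS estimated
FROM probs, unnest(counts) WITH ORDINALITY AS u(c, k)
ORDER BY k;
```

With these assumed counts the estimates come out at roughly 4.5, 121.8, 591.0, 200.0 and 82.7. They still sum to 1000, because $q - p = 1 - d\,p$, and a category observed at exactly $n/d$ is left unchanged.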
The estimate is unbiased but its variance grows as ε shrinks. For confidence intervals on a single category:
SELECT anon.ldp_ci_lower(observed_count, n, 1.0, 5, alpha => 0.05) AS lo,
anon.ldp_ci_upper(observed_count, n, 1.0, 5, alpha => 0.05) AS hi
FROM ...;
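The dependence of the variance on ε can be made explicit. The expressions below are the textbook variance of the inverted estimator and a normal-approximation interval, given as a sketch; the construction actually used by `ldp_ci_lower` and `ldp_ci_upper` is not stated in the source and may differ.

```latex
\[
\operatorname{Var}(\hat{n}_k)
  = \frac{n_k\,q(1 - q) + (n - n_k)\,p(1 - p)}{(q - p)^2},
\qquad
q - p = \frac{e^{\varepsilon} - 1}{e^{\varepsilon} + d - 1},
\]
\[
\hat{n}_k \;\pm\; z_{1 - \alpha/2}\,\sqrt{\widehat{\operatorname{Var}}(\hat{n}_k)}.
\]
```

As ε approaches 0 the denominator $(q - p)^2$ approaches 0, so the interval widens without bound unless $n$ grows to compensate.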
Security & limitations
- Averaging attack under dynamic masking. Each ldp_grrm call draws fresh noise, so a masked role that reads the same row $k$ times can tally the most common output to recover the truth once $k$ is moderate. The same caveat applies to anon.noise() and is documented in the Adding Noise section of the masking-functions catalog. Apply GRRM through static masking or anonymous dumps. Under dynamic masking, ε must be budgeted across all queries a role can issue against the column.
- d must be public. Inferring $d$ from the data leaks information about which categories are present. Hardcode $d$, or derive it from a public reference table of valid category codes.
- Naïve aggregation is biased. GROUP BY over GRRM output gives a count of perturbed labels, not true labels. Correct via ldp_frequency_estimate or ldp_correct_distribution.
- Low-cardinality joins can leak. Joining a GRRM-masked column to a table where each row has a unique key lets an attacker combine many noisy reads of the same underlying record. Mask both sides if the join key carries information.
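The averaging caveat can be reproduced without the extension. The sketch below simulates repeated fresh-noise reads of a single row in plain PostgreSQL, assuming the standard GRR sampling rule; the true value 3, the 200 reads, and ε = 1.0 with d = 5 are illustrative choices, not values from the source.

```sql
-- Simulate k independent GRR reads of one row (true value 3, eps = 1.0, d = 5)
-- and take the most frequent output, as a masked role could under dynamic masking.
WITH params AS (
    SELECT 3 AS truth, 5 AS d, 200 AS k,
           exp(1.0::float8) / (exp(1.0::float8) + 5 - 1) AS q
),
reads AS (
    SELECT CASE
             WHEN random() < q THEN truth
             -- otherwise shift by a uniform offset in 1..d-1, wrapping within 1..d,
             -- which yields a uniformly random *other* category
             ELSE 1 + (truth + floor(random() * (d - 1))::int) % d
           END AS noisy
    FROM params, generate_series(1, k)
)
SELECT mode() WITHIN GROUP (ORDER BY noisy) AS most_common_output,
       count(*)                             AS reads
FROM reads;
```

At these settings the true value is expected in about 40% of reads and each other category in about 15%, so the most common output almost always equals the truth. This is why the ε budget has to cover every read a role can make.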
The math
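For a public domain of $d$ categories the truth probability is $q = e^{\varepsilon} / (e^{\varepsilon} + d - 1)$ and the per-lie probability is $p = 1 / (e^{\varepsilon} + d - 1)$, so that $q + (d - 1)\,p = 1$. The mechanism samples $u \sim \mathrm{Uniform}(0, 1)$ and returns the truth if $u < q$, otherwise a uniform random other category. The privacy proof: for any inputs $x, x'$ and output $y$, the ratio $\Pr[M(x) = y] / \Pr[M(x') = y]$ is bounded by $q / p = e^{\varepsilon}$.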
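The bound follows from a short case analysis, sketched here under the standard GRR definition with $M$ denoting the mechanism and $x \neq x'$ two distinct inputs:

```latex
\[
\Pr[M(x) = y] =
\begin{cases}
q & \text{if } y = x,\\
p & \text{if } y \neq x,
\end{cases}
\qquad
\frac{\Pr[M(x) = y]}{\Pr[M(x') = y]} =
\begin{cases}
q / p = e^{\varepsilon}  & \text{if } y = x,\\
p / q = e^{-\varepsilon} & \text{if } y = x',\\
1                        & \text{otherwise.}
\end{cases}
\]
```

Since $q \ge p$ for every $\varepsilon \ge 0$, the largest of the three ratios is $q/p = e^{\varepsilon}$, which is exactly the ε-local DP condition.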
Try it live
- /scenario/healthcare, /scenario/financial, /scenario/telemetry, /scenario/survey: Pre-Anonymized / On-the-Fly / Insert-and-Query workflows over GRRM.
- /correction: single-value estimate, full-distribution recovery, and mean-from-frequencies on real GRRM output.