GRRM (Generalized Randomized Response)

anon.ldp_grrm(value, epsilon, d) releases a categorical value under ε-local DP. Each row independently flips a biased coin: with probability $q = e^{\varepsilon}/(e^{\varepsilon}+d-1)$ it returns the true value, otherwise a uniformly random other value from the domain. The perturbation happens per row, at the trust boundary: either on insert or on release.

GRRM is the categorical companion to the Adding Noise category in the masking-functions catalog, which only covers numeric and range data.

Use as a masking rule

Attach ldp_grrm to a categorical column with a security label, as shown in Declare Masking Rules:

SECURITY LABEL FOR anon ON COLUMN responses.rating
  IS 'MASKED WITH FUNCTION anon.ldp_grrm(rating, 1.0, 5)';

Apply via static masking or an anonymous dump:

SELECT anon.anonymize_table('public.responses');

See Security & limitations before applying this through dynamic masking.

When to use it

Use GRRM for a categorical column with a known, public domain: ratings, choices, bin indices, demographic codes. Categories are 1-indexed integers in $[1, d]$. For numeric columns use Laplace.

GRRM works best for binary or low-cardinality domains. At fixed ε the truth-telling probability $q = e^{\varepsilon}/(e^{\varepsilon}+d-1)$ decreases as $d$ grows, and its margin over the uniform baseline, $q - 1/d$, shrinks toward 0 as $d \to \infty$ (nearly uniform random output). For wide categorical domains the one-hot variants usually recover the distribution more accurately at the same ε.
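To make the effect of domain size concrete, here is a minimal Python sketch that evaluates the truth-telling probability formula above for a few values of d. It is standalone arithmetic, not a call into the extension; the function name truth_probability is a placeholder chosen for this example.

```python
import math

def truth_probability(epsilon: float, d: int) -> float:
    """q = e^eps / (e^eps + d - 1): chance that the true category is reported."""
    return math.exp(epsilon) / (math.exp(epsilon) + d - 1)

# At fixed epsilon = 1.0, q shrinks as the public domain widens,
# and so does its margin over the uniform baseline 1/d.
for d in (2, 5, 20, 100):
    q = truth_probability(1.0, d)
    print(f"d={d:>3}  q={q:.3f}  1/d={1 / d:.3f}  margin={q - 1 / d:.3f}")
```

The margin column is what an analyst has to work with: the smaller it gets, the more rows are needed to recover the same signal.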

Call forms

anon.ldp_grrm     (value int, epsilon float8, d int) -> int
anon.ldp_grrm_pttt(value int, pttt    float8, d int) -> int

Both return a value in $[1, d]$. ldp_grrm takes ε directly; ldp_grrm_pttt takes the truth-telling probability $\text{pttt} \in (1/d, 1)$ and back-solves ε. The two parameterizations are interchangeable: $\varepsilon = \ln((d-1)\,\text{pttt} / (1 - \text{pttt}))$.

-- Inspect the function on a few rows:
SELECT rating,
       anon.ldp_grrm     (rating, 1.0, 5) AS noisy_at_eps,
       anon.ldp_grrm_pttt(rating, 0.6, 5) AS noisy_at_pttt
FROM   responses
LIMIT  5;
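The conversion between the two parameterizations can be checked by hand. The following Python sketch applies the formula above and its inverse; the function names are hypothetical and the code only illustrates the arithmetic, not how the extension performs the back-solve internally.

```python
import math

def epsilon_from_pttt(pttt: float, d: int) -> float:
    """Back-solve epsilon from a truth-telling probability in (1/d, 1)."""
    if not (1.0 / d < pttt < 1.0):
        raise ValueError("pttt must lie strictly between 1/d and 1")
    return math.log((d - 1) * pttt / (1.0 - pttt))

def pttt_from_epsilon(epsilon: float, d: int) -> float:
    """Forward direction: q = e^eps / (e^eps + d - 1)."""
    return math.exp(epsilon) / (math.exp(epsilon) + d - 1)

# pttt = 0.6 with d = 5 corresponds to epsilon = ln(4 * 0.6 / 0.4) = ln(6).
eps = epsilon_from_pttt(0.6, 5)
print(f"epsilon={eps:.4f}  round trip pttt={pttt_from_epsilon(eps, 5):.4f}")
```

Under this formula, the pttt value of 0.6 used in the query above corresponds to a larger ε than the 1.0 passed to ldp_grrm, so the two output columns are not perturbed at the same privacy level.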

Parameter helpers

Pure post-processing, no privacy cost:

anon.ldp_truth_probability(epsilon float8, d int) -> float8   -- returns q, the truth-telling probability
anon.ldp_lie_probability  (epsilon float8, d int) -> float8   -- returns p, the probability of each specific wrong value

Useful when picking ε for a known $d$, or when displaying the truth-telling rate to users.

Choosing parameters

  • d. Must be public and fixed before any release. Computing it as COUNT(DISTINCT col) on the data leaks information about which categories are present.
  • epsilon. Typical values range from 0.5 to 3 for GRRM. ε=1.0 with $d=5$ gives ~40% truth-telling ($e/(e+4) \approx 0.405$). Below ε≈0.5 the truth rate falls toward $1/d$ and signal recovery requires very large $n$.
  • pttt. Strictly between $1/d$ and $1$. Common choices range from 0.6 to 0.9. $\text{pttt} = 1/d$ means uniform random output (no signal); $\text{pttt} = 1$ means no privacy.

Estimating frequencies from GRRM output

A naïve GROUP BY on perturbed values is biased: every category absorbs some of the noise from every other. The closed-form estimator inverts the GRRM transition matrix.

Per-category, given the observed count in a sample of $n$ rows:

SELECT anon.ldp_frequency_estimate(
         observed_count => COUNT(*) FILTER (WHERE noisy_rating = 3),
         n              => COUNT(*),
         epsilon        => 1.0,
         d              => 5
       ) AS unbiased_count
FROM   responses_anonymized;

For the full distribution at once:

SELECT anon.ldp_correct_distribution(
         counts  => ARRAY[c1, c2, c3, c4, c5],
         epsilon => 1.0,
         d       => 5
       ) AS unbiased_counts
FROM (
  SELECT COUNT(*) FILTER (WHERE noisy_rating = 1) AS c1,
         COUNT(*) FILTER (WHERE noisy_rating = 2) AS c2,
         COUNT(*) FILTER (WHERE noisy_rating = 3) AS c3,
         COUNT(*) FILTER (WHERE noisy_rating = 4) AS c4,
         COUNT(*) FILTER (WHERE noisy_rating = 5) AS c5
  FROM   responses_anonymized
) s;
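For readers who want to see what inverting the transition matrix amounts to, here is a Python sketch of the standard closed-form GRR estimator. A row whose true category is j reports j with probability q, and any other row reports j with probability p, so the expected observed count is n_j q + (n - n_j) p; solving for n_j gives (observed - n p)/(q - p). Whether the extension's functions use exactly this form is an assumption, and all names below are hypothetical.

```python
import math

def grrm_probabilities(epsilon: float, d: int) -> tuple[float, float]:
    """Return (q, p): truth probability and per-lie probability."""
    q = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    p = (1.0 - q) / (d - 1)
    return q, p

def estimate_count(observed_count: float, n: float, epsilon: float, d: int) -> float:
    """Unbiased estimate of one category's true count from its noisy count."""
    q, p = grrm_probabilities(epsilon, d)
    return (observed_count - n * p) / (q - p)

def correct_distribution(counts: list[float], epsilon: float) -> list[float]:
    """Apply the per-category estimator to every category; d = len(counts)."""
    d, n = len(counts), sum(counts)
    return [estimate_count(c, n, epsilon, d) for c in counts]

# Made-up true counts, pushed through the mechanism in expectation,
# then recovered by the estimator.
true_counts = [100, 200, 300, 250, 150]
n = sum(true_counts)
q, p = grrm_probabilities(1.0, 5)
expected_noisy = [t * q + (n - t) * p for t in true_counts]
print([round(c, 1) for c in expected_noisy])
print([round(c, 1) for c in correct_distribution(expected_noisy, 1.0)])
```

On real data the noisy counts fluctuate around their expectation, so individual estimates can come out negative or exceed n; the estimates still sum to n.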

The estimate is unbiased but its variance grows as ε shrinks. For confidence intervals on a single category:

SELECT anon.ldp_ci_lower(observed_count, n, 1.0, 5, alpha => 0.05) AS lo,
       anon.ldp_ci_upper(observed_count, n, 1.0, 5, alpha => 0.05) AS hi
FROM   ...;
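How quickly the variance grows can be sketched from first principles. Each of the n rows independently reports category j with probability q (if j is its true value) or p (otherwise), so the observed count has variance n_j q(1-q) + (n - n_j) p(1-p), and dividing by (q - p)^2 gives the variance of the corrected count. The Python below evaluates this for a made-up column; it illustrates the trend only and makes no claim about how ldp_ci_lower and ldp_ci_upper are computed.

```python
import math

def estimator_std(true_count: int, n: int, epsilon: float, d: int) -> float:
    """Standard deviation of the corrected count for one category."""
    q = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    p = (1.0 - q) / (d - 1)
    # Variance of the noisy count: a sum of independent Bernoulli reports.
    var_observed = true_count * q * (1 - q) + (n - true_count) * p * (1 - p)
    return math.sqrt(var_observed) / (q - p)

# Hypothetical column: n = 10,000 rows, 2,000 of them truly in the category, d = 5.
for eps in (0.5, 1.0, 2.0, 3.0):
    print(f"epsilon={eps:<4} std of estimate = {estimator_std(2000, 10_000, eps, 5):.0f} rows")
```

Because the standard deviation scales with the square root of n while the true count scales with n, the relative error falls as the table grows, which is why small ε demands large n.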

Security & limitations

  • Averaging attack under dynamic masking. Each ldp_grrm call draws fresh noise, so a masked role that reads the same row $k$ times can tally the most common output to recover the truth once $k$ is moderate. The same caveat applies to anon.noise() and is documented in the Adding Noise section of the masking-functions catalog. Apply GRRM through static masking or anonymous dumps. Under dynamic masking, ε must be budgeted across all queries a role can issue against the column.
  • d must be public. Inferring $d$ from the data leaks information about which categories are present. Hardcode $d$, or derive it from a public reference table of valid category codes.
  • Naïve aggregation is biased. GROUP BY over GRRM output gives a count of perturbed labels, not true labels. Correct via ldp_frequency_estimate or ldp_correct_distribution.
  • Low-cardinality joins can leak. Joining a GRRM-masked column to a table where each row has a unique key lets an attacker combine many noisy reads of the same underlying record. Mask both sides if the join key carries information.
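The averaging attack from the first bullet is easy to reproduce in simulation. The Python sketch below reimplements the mechanism as described on this page (it is not the extension's code, and the names are hypothetical), reads the same true value k times with fresh noise, and takes the most common output as the attacker's guess.

```python
import math
import random
from collections import Counter

def grrm_sample(value: int, epsilon: float, d: int, rng: random.Random) -> int:
    """One fresh GRRM draw for a category in 1..d (requires d >= 2)."""
    q = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < q:
        return value
    other = rng.randint(1, d - 1)          # uniform over the d-1 other slots
    return other if other < value else other + 1

def majority_guess(true_value: int, epsilon: float, d: int, k: int,
                   rng: random.Random) -> int:
    """Attacker's guess: the most frequent output over k repeated reads."""
    reads = [grrm_sample(true_value, epsilon, d, rng) for _ in range(k)]
    return Counter(reads).most_common(1)[0][0]

rng = random.Random(0)
trials = 200
for k in (1, 10, 50, 200):
    hits = sum(majority_guess(3, 1.0, 5, k, rng) == 3 for _ in range(trials))
    print(f"k={k:>3}  attacker recovers the true value in {hits}/{trials} trials")
```

A single read succeeds only about as often as q, while repeated reads push the success rate toward 1, which is why the per-call ε does not describe the privacy of a column that can be queried repeatedly.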

The math

For a public domain of $d$ categories the truth probability is $q = e^{\varepsilon}/(e^{\varepsilon} + d - 1)$ and the per-lie probability is $p = (1-q)/(d-1)$. The mechanism samples $u \sim \text{Uniform}[0, 1)$ and returns the truth if $u < q$, otherwise a uniform random other category. The privacy proof: for any inputs $v, v'$ and output $y$, the ratio $\Pr[M(v)=y]/\Pr[M(v')=y]$ is bounded by $q/p = e^{\varepsilon}$.
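The description above translates almost line for line into code. The following Python sketch samples the mechanism as stated and checks the privacy bound by enumerating every pair of inputs and every output. It illustrates the math under the definitions on this page and is not the extension's implementation; all function names are hypothetical.

```python
import math
import random

def probabilities(epsilon: float, d: int) -> tuple[float, float]:
    """Return (q, p) as defined in the text."""
    q = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    return q, (1.0 - q) / (d - 1)

def grrm_sample(value: int, epsilon: float, d: int, rng: random.Random) -> int:
    """Draw u in [0, 1); keep the truth if u < q, else pick another category."""
    q, _ = probabilities(epsilon, d)
    if rng.random() < q:
        return value
    other = rng.randint(1, d - 1)          # uniform over the d-1 other slots
    return other if other < value else other + 1

def transition_probability(v: int, y: int, epsilon: float, d: int) -> float:
    """Pr[M(v) = y]: q on the diagonal, p everywhere else."""
    q, p = probabilities(epsilon, d)
    return q if y == v else p

def worst_case_ratio(epsilon: float, d: int) -> float:
    """Largest Pr[M(v)=y] / Pr[M(v')=y] over all inputs v, v' and outputs y."""
    cats = range(1, d + 1)
    return max(transition_probability(v, y, epsilon, d)
               / transition_probability(w, y, epsilon, d)
               for v in cats for w in cats for y in cats)

print(worst_case_ratio(1.0, 5), math.exp(1.0))   # the two values should agree
```

The worst case is reached when the output equals one input and differs from the other, which is exactly the q/p ratio in the proof.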
