One-hot variants

anon.ldp_laplace_onehot and anon.ldp_gaussian_onehot release a categorical value as a noisy histogram bin vector rather than a single perturbed category. Each row outputs a float8[d] array — a noisy one-hot encoding of the true value — and summing those arrays across rows gives an unbiased histogram of the underlying distribution. Unlike GRRM, no debiasing post-processing is needed: the column sum is already the right quantity.

These variants extend the Adding Noise category of the masking-functions catalog with a vector-valued formulation.

When to use it

The one-hot variants are the right choice when the goal is a histogram or distribution over a small public domain d. Each row contributes its noisy vector independently and the aggregator simply sums, so they suit streaming and federated pipelines.

Per-row scalar GRRM followed by frequency estimation gives the same kind of result with a different bias-variance trade-off, and is preferable when d is large or storing float8[d] per row is too heavy.

Calling the function

The output type is float8[d], not the original column type, so one-hot variants do not fit a per-column SECURITY LABEL. Use them in a masking view or as ad-hoc queries.

-- Each row produces a noisy one-hot vector of length d:
SELECT user_id,
       anon.ldp_laplace_onehot(rating, 1.0, 5) AS noisy_vec
FROM   responses
LIMIT  5;

To get the noisy histogram, sum across rows position by position:

SELECT idx, SUM(value) AS noisy_count
FROM   responses,
       unnest(anon.ldp_laplace_onehot(rating, 1.0, 5))
       WITH ORDINALITY AS u(value, idx)
GROUP  BY idx
ORDER  BY idx;
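
Because every input row contributes exactly one element to each position, the number of rows per position equals n, so the same query can also report estimated shares. This is a sketch under two assumptions that the text above does not state: `rating` is never NULL, and every call returns an array of length 5. The `noisy_share` alias is made up for illustration.

```sql
-- Assumes rating is never NULL and each call returns 5 elements,
-- so COUNT(*) within each idx group equals the number of rows n.
SELECT idx,
       SUM(value)            AS noisy_count,
       SUM(value) / COUNT(*) AS noisy_share
FROM   responses,
       unnest(anon.ldp_laplace_onehot(rating, 1.0, 5))
       WITH ORDINALITY AS u(value, idx)
GROUP  BY idx
ORDER  BY idx;
```

The shares are noisy too: they need not sum exactly to 1, and rare bins can come out slightly negative.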

The Gaussian variant takes a δ in addition to ε:

SELECT idx, SUM(value) AS noisy_count
FROM   responses,
       unnest(anon.ldp_gaussian_onehot(rating, 1.0, 5, 1e-5))
       WITH ORDINALITY AS u(value, idx)
GROUP  BY idx
ORDER  BY idx;
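
Since each call draws fresh noise, one way to follow the masking-view route is to freeze the noisy vectors once in a materialized view and expose only that to readers. A minimal sketch, assuming the `responses` table from the examples above; the view name `responses_onehot` and the role `analyst` are hypothetical.

```sql
-- Hypothetical names: responses_onehot (view), analyst (role).
-- The noise is drawn once, when the materialized view is built,
-- and again only on REFRESH MATERIALIZED VIEW.
CREATE MATERIALIZED VIEW responses_onehot AS
SELECT user_id,
       anon.ldp_laplace_onehot(rating, 1.0, 5) AS noisy_vec
FROM   responses;

-- Readers get the frozen vectors, not the raw ratings.
REVOKE ALL ON responses FROM analyst;
GRANT SELECT ON responses_onehot TO analyst;
```

A plain (non-materialized) view would re-evaluate the function on every read, so it behaves like dynamic masking. Each refresh of the materialized view likewise releases a new noisy copy and should be counted against the privacy budget.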

Sensitivity and noise calibration

| variant | sensitivity | per-bin noise |
|---|---|---|
| ldp_laplace_onehot(value, ε, d) | $L_1 = 2$ | $\mathrm{Lap}(0, 2/\varepsilon)$ |
| ldp_gaussian_onehot(value, ε, d, δ) | $L_2 = \sqrt{2}$ | $\mathcal{N}(0, \sigma^2)$, $\sigma = \sqrt{2}\,\sqrt{2\ln(1.25/\delta)}/\varepsilon$ |

The sensitivities are constants, independent of d. The one-hot encodings of any two distinct values differ in exactly two coordinates (one bit goes from 0 to 1, another from 1 to 0), giving $L_1 = 2$ and $L_2 = \sqrt{2}$.
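
To make the calibration concrete, both noise parameters can be evaluated directly in PostgreSQL from the formulas in the table. A minimal sketch with illustrative values ε = 1 and δ = 1e-5; the CTE name `params` and the column aliases are made up here and are not part of the extension.

```sql
-- Illustrative inputs only: eps = 1.0, delta = 1e-5.
WITH params AS (
    SELECT 1.0::float8 AS eps, 1e-5::float8 AS delta
)
SELECT 2.0 / eps                                      AS laplace_scale,   -- L1 sensitivity / eps
       sqrt(2.0) * sqrt(2.0 * ln(1.25 / delta)) / eps AS gaussian_sigma   -- L2 sensitivity * sqrt(2 ln(1.25/delta)) / eps
FROM   params;
```

With these inputs the Laplace scale is 2 and σ comes out at about 6.85. Neither number depends on d.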

Choosing parameters

  • d. Public domain size; same constraint as GRRM.
  • epsilon. Typical 0.5 to 2 for histogram release. Smaller ε produces wider per-bin error bars, which matter most for small n and small bins.
  • delta (Gaussian). Same advice as Gaussian: $10^{-5}$ is a common default; keep it cryptographically small.
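
The effect of ε on the error bars can be tabulated before any data is released, using the per-bin standard errors given in "The math". A rough sketch in plain PostgreSQL; n = 10000 and δ = 1e-5 are illustrative values, not recommendations, and all names in the query are hypothetical.

```sql
-- n and delta are illustrative values, not recommendations.
WITH params AS (
    SELECT 10000::float8 AS n, 1e-5::float8 AS delta
)
SELECT e.eps,
       -- sqrt(n) * (2/eps) * sqrt(2): summed Laplace noise
       sqrt(p.n) * (2.0 / e.eps) * sqrt(2.0) AS laplace_std_error,
       -- sqrt(n) * sigma: summed Gaussian noise
       sqrt(p.n) * sqrt(2.0) * sqrt(2.0 * ln(1.25 / p.delta)) / e.eps
                                             AS gaussian_std_error
FROM   params AS p,
       (VALUES (0.5::float8), (1.0), (2.0)) AS e(eps)
ORDER  BY e.eps;
```

For these inputs the Laplace column is roughly 566, 283, and 141, and the Gaussian column roughly 1370, 685, and 343. Comparing those figures with the smallest bin count you care about shows whether a given ε is usable.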

Security & limitations

  • Averaging attack under dynamic masking. Each call produces fresh noise, so a role that can query the same row repeatedly can average the noise away and approach the true value. The same caveat applies to anon.noise(), documented in Adding Noise. Apply the one-hot variants through static masking or anonymous dumps. Under dynamic masking, budget ε across every query a role can issue.
  • d must be public. Inferring the domain from the data leaks information about which categories are present.
  • No debiasing on the sum. The histogram from summed one-hot output is already unbiased; do not re-apply frequency estimation on top of it.
  • Negative noisy counts. Per-bin counts can come out negative for rare bins. Clamping to $\geq 0$ is post-processing (still ε-DP / $(\varepsilon, \delta)$-DP) but introduces a small upward bias on small bins.
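
The clamping mentioned in the last item can be done in the aggregation query itself. A minimal sketch reusing the histogram query from "Calling the function"; the `clamped_count` alias is made up for illustration.

```sql
-- Clamp after summing, not per row: clamping each row's vector
-- before the SUM would bias every bin upward.
SELECT idx,
       SUM(value)              AS noisy_count,
       GREATEST(SUM(value), 0) AS clamped_count
FROM   responses,
       unnest(anon.ldp_laplace_onehot(rating, 1.0, 5))
       WITH ORDINALITY AS u(value, idx)
GROUP  BY idx
ORDER  BY idx;
```

Keeping both columns lets downstream consumers choose between the unbiased estimate and the non-negative one.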

The math

For a value $v \in [1, d]$, the true one-hot encoding is $e_v \in \{0, 1\}^d$. Each variant releases $\tilde{e}_v = e_v + Z$, where the coordinates of $Z$ are i.i.d. noise. For Laplace, $Z_i \sim \mathrm{Lap}(0, 2/\varepsilon)$ gives ε-DP because $\|e_v - e_{v'}\|_1 = 2$ for any $v \ne v'$. For Gaussian, $Z_i \sim \mathcal{N}(0, \sigma^2)$ with the σ above gives $(\varepsilon, \delta)$-DP because $\|e_v - e_{v'}\|_2 = \sqrt{2}$. Summing $\tilde{e}$ over $n$ rows produces an unbiased estimate of the histogram with per-bin standard error $\sqrt{n} \cdot 2/\varepsilon \cdot \sqrt{2}$ for Laplace or $\sqrt{n}\,\sigma$ for Gaussian.
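
The standard errors above follow from the variance of the noise alone, since the true one-hot entries are fixed given the data. A short derivation, where $b = 2/\varepsilon$ is the Laplace scale and the row index $r$ is notation introduced here for illustration:

```latex
\begin{align*}
\mathbb{E}[Z_i] &= 0
  && \text{so the sum is unbiased} \\
\operatorname{Var}(Z_i) &= 2b^2 = \frac{8}{\varepsilon^2}
  && \text{Laplace with scale } b = 2/\varepsilon \\
\operatorname{Var}\!\Big(\sum_{r=1}^{n} Z_i^{(r)}\Big) &= \frac{8n}{\varepsilon^2}
  && \text{independent rows} \\
\text{std error} &= \sqrt{\frac{8n}{\varepsilon^2}}
  = \sqrt{n} \cdot \frac{2}{\varepsilon} \cdot \sqrt{2}
\end{align*}
```

The Gaussian case is the same argument with $\operatorname{Var}(Z_i) = \sigma^2$, which gives $\sqrt{n}\,\sigma$.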

Try it live

  • /onehot-histogram: one-hot vs scalar GRRM histogram estimation, head-to-head, for both Laplace and Gaussian.