One-hot variants

anon.ldp_laplace_onehot and anon.ldp_gaussian_onehot release a categorical value as a noisy one-hot vector rather than as a single perturbed category. Each row outputs a float8[d] array: the one-hot encoding of the true value plus independent noise on every bin. Summing those arrays position by position across rows gives an unbiased histogram of the underlying distribution. Unlike GRRM, no debiasing post-processing is needed, because the element-wise sum already estimates the per-category counts.

These variants extend the Adding Noise category of the masking-functions catalog with a vector-valued formulation.
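The per-row step can be sketched as a plain-Python reference simulation. This is not the extension's implementation. The helper names `onehot`, `laplace_onehot` and `gaussian_onehot` are hypothetical, and values are assumed to be integers in 1..d, as in the math section. Laplace noise is drawn as the difference of two exponential variates, because Python's `random` module has no Laplace sampler.

```python
import math
import random

def onehot(value: int, d: int) -> list:
    """True one-hot encoding e_v; value is assumed to be an integer in 1..d."""
    if not 1 <= value <= d:
        raise ValueError("value must be in 1..d")
    return [1.0 if i == value else 0.0 for i in range(1, d + 1)]

def laplace_onehot(value: int, epsilon: float, d: int, rng: random.Random) -> list:
    """e_v plus i.i.d. Lap(0, 2/epsilon) noise on every bin (hypothetical stand-in)."""
    b = 2.0 / epsilon
    # The difference of two Exp(rate 1/b) draws is distributed as Lap(0, b).
    return [x + rng.expovariate(1 / b) - rng.expovariate(1 / b) for x in onehot(value, d)]

def gaussian_onehot(value: int, epsilon: float, d: int, delta: float,
                    rng: random.Random) -> list:
    """e_v plus i.i.d. N(0, sigma^2) noise on every bin (hypothetical stand-in)."""
    sigma = math.sqrt(2) * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return [x + rng.gauss(0.0, sigma) for x in onehot(value, d)]

rng = random.Random(42)
print(laplace_onehot(3, 1.0, 5, rng))        # five noisy floats; only bin 3 held a 1 before noise
print(gaussian_onehot(3, 1.0, 5, 1e-5, rng))
```

Every bin receives noise, not only the true one. This is what hides which category the row held.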

When to use it

The one-hot variants are the right choice when the goal is a histogram or distribution over a small public domain of size $d$. Each row contributes its noisy vector independently and the aggregator simply sums, so the variants suit streaming and federated pipelines.

Per-row scalar GRRM followed by frequency estimation gives the same kind of result with a different variance and storage profile. It is preferable when $d$ is large enough that storing a float8[d] array per row is too heavy.

Calling the function

The output type is float8[d], not the original column type, so one-hot variants do not fit a per-column SECURITY LABEL. Use them in a masking view or as ad-hoc queries.

-- Each row produces a noisy one-hot vector of length d:
SELECT user_id,
       anon.ldp_laplace_onehot(rating, 1.0, 5) AS noisy_vec
FROM   responses
LIMIT  5;

To get the noisy histogram, sum across rows position by position:

SELECT idx, SUM(value) AS noisy_count
FROM   responses,
       unnest(anon.ldp_laplace_onehot(rating, 1.0, 5))
       WITH ORDINALITY AS u(value, idx)
GROUP  BY idx
ORDER  BY idx;

The Gaussian variant takes $\delta$ as a fourth argument, after ε and d:

SELECT idx, SUM(value) AS noisy_count
FROM   responses,
       unnest(anon.ldp_gaussian_onehot(rating, 1.0, 5, 1e-5))
       WITH ORDINALITY AS u(value, idx)
GROUP  BY idx
ORDER  BY idx;
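The position-by-position `SUM(value) ... GROUP BY idx` above can be mirrored in a small Python simulation, which shows that the raw sum lands near the true counts without any debiasing step. This is a sketch under stated assumptions. `laplace_onehot` and `noisy_histogram` are hypothetical stand-ins, not the extension's code. The 10,000 uniformly random ratings in 1..5 are synthetic.

```python
import random
from collections import Counter

def laplace_onehot(value, epsilon, d, rng):
    """Hypothetical stand-in for anon.ldp_laplace_onehot: e_v + Lap(0, 2/epsilon) per bin."""
    b = 2.0 / epsilon
    return [(1.0 if i == value else 0.0)
            + rng.expovariate(1 / b) - rng.expovariate(1 / b)
            for i in range(1, d + 1)]

def noisy_histogram(values, epsilon, d, rng):
    """Position-wise sum of the noisy vectors, like SUM(value) ... GROUP BY idx."""
    totals = [0.0] * d
    for v in values:
        for i, x in enumerate(laplace_onehot(v, epsilon, d, rng)):
            totals[i] += x
    return totals  # totals[i] estimates how many rows hold value i + 1

rng = random.Random(7)
ratings = [rng.randint(1, 5) for _ in range(10_000)]   # synthetic data
true_counts = Counter(ratings)
noisy = noisy_histogram(ratings, 1.0, 5, rng)
for idx, est in enumerate(noisy, start=1):
    print(idx, true_counts[idx], round(est))   # true count vs noisy count, no debiasing
```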

Sensitivity and noise calibration

| variant | sensitivity | per-bin noise |
|---|---|---|
| `ldp_laplace_onehot(value, ε, d)` | $L_1 = 2$ | $\mathrm{Lap}(0, 2/\varepsilon)$ |
| `ldp_gaussian_onehot(value, ε, d, δ)` | $L_2 = \sqrt{2}$ | $\mathcal{N}(0, \sigma^2)$, $\sigma = \sqrt{2}\,\sqrt{2\ln(1.25/\delta)}/\varepsilon$ |

The sensitivities are constants, independent of $d$. The one-hot encodings of any two inputs $v \ne v'$ differ in exactly two coordinates: one goes from 0 to 1 and the other from 1 to 0. This gives $L_1 = 2$ and $L_2 = \sqrt{2}$.
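Both constants can be checked by brute force over all pairs of inputs, and then plugged into the calibrations in the table. The following is a minimal Python sketch. The function names are hypothetical, and the σ formula is the one stated in the table.

```python
import itertools
import math

def onehot(value, d):
    """One-hot encoding e_v for value in 1..d."""
    return [1.0 if i == value else 0.0 for i in range(1, d + 1)]

def sensitivities(d):
    """Max L1 and L2 distance between e_v and e_w over all pairs v != w."""
    l1 = l2 = 0.0
    for v, w in itertools.combinations(range(1, d + 1), 2):
        diff = [a - b for a, b in zip(onehot(v, d), onehot(w, d))]
        l1 = max(l1, sum(abs(x) for x in diff))
        l2 = max(l2, math.sqrt(sum(x * x for x in diff)))
    return l1, l2

def laplace_scale(epsilon, d):
    """Per-bin Laplace scale b = L1 / epsilon."""
    l1, _ = sensitivities(d)
    return l1 / epsilon

def gaussian_sigma(epsilon, delta, d):
    """Per-bin sigma = L2 * sqrt(2 ln(1.25/delta)) / epsilon."""
    _, l2 = sensitivities(d)
    return l2 * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

for d in (2, 5, 50):
    print(d, sensitivities(d))   # (2.0, 1.4142135623730951) for every d
print(laplace_scale(1.0, 5), gaussian_sigma(1.0, 1e-5, 5))
```

With ε = 1 and δ = 10⁻⁵, the Laplace scale is 2 and σ is about 6.85. The Gaussian variant therefore adds noticeably more noise per bin at these settings.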

Choosing parameters

d. Public domain size; same constraint as GRRM.
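epsilon. Typically 0.5 to 2 for histogram release. Smaller ε produces wider per-bin error bars, which matter most when $n$ is small or a bin holds few rows.
delta (Gaussian). Same advice as for the scalar Gaussian mechanism: $10^{-5}$ is a common default; keep it cryptographically small. The σ formula above is the classical Gaussian-mechanism calibration, whose guarantee is proven for ε < 1.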
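The effect of ε on error bars can be tabulated from the per-bin standard errors given in the math section. The following is a minimal Python sketch. The function names are hypothetical, and the "bin holding 5% of rows" is an illustrative assumption used to express the error relative to the bin's true count.

```python
import math

def laplace_bin_stderr(n, epsilon):
    """Per-bin std error of the summed Laplace histogram: sqrt(n) * sqrt(2) * 2/epsilon."""
    return math.sqrt(n) * math.sqrt(2) * 2.0 / epsilon

def gaussian_bin_stderr(n, epsilon, delta):
    """Per-bin std error of the summed Gaussian histogram: sqrt(n) * sigma."""
    sigma = math.sqrt(2) * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return math.sqrt(n) * sigma

n = 10_000
for eps in (0.5, 1.0, 2.0):
    se = laplace_bin_stderr(n, eps)
    # Relative error on a bin that holds 5% of the rows (an illustrative share):
    print(eps, round(se, 1), round(se / (0.05 * n), 2))
```

The standard error grows only as √n while a bin's count grows as n. Relative error therefore shrinks with more rows but stays large for rare bins.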

Security & limitations

Averaging attack under dynamic masking. Each call draws fresh noise, so a role that can re-run the same query can average the releases for one row and recover its true value. The same caveat applies to anon.noise(), documented in Adding Noise. Apply these functions through static masking or anonymous dumps; under dynamic masking, budget ε across every query a role can issue.
d must be public. Inferring the domain from the data leaks information about which categories are present.
No debiasing on the sum. The histogram from summed one-hot output is already unbiased; do not re-apply frequency estimation on top of it.
Negative noisy counts. Per-bin counts can come out negative for rare bins. Clamping to $\geq 0$ is post-processing (still ε-DP / $(\varepsilon, \delta)$-DP), but it introduces a small upward bias on small bins, since it can only raise an estimate.
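The averaging attack from the first item can be simulated by re-running the same release for one row and averaging. The following is a Python sketch. `laplace_onehot` is a hypothetical stand-in that draws fresh noise on every call, as the SQL functions do under dynamic masking. The repeat count `k = 2000` is an arbitrary illustration.

```python
import random

def laplace_onehot(value, epsilon, d, rng):
    """Hypothetical stand-in for anon.ldp_laplace_onehot; fresh noise on each call."""
    b = 2.0 / epsilon
    return [(1.0 if i == value else 0.0)
            + rng.expovariate(1 / b) - rng.expovariate(1 / b)
            for i in range(1, d + 1)]

def average_releases(value, epsilon, d, k, rng):
    """What a role that re-runs the same query k times can compute for one row."""
    sums = [0.0] * d
    for _ in range(k):
        for i, x in enumerate(laplace_onehot(value, epsilon, d, rng)):
            sums[i] += x
    return [s / k for s in sums]

rng = random.Random(0)
avg = average_releases(value=4, epsilon=1.0, d=5, k=2000, rng=rng)
guess = max(range(5), key=lambda i: avg[i]) + 1
print(guess)  # the true value 4, with overwhelming probability at this k
```

The per-bin noise on the average shrinks as $2\sqrt{2}/(\varepsilon\sqrt{k})$. After enough repeats the true bin stands out, and effective privacy degrades as repeated releases compose.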

The math

For a value $v \in \{1, \dots, d\}$, the true one-hot encoding is $e_v \in \{0, 1\}^d$. Each variant releases $\tilde{e}_v = e_v + Z$, where $Z \in \mathbb{R}^d$ has i.i.d. coordinates. For Laplace, $Z_i \sim \mathrm{Lap}(0, 2/\varepsilon)$ gives ε-DP because $\|e_v - e_{v'}\|_1 = 2$ for any $v \ne v'$. For Gaussian, $Z_i \sim \mathcal{N}(0, \sigma^2)$ with the σ above gives $(\varepsilon, \delta)$-DP because $\|e_v - e_{v'}\|_2 = \sqrt{2}$. Since $\mathbb{E}[Z_i] = 0$, summing $\tilde{e}$ over $n$ rows produces an unbiased estimate of the histogram. The independent noise variances add, so the per-bin standard error is $\sqrt{n} \cdot \sqrt{2} \cdot 2/\varepsilon$ for Laplace (a $\mathrm{Lap}(0, b)$ variable has standard deviation $\sqrt{2}\,b$) and $\sqrt{n}\,\sigma$ for Gaussian.
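The unbiasedness and standard-error claims can be written out for a single bin $i$. The notation is introduced here for illustration only: $\tilde{e}_{r,i}$ is coordinate $i$ of row $r$'s release, $Z_{r,i}$ is its noise, and $c_i$ is the true count of bin $i$.

```latex
\mathbb{E}\Big[\sum_{r=1}^{n} \tilde{e}_{r,i}\Big]
  = \sum_{r=1}^{n} e_{r,i} + \sum_{r=1}^{n} \mathbb{E}[Z_{r,i}]
  = c_i,
\qquad
\operatorname{Var}\Big[\sum_{r=1}^{n} \tilde{e}_{r,i}\Big]
  = \sum_{r=1}^{n} \operatorname{Var}[Z_{r,i}]
  = n \cdot 2\left(\frac{2}{\varepsilon}\right)^{2}
  = \frac{8n}{\varepsilon^{2}}
  \;\Rightarrow\;
  \mathrm{SE}_i = \frac{2\sqrt{2}\,\sqrt{n}}{\varepsilon}
```

For the Gaussian variant the same steps give $\operatorname{Var} = n\sigma^2$ and $\mathrm{SE}_i = \sqrt{n}\,\sigma$.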

Try it live

/onehot-histogram: one-hot vs scalar GRRM histogram estimation, head-to-head, for both Laplace and Gaussian.