Gaussian mechanism

anon.ldp_gaussian(value, epsilon, lo, hi, delta) adds zero-mean Gaussian noise to a numeric value, calibrated to the column's public range and the target $(\varepsilon, \delta)$ guarantee. As with Laplace, lo and hi are the public lower and upper bounds you commit to for the column: values the analyst could already state without looking at the data. They drive sensitivity, so tighter bounds give less noise. Each call gives an $(\varepsilon, \delta)$-DP release. Gaussian noise has lighter tails than Laplace and composes more tightly under repeated releases, at the cost of a small failure probability $\delta$.

The Gaussian mechanism extends the Adding Noise category of the masking-functions catalog.

Use as a masking rule

Attach ldp_gaussian to a numeric column with a security label, as shown in Declare Masking Rules:

SECURITY LABEL FOR anon ON COLUMN responses.rating
  IS 'MASKED WITH FUNCTION anon.ldp_gaussian(rating, 1.0, 1, 5, 1e-5)';

Apply via static masking or an anonymous dump:

SELECT anon.anonymize_table('public.responses');

See Security & limitations before applying this through dynamic masking.

When to use it

Gaussian fits the same numeric columns as Laplace and produces $(\varepsilon, \delta)$-DP releases instead of pure $\varepsilon$-DP. The trade-off: $\delta$ is a small failure probability, but Gaussian noise has lighter tails than Laplace and composes more tightly under repeated releases. Vector-output mechanisms like the one-hot variants calibrate naturally to L2 sensitivity, which Gaussian uses. For pure $\varepsilon$-DP without $\delta$, use Laplace. For categorical values, use GRRM.

Per-row LDP

-- Inspect the function on a few rows:
SELECT anon.ldp_gaussian(wait_seconds, 1.0, 0, 600, 1e-5) AS noisy_wait
FROM   responses
LIMIT  5;

| call | sensitivity | scale $\sigma$ |
| --- | --- | --- |
| anon.ldp_gaussian(value, ε, lo, hi, δ) | $\text{hi} - \text{lo}$ | $(\text{hi} - \text{lo}) \cdot \sqrt{2 \ln(1.25/\delta)} / \varepsilon$ |

ldp_gaussian returns the raw noisy value. Pass clamp => true to round and clip into $[\text{lo}, \text{hi}]$. Clamping is post-processing, so the result is still $(\varepsilon, \delta)$-DP, but it biases values near the boundary.
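The boundary bias from clamping can be seen in a small simulation. The following is a standalone Python sketch, not the extension's code: the clamp helper is hypothetical and assumes "round, then clip" semantics, and the parameters mirror the rating example (bounds 1 to 5, $\varepsilon = 1$, $\delta = 10^{-5}$).

```python
import math
import random

def clamp(noisy: float, lo: int, hi: int) -> int:
    """Hypothetical stand-in for clamp => true: round, then clip into [lo, hi]."""
    return int(min(max(round(noisy), lo), hi))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
lo, hi, epsilon, delta = 1, 5, 1.0, 1e-5
sigma = (hi - lo) * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

true_value = 5  # a value sitting on the upper bound
raw = [true_value + rng.gauss(0.0, sigma) for _ in range(20_000)]
clamped = [clamp(x, lo, hi) for x in raw]

mean_raw = sum(raw) / len(raw)
mean_clamped = sum(clamped) / len(clamped)
print(f"raw mean {mean_raw:.2f}, clamped mean {mean_clamped:.2f}")
```

The raw mean stays close to the true value, while the clamped mean is pulled toward the middle of the range because noise that overshoots the upper bound is clipped back and noise that undershoots is not.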

Parameter helper

anon.ldp_gaussian_sigma(epsilon float8, lo float8, hi float8, delta float8) -> float8

Returns the standard deviation of the noise the mechanism will draw. It depends only on the public parameters, never on the data, so calling it has no privacy cost.
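As a cross-check outside the database, the same quantity can be computed from the textbook formula $\sigma = (\text{hi} - \text{lo}) \cdot \sqrt{2 \ln(1.25/\delta)} / \varepsilon$. This is a minimal Python sketch under that assumption; gaussian_sigma is an illustrative name, not part of the extension.

```python
import math

def gaussian_sigma(epsilon: float, lo: float, hi: float, delta: float) -> float:
    """Classical Gaussian-mechanism scale for a single value bounded in [lo, hi]."""
    if not (epsilon > 0 and 0 < delta < 1 and hi > lo):
        raise ValueError("need epsilon > 0, 0 < delta < 1, and hi > lo")
    sensitivity = hi - lo  # worst-case change in one bounded value
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

# Parameters taken from the wait_seconds and rating examples on this page.
sigma_wait = gaussian_sigma(1.0, 0, 600, 1e-5)
sigma_rating = gaussian_sigma(1.0, 1, 5, 1e-5)
print(f"wait_seconds sigma: {sigma_wait:.1f}")
print(f"rating sigma: {sigma_rating:.2f}")
```

With these parameters $\sigma$ is several times the width of the column range, so individual noisy values carry little information on their own.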

Choosing parameters

  • epsilon. Same range as Laplace: 0.1 to 1.0 typical. Lower means heavier noise.
  • delta. Small, on the order of $1/n^{1+c}$ for some $c > 0$. $\delta = 10^{-5}$ is a common default for $n$ around $10^4$ to $10^6$. $\delta$ is the probability the privacy guarantee fails entirely, so it has to be cryptographically small relative to the dataset size.
  • lo, hi. Public bounds; same constraint as Laplace. Tighter bounds give less noise.
  • clamp. Off by default. Turn it on when downstream expects values inside $[\text{lo}, \text{hi}]$.
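To get a feel for how each parameter moves the noise, the sketch below compares the factor $\sqrt{2 \ln(1.25/\delta)} / \varepsilon$ that multiplies $\text{hi} - \text{lo}$. It is plain Python based on the textbook formula; sigma_multiplier is an illustrative name, not an extension function.

```python
import math

def sigma_multiplier(epsilon: float, delta: float) -> float:
    """Noise scale per unit of (hi - lo) under the classical Gaussian bound."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

base = sigma_multiplier(1.0, 1e-5)
smaller_delta = sigma_multiplier(1.0, 1e-9)    # delta tightened by four orders of magnitude
smaller_epsilon = sigma_multiplier(0.5, 1e-5)  # epsilon halved

delta_ratio = smaller_delta / base
epsilon_ratio = smaller_epsilon / base
print(f"delta 1e-5 -> 1e-9 scales sigma by {delta_ratio:.2f}")
print(f"epsilon 1.0 -> 0.5 scales sigma by {epsilon_ratio:.2f}")
```

Because $\delta$ enters only through a square root of a logarithm, a much smaller $\delta$ costs comparatively little extra noise, whereas $\sigma$ is inversely proportional to $\varepsilon$.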

Security & limitations

  • Averaging attack under dynamic masking. Each ldp_gaussian call draws fresh noise, so reading the same row $k$ times lets an attacker average the results and estimate the true value with standard error $\sigma/\sqrt{k}$. The same caveat applies to anon.noise() and is documented in the Adding Noise section of the masking-functions catalog. Apply ldp_gaussian through static masking or anonymous dumps. Under dynamic masking, $\varepsilon$ has to be budgeted across every query a single role issues against the column.
  • Bounds must be public. MIN(col) and MAX(col) are queries on the data, not bounds. Hardcode lo, hi, or put them in a public reference table.
  • delta is a failure probability, not a knob. $\delta = 0.01$ is not a stronger or "tunable" version of $\delta = 10^{-5}$; it means a 1% chance the privacy guarantee fails entirely. Keep $\delta$ cryptographically small relative to $n$.
  • Memoization. PostgreSQL may memoize calls with identical arguments and return the same "random" output across rows. If you see identical noisy values where they should differ, run SET LOCAL enable_memoize = off; before the SELECT.
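The averaging attack in the first bullet can be illustrated with a short simulation. This is a standalone Python sketch that draws Gaussian noise directly rather than calling the extension; the true value of 300 and the helper names are assumptions for illustration, and $\sigma$ follows the textbook formula for bounds 0 to 600, $\varepsilon = 1$, $\delta = 10^{-5}$.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable
true_value = 300.0
sigma = (600 - 0) * math.sqrt(2.0 * math.log(1.25 / 1e-5)) / 1.0

def averaged_estimate(k: int) -> float:
    """Attacker's estimate: the mean of k independent noisy reads of one row."""
    reads = [true_value + rng.gauss(0.0, sigma) for _ in range(k)]
    return sum(reads) / k

def empirical_std_error(k: int, trials: int = 200) -> float:
    """Spread of the attacker's estimate across repeated attacks."""
    return statistics.pstdev([averaged_estimate(k) for _ in range(trials)])

results = {k: empirical_std_error(k) for k in (1, 100, 2500)}
for k, err in results.items():
    print(f"k={k:>5}: empirical {err:8.1f}  predicted {sigma / math.sqrt(k):8.1f}")
```

The empirical spread tracks $\sigma/\sqrt{k}$, so every hundredfold increase in reads cuts the attacker's error by a factor of ten. Static masking avoids this because each row is perturbed once.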

The math

For a query $f(D)$ with sensitivity $\Delta f$, the Gaussian mechanism returns $f(D) + \mathcal{N}(0, \sigma^2)$ with $\sigma = \Delta f \cdot \sqrt{2 \ln(1.25/\delta)} / \varepsilon$. The result is $(\varepsilon, \delta)$-DP for any $\varepsilon \in (0, 1)$ and $\delta \in (0, 1)$. For $\varepsilon \geq 1$ the classical proof does not apply, and the analytic Gaussian mechanism gives a tighter calibration of $\sigma$ for every $\varepsilon > 0$; the formula above is the standard textbook bound and what ldp_gaussian implements.

Try it live

  • /onehot-histogram: histogram estimation comparing Gaussian one-hot against scalar Gaussian on the same data.