Laplace mechanism

anon.ldp_laplace adds zero-mean Laplace noise to a numeric value, calibrated to the column's public range. The arguments lo and hi are the public lower and upper bounds you commit to for the column: anything the analyst could already say about the column without looking at the data (e.g., a rating column is in $[1, 5]$, a wait-time column is in $[0, 600]$ seconds). They determine the sensitivity, so tighter bounds give less noise. Each call gives an ε-DP release with no $\delta$ failure parameter. anon.dp_laplace_avg is the central-DP counterpart, applied to a mean rather than per row.

Both extend the Adding Noise category in the masking-functions catalog.

Use as a masking rule

Attach ldp_laplace to a numeric column with a security label, as shown in Declare Masking Rules:

SECURITY LABEL FOR anon ON COLUMN responses.rating
  IS 'MASKED WITH FUNCTION anon.ldp_laplace(rating, 1.0, 1, 5)';

SECURITY LABEL FOR anon ON COLUMN responses.wait_seconds
  IS 'MASKED WITH FUNCTION anon.ldp_laplace(wait_seconds, 1.0, 0, 600)';

Apply via static masking or an anonymous dump:

SELECT anon.anonymize_table('public.responses');

The original values are overwritten in place with fresh Laplace draws. See Security & limitations before applying this through dynamic masking.

When to use it

Laplace fits numeric columns: ratings, counts, prices, durations, any value bounded by a public range. Use it when you need pure ε-DP without a $\delta$ failure parameter. For categorical columns use GRRM.

The same noise distribution covers two very different setups. Choose the per-row form when raw values can't be entrusted to a central party: rows are noised independently before any aggregation. Choose dp_laplace_avg when a curator can see raw values and only the released aggregate is public; the noise is added once, to the aggregate, and the released mean is roughly $\sqrt{n}$ times tighter at the same ε. See Central DP for means below.

Per-row LDP

Per-row LDP applies the noise to each row at the trust boundary — either on insert (the database never stores raw values) or on release (raw values are stored but masked on each read). Sensitivity is the full public range, so the noise per row is large.

-- Inspect the function on a few rows:
SELECT anon.ldp_laplace(rating, 1.0, 1, 5) AS noisy_rating
FROM   responses
LIMIT  5;
| call | sensitivity | scale $b$ |
| --- | --- | --- |
| anon.ldp_laplace(value, ε, lo, hi) | $\text{hi} - \text{lo}$ | $(\text{hi} - \text{lo})/\varepsilon$ |

The default behavior returns the raw noisy value. Pass clamp => true to round and clip into $[\text{lo}, \text{hi}]$. Clamping helps when the output feeds a typed column with a check constraint, but it biases values near the boundary.

SELECT anon.ldp_laplace(rating, 1.0, 1, 5, clamp => true)
FROM   responses;
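The boundary bias from clamping can be seen in a small simulation. The following is a minimal Python sketch, not the extension's implementation: ldp_laplace_sketch is a hypothetical stand-in, and the clamp semantics (round to the nearest integer, then clip) are an assumption based on the description above.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def ldp_laplace_sketch(value, epsilon, lo, hi, clamp=False, rng=random):
    """Hypothetical stand-in for anon.ldp_laplace (per-row sensitivity hi - lo)."""
    scale = (hi - lo) / epsilon
    noisy = value + laplace_noise(scale, rng)
    if clamp:
        # Assumed semantics: round to the nearest integer, then clip into [lo, hi].
        noisy = min(max(round(noisy), lo), hi)
    return noisy

rng = random.Random(0)
# Every respondent gives the top rating 5 on a [1, 5] scale, epsilon = 1.0.
raw = [ldp_laplace_sketch(5, 1.0, 1, 5, rng=rng) for _ in range(20000)]
clamped = [ldp_laplace_sketch(5, 1.0, 1, 5, clamp=True, rng=rng) for _ in range(20000)]

mean_raw = sum(raw) / len(raw)              # stays close to the true value 5
mean_clamped = sum(clamped) / len(clamped)  # pulled down, away from the boundary
print(f"mean without clamp: {mean_raw:.2f}")
print(f"mean with clamp:    {mean_clamped:.2f}")
```

Without clamping the average of the noisy values stays near the true rating; with clamping, every draw above 5 is cut back to 5 while draws below it are kept, so the average drifts toward the middle of the range.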

Central DP for means

dp_laplace_avg is a first prototype of the central-DP Laplace mechanism specialized to one query: the arithmetic mean of a bounded numeric column. Laplace generalizes to any query for which you can bound a global sensitivity $\Delta$ (counts, sums, quantiles, regression coefficients). The mechanics are always the same (add $\mathrm{Lap}(0, \Delta/\varepsilon)$ noise); the hard part is deriving $\Delta$ for the specific query. dp_laplace_avg does that derivation for the mean and hands you the result.

Sensitivity of a mean over $n$ rows is $(\text{hi}-\text{lo})/n$, so the scale is $n$ times smaller than per-row LDP at the same ε.

SELECT anon.dp_laplace_avg(
         AVG(wait_seconds)::float8,
         0.5,                       -- epsilon
         0, 600,                    -- public range
         COUNT(*)::int              -- public n
       ) AS private_mean
FROM   responses;

For $n = 10\,000$, ε=0.5, range $[0, 600]$ the scale is $0.12$.
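The arithmetic behind that scale can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the extension's code: dp_laplace_avg_sketch is a hypothetical name, and the true mean of 212.4 seconds is made up for the example.

```python
import random

def dp_laplace_avg_sketch(true_mean, epsilon, lo, hi, n, rng=random):
    """Hypothetical stand-in for anon.dp_laplace_avg: one Laplace draw on the mean."""
    sensitivity = (hi - lo) / n      # changing one row moves the mean by at most this
    scale = sensitivity / epsilon    # Laplace scale b
    # Laplace(0, scale) as the difference of two exponential draws.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_mean + noise, scale

rng = random.Random(1)
# 212.4 is an invented true mean, used only to have something to perturb.
private_mean, scale = dp_laplace_avg_sketch(212.4, 0.5, 0, 600, 10_000, rng=rng)
print(f"scale b = {scale:.2f}")
print(f"private mean = {private_mean:.2f}")
```

With a scale this small, the released mean typically lands within a fraction of a second of the true one.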

dp_laplace_avg operates on an aggregate, not a column, so it does not fit a per-column SECURITY LABEL. Wrap it in a masking view to publish a private aggregate.

If the row count is itself sensitive, pass a public lower bound n_min instead of COUNT(*):

SELECT anon.dp_laplace_avg(AVG(wait_seconds)::float8, 0.5, 0, 600,
                           n_min => 1000)
FROM   responses;
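As a worked example of the cost of hiding the row count, assume the function substitutes n_min for $n$ in the scale formula (the Security & limitations section suggests this, but the exact behavior is not spelled out here) and that the true row count is the $10\,000$ of the running example:

```latex
b_{n_{\min}} = \frac{\text{hi} - \text{lo}}{n_{\min}\,\varepsilon}
             = \frac{600}{1000 \times 0.5} = 1.2,
\qquad
b_{n} = \frac{\text{hi} - \text{lo}}{n\,\varepsilon}
      = \frac{600}{10\,000 \times 0.5} = 0.12 .
```

Because the true count satisfies $n \ge n_{\min}$, the true sensitivity $(\text{hi}-\text{lo})/n$ is no larger than $(\text{hi}-\text{lo})/n_{\min}$, so calibrating to n_min can only add noise. The closer the public bound is to the real count, the less accuracy is given up.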

Per-row LDP vs central DP

Both paths land on the same mean and spend the same ε. The noise scale is not the same.

| Approach | Per-call scale $b$ | Std error of the released mean |
| --- | --- | --- |
| Per-row LDP, then average | $(\text{hi}-\text{lo})/\varepsilon = 1200$ | $b\sqrt{2/n} \approx 17$ |
| Central DP on the mean | $(\text{hi}-\text{lo})/(n\varepsilon) = 0.12$ | $b\sqrt{2} \approx 0.17$ |

Numbers are for $n = 10\,000$, ε=0.5, range $[0, 600]$. Central DP is $\sqrt{n}$ times tighter on the mean; the per-row path pays for not trusting an intermediate aggregator.

/amount-mean runs both paths on the same data at the same ε.
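The figures in the comparison can be reproduced from the two scale formulas alone. A minimal Python sketch; the function names are hypothetical and nothing here calls the extension:

```python
import math

def per_row_ldp(lo, hi, epsilon, n):
    """Scale per row, and std error of the average of n independently noised rows."""
    b = (hi - lo) / epsilon
    return b, b * math.sqrt(2.0 / n)

def central_dp(lo, hi, epsilon, n):
    """Scale of the single draw on the mean, and its standard deviation."""
    b = (hi - lo) / (n * epsilon)
    return b, b * math.sqrt(2.0)

ldp_scale, ldp_se = per_row_ldp(0, 600, 0.5, 10_000)
cdp_scale, cdp_se = central_dp(0, 600, 0.5, 10_000)

print(f"per-row LDP: b = {ldp_scale:.0f}, std error = {ldp_se:.2f}")
print(f"central DP:  b = {cdp_scale:.2f}, std error = {cdp_se:.2f}")
print(f"ratio of std errors = {ldp_se / cdp_se:.0f} (sqrt(n) = {math.sqrt(10_000):.0f})")
```

The standard errors count only the injected noise; sampling error in the data itself is the same on both paths.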

Choosing parameters

  • epsilon. Typical 0.1 to 1.0 for per-row LDP. ε=1.0 is reasonable for a one-shot central mean. Lower ε means heavier noise.
  • lo, hi. Must be public. MIN(col) and MAX(col) are queries on the data and leak privacy. Hardcode the bounds, or store them in a public reference table. Tighter bounds give less noise.
  • clamp. Off by default. Turn it on when downstream expects values inside $[\text{lo}, \text{hi}]$.
  • n / n_min. Pass COUNT(*) if the row count is public, n_min if it is sensitive.
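Before committing to an epsilon, it can help to tabulate the noise it implies for a given public range. A minimal Python sketch, assuming only the per-row scale formula (hi − lo)/ε; noise_profile is a hypothetical helper, not part of the extension:

```python
import math

def noise_profile(lo, hi, epsilons):
    """Map each epsilon to (scale b, standard deviation sqrt(2) * b) for per-row noise."""
    profile = {}
    for eps in epsilons:
        b = (hi - lo) / eps
        profile[eps] = (b, math.sqrt(2.0) * b)
    return profile

# Rating column with public range [1, 5].
profile = noise_profile(1, 5, [0.1, 0.5, 1.0])
for eps, (b, sd) in profile.items():
    print(f"epsilon={eps}: scale={b:.1f}, std dev={sd:.1f}")
```

On a four-point range, even ε=1.0 gives a per-row standard deviation larger than the range itself, which is why per-row output is only useful in aggregate.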

Security & limitations

  • Averaging attack under dynamic masking. Each ldp_laplace call draws fresh noise, so reading the same row $k$ times and averaging the results reconstructs the true value with standard error $b\sqrt{2/k}$. The same caveat applies to anon.noise() and is documented in the Adding Noise section of the masking-functions catalog. Apply ldp_laplace through static masking or anonymous dumps. Under dynamic masking, ε has to be budgeted across every query a single role issues against the column.
  • Bounds must be public. MIN(col) and MAX(col) are queries on the data, not bounds. Hardcode lo, hi, or put them in a public reference table.
  • Row count can leak. dp_laplace_avg uses n to compute scale (hi-lo)/(n·ε). Passing the true COUNT(*) makes the scale a public function of n. If n is itself sensitive, pass a public lower bound n_min instead — the released mean is still ε-DP, slightly noisier.
  • Averaging LDP output is not central DP. AVG() over per-row Laplace outputs is consistent for the true mean, but its standard error is $\sqrt{n}$ times larger than central DP's (the variance is $n$ times larger). Use dp_laplace_avg if the trust model allows it.
  • Memoization. PostgreSQL may memoize calls with identical arguments and return the same "random" output across rows. If you see identical noisy values where they should differ, run SET LOCAL enable_memoize = off; before the SELECT.
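The averaging attack from the first bullet is easy to demonstrate. The following Python sketch simulates repeated reads of one masked row; it assumes each read is an independent Laplace draw, as described above, and averaging_attack is a hypothetical name used only for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def averaging_attack(true_value, epsilon, lo, hi, k, rng):
    """Average k independent noisy reads of the same underlying value."""
    scale = (hi - lo) / epsilon
    reads = [true_value + laplace_noise(scale, rng) for _ in range(k)]
    return sum(reads) / k

rng = random.Random(42)
true_rating = 4  # the secret the attacker is after
for k in (1, 100, 10_000):
    guess = averaging_attack(true_rating, 1.0, 1, 5, k, rng)
    print(f"k={k:>6}: attacker's estimate = {guess:.2f}")

estimate = averaging_attack(true_rating, 1.0, 1, 5, 40_000, rng)
print(f"k= 40000: attacker's estimate = {estimate:.2f}")
```

A single read reveals little, but the estimate tightens as reads accumulate, which is why the per-call ε cannot be treated as the total privacy cost under dynamic masking.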

The math

Sample noise from $\mathrm{Lap}(0, b)$ with $b = \Delta f / \varepsilon$ and return $f(D) + \mathrm{noise}$. The Laplace density $\frac{1}{2b} e^{-|x|/b}$ has mean $0$, variance $2b^2$, and standard deviation $\sqrt{2}\,b$. The privacy proof is a ratio-of-densities argument (see Concepts): two neighboring datasets shift the density's center by at most $\Delta f$, so the ratio at any output is bounded by $e^{\varepsilon}$.
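For reference, the ratio-of-densities step can be written out as follows. This is the standard textbook argument, sketched with $D$ and $D'$ denoting two neighboring datasets and $y$ any output; the symbols $D'$ and $y$ are introduced here for the derivation.

```latex
\frac{p_{D}(y)}{p_{D'}(y)}
  = \frac{\frac{1}{2b}\, e^{-|y - f(D)|/b}}{\frac{1}{2b}\, e^{-|y - f(D')|/b}}
  = \exp\!\left( \frac{|y - f(D')| - |y - f(D)|}{b} \right)
  \le \exp\!\left( \frac{|f(D) - f(D')|}{b} \right)
  \le \exp\!\left( \frac{\Delta f}{b} \right)
  = e^{\varepsilon}.
```

The first inequality is the triangle inequality, the second uses the definition of global sensitivity, and the last equality substitutes $b = \Delta f / \varepsilon$.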

Try it live

  • /amount-mean: per-row ldp_laplace (with and without clamping) compared against dp_laplace_avg on the same data at the same ε.