Choose a mechanism

Two questions narrow the choice. Once you have answered both, look up the matching row in the recap table to find the function name.

What are you releasing?

  • A value per row (one anonymized record used downstream as-is): you need a per-row mechanism. The output replaces the original value on each row.
  • An aggregate over many rows (a count, mean, histogram): two paths are possible. Apply a per-row mechanism and aggregate the noisy output, or release the aggregate directly with central DP. When a trusted curator is available, the second path adds less noise for the same ε.

Who can see raw values?

  • Nobody. Every row must be noised before reaching any aggregator. Use the LDP form of the mechanism. Sensitivity is the full per-row range, so noise per row is large.
  • A central party can see raw values, and only the released statistic is public. Use the central form when one exists.

In the anon extension's setting the database itself is the curator. Central DP is available whenever the database can be trusted with raw values; LDP earns its keep when that trust does not hold (third-party storage, untrusted hosting, telemetry collected from independent endpoints).
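To make the trade-off between the two forms concrete, here is a minimal Python sketch that estimates a mean both ways. It is not the extension's implementation: the helper names (local_mean, central_mean, noise_std), the clipping bounds, and the assumption that the row count n is public are all illustrative. Under those assumptions the LDP estimate carries roughly √n times more noise than the central one at the same ε.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def local_mean(values, lo, hi, eps, rng):
    """LDP path: noise every row with scale (hi - lo) / eps, then average."""
    scale = (hi - lo) / eps
    noisy = [clip(v, lo, hi) + laplace_noise(scale, rng) for v in values]
    return sum(noisy) / len(noisy)


def central_mean(values, lo, hi, eps, rng):
    """Central path: average the raw rows, then add a single Laplace draw.

    Assumption: n is public, so replacing one row moves the mean by at
    most (hi - lo) / n.
    """
    n = len(values)
    true_mean = sum(clip(v, lo, hi) for v in values) / n
    return true_mean + laplace_noise((hi - lo) / (n * eps), rng)


def noise_std(n, lo, hi, eps):
    """Noise standard deviation of each estimate (Laplace std = scale * sqrt(2))."""
    local = math.sqrt(2) * (hi - lo) / (eps * math.sqrt(n))
    central = math.sqrt(2) * (hi - lo) / (eps * n)
    return local, central


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [rng.uniform(18, 90) for _ in range(10_000)]
    print("true mean   ", sum(ages) / len(ages))
    print("local mean  ", local_mean(ages, 18, 90, 1.0, rng))
    print("central mean", central_mean(ages, 18, 90, 1.0, rng))
    print("noise std (local, central)", noise_std(len(ages), 18, 90, 1.0))
```

The gap grows with the number of rows, which is why the central form is preferable whenever the database can be trusted with raw values.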

What an LDP database looks like here

The scenario demos show three concrete LDP deployment modes on the same data:

  • Insert & Query: values are noised at insert time. Only the LDP-perturbed values are persisted; raw values never sit on disk. This fits a setup where clients perturb locally before submitting, or where the application layer wraps every INSERT in an LDP function call.
  • On-the-Fly Query: raw values are stored, but every read from a masked role applies the LDP function per row. The masked role never sees raw values; a role declared with SECURITY LABEL ... 'MASKED' triggers the substitution on each query. This is the dynamic-masking flavor.
  • Pre-Anonymized: raw values are noised once and the LDP output overwrites them in place via static masking. After the run, the table holds only perturbed values.

Open any of /scenario/healthcare, /scenario/financial, /scenario/telemetry, or /scenario/survey to compare the three modes side-by-side on the same scenario.

Recap

| You're releasing… | Mechanism | Function | Guarantee |
|---|---|---|---|
| A categorical value per row (small d) | GRRM | anon.ldp_grrm | ε-LDP |
| A numeric value per row | Laplace, per-row | anon.ldp_laplace | ε-LDP |
| A numeric value per row (tighter composition) | Gaussian, per-row | anon.ldp_gaussian | (ε, δ)-LDP |
| A histogram of a categorical column | One-hot Laplace | anon.ldp_laplace_onehot | ε-LDP |
| A histogram of a categorical column (tighter composition) | One-hot Gaussian | anon.ldp_gaussian_onehot | (ε, δ)-LDP |
| The mean of a numeric column (trusted curator) | Laplace, central | anon.dp_laplace_avg | ε-DP |

GRRM and one-hot rows give per-row releases. To turn a column of GRRM output into an unbiased histogram, follow up with frequency estimation. One-hot output is already unbiased on the column sum, so no debiasing is needed.
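The difference between the two histogram routes can be sketched in Python. This uses the textbook generalized randomized response scheme and a one-hot Laplace encoding with scale 2/ε; whether anon.ldp_grrm and anon.ldp_laplace_onehot use exactly these parameters is an assumption, and all function names below are hypothetical.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, eps, rng):
    """Generalized randomized response: keep the true value with probability
    p = e^eps / (e^eps + d - 1), otherwise report one of the other d - 1
    values uniformly at random. Requires d >= 2."""
    d = len(domain)
    p = math.exp(eps) / (math.exp(eps) + d - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate_counts(reports, domain, eps):
    """Frequency estimation: E[c_v] = n_v * p + (n - n_v) * q, solved for n_v."""
    d = len(domain)
    n = len(reports)
    p = math.exp(eps) / (math.exp(eps) + d - 1)
    q = 1.0 / (math.exp(eps) + d - 1)
    observed = Counter(reports)
    return {v: (observed[v] - n * q) / (p - q) for v in domain}


def onehot_laplace(value, domain, eps, rng):
    """One-hot encode, then add Laplace noise of scale 2/eps to every slot.
    Two one-hot vectors differ in at most two slots, so L1 sensitivity is 2."""
    scale = 2.0 / eps
    return [
        (1.0 if v == value else 0.0)
        + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for v in domain
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    domain = ["A", "B", "C", "D"]
    truth = ["A"] * 6000 + ["B"] * 3000 + ["C"] * 1000

    reports = [grr_perturb(v, domain, 1.0, rng) for v in truth]
    print("raw GRR counts   ", dict(Counter(reports)))  # biased toward uniform
    print("debiased estimate", grr_estimate_counts(reports, domain, 1.0))

    rows = [onehot_laplace(v, domain, 1.0, rng) for v in truth]
    print("one-hot col sums ", [sum(col) for col in zip(*rows)])
```

Raw GRR counts are pulled toward a uniform distribution, which is why the estimation step is needed. The one-hot noise has zero mean in every slot, so summing the columns already gives an unbiased histogram.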

Decision tree

If none of the recap rows fits exactly, work through these questions in order.
  1. Can the database be trusted with raw values?
    • Yes: central DP. If you're releasing a mean, use dp_laplace_avg. For other aggregates, derive the global sensitivity Δ of the query and manually add Laplace noise with scale Δ/ε.
    • No: LDP. Continue.
  2. Categorical or numeric column?
    • Categorical: GRRM for small d (binary, low-cardinality codes); one-hot for any d when the goal is a histogram and you want to skip the debiasing step.
    • Numeric: Laplace by default. Gaussian when you will compose many releases on the same column and want lighter tails, accepting a small δ failure probability.
  3. Do you also want a column-level masking rule?
    • Yes: attach a SECURITY LABEL to the column and apply through static masking or anonymous dumps. See each mechanism page for the exact syntax.
    • No: call the function ad-hoc from a SELECT to inspect output before committing.
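For the manual central-DP path in step 1, the following Python sketch shows how a derived sensitivity Δ turns into Laplace noise of scale Δ/ε. It assumes neighboring datasets differ by adding or removing one row; the function names (release, dp_count, dp_sum) and the clipping bounds are hypothetical and not part of the extension.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release(true_value, sensitivity, eps, rng):
    """Add Laplace noise with scale sensitivity / eps to an exact aggregate."""
    if eps <= 0 or sensitivity <= 0:
        raise ValueError("eps and sensitivity must be positive")
    return true_value + laplace_noise(sensitivity / eps, rng)


def dp_count(rows, predicate, eps, rng):
    """Count: adding or removing one row changes it by at most 1."""
    exact = sum(1 for r in rows if predicate(r))
    return release(exact, 1.0, eps, rng)


def clipped_sum(values, lo, hi):
    """Clip each value to [lo, hi] so one row's contribution is bounded."""
    return sum(max(lo, min(hi, v)) for v in values)


def dp_sum(values, lo, hi, eps, rng):
    """Clipped sum: one row contributes at most max(|lo|, |hi|)."""
    sensitivity = max(abs(lo), abs(hi))
    return release(clipped_sum(values, lo, hi), sensitivity, eps, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    salaries = [rng.uniform(20_000, 120_000) for _ in range(5_000)]
    print("noisy count", dp_count(salaries, lambda s: s > 100_000, 0.5, rng))
    print("noisy sum  ", dp_sum(salaries, 0, 150_000, 0.5, rng))
```

The clipping bounds have to be chosen without looking at the data; otherwise the bounds themselves leak information that the ε budget does not account for.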

What's next