Quick start

From CREATE EXTENSION anon to your first private query in five minutes.

Prerequisite

Install PostgreSQL Anonymizer following the official installation guide. The LDP and DP mechanisms covered in this tutorial ship as part of the same extension. Once anon is installed, every ldp_* and dp_* function documented here is available in the anon schema.

Enable in your database

CREATE EXTENSION anon;
SELECT anon.init();

Verify the LDP/DP functions are available:

\df anon.ldp_*
\df anon.dp_*
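
The `\df` meta-commands only work inside psql. From another client, roughly the same listing can be pulled from the system catalogs. This is a minimal sketch that relies only on the standard `pg_proc` and `pg_namespace` catalogs and on the `anon` schema named above:

```sql
-- List the LDP/DP functions in the anon schema without psql meta-commands.
SELECT p.proname                                  AS function_name,
       pg_get_function_identity_arguments(p.oid)  AS arguments
FROM   pg_proc p
JOIN   pg_namespace n ON n.oid = p.pronamespace
WHERE  n.nspname = 'anon'
AND    p.proname ~ '^(ldp|dp)_'   -- names starting with ldp_ or dp_
ORDER  BY p.proname;
```

An empty result suggests the extension was not created in the current database.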

Your first private queries

Examples assume a hypothetical table:

CREATE TABLE responses (
  user_id      bigint PRIMARY KEY,
  rating       int,        -- 1..5
  wait_seconds float8      -- public range 0..600
);
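
To follow along without real data, the table can be filled with synthetic rows. The sketch below is an assumption for experimentation only: the row count of 10 000 and the uniform distributions are arbitrary choices, not something the extension requires.

```sql
-- Fill the hypothetical table with 10 000 synthetic rows.
INSERT INTO responses (user_id, rating, wait_seconds)
SELECT g,
       1 + floor(random() * 5)::int,   -- uniform integer in 1..5
       random() * 600                  -- uniform float in [0, 600)
FROM   generate_series(1, 10000) AS g;
```

With 10 000 rows the table matches the n used in the aggregate example further down.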

1 — Categorical: GRRM on a single value

SELECT user_id,
       anon.ldp_grrm(rating, 1.0, 5) AS noisy_rating
FROM   responses;

Each call returns the true rating with probability $q = e^{\varepsilon}/(e^{\varepsilon} + d - 1) = e^{1}/(e^{1} + 4) \approx 0.40$ for $\varepsilon = 1$ and $d = 5$ categories, otherwise a uniformly random other category.
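
The probabilities can be checked by hand in plain SQL, without calling the extension. This sketch uses the standard generalized randomized response formulas with ε = 1.0 and d = 5, the values passed to the call above. The column names are illustrative.

```sql
-- GRR probabilities for epsilon = 1.0 and d = 5 categories.
SELECT exp(eps) / (exp(eps) + d - 1) AS q_truth,       -- report the true value
       1        / (exp(eps) + d - 1) AS p_each_other   -- report one specific other value
FROM  (VALUES (1.0::float8, 5)) AS params(eps, d);
```

As a sanity check, `q_truth + (d - 1) * p_each_other` equals 1, since the reported value is always one of the d categories.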

2 — Numeric per-row: Laplace

SELECT user_id,
       anon.ldp_laplace(wait_seconds, 0.5, 0, 600) AS noisy_wait
FROM   responses;

Per-call scale is $(\text{hi}-\text{lo})/\varepsilon = 1200$. The noise dwarfs the signal on any single row. That is the LDP cost of not trusting the aggregator.
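
To get a feel for that scale, Laplace noise can be simulated in plain SQL. The sketch below draws a Laplace variate as the difference of two exponential variates, which is a standard construction. It says nothing about how the extension samples its noise internally.

```sql
-- Simulate Laplace(0, b) noise with b = (hi - lo) / epsilon = 600 / 0.5.
WITH params AS (
  SELECT (600 - 0) / 0.5::float8 AS b
),
draws AS (
  -- 1 - random() lies in (0, 1], so ln() never sees zero.
  SELECT b * (ln(1 - random()) - ln(1 - random())) AS noise
  FROM   params, generate_series(1, 100000)
)
SELECT avg(abs(noise))    AS mean_abs_noise,   -- expected value: b
       stddev_samp(noise) AS noise_stddev      -- expected value: b * sqrt(2)
FROM   draws;
```

Both figures should land near 1200 and 1200·√2 ≈ 1697 respectively, which is well above the 0..600 range of the data itself.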

3 — Numeric aggregate: central DP on the mean

SELECT anon.dp_laplace_avg(
         AVG(wait_seconds)::float8,
         0.5,                       -- epsilon
         0, 600,                    -- public range
         COUNT(*)::int              -- public n
       ) AS private_mean
FROM   responses;

Sensitivity is $(\text{hi}-\text{lo})/n$. For $n = 10\,000$ that gives a Laplace scale of $600/(10\,000 \cdot 0.5) = 0.12$. Compare the result with a plain AVG(wait_seconds): the two agree closely.

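If n itself is sensitive, pass a public lower bound n_min instead. The sensitivity then becomes $(\text{hi}-\text{lo})/n_{\min}$, which is never smaller than $(\text{hi}-\text{lo})/n$, so the added noise errs on the cautious side.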
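
On test data, the comparison with the plain average can be done in a single query. The sketch below reuses the `anon.dp_laplace_avg` call exactly as written in this step and adds the Laplace scale computed by hand. The output column names are illustrative, and the true mean is shown only as a sanity check, never as something to release.

```sql
-- Side-by-side check on test data: true mean, private mean, and noise scale.
SELECT AVG(wait_seconds)              AS true_mean,      -- for checking only
       anon.dp_laplace_avg(
         AVG(wait_seconds)::float8,
         0.5,                         -- epsilon
         0, 600,                      -- public range
         COUNT(*)::int                -- public n
       )                              AS private_mean,
       (600 - 0) / (COUNT(*) * 0.5)   AS laplace_scale   -- (hi - lo) / (n * epsilon)
FROM   responses;
```

The query assumes `responses` is not empty, since an empty table would make the scale expression divide by zero.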

4 — Frequency estimate from GRRM output

A naïve GROUP BY on perturbed values is biased: every category absorbs some of the noise from every other. The closed-form estimator inverts the GRRM transition matrix.
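
The query in this step reads from `responses_anonymized`, a table this page does not define. One plausible reading, offered here as an assumption, is that it holds the output of step 1, materialized once so that every later query sees the same perturbed values:

```sql
-- Assumption: responses_anonymized stores the step 1 output.
-- The call to anon.ldp_grrm is copied unchanged from step 1.
CREATE TABLE responses_anonymized AS
SELECT user_id,
       anon.ldp_grrm(rating, 1.0, 5) AS noisy_rating
FROM   responses;
```

The epsilon and d used at estimation time need to match the values used when the ratings were perturbed, here 1.0 and 5.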

SELECT anon.ldp_frequency_estimate(
         observed_count => COUNT(*) FILTER (WHERE noisy_rating = 5),
         n              => COUNT(*),
         epsilon        => 1.0,
         d              => 5
       ) AS rating_5_estimate
FROM   responses_anonymized;
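
For reference, the textbook GRR debiasing formula can also be written in plain SQL. With observed count c out of n reports, the estimated true fraction is $(c/n - p)/(q - p)$, where $q = e^{\varepsilon}/(e^{\varepsilon} + d - 1)$ and $p = 1/(e^{\varepsilon} + d - 1)$. This sketch is not necessarily what `anon.ldp_frequency_estimate` computes internally, and whether that function returns a count or a fraction is not stated here.

```sql
-- Textbook GRR debiasing for every category at once (epsilon = 1.0, d = 5).
WITH probs AS (
  SELECT exp(eps) / (exp(eps) + d - 1) AS q,   -- truth probability
         1        / (exp(eps) + d - 1) AS p    -- probability of one specific other value
  FROM  (VALUES (1.0::float8, 5)) AS params(eps, d)
),
obs AS (
  SELECT noisy_rating,
         COUNT(*)             AS c,            -- observed count for this category
         SUM(COUNT(*)) OVER () AS n            -- total number of reports
  FROM   responses_anonymized
  GROUP  BY noisy_rating
)
SELECT noisy_rating              AS rating,
       c                         AS observed_count,
       (c / n - p) / (q - p)     AS estimated_fraction
FROM   obs, probs
ORDER  BY rating;
```

Individual estimates can fall slightly below 0 or above 1 on small samples. That is expected of an unbiased estimator, and clamping them afterwards is a post-processing choice.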

5 — Pure post-processing: parameter helpers

SELECT anon.ldp_truth_probability(1.0, 5)        AS q,
       anon.ldp_lie_probability  (1.0, 5)        AS p,
       anon.ldp_laplace_scale    (1.0, 0, 600)   AS laplace_scale_b;

Helpers are pure functions of the public parameters. No privacy budget is spent.

Where to next

  • Concepts — the math behind ε, sensitivity, and the local/central distinction.
  • Mechanisms — pick the right primitive for your task with the decision matrix.
  • Interactive demos — run the mechanisms on real seeded data with live ε sliders, no install required.