Metadata-Version: 2.4
Name: insurance-fairness-ldp
Version: 0.1.0
Summary: Local Differential Privacy for discrimination-free insurance pricing. Implements the Zhang/Liu/Shi (2025) correction matrix framework — the insurer never sees the true sensitive attribute.
Project-URL: Homepage, https://github.com/burning-cost/insurance-fairness-ldp
Project-URL: Repository, https://github.com/burning-cost/insurance-fairness-ldp
Project-URL: Issues, https://github.com/burning-cost/insurance-fairness-ldp/issues
Author-email: Burning Cost <pricing.frontier@gmail.com>
License: MIT
Keywords: FCA,LDP,actuarial,discrimination-free pricing,fairness,insurance,local differential privacy,pricing,randomised response
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24.0
Requires-Dist: polars>=1.0.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: scipy>=1.11.0
Provides-Extra: catboost
Requires-Dist: catboost>=1.2.0; extra == 'catboost'
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Description-Content-Type: text/markdown

# insurance-fairness-ldp

Discrimination-free insurance pricing using Local Differential Privacy (LDP). The insurer never sees the true sensitive attribute.

## The problem

UK insurers face a genuine bind on ethnicity pricing. GDPR Article 9 makes it legally uncomfortable to collect ethnicity data. The FCA's 2025 ethnicity penalty analysis (EP25/2) found a residual £28/year gap in motor premiums that is not explained by claims risk. The FCA Consumer Duty requires demonstrable fair value. And the Equality Act 2010, Section 19 exposes insurers to indirect discrimination risk via postcode rating.

The standard fairness toolkit (auditing models, running counterfactuals, applying Lindholm corrections) requires the insurer to hold the sensitive attribute at some point. That requirement is what creates the GDPR Article 9 exposure in the first place.

LDP flips the architecture. Policyholders submit a *privatised* version of their sensitive attribute — one that satisfies epsilon-LDP before it leaves their hands. The insurer never sees the true value. The mathematical correction happens on the privatised data, and the result is a discrimination-free premium that is actuarially valid.
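
As a minimal illustration of the mechanism itself (plain NumPy, not this library's API), k-ary randomised response reports the truth with probability pi = e^eps / (e^eps + k - 1) and otherwise a uniformly chosen other category, which bounds any observer's likelihood ratio between two possible true values by e^eps:

```python
import numpy as np

def k_rr(true_value, categories, epsilon, rng):
    """k-ary randomised response: report the true category with probability
    pi = e^eps / (e^eps + k - 1), otherwise one of the others uniformly.
    The likelihood ratio between any two true inputs is at most e^eps."""
    k = len(categories)
    pi = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < pi:
        return true_value
    others = [c for c in categories if c != true_value]
    return others[rng.integers(len(others))]

rng = np.random.default_rng(42)
cats = ["A", "B", "C", "D"]
reports = [k_rr("A", cats, epsilon=1.0, rng=rng) for _ in range(10_000)]
frac_true = np.mean([r == "A" for r in reports])
# With epsilon=1 and k=4, pi = e / (e + 3) ~ 0.475: barely half the reports
# reveal the true category, yet group frequencies remain recoverable.
```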

## What this library implements

This library implements the Zhang/Liu/Shi (arXiv:2504.11775, 2025) correction matrix framework for the Lindholm et al. (2022) discrimination-free pricing formula, operating exclusively on privatised sensitive attributes.

The core formula is Lindholm's:

    h*(X) = sum_k f_k(X) * P*(D=k)

where each group model f_k(X) is trained using LDP-corrected sample weights derived from the Pi^{-1} correction matrix, and P*(D) is a reference distribution estimated from the debiased noisy frequencies via T^{-1}.
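
Concretely, the averaging step can be sketched in plain NumPy (hypothetical linear group models stand in for the fitted f_k; T is the symmetric k-RR transition matrix):

```python
import numpy as np

k, eps = 4, 1.0
pi = np.exp(eps) / (np.exp(eps) + k - 1)
T = np.full((k, k), (1 - pi) / (k - 1))  # T[d, s] = P(S=s | D=d)
np.fill_diagonal(T, pi)

# Debias observed noisy frequencies into the reference P*(D) via T^{-1}
p_noisy = np.array([0.30, 0.27, 0.23, 0.20])
p_star = np.linalg.solve(T.T, p_noisy)
p_star = np.clip(p_star, 0.0, None)      # guard against sampling overshoot
p_star /= p_star.sum()

# Hypothetical group models f_k: linear scorers for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
betas = rng.normal(size=(k, 3))
f = X @ betas.T                          # f[i, d] = f_d(X_i)

# h*(X) = sum_k f_k(X) * P*(D=k): the same weights P* for every risk X,
# which is what removes the premium's dependence on D given X
h_star = f @ p_star
```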

No existing Python package implements this. OpenDP implements k-RR but not the group-specific pricing correction. Fairlearn and AIF360 both require the true sensitive attribute. InsurFair (R) implements Lindholm but not under LDP.

## Architecture warning

The formal LDP privacy guarantee requires a Trusted Third Party (TTP) architecture: policyholders submit their privatised responses directly to the TTP, not to the insurer. When a single organisation runs this code, the formal privacy guarantee does not apply in the same sense. This library provides the correct mathematical framework for the multi-party case and is suitable for research, simulation, and compliance demonstration. Deploying as a live privacy guarantee requires proper TTP infrastructure.

## Quick start

```python
import numpy as np
from sklearn.linear_model import Ridge
from insurance_fairness_ldp import (
    KaryRandomisedResponse,
    LDPDiscriminationFreePrice,
    LDPFairnessReport,
)

# Step 1: Define the LDP mechanism (epsilon controls privacy/accuracy trade-off)
krr = KaryRandomisedResponse(
    epsilon=1.0,
    categories=["White", "Asian", "Black", "Other"],
)

# Step 2: In a real deployment, policyholders apply k-RR themselves.
# In simulation or research, we apply it:
S_private = krr.privatise(true_ethnicity_array, random_state=42)

# Step 3: Fit discrimination-free pricing model
model = LDPDiscriminationFreePrice(
    base_estimator=Ridge(),
    mechanism=krr,
    reference_dist="marginal",  # or supply P*(D) directly
)
model.fit(X_train, S_private_train, y_train)

# Step 4: Generate discrimination-free premiums
premiums = model.predict(X_test)

# Step 5: Generate regulatory report
report = LDPFairnessReport.from_model(
    model, X_test, S_private_test, y=y_test
)
report.to_markdown("ldp_fairness_report.md")
print(report.summary())
```

## Unknown epsilon

When epsilon is not known (because privatisation was done externally), use anchor-point estimation:

```python
from insurance_fairness_ldp import NoiseRateEstimator

# Anchor: observations where you know the true category with near-certainty
anchor_selector = lambda X: X[:, 0] > 65  # e.g. policyholders known to be in group 0

estimator = NoiseRateEstimator(
    categories=["White", "Asian", "Black", "Other"],
    anchor_category="White",
    anchor_selector=anchor_selector,
)
estimator.fit(S_private, X=X)
print(estimator.summary())

# Convert to mechanism and use in pricing
krr_estimated = estimator.to_mechanism()
```

## Choosing epsilon

| epsilon | pi (k=2) | C1 (k=2) | Privacy | Accuracy |
|---------|----------|----------|---------|----------|
| 0.5     | 0.622    | 2.45     | Very strong | Poor |
| 1.0     | 0.731    | 1.73     | Strong | Acceptable |
| 2.0     | 0.881    | 1.27     | Moderate | Good |
| 5.0     | 0.993    | 1.01     | Minimal | Excellent |

For UK insurance research, epsilon=1 to 2 gives meaningful privacy with acceptable accuracy loss. The accuracy constant C1 tells you how much the LDP correction inflates the generalisation error bound relative to direct observation: C1=2 means the bound is 2x worse.
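
The pi column follows directly from the k-RR definition; a quick check (C1 comes from the library's `accuracy_constant()` and is not re-derived here):

```python
import numpy as np

def truth_prob(epsilon, k=2):
    """k-RR truth probability pi = e^eps / (e^eps + k - 1)."""
    return np.exp(epsilon) / (np.exp(epsilon) + k - 1)

for eps in (0.5, 1.0, 2.0, 5.0):
    print(f"epsilon={eps:>3}: pi(k=2) = {truth_prob(eps):.3f}")
# pi rises steeply with epsilon: ~0.62 at eps=0.5, ~0.99 at eps=5,
# which is why accuracy recovers so quickly as privacy is relaxed.
```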

## API reference

### `KaryRandomisedResponse(epsilon, categories)`

k-ary Randomised Response mechanism. Perturbs a sensitive categorical attribute to satisfy epsilon-LDP.

- `.privatise(s, random_state)` — apply k-RR to an array of true values
- `.correction_matrix()` — return the k x k transition matrix T
- `.pi` — truth probability P(S=d | D=d)
- `.k`, `.epsilon`, `.categories`

### `CorrectionMatrix(pi, k)`

Computes the LDP correction matrices.

- `.T_inv()` — inverse of T; used to debias frequency distributions
- `.Pi_inv(group_probs)` — group-reweighted correction; used in loss weighting
- `.debias_probs(noisy_probs, clip=True)` — apply T^{-1} to a frequency vector
- `.accuracy_constant()` — C1 value
- `CorrectionMatrix.from_mechanism(krr)` — factory from a KaryRandomisedResponse
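
Why does `debias_probs` take `clip=True`? A sketch of the failure mode (plain NumPy with hypothetical numbers, not the class itself): at low epsilon, applying T^{-1} to empirical frequencies of a rare group can overshoot below zero, so the debiased vector is clipped and renormalised:

```python
import numpy as np

rng = np.random.default_rng(0)
k, eps = 4, 0.5                        # low epsilon -> heavy perturbation
pi = np.exp(eps) / (np.exp(eps) + k - 1)
T = np.full((k, k), (1 - pi) / (k - 1))
np.fill_diagonal(T, pi)                # T[d, s] = P(S=s | D=d)

# One true group is rare (1%): its privatised signal is tiny
p_true = np.array([0.70, 0.17, 0.12, 0.01])
sample = rng.choice(k, size=2_000, p=T.T @ p_true)   # privatised draws
p_noisy = np.bincount(sample, minlength=k) / 2_000

p_hat = np.linalg.solve(T.T, p_noisy)  # T^{-1} debias: unbiased but noisy
clipped = np.clip(p_hat, 0.0, None)    # entries of p_hat can fall below 0
p_safe = clipped / clipped.sum()       # renormalise to a valid distribution
```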

### `LDPDiscriminationFreePrice(base_estimator, mechanism, reference_dist)`

The main pricing class; sklearn-compatible.

- `.fit(X, S_private, y, exposure=None)` — train with LDP-corrected sample weights
- `.predict(X)` — return h*(X) discrimination-free premiums
- `.predict_group(X, category)` — return f_k(X) for a single group
- `.group_models_` — dict of fitted group models
- `.reference_dist_` — P*(D) used in the Lindholm formula

### `NoiseRateEstimator(categories, anchor_category, anchor_selector)`

Anchor-point estimation of pi (unknown epsilon case).

- `.fit(S_private, X, bootstrap, n_bootstrap, random_state)`
- `.pi_`, `.epsilon_`, `.std_error_`, `.n_anchor_`
- `.to_mechanism(categories)` — convert to KaryRandomisedResponse
- `.summary()` — text summary
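
The anchor idea in miniature (a simulation sketch, not the class internals): among records whose true category is known with near-certainty, the fraction reporting that category estimates pi, and epsilon follows by inverting pi = e^eps / (e^eps + k - 1):

```python
import numpy as np

rng = np.random.default_rng(3)
k, eps_true = 4, 1.5
pi_true = np.exp(eps_true) / (np.exp(eps_true) + k - 1)

# Simulate 5,000 anchor records whose true category is 0
n = 5_000
keep = rng.random(n) < pi_true                 # report the truth?
s_private = np.where(keep, 0, rng.integers(1, k, size=n))

pi_hat = np.mean(s_private == 0)               # anchor estimate of pi
eps_hat = np.log(pi_hat * (k - 1) / (1 - pi_hat))
```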

### `LDPFairnessReport`

Structured report with `summary()` and `to_markdown()` methods.

- `LDPFairnessReport.from_model(model, X, S_private, y, h_naive, notes)`

### Functions

- `privatise(s, epsilon, categories, random_state)` — convenience wrapper
- `discrimination_free_indicator(h_star, h_naive, norm)` — pricing distance metric
- `group_loss_corrected(y_true, y_pred, S_private, categories, Pi_inv)` — LDP-corrected group loss
- `calibration_by_group_ldp(y_true, y_pred, S_private, categories)` — calibration check
- `c1_adjusted_error_bound(base_bound, c1, k, p_s_k_star)` — bound inflation
- `debiased_group_means(y, S_private, categories, T_inv)` — unbiased conditional means
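
The idea behind `debiased_group_means` can be sketched as follows (plain NumPy on simulated data; the library call takes `T_inv` directly): apply T^{-1} to the per-category response sums and counts separately, then divide. The naive noisy means are shrunk toward the overall mean; the debiased ones are not.

```python
import numpy as np

rng = np.random.default_rng(7)
k, eps, n = 3, 2.0, 20_000
pi = np.exp(eps) / (np.exp(eps) + k - 1)
T = np.full((k, k), (1 - pi) / (k - 1))
np.fill_diagonal(T, pi)

# Simulated portfolio: three true groups with distinct mean losses
D = rng.integers(0, k, size=n)
mu = np.array([100.0, 150.0, 250.0])
y = mu[D] + rng.normal(0.0, 10.0, size=n)

# Privatise: keep D with probability pi, else shift to a uniform other group
U = rng.random(n)
S = np.where(U < pi, D, (D + rng.integers(1, k, size=n)) % k)

# Naive per-category means are biased toward the overall mean
naive = np.array([y[S == s].mean() for s in range(k)])

# Debias sums and counts with T^{-1}, then take the ratio
sums = np.array([y[S == s].sum() for s in range(k)])
counts = np.bincount(S, minlength=k).astype(float)
debiased = np.linalg.solve(T.T, sums) / np.linalg.solve(T.T, counts)
```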

## UK regulatory context

- **GDPR Article 9 / DPA 2018 Schedule 1**: If the insurer receives only privatised S, there is a defensible argument they have not "processed" special category data in the Article 9 sense. The TTP processes it; the insurer receives noise.
- **FCA EP25/2 (2025)**: The FCA found a £28/year residual ethnicity gap in motor after risk adjustment. This library provides a technical route to demonstrate non-discrimination even when ethnicity data is unavailable.
- **Equality Act 2010, Section 19**: The Lindholm reference distribution P*(D) being independent of X removes the indirect discrimination mechanism.
- **Test-Achats (effective 2012)**: UK insurers have been prohibited from using gender in pricing since the ruling took effect in 2012. LDP extends this architecture to ethnicity and disability.
- **Data (Use and Access) Act 2025**: Reduces the sensitivity of the protected-attribute decision pathway, supporting ADM compliance.

## How this fits the Burning Cost stack

| Library | Requires true D? | Purpose |
|---------|-----------------|---------|
| insurance-fairness-diag | Yes | Diagnose proxy leakage |
| insurance-fairness | Yes | Audit model discrimination |
| insurance-fairness-ot | Yes | Wasserstein discrimination-free prices |
| **insurance-fairness-ldp** | **No** | Discrimination-free prices without ever seeing D |

The natural workflow: run `insurance-fairness-diag` to detect proxy leakage, then use `insurance-fairness-ldp` to correct for it without requiring access to the restricted attribute.

## Installation

```bash
pip install insurance-fairness-ldp
```

Optional CatBoost support:

```bash
pip install insurance-fairness-ldp[catboost]
```

## References

Zhang, Liu, Shi (2025). Discrimination-Free Insurance Pricing under Local Differential Privacy. arXiv:2504.11775.

Lindholm, Richman, Tsanakas, Wüthrich (2022). Discrimination-Free Insurance Pricing. ASTIN Bulletin 52(1), 55-89.

Makhlouf et al. (2024). A Systematic and Formal Study of the Impact of Local Differential Privacy on Fairness. arXiv:2405.14725. CSF 2024.

Warner (1965). Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. JASA 60(309), 63-69.

## Licence

MIT. Copyright Burning Cost, 2026.
