# diff-diff

> A Python library for Difference-in-Differences causal inference analysis. Provides sklearn-like estimators with statsmodels-style output for econometric analysis.

- Version: 3.4.0
- Repository: https://github.com/igerber/diff-diff
- License: MIT
- Dependencies: numpy, pandas, scipy (no statsmodels dependency)
- Optional: Rust backend for performance (via maturin)

## Quick Start

```python
import pandas as pd
from diff_diff import DifferenceInDifferences, generate_did_data

# Generate synthetic data with known treatment effect
data = generate_did_data(n_units=200, treatment_effect=5.0, seed=42)

# Fit basic 2x2 DiD
did = DifferenceInDifferences()
results = did.fit(data, outcome='outcome', treatment='treated', time='post')
print(results.summary())
print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
```

## Design Patterns

- **sklearn-like API**: All estimators use `fit()` method, `get_params()`/`set_params()` for configuration.
- **Formula interface**: Supports R-style formulas like `"outcome ~ treated * post"`.
- **Results objects**: Rich dataclass containers with `summary()`, `to_dict()`, `to_dataframe()`.
- **Estimator aliases**: Short names available (e.g., `DiD`, `CS`, `SA`, `BJS`, `Gardner`, `SDiD`, `TWFE`, `DDD`, `CDiD`, `EDiD`, `Stacked`, `Bacon`).

## Practitioner Workflow (based on Baker et al. 2025)

For rigorous DiD analysis, follow the 8-step framework (call `diff_diff.get_llm_guide("practitioner")`).
After estimation, call:

```python
from diff_diff import practitioner_next_steps
guidance = practitioner_next_steps(results)
```

Returns context-aware guidance on remaining diagnostic steps (parallel trends
testing, sensitivity analysis, heterogeneity checks, robustness comparisons).

## Estimators

### DifferenceInDifferences

Basic 2x2 Difference-in-Differences estimator.

```python
DifferenceInDifferences(
    robust: bool = True,                    # HC1 robust standard errors
    cluster: str | None = None,             # Column for cluster-robust SEs
    alpha: float = 0.05,                    # Significance level
    inference: str = "analytical",          # "analytical" or "wild_bootstrap"
    n_bootstrap: int = 999,                 # Bootstrap replications (if inference="wild_bootstrap")
    bootstrap_weights: str = "rademacher",  # "rademacher", "webb", or "mammen"
    seed: int | None = None,                # Random seed
    rank_deficient_action: str = "warn",    # "warn", "error", or "silent"
)
```

**Alias:** `DiD`

**fit() parameters:**

```python
did.fit(
    data: pd.DataFrame,
    outcome: str = None,                   # Outcome variable column
    treatment: str = None,                 # Treatment indicator column (0/1)
    time: str = None,                      # Post-treatment indicator column (0/1)
    formula: str = None,                   # R-style formula (e.g., "y ~ treated * post")
    covariates: list[str] = None,          # Linear control variables
    fixed_effects: list[str] = None,       # Low-dimensional FE (dummy variables)
    absorb: list[str] = None,             # High-dimensional FE (within-transformation)
) -> DiDResults
```

**Usage:**

```python
from diff_diff import DifferenceInDifferences

did = DifferenceInDifferences(robust=True)
results = did.fit(data, outcome='y', treatment='treated', time='post')
results.print_summary()

# Formula interface
results = did.fit(data, formula='y ~ treated * post')

# With covariates and fixed effects
results = did.fit(data, outcome='y', treatment='treated', time='post',
                  covariates=['age', 'income'], absorb=['firm_id'])
```

### TwoWayFixedEffects

Two-Way Fixed Effects estimator for panel data. Inherits from DifferenceInDifferences.

```python
TwoWayFixedEffects(
    robust: bool = True,
    cluster: str | None = None,   # Auto-clusters at unit level if None
    alpha: float = 0.05,
)
```

**Alias:** `TWFE`

**fit() parameters:**

```python
twfe.fit(
    data: pd.DataFrame,
    outcome: str,
    treatment: str,
    time: str,
    unit: str,
    covariates: list[str] = None,
) -> DiDResults
```

**Usage:**

```python
from diff_diff import TwoWayFixedEffects

twfe = TwoWayFixedEffects()
results = twfe.fit(data, outcome='y', treatment='treated', time='post', unit='unit_id')
results.print_summary()
```

**Note:** TWFE can be biased with staggered treatment timing and heterogeneous effects. Consider CallawaySantAnna, SunAbraham, or ImputationDiD for staggered designs.

### MultiPeriodDiD

Event-study style DiD with period-specific treatment effects. Inherits from DifferenceInDifferences.

```python
MultiPeriodDiD(
    robust: bool = True,
    cluster: str | None = None,
    alpha: float = 0.05,
)
```

**Alias:** `EventStudy`

**fit() parameters:**

```python
mp_did.fit(
    data: pd.DataFrame,
    outcome: str,
    treatment: str,
    time: str,
    post_periods: list = None,             # Which periods are post-treatment
    covariates: list[str] = None,
    fixed_effects: list[str] = None,
    absorb: list[str] = None,
    reference_period: Any = None,          # Reference period (default: last pre-period)
) -> MultiPeriodDiDResults
```

**Usage:**

```python
from diff_diff import MultiPeriodDiD, plot_event_study

did = MultiPeriodDiD()
results = did.fit(data, outcome='sales', treatment='treated',
                  time='period', post_periods=[4, 5, 6, 7])
results.print_summary()
plot_event_study(results)
```

### CallawaySantAnna

Callaway-Sant'Anna (2021) estimator for staggered DiD with heterogeneous treatment effects.

```python
CallawaySantAnna(
    control_group: str = "never_treated",        # "never_treated" or "not_yet_treated"
    anticipation: int = 0,                       # Anticipation periods
    estimation_method: str = "dr",               # "dr", "ipw", or "reg"
    alpha: float = 0.05,
    cluster: str | None = None,                  # Defaults to unit-level clustering
    n_bootstrap: int = 0,                        # 0 = analytical SEs, 999+ recommended
    bootstrap_weights: str | None = None,        # "rademacher", "mammen", or "webb"
    seed: int | None = None,
    rank_deficient_action: str = "warn",
    base_period: str = "varying",                # "varying" or "universal"
    cband: bool = True,                          # Simultaneous confidence bands
    pscore_trim: float = 0.01,                   # Propensity score trimming bound
)
```

**Alias:** `CS`

**fit() parameters:**

```python
cs.fit(
    data: pd.DataFrame,
    outcome: str,
    unit: str,
    time: str,
    first_treat: str,              # Column: first treatment period (0 or inf for never-treated)
    covariates: list[str] = None,
    aggregate: str = None,         # None, "simple", "event_study", "group", or "all"
    balance_e: int = None,         # Balance event study at this relative period
) -> CallawaySantAnnaResults
```

**Usage:**

```python
from diff_diff import CallawaySantAnna, plot_event_study

cs = CallawaySantAnna(estimation_method="dr", n_bootstrap=999, seed=42)
results = cs.fit(data, outcome='outcome', unit='unit', time='period',
                 first_treat='first_treat', aggregate='event_study')
results.print_summary()
plot_event_study(results)
```

### ChaisemartinDHaultfoeuille

de Chaisemartin & D'Haultfœuille (2020/2022) estimator for **non-absorbing (reversible) treatments**. The only library estimator that handles treatments which can switch on AND off over time. Ships `DID_M` (= `DID_1` at horizon `l = 1`) plus the full multi-horizon event study `DID_l` for `l = 1..L_max` from the dynamic companion paper (NBER WP 29873). Includes normalized estimator `DID^n_l`, cost-benefit aggregate `delta`, dynamic placebos `DID^{pl}_l`, and sup-t simultaneous confidence bands.

```python
ChaisemartinDHaultfoeuille(
    alpha: float = 0.05,
    cluster: str | None = None,                  # Must be None; non-None raises NotImplementedError
    n_bootstrap: int = 0,                        # 0 = analytical SE only
    bootstrap_weights: str = "rademacher",       # "rademacher", "mammen", or "webb"
    seed: int | None = None,
    placebo: bool = True,                        # Auto-compute single-lag placebo
    twfe_diagnostic: bool = True,                # Auto-compute Theorem 1 TWFE decomposition
    drop_larger_lower: bool = True,              # Drop multi-switch groups (matches R DIDmultiplegtDYN)
    by_path: int | None = None,                  # Top-k per-path event study; requires drop_larger_lower=False, L_max>=1; supports binary or integer-coded discrete D (D in Z); composes with survey_design (analytical TSL + replicate-weight; multiplier bootstrap n_bootstrap>0 still gated under survey) and heterogeneity (per-path predict_het, mirrors R did_multiplegt_dyn(..., by_path, predict_het); composes with placebo=True for per-path placebo predict_het on backward horizons — survey_design + placebo + heterogeneity warns and falls back to forward-horizon-only heterogeneity until the pre-period cell allocator is derived); mutex with paths_of_interest
    paths_of_interest: list[tuple[int, ...]] | None = None,  # User-specified path subset, alternative to by_path=k (Python-only API; mutex with by_path; composes with survey_design, heterogeneity, and placebo same as by_path=k)
    rank_deficient_action: str = "warn",         # Used by TWFE diagnostic OLS
)
```

**Alias:** `DCDH`

**fit() parameters:**

```python
est.fit(
    data: pd.DataFrame,
    outcome: str,
    group: str,                                  # Group identifier
    time: str,
    treatment: str,                              # Per-observation binary treatment or non-binary intensity
    # ---- multi-horizon ----
    L_max: int | None = None,                    # Max horizon; None = l=1 only
    # ---- covariates and extensions ----
    aggregate: str | None = None,                # Reserved; raises NotImplementedError
    controls: list[str] | None = None,           # DID^X residualization-style covariates
    trends_linear: bool | None = None,           # DID^{fd} group-specific linear trends
    trends_nonparam: Any | None = None,          # DID^s state-set-specific trends
    honest_did: bool = False,                    # HonestDiD sensitivity on placebos
    # ---- survey support ----
    survey_design: SurveyDesign | None = None,   # pweight (TSL or replicate BRR/Fay/JK1/JKn/SDR); n_bootstrap > 0 uses Hall-Mammen PSU multiplier
) -> ChaisemartinDHaultfoeuilleResults
```

`L_max` controls multi-horizon computation. `controls`, `trends_linear`, `trends_nonparam`, `honest_did`, `heterogeneity`, `design2`, and `survey_design` are all supported; only `aggregate` still raises `NotImplementedError`.

**Usage:**

```python
from diff_diff import ChaisemartinDHaultfoeuille
from diff_diff.prep import generate_reversible_did_data

data = generate_reversible_did_data(
    n_groups=80, n_periods=6, pattern="single_switch", seed=42,
)

est = ChaisemartinDHaultfoeuille()
results = est.fit(
    data, outcome="outcome", group="group",
    time="period", treatment="treatment",
)
results.print_summary()

# Decomposition
print(f"DID_M (overall):  {results.overall_att:.3f}")
print(f"DID_+ (joiners):  {results.joiners_att:.3f}")
print(f"DID_- (leavers):  {results.leavers_att:.3f}")
print(f"Placebo (DID^pl): {results.placebo_effect:.3f}")

# Multi-horizon event study
results = est.fit(data, outcome="outcome", group="group",
                  time="period", treatment="treatment", L_max=3)
for h in sorted(results.event_study_effects):
    e = results.event_study_effects[h]
    print(f"  DID_{h} = {e['effect']:.3f} (SE={e['se']:.3f})")
print(f"Cost-benefit delta: {results.cost_benefit_delta['delta']:.3f}")
df = results.to_dataframe("event_study")  # includes placebos as negative horizons
```

**Standalone TWFE diagnostic** (without fitting the full estimator):

```python
from diff_diff import twowayfeweights

diagnostic = twowayfeweights(
    data, outcome="outcome", group="group", time="period", treatment="treatment",
)
print(f"Plain TWFE coefficient: {diagnostic.beta_fe:.3f}")
print(f"Fraction of negative weights: {diagnostic.fraction_negative:.3f}")
print(f"sigma_fe (sign-flipping threshold): {diagnostic.sigma_fe:.3f}")
```

**Notes:**
- Validated against R `DIDmultiplegtDYN` v2.3.3 at horizon `l = 1` via `tests/test_chaisemartin_dhaultfoeuille_parity.py`
- Placebo SE contract: single-period placebo `DID_M^pl` (`L_max=None`) has `NaN` SE because the per-period aggregation path has no influence-function derivation; inference fields stay NaN-consistent **even when `n_bootstrap > 0`** for the single-period path (single-period bootstrap covers only `DID_M`, `DID_+`, and `DID_-`). Multi-horizon dynamic placebos `DID^{pl}_l` (`L_max >= 1`) have valid analytical SE via the placebo influence function (same cohort-recentered structure as positive horizons, applied to backward outcome differences), with bootstrap SE override when `n_bootstrap > 0` — bootstrap at `L_max >= 1` covers `DID_M`, `DID_+`, `DID_-`, per-horizon event-study effects (`event_study_effects[l]`), and placebo horizons (`placebo_event_study[-l]`), with shared weights across horizons for valid joint (sup-t) bands. This is a library extension beyond the dynamic companion paper, which states Theorem 1 variance for `DID_l` only.
- The analytical CI is conservative under Assumption 8 (independent groups) of the dynamic companion paper, exact only under iid sampling
- Survey design supported: pweight with strata/PSU/FPC via Taylor Series Linearization (analytical) or replicate-weight variance (BRR/Fay/JK1/JKn/SDR). Opt-in PSU-level Hall-Mammen wild bootstrap via `n_bootstrap > 0`. Replicate + `n_bootstrap > 0` rejected with `NotImplementedError` (replicate variance is closed-form)

### SunAbraham

Sun-Abraham (2021) interaction-weighted estimator for staggered DiD.

```python
SunAbraham(
    control_group: str = "never_treated",        # "never_treated" or "not_yet_treated"
    anticipation: int = 0,
    alpha: float = 0.05,
    cluster: str | None = None,                  # Defaults to unit-level clustering
    n_bootstrap: int = 0,                        # 0 = analytical cluster-robust SEs
    seed: int | None = None,
    rank_deficient_action: str = "warn",
)
```

**Alias:** `SA`

**fit() parameters:**

```python
sa.fit(
    data: pd.DataFrame,
    outcome: str,
    unit: str,
    time: str,
    first_treat: str,
    covariates: list[str] = None,
) -> SunAbrahamResults
```

**Usage:**

```python
from diff_diff import SunAbraham

sa = SunAbraham()
results = sa.fit(data, outcome='outcome', unit='unit',
                 time='period', first_treat='first_treat')
results.print_summary()
```

### ImputationDiD

Borusyak-Jaravel-Spiess (2024) imputation DiD estimator. Efficient estimator producing shorter CIs than CS/SA under homogeneous effects.

```python
ImputationDiD(
    anticipation: int = 0,
    alpha: float = 0.05,
    cluster: str | None = None,                  # Defaults to unit-level clustering
    n_bootstrap: int = 0,                        # 0 = analytical (Theorem 3 variance)
    bootstrap_weights: str = "rademacher",       # "rademacher", "mammen", or "webb"
    seed: int | None = None,
    rank_deficient_action: str = "warn",
    horizon_max: int | None = None,              # Max event-study horizon
    aux_partition: str = "cohort_horizon",        # "cohort_horizon", "cohort", or "horizon"
)
```

**Alias:** `BJS`

**fit() parameters:**

```python
imp.fit(
    data: pd.DataFrame,
    outcome: str,
    unit: str,
    time: str,
    first_treat: str,
    covariates: list[str] = None,
    aggregate: str = None,         # None, "simple", "event_study", "group", or "all"
    balance_e: int = None,
) -> ImputationDiDResults
```

**Usage:**

```python
from diff_diff import ImputationDiD, plot_event_study

est = ImputationDiD()
results = est.fit(data, outcome='outcome', unit='unit',
                  time='period', first_treat='first_treat',
                  aggregate='event_study')
results.print_summary()
plot_event_study(results)
```

### TwoStageDiD

Gardner (2022) two-stage DiD estimator. Point estimates match ImputationDiD; uses GMM sandwich variance.

```python
TwoStageDiD(
    anticipation: int = 0,
    alpha: float = 0.05,
    cluster: str | None = None,
    n_bootstrap: int = 0,
    bootstrap_weights: str = "rademacher",
    seed: int | None = None,
    rank_deficient_action: str = "warn",
    horizon_max: int | None = None,
)
```

**Alias:** `Gardner`

**fit() parameters:**

```python
ts.fit(
    data: pd.DataFrame,
    outcome: str,
    unit: str,
    time: str,
    first_treat: str,
    covariates: list[str] = None,
    aggregate: str = None,         # None, "simple", "event_study", "group", or "all"
    balance_e: int = None,
) -> TwoStageDiDResults
```

**Usage:**

```python
from diff_diff import TwoStageDiD

est = TwoStageDiD()
results = est.fit(data, outcome='outcome', unit='unit',
                  time='period', first_treat='first_treat')
results.print_summary()
```

### SpilloverDiD

Butts (2021) ring-indicator spillover-aware DiD. Augments two-stage Gardner with ring-indicator covariates that identify direct effect on treated (`tau_total`) and per-ring spillover effects on near-control units (`delta_j`). Handles non-staggered and staggered timing in one estimator. Recommends `vcov_type="conley"` with cutoff = `d_bar` (paper Section 3.1).

```python
SpilloverDiD(
    rings: list[float],                  # K+1 sorted breakpoints; K rings
    d_bar: float | None = None,          # Far-away cutoff (defaults to max(rings))
    vcov_type: str = "hc1",              # "hc1", "conley", or default cluster
    conley_coords: tuple[str, str] | None = None,  # (lat_col, lon_col), required
    conley_metric: str = "haversine",    # or "euclidean" / callable
    conley_cutoff_km: float | None = None,
    conley_lag_cutoff: int | None = None,
    cluster: str | None = None,
    alpha: float = 0.05,
    anticipation: int = 0,
    event_study: bool = False,           # Wave C: per-event-time × ring decomposition (Butts Table 2)
    horizon_max: int | None = None,      # Bin event-times outside [-H,+H] into endpoint pools (event-study mode); H>=1 or None — H=0 rejected (use event_study=False for aggregate spec)
    rank_deficient_action: str = "warn",
)
```

**fit() parameters:**

```python
sp.fit(
    data: pd.DataFrame,
    outcome: str,
    unit: str,
    time: str,
    treatment: str | None = None,        # binary D_it; auto-converted to first_treat
    first_treat: str | None = None,      # OR onset time per unit (Gardner)
    covariates: list[str] | None = None, # Deferred: NotImplementedError if non-None
    survey_design: object = None,        # Deferred: NotImplementedError if non-None
) -> SpilloverDiDResults
```

**Restrictions and Wave C/D status:**

- `covariates=` raises `NotImplementedError` (planned follow-up). Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates.
- `survey_design=` raises `NotImplementedError` (planned follow-up — SurveyDesign integration)
- `vcov_type="classical"` raises `NotImplementedError` (Wave D restriction). Wave D GMM first-stage correction has not been derived for the homoskedastic meat structure `sigma_hat^2 * (X_10' X_10)`. Use `vcov_type="hc1"`, `vcov_type="conley"`, or pair with `cluster=<col>` for CR1 — all three apply the Wave D GMM correction.
- `event_study=True` SHIPPED (Wave C): emits per-event-time `tau_k` and per-(ring, event-time) `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) plus MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias also emitted for `plot_event_study` consumption — `_extract_plot_data` prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic. (DiagnosticReport integration: NOT yet wired; queued as a follow-up.) (schema: `{k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}` mirroring `two_stage.py:1355-1389`). Reference period `ref_period = -1 - anticipation` (TwoStageDiD `two_stage.py:486` convention); reference row uses `coef=0.0, se=0.0, n_obs=0, conf_int=(0.0, 0.0)`. Scalar `att` field becomes a sample-share-weighted average of post-treatment `tau_k` (`att = sum_{k>=0} w_k * tau_k` with `w_k = n_treated_at_k / total`) with SE from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment vcov block — no separate fit. **Two-clock K_it:** direct-effect clock is `K_direct = t - effective_first_treat(i)` for ever-treated rows; spillover clock is `K_spill = t - earliest-in-range-cohort-onset(i)` (running min across activated cohorts, NaN pre-trigger). `K_spill >= 0` structurally; negative-k spillover cells are rectangularly emitted with `coef = NaN, n_obs = 0`. **`horizon_max` semantics:** bins event-times outside `[-H, +H]` into endpoint pools (no observations dropped — divergence from TwoStageDiD which filters; intentional, per `feedback_no_silent_failures`). With `horizon_max=None`, auto-detects bin set from observed K. **Validation:** `horizon_max < 0` raises `ValueError`; `ref_period < -horizon_max` (i.e., `anticipation > horizon_max - 1`) raises `ValueError` — silently floor-shifting the reference would change identification. **Reduce-to-aggregate:** under constant-tau DGP with `horizon_max=None`, the share-weighted scalar `att` reproduces Wave B's aggregate bit-identically. **Note:** `horizon_max=0` does NOT reduce to Wave B (binning collapses pre-treatment K values to `k=0`, making `D^0 = D_i` ever-treated indicator rather than `D_it`). Per-event-time SEs include the Wave D Gardner GMM first-stage correction (see next bullet).
- Stage-2 variance applies the Gardner GMM first-stage uncertainty correction across HC1 / Conley / cluster (Wave D, SHIPPED). The IF outer-product formula `psi_i = gamma_hat' X_{10,i} eps_{10,i} - X_{2,i} eps_{2,i}` is used unconditionally; kernel `K` is path-dependent (identity for HC1, block-indicator for cluster, spatial kernel for Conley). Documented synthesis of Butts (2021) §3.1 + Gardner (2022) §4 + Conley (1999); no reference software combines all three. Point estimates unchanged from Wave B/C; SE values shift upward by 1-few percent.
- Only nearest-treated rings supported; `ring_method="count"` (count of treated neighbors in ring) not yet exposed

**Usage:**

```python
from diff_diff import SpilloverDiD

est = SpilloverDiD(
    rings=[0, 50, 100, 200],
    conley_coords=("lat", "lon"),
    vcov_type="conley",
    conley_cutoff_km=200.0,
    conley_lag_cutoff=0,
)
results = est.fit(data, outcome="y", unit="unit", time="time", treatment="D")
print(f"tau_total = {results.att:.4f}")
print(results.spillover_effects)         # per-ring DataFrame
```

### SyntheticDiD

Synthetic Difference-in-Differences (Arkhangelsky et al. 2021). Combines DiD with synthetic control by re-weighting control units.

```python
SyntheticDiD(
    zeta_omega: float | None = None,        # Unit weight regularization (auto-computed if None)
    zeta_lambda: float | None = None,       # Time weight regularization (auto-computed if None)
    alpha: float = 0.05,
    variance_method: str = "placebo",       # "placebo", "bootstrap", or "jackknife"
    n_bootstrap: int = 200,                 # Replications for variance estimation
    seed: int | None = None,
)
```

**Alias:** `SDiD`

**fit() parameters:**

```python
sdid.fit(
    data: pd.DataFrame,
    outcome: str,
    treatment: str,
    unit: str,
    time: str,
    post_periods: list,
) -> SyntheticDiDResults
```

**Usage:**

```python
from diff_diff import SyntheticDiD

sdid = SyntheticDiD(seed=42)
results = sdid.fit(data, outcome='outcome', treatment='treated',
                   unit='unit', time='period', post_periods=[5, 6, 7, 8])
results.print_summary()
weights_df = results.get_unit_weights_df()
```

### TripleDifference

Triple Difference (DDD) estimator following Ortiz-Villavicencio & Sant'Anna (2025).

```python
TripleDifference(
    estimation_method: str = "dr",            # "dr", "reg", or "ipw"
    robust: bool = True,
    cluster: str | None = None,
    alpha: float = 0.05,
    pscore_trim: float = 0.01,
    rank_deficient_action: str = "warn",
)
```

**Alias:** `DDD`

**fit() parameters:**

```python
ddd.fit(
    data: pd.DataFrame,
    outcome: str,
    group: str,                # Treated group indicator (0/1)
    partition: str,            # Eligible partition indicator (0/1)
    time: str,                 # Post-treatment indicator (0/1)
    covariates: list[str] = None,
) -> TripleDifferenceResults
```

**Usage:**

```python
from diff_diff import TripleDifference

ddd = TripleDifference(estimation_method="dr")
results = ddd.fit(data, outcome='outcome', group='group',
                  partition='partition', time='post')
results.print_summary()
```

### ContinuousDiD

Continuous Difference-in-Differences estimator (Callaway, Goodman-Bacon & Sant'Anna 2024). Estimates dose-response curves ATT(d) and ACRT(d).

```python
ContinuousDiD(
    degree: int = 3,                          # B-spline degree (3 = cubic)
    num_knots: int = 0,                       # Interior knots
    dvals: np.ndarray | None = None,          # Custom dose evaluation grid
    control_group: str = "never_treated",     # "never_treated" or "not_yet_treated"
    anticipation: int = 0,
    base_period: str = "varying",             # "varying" or "universal"
    alpha: float = 0.05,
    n_bootstrap: int = 0,
    bootstrap_weights: str = "rademacher",
    seed: int | None = None,
    rank_deficient_action: str = "warn",
)
```

**Alias:** `CDiD`

**fit() parameters:**

```python
cdid.fit(
    data: pd.DataFrame,
    outcome: str,
    unit: str,
    time: str,
    first_treat: str,
    dose: str,                     # Column with continuous treatment dose
    aggregate: str = None,         # None, "dose", "eventstudy"
) -> ContinuousDiDResults
```

**Usage:**

```python
from diff_diff import ContinuousDiD

est = ContinuousDiD(n_bootstrap=199, seed=42)
results = est.fit(data, outcome='outcome', unit='unit', time='period',
                  first_treat='first_treat', dose='dose', aggregate='dose')
results.print_summary()
```

### HeterogeneousAdoptionDiD

HeterogeneousAdoption DiD estimator (de Chaisemartin, Ciccia, D'Haultfœuille & Knau 2026). Targets a Weighted Average Slope (WAS) at the dose support boundary on **Heterogeneous Adoption Designs** — designs where treatment varies in dose intensity across units. Comparison comes from dose variation across units. The estimator does NOT require dropping never-treated units: a small share of never-treated units is fully compatible (paper edge case — Garrett et al. 2020 retained 12 untreated counties out of 2,954), and on staggered event-study panels never-treated units are explicitly retained as the untreated-group comparison (paper Appendix B.2). Uses a bias-corrected local-linear estimator at the dose support boundary on continuous-dose designs (Design 1' / Design 1) and a 2SLS Wald-IV estimator on the mass-point design.

```python
HeterogeneousAdoptionDiD(
    design: str = "auto",              # "auto" / "continuous_at_zero" / "continuous_near_d_lower" / "mass_point"
    d_lower: float | None = None,      # Support infimum; auto-detected when None
    kernel: str = "epanechnikov",      # Local-linear kernel
    alpha: float = 0.05,
    vcov_type: str | None = None,      # Mass-point only: "classical" (default) or "hc1"
    robust: bool = False,              # Mass-point only: HC1 robust SE shorthand
    cluster: str | None = None,        # Mass-point only: cluster column for CR1 cluster-robust SE
    n_bootstrap: int = 999,            # Multiplier-bootstrap iterations for sup-t bands (event-study + weighted)
    seed: int | None = None,
)
```

**Alias:** `HAD`

**fit() parameters:**

```python
had.fit(
    data: pd.DataFrame,
    outcome_col: str,
    dose_col: str,
    time_col: str,
    unit_col: str,
    first_treat_col: str | None = None,        # Required on staggered panels (last-cohort auto-filter trigger)
    aggregate: str = "overall",                # "overall" (single scalar WAS) or "event_study" (per-horizon WAS)
    survey: SurveyDesign | None = None,        # DEPRECATED alias of survey_design=
    weights: np.ndarray | None = None,         # DEPRECATED pweight shortcut alias
    cband: bool = True,                        # Simultaneous (sup-t) confidence bands on weighted event-study fits
    *,
    survey_design: SurveyDesign | None = None, # Canonical survey-design kwarg (weights, strata, PSU, FPC)
    trends_lin: bool = False,                  # Eq 17 linear-trend detrending. Requires aggregate="event_study"; needs F>=3 (pre-period depth) for the regression; rejects ALL weighting entry paths (survey_design= / survey= / weights= all raise NotImplementedError under trends_lin).
) -> HeterogeneousAdoptionDiDResults | HeterogeneousAdoptionDiDEventStudyResults
```

**Usage:**

```python
from diff_diff import HeterogeneousAdoptionDiD, did_had_pretest_workflow

# `aggregate="overall"` (the default) is two-period-only: did_had_pretest_workflow
# reduces to a single first-difference and HeterogeneousAdoptionDiD.fit hard-rejects
# panels with more than two periods. `aggregate="event_study"` requires a
# multi-period panel. Use distinct data objects for the two regimes.

# Vet the testable identifying assumptions on the two-period panel first:
report = did_had_pretest_workflow(
    data_2p, outcome_col='y', unit_col='unit', time_col='t',
    dose_col='d', first_treat_col='first_treat')
print(report.summary())

# Single-period scalar WAS (aggregate="overall" default) on the two-period panel:
est = HeterogeneousAdoptionDiD()
results = est.fit(data_2p, outcome_col='y', unit_col='unit',
                  time_col='t', dose_col='d',
                  first_treat_col='first_treat')
print(results.summary())

# Multi-period per-horizon WAS on the multi-period panel:
es = est.fit(data_mp, outcome_col='y', unit_col='unit',
             time_col='t', dose_col='d',
             first_treat_col='first_treat',
             aggregate='event_study')
```

**Staggered panels.** On multi-cohort panels with `aggregate="event_study"`, `fit()` auto-filters to the last treatment cohort plus never-treated units (paper Appendix B.2) and emits a `UserWarning` naming kept/dropped counts. The estimand is then a **last-cohort-only WAS**, not a multi-cohort average. For full multi-cohort staggered support, see `ChaisemartinDHaultfoeuille`.

**Mass-point + survey constraint.** When fitting `design="mass_point"` with `survey_design=` (or the deprecated `survey=` alias), `vcov_type="hc1"` (or `robust=True`) is required: the survey path composes the standard error via Binder-TSL on the HC1-scale influence function, so the default classical sandwich path raises `NotImplementedError`. The same HC1 requirement also fires on the `weights=` shortcut when `aggregate="event_study"` AND `cband=True`: the per-horizon IF matrix is HC1-scale and the sup-t bootstrap normalizes by it, so mixing in a classical analytical SE would produce inconsistent variance families. Classical vcov is allowed on `weights=` + `aggregate="overall"` and on `weights=` + `aggregate="event_study"` + `cband=False`. Passing `vcov_type="hc1"` is a safe default on weighted survey + sup-t examples since `vcov_type` is unused on the continuous designs (CCT-2014 robust SE is the only formula there).

### StackedDiD

Stacked DiD estimator (Wing, Freedman & Hollingsworth 2024). Addresses TWFE bias with corrective Q-weights.

```python
StackedDiD(
    kappa_pre: int = 1,                       # Pre-treatment event-time periods
    kappa_post: int = 1,                      # Post-treatment event-time periods
    weighting: str = "aggregate",             # "aggregate", "population", or "sample_share"
    clean_control: str = "not_yet_treated",   # "not_yet_treated", "strict", or "never_treated"
    cluster: str = "unit",                    # "unit" or "unit_subexp"
    alpha: float = 0.05,
    anticipation: int = 0,
    rank_deficient_action: str = "warn",
)
```

**Alias:** `Stacked`

**fit() parameters:**

```python
stacked.fit(
    data: pd.DataFrame,
    outcome: str,
    unit: str,
    time: str,
    first_treat: str,
    aggregate: str = None,         # None, "simple", or "event_study"
    population: str = None,        # Required when weighting="population"
) -> StackedDiDResults
```

**Usage:**

```python
from diff_diff import StackedDiD, plot_event_study

est = StackedDiD(kappa_pre=2, kappa_post=2)
results = est.fit(data, outcome='outcome', unit='unit',
                  time='period', first_treat='first_treat',
                  aggregate='event_study')
results.print_summary()
plot_event_study(results)
```

### EfficientDiD

Efficient DiD estimator (Chen, Sant'Anna & Xie 2025). Achieves the semiparametric efficiency bound for ATT(g,t) on the **no-covariate path**. Also supports an optional doubly-robust covariate path (sieve-based propensity score ratios + linear OLS outcome regression): the DR property gives consistency if either the OR or the PS is correctly specified, but the linear OLS outcome regression does not generically attain the efficiency bound unless the conditional mean is linear in the covariates. Pass column names to the `covariates` parameter on `fit()`.

```python
EfficientDiD(
    pt_assumption: str = "all",              # "all" (overidentified) or "post" (just-identified)
    alpha: float = 0.05,
    cluster: str | None = None,              # Column name for cluster-robust SEs (Liang-Zeger on EIF values); cluster-level multiplier bootstrap when n_bootstrap > 0
    n_bootstrap: int = 0,                    # Multiplier bootstrap iterations
    bootstrap_weights: str = "rademacher",   # "rademacher", "mammen", or "webb"
    seed: int | None = None,
    anticipation: int = 0,
)
```

**Alias:** `EDiD`

**fit() parameters:**

```python
edid.fit(
    data: pd.DataFrame,
    outcome: str,
    unit: str,
    time: str,
    first_treat: str,
    covariates: list[str] = None,  # Time-invariant unit-level covariates; uses doubly-robust sieve path when non-None
    aggregate: str = None,         # None, "simple", "event_study", "group", or "all"
    balance_e: int = None,
) -> EfficientDiDResults
```

**Usage:**

```python
from diff_diff import EfficientDiD

edid = EfficientDiD(pt_assumption="all")
results = edid.fit(data, outcome='y', unit='id', time='t',
                   first_treat='first_treat', aggregate='all')
results.print_summary()
```

### TROP

Triply Robust Panel estimator (Athey, Imbens, Qu & Viviano 2025). Combines nuclear norm regularization, distance-based unit weights, and time decay weights.

```python
TROP(
    method: str = "local",                     # "local" or "global"
    lambda_time_grid: list[float] = None,     # Time weight decay grid [0, 0.1, 0.5, 1, 2, 5]
    lambda_unit_grid: list[float] = None,     # Unit weight decay grid [0, 0.1, 0.5, 1, 2, 5]
    lambda_nn_grid: list[float] = None,       # Nuclear norm grid [0, 0.01, 0.1, 1, 10]
    max_iter: int = 100,
    tol: float = 1e-6,
    alpha: float = 0.05,
    n_bootstrap: int = 200,
    seed: int | None = None,
)
```

**fit() parameters:**

```python
trop.fit(
    data: pd.DataFrame,
    outcome: str,
    treatment: str,                # Absorbing-state treatment indicator (0/1). Must be 0 for all pre-treatment periods and 1 for treatment and post-treatment periods.
    unit: str,
    time: str,
) -> TROPResults
```

**Usage:**

```python
from diff_diff import TROP

trop = TROP(method='local', seed=42)
results = trop.fit(data, outcome='outcome', treatment='treated',
                   unit='unit', time='period')
results.print_summary()
```

### BaconDecomposition

Goodman-Bacon (2021) decomposition of TWFE into 2x2 DiD comparisons.

```python
BaconDecomposition(
    weights: str = "approximate",   # "approximate" or "exact"
)
```

**Alias:** `Bacon`

**fit() parameters:**

```python
bacon.fit(
    data: pd.DataFrame,
    outcome: str,
    unit: str,
    time: str,
    first_treat: str,
) -> BaconDecompositionResults
```

**Usage:**

```python
from diff_diff import BaconDecomposition, plot_bacon

bacon = BaconDecomposition(weights="exact")
results = bacon.fit(data, outcome='outcome', unit='unit',
                    time='period', first_treat='first_treat')
results.print_summary()
plot_bacon(results)
```

### StaggeredTripleDifference

Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD estimator for designs with two eligibility criteria and staggered treatment timing.

```python
StaggeredTripleDifference(
    estimation_method: str = "dr",          # "dr", "ipw", or "reg"
    control_group: str = "notyettreated",   # "nevertreated" or "notyettreated"
    alpha: float = 0.05,
    anticipation: int = 0,
    base_period: str = "varying",           # "varying" or "universal"
    n_bootstrap: int = 0,
    bootstrap_weights: str = "rademacher",
    seed: int | None = None,
    cband: bool = True,
    pscore_trim: float = 0.01,
    cluster: str | None = None,
    rank_deficient_action: str = "warn",
    epv_threshold: float = 10,              # Min events-per-variable for propensity score
    pscore_fallback: str = "error",         # "error" or "unconditional"
)
```

**Alias:** `SDDD`

**fit() parameters:**

```python
sddd.fit(
    data: pd.DataFrame,
    outcome: str,
    unit: str,
    time: str,
    first_treat: str,
    eligibility: str,               # Binary eligibility column (0/1)
    covariates: list[str] | None = None,
    aggregate: str | None = None,   # None, "simple", "group", "event_study", "all"
    balance_e: int | None = None,   # Max event time for balanced event study
    survey_design=None,
) -> StaggeredTripleDiffResults
```

**Usage:**

```python
from diff_diff import StaggeredTripleDifference

sddd = StaggeredTripleDifference(estimation_method='dr', n_bootstrap=999)
results = sddd.fit(data, outcome='y', unit='id', time='t',
                   first_treat='ft', eligibility='eligible',
                   aggregate='event_study')
results.print_summary()
```

### WooldridgeDiD

Wooldridge (2023, 2025) Extended Two-Way Fixed Effects (ETWFE) estimator. OLS path uses direct saturated-regression coefficients for ATT(g,t). Logit and Poisson QMLE paths use Average Structural Function (ASF) based ATT with delta-method standard errors.

```python
WooldridgeDiD(
    method: str = "ols",                    # "ols", "logit", or "poisson"
    control_group: str = "not_yet_treated", # "not_yet_treated" or "never_treated"
    anticipation: int = 0,                  # Number of anticipation periods
    demean_covariates: bool = True,         # Demean covariates within cohort*period cells
    alpha: float = 0.05,
    cluster: str | None = None,
    n_bootstrap: int = 0,
    bootstrap_weights: str = "rademacher",
    seed: int | None = None,
    rank_deficient_action: str = "warn",
)
```

**Alias:** `ETWFE`

**fit() parameters:**

```python
etwfe.fit(
    data: pd.DataFrame,
    outcome: str,
    unit: str,
    time: str,
    cohort: str,                    # First treatment period (0 or NaN = never treated)
    exovar: list[str] | None = None,  # Time-invariant covariates (no interaction/demeaning)
    xtvar: list[str] | None = None,   # Time-varying covariates (demeaned within cohort*period)
    xgvar: list[str] | None = None,   # Covariates interacted with each cohort indicator
) -> WooldridgeDiDResults
```

**Aggregation types:**

```python
results.aggregate("simple")    # Overall ATT
results.aggregate("group")     # ATT by cohort
results.aggregate("calendar")  # ATT by calendar period
results.aggregate("event")     # ATT by event time (relative to treatment)
```

**Usage:**

```python
from diff_diff import WooldridgeDiD

# OLS (linear outcomes)
etwfe = WooldridgeDiD(method='ols')
results = etwfe.fit(data, outcome='y', unit='id', time='t', cohort='first_treat')
print(results.aggregate("simple"))

# Logit (binary outcomes)
etwfe_logit = WooldridgeDiD(method='logit')
results = etwfe_logit.fit(data, outcome='y_bin', unit='id', time='t', cohort='first_treat')

# Poisson (count outcomes)
etwfe_pois = WooldridgeDiD(method='poisson')
results = etwfe_pois.fit(data, outcome='y_count', unit='id', time='t', cohort='first_treat')
```

### Convenience Functions

```python
# Functional interfaces (create estimator + call fit in one step)
from diff_diff import imputation_did, two_stage_did, triple_difference, stacked_did, trop, bacon_decompose

results = imputation_did(data, outcome='y', unit='id', time='t', first_treat='ft')
results = two_stage_did(data, outcome='y', unit='id', time='t', first_treat='ft')
results = triple_difference(data, outcome='y', group='g', partition='p', time='t')
results = stacked_did(data, outcome='y', unit='id', time='t', first_treat='ft',
                      kappa_pre=2, kappa_post=2)
results = trop(data, outcome='y', treatment='d', unit='id', time='t')
results = bacon_decompose(data, outcome='y', unit='id', time='t', first_treat='ft')
```

## Results Objects

**Flat-alias compatibility note.** Every staggered result class in this
section (those with canonical `overall_*` / `overall_att_*` / `avg_*`
prefixed inference fields) ALSO exposes the unprefixed flat names
`att` / `se` / `conf_int` / `p_value` / `t_stat` as read-only `@property`
aliases over the canonical fields. The canonical prefixed fields remain
the documented and computed surface; the flat aliases are pure
read-throughs for compatibility with external adapters that
`getattr(res, "se", None)`-style query the inference surface (e.g.
`balance.interop.diff_diff.as_balance_diagnostic()`). Tables below list
the canonical names; assume the flat aliases are present on every
staggered class unless explicitly noted otherwise.

### DiDResults

Returned by `DifferenceInDifferences.fit()` and `TwoWayFixedEffects.fit()`.

| Attribute | Type | Description |
|-----------|------|-------------|
| `att` | `float` | Average Treatment effect on the Treated |
| `se` | `float` | Standard error of ATT |
| `t_stat` | `float` | T-statistic |
| `p_value` | `float` | P-value (H0: ATT = 0) |
| `conf_int` | `tuple[float, float]` | Confidence interval |
| `n_obs` | `int` | Number of observations |
| `n_treated` | `int` | Number of treated units |
| `n_control` | `int` | Number of control units |
| `alpha` | `float` | Significance level |
| `coefficients` | `dict` | All regression coefficients |
| `vcov` | `np.ndarray` | Variance-covariance matrix |
| `residuals` | `np.ndarray` | Regression residuals |
| `fitted_values` | `np.ndarray` | Fitted values |
| `r_squared` | `float` | R-squared |
| `inference_method` | `str` | "analytical" or "wild_bootstrap" |
| `n_bootstrap` | `int` | Number of bootstrap replications |
| `n_clusters` | `int` | Number of clusters |
| `bootstrap_distribution` | `np.ndarray` | Bootstrap ATT distribution |

**Methods:** `summary(alpha=None)`, `print_summary()`, `to_dict()`, `to_dataframe()`

**Properties:** `is_significant`, `significance_stars`

### MultiPeriodDiDResults

Returned by `MultiPeriodDiD.fit()`.

| Attribute | Type | Description |
|-----------|------|-------------|
| `period_effects` | `dict[Any, PeriodEffect]` | Period-specific effects (pre and post) |
| `avg_att` | `float` | Average ATT across post-periods |
| `avg_se` | `float` | SE of average ATT |
| `avg_t_stat` | `float` | T-statistic for average ATT |
| `avg_p_value` | `float` | P-value for average ATT |
| `avg_conf_int` | `tuple[float, float]` | CI for average ATT |
| `n_obs` | `int` | Number of observations |
| `n_treated` | `int` | Number of treated observations |
| `n_control` | `int` | Number of control observations |
| `pre_periods` | `list` | Pre-treatment period identifiers |
| `post_periods` | `list` | Post-treatment period identifiers |
| `reference_period` | `Any` | Reference (omitted) period |
| `r_squared` | `float` | R-squared |
| `vcov` | `np.ndarray` | Variance-covariance matrix |
| `interaction_indices` | `dict` | Period to VCV column index mapping |

**Methods:** `summary()`, `print_summary()`, `get_effect(period)`, `to_dict()`, `to_dataframe()`

**Properties:** `pre_period_effects`, `post_period_effects`, `is_significant`, `significance_stars`

### PeriodEffect

Individual period treatment effect (used in MultiPeriodDiDResults).

| Attribute | Type | Description |
|-----------|------|-------------|
| `period` | `Any` | Time period identifier |
| `effect` | `float` | Treatment effect estimate |
| `se` | `float` | Standard error |
| `t_stat` | `float` | T-statistic |
| `p_value` | `float` | P-value |
| `conf_int` | `tuple[float, float]` | Confidence interval |

**Properties:** `is_significant`, `significance_stars`

### CallawaySantAnnaResults

Returned by `CallawaySantAnna.fit()`.

| Attribute | Type | Description |
|-----------|------|-------------|
| `group_time_effects` | `dict[(g,t), GroupTimeEffect]` | ATT(g,t) for each (group, time) |
| `overall_att` | `float` | Overall ATT |
| `overall_se` | `float` | SE of overall ATT |
| `overall_t_stat` | `float` | T-statistic |
| `overall_p_value` | `float` | P-value |
| `overall_conf_int` | `tuple[float, float]` | CI for overall ATT |
| `groups` | `list` | Treatment cohorts |
| `time_periods` | `list` | All time periods |
| `n_obs` | `int` | Number of observations |
| `event_study_effects` | `dict[int, dict]` | Event study effects by relative time |
| `group_effects` | `dict` | Group-level aggregated effects |

**Methods:** `summary()`, `print_summary()`, `to_dataframe(level="event_study"|"group_time"|"group")`

### SunAbrahamResults

Returned by `SunAbraham.fit()`.

| Attribute | Type | Description |
|-----------|------|-------------|
| `event_study_effects` | `dict[int, dict]` | Effects by relative time |
| `overall_att` | `float` | Overall ATT |
| `overall_se` | `float` | SE of overall ATT |
| `overall_t_stat` | `float` | T-statistic |
| `overall_p_value` | `float` | P-value |
| `overall_conf_int` | `tuple[float, float]` | CI |
| `cohort_weights` | `dict[int, dict]` | Interaction weights per period |
| `groups` | `list` | Treatment cohorts |
| `n_obs` | `int` | Number of observations |
| `n_treated_units` | `int` | Number of ever-treated units |
| `n_control_units` | `int` | Number of never-treated units |
| `control_group` | `str` | Control group type used |
| `cohort_effects` | `dict` | Cohort-level effects |

**Methods:** `summary()`, `print_summary()`, `to_dataframe(level="event_study"|"cohort")`

### SyntheticDiDResults

Returned by `SyntheticDiD.fit()`.

| Attribute | Type | Description |
|-----------|------|-------------|
| `att` | `float` | Average Treatment effect on the Treated |
| `se` | `float` | Standard error (bootstrap or placebo-based) |
| `t_stat` | `float` | T-statistic |
| `p_value` | `float` | P-value |
| `conf_int` | `tuple[float, float]` | Confidence interval |
| `n_obs` | `int` | Number of observations |
| `n_treated` | `int` | Number of treated units |
| `n_control` | `int` | Number of control units |
| `unit_weights` | `dict` | Control unit synthetic weights |
| `time_weights` | `dict` | Pre-treatment time weights |
| `pre_periods` | `list` | Pre-treatment periods |
| `post_periods` | `list` | Post-treatment periods |
| `variance_method` | `str` | "bootstrap", "jackknife", or "placebo" |
| `noise_level` | `float` | Estimated noise level |
| `zeta_omega` | `float` | Unit weight regularization |
| `zeta_lambda` | `float` | Time weight regularization |
| `pre_treatment_fit` | `float` | Pre-treatment RMSE |

**Methods:** `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`, `get_unit_weights_df()`, `get_time_weights_df()`

**Validation diagnostics** (call after `fit()`):
- `get_weight_concentration(top_k=5)` - effective N and top-k weight share; flags fragile synthetic controls dominated by a few donor units
- `get_loo_effects_df()` - per-unit leave-one-out influence from the jackknife pass (DataFrame includes both control and treated rows). Requires `variance_method="jackknife"` with unit-level LOO granularity: available on non-survey and pweight-only jackknife fits; raises `NotImplementedError` on full-design survey jackknife (PSU-level LOO, see `result.placebo_effects` for raw PSU-level replicates) and `ValueError` when LOO is unavailable (single treated unit, only one control with nonzero effective weight, etc.)
- `in_time_placebo()` - re-estimate on shifted fake treatment dates in the pre-period; near-zero placebo ATTs indicate a credible design
- `sensitivity_to_zeta_omega()` - re-estimate across a grid of unit-weight regularization values; checks ATT robustness to the auto-selected zeta_omega

### TripleDifferenceResults

Returned by `TripleDifference.fit()`.

| Attribute | Type | Description |
|-----------|------|-------------|
| `att` | `float` | ATT estimate |
| `se` | `float` | Standard error |
| `t_stat` | `float` | T-statistic |
| `p_value` | `float` | P-value |
| `conf_int` | `tuple[float, float]` | Confidence interval |
| `n_obs` | `int` | Total observations |
| `n_treated_eligible` | `int` | Treated + eligible count |
| `n_treated_ineligible` | `int` | Treated + ineligible count |
| `n_control_eligible` | `int` | Control + eligible count |
| `n_control_ineligible` | `int` | Control + ineligible count |
| `estimation_method` | `str` | "dr", "reg", or "ipw" |
| `group_means` | `dict` | Cell means |
| `pscore_stats` | `dict` | Propensity score diagnostics |
| `r_squared` | `float` | R-squared (for "reg") |

**Methods:** `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`

### BaconDecompositionResults

Returned by `BaconDecomposition.fit()` and `bacon_decompose()`.

| Attribute | Type | Description |
|-----------|------|-------------|
| `twfe_estimate` | `float` | Overall TWFE coefficient |
| `comparisons` | `list[Comparison2x2]` | All 2x2 comparisons |
| `total_weight_treated_vs_never` | `float` | Weight on treated vs never-treated |
| `total_weight_earlier_vs_later` | `float` | Weight on earlier vs later |
| `total_weight_later_vs_earlier` | `float` | Weight on forbidden comparisons |
| `weighted_avg_treated_vs_never` | `float` | Avg effect from clean comparisons |
| `weighted_avg_earlier_vs_later` | `float` | Avg effect from earlier vs later |
| `weighted_avg_later_vs_earlier` | `float` | Avg effect from forbidden comparisons |
| `n_timing_groups` | `int` | Number of treatment timing groups |
| `n_never_treated` | `int` | Number of never-treated units |
| `timing_groups` | `list` | Treatment timing cohorts |
| `n_obs` | `int` | Total observations |
| `decomposition_error` | `float` | Error: TWFE minus weighted sum |

**Methods:** `summary()`, `print_summary()`, `to_dataframe()`

### Comparison2x2

Individual 2x2 DiD comparison (used in BaconDecompositionResults).

| Attribute | Type | Description |
|-----------|------|-------------|
| `treated_group` | `Any` | Timing group used as treated |
| `control_group` | `Any` | Timing group used as control |
| `comparison_type` | `str` | "treated_vs_never", "earlier_vs_later", or "later_vs_earlier" |
| `estimate` | `float` | 2x2 DiD estimate |
| `weight` | `float` | Weight in TWFE average |
| `n_treated` | `int` | Number of treated observations |
| `n_control` | `int` | Number of control observations |
| `time_window` | `tuple[float, float]` | (start, end) time window |

### Common Results Pattern for Staggered Estimators

ImputationDiDResults, TwoStageDiDResults, StackedDiDResults, and EfficientDiDResults share a similar structure:

| Attribute | Type | Description |
|-----------|------|-------------|
| `overall_att` | `float` | Overall ATT |
| `overall_se` | `float` | SE of overall ATT |
| `overall_t_stat` | `float` | T-statistic |
| `overall_p_value` | `float` | P-value |
| `overall_conf_int` | `tuple[float, float]` | CI |
| `event_study_effects` | `dict[int, dict]` | Event study effects (if aggregate includes event_study) |
| `group_effects` | `dict` | Group-level effects (if aggregate includes group) |
| `groups` | `list` | Treatment cohorts |
| `time_periods` | `list` | All time periods |
| `n_obs` | `int` | Number of observations |
| `n_treated_units` | `int` | Number of treated units |
| `n_control_units` | `int` | Number of control units |

Each event study effect dict contains: `effect`, `se`, `t_stat`, `p_value`, `conf_int`, `n_obs` (or `n_groups`).

**Methods:** `summary()`, `print_summary()`, `to_dataframe()`

### ContinuousDiDResults

| Attribute | Type | Description |
|-----------|------|-------------|
| `dose_response_att` | `DoseResponseCurve` | Dose-response curve for ATT |
| `dose_response_acrt` | `DoseResponseCurve` | Dose-response curve for ACRT |
| `overall_att` | `float` | Overall ATT |
| `overall_att_se` | `float` | SE of overall ATT |
| `overall_att_t_stat` | `float` | T-statistic for ATT |
| `overall_att_p_value` | `float` | P-value for ATT |
| `overall_att_conf_int` | `tuple[float, float]` | CI for ATT |
| `overall_acrt` | `float` | Overall ACRT |
| `overall_acrt_se` | `float` | SE of overall ACRT |
| `overall_acrt_t_stat` | `float` | T-statistic for ACRT |
| `overall_acrt_p_value` | `float` | P-value for ACRT |
| `overall_acrt_conf_int` | `tuple[float, float]` | CI for ACRT |
| `group_time_effects` | `dict[tuple, dict]` | Group-time level effects |
| `dose_grid` | `np.ndarray` | Evaluation grid for dose-response |
| `groups` | `list` | Treatment cohorts |
| `time_periods` | `list` | All time periods |
| `n_obs` | `int` | Number of observations |
| `n_treated_units` | `int` | Treated units |
| `n_control_units` | `int` | Control units |
| `event_study_effects` | `dict[int, dict] or None` | Event study effects (if `aggregate="eventstudy"`) |

**DoseResponseCurve** sub-dataclass:

| Attribute | Type | Description |
|-----------|------|-------------|
| `dose_grid` | `np.ndarray` | Dose values |
| `effects` | `np.ndarray` | Estimated effects at each dose |
| `se` | `np.ndarray` | Standard errors |
| `conf_int_lower` | `np.ndarray` | Lower CI bound |
| `conf_int_upper` | `np.ndarray` | Upper CI bound |
| `target` | `str` | `"att"` or `"acrt"` |

**Methods:** `summary()`, `print_summary()`, `to_dataframe()`

### HeterogeneousAdoptionDiDResults

Single-period results container for `HeterogeneousAdoptionDiD`. The table below enumerates every public dataclass field; a regression test in `tests/test_guides.py` (`test_llms_full_had_results_class_field_lists_match_real_dataclass`) compares this list against the real `dataclasses.fields()` of the result class.

| Attribute | Type | Description |
|-----------|------|-------------|
| `att` | `float` | Point estimate of the WAS parameter on the β-scale |
| `se` | `float` | Standard error on the β-scale |
| `t_stat` | `float` | T-statistic |
| `p_value` | `float` | P-value |
| `conf_int` | `tuple[float, float]` | Confidence interval |
| `alpha` | `float` | CI level used at fit time |
| `design` | `str` | Resolved design: `"continuous_at_zero"`, `"continuous_near_d_lower"`, or `"mass_point"` |
| `target_parameter` | `str` | `"WAS"` (Design 1') or `"WAS_d_lower"` (Design 1 / mass-point) |
| `d_lower` | `float` | Support infimum (`0.0` on Design 1', `min(d)` otherwise) |
| `dose_mean` | `float` | `D_bar = (1/G) * sum(D_{g,2})` |
| `n_obs` | `int` | Units contributing to estimation |
| `n_treated` | `int` | Units with `D > d_lower` |
| `n_control` | `int` | Units at or below `d_lower` |
| `n_mass_point` | `int | None` | Mass-point design only: units exactly at `d_lower`; `None` on continuous designs |
| `n_above_d_lower` | `int | None` | Mass-point design only: units strictly above `d_lower`; `None` on continuous designs |
| `inference_method` | `str` | `"analytical_nonparametric"` or `"analytical_2sls"` |
| `vcov_type` | `str | None` | Mass-point only: `"classical"`, `"hc1"`, or `"cr1"` |
| `cluster_name` | `str | None` | Cluster column name when CR1 cluster-robust SE is requested; `None` otherwise |
| `survey_metadata` | `SurveyMetadata | None` | Repo-standard survey metadata when `survey_design=` / `weights=` is supplied |
| `bandwidth_diagnostics` | `BandwidthResult | None` | MSE-DPI selector output (continuous designs); `None` on `mass_point` |
| `bias_corrected_fit` | `BiasCorrectedFit | None` | Phase 1c bias-corrected local-linear fit object (continuous designs); `None` on `mass_point` |
| `variance_formula` | `str | None` | HAD-specific SE label on weighted fits, populated on BOTH continuous and mass-point designs: `"pweight"` (continuous, CCT 2014 weighted-robust on the `weights=` shortcut), `"survey_binder_tsl"` (continuous, Binder 1983 TSL on the `survey_design=` path), `"pweight_2sls"` (mass-point + `weights=`; label is applied uniformly across vcov families — classical / HC1 / CR1 — on the weighted 2SLS path, with the actual sandwich resolved via `vcov_type`), or `"survey_binder_tsl_2sls"` (mass-point, Binder 1983 TSL on the `survey_design=` path). `None` on unweighted fits |
| `effective_dose_mean` | `float | None` | Weighted denominator used by the β̂-scale rescaling, populated on weighted fits across all designs: weighted `mean(d)` (`continuous_at_zero`), weighted `mean(d − d_lower)` (`continuous_near_d_lower`), or weighted Wald-IV dose gap `mean(d | Z=1, w) − mean(d | Z=0, w)` (`mass_point`). `None` on unweighted fits |

**Methods:** `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`

### HeterogeneousAdoptionDiDEventStudyResults

Per-horizon event-study results container for `HeterogeneousAdoptionDiD` with `aggregate="event_study"`. The anchor horizon `e = -1` is excluded by construction. The table below enumerates every public dataclass field; a regression test (`test_llms_full_had_results_class_field_lists_match_real_dataclass`) compares this list against the real `dataclasses.fields()`.

| Attribute | Type | Description |
|-----------|------|-------------|
| `event_times` | `np.ndarray` | Integer event-time labels `e = t - F`, sorted ascending |
| `att` | `np.ndarray` | Per-horizon WAS point estimates |
| `se` | `np.ndarray` | Per-horizon standard errors |
| `t_stat` | `np.ndarray` | Per-horizon t-statistics |
| `p_value` | `np.ndarray` | Per-horizon p-values |
| `conf_int_low` | `np.ndarray` | Pointwise CI lower bounds |
| `conf_int_high` | `np.ndarray` | Pointwise CI upper bounds |
| `n_obs_per_horizon` | `np.ndarray` | Per-horizon contributing-unit counts |
| `alpha` | `float` | CI level used at fit time |
| `design` | `str` | Shared across horizons (paper Appendix B.2 invariant) |
| `target_parameter` | `str` | Same convention as the single-period result |
| `d_lower` | `float` | Support infimum, shared across horizons |
| `dose_mean` | `float` | `D_bar` on the fit sample |
| `F` | `object` | First-treatment period label |
| `n_units` | `int` | Unique units contributing to the fit (post last-cohort filter) |
| `inference_method` | `str` | `"analytical_nonparametric"` or `"analytical_2sls"` |
| `vcov_type` | `str | None` | Mass-point only: `"classical"`, `"hc1"`, or `"cr1"`; `None` on continuous designs |
| `cluster_name` | `str | None` | Cluster column name when CR1 is requested; `None` otherwise |
| `survey_metadata` | `SurveyMetadata | None` | Populated on weighted fits |
| `bandwidth_diagnostics` | `list[BandwidthResult | None] | None` | Per-horizon MSE-DPI selector output (continuous designs); `None` on `mass_point`; entries can be `None` on degenerate horizons |
| `bias_corrected_fit` | `list[BiasCorrectedFit | None] | None` | Per-horizon Phase 1c bias-corrected local-linear fit objects; `None` on `mass_point`; entries can be `None` on degenerate horizons |
| `filter_info` | `dict | None` | Staggered last-cohort auto-filter metadata (`F_last`, `n_kept`, `n_dropped`, `dropped_cohorts`); `None` when no filter applied |
| `variance_formula` | `str | None` | HAD-specific SE label applied UNIFORMLY across all horizons, populated on BOTH continuous and mass-point designs: `"pweight"` (continuous, CCT 2014 weighted-robust on the `weights=` shortcut), `"survey_binder_tsl"` (continuous, Binder 1983 TSL on the `survey_design=` path), `"pweight_2sls"` (mass-point + `weights=`; label applied uniformly across vcov families — classical / HC1 / CR1 — on the weighted 2SLS path, with the actual sandwich resolved via `vcov_type`), or `"survey_binder_tsl_2sls"` (mass-point, Binder 1983 TSL on the `survey_design=` path). `None` on unweighted fits |
| `effective_dose_mean` | `float | None` | Weighted denominator used by the β̂-scale rescaling, populated on weighted fits across all designs: weighted `sum(w·d)/sum(w)` (`continuous_at_zero`), weighted `sum(w·(d − d_lower))/sum(w)` (`continuous_near_d_lower`), or weighted Wald-IV dose gap (`mass_point`). Scalar (not per-horizon) because the β̂-scale denominator is computed once on the fit sample. `None` on unweighted fits |
| `cband_low` | `np.ndarray | None` | Simultaneous (sup-t) band lower bounds; `None` on unweighted fits or when `cband=False` |
| `cband_high` | `np.ndarray | None` | Simultaneous (sup-t) band upper bounds |
| `cband_crit_value` | `float | None` | Sup-t critical value used for the simultaneous band |
| `cband_method` | `str | None` | `"multiplier_bootstrap"` when populated |
| `cband_n_bootstrap` | `int | None` | Bootstrap iterations used for the band |

**Methods:** `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`

### TROPResults

| Attribute | Type | Description |
|-----------|------|-------------|
| `att` | `float` | ATT estimate |
| `se` | `float` | Bootstrap standard error |
| `t_stat` | `float` | T-statistic |
| `p_value` | `float` | P-value |
| `conf_int` | `tuple[float, float]` | CI |
| `n_obs` | `int` | Number of observations |
| `n_treated` | `int` | Number of treated units |
| `n_control` | `int` | Number of control units |
| `n_treated_obs` | `int` | Number of treated unit-time observations |
| `lambda_time` | `float` | Selected time decay parameter |
| `lambda_unit` | `float` | Selected unit decay parameter |
| `lambda_nn` | `float` | Selected nuclear norm parameter |
| `n_bootstrap` | `int` | Number of bootstrap replications |

**Methods:** `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`

## Diagnostics

### Placebo Tests

```python
from diff_diff import (
    run_placebo_test,
    placebo_timing_test,
    placebo_group_test,
    permutation_test,
    leave_one_out_test,
    run_all_placebo_tests,
)

# Unified interface
results = run_placebo_test(
    data, outcome='y', treatment='treated', time='period',
    test_type='fake_timing',           # "fake_timing", "fake_group", "permutation", "leave_one_out"
    fake_treatment_period=1,           # For fake_timing
    post_periods=[3, 4, 5],
)

# Run all tests at once
all_results = run_all_placebo_tests(
    data, outcome='y', treatment='treated', time='period', unit='unit_id',
    pre_periods=[0, 1, 2], post_periods=[3, 4, 5],
    n_permutations=500, seed=42,
)
```

**Individual test functions:**

```python
# Fake timing test
placebo_timing_test(data, outcome, treatment, time,
                    fake_treatment_period, post_periods=None, alpha=0.05)

# Fake group test
placebo_group_test(data, outcome, time, unit,
                   fake_treated_units, post_periods=None, alpha=0.05)

# Permutation test
permutation_test(data, outcome, treatment, time, unit,
                 n_permutations=1000, alpha=0.05, seed=None)

# Leave-one-out test
leave_one_out_test(data, outcome, treatment, time, unit, alpha=0.05)
```

All return `PlaceboTestResults` with attributes: `test_type`, `placebo_effect`, `se`, `t_stat`, `p_value`, `conf_int`, `n_obs`, `is_significant`.

### Parallel Trends Testing

```python
from diff_diff import check_parallel_trends, check_parallel_trends_robust, equivalence_test_trends

# Simple trend comparison
result = check_parallel_trends(
    data, outcome='y', time='period', treatment_group='treated',
    pre_periods=[0, 1, 2],
)

# Distributional comparison (Wasserstein distance + permutation inference)
result = check_parallel_trends_robust(
    data, outcome='y', time='period', treatment_group='treated',
    unit='unit_id', pre_periods=[0, 1, 2],
    n_permutations=1000, seed=42,
)

# TOST equivalence test
result = equivalence_test_trends(
    data, outcome='y', time='period', treatment_group='treated',
    unit='unit_id', pre_periods=[0, 1, 2],
    equivalence_margin=0.5,
)
```

### Wild Cluster Bootstrap

```python
from diff_diff import wild_bootstrap_se, WildBootstrapResults

# Directly via estimator
did = DifferenceInDifferences(inference="wild_bootstrap", n_bootstrap=999,
                              bootstrap_weights="webb", cluster="state")
results = did.fit(data, outcome='y', treatment='treated', time='post')
```

## HAD Pretests

Diagnostic pretests for the `HeterogeneousAdoptionDiD` identifying assumptions (de Chaisemartin, Ciccia, D'Haultfœuille & Knau 2026). The composite workflow `did_had_pretest_workflow` is the recommended entry point — call it before reporting WAS as causal. The workflow follows paper Section 4.2's three-step battery: **step 1** is the QUG support-infimum test (decides whether Design 1' or Design 1 applies); **step 2** is the Assumption 7 pre-trends test (joint Stute on the event-study path; explicitly NOT covered on the overall path because a single-pre-period panel cannot support the joint variant); **step 3** is the Assumption 8 linearity test (`stute_test` or `yatchew_hr_test`). On the default `aggregate="overall"` path the workflow runs steps 1 + 3 only and the returned `verdict` flags the Assumption 7 gap; pass `aggregate="event_study"` on a multi-period panel **with at least one earlier placebo pre-period beyond the base `F-1`** to close that gap. With only the base `F-1` pre-period available (minimal 3-period event-study, or 4-period under `trends_lin=True` where the consumed `F-2` placebo is dropped), the workflow still sets `pretrends_joint=None`, `all_pass=False`, and appends `joint pre-trends skipped (no earlier pre-period)` to the verdict — step 2 stays uncovered.

```python
from diff_diff import (
    did_had_pretest_workflow,
    qug_test, stute_test, yatchew_hr_test,
    stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test,
)

# Composite workflow:
#   aggregate="overall"      -> steps 1 + 3 (QUG + Assumption 8 linearity)
#                               step 2 (Assumption 7 pre-trends) NOT covered;
#                               verdict explicitly flags this gap.
#   aggregate="event_study"  -> steps 1 + 2 + 3 (QUG + joint Stute pre-trends +
#                               joint homogeneity-linearity Stute) on multi-period panels.
report = did_had_pretest_workflow(
    data, outcome_col='y', unit_col='unit', time_col='t',
    dose_col='d', first_treat_col='first_treat',
    aggregate='overall',
    survey_design=None)   # SurveyDesign for survey-aware pretests (Phase 4.5 C)
print(report.summary())
print(report.all_pass, report.verdict)
```

Individual tests:

- `qug_test(d)` — paper Theorem 4 support-infimum test (`H_0: d_lower = 0`; the QUG decides whether Design 1' or Design 1 applies in step 1 of the workflow). Extreme order statistics, Exp(1)/Exp(1) limit law. The QUG itself does NOT test Assumption 5 (which is the Design 1 sign-identification condition and is not testable via pre-trends per registry). **Permanently rejects** non-`None` `survey_design=` / `weights=` (`NotImplementedError`) per Phase 4.5 C0 deferral — extreme-value functionals are not smooth in the empirical CDF, so standard survey machinery does not yield a calibrated test.
- `stute_test(d, dy)` — Assumption 8 linearity of `E[ΔY|D]` (paper Section 4.2 step 3) via Stute Cramér-von Mises functional with Mammen wild bootstrap. Survey-aware via PSU-level Mammen multiplier bootstrap.
- `yatchew_hr_test(d, dy, *, null="linearity")` — Assumption 8 linearity of `E[ΔY|D]` (alternative test for step 3) via Yatchew (1997) heteroskedasticity-robust variance-ratio test. The `null="mean_independence"` mode (R `YatchewTest::yatchew_test(order=0)`) is also exposed for placebo-style mean-independence testing. Survey-aware via closed-form weighted variance components (no bootstrap).
- `stute_joint_pretest(residuals_dict, d)` — joint Cramér-von Mises across K horizons with shared-η Mammen wild bootstrap (Delgado-Manteiga 2001 / Hlávka-Hušková 2020). Residuals-in core; the two data-in wrappers below construct residuals for the two paper-spelled nulls.
- `joint_pretrends_test(...)` — Assumption 7 joint pre-trends on K pre-periods (paper Section 4.2 step 2 closure on the event-study path).
- `joint_homogeneity_test(...)` — joint linearity-and-homogeneity on K post-periods (event-study step 3 alternative).

The QUG-under-survey deferral is permanent; the linearity-family pretests support `survey_design=` (pweight, PSU, FPC) per Phase 4.5 C. Stratified designs and replicate-weight designs are deferred to follow-up PRs.

## Honest DiD Sensitivity Analysis

Rambachan & Roth (2023) robust inference allowing bounded parallel trends violations.

### Delta Restriction Classes

```python
from diff_diff import DeltaSD, DeltaRM, DeltaSDRM

# Smoothness: bounds on second differences
delta_sd = DeltaSD(M=0.5)

# Relative magnitudes: post violations <= Mbar * max pre violation
delta_rm = DeltaRM(Mbar=1.0)

# Combined restriction
delta_sdrm = DeltaSDRM(M=0.5, Mbar=1.0)
```

### HonestDiD Class

```python
from diff_diff import HonestDiD

honest = HonestDiD(
    method="relative_magnitude",     # "smoothness", "relative_magnitude", or "combined"
    M=1.0,                           # Restriction parameter
    alpha=0.05,
    l_vec=None,                      # Weighting vector (None = uniform)
)

# Fit to event study results
bounds = honest.fit(event_study_results)
print(bounds.summary())

# Sensitivity analysis over M grid
sensitivity = honest.sensitivity_analysis(
    event_study_results,
    M_grid=[0, 0.5, 1.0, 1.5, 2.0],
)
sensitivity.plot()
```

### Convenience Functions

```python
from diff_diff import compute_honest_did, sensitivity_plot

bounds = compute_honest_did(results, method="relative_magnitude", M=1.0, alpha=0.05)
sensitivity_plot(results, method="relative_magnitude", M_grid=[0, 0.5, 1, 1.5, 2])
```

### HonestDiDResults

| Attribute | Type | Description |
|-----------|------|-------------|
| `lb` | `float` | Lower bound of identified set |
| `ub` | `float` | Upper bound of identified set |
| `ci_lb` | `float` | Lower bound of robust CI |
| `ci_ub` | `float` | Upper bound of robust CI |
| `M` | `float` | Restriction parameter value |
| `method` | `str` | Restriction type |
| `original_estimate` | `float` | Original point estimate |
| `original_se` | `float` | Original SE |
| `ci_method` | `str` | "FLCI" or "C-LF" |
| `event_study_bounds` | `dict` | Per-period bounds (optional) |

**Properties:** `is_significant` (CI excludes zero)

## Power Analysis

```python
from diff_diff import PowerAnalysis, compute_mde, compute_power, compute_sample_size, simulate_power

# Class-based interface
pa = PowerAnalysis(alpha=0.05, power=0.80, alternative='two-sided')
mde_result = pa.mde(n_treated=50, n_control=50, sigma=1.0)
sample_result = pa.sample_size(effect_size=0.5, sigma=1.0)
power_result = pa.power(effect_size=0.5, n_treated=50, n_control=50, sigma=1.0)

# Convenience functions
mde_result = compute_mde(n_treated=50, n_control=50, sigma=1.0)
power_result = compute_power(effect_size=0.5, n_treated=50, n_control=50, sigma=1.0)
sample_result = compute_sample_size(effect_size=0.5, sigma=1.0)

# Simulation-based power
sim_result = simulate_power(
    n_units=200, n_periods=8, treatment_period=4,
    effect_sizes=[0.1, 0.5, 1.0, 2.0],
    n_simulations=500, seed=42,
)
```

## Pre-Trends Power Analysis

```python
from diff_diff import PreTrendsPower, compute_pretrends_power, compute_mdv

# Class-based
ptp = PreTrendsPower()
results = ptp.compute(event_study_results, M_grid=[0, 0.5, 1.0, 2.0])

# Convenience functions
results = compute_pretrends_power(event_study_results, M_grid=[0, 0.5, 1.0, 2.0])
mdv = compute_mdv(event_study_results, target_power=0.80)
```

## Visualization

All plotting functions return a matplotlib `Figure` object.

### plot_event_study

```python
from diff_diff import plot_event_study

plot_event_study(
    results,                           # MultiPeriodDiDResults, CS, SA, BJS, Gardner, Stacked, or DataFrame
    effects=None,                      # Manual dict of effects (alternative to results)
    se=None,                           # Manual dict of SEs
    periods=None,
    reference_period=None,
    pre_periods=None,
    post_periods=None,
    alpha=0.05,
    figsize=(10, 6),
    title="Event Study",
    xlabel="Period Relative to Treatment",
    ylabel="Treatment Effect",
    color="#2563eb",
    show_zero_line=True,
    show_reference_line=True,
    shade_pre=True,
    ax=None,
    show=True,
    use_cband=True,                    # Use simultaneous confidence bands if available
)
```

### plot_group_effects

```python
from diff_diff import plot_group_effects

plot_group_effects(
    results,                           # CallawaySantAnnaResults
    groups=None,
    figsize=(10, 6),
    title="Treatment Effects by Cohort",
    alpha=0.05,
    show=True,
    ax=None,
)
```

### plot_sensitivity

```python
from diff_diff import plot_sensitivity

plot_sensitivity(
    sensitivity_results,               # SensitivityResults from HonestDiD
    show_bounds=True,
    show_ci=True,
    breakdown_line=True,
    figsize=(10, 6),
    title="Honest DiD Sensitivity Analysis",
    ax=None,
    show=True,
)
```

### plot_honest_event_study

```python
from diff_diff import plot_honest_event_study

plot_honest_event_study(
    honest_results,                    # HonestDiDResults with event_study_bounds
    periods=None,
    reference_period=None,
    figsize=(10, 6),
    title="Event Study with Honest Confidence Intervals",
    ax=None,
    show=True,
)
```

### plot_bacon

```python
from diff_diff import plot_bacon

plot_bacon(
    results,                           # BaconDecompositionResults
    plot_type="scatter",               # "scatter" or "bar"
    figsize=(10, 6),
    show_weighted_avg=True,
    show_twfe_line=True,
    ax=None,
    show=True,
)
```

### plot_power_curve

```python
from diff_diff import plot_power_curve

plot_power_curve(
    results=None,                      # PowerResults, SimulationPowerResults, or DataFrame
    effect_sizes=None,
    powers=None,
    mde=None,
    target_power=0.80,
    plot_type="effect",                # "effect" or "sample_size"
    figsize=(10, 6),
    show_mde_line=True,
    show_target_line=True,
    ax=None,
    show=True,
)
```

### plot_pretrends_power

```python
from diff_diff import plot_pretrends_power

plot_pretrends_power(
    results=None,                      # PreTrendsPowerResults or PreTrendsPowerCurve
    M_values=None,
    powers=None,
    mdv=None,
    target_power=0.80,
    figsize=(10, 6),
    ax=None,
    show=True,
)
```

## Data Preparation Utilities

### Data Manipulation

```python
from diff_diff import (
    make_treatment_indicator,
    make_post_indicator,
    wide_to_long,
    balance_panel,
    validate_did_data,
    summarize_did_data,
    create_event_time,
    aggregate_to_cohorts,
    rank_control_units,
)

# Create binary treatment indicator
df = make_treatment_indicator(data, column='group', treated_values='A', new_column='treated')
df = make_treatment_indicator(data, column='size', threshold=75, new_column='treated')

# Create binary post indicator
df = make_post_indicator(data, time_column='year', treatment_start=2020, new_column='post')
df = make_post_indicator(data, time_column='year', post_periods=[2020, 2021])

# Reshape wide to long
long_df = wide_to_long(data, value_columns=['y2018', 'y2019', 'y2020'],
                       id_column='unit', time_name='year', value_name='outcome')

# Balance panel (keep only units observed in all periods)
balanced_df = balance_panel(data, unit='unit', time='period')

# Validate DiD data
validation = validate_did_data(data, outcome='y', treatment='treated',
                               time='period', unit='unit')

# Summarize DiD data
summary = summarize_did_data(data, outcome='y', treatment='treated',
                             time='period', unit='unit')

# Create event time column
df = create_event_time(data, time='period', first_treat='first_treat', new_column='event_time')

# Aggregate to cohort level
cohort_df = aggregate_to_cohorts(data, outcome='y', unit='unit', time='period',
                                 first_treat='first_treat')

# Rank control units by similarity to treated
ranking = rank_control_units(data, outcome='y', unit='unit', time='period',
                             treatment='treated')
```

### Data Generation

```python
from diff_diff import (
    generate_did_data,
    generate_staggered_data,
    generate_panel_data,
    generate_event_study_data,
    generate_factor_data,
    generate_ddd_data,
    generate_continuous_did_data,
)

# Basic 2x2 DiD data
data = generate_did_data(n_units=100, n_periods=4, treatment_effect=5.0,
                         treatment_fraction=0.5, treatment_period=2, seed=42)

# Staggered adoption data
data = generate_staggered_data(n_units=100, n_periods=10,
                               treatment_effect=2.0, dynamic_effects=True,
                               never_treated_frac=0.3, seed=42)

# Panel data with optional trend violations
data = generate_panel_data(n_units=100, n_periods=8, treatment_period=4,
                           parallel_trends=True, seed=42)

# Event study data
data = generate_event_study_data(n_units=300, n_pre=5, n_post=5,
                                 treatment_effect=5.0, seed=42)

# Factor model data (for TROP)
data = generate_factor_data(n_units=50, n_pre=10, n_post=5,
                            n_treated=10, n_factors=2, seed=42)

# Triple difference data
data = generate_ddd_data(n_per_cell=100, treatment_effect=2.0, seed=42)

# Continuous dose data
data = generate_continuous_did_data(n_units=500, n_periods=4,
                                    att_function="linear", att_slope=2.0, seed=42)
```

## Real-World Datasets

```python
from diff_diff import load_card_krueger, load_castle_doctrine, load_divorce_laws, load_mpdta
from diff_diff import load_dataset, list_datasets, clear_cache

# List available datasets
for name, desc in list_datasets().items():
    print(f"{name}: {desc}")

# Load by name
data = load_dataset("card_krueger")

# Named loaders
ck = load_card_krueger()          # Card & Krueger (1994) minimum wage
castle = load_castle_doctrine()    # Castle Doctrine / Stand Your Ground laws
divorce = load_divorce_laws()      # Unilateral divorce laws (staggered)
mpdta = load_mpdta()              # Minimum wage panel (simulated, from R did package)

# Force re-download
data = load_card_krueger(force_download=True)

# Clear local cache
clear_cache()
```

## Survey Support

All estimators accept an optional `survey_design` parameter in `fit()`. Pass a `SurveyDesign` object to get design-based variance estimation.

```python
from diff_diff import SurveyDesign, CallawaySantAnna

# Create survey design with strata, PSU, FPC
sd = SurveyDesign(
    weights='weight',           # Sampling weight column
    strata='stratum',           # Stratification variable
    psu='psu',                  # Primary Sampling Unit (cluster)
    fpc='fpc',                  # Finite Population Correction
    weight_type='pweight',      # "pweight", "fweight", or "aweight"
    nest=False,                 # PSU IDs nested within strata?
    lonely_psu='remove',        # "remove", "certainty", or "adjust"
)

# Fit with survey design
cs = CallawaySantAnna(estimation_method='dr')
results = cs.fit(data, outcome='y', unit='id', time='t',
                 first_treat='ft', survey_design=sd)

# Survey metadata on results
print(results.survey_metadata.design_effect)  # Design effect
print(results.survey_metadata.effective_n)  # Effective sample size
print(results.survey_metadata.df_survey)    # Survey degrees of freedom
```

**Replicate weight designs:**

```python
sd = SurveyDesign(
    weights='weight',
    replicate_weights=['rep_0', 'rep_1', ..., 'rep_79'],  # Column names
    replicate_method='SDR',     # "BRR", "Fay", "JK1", "JKn", or "SDR"
)
```

**Subpopulation analysis:**

```python
sd_female, data_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F')
```

**Key features:**
- Taylor Series Linearization (TSL) variance with strata + PSU + FPC
- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (13 of 16 estimators, including dCDH)
- Survey-aware bootstrap: multiplier at PSU (Hall-Mammen wild; dCDH, staggered) or Rao-Wu rescaled (SunAbraham, SyntheticDiD, TROP). SyntheticDiD bootstrap composes Rao-Wu rescaled per-draw weights with the weighted Frank-Wolfe variant of `_sc_weight_fw` (PR #355): each draw solves `min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Pweight-only fits use constant `rw = w_control`; full designs use Rao-Wu. SDID's placebo (stratified permutation + weighted FW) and jackknife (PSU-level LOO with stratum aggregation, Rust & Rao 1996) paths also support pweight-only and full strata/PSU/FPC designs
- DEFF diagnostics, subpopulation analysis, weight trimming (`trim_weights`)
- Repeated cross-sections: `CallawaySantAnna(panel=False)`
- Compatibility matrix: see `docs/choosing_estimator.rst` Survey Design Support section

No R or Python package offers design-based variance estimation for modern heterogeneity-robust DiD estimators.

## Linear Algebra Helpers

```python
from diff_diff import LinearRegression, InferenceResult

# Low-level regression helper
reg = LinearRegression(
    include_intercept=True,
    robust=True,
    cluster_ids=cluster_array,
)
reg.fit(X, y)
inference = reg.get_inference(coef_index)  # -> InferenceResult
```

### InferenceResult

| Attribute | Type | Description |
|-----------|------|-------------|
| `coefficient` | `float` | Point estimate |
| `se` | `float` | Standard error |
| `t_stat` | `float` | T-statistic |
| `p_value` | `float` | P-value |
| `conf_int` | `tuple[float, float]` | Confidence interval |

### Conley Spatial HAC Standard Errors

Conley (1999) spatial heteroskedasticity-and-autocorrelation-consistent standard
errors. Use when residuals are spatially (and optionally temporally) correlated
(geo experiments, regional shocks, common-supplier effects). Two operating
modes:

- **Cross-sectional:** Direct `compute_robust_vcov` / `LinearRegression` on
  a single-period design.
- **Panel block-decomposed** (matches R `conleyreg` with `lag_cutoff > 0`):
  `DifferenceInDifferences` / `MultiPeriodDiD` / `TwoWayFixedEffects` with
  `vcov_type="conley"` and `conley_lag_cutoff=<int>`. The sandwich sums
  within-period spatial pairs plus within-unit Bartlett serial pairs
  (excluding lag=0 to avoid double-counting); NOT a multiplicative
  product kernel.

`DifferenceInDifferences(vcov_type="conley").fit(..., unit="<col>")` is
supported (Wave A #118). `unit` is a fit-time kwarg (NOT on `__init__`;
unused unless Conley is set; not part of `get_params()` / `set_params()`)
mirroring `MultiPeriodDiD.fit(unit=...)` / `TwoWayFixedEffects.fit(unit=...)`.

```python
import numpy as np
from diff_diff.linalg import LinearRegression
from diff_diff import DifferenceInDifferences, MultiPeriodDiD, TwoWayFixedEffects

# Cross-sectional design: 1 row per unit, n × 2 lat/lon coords.
reg = LinearRegression(
    vcov_type="conley",
    include_intercept=True,
    conley_coords=coords,                # n × 2 array of (lat, lon) in degrees
    conley_cutoff_km=200.0,              # required; no default
    conley_metric="haversine",           # or "euclidean", or callable
    conley_kernel="bartlett",            # or "uniform"
).fit(X, y)
se = np.sqrt(np.diag(reg.vcov_))

# Panel design: TWFE with within-unit Bartlett serial HAC.
# TWFE's `time` column is intrinsically a binary post indicator
# (treatment * time interaction); only numeric encodings are supported on
# this surface. `conley_lag_cutoff=1` includes the cross-period pair under
# the Bartlett taper. For multi-period panels, use MultiPeriodDiD.
res = TwoWayFixedEffects(
    vcov_type="conley",
    conley_coords=("lat", "lon"),        # column names on `data`
    conley_cutoff_km=500.0,
    conley_lag_cutoff=1,                 # within-unit Bartlett, lag 1 panel period
).fit(data, outcome="y", treatment="treated", time="post", unit="unit_id")

# Panel design: MultiPeriodDiD with multi-period time.
# MultiPeriodDiD builds period dummies (NOT a treated * time product), so
# `time` can be any orderable encoding — int years (2020, 2021, ...),
# YYYYMM (202012, 202101, ...), datetime64, pd.Period, strings.
# `_compute_conley_vcov` normalizes time to dense codes 0..T-1 internally,
# so `conley_lag_cutoff` always counts panel periods regardless of label.
mp_res = MultiPeriodDiD(
    vcov_type="conley",
    conley_coords=("lat", "lon"),
    conley_cutoff_km=200.0,
    conley_lag_cutoff=2,                 # within-unit Bartlett up to 2 panel periods
).fit(data, outcome="y", treatment="treated", time="period",
      post_periods=[2, 3], unit="unit_id")

# 2-period DiD on a panel: DiD.fit(unit="<col>") opts into the Conley
# panel block-decomposed sandwich; on a 2-period design the ATT/SE match
# MultiPeriodDiD(...).fit(..., post_periods=[1], reference_period=0) bit-exactly.
did_res = DifferenceInDifferences(
    vcov_type="conley",
    conley_coords=("lat", "lon"),
    conley_cutoff_km=200.0,
    conley_lag_cutoff=1,
).fit(data, outcome="y", treatment="treated", time="post", unit="unit_id")

# Combined spatial + cluster product kernel: pass cluster=<col> alongside
# Conley to apply K_total[i,j] = K_space(d_ij/h) · 1{c_i = c_j}. On the
# panel path the cluster must be constant within each unit across periods
# (e.g. an above-unit grouping like region); time-varying cluster raises
# ValueError. TWFE's default auto-cluster on the Conley path is silently
# dropped — users opt into the combined kernel explicitly.
combined_res = TwoWayFixedEffects(
    vcov_type="conley",
    cluster="region",                    # above-unit grouping; time-invariant within unit
    conley_coords=("lat", "lon"),
    conley_cutoff_km=500.0,
    conley_lag_cutoff=1,
).fit(data, outcome="y", treatment="treated", time="post", unit="unit_id")
```

**Note on `conley_lag_cutoff` semantics:** the lag is counted in **panel
periods** (number of distinct sorted values in the `time` column), NOT in
raw-label differences. Internally, time labels are normalized to dense codes
`0..T-1` via `np.unique(return_inverse=True)`. For example, MultiPeriodDiD
with `time` values `(2020, 2021, 2022)` and `(202012, 202101, 202102)` both
produce the same codes `(0, 1, 2)` and the same lag matrix. **TWFE caveat:**
the time-label normalization runs inside `_compute_conley_vcov`, but TWFE's
own design step `_treatment_post = treated * time` requires numeric `time`;
non-numeric labels (datetime64, pd.Period, strings) are TWFE-incompatible
end-to-end. Use MultiPeriodDiD if you need datetime/Period/string time
labels. **Deviation from R `conleyreg`:** R uses raw `time` values directly
in the lag computation, which silently mishandles non-dense encodings.
diff-diff is the more robust default; for bit-exact R parity, pass `time`
as a dense integer index.

**Variance estimator — cross-sectional:**

    Var̂(β) = (X'X)^{-1} · ( Σ_{i,j} K(d_ij / h) · X_i ε_i ε_j X_j' ) · (X'X)^{-1}

**Variance estimator — panel block-decomposed (matches R `conleyreg`):**

    XeeX_spatial = Σ_t  Σ_{i,j∈units}    K_space(d_ij/h)             · X_{i,t} ε_{i,t} ε_{j,t} X_{j,t}'
    XeeX_serial  = Σ_u  Σ_{|t-s|≤L,t≠s}  (1 - |t-s|/(L+1))           · X_{u,t} ε_{u,t} ε_{u,s} X_{u,s}'
    Var̂(β)       = (X'X)^{-1} · ( XeeX_spatial + XeeX_serial ) · (X'X)^{-1}

The temporal kernel is hardcoded Bartlett-style regardless of `conley_kernel`
(matches `conleyreg::time_dist.cpp`).

**Kernels:**
- `"bartlett"` (default): `K(u) = max(0, 1 - |u|)` on pairwise distance `d_ij/h`. The radial 1-D form, matching R `conleyreg` / Stata `acreg`. Conley 1999's explicit PSD-guaranteed Bartlett formula (Eq 3.14) is the 2-D **separable product** window on a lattice; the 1-D radial specialization that diff-diff implements is a practitioner convention and is not formally PSD-guaranteed.
- `"uniform"`: `K(u) = 1{|u| ≤ 1}`. Easier to interpret.

Both kernels: `UserWarning` is emitted if the resulting meat has a materially negative eigenvalue (< -1e-12) — neither kernel is formally PSD-guaranteed in the radial 1-D pairwise-distance form.

**Distance metrics:**
- `"haversine"` (default): great-circle in km, Earth's mean radius 6371.01 km (matching R `conleyreg`). Validates `lat ∈ [-90, 90]`, `lon ∈ [-180, 180]`.
- `"euclidean"`: from projected coordinates; user owns the units.
- `callable(coords1, coords2) -> n×n array`: custom distance for non-geographic networks.

**No default bandwidth.** `conley_cutoff_km` is required. Conley (1999) Section 5
recommends a sensitivity grid (e.g., 50, 100, 200, 500 km) and reporting the
SE range.

**Restrictions in this release:**
- `DifferenceInDifferences` / `MultiPeriodDiD` / `TwoWayFixedEffects` `+ vcov_type="conley"` without `conley_lag_cutoff` → `ValueError` (no defensible default; explicit user choice required).
- `DifferenceInDifferences` / `MultiPeriodDiD` `+ vcov_type="conley"` without `unit=` at fit-time → `ValueError`.
- Combining `vcov_type="conley"` with explicit `cluster=<col>` applies the combined spatial + cluster product kernel (Wave A #119). On the panel path the cluster must be constant within each unit across periods (validator raises `ValueError` otherwise). TWFE's default auto-cluster on the Conley path is silently dropped; users opt into the combined kernel explicitly.
- `DifferenceInDifferences` / `MultiPeriodDiD` / `TwoWayFixedEffects` `(vcov_type="conley", inference="wild_bootstrap")` → `NotImplementedError` (wild bootstrap does not consume the analytical sandwich).
- `DifferenceInDifferences` / `MultiPeriodDiD` / `TwoWayFixedEffects` `(vcov_type="conley")` + `survey_design=` → `NotImplementedError` (Bertanha-Imbens 2014 follow-up).
- `SyntheticDiD(vcov_type="conley")` → `TypeError` (uses bootstrap, not analytical sandwich).
- Any-mode `vcov_type="conley"` `+ weights=` / `survey_design=` on `LinearRegression` / `compute_robust_vcov` → `NotImplementedError` (Bertanha-Imbens 2014 weighted-Conley deferred to follow-up PR).
- A sparse k-d-tree fast path auto-activates for `n > 5_000` with `conley_kernel="bartlett"` AND `conley_metric` in `{"haversine", "euclidean"}` (Wave A #120); callable metrics and uniform kernel fall back to the dense path. `n > 20_000` with the dense fallback still emits a memory-OOM `UserWarning`.
- Callable `conley_metric` is validated at the boundary (shape `(n, n)`, finite, non-negative, symmetric to `atol=1e-10`, AND zero on the diagonal `|d(i, i)| ≤ 1e-10` so the kernel reduces to `K(0) = 1` on the HC0 contribution); each failure raises a targeted `ValueError` (Wave A #123).

**Parity:** matches R `conleyreg` (Düsterhöft 2021, CRAN v0.1.9) to ≤ 1e-6
on six benchmark fixtures in
`benchmarks/data/r_conleyreg_conley_golden.json`: three cross-sectional and
three panel fixtures with `lag_cutoff > 0` (`panel_haversine_lag1`,
`panel_haversine_lag2`, `panel_lat_lon_realistic_lag1`). Earth radius
6371.01 km matches conleyreg.

## Rust Backend

diff-diff includes an optional Rust backend for performance-critical operations.

```python
from diff_diff import HAS_RUST_BACKEND

if HAS_RUST_BACKEND:
    print("Rust backend available - computations will be faster")
```

The Rust backend accelerates: OLS solving, robust VCV computation, bootstrap weight generation, synthetic control weights, and simplex projection. It is used transparently when available. Force backend selection via environment variables:

```bash
DIFF_DIFF_BACKEND=python pytest   # Force pure Python
DIFF_DIFF_BACKEND=rust pytest     # Force Rust (fail if unavailable)
```

## Choosing an Estimator

| Scenario | Recommended Estimator |
|----------|----------------------|
| Classic 2x2 design (one treated group, one time split) | `DifferenceInDifferences` |
| Panel data with unit + time FE | `TwoWayFixedEffects` |
| Event study with multiple periods | `MultiPeriodDiD` |
| Staggered treatment timing | `CallawaySantAnna`, `ImputationDiD`, or `SunAbraham` |
| Few treated units / synthetic control | `SyntheticDiD` |
| Interactive fixed effects / factor confounding | `TROP` |
| Continuous treatment intensity, per-dose ATT(d) / ACRT(d) (requires never-treated controls) | `ContinuousDiD` |
| Continuous treatment intensity, WAS at dose support boundary (compatible with universal rollout or small never-treated share) | `HeterogeneousAdoptionDiD` |
| Two-criterion treatment, simultaneous (2x2x2 DDD) | `TripleDifference` |
| Two-criterion treatment, staggered timing + eligibility | `StaggeredTripleDifference` |
| Nonlinear outcome (binary/count) with staggered timing | `WooldridgeDiD` |
| Diagnosing TWFE bias | `BaconDecomposition` |
| Efficiency-optimal estimation | `EfficientDiD` |
| Corrective weighting for stacked regressions | `StackedDiD` |
| Robustness to parallel trends violations | `HonestDiD` |

## BusinessReport

Plain-English stakeholder narrative from any of the 16 fitted result types.
Renders `summary()` (short paragraph), `full_report()` (multi-section
markdown), and `to_dict()` (stable AI-legible schema — single source of
truth; prose renders from the dict).

```python
from diff_diff import BusinessReport

report = BusinessReport(
    results,
    outcome_label="Revenue per user",
    outcome_unit="$",  # "$" / "%" / "pp" / "log_points" / "count" recognized
    outcome_direction="higher_is_better",
    business_question="Did the campaign lift revenue?",
    treatment_label="the campaign",
    alpha=0.05,  # single knob: drives both CI level and phrasing threshold
    auto_diagnostics=True,  # default; auto-constructs DiagnosticReport
)

print(report.summary())       # 6-10 sentence paragraph
print(report.full_report())   # structured markdown
report.to_dict()              # AI-legible schema; stable top-level keys
```

Constructor rejects `BaconDecompositionResults` with a helpful TypeError
(Bacon is a diagnostic, not an estimator; wrap the underlying estimator
and pass the Bacon object to `DiagnosticReport(precomputed={'bacon': ...})`).

Schema top-level keys (all always present; missing content uses a
`{"status": "skipped", "reason": "..."}` shape rather than being absent):

- `schema_version`, `estimator`, `context`
- `headline`, `target_parameter`, `assumption`, `pre_trends`, `sensitivity`
- `sample`, `heterogeneity`, `robustness`, `diagnostics`
- `next_steps`, `caveats`, `references`

`target_parameter` (experimental) names what the headline scalar
represents for each estimator — overall ATT, DID_M, DID_1, cost-
benefit delta, dose-response aggregate, ASF-based ETWFE, etc.
Fields: `name` (short stakeholder label), `definition` (full prose),
`aggregation` (machine-readable dispatch tag, e.g., `"simple"`,
`"event_study"`, `"delta"`, `"no_scalar_headline"`),
`headline_attribute` (which raw attribute holds the scalar, or
`None` when no scalar exists by design — e.g., dCDH with
`trends_linear=True` and `L_max>=2`), `reference` (citation).

Status enum values: `ran | skipped | error | not_applicable | not_run | computed`.
`headline.status` also supports `"no_scalar_by_design"` on the dCDH
no-scalar branch.

## DiagnosticReport

Unified diagnostic runner orchestrating `check_parallel_trends`,
`compute_pretrends_power`, `HonestDiD.sensitivity`, `bacon_decompose`,
plus estimator-native surfaces for SyntheticDiD (`pre_treatment_fit`,
`get_weight_concentration`, `in_time_placebo`, `sensitivity_to_zeta_omega`)
and TROP (factor-model metrics). EfficientDiD PT uses the native
`hausman_pretest`. The `design_effect` section is read-only: it
echoes `survey_metadata.design_effect` / `effective_n` from the
fitted result along with a plain-English band label. The
`epv` section is similarly read-only, reporting from
`results.epv_diagnostics` plus `results.epv_threshold`.

```python
from diff_diff import DiagnosticReport

dr = DiagnosticReport(
    results,
    data=df,  # optional; needed for 2x2 PT, Bacon-from-scratch
    outcome="outcome",
    unit="unit",
    time="period",
    first_treat="first_treat",
    alpha=0.05,
    # Opt-outs (all default True except placebo)
    run_parallel_trends=True,
    run_sensitivity=True,
    run_placebo=False,          # opt-in; not implemented in MVP
    run_bacon=True,
    run_design_effect=True,
    run_heterogeneity=True,
    run_epv=True,
    run_pretrends_power=True,   # drives power-aware PT phrasing
    sensitivity_M_grid=(0.5, 1.0, 1.5, 2.0),
    sensitivity_method="relative_magnitude",
    # Escape hatch for users who ran a diagnostic with custom args:
    precomputed={"sensitivity": my_honest_did_results},
)

dr.run_all()             # triggers compute, caches
print(dr.summary())      # overall-interpretation paragraph
dr.to_dict()             # AI-legible schema
dr.to_dataframe()        # one row per check
dr.applicable_checks     # tuple of checks that will run for this estimator
dr.skipped_checks        # dict of {check: plain-English reason}
```

Schema top-level keys: `schema_version, estimator, headline_metric,
target_parameter, parallel_trends, pretrends_power, sensitivity,
placebo, bacon, design_effect, heterogeneity, epv,
estimator_native_diagnostics, skipped, warnings,
overall_interpretation, next_steps`. The `target_parameter` block
mirrors BR's (same shape and dispatch); see BR's section above for
field semantics including the `headline_attribute=None` /
`aggregation="no_scalar_headline"` case for dCDH
`trends_linear=True, L_max>=2` fits.

### Verdicts and tiers

Pre-trends verdict (three bins, documented in `docs/methodology/REPORTING.md`):

- `joint_p >= 0.30` -> `no_detected_violation`
- `0.05 <= joint_p < 0.30` -> `some_evidence_against`
- `joint_p < 0.05` -> `clear_violation`

Power tier (drives BR phrasing for the `no_detected_violation` verdict):

- `mdv / |att| < 0.25` -> `well_powered`
- `0.25 <= mdv / |att| < 1.0` -> `moderately_powered`
- `mdv / |att| >= 1.0` -> `underpowered`
- power not runnable -> `unknown` (BR falls back to underpowered phrasing)

### Methodology notes

BR and DR do no estimator fitting and do not re-derive variance from
raw data — every effect, SE, p-value, CI, and sensitivity bound is
read from the fitted result or produced by an existing diff-diff
utility (may call `check_parallel_trends`, `bacon_decompose`, or
`EfficientDiD.hausman_pretest` when the panel + column kwargs are
supplied). The `design_effect` section is read-only: it echoes
`survey_metadata.design_effect` / `effective_n` from the fitted
result rather than calling `compute_deff_diagnostics`. Report-layer
cross-period aggregations are enumerated in
`docs/methodology/REPORTING.md`. Both schemas are experimental in the
current release; see that document for phrasing rules, the
no-traffic-light decision, unit-translation policy, and schema
stability policy.
