Metadata-Version: 2.4
Name: societal-costs-pipeline-2026-02-13-1300
Version: 0.1.0
Summary: Societal costs pipeline CLI.
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# societal_costs

Workspace for the societal costs study.

## Crates

- `crates/costs_core` — core domain types, aggregation, pricing, and validation.
- `crates/costs_models` — two-part, Tweedie, and log-normal models + selection tools.
- `crates/costs_matching` — matching + balance diagnostics.
- `crates/costs_cohort` — cohort assignment + diagnostics.
- `crates/costs_components` — register-derived component builders (LMDB volume so far).
- `crates/costs_population` — population files + public stats QC.
- `crates/costs_pipeline_common` — config, errors, tracing, and shared helpers.
- `crates/costs_pipeline_io` — report/Arrow/CSV/JSON writers.
- `crates/costs_registers` — register ingestion flows (BEF/LPR/LMDB/VNDS/DOD/etc).
- `crates/costs_pipeline_core` — orchestration + analysis/matching/modeling logic.
- `crates/costs_pipeline` — thin facade that re-exports common/core/io APIs.

## LMDB example

See `crates/costs_pipeline/README.md` for the LMDB ingestion example.

## Pipeline config

Start from `pipeline.example.toml` and update the register file paths to match
your local dataset layout. Most SAS7BDAT paths accept either a single file
or a directory; for a directory, all `.sas7bdat` files are loaded in name
order. You can also set `base_path` in the config to prefix all relative
register paths.
File filtering uses `study.data_*` and optional `study.baseline_*` /
`study.followup_*` windows to avoid reading out-of-scope files.
Event registers (VNDS/DOD/DODSAARS/DODSAASG) are **not** year-filtered.
If you use contact-based SCD classification, set
`components.lpr.scd_inpatient_contact_types`, or set
`components.lpr.scd_allow_outpatient = true` to allow outpatient contacts.
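
As a rough sketch, the `base_path` and SCD settings described above might look
like this in your copy of `pipeline.example.toml`. The placement of `base_path`
at the top level and the path value are assumptions for illustration; check the
example file for the actual layout.

```toml
# Assumption: base_path is a top-level key. The path below is hypothetical.
base_path = "/data/registers"

[components.lpr]
# Allow outpatient contacts in contact-based SCD classification.
scd_allow_outpatient = true
```

Relative register paths are then resolved against `base_path`.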

Pipeline outputs include consolidated QC in `qc_summary.csv` and `qc_summary.json`, plus
step-specific outputs such as `lpr_episode_qc.csv`, `matching_outcome_qc.csv`,
`matching_parent_qc.csv`, and `population_public_stats.csv`.

For terminal progress and timing, set:
- `pipeline.progress = true`
- `pipeline.log_level = "info"` (or `debug`)
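
In TOML table form, these two settings would read roughly as follows. This is a
minimal sketch that assumes both keys live under a `[pipeline]` table, as their
dotted names suggest.

```toml
[pipeline]
# Show progress and timing in the terminal.
progress = true
# Use "debug" for more detailed output.
log_level = "info"
```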

## Additional docs

- `docs/README.md` — onboarding index and quick-start path.
- `crates/costs_core/README.md` — core domain and aggregation usage.
- `crates/costs_models/README.md` — model APIs and examples.
- `crates/costs_models/examples/README.md` — runnable model examples.
- `crates/costs_components/README.md` — LMDB note (missing price fields).
- `crates/costs_pipeline/README.md` — pipeline usage and LMDB example.
- `docs/notes/pipeline_contracts.md` — step-level inputs, outputs, invariants.

