Metadata-Version: 2.4
Name: societal-costs-cli-2802-20
Version: 0.1.0
Classifier: Programming Language :: Python
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Summary: Unified orchestrator CLI for societal-costs prematch and analysis stages.
Keywords: societal-costs,epidemiology,pipeline,cli,rust
License: MIT OR Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# societal_costs_cli

Rust project split into two workspaces:

- study workspace (root) for societal-costs-specific orchestration and outputs
- generic cohort builder workspace (`../../../generic_cohort_builder/`) for study-agnostic register/domain crates

## Study workspace crates (`crates/studies/societal_costs/cli/crates/`)

- `analysis_core`: reusable analysis/modeling/QC logic for this study
- `pipeline_runner`: orchestration/step runner
- `study_config`: study config schema + canonical output filename constants
- `study_reporting`: study report row contracts + report writers
- `ingest_flow`: study ingest/cohort pipeline flow orchestration
- `societal_costs_policy`: internal study-policy/protocol component kept separate from runtime crates

## Shared orchestration crate

- `orchestrator_core` (`../../../orchestrator_core`): study-agnostic orchestration primitives (manifest/status/resume/artifact contracts, stage adapters, typed stage failure contracts)

## Platform crates (`../../../generic_cohort_builder/crates/`)

- `cohort_builder_common`: shared errors, tracing helpers, and path utilities
- `cohort_builder_domain`: study-agnostic cohort/register-domain logic
- `cohort_builder_domain_scd`: SCD-specific domain contracts/classification logic
- `cohort_builder_ingest`: study-agnostic register IO modules
- `cohort_builder_io`: study-agnostic Arrow wide writer utilities

## Usage

```bash
cargo run --release -- prematch /path/to/pipeline.toml
cargo run --release -- analyze /path/to/pipeline.toml
cargo run --release -- run-all /path/to/pipeline.toml
# or use canonical default config:
cargo run --release -- run-all
```

The CLI enforces prematch steps and writes outputs to `report.output_dir`.

Useful flags:

```bash
# Dry-run stage plan only
cargo run --release -- run-all /path/to/pipeline.toml --print-plan

# Dry-run using canonical default config
cargo run --release -- run-all --print-plan

# Disable resume checks at CLI level
cargo run --release -- run-all /path/to/pipeline.toml --no-resume-stages

# Analysis always runs in-process in unified orchestrator mode.
cargo run --release -- run-all /path/to/pipeline.toml
```

Migration notes for legacy CLI naming/entrypoints:

- `cli_migration.md`

Optional config controls:

```toml
[orchestrator]
resume_prematch_stage = true
resume_analysis_stage = true
prematch_max_retries = 0
analysis_max_retries = 0
prematch_retry_backoff_ms = 0
analysis_retry_backoff_ms = 0
```

Prematch source mode:

```toml
[pipeline]
# build (default): run raw-register prematch build
# files: reuse existing core handoff artifacts in report.output_dir
cohort_source = "build"
```

Config template:

- `pipeline.template.toml`
- `pipeline.default.toml`

No-config resolution order for `run-all`:

1. canonical checked-in config (`config/pipeline.default.toml`)
2. generated fallback preset (only when canonical default is unavailable)

Optional override:

- `SOCIETAL_COSTS_RUN_ALL_CONFIG` for explicit default config path.

## Checks

Study workspace:

```bash
cargo check --workspace
cargo test --workspace
scripts/check_workspace_boundaries.sh
```

Platform workspace:

```bash
cd ../../../generic_cohort_builder
cargo check --workspace
cargo test --workspace
```

## Output contract

Report outputs are Arrow/CSV/Markdown; JSON report writers are disabled (except `run_manifest.json` for resume state).

Stage manifests:

- `prematch_manifest.json`
- `analysis_manifest.json`

Both manifests are written at stage start (`status=running`) and finalized to `completed` or `failed`.
Resume requires strict manifest compatibility (schema/version + config digest).
Manifests also include runtime diagnostics (`execution.*`, `host.*`) for remote troubleshooting.

Schema compatibility/migration policy:

- `manifest_schema_policy.md`

## Crate Roles

- `societal_costs_cli`:
  study-specific workflow orchestrator (prematch/analyze/run-all, manifests, resume, logging); canonical operational entrypoint.
- `societal_costs_policy`:
  internal study-policy library (protocol contracts, matching constraints, disclosure policy); runtime crates are intentionally decoupled from it.
- `orchestrator_core`:
  study-agnostic orchestration crate shared across studies.

## LPR definitions

Contact vs episode semantics and stitching/classification rules:

- `lpr_contacts_and_episodes.md`

