Metadata-Version: 2.4
Name: ins_pricing
Version: 0.9.16
Summary: Reusable modelling, pricing, governance, and reporting utilities.
Author: meishi125478
License: Proprietary
Keywords: pricing,insurance,bayesopt,ml
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.20
Requires-Dist: pandas>=1.4
Provides-Extra: bayesopt
Requires-Dist: torch>=1.10.1; extra == "bayesopt"
Requires-Dist: optuna>=3.0; extra == "bayesopt"
Requires-Dist: xgboost>=1.6; extra == "bayesopt"
Requires-Dist: scikit-learn>=1.1; extra == "bayesopt"
Requires-Dist: statsmodels>=0.13; extra == "bayesopt"
Requires-Dist: joblib>=1.2; extra == "bayesopt"
Requires-Dist: matplotlib>=3.5; extra == "bayesopt"
Provides-Extra: plotting
Requires-Dist: matplotlib>=3.5; extra == "plotting"
Requires-Dist: scikit-learn>=1.1; extra == "plotting"
Provides-Extra: explain
Requires-Dist: torch>=1.10.1; extra == "explain"
Requires-Dist: shap>=0.41; extra == "explain"
Requires-Dist: scikit-learn>=1.1; extra == "explain"
Provides-Extra: geo
Requires-Dist: contextily>=1.3; extra == "geo"
Requires-Dist: matplotlib>=3.5; extra == "geo"
Provides-Extra: gnn
Requires-Dist: torch>=1.13; extra == "gnn"
Requires-Dist: pynndescent>=0.5; extra == "gnn"
Requires-Dist: torch-geometric>=2.3; extra == "gnn"
Provides-Extra: frontend
Requires-Dist: nicegui>=3.6; extra == "frontend"
Requires-Dist: openpyxl>=3.1; extra == "frontend"
Provides-Extra: all
Requires-Dist: torch>=1.10.1; extra == "all"
Requires-Dist: optuna>=3.0; extra == "all"
Requires-Dist: xgboost>=1.6; extra == "all"
Requires-Dist: scikit-learn>=1.1; extra == "all"
Requires-Dist: statsmodels>=0.13; extra == "all"
Requires-Dist: joblib>=1.2; extra == "all"
Requires-Dist: matplotlib>=3.5; extra == "all"
Requires-Dist: shap>=0.41; extra == "all"
Requires-Dist: contextily>=1.3; extra == "all"
Requires-Dist: pynndescent>=0.5; extra == "all"
Requires-Dist: torch-geometric>=2.3; extra == "all"
Requires-Dist: nicegui>=3.6; extra == "all"
Requires-Dist: openpyxl>=3.1; extra == "all"

# ins_pricing

[![Python 3.9+](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/downloads/)
[![PyPI](https://img.shields.io/pypi/v/ins_pricing)](https://pypi.org/project/ins_pricing/)
![License: Proprietary](https://img.shields.io/badge/license-Proprietary-red)

An enterprise-grade Python library for insurance ML model training, pricing, model governance, and reporting.

The only core dependencies are **numpy** and **pandas**. Heavy ML dependencies (torch, optuna, xgboost, shap, etc.) are optional extras that are imported lazily, so you can use the pricing, governance, and reporting modules without installing any ML packages.
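
The lazy-loading idea can be illustrated with a generic pattern: import the heavy package only inside the function that needs it, and turn a missing package into a hint about which extra to install. This is a minimal sketch, not the library's actual loader; `optional_import` and `load_explain_backend` are hypothetical names.

```python
import importlib


def optional_import(module_name: str, extra: str):
    """Import a module on first use, or raise a hint naming the extra.

    Hypothetical helper for illustration; not part of ins_pricing.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install \"ins_pricing[{extra}]\""
        ) from exc


def load_explain_backend():
    # The heavy import runs here, at call time, not at package import time.
    return optional_import("shap", extra="explain")
```

With this pattern, importing the package stays cheap, and only a call that actually needs `shap` fails when the `explain` extra is absent.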

## Installation

```bash
pip install ins_pricing                    # core (numpy + pandas)
pip install "ins_pricing[bayesopt]"        # + torch, optuna, xgboost, scikit-learn, statsmodels, joblib, matplotlib
pip install "ins_pricing[explain]"         # + torch, shap, scikit-learn
pip install "ins_pricing[plotting]"        # + matplotlib, scikit-learn
pip install "ins_pricing[geo]"             # + contextily, matplotlib
pip install "ins_pricing[gnn]"             # + torch, torch-geometric, pynndescent
pip install "ins_pricing[frontend]"        # + nicegui, openpyxl
pip install "ins_pricing[all]"             # everything
```

> **GPU note:** Install the correct PyTorch build for your platform/GPU *before* installing extras. Torch Geometric requires platform-specific wheels. Multi-GPU training uses DDP (NCCL on Linux) or DataParallel; Windows falls back to Gloo.
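
The backend rule in the GPU note can be sketched as a small helper. This is an illustration under stated assumptions, not the library's device-management code: `pick_ddp_backend` is a hypothetical name, and the CPU-only fallback to Gloo is an added assumption based on NCCL requiring CUDA devices.

```python
import platform


def pick_ddp_backend(system: str = "", cuda_available: bool = True) -> str:
    """Choose a torch.distributed backend name (hypothetical helper).

    NCCL is used on Linux with CUDA; everything else falls back to Gloo.
    """
    system = system or platform.system()
    if system == "Linux" and cuda_available:
        return "nccl"
    # Windows, and (assumed here) CPU-only runs, use Gloo.
    return "gloo"
```

The returned string is the kind of value passed as `backend=` to `torch.distributed.init_process_group`.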

## Modules

| Module | What it does |
|--------|-------------|
| **modelling** | BayesOpt-driven training for GLM, XGBoost, ResNet, FT-Transformer, GNN. Includes evaluation (calibration, bootstrap CI), explainability (permutation, SHAP, integrated gradients), and plotting (lift, oneway, geo heatmaps). |
| **pricing** | Factor tables, numeric binning, exposure calculation, premium rating, calibration, data quality checks, PSI monitoring. |
| **production** | Inference registry (`ModelSpec`, `PredictorRegistry`), batch scoring, preprocessing pipelines, drift detection, production metrics. |
| **governance** | JSON-backed model registry, approval workflows, audit logging, release/rollback management. |
| **reporting** | Markdown report builder and daily scheduler. |
| **frontend** | NiceGUI web UI for config-driven training, explanation, plotting, prediction, and FT two-step workflows. |
| **cli** | Entry points: `BayesOpt_entry.py` (training), `Explain_entry.py` (explanation), `watchdog_run.py` (auto-restart). |
| **utils** | Validation, loss resolution, device management, metrics, profiling, logging, safe pickle, path resolution. |
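
For readers unfamiliar with PSI (population stability index), listed under the pricing module, the standard formula sums `(actual - expected) * ln(actual / expected)` over bins of two distributions. The sketch below uses plain Python on per-bin counts; the function name and the `eps` floor for empty bins are assumptions, and the library's own implementation may bin and smooth differently.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index between two sets of per-bin counts."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same number of bins")
    e_total = sum(expected)
    a_total = sum(actual)
    score = 0.0
    for e, a in zip(expected, actual):
        # Floor the bin shares so empty bins do not produce log(0).
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score
```

Identical distributions give a PSI of zero, and the score grows as the scored population drifts away from the reference one.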

## Quick Start

### Train models

```python
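from ins_pricing.modelling.bayesopt import BayesOptConfig
from ins_pricing.modelling import BayesOptModel

# train_data and test_data are your own pre-split datasets
# (assumed here to be pandas DataFrames).
config = BayesOptConfig.from_file("config.json")
model = BayesOptModel(train_data, test_data, config=config)

# Tune each model family in turn.
model.optimize_model("xgb", max_evals=100)   # XGBoost
model.optimize_model("resn", max_evals=50)   # ResNet
model.optimize_model("ft", max_evals=50)     # FT-Transformer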
```

### Build pricing factors

```python
from ins_pricing.pricing import compute_base_rate, build_factor_table, rate_premium

base_rate = compute_base_rate(df, loss_col="claim_amt", exposure_col="exposure")
age_factors = build_factor_table(df, factor_col="age_band", loss_col="claim_amt",
                                  exposure_col="exposure", base_rate=base_rate)
premium = rate_premium(df, exposure_col="exposure", base_rate=base_rate,
                        factor_tables={"age_band": age_factors})
```
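
To make the arithmetic behind these calls concrete, here is a plain-Python sketch of one common convention: the base rate is total loss per unit of exposure, a factor is a level's loss cost relative to the base rate, and the premium is exposure times base rate times factor. Whether the library functions use exactly this definition is an assumption, and the toy data is made up for illustration.

```python
from collections import defaultdict

# Toy policy rows; the values are illustrative only.
rows = [
    {"age_band": "18-25", "claim_amt": 300.0, "exposure": 1.0},
    {"age_band": "18-25", "claim_amt": 100.0, "exposure": 1.0},
    {"age_band": "26-40", "claim_amt": 100.0, "exposure": 2.0},
    {"age_band": "26-40", "claim_amt": 100.0, "exposure": 4.0},
]

# Base rate: total loss per unit of exposure (600 / 8 = 75.0).
total_loss = sum(r["claim_amt"] for r in rows)
total_exposure = sum(r["exposure"] for r in rows)
base_rate = total_loss / total_exposure

# Aggregate loss and exposure per factor level.
loss_by_level = defaultdict(float)
exposure_by_level = defaultdict(float)
for r in rows:
    loss_by_level[r["age_band"]] += r["claim_amt"]
    exposure_by_level[r["age_band"]] += r["exposure"]

# Relativity: the level's loss cost divided by the base rate.
relativities = {
    level: (loss_by_level[level] / exposure_by_level[level]) / base_rate
    for level in loss_by_level
}

# Premium per row: exposure x base rate x factor.
premiums = [r["exposure"] * base_rate * relativities[r["age_band"]] for r in rows]
```

With a single factor defined this way, the premiums sum back to the observed total loss, which is a quick sanity check on a factor table.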

### Score in production

```python
from ins_pricing.production import load_predictor_from_config, batch_score

predictor = load_predictor_from_config("config.json", "xgb", device="cuda")
scored = batch_score(predictor.predict, df, batch_size=10000)
```
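
Conceptually, batch scoring slices the input into fixed-size chunks, calls the predict function on each chunk, and concatenates the results so memory use stays bounded. The sketch below shows that idea on plain sequences; it is not the implementation of `batch_score`, and both helper names are hypothetical.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(rows: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def score_in_batches(
    predict_fn: Callable[[Sequence], Sequence[float]],
    rows: Sequence,
    batch_size: int = 10000,
) -> List[float]:
    """Apply predict_fn chunk by chunk and return predictions in input order."""
    scores: List[float] = []
    for batch in iter_batches(rows, batch_size):
        scores.extend(predict_fn(batch))
    return scores
```

A smaller `batch_size` lowers peak memory at the cost of more predict calls, which matters most when scoring on a GPU.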

### Govern and release

```python
from ins_pricing.governance import ModelRegistry, ReleaseManager

registry = ModelRegistry("registry/models.json")
registry.register("pricing_xgb", "v2", metrics={"rmse": 0.11})

release = ReleaseManager("registry/deployments", registry=registry)
release.deploy("prod", "pricing_xgb", "v2", actor="ops")
release.rollback("prod", actor="ops")   # revert to previous version
```

### Generate reports

```python
from ins_pricing.reporting import ReportPayload, write_report, schedule_daily

payload = ReportPayload(model_name="pricing_xgb", model_version="v2",
                         metrics={"rmse": 0.11, "loss_ratio": 0.63})
write_report(payload, "reports/monthly.md")
schedule_daily(lambda: write_report(payload, "reports/daily.md"), run_time="02:00")
```

## CLI

```bash
# Training (single GPU)
python ins_pricing/cli/BayesOpt_entry.py --config-json config.json

# Training (multi-GPU DDP)
torchrun --nproc_per_node=2 ins_pricing/cli/BayesOpt_entry.py --config-json config.json

# Explanation
python ins_pricing/cli/Explain_entry.py --config-json config_explain.json

# NiceGUI frontend
python -m ins_pricing.frontend.app
```

All workflows are config-driven. The `runner.mode` field in the JSON config determines the task:

| `runner.mode` | Task |
|---------------|------|
| `entry` | Model training |
| `explain` | Permutation importance, SHAP, integrated gradients |
| `incremental` | Incremental batch training |
| `watchdog` | Auto-restart monitoring |
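
As a sketch of how such a config might be checked before launching a run, the snippet below reads `runner.mode` and validates it against the four modes in the table. It assumes the dotted name maps to a nested `{"runner": {"mode": ...}}` JSON object; `read_runner_mode` is a hypothetical helper, not part of the CLI.

```python
import json

# The four modes listed in the table above.
KNOWN_MODES = {"entry", "explain", "incremental", "watchdog"}


def read_runner_mode(config_text: str) -> str:
    """Return runner.mode from a JSON config string, validating its value."""
    config = json.loads(config_text)
    # Assumption: "runner.mode" is the "mode" key inside a "runner" object.
    mode = config.get("runner", {}).get("mode")
    if mode not in KNOWN_MODES:
        raise ValueError(
            f"Unknown runner.mode {mode!r}; expected one of {sorted(KNOWN_MODES)}"
        )
    return mode


example_config = json.dumps({"runner": {"mode": "explain"}})
selected_mode = read_runner_mode(example_config)
```

Failing fast on an unrecognised mode avoids starting a long job with a mistyped config.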

## Project Structure

```
ins_pricing/
  modelling/
    bayesopt/           Training core: config, trainers, models, runtime
    explain/            Permutation importance, SHAP, integrated gradients
    plotting/           Lift curves, oneway, diagnostics, geo heatmaps
    evaluation.py       Calibration, threshold selection, bootstrap CI
  pricing/              Factor tables, exposure, calibration, data quality
  production/           Inference registry, scoring, preprocessing, monitoring
  governance/           Model registry, approval, audit, release management
  reporting/            Report builder and daily scheduler
  frontend/             NiceGUI web UI
  cli/                  CLI entry points and shared utilities
  utils/                Validation, losses, device, metrics, profiling, IO
  tests/                170 tests mirroring module structure
examples/               Demo notebooks and JSON config templates (not packaged)
```

## Training Output Layout

Training writes to `output_dir/` with three subdirectories:

```
output_dir/
  plot/       Diagnostic plots (oneway, lift, loss curves, geo)
  Results/    Metrics JSON, best params, evaluation snapshots
  model/      Saved model artifacts (XGB, ResNet, FT, GNN checkpoints)
```
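
When post-processing a run from your own scripts, it can help to build the three paths once rather than concatenating strings. This is a minimal sketch using `pathlib`; `ensure_output_layout` is a hypothetical helper, and only the three directory names come from the layout above.

```python
import tempfile
from pathlib import Path

# Subdirectory names as documented in the layout above.
SUBDIRS = ("plot", "Results", "model")


def ensure_output_layout(output_dir) -> dict:
    """Create the three subdirectories if missing and return their paths."""
    root = Path(output_dir)
    paths = {name: root / name for name in SUBDIRS}
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths


# Demonstrate against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    layout = ensure_output_layout(tmp)
    created = sorted(p.name for p in Path(tmp).iterdir() if p.is_dir())
```

Note the capital `R` in `Results/`: on case-sensitive filesystems, `results/` would be a different directory.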

## Loss and Distribution

Set `distribution` in your config to control the loss function. It takes precedence over `loss_name`.

| distribution | loss | XGBoost objective |
|-------------|------|-------------------|
| `tweedie` | `tweedie` | `reg:tweedie` |
| `poisson` | `poisson` | `count:poisson` |
| `gamma` | `gamma` | `reg:gamma` |
| `gaussian` / `normal` / `mse` | `mse` | `reg:squarederror` |
| `laplace` / `mae` | `mae` | `reg:absoluteerror` |
| `bernoulli` / `binomial` / `binary` | `logloss` | `binary:logistic` |

See `ins_pricing/modelling/bayesopt/README.md` for full details.
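
The table and the precedence rule can be expressed as a small lookup. This sketch transcribes the mapping above; `resolve_loss` is a hypothetical name, and the real resolver in the library may handle more cases (see the README referenced above).

```python
from typing import Optional, Tuple

# Distribution aliases -> loss name, transcribed from the table above.
DISTRIBUTION_TO_LOSS = {
    "tweedie": "tweedie",
    "poisson": "poisson",
    "gamma": "gamma",
    "gaussian": "mse", "normal": "mse", "mse": "mse",
    "laplace": "mae", "mae": "mae",
    "bernoulli": "logloss", "binomial": "logloss", "binary": "logloss",
}

# Loss name -> XGBoost objective, transcribed from the table above.
LOSS_TO_XGB_OBJECTIVE = {
    "tweedie": "reg:tweedie",
    "poisson": "count:poisson",
    "gamma": "reg:gamma",
    "mse": "reg:squarederror",
    "mae": "reg:absoluteerror",
    "logloss": "binary:logistic",
}


def resolve_loss(distribution: Optional[str] = None,
                 loss_name: Optional[str] = None) -> Tuple[str, str]:
    """Return (loss, xgboost_objective); distribution wins over loss_name."""
    if distribution is not None:
        key = distribution.strip().lower()
        if key not in DISTRIBUTION_TO_LOSS:
            raise ValueError(f"Unknown distribution: {distribution!r}")
        loss = DISTRIBUTION_TO_LOSS[key]
    elif loss_name is not None:
        loss = loss_name.strip().lower()
        if loss not in LOSS_TO_XGB_OBJECTIVE:
            raise ValueError(f"Unknown loss_name: {loss_name!r}")
    else:
        raise ValueError("Set either distribution or loss_name")
    return loss, LOSS_TO_XGB_OBJECTIVE[loss]
```

For example, a config with `distribution` set to `normal` and `loss_name` set to `mae` resolves to the `mse` loss, because `distribution` takes precedence.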

## Development

```bash
# Install in dev mode
pip install -e ".[bayesopt,plotting,explain,geo,gnn,frontend]"

# Run tests
pytest ins_pricing/tests/ -v

# Build and publish
make build && make check && make upload
```

## PyPI Upload

```bash
# Linux / macOS
export TWINE_PASSWORD='your_pypi_token'
make build && make upload

# Windows
set TWINE_PASSWORD=your_pypi_token
python -m build
upload_to_pypi.bat
```
