Metadata-Version: 2.4
Name: zen-fronts
Version: 1.0.5
Summary: Async Multi Objective Hyperparameter PBT (fast-cython, parameterless)
License-Expression: MIT
Project-URL: Homepage, https://github.com/TovarnovM/zen_fronts
Project-URL: Repository, https://github.com/TovarnovM/zen_fronts
Project-URL: Issues, https://github.com/TovarnovM/zen_fronts/issues
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.23
Requires-Dist: owrf1d>=1.0.4
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: hypothesis>=6.100; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: Cython>=3.0; extra == "dev"
Requires-Dist: numpy>=1.23; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: setuptools; extra == "dev"
Provides-Extra: examples
Requires-Dist: matplotlib>=3.8; extra == "examples"
Requires-Dist: numpy>=1.26; extra == "examples"
Dynamic: license-file

# ZenFronts

**ZenFronts** is a compact facade for **asynchronous, noisy, multi-objective population management** (Population-Based Training style), built around a **Cython-accelerated Monte Carlo Pareto-ranking kernel**.

ZenFronts targets the operational regime where:

* objective signals arrive **asynchronously** (partial metric updates are common),
* objective values are **stochastic** (noise, heavy tails, drift),
* selection must be **reproducible** (deterministic under a fixed seed),
* one wants a clear and auditable separation between **evaluation**, **state**, and **selection**.

ZenFronts does not prescribe a particular optimizer. Instead, it provides a rigorously defined, high-throughput **selection engine** and a small, policy-driven API to manage a population.

---

## Visual intuition

The figure below shows a typical snapshot produced by `examples/demo_noisy_zdt1_snapshots.py`.

* **Left panel (objective space):** point means (`μ`) with semi-transparent uncertainty ellipses derived from per-objective variances (`σ²`).
* **Right panel (rank space):** the same population expressed in **stable rank coordinates**, which is the space in which the Pareto machinery is applied.

Color encodes an aggregate selection-quality scalar (`quality_score`) computed from Monte Carlo statistics.

![ZenFronts snapshot (objective space vs rank space)](https://raw.githubusercontent.com/TovarnovM/zen_fronts/main/docs/assets/snap_075.png)

> Note on reproducibility: for release-tag pinning on PyPI, replace `main` with your tag (e.g. `v1.0.5`) in the URL.

---

## At a glance

### What ZenFronts provides

* **Population state and lifecycle**

  * `add_point`, `add_random_point`, `delete_point` (tombstone model)
  * stable `point_id`s for logging/auditing
* **Asynchronous objective/criterion storage**

  * per-(point, criterion) statistics: `μ`, `trend`, `σ²`, `t_last`, `ready`
  * updates from raw samples or external statistics
  * eligibility gate: only *fully ready* points are considered by selection
* **Compiled Monte Carlo Pareto selection core**

  * Monte Carlo sampling from `(μ, σ²)` per objective
  * Pareto ranking in stable rank-space
  * per-point distribution summaries: place/front/within (mean, std, median, quantiles)
* **Deterministic selection**

  * fixed `seed` yields stable winners/losers and statistics
* **Versioned selection-statistics schema**

  * stable contract for downstream pipelines and experiment tracking

### What ZenFronts does not provide

* a parameter space DSL (you inject a sampler/mutator)
* a training/evaluation runtime (you call `update_crits` when metrics arrive)
* a black-box optimizer with hidden heuristics (policies are explicit)

---

## Installation

```bash
pip install zen-fronts
```

For local development with examples and tests:

```bash
pip install -e ".[dev,examples]"
```

---

## Quick start

### Runnable demo (produces the snapshot above)

```bash
python examples/demo_noisy_zdt1_snapshots.py --out out/demo --epochs 80
```

### Minimal API sketch

```python
from zen_fronts import ZenFronts
from zen_fronts.selection import SelectionConfig

# Define two objectives (minimization in this example)
crits = {"f1": "min", "f2": "min"}

zf = ZenFronts(
    crits=crits,
    selection=SelectionConfig(
        n_samples=256,
        percentile=0.2,
        seed=42,
        collect_stats=True,
        quantiles_mode_i=2,          # 0 exact, 1 P² streaming, 2 auto-by-budget
        quantiles_budget=200_000,    # used when quantiles_mode_i=2
    ),
    sampler=lambda rng: {"x1": rng.random(), "x2": rng.random()},
    # mutator must return (child_params, meta)
    mutator=lambda parent, rng, **kw: (
        {
            "x1": parent["x1"] + rng.normal(0.0, 0.05),
            "x2": parent["x2"] + rng.normal(0.0, 0.05),
        },
        {},
    ),
)

# 1) initialize a population
zf.add_random_point(128)

# 2) repeatedly: ingest metrics -> refresh -> replace
for t in range(1, 101):
    # ingest / compute metrics for active points
    for pid in zf.active_point_ids():
        params = zf.params(pid)
        f1, f2 = evaluate(params)  # user-defined
        zf.update_crits(pid, {"f1": f1, "f2": f2}, t=float(t))

    losers = zf.refresh(now=float(t))

    for loser in losers:
        parent = zf.choose_parent(loser)
        child, _meta = zf.perform_new(parent, looser=loser, remove_looser=True)

        # IMPORTANT: evaluate the child at least once, otherwise it stays invisible
        f1, f2 = evaluate(zf.params(child))
        zf.update_crits(child, {"f1": f1, "f2": f2}, t=float(t) + 0.1)
```

---

## Conceptual model

### Points and tombstones

A **point** is an individual in the population, identified by an integer `point_id`.

* `delete_point(point_id)` marks a point inactive (a **tombstone**).
* tombstones do not participate in selection.
* `params(point_id)` raises `KeyError` for tombstones.
* `info(point_id)` remains available; its last selection snapshot is preserved.

This supports stable experiment logging: you can keep IDs forever while still maintaining a bounded active population.
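The tombstone model above can be sketched with a minimal dict-based store. This is an illustration of the contract, not the library's internals; `TinyStore` and its fields are made up for this example.

```python
# Illustrative sketch of the tombstone model (NOT the library's internals):
# IDs and their info survive deletion; only params become inaccessible.
class TinyStore:
    def __init__(self):
        self._params = {}   # point_id -> params (removed on delete)
        self._info = {}     # point_id -> metadata (kept forever)
        self._next_id = 0

    def add_point(self, params):
        pid = self._next_id
        self._next_id += 1
        self._params[pid] = params
        self._info[pid] = {"active": True}
        return pid

    def delete_point(self, pid):
        # Tombstone: drop params, keep the ID and its info.
        self._params.pop(pid)
        self._info[pid]["active"] = False

    def params(self, pid):
        return self._params[pid]   # raises KeyError for tombstones

    def info(self, pid):
        return self._info[pid]     # still available after deletion
```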

### Criteria and readiness

Each criterion is defined by a name and a direction (`"min"` or `"max"`). Internally, selection operates in a unified maximization convention.

ZenFronts implements an explicit **ready gate**:

* a point becomes eligible for selection only when it is **ready on all criteria**,
* partial updates are allowed (typical for asynchronous pipelines),
* `refresh()` considers only `active ∩ fully_ready` points.
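The `active ∩ fully_ready` gate can be sketched in a few lines. The function name and data layout below are assumptions made for illustration, not the library API.

```python
def eligible_points(active, ready, crits):
    """Return points eligible for selection: active AND ready on every criterion.

    active: set of active point ids (tombstones excluded)
    ready:  {point_id: set of criteria that have received at least one update}
    crits:  the full set of criterion names
    """
    return {pid for pid in active if ready.get(pid, set()) >= crits}
```

Partial updates simply leave a point outside the eligible set until every criterion has reported at least once.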

### Selection in rank space

Selection is performed using Monte Carlo sampling from the per-criterion distributions.

For each Monte Carlo sample:

1. sample a synthetic objective vector for each point,
2. transform each objective into **stable ranks** (per objective),
3. compute Pareto fronts and within-front ranks in this rank space,
4. accumulate per-point statistics across samples.

This rank-space formulation reduces sensitivity to scaling and improves robustness under noise.
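The four steps above can be sketched for a single Monte Carlo sample in pure NumPy. This is a slow reference illustration of the idea (minimization, naive front assignment), not the compiled kernel; all names here are assumptions.

```python
import numpy as np

def one_mc_sample(mu, var, rng):
    """One MC sample: (N, M) mean/variance arrays -> rank matrix and Pareto fronts."""
    # 1) sample a synthetic objective vector per point
    sample = rng.normal(mu, np.sqrt(var))                     # (N, M)
    # 2) per-objective stable ranks (double argsort)
    order = np.argsort(sample, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")          # (N, M), values 0..N-1
    # 3) Pareto fronts in rank space (naive O(N^2) per front, minimization)
    n = len(ranks)
    fronts = np.full(n, -1)
    remaining, front = set(range(n)), 0
    while remaining:
        nd = {
            i for i in remaining
            if not any(
                np.all(ranks[j] <= ranks[i]) and np.any(ranks[j] < ranks[i])
                for j in remaining if j != i
            )
        }
        for i in nd:
            fronts[i] = front
        remaining -= nd
        front += 1
    return ranks, fronts
```

Step 4 (accumulating per-point statistics) then just aggregates `fronts` and within-front positions across samples.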

---

## How to run an optimization cycle (operational semantics)

A full guide is provided in [`docs/how_to_run_cycle.md`](docs/how_to_run_cycle.md). The essential rules are below.

### 1) When is a point considered ready?

A point participates in `refresh()` iff it is:

* **active** (not a tombstone), and
* **ready on every criterion**.

Practical consequence: adding a point (or creating a child) does not automatically make it eligible for selection; you must provide at least one update for every criterion.

### 2) What to do with newly created children

`perform_new()` creates a child via your `mutator` and (optionally) removes a loser.

A new child is **not ready** until you evaluate it and call `update_crits(child_id, ...)` for **all** criteria.

A safe pattern per epoch is:

1. ingest updates for existing active points,
2. call `losers = refresh()`,
3. for each loser: choose a parent, spawn a child, evaluate the child once, update criteria.

This prevents two common pathologies:

* silently shrinking the effective population (many non-ready children),
* delayed “mass activation” of children that suddenly distort selection.

### 3) Choosing `percentile` and `n_samples`

Let `N` be the number of active-and-ready points. ZenFronts selects:

* `k = ceil(percentile · N)` losers (and symmetrically winners)

Heuristics:

* `percentile` (selection pressure)

  * start at **0.2–0.3** for PBT-like loops,
  * for small `N`, keep `k ≥ 2` to reduce jitter.
* `n_samples` (Monte Carlo stability)

  * low noise: **64–128**
  * medium noise: **128–256**
  * high noise / heavy tails: **512+**
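The selection-size rule and the small-`N` heuristic above combine into a one-liner. The `k_min` floor is the tuning heuristic from this section, not built-in library behavior.

```python
import math

def n_losers(n_ready, percentile, k_min=2):
    """k = ceil(percentile * N), floored at k_min to reduce jitter for small N."""
    return max(k_min, math.ceil(percentile * n_ready))
```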

Quantiles (median/q25/q75) can be computed exactly or via streaming P². A strict contract is enforced by tests:

> **Changing the quantile mode must not change winners/losers, mean/std, or `quality_score`.**

---

## Selection statistics contract (versioned)

After each `refresh()`, ZenFronts persists a per-point selection snapshot under `info(pid)["selection"]`.

The structure is explicitly versioned:

* `schema_name = "zen_fronts.selection_stats"`
* `schema_version = "1.0.0"`

Downstream code can validate and normalize this contract:

```python
from zen_fronts.selection.schema import validate_selection_stats

st = validate_selection_stats(zf.info(pid)["selection"])  # raises on incompatible schema
```

Compatibility rule:

* newer **minor/patch** versions are accepted,
* newer **major** versions are rejected.

This enables additive evolution of telemetry without breaking consumers.
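The compatibility rule reduces to a major-version check; a consumer-side sketch (not the library's `validate_selection_stats`, whose internals may differ):

```python
SUPPORTED_MAJOR = 1  # matches schema_version = "1.0.0"

def is_compatible(schema_version: str) -> bool:
    """Accept any minor/patch within the supported major; reject other majors."""
    major = int(schema_version.split(".")[0])
    return major == SUPPORTED_MAJOR
```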

---

## Performance

ZenFronts is engineered so that, at typical population sizes (≈128–256, ≈3 objectives), the dominant cost is the compiled Monte Carlo kernel.

### Empirical measurements (AMD Ryzen 5950X, Linux, Python 3.12.12, NumPy 2.4.1)

End-to-end `refresh()` (median; `M=3`, `percentile=0.2`, criteria supplied as external stats; `collect_stats=True`):

| N (active+ready) | n_samples | median refresh() | practical interpretation |
| ---------------: | --------: | ---------------: | ------------------------ |
|              128 |        64 |           ~10 ms | interactive              |
|              128 |       128 |           ~18 ms | interactive              |
|              128 |       256 |           ~35 ms | interactive              |
|              128 |       512 |           ~68 ms | moderate                 |
|              128 |      1024 |          ~134 ms | heavy                    |
|              256 |        64 |           ~28 ms | interactive              |
|              256 |       128 |           ~52 ms | interactive              |
|              256 |       256 |          ~102 ms | moderate                 |
|              256 |       512 |          ~202 ms | heavy                    |
|              256 |      1024 |          ~396 ms | very heavy               |

Rule of thumb at `M=3` on this hardware:

* `refresh_ms ≈ a + b · n_samples`

  * for `N=128`: `b ≈ 0.13 ms/sample`
  * for `N=256`: `b ≈ 0.38 ms/sample`
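The linear model can be recovered from two rows of the `N=128` table and sanity-checked against an intermediate row (numbers below come straight from the table above):

```python
def fit_linear(x1, y1, x2, y2):
    """Fit refresh_ms = a + b * n_samples from two (n_samples, median_ms) points."""
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b

a, b = fit_linear(64, 10.0, 1024, 134.0)   # N=128 endpoints from the table
# b ~ 0.13 ms/sample, matching the rule of thumb;
# predicted refresh at n_samples=256 is a + b*256, close to the ~35 ms row.
```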

### Scaling intuition

The kernel builds and uses a domination matrix; the dominant term scales approximately as:

* **O(n_samples · N² · M)**

Consequences:

* doubling `n_samples` roughly doubles runtime,
* doubling `N` increases runtime by roughly 3–4× in the measured regime,
* increasing the number of objectives scales sublinearly in some regimes due to early-exit in dominance checks.

### Benchmark scripts

* Core kernel only:

  ```bash
  python examples/bench_mc_rank.py --out out/bench_mc_rank.csv
  ```
* End-to-end loop (“as in production”):

  ```bash
  python examples/bench_refresh.py --out out/bench_refresh.csv \
    --Ns 128,256 --Ss 64,128,256,512,1024 --epochs 30 --reps 3
  ```

---

## Development notes

### Cython HTML annotation

To generate Cython HTML annotation (hot-line visualization), enable `annotate=True` in the `cythonize(...)` call or guard it behind an environment variable (recommended).

Typical local workflow:

```bash
CYTHON_ANNOTATE=1 python setup.py build_ext --inplace
```

The generated `*.html` files appear next to the compiled modules.

---

## Repository navigation

* `src/zen_fronts/` — library implementation
* `src/zen_fronts/selection_core/` — Cython Monte Carlo ranking kernel
* `docs/how_to_run_cycle.md` — operational semantics and tuning guidelines
* `examples/` — runnable demo + benchmarks
* `tests/` — unit tests and property-based tests (Hypothesis)

---

## License

MIT (see `LICENSE`).
