Metadata-Version: 2.4
Name: shenron
Version: 0.16.1
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Rust
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
License-File: LICENSE
Summary: Generate Shenron docker-compose deployments from model config files
Author: doubleword.ai
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/doublewordai/shenron
Project-URL: Repository, https://github.com/doublewordai/shenron

# Shenron

Shenron now ships as a config-driven generator for production LLM docker-compose deployments.

`shenron` reads a model config YAML and generates:
- `docker-compose.yml`
- `.generated/onwards_config.json`
- `.generated/prometheus.yml`
- `.generated/scouter_reporter.env`
- `.generated/engine_start.sh`
- `.generated/engine_start_N.sh` + `.generated/sglangmux_start.sh` when `models:` has 2+ entries

## Quick Start

```bash
uv pip install shenron
shenron get
docker compose up -d
```

`shenron get` reads a per-release config index asset, shows the available configs with arrow-key selection, downloads the chosen config, and generates deployment artifacts in the current directory. With `--release latest`, it also rewrites `shenron_version` in the downloaded config to `latest`. You can also override config values on download with:
- `--api-key` (writes `api_key`)
- `--scouter-api-key` (writes `scouter_ingest_api_key`)
- `--scouter-collector-instance` (writes `scouter_collector_instance`; alias: `--scouter-colector-instance`)
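
As an illustration (the values here are placeholders, not real credentials), running `shenron get --api-key sk-abc --scouter-api-key sc-xyz --scouter-collector-instance prod-1` leaves these keys set in the downloaded config:

```yaml
# Keys written into the downloaded config YAML by the override flags.
# Values are hypothetical examples.
api_key: sk-abc
scouter_ingest_api_key: sc-xyz
scouter_collector_instance: prod-1
```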

By default, `shenron get` pulls release configs from `doublewordai/shenron-configs`.

Use `shenron get --helm` to download the Helm chart bundle for the selected release and extract it to `./shenron-helm` (or set `--dir`). This gives you a chart directory ready for `helm install`.

You can also install directly with Helm from release assets in `shenron-configs`:
- `helm repo add shenron https://github.com/doublewordai/shenron-configs/releases/download/v0.15.1`
- `helm install my-shenron shenron/shenron --version 0.15.1`

`shenron .` still works: it expects exactly one config YAML (`*.yml` or `*.yaml`) in the current directory. Alternatively, pass a config file path directly.

## Configs

Repo configs are stored in `configs/`.

Available starter configs:
- `configs/Qwen06B-cu126-TP1.yml`
- `configs/Qwen06B-cu129-TP1.yml`
- `configs/Qwen06B-cu130-TP1.yml`
- `configs/Qwen30B-A3B-cu126-TP1.yml`
- `configs/Qwen30B-A3B-cu129-TP1.yml`
- `configs/Qwen30B-A3B-cu129-TP2.yml`
- `configs/Qwen30B-A3B-cu130-TP2.yml`
- `configs/Qwen235-A22B-cu129-TP2.yml`
- `configs/Qwen235-A22B-cu129-TP4.yml`
- `configs/Qwen235-A22B-cu130-TP2.yml`

These configs use the same defaults that were previously hardcoded in `docker/run_docker_compose.sh`.

Engine selection and args:
- `engine`: `vllm` or `sglang` (default: `vllm`)
- `engine_args`: engine CLI args appended after core settings.
- `engine_env`: top-level default engine environment variables as alternating `KEY, VALUE` entries.
- `models[*].engine_envs`: per-model engine environment variables as alternating `KEY, VALUE` entries.
- `engine_port`, `engine_host`: engine bind settings used for generated scripts and targets.
- `engine_use_cuda_ipc_transport`: when `true`, exports `SGLANG_USE_CUDA_IPC_TRANSPORT=1` before launching SGLang.
- `models`: optional per-model engine config. With 1 entry, Shenron generates a single `engine_start.sh` from that model entry. With 2+ entries, Shenron starts `sglangmux` (requires `engine: sglang`).
- `sglangmux_listen_port`, `sglangmux_host`, `sglangmux_upstream_timeout_secs`, `sglangmux_model_ready_timeout_secs`, `sglangmux_model_switch_timeout_secs`, `sglangmux_log_dir`: optional `sglangmux` settings (hyphenated aliases like `sglangmux-listen-port` are also accepted).

`engine_args`, `engine_env`, and `models[*].engine_envs` values accept YAML scalars (string/number/bool). If you need to pass a structured value (like `--override-generation-config`), provide a YAML mapping and it will be JSON-encoded.
`engine_env` and `models[*].engine_envs` must have an even number of entries (`KEY VALUE` pairs), and variable names must be valid shell env identifiers.
Set `VLLM_ENABLE_RESPONSES_API_STORE` and `VLLM_FLASHINFER_MOE_BACKEND` through `engine_env` or `models[*].engine_envs`.
Legacy keys (`vllm_args`, `sglang_args`, `vllm_port`, `vllm_host`, `sglang_env`, `sglang_use_cuda_ipc_transport`) are still accepted as aliases.
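
Putting these keys together, here is a hedged sketch of the engine section of a config (the model settings and values are illustrative, not a recommended deployment):

```yaml
engine: vllm
engine_port: 8000
engine_host: 0.0.0.0
# Engine CLI args appended after core settings.
engine_args:
  - --tensor-parallel-size
  - 1
  # A YAML mapping value is JSON-encoded before being passed on the CLI.
  - --override-generation-config
  - temperature: 0.6
    top_p: 0.95
# Alternating KEY, VALUE entries; the list must have an even count
# and keys must be valid shell env identifiers.
engine_env:
  - VLLM_ENABLE_RESPONSES_API_STORE
  - 1
```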

### `models:` Schema (single model, optional sglangmux)

When `models:` has 2+ entries, Shenron generates one engine launch script per model plus a mux launcher:

```yaml
engine: sglang
sglangmux_listen_port: 8100
sglangmux_host: 0.0.0.0
sglangmux_upstream_timeout_secs: 120
sglangmux_model_ready_timeout_secs: 600
sglangmux_model_switch_timeout_secs: 120
sglangmux_log_dir: /tmp/sglangmux

models:
- model_name: Qwen/Qwen3-0.6B
  engine_port: 8001
  api_key: sk-model-a
  engine_envs: [VLLM_ENABLE_RESPONSES_API_STORE, -1]
  engine_args: [--tp, 1]
- model_name: Qwen/Qwen3-30B-A3B
  engine_port: 8002
  api_key: sk-model-b
  engine_use_cuda_ipc_transport: true
  engine_args: [--tp, 2]
```

Rules in `models:` mode:
- with exactly 1 model entry: works for any `engine` value and Shenron generates `.generated/engine_start.sh`
- with 2+ model entries: `engine` must be `sglang`
- each `models[*].model_name` must be unique
- each `models[*].engine_port` must be set and unique
- with 2+ model entries: `sglangmux_listen_port` must be different from all model ports
- when `models:` is set, top-level `model_name`/`engine_port`/`engine_host` can be omitted
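
For the single-entry case, a minimal sketch (model name and port are illustrative) that generates `.generated/engine_start.sh`:

```yaml
# With exactly one entry, any engine value is allowed.
engine: vllm
models:
- model_name: Qwen/Qwen3-0.6B
  engine_port: 8001
  engine_args: [--tp, 1]
```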

With 2+ model entries, `.generated/onwards_config.json` contains one target per model and all target URLs point to `http://vllm:<sglangmux_listen_port>/v1`.
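
As a rough illustration only (the field names below are hypothetical, not the actual `onwards` schema), the routing for the two-model example above would look conceptually like this, with both targets pointing at the mux port rather than the individual engine ports:

```json
{
  "targets": {
    "Qwen/Qwen3-0.6B": { "url": "http://vllm:8100/v1" },
    "Qwen/Qwen3-30B-A3B": { "url": "http://vllm:8100/v1" }
  }
}
```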

## Generated Compose Behavior

`docker-compose.yml` is fully rendered from config values:
- model image tag from `shenron_version` + `cuda_version`
- `onwards` image tag from `onwards_version`
- service ports from config
- no `${SHENRON_VERSION}` placeholders
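
A minimal sketch of the keys that drive the rendering (all values illustrative):

```yaml
shenron_version: 0.16.1  # combined with cuda_version to select the model image tag
cuda_version: cu129
onwards_version: 0.1.0   # tag for the onwards image (value is a placeholder)
engine_port: 8000        # service ports are taken directly from config
```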

## Development

```bash
# Run tests (Rust + CLI + compose checks)
./scripts/ci.sh

# Install local package for manual testing
python3 -m pip install -e .

# Generate from repo config
shenron configs/Qwen06B-cu126-TP1.yml --output-dir /tmp/shenron-test
```

## Release Automation

- `release-assets.yaml` publishes stamped config files (`*.yml`) as release assets.
- `release-assets.yaml` also publishes `configs-index.txt`, which powers `shenron get`.
- `release-assets.yaml` packages Helm chart assets as `shenron-<version>.tgz` + `index.yaml` (Helm repository format).
- `release-assets.yaml` mirrors `*.yml`, `configs-index.txt`, `shenron-*.tgz`, and `index.yaml` into `${OWNER}/shenron-configs` under the same tag as the main `shenron` release.
- Set `CONFIGS_REPO_TOKEN` (or reuse `RELEASE_PLEASE_TOKEN`) with write access to the configs repo release assets; optional repo variable `CONFIGS_REPO` overrides the default target (`${OWNER}/shenron-configs`).
- `python-release.yaml` builds/publishes the `shenron` package to PyPI on release tags.
- Docker image build/push via Depot remains in `ci.yaml` and still triggers when `docker/Dockerfile.vllm.cu*`, `docker/Dockerfile.sglang.cu*`, or `VERSION` changes.

## License

MIT, see `LICENSE`.

