Metadata-Version: 2.4
Name: litellm-wzrd-momentum
Version: 0.1.0
Summary: Demand-aware model routing for LiteLLM — route to models with accelerating adoption.
License-Expression: MIT
Project-URL: Homepage, https://github.com/twzrd-sol/litellm-wzrd-momentum
Project-URL: Documentation, https://github.com/twzrd-sol/litellm-wzrd-momentum#readme
Project-URL: Signal API, https://api.twzrd.xyz/v1/signals/momentum
Project-URL: WZRD Protocol, https://twzrd.xyz
Keywords: litellm,llm,routing,momentum,wzrd,ai,model-selection
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: httpx>=0.24
Provides-Extra: litellm
Requires-Dist: litellm>=1.0; extra == "litellm"
Provides-Extra: requests
Requires-Dist: requests>=2.28; extra == "requests"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"

# litellm-wzrd-momentum

Demand-aware model routing for LiteLLM.

Every LLM router optimizes for cost, latency, and quality. This package adds a fourth axis:
**which models are gaining real-world traction right now?**

The signal comes from [WZRD](https://twzrd.xyz) — live velocity tracking across
HuggingFace downloads, GitHub stars, and OpenRouter routing volume. Updated every 5 minutes.

## Install

```bash
pip install litellm-wzrd-momentum
```

## Quick start

```python
from litellm import Router
from wzrd_momentum_strategy import register

router = Router(model_list=[
    {"model_name": "qwen-9b",  "litellm_params": {"model": "openrouter/qwen/qwen-3.5-9b"}},
    {"model_name": "qwen-35b", "litellm_params": {"model": "openrouter/qwen/qwen-3.5-35b-a3b"}},
    {"model_name": "llama-70b","litellm_params": {"model": "openrouter/meta-llama/llama-3.3-70b-instruct"}},
])

register(router, alias_map={
    "qwen-9b":  ["Qwen/Qwen3.5-9B"],
    "qwen-35b": ["Qwen/Qwen3.5-35B-A3B"],
    "llama-70b": ["meta-llama/Llama-3.3-70B-Instruct"],
})

# Every call now routes via momentum (call from an async context)
response = await router.acompletion(
    model="qwen-9b",
    messages=[{"role": "user", "content": "Hello"}],
)
```

## How it works

1. On each routing decision, fetches [WZRD momentum signals](https://api.twzrd.xyz/v1/signals/momentum) (cached 5 min)
2. Scores each deployment: `trend + momentum × 0.3 + delta × 0.25`, weighted by confidence
3. Returns the highest-scoring deployment to LiteLLM
4. LiteLLM handles retries, fallbacks, and provider errors as normal

If WZRD is unreachable, the strategy falls back to the first deployment, so your inference pipeline never breaks.
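The flow above (cached fetch, score, pick max, fall back on failure) can be sketched roughly as follows. The class and method names are illustrative, not the package's actual internals:

```python
import time

class MomentumSelector:
    """Sketch of the routing flow: fetch signals (cached), score each
    deployment, pick the max, and fall back to the first deployment on
    any failure. Illustrative only."""

    def __init__(self, fetch_signals, score, cache_ttl=300):
        self.fetch_signals = fetch_signals   # callable returning the signal payload
        self.score = score                   # callable (deployment, signals) -> float
        self.cache_ttl = cache_ttl           # seconds; 300 matches the documented default
        self._signals = None
        self._fetched_at = 0.0

    def select(self, deployments):
        try:
            now = time.monotonic()
            # Re-fetch signals only once the cache TTL has expired
            if self._signals is None or now - self._fetched_at > self.cache_ttl:
                self._signals = self.fetch_signals()
                self._fetched_at = now
            return max(deployments, key=lambda d: self.score(d, self._signals))
        except Exception:
            # WZRD unreachable or payload malformed: route by deployment order
            return deployments[0]
```

With a fetcher that raises, `select` returns the first deployment; with a working fetcher, it returns the highest-scoring one.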

## Behavior defaults

- `cache_ttl=300` seconds (5 minutes)
- confidence policy:
  - `normal`: full signal weight (eligible for proactive routing)
  - `low`: half signal weight (observe-first posture)
  - `insufficient`: zero signal weight (observe-only; no proactive push)
- fallback policy: if WZRD is down or payload contract drifts, route by deployment order (first candidate)
- contract guard: requires `signal_version` and model-level fields
  (`model`, `velocity_trend`, `momentum_score`, `velocity_delta_pct`, `history_confidence`)
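A contract guard like the one described above might look roughly like this. The required field names come from the list in this section; the helper name and the assumption that per-model entries live under a `models` key are illustrative:

```python
# Model-level fields required by the contract guard, per the docs above
REQUIRED_MODEL_FIELDS = {
    "model", "velocity_trend", "momentum_score",
    "velocity_delta_pct", "history_confidence",
}

def payload_is_valid(payload):
    """Reject the payload if the top-level version tag or any required
    model-level field is missing. Sketch only; the "models" key is an
    assumed payload shape, not a documented one."""
    if "signal_version" not in payload:
        return False
    models = payload.get("models", [])
    return all(REQUIRED_MODEL_FIELDS <= entry.keys() for entry in models)
```

When the guard fails, the documented fallback policy applies: route by deployment order.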

## Score table

| Trend | Score | Signal |
|-------|-------|--------|
| surging | +3.0 | Downloads/stars growing >50% day-over-day |
| accelerating | +2.0 | Growing 10-50% day-over-day |
| stable | 0.0 | Flat or <10% growth |
| decelerating | -1.0 | Slowing 5-30% day-over-day |
| cooling | -2.0 | Dropping >30% day-over-day |

Confidence scaling: `normal` = full weight, `low` = 50%, `insufficient` = 0% (new models with <3 days of data).
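Putting the table and the confidence scaling together, the score from "How it works" can be sketched as below. The trend values and confidence weights mirror the tables above; applying the confidence weight to the whole sum is an assumption read from the docs' wording, and the function signature is illustrative:

```python
# Trend scores from the table above
TREND_SCORES = {
    "surging": 3.0,
    "accelerating": 2.0,
    "stable": 0.0,
    "decelerating": -1.0,
    "cooling": -2.0,
}

# Confidence weights: normal = full, low = 50%, insufficient = 0%
CONFIDENCE_WEIGHTS = {"normal": 1.0, "low": 0.5, "insufficient": 0.0}

def momentum_score(trend, momentum, delta_pct, confidence):
    """trend + momentum x 0.3 + delta x 0.25, scaled by confidence weight."""
    base = TREND_SCORES[trend] + momentum * 0.3 + delta_pct * 0.25
    return base * CONFIDENCE_WEIGHTS[confidence]
```

Note that an `insufficient` confidence zeroes the score entirely, matching the observe-only posture described in the behavior defaults.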

## Alias mapping

WZRD tracks models by HuggingFace/GitHub name (`Qwen/Qwen3.5-9B`).
LiteLLM uses provider-specific names (`openrouter/qwen/qwen-3.5-9b`).

The `alias_map` bridges them explicitly. Without it, the strategy auto-matches
by extracting slugs from `litellm_params.model` — works for most cases, but
explicit mapping is more reliable.

```python
register(router, alias_map={
    "qwen-9b": ["Qwen/Qwen3.5-9B", "Qwen/Qwen3-9B"],  # multiple variants
    "llama-70b": ["meta-llama/Llama-3.3-70B-Instruct"],
})
```
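The slug-based auto-matching described above might work roughly like this. The normalization rules here (take the last path segment, lowercase, drop separators) are assumptions about how such a match could be made, not the package's exact logic:

```python
def _normalize(name):
    # Keep only the final path segment, lowercase it, and drop separators,
    # so "openrouter/qwen/qwen-3.5-9b" and "Qwen/Qwen3.5-9B" both
    # reduce to "qwen359b"
    slug = name.rsplit("/", 1)[-1].lower()
    return slug.replace("-", "").replace("_", "").replace(".", "")

def auto_match(litellm_model, wzrd_names):
    """Return the first WZRD model name whose normalized slug matches
    the LiteLLM provider model string, or None. Sketch only."""
    target = _normalize(litellm_model)
    for candidate in wzrd_names:
        if _normalize(candidate) == target:
            return candidate
    return None
```

This is why explicit `alias_map` entries are more reliable: any provider that renames or re-versions a slug silently breaks the heuristic.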

## Proxy integration

LiteLLM's proxy doesn't support custom strategies via YAML config.
For proxy deployments, create a wrapper script:

```python
# wzrd_proxy.py
from litellm import Router
from wzrd_momentum_strategy import register

# Your normal proxy config
router = Router(model_list=[...])
register(router, alias_map={...})

# Expose the proxy app with the patched router
from litellm.proxy.proxy_server import app
```

Or use the pre-router pattern from `integrations/litellm-wzrd-router/` which
works as middleware before any LiteLLM call (SDK or proxy).

## Manual setup

If you prefer explicit control instead of the `register()` convenience helper:

```python
from wzrd_momentum_strategy import WZRDMomentumStrategy

strategy = WZRDMomentumStrategy(
    router,
    wzrd_url="https://api.twzrd.xyz/v1/signals/momentum",
    alias_map={"qwen-9b": ["Qwen/Qwen3.5-9B"]},
    cache_ttl=300,
)
router.set_custom_routing_strategy(strategy)
```

## API

The momentum data comes from a public, free, no-auth endpoint:

```
GET https://api.twzrd.xyz/v1/signals/momentum
GET https://api.twzrd.xyz/v1/signals/momentum?platform=huggingface&trending=true
```

Returns trend classification, momentum score, velocity delta, confidence,
and routing implications for 42+ tracked AI models.

## Expected behavior

For a candidate set like `qwen-9b`, `nemotron-120b`, `llama-70b`, expected behavior is:

- route to `nemotron-120b` when it is `surging`
- deprioritize `qwen-9b` when `decelerating`
- deprioritize `llama-70b` when `cooling`

The exact winner changes as momentum updates, but routing should follow trend
and confidence consistently.

## v0.1.0 release notes

- Added LiteLLM `CustomRoutingStrategyBase` plugin with one-line registration helper
- Added trend + momentum + delta scoring with confidence weighting
- Added explicit alias map matching and automatic fallback matching from provider model slugs
- Added contract guard for WZRD payload shape (`signal_version` + required model fields)
- Added graceful degradation fallback to first deployment when WZRD is unavailable
- Added test suite coverage for scoring order, confidence behavior, matching paths, async routing,
  caching behavior, register helper, and payload contract guard

## License

MIT
