Metadata-Version: 2.4
Name: antaris-router
Version: 4.5.3
Summary: File-based model router for LLM cost optimization. Zero dependencies.
Author-email: Antaris Analytics <dev@antarisanalytics.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/Antaris-Analytics/antaris-router
Project-URL: Documentation, https://router.antarisanalytics.ai
Project-URL: Repository, https://github.com/Antaris-Analytics/antaris-router
Project-URL: Issues, https://github.com/Antaris-Analytics/antaris-router/issues
Keywords: ai,llm,router,cost,optimization,models,deterministic
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# antaris-router

**Adaptive model routing with semantic classification and outcome learning. Zero external dependencies.**

Routes prompts to optimal models using TF-IDF classification (no embeddings required). Tracks routing decisions and outcomes to improve accuracy over time. Fallback chains provide automatic failover. All state persists to JSON files.

```bash
pip install antaris-router
```

Version 4.5.3 | Suite Compatibility: antaris-suite 4.2.0 | Python 3.9+ | stdlib only

## Benchmarks

- **Routing accuracy**: 8/8 correct on the bundled classification test suite
- **Self-improving**: accuracy increases with outcome data accumulation
- **Latency**: median 0.05ms, p99 0.09ms
- **Memory**: <5MB for typical workloads

## Key Exports

```python
from antaris_router import AdaptiveRouter, Router, RoutingDecision, ModelConfig
```

## Complete Workflow Example

```python
from antaris_router import AdaptiveRouter, ModelConfig

# Initialize router with file-based persistence
router = AdaptiveRouter("./routing_data", ab_test_rate=0.05)

# Register models with tier ranges and costs
router.register_model(ModelConfig(
    name="gpt-4o-mini",
    tier_range=("trivial", "moderate"),
    cost_per_1k_input=0.00015,
    cost_per_1k_output=0.0006,
))

router.register_model(ModelConfig(
    name="claude-sonnet",
    tier_range=("simple", "complex"), 
    cost_per_1k_input=0.003,
    cost_per_1k_output=0.015,
))

router.register_model(ModelConfig(
    name="claude-opus",
    tier_range=("complex", "expert"),
    cost_per_1k_input=0.015,
    cost_per_1k_output=0.075,
))

# Route prompts to appropriate models
result = router.route("Implement a distributed task queue with priority scheduling")
print(f"Route to: {result.model}")
print(f"Tier: {result.tier}")
print(f"Confidence: {result.confidence:.2f}")
print(f"Fallback chain: {result.fallback_chain}")

# Use the model (your implementation)
response = your_llm_client.call(result.model, result.prompt)
quality_score = evaluate_response(response)  # 0.0-1.0

# Report outcome so router learns
router.report_outcome(
    prompt_hash=result.prompt_hash,
    quality_score=quality_score,
    success=quality_score > 0.7
)

# Save learned state
router.save()

# View routing analytics
analytics = router.routing_analytics()
print(f"Total decisions: {analytics['total_decisions']}")
print(f"Cost savings: ${analytics['cost_savings']:.2f}")
```

## Semantic Classification

Uses TF-IDF vectorization with cosine similarity for semantic understanding. No external embeddings or API calls required.

```python
# These prompts route to different tiers despite similar length
router.route("What is 2 + 2?")                    # tier: trivial
router.route("Implement OAuth2 flow")             # tier: moderate  
router.route("Design distributed consensus")      # tier: expert
```

**Classification Features:**
- ~50 labeled examples across 5 complexity tiers
- TF-IDF term weighting for semantic understanding
- Cosine similarity for classification decisions
- `teach()` method for manual corrections

```python
# Correct misclassification
router.teach("Optimize Kubernetes for cost", "complex")
```

## Quality Tracking with Outcome Learning

Router builds quality profiles per model per tier based on reported outcomes.

```python
# Quality score calculation (weighted blend)
# score = 0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)

# View model performance
profiles = router.get_model_profiles()
print(profiles["gpt-4o-mini"]["moderate"])
# {'quality_score': 0.73, 'attempts': 45, 'successes': 33}

# Models below threshold (default 0.30) are skipped
router.set_escalation_threshold(0.35)
```

**Learning Process:**
1. Router makes initial routing decision
2. You use the suggested model
3. Call `report_outcome()` with quality score and success flag
4. Router updates quality profiles
5. Future routing considers learned performance data
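The quality score formula above can be sketched as a standalone function (an illustration of the weighting, not the library's internal code):

```python
def quality_score(success_rate: float, avg_quality: float, escalation_rate: float) -> float:
    """Weighted quality score: 40% success rate, 40% average reported
    quality, 20% reward for avoiding escalations."""
    return 0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)

# A model that succeeds 73% of the time with 0.8 average quality
# and a 10% escalation rate:
print(round(quality_score(0.73, 0.8, 0.1), 3))  # 0.792
```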

## Fallback Chains

Automatic failover when primary models are unavailable or perform poorly.

```python
# Configure fallback order
router = AdaptiveRouter(
    data_dir="./routing_data",
    fallback_chain=["gpt-4o-mini", "claude-sonnet", "claude-opus"]
)

result = router.route("Debug this memory leak")
print(result.model)           # Primary choice
print(result.fallback_chain)  # Ordered alternatives

# Escalate to next model if primary fails
next_model = router.escalate(result.prompt_hash)
```
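A typical call loop over the fallback chain looks like the following sketch; `call_model` and the exception type are application-level stand-ins, not part of the library:

```python
def route_with_failover(prompt, fallback_chain, call_model):
    """Try each model in order until one returns a usable response."""
    last_error = None
    for model in fallback_chain:
        try:
            return model, call_model(model, prompt)
        except RuntimeError as err:  # stand-in for a provider failure
            last_error = err
    raise RuntimeError(f"all models failed: {last_error}")

# Example: the first model is rate limited, the second succeeds
def flaky(model, prompt):
    if model == "gpt-4o-mini":
        raise RuntimeError("rate limited")
    return f"{model} answered"

model, reply = route_with_failover("hi", ["gpt-4o-mini", "claude-sonnet"], flaky)
print(model)  # claude-sonnet
```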

## A/B Testing Support

Randomly routes a percentage of requests to premium models for validation.

```python
# Route 5% to premium models regardless of classification
router = AdaptiveRouter("./data", ab_test_rate=0.05)

# Track A/B test results
stats = router.get_ab_stats()
print(f"A/B tests: {stats['total_tests']}")
print(f"Premium win rate: {stats['premium_win_rate']:.2f}")
```

## Context-Aware Routing

Adjusts routing based on conversation state and user expertise.

```python
# Iteration count influences tier selection
result = router.route("Fix this bug", context={"iteration": 1})   # Normal tier
result = router.route("Fix this bug", context={"iteration": 5})   # Escalated tier

# Conversation length sets minimum tier
result = router.route("Any thoughts?", context={"conversation_length": 20})

# User expertise level
result = router.route("Optimize this", context={"user_expertise": "expert"})

# Query complexity analysis
result = router.route(long_complex_prompt, context={"analyze_complexity": True})
```

**Context Parameters:**
- `iteration`: Attempt number (escalates on repeated failures)
- `conversation_length`: Message count (longer = higher minimum tier)
- `user_expertise`: "novice", "intermediate", "expert"
- `analyze_complexity`: Enable structural complexity analysis
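One way the context signals above could combine into a tier adjustment; this is an illustrative sketch of the described behavior, not the library's actual logic:

```python
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def adjust_tier(base_tier: str, context: dict) -> str:
    """Bump the classified tier based on context signals (sketch)."""
    idx = TIERS.index(base_tier)
    # Repeated failed attempts escalate one tier every few iterations
    idx += context.get("iteration", 1) // 3
    # Long conversations impose a minimum tier
    if context.get("conversation_length", 0) >= 20:
        idx = max(idx, TIERS.index("moderate"))
    # Expert users tend to ask harder questions than the text suggests
    if context.get("user_expertise") == "expert":
        idx += 1
    return TIERS[min(idx, len(TIERS) - 1)]

print(adjust_tier("simple", {"iteration": 5}))              # moderate
print(adjust_tier("trivial", {"conversation_length": 25}))  # moderate
```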

## Cost Tracking and Optimization

Tracks usage costs and calculates savings versus premium-only routing.

```python
# Cost analysis
cost_report = router.get_cost_analysis(days=7)
print(f"Total cost: ${cost_report['total_cost']:.2f}")
print(f"Savings vs premium: ${cost_report['savings']:.2f}")
print(f"Cost per request: ${cost_report['avg_cost_per_request']:.4f}")

# Usage breakdown by model
for model, data in cost_report['by_model'].items():
    print(f"{model}: {data['requests']} requests, ${data['cost']:.2f}")
```
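The per-request cost the tracker aggregates is the usual token-count arithmetic against the registered per-1k rates; the token counts below are illustrative:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 cost_per_1k_input: float, cost_per_1k_output: float) -> float:
    """Cost of one request at per-1k-token rates."""
    return (input_tokens / 1000) * cost_per_1k_input \
         + (output_tokens / 1000) * cost_per_1k_output

# 2,000 input + 500 output tokens at the gpt-4o-mini rates registered above
cost = request_cost(2000, 500, 0.00015, 0.0006)
print(f"${cost:.6f}")  # $0.000600
```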

## Confidence Gating

Routes to cheaper models when confidence is high, escalates when uncertain.

```python
from antaris_router import ConfidenceRouter

router = ConfidenceRouter(
    confidence_threshold=0.8,  # Use cheap model if confidence > 0.8
    cheap_model="gpt-4o-mini",
    premium_model="claude-sonnet"
)

result = router.route("Simple math problem")
print(f"Confidence: {result.confidence:.2f}")
print(f"Model: {result.model}")  # Likely cheap model

result = router.route("Complex system architecture question")
print(f"Confidence: {result.confidence:.2f}") 
print(f"Model: {result.model}")  # Likely premium model
```

## Tier System

Five complexity levels from trivial lookups to expert system design.

| Tier | Examples | Characteristics |
|------|----------|----------------|
| trivial | "What is 2+2?", "Define REST" | Single fact lookup, <10 words |
| simple | "Reverse string in Python", "TCP vs UDP" | Basic programming, short explanations |
| moderate | "Implement JWT auth", "Design Redis cache" | Multi-step implementation, system components |
| complex | "Microservices architecture", "Database sharding" | System design, multiple technologies |
| expert | "Distributed consensus algorithm", "HFT platform" | Research-level problems, novel solutions |
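Model eligibility per tier follows from each model's registered `tier_range`; a sketch of the inclusive membership check, with semantics assumed from the `ModelConfig` examples above:

```python
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def covers(tier_range: tuple, tier: str) -> bool:
    """True if `tier` falls within a model's inclusive (low, high) range."""
    low, high = tier_range
    return TIERS.index(low) <= TIERS.index(tier) <= TIERS.index(high)

print(covers(("trivial", "moderate"), "simple"))  # True
print(covers(("complex", "expert"), "simple"))    # False
```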

```python
# View tier distribution
analytics = router.routing_analytics()
print(analytics['tier_distribution'])
# {'trivial': 0.25, 'simple': 0.30, 'moderate': 0.25, 'complex': 0.15, 'expert': 0.05}
```

## File-Based State Persistence

All routing decisions and learning data persists to JSON files.

```
routing_data/
├── routing_examples.json    # Classification training data
├── routing_model.json       # TF-IDF model weights
├── routing_decisions.json   # Decision history
├── model_profiles.json      # Quality scores per model/tier
└── router_config.json       # Model registry and settings
```
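A common pattern for this kind of JSON persistence is write-then-rename, so a crash mid-write never leaves a half-written file behind; this is a general-purpose sketch, not the library's implementation:

```python
import json
import os
import tempfile

def save_json_atomic(path: str, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename it
    over the target so readers never see a partially written file."""
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
        os.replace(tmp, path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise

save_json_atomic("model_profiles.json", {"gpt-4o-mini": {"moderate": {"attempts": 45}}})
```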

```python
# Manual state management
router.save()                    # Save all state
router.load()                    # Load from disk
router.backup("backup_dir")      # Create backup
router.export_data()             # Export for analysis
```

## MCP Server Integration

Optional MCP server for external integrations.

```python
from antaris_router.mcp import MCPServer

# Start MCP server
server = MCPServer(router, port=8000)
server.start()

# MCP endpoints
# GET /route?prompt=... - Get routing decision
# POST /outcome - Report outcome
# GET /analytics - View routing statistics
```

## Legacy Router (v1 API)

Keyword-based classification with SLA monitoring.

```python
from antaris_router import Router, SLAConfig

sla = SLAConfig(
    max_latency_ms=200,
    budget_per_hour_usd=5.00,
    min_quality_score=0.7
)

router = Router(config_path="config.json", sla=sla)
decision = router.route("Implement user authentication")

# SLA monitoring
report = router.get_sla_report(since_hours=1.0)
alert = router.check_budget_alert()
```

## Integration Examples

**With OpenAI:**
```python
import openai

result = router.route(prompt)
response = openai.chat.completions.create(
    model=result.model,
    messages=[{"role": "user", "content": prompt}]
)
router.report_outcome(result.prompt_hash, evaluate(response), True)
```

**With Anthropic:**
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
result = router.route(prompt)
response = client.messages.create(
    model=result.model,
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}]
)
router.report_outcome(result.prompt_hash, evaluate(response), True)
```

**With Local Models (Ollama):**
```python
import requests

# Register local model at $0 cost
router.register_model(ModelConfig(
    name="llama3-8b-local",
    tier_range=("trivial", "simple"),
    cost_per_1k_input=0.0,
    cost_per_1k_output=0.0
))

result = router.route(prompt)
if "local" in result.model:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": result.model, "prompt": prompt, "stream": False},
    )
```

## Architecture

```
AdaptiveRouter
├── SemanticClassifier
│   └── TFIDFVectorizer      # Term frequency analysis
├── QualityTracker
│   ├── RoutingDecision      # Decision records
│   └── ModelProfiles        # Per-model quality scores
├── ContextAdjuster          # Context-aware tier adjustment
├── FallbackChain           # Model escalation logic
└── ABTester                # Validation routing

Router (Legacy)
├── TaskClassifier          # Keyword-based classification
├── ModelRegistry           # Model capabilities
├── CostTracker             # Usage analysis
└── SLAMonitor              # Budget and latency enforcement
```

## Testing

```bash
git clone https://github.com/Antaris-Analytics/antaris-router.git
cd antaris-router
pip install pytest
python -m pytest tests/ -v
```

All 194 tests pass. Zero external dependencies required.

## Performance Characteristics

- **Cold start latency**: 0.05ms median
- **Memory usage**: <5MB typical workload
- **Classification accuracy**: 100% on test suite (8/8 cases)
- **Storage overhead**: ~1KB per 1000 routing decisions
- **TF-IDF model size**: ~50KB for 5-tier classification

## Limitations

- Classification is statistical, not deterministic
- Requires outcome feedback for learning
- TF-IDF less accurate than embeddings for edge cases
- No real-time pricing data
- Does not call models directly

## License

Apache 2.0 License. See LICENSE for details.

---

**Part of the antaris-suite:**
- [antaris-memory](https://pypi.org/project/antaris-memory/) - Persistent memory for agents
- [antaris-guard](https://pypi.org/project/antaris-guard/) - Security and prompt injection detection  
- [antaris-context](https://pypi.org/project/antaris-context/) - Context window optimization
