Metadata-Version: 2.4
Name: infershrink
Version: 0.1.0a1
Summary: Cut LLM costs 80%+ with one line of code. Compresses prompts and routes to the cheapest capable model.
Project-URL: Homepage, https://github.com/MusashiMiyamoto1-cloud/tokenshrink
Project-URL: Repository, https://github.com/MusashiMiyamoto1-cloud/tokenshrink
Project-URL: Issues, https://github.com/MusashiMiyamoto1-cloud/tokenshrink/issues
Author-email: Musashi Miyamoto <MusashiMiyamoto1@icloud.com>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: ai,anthropic,cost-optimization,llm,model-routing,openai,prompt-compression
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Provides-Extra: all
Requires-Dist: anthropic>=0.18; extra == 'all'
Requires-Dist: openai>=1.0; extra == 'all'
Requires-Dist: tokenshrink>=0.1; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.18; extra == 'anthropic'
Provides-Extra: compression
Requires-Dist: tokenshrink>=0.1; extra == 'compression'
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Description-Content-Type: text/markdown

# InferShrink

**Cut LLM costs 80%+ with one line of code.**

InferShrink wraps your OpenAI or Anthropic client to automatically compress prompts and route each request to the cheapest model that can handle it. Beyond the one-line wrap, no other changes to your code are required.

[![PyPI](https://img.shields.io/pypi/v/infershrink)](https://pypi.org/project/infershrink/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![Python](https://img.shields.io/pypi/pyversions/infershrink)](https://pypi.org/project/infershrink/)

## Installation

```bash
pip install infershrink

# With OpenAI support
pip install "infershrink[openai]"

# With prompt compression (TokenShrink)
pip install "infershrink[compression]"

# Everything
pip install "infershrink[all]"
```

## Quick Start

```python
import openai
from infershrink import optimize

client = optimize(openai.Client())

# Use exactly as before — InferShrink handles the rest
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
# ↑ This simple question was automatically routed to gpt-4o-mini
#   saving you ~95% on this request
```

## How It Works

```
                      ┌──────────────────────┐
    Your Code         │      InferShrink     │         LLM API
                      │                      │
  ┌───────────┐  ──►  │  1. Classify task    │
  │ messages  │       │     complexity       │
  │ model     │       │                      │
  └───────────┘       │  2. Compress prompt  │  ──►  ┌──────────┐
                      │     (TokenShrink)    │       │ Cheapest │
                      │                      │       │ capable  │
                      │  3. Route to cheap   │  ◄──  │ model    │
                      │     model            │       └──────────┘
                      │                      │
                      │  4. Track savings    │
                      └──────────────────────┘
```

### The Pipeline

1. **Classify** — A rule-based classifier analyzes your messages for complexity signals (code blocks, tool calls, message length, sensitive keywords)
2. **Compress** — If [TokenShrink](https://pypi.org/project/tokenshrink/) is installed, prompts are compressed to reduce token count
3. **Route** — Simple tasks go to cheap models (gpt-4o-mini), complex tasks stay on powerful ones (gpt-4o, claude-opus-4-6)
4. **Track** — Every request's cost savings are tracked so you can see your ROI

### Complexity Levels

| Level | Signals | Default Model |
|-------|---------|---------------|
| **SIMPLE** | Short messages, no code, basic questions (<500 tokens) | gpt-4o-mini |
| **MODERATE** | Some code, medium length, summarization | gpt-4o |
| **COMPLEX** | Heavy code, multi-step reasoning, long prompts | gpt-5.2 / claude-opus-4-6 |
| **SECURITY_CRITICAL** | Passwords, API keys, financial data | *(never downgraded, never compressed)* |
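
To make the signal-based levels above concrete, here is a simplified sketch of how such a rule-based classifier can work. The heuristics, thresholds, and function name are invented for this example and are not InferShrink's actual implementation:

```python
# Illustrative rule-based complexity classifier.
# Signals and cutoffs below are invented for this sketch; InferShrink's
# real classifier may weigh different signals with different thresholds.

SENSITIVE_KEYWORDS = {"password", "api key", "credit card", "ssn"}

def classify_complexity(messages: list) -> str:
    text = " ".join(m.get("content", "") for m in messages).lower()
    approx_tokens = len(text) // 4  # rough chars-per-token estimate

    if any(kw in text for kw in SENSITIVE_KEYWORDS):
        return "SECURITY_CRITICAL"   # never downgraded, never compressed
    if "```" in text and approx_tokens > 1000:
        return "COMPLEX"             # heavy code in a long prompt
    if "```" in text or approx_tokens > 500:
        return "MODERATE"
    return "SIMPLE"

print(classify_complexity([{"role": "user", "content": "What is 2+2?"}]))
# SIMPLE
```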

## Configuration

Override any defaults:

```python
client = optimize(openai.Client(), config={
    "tiers": {
        "tier1": {"models": ["gpt-4o-mini"], "max_complexity": "SIMPLE"},
        "tier2": {"models": ["gpt-4o"], "max_complexity": "MODERATE"},
        "tier3": {"models": ["claude-opus-4-6"], "max_complexity": "COMPLEX"},
    },
    "compression": {
        "enabled": True,
        "min_tokens": 500,
        "skip_for": ["SECURITY_CRITICAL"],
    },
    "quality_floor": 0.95,
    "cost_tracking": True,
})
```
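
The tier config above implies a cheapest-first lookup: walk the tiers in order and pick the first one whose `max_complexity` ceiling covers the task. The resolution function below is a sketch of that idea, not InferShrink's actual routing code; only the tier dict itself comes from the example above:

```python
# Illustrative cheapest-first routing over the tier config shown above.
ORDER = ["SIMPLE", "MODERATE", "COMPLEX"]

TIERS = {
    "tier1": {"models": ["gpt-4o-mini"], "max_complexity": "SIMPLE"},
    "tier2": {"models": ["gpt-4o"], "max_complexity": "MODERATE"},
    "tier3": {"models": ["claude-opus-4-6"], "max_complexity": "COMPLEX"},
}

def route(complexity: str) -> str:
    """Return the first (cheapest) tier whose ceiling covers the task."""
    for tier in TIERS.values():
        if ORDER.index(complexity) <= ORDER.index(tier["max_complexity"]):
            return tier["models"][0]
    raise ValueError(f"no tier can handle {complexity}")

print(route("SIMPLE"))    # gpt-4o-mini
print(route("COMPLEX"))   # claude-opus-4-6
```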

## Cost Tracking

```python
from infershrink import optimize

client = optimize(openai.Client())

# ... make some API calls ...

# View savings
print(client.infershrink_tracker.summary())
# InferShrink Session Stats
# ========================================
# Total requests:       42
# Requests downgraded:  38
# Requests compressed:  25
# Original tokens:      156,000
# Compressed tokens:    98,000
# Tokens saved:         58,000 (37.2%)
# Estimated savings:    $2.3400
# ========================================

# Programmatic access
stats = client.infershrink_tracker.stats()
print(f"Saved ${stats.total_estimated_savings_usd:.2f}")
```
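
The token-savings percentage in the sample summary is straightforward to reproduce from the token counts it reports:

```python
# Sanity-check the token figures from the sample summary above.
original_tokens = 156_000
compressed_tokens = 98_000

tokens_saved = original_tokens - compressed_tokens
pct_saved = tokens_saved / original_tokens * 100

print(f"Tokens saved: {tokens_saved:,} ({pct_saved:.1f}%)")
# Tokens saved: 58,000 (37.2%)
```

Note that the dollar estimate also reflects model downgrades, not just token compression, so it cannot be derived from token counts alone.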

## Anthropic Support

Works the same way:

```python
import anthropic
from infershrink import optimize

client = optimize(anthropic.Anthropic())

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this article..."}],
)
```

## Comparison

| Feature | InferShrink | RouteLLM | Burnwise |
|---------|-------------|----------|----------|
| One-line integration | ✅ | ❌ | ❌ |
| Prompt compression | ✅ (TokenShrink) | ❌ | ❌ |
| Model routing | ✅ | ✅ | ✅ |
| Cost tracking | ✅ | ❌ | ✅ |
| Security-aware | ✅ | ❌ | ❌ |
| Zero dependencies | ✅ | ❌ | ❌ |
| OpenAI + Anthropic | ✅ | ✅ | ❌ |

## How TokenShrink Helps

[TokenShrink](https://pypi.org/project/tokenshrink/) compresses natural language prompts while preserving semantic meaning. When installed, InferShrink uses it to:

- Reduce token count by 20–60% on verbose prompts
- Lower costs even *before* model routing kicks in
- Skip compression for short prompts or security-critical content
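
The skip rule in the last bullet can be expressed as a simple gate. The function below is illustrative and not part of InferShrink's public API, though the 500-token floor and the `SECURITY_CRITICAL` exclusion match the default configuration shown earlier:

```python
# Decide whether a prompt should be compressed, mirroring the defaults
# from the Configuration section (min_tokens=500, skip SECURITY_CRITICAL).
def should_compress(token_count: int, complexity: str,
                    min_tokens: int = 500,
                    skip_for: tuple = ("SECURITY_CRITICAL",)) -> bool:
    if complexity in skip_for:
        return False          # sensitive content is never compressed
    return token_count >= min_tokens

print(should_compress(1200, "MODERATE"))           # True
print(should_compress(120, "SIMPLE"))              # False: below min_tokens
print(should_compress(2000, "SECURITY_CRITICAL"))  # False: sensitive
```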

Install it: `pip install tokenshrink`

## API Reference

### `optimize(client, config=None)`

Wrap an OpenAI or Anthropic client with InferShrink optimizations.

- **client** — An `openai.Client()` or `anthropic.Anthropic()` instance
- **config** — Optional dict to override default configuration
- **Returns** — A wrapped client with the same interface

### `classify(messages)`

Classify message complexity without wrapping a client.

```python
from infershrink import classify

result = classify([{"role": "user", "content": "Hello!"}])
print(result.complexity)  # Complexity.SIMPLE
```

### `Tracker`

Access via `client.infershrink_tracker`:

- `.stats()` → `SessionStats` dataclass
- `.summary()` → Human-readable string
- `.reset()` → Clear all tracked data
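
For orientation, here is a minimal sketch of what the `SessionStats` container could look like. Except for `total_estimated_savings_usd`, which appears in the Cost Tracking example above, the field names are hypothetical:

```python
# Hypothetical shape of a session-stats container; only
# total_estimated_savings_usd is confirmed by the docs above.
from dataclasses import dataclass

@dataclass
class SessionStats:
    total_requests: int = 0
    requests_downgraded: int = 0
    tokens_saved: int = 0
    total_estimated_savings_usd: float = 0.0

stats = SessionStats(total_requests=42, total_estimated_savings_usd=2.34)
print(f"Saved ${stats.total_estimated_savings_usd:.2f}")  # Saved $2.34
```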

## License

Apache 2.0 — see [LICENSE](LICENSE).

## Links

- **TokenShrink**: [pypi.org/project/tokenshrink](https://pypi.org/project/tokenshrink/)
- **Source**: [github.com/MusashiMiyamoto1-cloud/tokenshrink](https://github.com/MusashiMiyamoto1-cloud/tokenshrink)
