Metadata-Version: 2.4
Name: tiny-agent-os
Version: 1.2.5
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: httpx>=0.28.0
Requires-Dist: ruff>=0.9.0 ; extra == 'dev'
Requires-Dist: mypy>=1.14.0 ; extra == 'dev'
Requires-Dist: pre-commit>=4.0.0 ; extra == 'dev'
Requires-Dist: pytest>=8.0.0 ; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.25.0 ; extra == 'dev'
Requires-Dist: python-dotenv>=1.0.0 ; extra == 'dev'
Requires-Dist: grimp>=3.0 ; extra == 'dev'
Requires-Dist: vulture>=2.11.0 ; extra == 'dev'
Requires-Dist: pylint>=3.0.0 ; extra == 'dev'
Requires-Dist: python-dotenv>=1.0.0 ; extra == 'examples'
Provides-Extra: dev
Provides-Extra: examples
License-File: LICENSE
Summary: Python agent loop
Keywords: agent,llm,openrouter,streaming
Author: Fabian
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# TinyAgent

![tinyAgent Logo](https://raw.githubusercontent.com/alchemiststudiosDOTai/tinyAgent/master/static/images/new-ta-logo.jpg)

A small, modular agent framework for building LLM-powered applications in Python.

Inspired by [smolagents](https://github.com/huggingface/smolagents) and [Pi](https://github.com/badlogic/pi-mono) — borrowing the minimal-abstraction philosophy from the former and the conversational agent loop from the latter.

> **Beta** — TinyAgent is usable but not production-ready. APIs may change between minor versions.

## Overview

TinyAgent provides a lightweight foundation for creating conversational AI agents with tool use capabilities. It features:

- **Streaming-first architecture**: All LLM interactions support streaming responses
- **Tool execution**: Define and execute tools with structured outputs
- **Event-driven**: Subscribe to agent events for real-time UI updates
- **Provider agnostic**: Works with any OpenAI-compatible `/chat/completions` endpoint (OpenRouter, OpenAI, Chutes, local servers)
- **Prompt caching**: Reduce token costs and latency with Anthropic-style cache breakpoints
- **Dual provider paths**: Pure-Python or optional Rust binding via PyO3 for native-speed streaming
- **Type-safe**: Full type hints throughout

## Quick Start

```python
import asyncio
from tinyagent import Agent, AgentOptions, OpenRouterModel, stream_openrouter

# Create an agent
agent = Agent(
    AgentOptions(
        stream_fn=stream_openrouter,
        session_id="my-session"
    )
)

# Configure
agent.set_system_prompt("You are a helpful assistant.")
agent.set_model(OpenRouterModel(id="anthropic/claude-3.5-sonnet"))
# Optional: use any OpenAI-compatible /chat/completions endpoint
# agent.set_model(OpenRouterModel(id="gpt-4o-mini", base_url="https://api.openai.com/v1/chat/completions"))

# Simple prompt
async def main():
    response = await agent.prompt_text("What is the capital of France?")
    print(response)

asyncio.run(main())
```

## Installation

```bash
pip install tiny-agent-os
```
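
The metadata also declares two optional extras, `dev` and `examples`, which can be installed with pip's extras syntax:

```bash
# development tooling (ruff, mypy, pytest, pre-commit, ...)
pip install "tiny-agent-os[dev]"

# extra dependencies for the bundled examples (python-dotenv)
pip install "tiny-agent-os[examples]"
```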

## Core Concepts

### Agent

The [`Agent`](api/agent.md) class is the main entry point. It manages:

- Conversation state (messages, tools, system prompt)
- Streaming responses
- Tool execution
- Event subscription

### Messages

Messages follow a typed dictionary structure:

- `UserMessage`: Input from the user
- `AssistantMessage`: Response from the LLM
- `ToolResultMessage`: Result from tool execution
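
As a rough sketch, the first two shapes look like this (inferred from the content-block format used in the examples later in this README; treat any field beyond `role` and `content` as an assumption):

```python
# Sketches of the message shapes, inferred from the content-block format
# used elsewhere in this README. Only "role" and "content" are attested.
user_msg = {
    "role": "user",
    "content": [{"type": "text", "text": "What is 2 + 2?"}],
}
assistant_msg = {
    "role": "assistant",
    "content": [{"type": "text", "text": "4"}],
}
```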

### Tools

Tools are functions the LLM can call:

```python
from tinyagent import AgentTool, AgentToolResult

async def calculate_sum(tool_call_id: str, args: dict, signal, on_update) -> AgentToolResult:
    result = args["a"] + args["b"]
    return AgentToolResult(
        content=[{"type": "text", "text": str(result)}]
    )

tool = AgentTool(
    name="sum",
    description="Add two numbers",
    parameters={
        "type": "object",
        "properties": {
            "a": {"type": "number"},
            "b": {"type": "number"}
        },
        "required": ["a", "b"]
    },
    execute=calculate_sum
)

agent.set_tools([tool])
```

### Events

The agent emits events during execution:

- `AgentStartEvent` / `AgentEndEvent`: Agent run lifecycle
- `TurnStartEvent` / `TurnEndEvent`: Single turn lifecycle
- `MessageStartEvent` / `MessageUpdateEvent` / `MessageEndEvent`: Message streaming
- `ToolExecutionStartEvent` / `ToolExecutionUpdateEvent` / `ToolExecutionEndEvent`: Tool execution

Subscribe to events:

```python
def on_event(event):
    print(f"Event: {event.type}")

unsubscribe = agent.subscribe(on_event)
```
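
For richer handling, a handler can branch on the concrete event class. The stand-in classes below merely mirror the names listed above for demonstration; with the real library you would import the actual event types from `tinyagent`:

```python
# Stand-in event classes mirroring the names listed above; the real
# ones come from tinyagent.
class MessageUpdateEvent: ...
class ToolExecutionStartEvent: ...
class AgentEndEvent: ...

def on_event(event) -> str:
    # Dispatch on the concrete event class.
    if isinstance(event, MessageUpdateEvent):
        return "streaming delta"
    if isinstance(event, ToolExecutionStartEvent):
        return "tool starting"
    return f"other: {type(event).__name__}"

print(on_event(MessageUpdateEvent()))  # -> streaming delta
print(on_event(AgentEndEvent()))       # -> other: AgentEndEvent
```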

### Prompt Caching

TinyAgent supports [Anthropic-style prompt caching](api/caching.md) to reduce costs on multi-turn conversations. Enable it when creating the agent:

```python
agent = Agent(
    AgentOptions(
        stream_fn=stream_openrouter,
        session_id="my-session",
        enable_prompt_caching=True,
    )
)
```

Cache breakpoints are automatically placed on user message content blocks so the prompt prefix stays cached across turns. See [Prompt Caching](api/caching.md) for details.

## Rust Binding: `tinyagent._alchemy`

TinyAgent ships with an optional Rust-based LLM provider implemented in
`src/lib.rs`. It wraps the [`alchemy-llm`](https://crates.io/crates/alchemy-llm)
Rust crate and exposes it to Python via [PyO3](https://pyo3.rs) as
`tinyagent._alchemy`, giving you native-speed OpenAI-compatible streaming without
leaving the Python process.

### Why

The pure-Python providers (`openrouter_provider.py`, `proxy.py`) work fine, but the Rust
binding gives you:

- **Lower per-token overhead** -- SSE parsing, JSON deserialization, and event dispatch all
  happen in compiled Rust with a multi-threaded Tokio runtime.
- **Unified provider abstraction** -- `alchemy-llm` normalizes differences across providers
  (OpenRouter, Anthropic, custom endpoints) behind a single streaming interface.
- **Full event fidelity** -- text deltas, thinking deltas, tool call deltas, and terminal
  events are all surfaced as typed Python dicts.

### How it works

```
Python (async)             Rust (Tokio)
─────────────────          ─────────────────────────
stream_alchemy_*()  ──>    alchemy_llm::stream()
                            │
AlchemyStreamResponse       ├─ SSE parse + deserialize
  .__anext__()       <──    ├─ event_to_py_value()
  (asyncio.to_thread)       └─ mpsc channel -> Python
```

1. Python calls `openai_completions_stream(model, context, options)`, which is a `#[pyfunction]`.
2. The Rust side builds an `alchemy-llm` request, opens an SSE stream on a shared Tokio
   runtime, and sends events through an `mpsc` channel.
3. Python reads events by calling the blocking `next_event()` method via
   `asyncio.to_thread`, making it async-compatible without busy-waiting.
4. A terminal `done` or `error` event signals the end of the stream. The final
   `AssistantMessage` dict is available via `result()`.
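
The consumption pattern in steps 3-4 can be sketched as an async wrapper around the blocking handle. The class below is a stand-in with the same two methods (with the real binding you would get a handle from `openai_completions_stream(...)`), and the event dict keys are illustrative:

```python
import asyncio

class FakeStream:
    """Stand-in with the same blocking interface as OpenAICompletionsStream."""
    def __init__(self, events, final):
        self._events = iter(events)
        self._final = final

    def next_event(self):
        # Blocking in the real binding; returns None when the stream ends.
        return next(self._events, None)

    def result(self):
        # Blocking; returns the final assistant message dict.
        return self._final

async def consume(stream):
    events = []
    # Step 3: run the blocking next_event() in a worker thread so the
    # asyncio event loop stays responsive (no busy-waiting).
    while (event := await asyncio.to_thread(stream.next_event)) is not None:
        events.append(event)
    # Step 4: after the terminal event, fetch the final assistant message.
    final = await asyncio.to_thread(stream.result)
    return events, final

events, final = asyncio.run(consume(FakeStream(
    [{"type": "text_delta", "delta": "Hi"}, {"type": "done"}],
    {"role": "assistant", "content": [{"type": "text", "text": "Hi"}]},
)))
```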

### Building

Requires a Rust toolchain (1.70+) and [maturin](https://www.maturin.rs/).

```bash
pip install maturin
maturin develop            # debug build, installs into current venv
maturin develop --release  # optimized build
```

### Python API

Two functions are exposed from the `tinyagent._alchemy` module:

| Function | Description |
|---|---|
| `collect_openai_completions(model, context, options?)` | Blocking. Consumes the entire stream and returns `{"events": [...], "final_message": {...}}`. Useful for one-shot calls. |
| `openai_completions_stream(model, context, options?)` | Returns an `OpenAICompletionsStream` handle for incremental consumption. |

The `OpenAICompletionsStream` handle has two methods:

| Method | Description |
|---|---|
| `next_event()` | Blocking. Returns the next event dict, or `None` when the stream ends. |
| `result()` | Blocking. Returns the final assistant message dict. |

All three arguments are plain Python dicts:

```python
model = {
    "id": "anthropic/claude-3.5-sonnet",
    "base_url": "https://openrouter.ai/api/v1/chat/completions",
    "provider": "openrouter",        # required for env-key fallback/inference
    "api": "openai-completions",     # optional; inferred from provider when omitted/blank
    "headers": {"X-Custom": "val"},  # optional
    "reasoning": False,              # optional
    "context_window": 128000,        # optional
    "max_tokens": 4096,              # optional
}

context = {
    "system_prompt": "You are helpful.",
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Hello"}]}
    ],
    "tools": [                       # optional
        {"name": "sum", "description": "Add numbers", "parameters": {...}}
    ],
}

options = {
    "api_key": "sk-...",     # optional
    "temperature": 0.7,      # optional
    "max_tokens": 1024,      # optional
}
```

**Routing contract (`provider`, `api`, `base_url`)**:
- `provider`: backend identity used for API-key fallback and provider defaults
- `api`: alchemy unified API selector (`openai-completions` or `minimax-completions`)
- `base_url`: concrete HTTP endpoint

If `api` is omitted/blank, the Python side infers:
- `provider in {"minimax", "minimax-cn"}` => `minimax-completions`
- otherwise => `openai-completions`

Legacy API aliases are normalized for backward compatibility:
- `api="openrouter"` / `api="openai"` => `openai-completions`
- `api="minimax"` => `minimax-completions`

### Using via TinyAgent (high-level)

You don't need to call the Rust binding directly. Use the `alchemy_provider` module:

```python
from tinyagent import Agent, AgentOptions
from tinyagent.alchemy_provider import OpenAICompatModel, stream_alchemy_openai_completions

agent = Agent(
    AgentOptions(
        stream_fn=stream_alchemy_openai_completions,
        session_id="my-session",
    )
)
agent.set_model(
    OpenAICompatModel(
        provider="openrouter",
        id="anthropic/claude-3.5-sonnet",
        base_url="https://openrouter.ai/api/v1/chat/completions",
    )
)
```

MiniMax global:
```python
agent.set_model(
    OpenAICompatModel(
        provider="minimax",
        id="MiniMax-M2.5",
        base_url="https://api.minimax.io/v1/chat/completions",
        # api is optional here; inferred as "minimax-completions"
    )
)
```

MiniMax CN:
```python
agent.set_model(
    OpenAICompatModel(
        provider="minimax-cn",
        id="MiniMax-M2.5",
        base_url="https://api.minimax.chat/v1/chat/completions",
        # api is optional here; inferred as "minimax-completions"
    )
)
```

Cross-provider tool-call smoke examples:
- One-agent workflow: `examples/example_tool_calls_three_providers.py`
- Raw Rust binding workflow (multi-turn tools): `scripts/smoke_rust_tool_calls_three_providers.py`
  - Command: `uv run python scripts/smoke_rust_tool_calls_three_providers.py`


### Limitations

- Rust binding currently dispatches only `openai-completions` and `minimax-completions`.
- Image blocks are not yet supported (text and thinking blocks work).
- `next_event()` is blocking and runs in a thread via `asyncio.to_thread` -- this adds
  slight overhead compared to a native async generator, but keeps the GIL released during
  the Rust work.

## Documentation

- [Architecture](ARCHITECTURE.md): System design and component interactions
- [API Reference](api/): Detailed module documentation
- [Prompt Caching](api/caching.md): Cache breakpoints, cost savings, and provider requirements
- [OpenAI-Compatible Endpoints](api/openai-compatible-endpoints.md): Using `OpenRouterModel.base_url` with OpenRouter, OpenAI, Chutes, and local compatible backends
- [Usage Semantics](api/usage-semantics.md): Unified `message["usage"]` schema across Python and Rust provider paths
- [Changelog](../CHANGELOG.md): Release history

## Project Structure

```
tinyagent/
├── agent.py              # Agent class
├── agent_loop.py         # Core agent execution loop
├── agent_tool_execution.py  # Tool execution helpers
├── agent_types.py        # Type definitions
├── caching.py            # Prompt caching utilities
├── openrouter_provider.py   # OpenRouter integration
├── alchemy_provider.py   # Rust-based provider (PyO3)
├── proxy.py              # Proxy server integration
└── proxy_event_handlers.py  # Proxy event parsing
```

