Metadata-Version: 2.4
Name: tryvoice
Version: 0.1.0
Summary: Pluggable voice chat runtime
License: Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastapi
Requires-Dist: python-multipart
Requires-Dist: uvicorn[standard]
Requires-Dist: aiohttp
Requires-Dist: python-dotenv
Requires-Dist: loguru
Requires-Dist: edge-tts
Requires-Dist: psutil
Provides-Extra: vad
Requires-Dist: onnxruntime>=1.16; extra == "vad"
Requires-Dist: numpy; extra == "vad"
Requires-Dist: pydub; extra == "vad"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: httpx; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Dynamic: license-file

# TryVoice

Hands-free voice runtime for AI agents. Talk to your AI coding assistant without touching the keyboard.

TryVoice wraps AI agents (like [OpenClaw](https://github.com/anthropics/openclaw) and [Claude Code](https://docs.anthropic.com/en/docs/claude-code)) in a voice interface with wake word activation, push-to-talk, and real-time streaming — all running in your browser.

> **Early Preview (v0.1.0-alpha)** — actively developed, expect rough edges.

## What It Does

- **Wake word activation** — say a keyword to start talking, no hands needed (powered by [OpenWakeWord](https://github.com/dscripka/openWakeWord))
- **Push-to-talk** — hold a button to speak, release to send
- **Real-time streaming** — hear the AI respond as it generates, with interruptible playback
- **Multi-bot slots** — up to 4 independent agent sessions side by side
- **Mobile-ready** — PWA support, works on phone browsers
- **Pluggable adapters** — connect any AI agent via the Adapter SDK

## Prerequisites

TryVoice is a **voice layer on top of existing AI agents**. You need at least one of:

- **[OpenClaw](https://github.com/anthropics/openclaw)** — running with a gateway endpoint
- **[Claude Code](https://docs.anthropic.com/en/docs/claude-code)** — installed on the same machine (`claude` CLI available in PATH)

> More agent adapters coming soon. See [Building an Adapter](#building-an-adapter) to connect your own agent.

## Quick Start

### 1. Clone and setup

```bash
git clone https://github.com/AaronZ021/tryvoice-oss.git tryvoice
cd tryvoice
bash scripts/setup.sh
```

This creates a virtual environment, installs the Python and frontend dependencies, and builds the frontend.

### 2. Configure

```bash
# Edit .env (created from .env.example during setup)
vim .env
```

**For Claude Code** (voice-driven coding on your local machine):

```bash
TRYVOICE_ACTIVE_ADAPTER=claude-code
# That's it — TryVoice auto-discovers your Claude Code sessions
```
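
Auto-discovery can only work when the `claude` executable is actually on PATH. A quick preflight check, sketched with stdlib `shutil.which` (the helper name is ours, not a TryVoice API):

```python
import shutil

def claude_cli_available() -> bool:
    """True when the `claude` CLI can be found on PATH."""
    return shutil.which("claude") is not None

if not claude_cli_available():
    print("claude CLI not found -- install Claude Code first")
```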

**For OpenClaw** (voice interface to OpenClaw agents):

```bash
TRYVOICE_ACTIVE_ADAPTER=openclaw
OPENCLAW_GATEWAY_URL=http://localhost:18789
OPENCLAW_GATEWAY_TOKEN=your_gateway_token_here
```
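
A sketch of how these settings might be read and validated on the Python side; the environment variable names come from the snippet above, while the helper and its defaults are our illustration:

```python
import os

def load_openclaw_settings() -> dict:
    """Read the OpenClaw adapter settings from the environment."""
    url = os.getenv("OPENCLAW_GATEWAY_URL", "http://localhost:18789")
    token = os.getenv("OPENCLAW_GATEWAY_TOKEN", "")
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"OPENCLAW_GATEWAY_URL must be an HTTP(S) URL, got {url!r}")
    return {"url": url, "token": token}
```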

### 3. Run

```bash
source .venv/bin/activate
python -m backend.cli
# Open http://localhost:7860 in your browser
```

### 4. Enable wake word (optional but recommended)

Once the UI loads, click the microphone settings icon and select a wake word (e.g., "hey jarvis", "alexa", "americano"). The browser listens in the background; say the keyword to start a voice turn hands-free.

## Docker

```bash
# Clone and configure
git clone https://github.com/AaronZ021/tryvoice-oss.git tryvoice
cd tryvoice
cp .env.example .env
# Edit .env with your adapter settings (see above)

docker compose up
# Open http://localhost:7860
```

## Architecture

```
┌──────────────┐      WebSocket      ┌────────────────────┐
│  Browser UI  │◄───────────────────►│  TryVoice Runtime  │
│  (PWA)       │                     │                    │
│  Wake Word   │                     │  ┌──────────────┐  │
│  STT / TTS   │                     │  │  Adapter     │──┼─► OpenClaw
│  Audio I/O   │                     │  │  Registry    │──┼─► Claude Code
│              │                     │  │  (plugin)    │──┼─► Your adapter
└──────────────┘                     └──┴──────────────┴──┘
```

**Voice flow:** Wake word / PTT → STT (browser Web Speech API or Groq Whisper) → Adapter → Agent → Streaming text → TTS (Edge TTS) → Audio playback
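
The streaming half of that flow can be sketched as an async pipeline: the adapter yields text chunks, and each chunk is handed to the TTS stage as soon as it arrives, so playback starts before the agent finishes. Everything below, names included, is an illustration rather than the actual TryVoice internals:

```python
import asyncio

async def agent_stream(prompt: str):
    """Stand-in for an adapter: yields text chunks as the agent generates."""
    for token in ["Hello", ", ", "world", "!"]:
        await asyncio.sleep(0)   # pretend network latency
        yield token

async def speak(chunk: str) -> None:
    """Stand-in for the TTS stage (Edge TTS in the real runtime)."""
    print(chunk, end="", flush=True)

async def voice_turn(prompt: str) -> str:
    """STT already produced `prompt`; pipe tokens into TTS as they arrive."""
    spoken = []
    async for chunk in agent_stream(prompt):
        await speak(chunk)       # playback begins before the turn ends
        spoken.append(chunk)
    return "".join(spoken)

# asyncio.run(voice_turn("hi")) returns "Hello, world!"
```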

## Voice Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `TRYVOICE_ACTIVE_ADAPTER` | `echo` | Active adapter (`claude-code`, `openclaw`, or custom) |
| `GROQ_API_KEY` | — | Groq API key for server-side STT; when unset, STT falls back to the browser's Web Speech API |
| `EDGE_TTS_VOICE` | `zh-CN-XiaoxiaoNeural` | Edge TTS voice (300+ voices available) |
| `PORT` | `7860` | Server port |

See [.env.example](.env.example) for all options.
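
These variables could be collected into one typed config object. A stdlib-only sketch using the defaults from the table (the `VoiceConfig` class is ours; the real runtime may structure this differently):

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class VoiceConfig:
    adapter: str
    groq_api_key: Optional[str]   # None -> browser-side STT fallback
    tts_voice: str
    port: int

def load_voice_config() -> VoiceConfig:
    """Read the documented variables, applying the defaults from the table."""
    return VoiceConfig(
        adapter=os.getenv("TRYVOICE_ACTIVE_ADAPTER", "echo"),
        groq_api_key=os.getenv("GROQ_API_KEY"),
        tts_voice=os.getenv("EDGE_TTS_VOICE", "zh-CN-XiaoxiaoNeural"),
        port=int(os.getenv("PORT", "7860")),
    )
```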

## Supported Adapters

| Adapter | Package | Use Case |
|---------|---------|----------|
| `claude-code` | `tryvoice-adapter-claude-code` | Voice control for Claude Code terminal sessions |
| `openclaw` | `tryvoice-adapter-openclaw` | Voice interface to OpenClaw agent gateway |
| `echo` | Built-in | Testing and demo (echoes your speech back) |

## Building an Adapter

Connect TryVoice to any AI agent by implementing the Adapter protocol:

```bash
pip install tryvoice-adapter-sdk
```

```python
from tryvoice_adapter_sdk import AgentAdapter, AdapterCapabilities, AdapterEvent

class MyAdapter(AgentAdapter):
    def report_capabilities(self) -> AdapterCapabilities:
        return AdapterCapabilities(supports_stream=True, ...)

    async def stream_user_turn(self, session_key, text, ...):
        # Call your agent, yield AdapterEvent chunks
        yield AdapterEvent(kind="token", text="Hello!")
        yield AdapterEvent(kind="turn_end")
```
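
On the runtime side, a caller consumes the adapter as an async event stream. The sketch below is self-contained: it declares a local `AdapterEvent` stand-in instead of importing the SDK, and the real event fields may differ:

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

@dataclass
class AdapterEvent:
    """Local stand-in for the SDK's event type."""
    kind: str
    text: Optional[str] = None

class EchoAdapter:
    """Toy adapter following the protocol sketched above."""
    async def stream_user_turn(self, session_key, text):
        yield AdapterEvent(kind="token", text=f"You said: {text}")
        yield AdapterEvent(kind="turn_end")

async def run_turn(adapter, text: str) -> str:
    """Collect streamed tokens until the adapter signals end of turn."""
    parts = []
    async for event in adapter.stream_user_turn("session-1", text):
        if event.kind == "token":
            parts.append(event.text)
        elif event.kind == "turn_end":
            break
    return "".join(parts)

# asyncio.run(run_turn(EchoAdapter(), "hi")) returns "You said: hi"
```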

Register via entry point in `pyproject.toml`:

```toml
[project.entry-points."tryvoice.adapters"]
my-agent = "my_package.adapter:MyAdapter"
```
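
On the consuming side, plugin discovery via entry points typically looks like this stdlib sketch (the group name matches the snippet above; `discover_adapters` is our illustration, not the actual registry code):

```python
from importlib.metadata import entry_points

def discover_adapters() -> dict:
    """Map adapter names to loaded classes for every installed plugin."""
    eps = entry_points()
    # EntryPoints.select() landed in Python 3.10; fall back for 3.9
    if hasattr(eps, "select"):
        group = eps.select(group="tryvoice.adapters")
    else:
        group = eps.get("tryvoice.adapters", [])
    return {ep.name: ep.load() for ep in group}
```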

## Development

### Prerequisites

- Python 3.9+ (3.11 recommended)
- Node.js 20+ (for frontend build)

### Setup

```bash
git clone https://github.com/AaronZ021/tryvoice-oss.git tryvoice
cd tryvoice
bash scripts/setup.sh
source .venv/bin/activate
python -m backend.cli
```

### Project structure

```
tryvoice/
├── apps/
│   ├── host-runtime/      # Python FastAPI backend (adapter layer, session FSM, voice providers)
│   └── client-web/        # TypeScript frontend (Vite, state machine, wake word, audio)
├── scripts/               # Setup and build scripts
├── pyproject.toml         # Python package config
├── Dockerfile             # Multi-stage build (Node + Python)
└── docker-compose.yml     # Single-command deployment
```

## License

Apache License 2.0 — see [LICENSE](LICENSE).
