Metadata-Version: 2.4
Name: bithuman
Version: 1.6.1
Summary: Real-time avatar engine — 100+ FPS on CPU. Generate lip-synced video, stream live avatars to browsers. 1-2 CPU cores, <200ms latency. ARM, x86, macOS.
License: Commercial
Project-URL: Homepage, https://bithuman.ai
Project-URL: Documentation, https://docs.bithuman.ai/#/
Project-URL: Repository, https://github.com/bithuman-product/platform
Project-URL: Changelog, https://github.com/bithuman-product/platform/releases
Keywords: avatar,digital-human,lip-sync,real-time,ai,visual-agent,conversational-ai,edge-ai,voice-agent,video-chatbot,ai-assistant
Platform: Linux
Platform: Mac OS X
Platform: Windows
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Topic :: Multimedia :: Graphics :: 3D Rendering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Operating System :: OS Independent
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.26.0
Requires-Dist: h5py~=3.13
Requires-Dist: loguru~=0.7
Requires-Dist: soxr>=0.5
Requires-Dist: soundfile~=0.13
Requires-Dist: pydantic~=2.10
Requires-Dist: pydantic-settings~=2.8
Requires-Dist: networkx<4.0,>=3.1
Requires-Dist: pyzmq~=26.2; python_version < "3.14"
Requires-Dist: msgpack~=1.1
Requires-Dist: PyYAML~=6.0
Requires-Dist: aiohttp~=3.11
Requires-Dist: onnxruntime>=1.18; python_version >= "3.10"
Requires-Dist: onnxruntime>=1.14; python_version < "3.10"
Requires-Dist: eval_type_backport>=0.1.1; python_version < "3.10"
Requires-Dist: av>=12.0
Requires-Dist: PyJWT>=2.8
Requires-Dist: requests>=2.31
Requires-Dist: lz4>=4.3
Requires-Dist: PyTurboJPEG>=1.7
Requires-Dist: Pillow>=9.0
Requires-Dist: opencv-python-headless>=4.8
Requires-Dist: tqdm>=4.60
Provides-Extra: agent
Requires-Dist: livekit-agents~=1.1; extra == "agent"
Dynamic: platform
Dynamic: requires-python

# bitHuman Avatar Runtime

![bitHuman Banner](https://docs.bithuman.ai/docs/assets/images/bithuman-banner.jpg)

**Real-time avatar engine for visual AI agents, digital humans, and creative characters.**

[![PyPI version](https://badge.fury.io/py/bithuman.svg)](https://pypi.org/project/bithuman/)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)

bitHuman powers **visual AI agents** and **conversational AI** with photorealistic avatars and real-time lip-sync. Build voice agents with faces, video chatbots, AI assistants, and interactive digital humans — all running on edge devices with just 1-2 CPU cores and <200ms latency. Raw generation speed is **100+ FPS on CPU alone**, enabling real-time streaming applications.

## Installation

```bash
pip install bithuman --upgrade
```
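
To also pull in the optional LiveKit integration (see Integrations & Self-Hosting below), install the `agent` extra:

```bash
pip install "bithuman[agent]" --upgrade
```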

## Optimize Your Models

If you have existing `.imx` models, convert them to **IMX v2** format for dramatically better performance:

```bash
bithuman convert avatar.imx
```

IMX v2 replaces the legacy TAR/H.264 container with a binary format that uses **WebP-encoded lip-sync patches** and an indexed file layout for instant random access.

| Metric | Legacy (TAR) | IMX v2 | Improvement |
|--------|-------------|--------|-------------|
| **Model size** | 100 MB | 50-70 MB | 30-50% smaller |
| **Load time** | ~10s | ~1s | 10x faster |
| **Runtime speed** | ~30 FPS | 100+ FPS | 3-10x faster |
| **Peak memory** | ~10 GB | ~2 GB | 80% less |

Conversion is automatic on first load, but running `bithuman convert` ahead of time saves the wait.
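
For example, a directory of legacy models can be converted in one pass and then checked with `validate` (the `models/` directory is a placeholder; both commands are listed in the command table below):

```bash
# Convert every legacy .imx under models/, then confirm each loads correctly
for f in models/*.imx; do
  bithuman convert "$f"
  bithuman validate "$f"
done
```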

## CLI Quick Start

The `bithuman` CLI covers the main workflows: generating lip-synced video, streaming a live avatar, and inspecting or converting models.

### Generate a lip-synced video

```bash
bithuman generate avatar.imx --audio speech.wav --key YOUR_API_KEY
```

Takes an avatar model (`.imx`) and an audio file, and outputs a lip-synced MP4 video with the audio track included. The avatar animates naturally with idle motion, blinking, head movements, and precise lip-sync to the speech.

### Stream a live avatar to your browser

```bash
# Terminal 1: Start the streaming server
bithuman stream avatar.imx --key YOUR_API_KEY

# Terminal 2: Send audio to trigger lip-sync
bithuman speak speech.wav
```

Open `http://localhost:3001` to see the avatar streaming live with its full motion graph (idle animations, transitions). Audio plays back in the browser via WebSocket.

### Inspect a model

```bash
bithuman info avatar.imx           # Show model metadata
bithuman list-videos avatar.imx    # List all videos in the model
```

### All CLI commands

| Command | Description |
|---------|-------------|
| `bithuman generate <model> --audio <file>` | Generate lip-synced MP4 video from model + audio |
| `bithuman stream <model>` | Start live avatar streaming server at localhost:3001 |
| `bithuman speak <audio>` | Send audio to a running stream server |
| `bithuman action <name>` | Trigger a video action on a running stream server (e.g. wave, nod) |
| `bithuman info <model>` | Show model metadata (format, videos, audio clusters) |
| `bithuman list-videos <model>` | List all videos contained in a model |
| `bithuman convert <model>` | Convert legacy .imx to optimized IMX v2 format |
| `bithuman validate <path>` | Validate model(s) load correctly |
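
For example, with the stream server from the quick start running, a gesture can be triggered by name. The name `wave` below is illustrative; the actions actually available depend on the videos inside your model (check with `bithuman list-videos`):

```bash
bithuman action wave
```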

## Python API

For programmatic control over the avatar pipeline:

```python
import asyncio
from bithuman import AsyncBithuman
from bithuman.audio import load_audio, float32_to_int16

async def main():
    # Load the avatar model
    runtime = await AsyncBithuman.create(
        model_path="avatar.imx",
        api_secret="YOUR_API_KEY",
    )
    await runtime.start()

    # Load and stream audio
    audio_float, sr = load_audio("speech.wav")
    audio_int16 = float32_to_int16(audio_float)

    async def stream_audio():
        chunk_size = sr // 25  # 25 chunks/sec = video FPS
        for i in range(0, len(audio_int16), chunk_size):
            await runtime.push_audio(
                audio_int16[i:i + chunk_size].tobytes(), sr
            )
        await runtime.flush()

    # Keep a reference so the streaming task isn't garbage-collected early
    audio_task = asyncio.create_task(stream_audio())

    # Receive lip-synced video frames
    async for frame in runtime.run():
        if frame.end_of_speech:
            break
        # frame.bgr_image — numpy array (H, W, 3), BGR format
        # frame.rgb_image — numpy array (H, W, 3), RGB format
        # frame.audio_chunk — synchronized audio (int16, 16kHz mono)

    await audio_task
    await runtime.stop()

asyncio.run(main())
```

## How It Works

1. **Load model** — `.imx` file contains the avatar's appearance and animations
2. **Push audio** — Stream audio bytes via `push_audio()`, call `flush()` when done
3. **Get frames** — Iterate `runtime.run()` to receive lip-synced video frames

The runtime handles the full motion graph internally: idle animations, talking with lip-sync, and smooth transitions between states.
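
As an illustration, the frame loop from the Python example above could instead persist the output with OpenCV (already a dependency of this package). The file name and the 25 FPS rate are illustrative choices, and this writes a silent video; muxing in `frame.audio_chunk` is left out for brevity:

```python
import cv2  # opencv-python-headless, installed with bithuman

writer = None
async for frame in runtime.run():
    if frame.end_of_speech:
        break
    image = frame.bgr_image  # (H, W, 3) numpy array in BGR, OpenCV's native order
    if writer is None:
        # Lazily open the writer once the frame size is known
        height, width = image.shape[:2]
        writer = cv2.VideoWriter(
            "output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 25, (width, height)
        )
    writer.write(image)

if writer is not None:
    writer.release()
```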

## Performance

| Feature | Spec |
|---------|------|
| **Raw FPS** | 100+ on CPU (Intel i5-12400 / Apple M2) |
| **CPU Usage** | 1-2 cores |
| **Memory** | ~2 GB RAM (IMX v2) |
| **Latency** | <200ms end-to-end |
| **Lip-sync format** | WebP patches — fast decode, compact storage |
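
To put the raw FPS figure in context: at 100 FPS, synthesizing a frame takes roughly 10 ms, while a 25 FPS live stream (the rate used in the Python example above) only needs a new frame every 40 ms, leaving about 4x headroom for the rest of your pipeline.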

## Supported Platforms

| Platform | Architecture |
|----------|--------------|
| **Linux** | x86_64, ARM64 |
| **macOS** | Intel, Apple Silicon |
| **Windows** | x86_64 |
| **Edge Devices** | Raspberry Pi 4/5, Mac Mini, Chromebook |

## Use Cases

- **Visual AI Agents** — Give your voice agents a face with real-time lip-sync
- **Conversational AI** — Build video chatbots and AI assistants with human-like presence
- **Live Streaming** — Stream avatars to browsers, integrate with LiveKit, WebRTC, or custom pipelines
- **Video Generation** — Generate lip-synced video content from audio at 100+ FPS
- **Edge AI Deployment** — Run avatars locally on Raspberry Pi, Mac Mini, or any edge device
- **Digital Twins** — Create photorealistic replicas for customer service, education, or entertainment

## Integrations & Self-Hosting

Build visual AI agents with TTS, STT, and LLMs:

- **[LiveKit Integration](https://docs.bithuman.ai/#/examples/livekit-bithuman-plugin-self-hosted)** — Real-time video streaming for conversational AI
- **[Self-Hosting Guide](https://docs.bithuman.ai/#/)** — Deploy on your own edge infrastructure
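
As a rough sketch of the LiveKit path (installed via the `agent` extra), a visual agent wires a bitHuman avatar into a LiveKit Agents session. The plugin import paths, the `AvatarSession` parameters, and the OpenAI model below are assumptions based on the LiveKit Agents avatar-plugin pattern, so treat the linked integration guide as the authoritative reference:

```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import bithuman, openai  # assumed plugin import paths

async def entrypoint(ctx: agents.JobContext):
    # Any STT/LLM/TTS stack can drive the session; a realtime model keeps it short
    session = AgentSession(llm=openai.realtime.RealtimeModel())

    # The avatar consumes the session's audio and publishes lip-synced video
    avatar = bithuman.AvatarSession(
        model_path="avatar.imx",   # your local .imx model
        api_secret="YOUR_API_KEY",
    )
    await avatar.start(session, room=ctx.room)

    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful visual assistant."),
    )

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```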

## Getting a bitHuman Model

To generate your own avatar model (`.imx` file):

1. Visit [bithuman.ai](https://bithuman.ai)
2. Register and subscribe
3. Upload a photo or video to create your avatar
4. Download your `.imx` model file

## Learn More

- [Documentation](https://docs.bithuman.ai/#/)
- [bithuman.ai](https://bithuman.ai)

## License

Commercial license required. See [bithuman.ai/pricing](https://bithuman.ai/pricing).
