Metadata-Version: 2.4
Name: bithuman
Version: 1.7.8
Summary: Real-time avatar engine — 100+ FPS on CPU. Generate lip-synced video, stream live avatars to browsers. 1-2 CPU cores, <200ms latency. ARM, x86, macOS.
License: Commercial
Project-URL: Homepage, https://bithuman.ai
Project-URL: Documentation, https://docs.bithuman.ai/#/
Project-URL: Repository, https://github.com/bithuman-product/platform
Project-URL: Changelog, https://github.com/bithuman-product/platform/releases
Keywords: avatar,digital-human,lip-sync,real-time,ai,visual-agent,conversational-ai,edge-ai,voice-agent,video-chatbot,ai-assistant
Platform: Linux
Platform: Mac OS X
Platform: Windows
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Topic :: Multimedia :: Graphics :: 3D Rendering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Operating System :: OS Independent
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.26.0
Requires-Dist: h5py~=3.13
Requires-Dist: loguru~=0.7
Requires-Dist: soxr>=0.5
Requires-Dist: soundfile~=0.13
Requires-Dist: pydantic~=2.10
Requires-Dist: pydantic-settings~=2.8
Requires-Dist: networkx<4.0,>=3.1
Requires-Dist: pyzmq~=26.2; python_version < "3.14"
Requires-Dist: msgpack~=1.1
Requires-Dist: PyYAML~=6.0
Requires-Dist: aiohttp~=3.11
Requires-Dist: onnxruntime>=1.18; python_version >= "3.10"
Requires-Dist: onnxruntime>=1.14; python_version < "3.10"
Requires-Dist: eval_type_backport>=0.1.1; python_version < "3.10"
Requires-Dist: av>=12.0
Requires-Dist: PyJWT>=2.8
Requires-Dist: requests>=2.31
Requires-Dist: lz4>=4.3
Requires-Dist: PyTurboJPEG>=1.7
Requires-Dist: Pillow>=9.0
Requires-Dist: opencv-python-headless>=4.8
Requires-Dist: tqdm>=4.60
Provides-Extra: agent
Requires-Dist: livekit-agents~=1.1; extra == "agent"
Dynamic: platform
Dynamic: requires-python

# bitHuman Avatar Runtime

![bitHuman Banner](https://docs.bithuman.ai/docs/assets/images/bithuman-banner.jpg)

**Real-time avatar engine for visual AI agents, digital humans, and creative characters.**

[![PyPI version](https://badge.fury.io/py/bithuman.svg)](https://pypi.org/project/bithuman/)
[![Python](https://img.shields.io/badge/python-3.9--3.14-blue.svg)](https://www.python.org/downloads/)
[![Platforms](https://img.shields.io/badge/platform-Linux%20%7C%20macOS%20%7C%20Windows-lightgrey.svg)]()

bitHuman powers **visual AI agents** and **conversational AI** with photorealistic avatars and real-time lip-sync. Build voice agents with faces, video chatbots, AI assistants, and interactive digital humans — all running on edge devices with just 1-2 CPU cores and <200ms latency. Raw generation speed is **100+ FPS on CPU alone**, enabling real-time streaming applications.

## Installation

```bash
pip install bithuman --upgrade
```

Pre-built wheels for all major platforms — no compilation required:

| | Linux | macOS | Windows |
|---|:---:|:---:|:---:|
| **x86_64** | yes | yes | yes |
| **ARM64** | yes | yes (Apple Silicon) | — |
| **Python** | 3.9 — 3.14 | 3.9 — 3.14 | 3.9 — 3.14 |

For [LiveKit](https://livekit.io) agent integration:

```bash
pip install "bithuman[agent]"
```

## Quick Start

### Generate a lip-synced video

```bash
bithuman generate avatar.imx --audio speech.wav --key YOUR_API_KEY
```

### Stream a live avatar to your browser

```bash
# Terminal 1: Start the streaming server
bithuman stream avatar.imx --key YOUR_API_KEY

# Terminal 2: Send audio to trigger lip-sync
bithuman speak speech.wav
```

Open `http://localhost:3001` to see the avatar streaming live.

### Python API (async)

```python
import asyncio
from bithuman import AsyncBithuman
from bithuman.audio import load_audio, float32_to_int16

async def main():
    runtime = await AsyncBithuman.create(
        model_path="avatar.imx",
        api_secret="YOUR_API_KEY",
    )
    await runtime.start()

    # Load and stream audio
    audio, sr = load_audio("speech.wav")
    audio_int16 = float32_to_int16(audio)

    async def stream_audio():
        chunk_size = sr // 25  # match video FPS
        for i in range(0, len(audio_int16), chunk_size):
            await runtime.push_audio(
                audio_int16[i:i + chunk_size].tobytes(), sr
            )
        await runtime.flush()

    audio_task = asyncio.create_task(stream_audio())  # keep a reference so the task isn't GC'd

    # Receive lip-synced video frames
    async for frame in runtime.run():
        if frame.has_image:
            image = frame.bgr_image       # numpy (H, W, 3), uint8
            audio_out = frame.audio_chunk # synchronized audio for this frame
        if frame.end_of_speech:
            break

    await runtime.stop()

asyncio.run(main())
```

### Python API (sync)

```python
from bithuman import Bithuman
from bithuman.audio import load_audio, float32_to_int16

runtime = Bithuman.create(model_path="avatar.imx", api_secret="YOUR_API_KEY")

audio, sr = load_audio("speech.wav")
audio_int16 = float32_to_int16(audio)

chunk_size = sr // 100  # 10 ms chunks
for i in range(0, len(audio_int16), chunk_size):
    runtime.push_audio(audio_int16[i:i + chunk_size].tobytes(), sr)
runtime.flush()

for frame in runtime.run():
    if frame.has_image:
        image = frame.bgr_image
    if frame.end_of_speech:
        break
```

## How It Works

1. **Load model** — `.imx` file contains the avatar's appearance, animations, and lip-sync data
2. **Push audio** — Stream audio bytes in real-time via `push_audio()`, call `flush()` when done
3. **Get frames** — Iterate `runtime.run()` to receive lip-synced video frames with synchronized audio

The runtime handles the full motion graph internally: idle animations, talking with lip-sync, head movements, blinking, and smooth transitions between states.
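
For example, the chunk sizing used in the Quick Start falls out of the audio rate and the frame rate (a worked calculation rather than an API requirement):

```python
SAMPLE_RATE = 16000       # runtime's internal rate (input is auto-resampled to 16 kHz)
FPS = 25                  # default video frame rate

samples_per_frame = SAMPLE_RATE // FPS    # 640 samples of audio per video frame
bytes_per_frame = samples_per_frame * 2   # 1280 bytes of int16 PCM per frame
```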

## Performance

| Metric | Value |
|--------|-------|
| **Raw FPS** | 100+ on CPU (Intel i5-12400, Apple M2) |
| **CPU cores** | 1-2 cores at 25 FPS |
| **End-to-end latency** | <200ms |
| **Memory (IMX v2)** | ~200 MB per session |
| **Model load time** | <10ms (IMX v2) |
| **Audio formats** | WAV, MP3, FLAC, OGG, M4A |

## Features

- **Real-time lip-sync** — Audio-driven mouth animation at 25 FPS with synchronized audio output
- **Cross-platform** — Linux, macOS, Windows; x86_64 and ARM64; Python 3.9-3.14
- **Edge-ready** — 1-2 CPU cores, no GPU required for inference
- **Sync + Async** — `Bithuman` for threads, `AsyncBithuman` for async/await
- **Streaming-first** — Push audio chunks in real-time, receive frames as they're generated
- **Actions & emotions** — Trigger avatar gestures (wave, nod) and emotion states (joy, surprise)
- **Interrupt support** — Cancel mid-speech for natural conversation flow
- **LiveKit integration** — Built-in support for [LiveKit Agents](https://docs.livekit.io/agents/) (WebRTC streaming)
- **CLI tools** — Generate videos, stream live, convert models, validate setups
- **IMX v2 format** — Optimized binary container with O(1) random access and WebP patches
- **Zero scipy dependency** — Pure numpy audio pipeline, minimal install footprint

## API Reference

### `AsyncBithuman` / `Bithuman`

The main runtime for avatar animation.

```python
# Create and initialize
runtime = await AsyncBithuman.create(
    model_path="avatar.imx",     # Path to .imx model
    api_secret="API_KEY",        # API secret (recommended)
    # token="JWT_TOKEN",         # Or JWT token directly
)
await runtime.start()

# Push audio (int16 PCM, any sample rate — auto-resampled to 16kHz)
await runtime.push_audio(audio_bytes, sample_rate)
await runtime.flush()            # Signal end of speech
runtime.interrupt()              # Cancel current playback

# Receive frames
async for frame in runtime.run():
    frame.bgr_image              # np.ndarray (H, W, 3) uint8 BGR
    frame.rgb_image              # np.ndarray (H, W, 3) uint8 RGB
    frame.audio_chunk            # AudioChunk — synchronized audio
    frame.end_of_speech          # True when all audio processed
    frame.has_image              # True if image available
    frame.frame_index            # Frame number
    frame.source_message_id      # Correlates to input

# Controls
await runtime.push(VideoControl(action="wave"))          # Trigger action
await runtime.push(VideoControl(target_video="idle"))    # Switch state
runtime.set_muted(True)                                  # Mute processing

# Info
runtime.get_frame_size()          # (width, height)
runtime.get_first_frame()         # First idle frame as np.ndarray
runtime.get_expiration_time()     # Token expiry (unix timestamp)
runtime.is_token_validated()      # Auth status

await runtime.stop()
```
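
For barge-in conversations, `interrupt()` pairs with a fresh `push_audio()` cycle. A minimal sketch (the handler names here are hypothetical, not SDK callbacks):

```python
# Hypothetical barge-in flow: when the user starts talking, cancel the
# avatar's in-flight speech, then stream the new reply as usual.
def on_user_started_speaking():
    runtime.interrupt()                  # drop current playback immediately

async def speak(reply_pcm16: bytes, sample_rate: int):
    await runtime.push_audio(reply_pcm16, sample_rate)
    await runtime.flush()                # mark end of this utterance
```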

### Data Classes

```python
from bithuman import AudioChunk, VideoControl, VideoFrame, Emotion, EmotionPrediction

# AudioChunk — container for audio data
chunk = AudioChunk(data=np.array([...], dtype=np.int16), sample_rate=16000)
chunk.duration    # float — length in seconds
chunk.bytes       # bytes — raw PCM bytes

# VideoControl — input to the runtime
ctrl = VideoControl(
    audio=chunk,                    # Audio to lip-sync
    action="wave",                  # Trigger action (wave, nod, etc.)
    target_video="talking",         # Switch video state
    end_of_speech=True,             # Mark end of speech
    force_action=False,             # Override action deduplication
    emotion_preds=[                 # Set emotion state
        EmotionPrediction(emotion=Emotion.JOY, score=0.9),
    ],
)

# VideoFrame — output from runtime.run()
frame.bgr_image           # np.ndarray (H, W, 3) uint8 — BGR
frame.rgb_image           # np.ndarray (H, W, 3) uint8 — RGB
frame.audio_chunk         # AudioChunk — synchronized audio
frame.end_of_speech       # bool — True when done
frame.has_image           # bool — True if image available
frame.frame_index         # int — frame number
frame.source_message_id   # Hashable — correlates to VideoControl

# Emotion enum members
Emotion.ANGER, Emotion.DISGUST, Emotion.FEAR, Emotion.JOY
Emotion.NEUTRAL, Emotion.SADNESS, Emotion.SURPRISE
```
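
Combining the pieces: a hedged sketch that feeds audio and an emotion state through `push()` rather than the `push_audio()` convenience (assuming, per the fields above, that a `VideoControl` carrying audio with `end_of_speech=True` behaves like a `push_audio()` + `flush()` cycle):

```python
import numpy as np
from bithuman import AudioChunk, Emotion, EmotionPrediction, VideoControl

pcm = np.zeros(16000, dtype=np.int16)    # 1 s of silence as stand-in audio

# Inside an async function, with `runtime` created as shown earlier:
await runtime.push(VideoControl(
    audio=AudioChunk(data=pcm, sample_rate=16000),
    emotion_preds=[EmotionPrediction(emotion=Emotion.JOY, score=0.9)],
    end_of_speech=True,
))
```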

### Audio Utilities

```python
from bithuman.audio import (
    load_audio,               # Load WAV/MP3/FLAC/OGG/M4A -> (float32, sr)
    float32_to_int16,         # float32 -> int16
    int16_to_float32,         # int16 -> float32
    resample,                 # Resample to target rate
    write_video_with_audio,   # Save MP4 with audio track
    AudioStreamBatcher,       # Real-time audio buffer
)

audio, sr = load_audio("speech.mp3")             # Any format
audio_int16 = float32_to_int16(audio)            # Ready for push_audio
audio_16k = resample(audio, sr, 16000)           # Resample
write_video_with_audio("out.mp4", frames, audio, sr, fps=25)
```
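
Together these cover an offline render loop, roughly what `bithuman generate` does (a sketch with the sync runtime; the CLI's internals may differ):

```python
from bithuman import Bithuman
from bithuman.audio import float32_to_int16, load_audio, write_video_with_audio

runtime = Bithuman.create(model_path="avatar.imx", api_secret="YOUR_API_KEY")

audio, sr = load_audio("speech.wav")
runtime.push_audio(float32_to_int16(audio).tobytes(), sr)
runtime.flush()

frames = []
for frame in runtime.run():
    if frame.has_image:
        frames.append(frame.bgr_image)   # collect BGR frames in order
    if frame.end_of_speech:
        break

write_video_with_audio("out.mp4", frames, audio, sr, fps=25)
```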

### Exceptions

All exceptions inherit from `BithumanError`:

| Exception | When |
|-----------|------|
| `TokenExpiredError` | JWT has expired |
| `TokenValidationError` | Invalid signature or claims |
| `TokenRequestError` | Auth server unreachable |
| `AccountStatusError` | Billing or access issue (HTTP 402/403) |
| `ModelNotFoundError` | Model file doesn't exist |
| `ModelLoadError` | Corrupt or incompatible model |
| `ModelSecurityError` | Security restriction triggered |
| `RuntimeNotReadyError` | Operation called before initialization |
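
A minimal handling sketch (assuming these classes are importable from the top-level `bithuman` package, as the data classes above are):

```python
from bithuman import (
    AsyncBithuman,
    BithumanError,
    ModelNotFoundError,
    TokenExpiredError,
)

async def create_runtime():
    try:
        return await AsyncBithuman.create(
            model_path="avatar.imx", api_secret="YOUR_API_KEY"
        )
    except ModelNotFoundError:
        raise SystemExit("Check the .imx model path")
    except TokenExpiredError:
        raise SystemExit("Token expired; refresh it or switch to an API secret")
    except BithumanError as e:           # base class catches any other SDK error
        raise SystemExit(f"bitHuman runtime error: {e}")
```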

## LiveKit Agent Integration

Build conversational AI agents with avatar faces using [LiveKit Agents](https://docs.livekit.io/agents/):

```python
from bithuman import AsyncBithuman
from bithuman.utils.agent import LocalAvatarRunner, LocalVideoPlayer, LocalAudioIO

# Initialize bitHuman runtime
runtime = await AsyncBithuman.create(
    model_path="avatar.imx",
    api_secret="YOUR_API_KEY",
)

# Connect to a LiveKit agent session; `session` and `agent_output`
# come from your LiveKit Agents app (see the complete example linked below)
avatar = LocalAvatarRunner(
    bithuman_runtime=runtime,
    audio_input=session.audio,
    audio_output=LocalAudioIO(session, agent_output),
    video_output=LocalVideoPlayer(window_size=(1280, 720)),
)
await avatar.start()
```

See [`examples/livekit_agent/`](https://github.com/bithuman-product/platform/tree/main/libs/bithuman/python_module/examples/livekit_agent) for a complete working example with OpenAI Realtime voice.

## Optimize Your Models

Convert existing `.imx` models to **IMX v2** for dramatically better performance:

```bash
bithuman convert avatar.imx
```

| Metric | Legacy (TAR) | IMX v2 | Improvement |
|--------|-------------|--------|-------------|
| **Model size** | 100 MB | 50-70 MB | 30-50% smaller |
| **Load time** | ~10s | <10ms | 1000x faster |
| **Runtime speed** | ~30 FPS | 100+ FPS | 3-10x faster |
| **Peak memory** | ~10 GB | ~200 MB | 98% less |

Conversion is automatic on first load, but pre-converting saves startup time.

## CLI Reference

| Command | Description |
|---------|-------------|
| `bithuman generate <model> --audio <file>` | Generate lip-synced MP4 from model + audio |
| `bithuman stream <model>` | Start live streaming server at localhost:3001 |
| `bithuman speak <audio>` | Send audio to running stream server |
| `bithuman action <name>` | Trigger avatar action (wave, nod, etc.) |
| `bithuman info <model>` | Show model metadata |
| `bithuman list-videos <model>` | List all videos in a model |
| `bithuman convert <model>` | Convert legacy to optimized IMX v2 |
| `bithuman validate <path>` | Validate model files load correctly |

## Configuration

### Environment Variables

| Variable | Description |
|----------|-------------|
| `BITHUMAN_API_SECRET` | API secret for authentication |
| `BITHUMAN_RUNTIME_TOKEN` | JWT token (alternative to API secret) |
| `BITHUMAN_VERBOSE` | Enable debug logging |
| `CONVERT_THREADS` | Number of threads for model conversion (0 or unset = auto-detect) |
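
A sketch of environment-based setup. Note the assumptions: that `create()` falls back to `BITHUMAN_API_SECRET` when no credential is passed, and that `"1"` is an accepted truthy value for `BITHUMAN_VERBOSE` (neither is documented here):

```python
import os
from bithuman import AsyncBithuman

# ASSUMPTION: the runtime reads BITHUMAN_API_SECRET when api_secret= is omitted.
os.environ.setdefault("BITHUMAN_API_SECRET", "YOUR_API_KEY")
os.environ["BITHUMAN_VERBOSE"] = "1"     # ASSUMPTION: truthy value enables debug logs

runtime = await AsyncBithuman.create(model_path="avatar.imx")  # inside an async function
```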

### Runtime Settings

| Setting | Default | Description |
|---------|---------|-------------|
| `FPS` | `25` | Target frames per second |
| `OUTPUT_WIDTH` | `1280` | Output frame width (0 = native resolution) |
| `PRELOAD_TO_MEMORY` | `False` | Cache model in RAM for faster decode |
| `PROCESS_IDLE_VIDEO` | `True` | Run inference during silence (natural idle) |

## Use Cases

- **Visual AI Agents** — Give your voice agents a face with real-time lip-sync
- **Conversational AI** — Build video chatbots and AI assistants with human-like presence
- **Live Streaming** — Stream avatars to browsers via WebSocket, LiveKit, or WebRTC
- **Video Generation** — Generate lip-synced content from audio at 100+ FPS
- **Edge AI** — Run locally on Raspberry Pi, Mac Mini, Chromebook, or any edge device
- **Digital Twins** — Photorealistic replicas for customer service, education, or entertainment

## Examples

| Example | Description |
|---------|-------------|
| [`example.py`](https://github.com/bithuman-product/platform/tree/main/libs/bithuman/python_module/examples/example.py) | Async runtime with live video + audio playback |
| [`example_sync.py`](https://github.com/bithuman-product/platform/tree/main/libs/bithuman/python_module/examples/example_sync.py) | Synchronous runtime with threading |
| [`livekit_agent/`](https://github.com/bithuman-product/platform/tree/main/libs/bithuman/python_module/examples/livekit_agent) | LiveKit Agent with OpenAI Realtime voice |
| [`livekit_webrtc/`](https://github.com/bithuman-product/platform/tree/main/libs/bithuman/python_module/examples/livekit_webrtc) | WebRTC streaming server |

## Troubleshooting

### macOS: Duplicate FFmpeg library warnings

```
objc: Class AVFFrameReceiver is implemented in both .../cv2/.dylibs/libavdevice...
and .../av/.dylibs/libavdevice...
```

This happens when `opencv-python` (full) is installed alongside `av` (PyAV) — both bundle FFmpeg dylibs. Fix by removing the full package in favor of the headless variant:

```bash
pip uninstall -y opencv-python
pip install opencv-python-headless
```

Uninstalling the full `opencv-python` removes the duplicate dylibs. The `bithuman` package already depends on `opencv-python-headless`, so this only occurs when another package has pulled in the full `opencv-python`.

### Model conversion fails with TypeError

If you see `TypeError: an integer is required` during conversion, upgrade to the latest version:

```bash
pip install bithuman --upgrade
```

This was fixed in v1.6.2. The issue affected models in legacy TAR format during auto-conversion.

## Getting a bitHuman Model

To create your own avatar model (`.imx` file):

1. Visit [bithuman.ai](https://bithuman.ai)
2. Register and subscribe
3. Upload a photo or video to create your avatar
4. Download your `.imx` model file

## Links

- [Documentation](https://docs.bithuman.ai/#/)
- [GitHub](https://github.com/bithuman-product/platform)
- [PyPI](https://pypi.org/project/bithuman/)
- [Changelog](https://github.com/bithuman-product/platform/blob/main/libs/bithuman/python_module/CHANGELOG.md)
- [bithuman.ai](https://bithuman.ai)

## License

Commercial license required. See [bithuman.ai](https://bithuman.ai) for pricing.
