Metadata-Version: 2.4
Name: voice-command
Version: 0.1.0
Summary: VAD-driven streaming voice dictation for macOS with editing commands, tech-term correction, and keyboard output
Project-URL: Homepage, https://github.com/depoledna/voice-command
Project-URL: Repository, https://github.com/depoledna/voice-command
Project-URL: Issues, https://github.com/depoledna/voice-command/issues
Author: Denis
License-Expression: MIT
License-File: LICENSE
Keywords: dictation,macos,mlx,speech-to-text,vad,voice,whisper
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.12
Requires-Dist: faster-whisper>=1.2.1
Requires-Dist: jiwer>=4.0.0
Requires-Dist: librosa>=0.11.0
Requires-Dist: mlx-lm>=0.30.7
Requires-Dist: mlx-whisper>=0.4.3
Requires-Dist: noisereduce>=3.0.3
Requires-Dist: psutil>=7.2.2
Requires-Dist: pydantic>=2.12.5
Requires-Dist: pydub>=0.25.1
Requires-Dist: pynput>=1.8.1
Requires-Dist: scipy>=1.17.1
Requires-Dist: silero-vad-lite>=0.2.1
Requires-Dist: sounddevice>=0.5.5
Requires-Dist: transformers>=5.2.0
Description-Content-Type: text/markdown

# voice-command

VAD-driven streaming voice dictation for macOS (Apple Silicon). Speaks into your mic, text appears in a terminal buffer or gets typed directly into any app.

All inference runs locally — Whisper for ASR, Silero for VAD, Qwen for tech-term correction. No cloud APIs.

## Requirements

- macOS with Apple Silicon
- Python 3.12+
- Microphone access (System Settings > Privacy > Microphone)
- Accessibility access for `--type` mode (System Settings > Privacy > Accessibility)

## Install

### From PyPI

```bash
pip install voice-command
# or
uv tool install voice-command
```

### From source

```bash
git clone https://github.com/depoledna/voice-command.git
cd voice-command
uv sync
```

Models download automatically on first run (~300MB for Whisper + ~1GB for Qwen).

## Usage

```bash
# Terminal buffer mode (TUI)
voice-cmd

# Type directly into the focused app
voice-cmd --type

# Transcribe a recording
voice-cmd --file recording.m4a

# List audio devices
voice-cmd --list-devices

# Use a specific device
voice-cmd --device 1
```

When running from source, use `uv run python voice_cmd.py` instead of `voice-cmd`.

## Voice Commands

| Command | Action |
|---------|--------|
| `period` / `comma` / `question mark` | Insert punctuation |
| `new line` | Line break |
| `new paragraph` | Double line break |
| `scratch that` | Delete last ~5 words |
| `delete last N words` | Delete last N words |
| `undo` | Undo last action |
| `clear all` | Clear buffer |
| `stop listening` | Pause |
| `start listening` | Resume |
| `copy all` | Copy to clipboard |
| `done` | Copy to clipboard and exit |
| `show commands` | Show command list |

Commands can appear inline with dictated text: "Send the email **period** **new line** Don't forget the attachment" produces two lines with proper punctuation.

## Pipeline

1. **Audio** - `sounddevice` captures mic input, resampled to 16kHz
2. **VAD** - Silero VAD with hysteresis detects speech boundaries (32ms frames, pre-roll buffering)
3. **ASR** - MLX Whisper (small, 8-bit) with dev-vocabulary prompt
4. **LLM** - Qwen3 1.7B (4-bit) fixes tech terms: "fast api" -> "FastAPI", "type script" -> "TypeScript"
5. **Commands** - Sentence splitting + leading/trailing command extraction
6. **Output** - TUI buffer display or keystroke diff-typing via pynput

## Type Mode

`--type` mode sends keystrokes to the focused app. It detects when its own terminal is focused and skips typing to avoid feedback loops. A 3-second countdown lets you switch to the target app after launching.

## Benchmarks

```bash
# Compare ASR models (requires test fixtures in tests/fixtures/)
uv run python tests/benchmark.py

# Pipeline diagnostics
uv run python tests/diagnose_pipeline.py
```

## Releasing

1. Update the version in `pyproject.toml`
2. Commit: `git commit -am "chore: bump version to X.Y.Z"`
3. Tag: `git tag vX.Y.Z`
4. Push: `git push origin main --tags`

The GitHub Actions workflow builds and publishes to PyPI automatically via trusted publishers (OIDC).

### First-time PyPI setup

1. Go to https://pypi.org/manage/account/publishing/
2. Add a "pending publisher":
   - Package name: `voice-command`
   - Owner: `depoledna`
   - Repository: `voice-command`
   - Workflow: `release.yml`
   - Environment: `pypi`
3. In the GitHub repo, go to Settings > Environments > create `pypi`

## License

MIT
