Metadata-Version: 2.4
Name: mcp-vm-blackbox
Version: 0.13.0
Summary: MCP server for controlling VirtualBox VMs — screenshots, keyboard input, PowerShell, vagrant, WinRM, podman, and CI build pipelines
Project-URL: Homepage, https://github.com/bitflight-devops/vm-flightsimulator
Project-URL: Issues, https://github.com/bitflight-devops/vm-flightsimulator/issues
Project-URL: Repository, https://github.com/bitflight-devops/vm-flightsimulator
Author-email: Jamie Nelson <jamie@bitflight.io>
Maintainer-email: Jamie Nelson <jamie@bitflight.io>
License: MIT
License-File: LICENSE
Keywords: mcp,mcp-server,model-context-protocol,vagrant,virtualbox,winrm
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: <3.15,>=3.11
Requires-Dist: aiosqlite>=0.20.0
Requires-Dist: anthropic>=0.86.0
Requires-Dist: anyio>=4.12.1
Requires-Dist: asyncssh>=2.22.0
Requires-Dist: av>=17.0.0
Requires-Dist: fastapi>=0.135.1
Requires-Dist: fastmcp[tasks]>=3.1.1
Requires-Dist: gitpython>=3.1.46
Requires-Dist: httpx>=0.28.1
Requires-Dist: libtmux>=0.53.1
Requires-Dist: pillow>=12.1.1
Requires-Dist: podman>=5.7.0
Requires-Dist: prefab-ui==0.18.0
Requires-Dist: pydantic>=2.12.5
Requires-Dist: python-gitlab>=4.0.0
Requires-Dist: python-vagrant>=1.0
Requires-Dist: pywinrm>=0.5.0
Requires-Dist: typer>=0.24.1
Requires-Dist: uvicorn[standard]>=0.42.0
Requires-Dist: watchfiles>=1.1.1
Description-Content-Type: text/markdown

# vm-flightsimulator

[![PyPI version](https://badge.fury.io/py/mcp-vm-blackbox.svg)](https://badge.fury.io/py/mcp-vm-blackbox)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://github.com/bitflight-devops/vm-flightsimulator/actions/workflows/test.yml/badge.svg)](https://github.com/bitflight-devops/vm-flightsimulator/actions/workflows/test.yml)

A Claude Code plugin that gives AI agents the ability to **see and act on real virtual machines**. It serves as a unified machine control substrate for validating software across complete workflows.

The plugin provides a **collaborative testing partner**: describe what you want to test, and AI-driven pilots autonomously work through it with full observability. A real-time dashboard lets you watch progress and steer agents mid-flight.

---

## What This Is

**Not** a specific-scenario tool. **A toolkit** for long-running, AI-driven validation work on any virtual machine.

You ask: "Install this app and verify it works end-to-end." The plugin provides the control surface (screenshot, keyboard, mouse, PowerShell, WinRM). Pilots fly the VM, inspectors analyze what happened, and a distributed coordinator keeps everything moving without blocking.

Currently targets **VirtualBox + Vagrant**. Designed for expansion to AWS, OpenTofu, Cloudflare Workers, bare metal, and beyond.

---

## The Actor Model

Understanding who does what:

### Pilots

AI agents that act on VMs. They see (screenshots), type (keyboard/mouse), run commands (PowerShell/SSH), and report outcomes. Follow a tight **declare-execute-reflect** loop: log intent → perform action → log outcome. Autonomous; do not return to the orchestrator after every action.
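The declare-execute-reflect loop works like a step log that records intent before an action and the outcome after it. The following is a minimal sketch, assuming a JSON Lines step log; the `StepLog` class, its field names, and the file layout are hypothetical and are not the plugin's actual schema.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable


class StepLog:
    """Hypothetical append-only JSON Lines log of declare/reflect entries."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _append(self, entry: dict[str, Any]) -> None:
        entry["ts"] = time.time()
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")

    def step(self, intent: str, action: Callable[[], Any]) -> Any:
        # Declare: record what the pilot is about to do.
        self._append({"phase": "declare", "intent": intent})
        try:
            # Execute: perform the action.
            result = action()
        except Exception as exc:
            # Reflect (failure): record the error, then re-raise it.
            self._append(
                {"phase": "reflect", "intent": intent, "ok": False, "error": str(exc)}
            )
            raise
        # Reflect (success): record the outcome.
        self._append(
            {"phase": "reflect", "intent": intent, "ok": True, "outcome": repr(result)}
        )
        return result
```

Because the declare entry is written before the action runs, an inspector reading the log later can tell an action that was attempted and hung apart from one that was never started.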

### Inspectors

AI agents that analyze without touching. Dispatched on-demand to read pilot transcripts, cross-reference recordings, detect stalls, and produce classified reports. Can work even when the pilot is gone — they reconstruct what happened from durable evidence. Enable scheduled monitoring and post-hoc analysis.

### Mechanics

Handle the infrastructure layer: cross-VM networking, database setup, VM snapshots, Vagrant provisioning, port forwarding. Run as loadable skills (for quick infra tasks) or dedicated agents (for complex multi-VM setup). Set up the stage; pilots perform on it; inspectors review the performance.

### Ground Control

The orchestrator's coordination skill. Dispatches pilots in the background, monitors progress via inspectors, and never blocks on any single task. Routes work, handles steering, and schedules follow-ups.

### Dashboard

Real-time UI showing fleet status, job timelines, screenshots, and video playback. Two-way channel: operators watch pilots and can send steering prompts mid-flight. TODO list rendered live — pilots and engineers both add work as needed.

---

## Core Principles

**The harness enables, it doesn't do.** It provides generic control (keyboard/mouse), coordination (job tracking, scheduling), and observability (recordings, dashboards, step logs). The AI figures out the task-specific steps.

**Orchestrator is never blocked.** Pilots run asynchronously; communication is file- and database-mediated. When a pilot is stuck, the inspector detects it and reports without interrupting the pilot.

**Progressive disclosure.** The orchestrator knows WHERE things are (file paths, job IDs), not WHAT they contain (full logs, recordings). Reads only what it needs to make routing decisions. Raw evidence flows through inspectors.

**Hook-based steering.** Pilots check for incoming messages via a `PreToolUse` hook on every tool call. Messages from the orchestrator or dashboard inject course corrections in real time. The pilot does not poll — the hook fires automatically.
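Hook-based steering only works if the hook finds pending messages cheaply and does not deliver the same message twice. The sketch below shows only that inbox-draining step. It assumes a hypothetical per-pilot directory of `*.msg` files written by the orchestrator or dashboard; the directory layout, file suffix, and function name are illustrative, and it does not show how the plugin's actual `PreToolUse` hook hands messages back to the pilot.

```python
from pathlib import Path


def drain_inbox(inbox: Path) -> list[str]:
    """Return pending steering messages oldest-first and remove them.

    Hypothetical layout: one UTF-8 text file per message, named so that
    lexical order matches arrival order (for example, a timestamp prefix).
    """
    if not inbox.is_dir():
        return []  # No inbox yet: nothing to deliver, the tool call proceeds.
    messages: list[str] = []
    for msg_file in sorted(inbox.glob("*.msg")):
        messages.append(msg_file.read_text(encoding="utf-8").strip())
        # Delete after reading so the next tool call does not deliver it again.
        msg_file.unlink()
    return messages
```

Because the hook fires on every tool call, returning early when the inbox is empty keeps the per-call overhead to a single directory check.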

---

## Onboarding Experience

The plugin opens with a conversation:

> "What VM scenarios will we be piloting in this simulator? Let's chat. Tell me what you have already, your ideas, your constraints and goals, and we will plan out the testing system for it."

The AI listens, discovers what infrastructure you have, helps you plan the environment (loading reference skills for Vagrant, Packer, Docker, PostgreSQL, Windows Server, etc.), and then builds it. You never write a Vagrantfile or answer a configuration form.

---

## Current Capabilities

**VirtualBox & Vagrant**

- VM lifecycle (start, stop, snapshot, restore)
- Provisioning and configuration
- Cross-VM networking

**Windows & Linux Interaction**

- GUI automation (screenshot, keyboard, mouse) via `vm-vision-control`
- PowerShell scripts and commands via WinRM
- SSH / Bash on Linux VMs
- Hardware control (keyboard combos, clipboard)

**Observation & Coordination**

- Session recording (WebM/VP8) via VBoxManage
- Frame extraction at specific timestamps
- Job tracking with step-level audit logs
- Issue register for classified problems
- Real-time dashboard with live video playback

**Fleet Management**

- Multi-VM coordination
- Job store (durable across sessions)
- Destroy guards (prevent accidents)
- Scheduled task support for unattended operation

---

## Future Direction

The unified control model stays the same; the adapters grow:

- **OpenTofu** — Infrastructure as code for cloud resources
- **AWS** — EC2 instances, service discovery, networking
- **Cloudflare Workers** — Edge compute validation
- **Bare metal** — Physical machine provisioning and testing

One set of pilot/mechanic/inspector/dashboard code. Different adapters underneath. The plugin becomes a true unified machine control substrate.

---

## Quick Start

### 1. Install the MCP server

```bash
# Run on demand
uvx mcp-vm-blackbox

# Or add persistently to Claude Code
claude mcp add vm-blackbox -- uvx mcp-vm-blackbox
```

### 2. Install the plugin

**Option A — Marketplace (Claude Code only):**

```bash
claude plugin marketplace add bitflight-devops/vm-flightsimulator
```

Then open `/plugins` and install `vm-flightsimulator`.

**Option B — `vm-blackbox-installer` (all platforms):**

```bash
# Install for all platforms globally
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --global

# Or pick specific platforms
uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --gemini --global

# Or install locally to the current project directory
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --local
```

The installer copies skills and agents to your plugin directory and registers the MCP server.

### 3. Start collaborating

```text
"I need a two-VM setup with PostgreSQL on Ubuntu and a Java webapp on Windows. Let's plan it."
"Install this app on my-vm and record the full process."
"Check on the installer progress — is it done yet?"
"What went wrong in the last run? Give me a timeline."
```

The plugin automatically loads the right skill and agent for the task.

---

## Configuration

Environment variables control file paths, durable job state, and optional overrides (for example, where **`.mcp-vm-blackbox`** resolves on disk). See **[docs/configuration.md](docs/configuration.md)** for the full list, the data-root resolution order, and troubleshooting notes for operators and AI-assisted debugging.

---

## Prerequisites

| Requirement | Version          | Notes               |
| ----------- | ---------------- | ------------------- |
| Python      | 3.11 – 3.14      |                     |
| uv          | latest           |                     |
| VirtualBox  | 7.1+             |                     |
| Vagrant     | 2.3+             |                     |
| Packer      | 1.10+            | For VM builds       |
| tmux        | any              | For detached builds |
| WinRM       | enabled on guest | Windows VMs only    |

---

## Architecture

Three-layer cooperative design:

```text
┌──────────────────────────────────────────────────────────┐
│                      Skills                              │
│  vm-vision-control  vm-ground-control  vm-radio-control  │
│  vm-blackbox-record                                      │
│         (define approved loops and tooling)              │
└────────────────────┬─────────────────────────────────────┘
                     │ dispatches
┌────────────────────▼─────────────────────────────────────┐
│                      Agents                              │
│  vm-pilot     vm-pilot-inspector     vm-mechanic         │
│  (acts)       (observes + reports)   (infrastructure)    │
└────────────────────┬─────────────────────────────────────┘
                     │ calls
┌────────────────────▼─────────────────────────────────────┐
│                   MCP Server                             │
│   vm_screenshot  vm_powershell  vm_type  vm_key          │
│   vm_mouse_click  vagrant_*  ci_*  podman_*  build_*     │
│              (executes against real infrastructure)      │
└──────────────────────────────────────────────────────────┘
```

**Skills** define the approved loop. They do not take actions.

**Agents** take actions. Pilots drive VMs. Inspectors read state. Mechanics set up infrastructure.

**MCP server** executes tool calls against VirtualBox, Vagrant, WinRM, SSH, CI hosts, and container runtimes.

---

## Skills

### vm-vision-control — GUI Interaction

The mandatory entry point for any task that touches a VM's desktop. The loop is strict:

```text
1. Screenshot     →  vm_screenshot
2. Read image     →  Read tool on the saved_to path
3. Decide         →  Analyze screen, determine next action
4. Act            →  vm_mouse_click / vm_type / vm_key / vm_powershell
5. Repeat         →  Return to step 1
```
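The five-step loop can be written as a small driver with the tool calls passed in as callables. This is a minimal sketch. `take_screenshot`, `decide`, and `act` are hypothetical stand-ins for `vm_screenshot`, the agent's own reasoning over the image, and the action tools. The `max_steps` cap and `settle_seconds` delay are illustrative safeguards, not documented plugin behavior.

```python
import time
from typing import Callable, Optional


def vision_loop(
    take_screenshot: Callable[[], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    max_steps: int = 50,
    settle_seconds: float = 1.0,
) -> int:
    """Run screenshot -> decide -> act until decide() returns None.

    decide() receives the saved screenshot path and returns the next
    action, or None when the goal is reached. Returns the number of
    actions performed.
    """
    for actions in range(max_steps):
        saved_to = take_screenshot()      # 1. Screenshot (path to saved image)
        next_action = decide(saved_to)    # 2-3. Read image and decide
        if next_action is None:
            return actions                # Goal reached
        act(next_action)                  # 4. Act
        time.sleep(settle_seconds)        # Let the UI settle, then 5. repeat
    raise TimeoutError(f"goal not reached after {max_steps} actions")
```

Taking a fresh screenshot before every decision keeps the agent from acting on a stale view of the desktop. In practice `settle_seconds` would vary per operation, along the lines of the timing table for this skill.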

**Natural language triggers:** "click on the VM", "type into the VM", "what's on the screen", "navigate the installer"

**Timing between steps:**

| Operation                  | Wait       |
| -------------------------- | ---------- |
| Click a button             | 0.5 – 1 s  |
| Open an application        | 3 – 5 s    |
| Launch an installer        | 10 – 15 s  |
| Installer panel transition | 2 – 3 s    |
| Installer completion       | 30 – 60 s  |
| VM boot                    | 60 – 120 s |

Full reference: [docs/skills/vm-vision-control.md](docs/skills/vm-vision-control.md)

---

### vm-ground-control — Orchestration

Use for any operation taking more than ~30 seconds. Dispatches `vm-pilot` in the background and returns a structured block to parse.

```python
agent_id = Task(
    description="Run the installer",
    subagent_type="vm-pilot",
    prompt="""
GOAL: Run silent installer and report success.

STEPS:
1. Invoke vm-vision-control skill.
2. Test-NetConnection <HOST> -Port <PORT>
3. Register-ScheduledTask ...
4. Poll every 30s until State = Ready or 15 min elapsed
5. Read install log; take screenshot.

RETURN FORMAT:
JOB_ID: <uuid>
STATUS: SUCCESS | FAILED | BLOCKED | IN_PROGRESS
SCOPE: <vm_name> / <task_description>
OUTCOME: <2-4 sentences>
ISSUES: <count> (<classifications>) | none
BLOCKED_BY: <description> | —
DETAIL:
  steps: <path_to_step_log>
  issues: <path_to_issue_register> | none
  recording: <path_to_recording> | none
  screenshots: <glob_pattern> | none
  video: <true | false>
  pilot.screen_state: <description>
  pilot.files_read: <comma-separated filenames>
NEXT: <recommended action>
""",
    run_in_background=True,
)
```

**Store the agent ID** — you need it to check progress and resume.

Route on `STATUS`:

| STATUS        | Action                                |
| ------------- | ------------------------------------- |
| `SUCCESS`     | Proceed                               |
| `FAILED`      | Check ISSUES, fix and re-dispatch     |
| `BLOCKED`     | Read BLOCKED_BY, resolve, re-dispatch |
| `IN_PROGRESS` | Wait and re-check via radio-control   |

Full reference: [docs/skills/vm-ground-control.md](docs/skills/vm-ground-control.md)

---

### vm-radio-control — Progress Observation

Check a running pilot without interrupting. Dispatches `vm-pilot-inspector` to read transcripts and query VM state.

```python
Task(
    description="Check installer progress",
    subagent_type="vm-pilot-inspector",
    prompt="""
output_type: progress
pilot_agent_id: <agent-id-from-ground-control>
vm_name: <vm-name>
project_path: /absolute/path/to/project
""",
    run_in_background=False,
)
```

**Output types:**

| Type         | Returns                           | Use when                        |
| ------------ | --------------------------------- | ------------------------------- |
| `quick`      | JOB_ID + STATUS + SCOPE + OUTCOME | Fast pulse, context is tight    |
| `progress`   | Full canonical template           | Normal progress check (default) |
| `screenshot` | Report + UI coordinates           | Need exact screen state         |
| `transcript` | Report + last 10 turns            | Pilot appears stuck             |
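The `transcript` output type returns the pilot's most recent turns. Given the transcript location listed under Conventions (`~/.claude/projects/<encoded_project_path>/<agent-id>.jsonl`), reading the tail is a bounded scan of a JSON Lines file. This is a minimal sketch, assuming one JSON object per line. The function name is hypothetical, and treating each line as one "turn" is a simplification, because the record structure is not described here.

```python
import json
from collections import deque
from pathlib import Path


def tail_jsonl(path: Path, n: int = 10) -> list[dict]:
    """Return the last n parseable JSON objects from a JSON Lines file."""
    last: deque = deque(maxlen=n)  # bounded memory even for long transcripts
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                last.append(json.loads(line))
            except json.JSONDecodeError:
                # A running pilot may be mid-write; skip a truncated line.
                continue
    return list(last)
```

Reading the file read-only and tolerating a partial final line lets an inspector sample the transcript without coordinating with the pilot that is still writing it.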

Full reference: [docs/skills/vm-radio-control.md](docs/skills/vm-radio-control.md)

---

### vm-blackbox-record — Session Recording

Record VM screen as WebM/VP8 video and extract frames.

```bash
# Start recording
uv run skills/vm-blackbox-record/scripts/vm_capture.py record start "my-vm"

# Run your operation (VM is live)

# Stop recording
uv run skills/vm-blackbox-record/scripts/vm_capture.py record stop "my-vm"

# Extract frames — MCP only: vm_extract_frames (see docs/skills/vm-blackbox-record.md)
```

Recording runs on the host via VirtualBox — no guest changes needed.

Full reference: [docs/skills/vm-blackbox-record.md](docs/skills/vm-blackbox-record.md)

---

## Agents

### vm-pilot

Hands-and-eyes agent. Takes screenshots, runs PowerShell, sends keystrokes. Returns structured results.

**Five tools:**

| Tool            | Does                              |
| --------------- | --------------------------------- |
| `vm_screenshot` | Capture screen, return PNG + path |
| `vm_powershell` | Run PowerShell; return output     |
| `vm_type`       | Type text (256 char limit)        |
| `vm_key`        | Send enter/tab/escape/space       |
| `vm_info`       | Return VM hardware + state        |

Follows declare-execute-reflect loop. When blocked, populates `ISSUES` and `BLOCKED_BY` and returns `STATUS: BLOCKED` or `FAILED`.

Full reference: [docs/agents/vm-pilot.md](docs/agents/vm-pilot.md)

---

### vm-pilot-inspector

Observer agent. Reads pilot transcript, queries VM state, takes screenshots, analyzes recordings. Returns structured reports.

**Never takes control actions.** No typing, no key sends, no process invocation.

Full reference: [docs/agents/vm-pilot-inspector.md](docs/agents/vm-pilot-inspector.md)

---

## MCP Tools

### VM Inspection (4 tools)

`vm_list`, `vm_info`, `vm_screenshot`, `vm_screenshot_api`
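For a sense of what sits behind a tool like `vm_list`: on the host, VirtualBox's `VBoxManage list vms` prints one line per VM in the form `"name" {uuid}`. The sketch below parses that output into a name-to-UUID map. It is a hypothetical helper for illustration, not the server's actual implementation, and it takes the command output as a string so it can run without VirtualBox installed.

```python
import re

# One line per VM: "VM name" {uuid}
_VM_LINE = re.compile(r'^"(?P<name>.*)" \{(?P<uuid>[0-9a-fA-F-]{36})\}$')


def parse_list_vms(output: str) -> dict[str, str]:
    """Map VM name to UUID from `VBoxManage list vms` output."""
    vms: dict[str, str] = {}
    for line in output.splitlines():
        match = _VM_LINE.match(line.strip())
        if match:
            vms[match["name"]] = match["uuid"]
    return vms
```

Anchoring the match on the trailing `{uuid}` lets VM names contain spaces. This parsing belongs on the server side; per the Hard Constraints, the orchestrator itself should call the MCP tool rather than VBoxManage.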

### VM Interaction (5 tools)

`vm_powershell`, `vm_type`, `vm_key`, `vm_key_combo`, `vm_mouse_click`

### VM Lifecycle (3 tools)

`vm_start`, `vm_stop`, `vm_wait_ready`

### Vagrant (5 tools)

`vagrant_status`, `vagrant_up`, `vagrant_provision`, `vagrant_destroy`, `vagrant_winrm`

### Build Orchestration (3 tools)

`build_start`, `build_watch`, `build_status`

### CI Tools (4 tools)

`ci_check`, `ci_run`, `ci_pipeline_status`, `ci_preflight`

### Podman / Containers (4 tools)

`podman_ps`, `podman_exec`, `podman_logs`, `podman_service_status`

### KVM Tools (2 tools)

`kvm_unload`, `kvm_reload`

Full signatures: [docs/mcp-tools.md](docs/mcp-tools.md)

---

## Hard Constraints

These are required by the plugin architecture:

1. **`vm-vision-control` is mandatory before any GUI interaction.** Do not call `vm_screenshot`, `vm_mouse_click`, `vm_type`, or `vm_key` directly from the orchestrator.

2. **MCP is the only approved path.** Do not use raw Bash for vagrant, VBoxManage, podman, or WinRM — always go through `mcp__plugin_vm-flightsimulator_vm-blackbox__*` tools.

3. **The orchestrator does not call VM tools while a pilot is running.** The pilot owns the VM. Interrupt only by resuming the agent.

4. **Skills are scoped.** Each skill has a single responsibility. Do not combine vision-control and recording in one invocation.

---

## Conventions

- `vm_type` has a 256-character limit per call. Chunk long text across multiple calls.
- Password fields may receive doubled input. Clear the field with Ctrl+A, then Backspace, before typing.
- `vm_mouse_click` and `vm_screenshot_api` require `target="local"` (vboxapi uses local XPCOM only).
- Recording parameters lock when enabled — configure before starting, not after.
- The pilot's transcript lives at `~/.claude/projects/<encoded_project_path>/<agent-id>.jsonl`.
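Chunking for `vm_type` only has to respect the 256-character limit. Breaking at spaces where possible makes it easier to spot and retry a chunk that failed. This is a minimal sketch; `chunk_for_vm_type` is a hypothetical helper, and each returned chunk would be sent in its own `vm_type` call.

```python
def chunk_for_vm_type(text: str, limit: int = 256) -> list[str]:
    """Split text into pieces of at most `limit` characters.

    Prefers to break just after the last space in each window and falls
    back to a hard split when the window contains no usable space.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            space = text.rfind(" ", start, end)
            if space > start:
                end = space + 1  # keep the space so "".join(chunks) == text
        chunks.append(text[start:end])
        start = end
    return chunks
```

Each chunk keeps its trailing space, so joining the chunks back together reproduces the original text exactly.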

---

## Skill-to-Task Decision Guide

| You want to...                                    | Use                                                  |
| ------------------------------------------------- | ---------------------------------------------------- |
| Click a button / type text / read the screen      | `vm-vision-control`                                  |
| Run a multi-step operation (>30 seconds)          | `vm-ground-control` → dispatches `vm-pilot`          |
| Check on a running background task                | `vm-radio-control` → dispatches `vm-pilot-inspector` |
| Continue a completed pilot with more work         | `vm-ground-control` with `resume=agent_id`           |
| Record an operation as video                      | `vm-blackbox-record`                                 |
| Extract frames from a recording                   | MCP `vm_extract_frames`                              |
| Get the current screen without interrupting pilot | `vm-radio-control` with `output_type: screenshot`    |

---

## Local Development

```bash
# Install dependencies
uv sync

# Run all tests
uv run pytest

# Format
uv run ruff format

# Lint
uv run ruff check

# Type check
uv run ty check packages/

# Test the plugin locally in Claude Code
claude --plugin-dir ./
```

Coverage threshold: 60%. Modules requiring live VMs (WinRM, SSH tunnel, VBoxManage) are excluded from CI coverage.

---

## Installation Reference

```bash
# Marketplace
claude plugin marketplace add bitflight-devops/vm-flightsimulator

# MCP server only (PyPI)
uvx mcp-vm-blackbox

# Persistent MCP registration
claude mcp add vm-blackbox -- uvx mcp-vm-blackbox
```

---

## License

MIT — see [LICENSE](LICENSE)
