Metadata-Version: 2.4
Name: flyto-ai
Version: 0.10.3
Summary: AI agent that turns natural language into executable automation. 412 batteries included.
Project-URL: Homepage, https://github.com/flytohub/flyto-ai
Project-URL: Repository, https://github.com/flytohub/flyto-ai.git
Project-URL: Issues, https://github.com/flytohub/flyto-ai/issues
License-Expression: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: ai-agent,anthropic,automation,llm,mcp,ollama,openai,workflow
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Requires-Dist: aiosqlite>=0.19.0
Requires-Dist: anthropic>=0.20.0
Requires-Dist: cryptography>=41.0
Requires-Dist: flyto-blueprint>=0.1.0
Requires-Dist: flyto-core[browser]>=2.16.1
Requires-Dist: langdetect>=1.0.9
Requires-Dist: openai>=1.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Provides-Extra: agent
Requires-Dist: claude-agent-sdk>=0.1.0; extra == 'agent'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: lite
Requires-Dist: pydantic>=2.0.0; extra == 'lite'
Requires-Dist: pyyaml>=6.0; extra == 'lite'
Provides-Extra: serve
Requires-Dist: aiohttp>=3.8; extra == 'serve'
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://raw.githubusercontent.com/flytohub/flyto-ai/main/docs/logo.svg" alt="flyto-ai" width="120">
</p>

<h1 align="center">flyto-ai</h1>

<h3 align="center">Natural language → executable automation workflows</h3>

<p align="center">
  <em>Most AI agents have the LLM write shell commands and pray. <strong>flyto-ai uses 412 pre-built, schema-validated modules instead.</strong></em>
</p>

<p align="center">
  <a href="https://pypi.org/project/flyto-ai/"><img src="https://img.shields.io/pypi/v/flyto-ai?color=blue" alt="PyPI"></a>
  <a href="https://pypi.org/project/flyto-ai/"><img src="https://img.shields.io/pypi/pyversions/flyto-ai" alt="Python"></a>
  <a href="https://github.com/flytohub/flyto-ai/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-green" alt="License"></a>
</p>

---

## The Problem

Most AI agents have the LLM generate shell commands or raw code on every run. This means:

- **Non-deterministic** — the same prompt can produce different commands each time
- **No validation** — wrong flags, hallucinated APIs, subtle bugs only found at runtime
- **Not reusable** — each execution is ephemeral, nothing saved for next time
- **Expensive** — LLM spends tokens figuring out *how* to execute, not just *what* to execute

## The Fix

flyto-ai flips the model: **the LLM never writes code.** It searches and selects from 412 pre-built modules, fills in parameters (validated against schemas), and executes them deterministically. Every run produces a reusable YAML workflow.

```
❯ scrape the title from example.com

Result: "Example Domain"
```
```yaml
name: Scrape Title
params:
  url: "https://example.com"
steps:
  - id: launch
    module: browser.launch
  - id: goto
    module: browser.goto
    params:
      url: "${{params.url}}"
  - id: extract
    module: browser.extract
    params:
      selector: "h1"
```
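The `${{params.url}}` placeholders are resolved at execution time. flyto-ai's actual resolver isn't shown here; a minimal sketch of the idea, assuming simple string parameters, could look like:

```python
import re

def interpolate(value: str, params: dict) -> str:
    """Replace ${{params.<name>}} placeholders with values from params.

    Illustrative only -- the real resolver also handles
    ${{steps.<id>.<field>}} references and nested structures.
    """
    pattern = re.compile(r"\$\{\{\s*params\.(\w+)\s*\}\}")
    return pattern.sub(lambda m: str(params[m.group(1)]), value)

print(interpolate("${{params.url}}", {"url": "https://example.com"}))
# -> https://example.com
```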

## Quick Start

```bash
pip install flyto-ai
playwright install chromium     # download browser for web automation
export OPENAI_API_KEY=sk-...   # or ANTHROPIC_API_KEY
flyto-ai
```

One install, one command — interactive chat with **412 automation modules**, browser automation, and self-learning blueprints.

<p align="center">
  <img src="https://raw.githubusercontent.com/flytohub/flyto-ai/main/docs/demo.svg" alt="flyto-ai demo" width="800">
</p>

## How It's Different

The core difference is **what the LLM does during execution**:

| | Traditional AI agents | flyto-ai |
|---|---|---|
| **LLM's job** | Write shell/Python code from scratch | Select modules + fill params |
| **Execution** | `subprocess.run(llm_output)` | `execute_module("browser.extract", {validated_params})` |
| **Validation** | None — errors at runtime | Schema validation before execution |
| **Determinism** | Same prompt → different code | Same module + params → same result |
| **Output** | One-time result | Result + reusable YAML workflow |
| **Learning** | None | Self-learning blueprints (zero LLM replay) |
| **Cost per replay** | Full LLM inference again | $0 (saved blueprint, no LLM) |
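The validation row above can be made concrete. The module schemas themselves aren't reproduced in this README, but since pydantic is a declared dependency, the pattern presumably resembles validating LLM-supplied params against a model before dispatch (module and field names below are hypothetical):

```python
from pydantic import BaseModel, ValidationError

class ExtractParams(BaseModel):
    """Hypothetical schema for a browser.extract-style module."""
    selector: str
    timeout_ms: int = 5000  # defaulted, so the LLM may omit it

def execute_module(name: str, raw_params: dict) -> dict:
    # Validate before execution: bad params fail here, not at runtime.
    try:
        params = ExtractParams(**raw_params)
    except ValidationError as e:
        return {"ok": False, "error": str(e)}
    # ... deterministic module execution would happen here ...
    return {"ok": True, "params": params.model_dump()}

print(execute_module("browser.extract", {"selector": "h1"}))
print(execute_module("browser.extract", {"timeout_ms": "soon"}))  # fails validation
```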

## Use Cases

### Web Scraping

```
❯ extract all product names and prices from example-shop.com/products
```

```yaml
name: Scrape Products
params:
  url: "https://example-shop.com/products"
steps:
  - id: launch
    module: browser.launch
  - id: goto
    module: browser.goto
    params:
      url: "${{params.url}}"
  - id: extract
    module: browser.extract
    params:
      selector: ".product"
      fields:
        name: ".product-name"
        price: ".product-price"
```

### Form Automation

```
❯ log in to staging.example.com, fill the contact form, and take a screenshot
```

```yaml
name: Fill Contact Form
steps:
  - id: launch
    module: browser.launch
  - id: login
    module: browser.login
    params:
      url: "https://staging.example.com/login"
      username_selector: "#email"
      password_selector: "#password"
      submit_selector: "button[type=submit]"
  - id: fill
    module: browser.form
    params:
      url: "https://staging.example.com/contact"
      fields:
        name: "Test User"
        message: "Hello from flyto-ai"
  - id: proof
    module: browser.screenshot
```

### API Monitoring + Notification

```
❯ check if https://api.example.com/health returns 200, if not send a Slack message
```

```yaml
name: Health Check Alert
params:
  endpoint: "https://api.example.com/health"
steps:
  - id: check
    module: http.get
    params:
      url: "${{params.endpoint}}"
  - id: notify
    module: notification.slack
    params:
      webhook_url: "${{params.slack_webhook}}"
      message: "Health check failed: ${{steps.check.status_code}}"
    condition: "${{steps.check.status_code}} != 200"
```

## 412 Batteries Included

Powered by [flyto-core](https://pypi.org/project/flyto-core/) — 412 automation modules across 55 categories:

| Category | Modules | Examples |
|----------|---------|---------|
| Browser | 39 | launch, goto, click, type, extract, screenshot, wait |
| Atomic | 35 | reusable building-block operations |
| Flow | 23 | conditionals, loops, branching, error handling |
| Cloud | 14 | S3, GCS, cloud storage and APIs |
| Data | 13 | JSON, CSV, parsing, transformation |
| Array | 12 | filter, map, sort, flatten, unique |
| String | 11 | split, replace, template, regex, slugify |
| Productivity | 10 | email, calendar, document integrations |
| Image | 9 | resize, crop, convert, watermark, compress |
| HTTP / API | 9 | GET, POST, download, upload, GraphQL |
| Notification | 9 | email, Slack, Telegram, webhook |
| + 44 more | 200+ | database, crypto, docker, k8s, testing, ... |

Check how many modules are installed:


```bash
flyto-ai version   # Shows installed module count
```

## Self-Learning Blueprints

The agent remembers what works. Good workflows are automatically saved as **blueprints** — reusable patterns that make future tasks faster and free.

```
First time:  "screenshot example.com" → 15s (discover modules, build from scratch)
Second time: "screenshot another.com" → 3s  (reuse learned blueprint, zero LLM cost)
```

How it works (closed-loop, no LLM involved):

1. Execution succeeds with 3+ steps → auto-saved as blueprint (score 70)
2. Blueprint reused successfully → score +5
3. Blueprint fails → score -10
4. Score < 10 → auto-retired, never suggested again
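The rules above are simple enough to sketch directly. The real implementation persists scores in local SQLite; this stateless illustration just encodes the arithmetic:

```python
INITIAL_SCORE = 70
RETIRE_BELOW = 10

def update_score(score: int, succeeded: bool) -> int:
    """Apply the closed-loop scoring rules: +5 on success, -10 on failure."""
    return score + 5 if succeeded else score - 10

def is_retired(score: int) -> bool:
    return score < RETIRE_BELOW

# A blueprint that keeps failing is retired after 7 consecutive failures:
score = INITIAL_SCORE
failures = 0
while not is_retired(score):
    score = update_score(score, succeeded=False)
    failures += 1
print(failures)  # 7
```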

```bash
flyto-ai blueprints                             # View learned blueprints
flyto-ai blueprints --export > blueprints.yaml  # Export for sharing
```

## Claude Code Agent

Use Claude Code as a coding worker with automatic verification loops:

```bash
pip install "flyto-ai[agent]"   # Installs claude-agent-sdk (quotes keep zsh from globbing)

# Basic — Claude Code writes code, no verification
flyto-ai code "fix the login form validation" --dir ./my-project

# With verification — screenshot + visual comparison after each fix attempt
flyto-ai code "match the Figma design for the login page" \
  --dir ./my-project \
  --verify screenshot \
  --verify-args '{"url": "http://localhost:3000/login"}' \
  --reference ./figma-login.png \
  --max-attempts 3

# JSON output for CI/CD
flyto-ai code "add unit tests for auth module" --dir ./project --json
```

**How it works:**

```
Phase 1: Gather codebase context from flyto-indexer
Phase 2: Claude Code writes code (with Guardian safety hooks)
Phase 3: Run verification recipe (browser screenshot + text extraction)
Phase 4: LLM visual comparison (actual vs reference)
  → Failed → feed back to Claude Code (Phase 2)
  → Passed → return result
```
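Under stub names (not the real SDK API), the four-phase loop reduces to roughly:

```python
def run_code_task(task, *, gather_context, write_code, verify, compare,
                  max_attempts=3):
    """Sketch of the flyto-ai code feedback loop (hypothetical signatures).

    The callables stand in for the four phases; the real agent wires these
    to flyto-indexer, the Claude Agent SDK, a YAML verification recipe,
    and an LLM visual comparison.
    """
    context = gather_context(task)                     # Phase 1
    feedback = None
    for attempt in range(1, max_attempts + 1):
        result = write_code(task, context, feedback)   # Phase 2
        evidence = verify(result)                      # Phase 3
        passed, feedback = compare(evidence, task)     # Phase 4
        if passed:
            return {"ok": True, "attempts": attempt}
    return {"ok": False, "attempts": max_attempts}     # attempts exhausted

# Toy stubs: verification passes on the second attempt.
attempts_seen = []
outcome = run_code_task(
    "fix login",
    gather_context=lambda t: {},
    write_code=lambda t, c, f: attempts_seen.append(f) or "patch",
    verify=lambda r: "screenshot",
    compare=lambda e, t: (len(attempts_seen) >= 2, "align the button"),
)
print(outcome)  # {'ok': True, 'attempts': 2}
```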

**Features:**
- **Guardian hooks** — blocks dangerous operations (`rm -rf`, `.env` writes, credential access)
- **Evidence trail** — every tool call logged to `~/.flyto/evidence/<session>/evidence.jsonl`
- **Budget control** — `--budget 5.0` caps spending per task
- **Indexer integration** — flyto-indexer provides codebase context + mounts as MCP server
- **Session resume** — feedback loop reuses the same Claude Code session for full context

```python
# Python API
from flyto_ai import ClaudeCodeAgent, AgentConfig
from flyto_ai.agents import CodeTaskRequest

agent = ClaudeCodeAgent(config=AgentConfig.from_env())
result = await agent.run(CodeTaskRequest(
    message="fix the login page",
    working_dir="/path/to/project",
    verification_recipe="screenshot",
    verification_args={"url": "http://localhost:3000/login"},
    reference_image="./figma-login.png",
))
print(result.ok, result.attempts, result.files_changed)
```

## CLI

```bash
flyto-ai                                     # Interactive chat — executes tasks directly
flyto-ai chat "scrape example.com"           # One-shot execute mode
flyto-ai chat "scrape example.com" --plan    # YAML-only mode (don't execute)
flyto-ai chat "take screenshot" -p ollama    # Use Ollama (no API key needed)
flyto-ai chat "..." --webhook https://...    # POST result to webhook
flyto-ai code "fix bug" --dir ./project      # Claude Code Agent mode
flyto-ai serve --port 8080                   # HTTP server for triggers
flyto-ai blueprints                          # List learned blueprints
flyto-ai version                             # Version + dependency status
```

### Interactive Mode

Just run `flyto-ai` — multi-turn conversation with up/down arrow history:

```
$ flyto-ai

  _____ _       _        ____       _    ___
 |  ___| |_   _| |_ ___ |___ \     / \  |_ _|
 | |_  | | | | | __/ _ \  __) |   / _ \  | |
 |  _| | | |_| | || (_) |/ __/   / ___ \ | |
 |_|   |_|\__, |\__\___/|_____|  /_/   \_\___|
           |___/

  v0.10.3  Interactive Mode
  Provider: openai  Model: gpt-4o  Tools: 412

  ⏵⏵ execute · openai/gpt-4o · 412 tools
❯ scrape the title from example.com

  ○ browser.launch
  ○ browser.goto
  ○ browser.extract

  The title of example.com is: **Example Domain**

  3 executed · 5 tool calls

  ⏵⏵ execute · openai/gpt-4o · 412 tools · 1 msgs
❯ now also take a screenshot

❯ /mode
Switched to: plan-only (YAML output)
```

Commands: `/clear`, `/mode`, `/history`, `/version`, `/help`, `/exit`

## Webhook & HTTP Server

**Send results anywhere:**

```bash
flyto-ai chat "scrape example.com" --webhook https://hook.site/xxx
```

**Accept triggers from anywhere:**

```bash
flyto-ai serve --port 8080

# From Slack, n8n, Make, or any HTTP client:
curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "take a screenshot of example.com"}'

# Execute mode (default) or plan-only:
curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "scrape example.com", "mode": "yaml"}'
```

## Python API

```python
from flyto_ai import Agent, AgentConfig

agent = Agent(config=AgentConfig.from_env())

# Execute mode (default) — runs modules and returns results
result = await agent.chat("extract all links from https://example.com")
print(result.message)            # Result + YAML workflow
print(result.execution_results)  # Module execution results

# Plan-only mode — generates YAML without executing
result = await agent.chat("extract all links from example.com", mode="yaml")
print(result.message)            # YAML workflow only
```

## Multi-Provider

Works with any LLM provider:

```bash
export OPENAI_API_KEY=sk-...          # OpenAI models
export ANTHROPIC_API_KEY=sk-ant-...   # Anthropic models
flyto-ai chat "..." -p ollama         # Local models (Llama, Mistral, etc.)
flyto-ai chat "..." --model <name>    # Any specific model
```

## Security

- **Workflows are auditable** — YAML is human-readable, reviewable, and version-controllable
- **Module policies** — whitelist/denylist categories (e.g. block `file.*` or `database.*`)
- **Sensitive param redaction** — API keys and passwords are masked in tool call logs
- **Local-first** — blueprints stored in local SQLite, nothing sent to third parties
- **Webhook output** — structured JSON only, no raw credentials in payload
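The policy format itself isn't documented in this README; as an illustration of the idea (not flyto-ai's actual API), a denylist check over dotted module names can be done with `fnmatch`:

```python
from fnmatch import fnmatch

DENYLIST = ["file.*", "database.*"]  # example policy from the bullet above

def is_allowed(module: str, denylist=DENYLIST) -> bool:
    """Return False if the module name matches any denied pattern."""
    return not any(fnmatch(module, pattern) for pattern in denylist)

print(is_allowed("browser.extract"))  # True
print(is_allowed("file.delete"))      # False
```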

## Architecture

```
User message
  → LLM (OpenAI / Anthropic / Ollama)
    → Function calling: search_modules, get_module_info, execute_module, ...
      → 412 flyto-core modules (schema-validated, deterministic)
      → Self-learning blueprints (closed-loop, zero LLM)
      → Browser page inspection
    → Execute mode: run modules, return results + YAML
    → Plan mode: YAML validation loop (auto-retry on errors)
  → Structured output (results + reusable workflow)

Claude Code Agent (flyto-ai code):
  → Phase 1: flyto-indexer gathers codebase context
  → Phase 2: Claude Agent SDK spawns Claude Code
      → PreToolUse hook: Guardian blocks dangerous ops
      → PostToolUse hook: Evidence trail logging
      → MCP: flyto-indexer available for code intelligence
  → Phase 3: YAML recipe verification (browser automation)
  → Phase 4: LLM visual comparison (screenshot vs Figma)
  → Loop: failed → feedback → Phase 2 | passed → done
```

## Telegram Bot Gateway

Run Claude Code from your phone via Telegram — read/write files, run commands, multi-turn conversation with full context. Also supports flyto-ai agent automation via `/agent`.

```bash
# 1. Install
pip install "flyto-ai[agent,serve]"
npm install -g @anthropic-ai/claude-code   # Claude Code CLI (required by SDK)

# 2. Set tokens
export TELEGRAM_BOT_TOKEN=123456:ABC-DEF       # from @BotFather
export TELEGRAM_ALLOWED_CHATS=your_chat_id      # optional whitelist
export ANTHROPIC_API_KEY=sk-ant-...

# 3. Start server
flyto-ai serve --host 0.0.0.0 --port 7411 --dir /path/to/your/project

# 4. Register webhook (once)
curl "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/setWebhook?url=https://your-domain/telegram"

# 5. Open Telegram → send any message → Claude Code replies with streaming
```

The `--dir` flag sets the default working directory for Claude Code. You can change it later with `/cd` in the chat.

### Bot Commands

| Command | Description |
|---------|-------------|
| (plain text) | **Claude Code** — read/write files, run commands, multi-turn conversation |
| `/agent <msg>` | flyto-ai agent automation (browser, scraper, etc.) |
| `/cd <path>` | Change Claude Code working directory |
| `/model <name>` | Switch model (sonnet/opus/haiku) |
| `/cancel` | Interrupt Claude Code or cancel agent task |
| `/clear` | Clear session |
| `/status` | View active/recent tasks |
| `/cost` | View token spending |
| `/yaml` | List learned blueprints |
| `/help` | Show command list |

### Features

- **Claude Code as default** — plain text messages go to Claude Code CLI, with full file read/write, command execution, and persistent multi-turn context
- **Real-time streaming** — CLI output streams to Telegram by editing the status message in real time
- **CLI-agnostic** — `CLIProfile` abstraction supports any AI CLI (Claude, Codex, Gemini, etc.)
- **MCP tools built-in** — Claude Code inherits your MCP config (flyto-core 412 modules, flyto-indexer, etc.)
- **Session resume** — each chat maintains a CLI session; context is preserved across messages
- **flyto-ai agent via `/agent`** — browser automation, scraping, and 412-module workflows remain available as a slash command
- **Persistent job queue** — agent tasks survive server restarts, with status tracking
- **Mid-execution steering** — send a message while an agent task is running to redirect it

| Variable | Purpose | Required |
|----------|---------|----------|
| `TELEGRAM_BOT_TOKEN` | Bot token from @BotFather | Yes (for /telegram) |
| `TELEGRAM_ALLOWED_CHATS` | Comma-separated chat_id whitelist | No (empty = allow all) |

## Action Assistant (v0.10.0)

The Action Assistant is a 7-layer middleware system that makes browser automation reliable without hardcoding any site-specific logic into the system prompt.

### AssistantMiddleware

Seven layers of system intelligence that run automatically on every tool call:

1. **Blueprint Guard** — enforces blueprint-first routing; the agent must follow a matching blueprint before improvising
2. **Snapshot Guard** — ensures the agent always has a fresh page snapshot before acting
3. **Param Auto-Correction** — fixes common parameter mistakes (wrong field names, missing required fields) before they reach the module
4. **Circuit Breaker** — detects infinite retry loops on failing or empty modules and stops execution early
5. **Anti-Bot Detection** — recognizes bot-detection pages (Cloudflare, CAPTCHA) and switches strategy
6. **Selector Healing** — when a selector fails, attempts alternative selectors before giving up
7. **Output Auto-Save** — automatically persists structured output (screenshots, extracted data) to disk
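The circuit breaker (layer 4) is easy to illustrate. The middleware's real thresholds are internal; this sketch trips after a hypothetical three consecutive failing or empty results per module:

```python
from collections import defaultdict

class CircuitBreaker:
    """Trips after `threshold` consecutive bad results for a module."""
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.strikes = defaultdict(int)

    def record(self, module: str, ok: bool, empty: bool = False) -> None:
        if ok and not empty:
            self.strikes[module] = 0   # any good result resets the count
        else:
            self.strikes[module] += 1

    def is_open(self, module: str) -> bool:
        return self.strikes[module] >= self.threshold

cb = CircuitBreaker()
for _ in range(3):
    cb.record("browser.extract", ok=True, empty=True)  # empty results count too
print(cb.is_open("browser.extract"))  # True -> stop retrying this module
```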

### Key Features

- **ask_user tool** — pauses execution mid-flow to request user credentials, choices, or confirmation. The agent waits for the user's response before continuing.
- **Vault auto-fill** — encrypted local credential storage. Credentials entered once are securely saved and auto-filled on repeat visits to the same site.
- **Preference learning** — remembers non-sensitive choices (seat type, meal preference, sort order, etc.) so the agent does not ask again.
- **Blueprint-first routing** — 33 seed blueprints cover common workflows. The system enforces blueprint selection at the middleware level, not via prompt instructions.
- **Zero hardcoded prompt** — no module names, no site names, no selectors in the system prompt. All domain knowledge lives in blueprints and middleware.
- **Circuit breaker** — stops infinite retry when a module keeps failing or returns empty results. Prevents wasted tokens and stuck sessions.
- **Credential masking** — passwords and secrets are never exposed in LLM context. The vault injects credentials at execution time, after the LLM has selected the action.
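The masking idea can be sketched in a few lines (the key list below is illustrative, not the vault's actual heuristics):

```python
SENSITIVE = {"password", "api_key", "token", "secret", "webhook_url"}

def redact(params: dict) -> dict:
    """Mask values of sensitive-looking keys before they reach logs or the LLM."""
    return {
        k: "***" if any(s in k.lower() for s in SENSITIVE) else v
        for k, v in params.items()
    }

print(redact({"url": "https://example.com", "password": "s3cret"}))
# {'url': 'https://example.com', 'password': '***'}
```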

## Environment Variables

| Variable | Description |
|----------|-------------|
| `FLYTO_AI_PROVIDER` | `openai`, `anthropic`, or `ollama` |
| `FLYTO_AI_API_KEY` | API key (or use provider-specific vars below) |
| `FLYTO_AI_MODEL` | Model name override |
| `OPENAI_API_KEY` | Fallback for OpenAI provider |
| `ANTHROPIC_API_KEY` | Fallback for Anthropic provider |
| `FLYTO_AI_BASE_URL` | Custom API endpoint (OpenAI-compatible) |
| `TELEGRAM_BOT_TOKEN` | Telegram Bot token for /telegram webhook |
| `TELEGRAM_ALLOWED_CHATS` | Comma-separated Telegram chat_id whitelist |
| `FLYTO_AI_CC_MAX_BUDGET` | Claude Code Agent max budget in USD (default: 5.0) |
| `FLYTO_AI_CC_MAX_TURNS` | Claude Code Agent max turns (default: 30) |
| `FLYTO_AI_CC_MAX_FIX_ATTEMPTS` | Claude Code Agent max fix attempts (default: 3) |

## License

Apache-2.0 — use it commercially, fork it, build on it.
