Metadata-Version: 2.4
Name: structured2graph
Version: 0.2.1
Summary: Database migration agent from structured data (e.g. SQL) to graph.
Project-URL: Homepage, https://github.com/memgraph/ai-toolkit
Project-URL: Repository, https://github.com/memgraph/ai-toolkit
Project-URL: Issues, https://github.com/memgraph/ai-toolkit/issues
Project-URL: Documentation, https://github.com/memgraph/ai-toolkit/tree/main/agents/sql2graph
Author-email: Memgraph <tech@memgraph.com>
Maintainer-email: Memgraph <tech@memgraph.com>
License-Expression: MIT
License-File: LICENSE
Keywords: database-migration,etl,graph-database,knowledge-graph,memgraph,sql-to-graph
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: anthropic>=0.40.0
Requires-Dist: langchain-anthropic>=0.3.0
Requires-Dist: langchain-core>=1.0.0
Requires-Dist: langchain-google-genai>=2.0.0
Requires-Dist: langchain-openai>=0.2.0
Requires-Dist: langchain>=1.0.0
Requires-Dist: langgraph>=0.2.0
Requires-Dist: memgraph-toolbox>=0.1.4
Requires-Dist: mysql-connector-python>=9.0.0
Requires-Dist: neo4j>=5.0.0
Requires-Dist: openai>=1.0.0
Requires-Dist: psycopg2-binary>=2.9
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pymysql>=1.1.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: sqlalchemy>=2.0.0
Description-Content-Type: text/markdown

# SQL Database to Graph Migration Agent

Intelligent database migration agent that transforms SQL databases (MySQL, PostgreSQL) into graph databases, powered by LLM analysis and LangGraph workflows.

## Overview

This package provides a migration agent that:

- **Analyzes SQL database schemas** - Automatically discovers tables, relationships, and constraints
- **Generates optimal graph models** - Uses AI to create node and relationship structures
- **Creates indexes and constraints** - Ensures performance and data integrity
- **Handles complex relationships** - Converts foreign keys to graph relationships
- **Supports incremental refinement** - Lets you review each table, adjust
  the model immediately, and then enter the interactive refinement loop once
  all tables are processed
- **Comprehensive validation** - Verifies migration results and data integrity

## Installation

```bash
# Install the package
uv pip install .

# Or install in development mode
uv pip install -e .
```

## Quick Start

Run the migration agent:

```bash
uv run main
```

The agent will guide you through:

1. Environment setup and database connections
2. Graph modeling strategy selection
3. Automatic or incremental migration mode
4. Complete migration workflow with progress tracking

> **Incremental review:** The LLM drafts the entire graph model in a single
> shot and then walks you through table-level changes detected since the last
> migration. You only need to approve (or tweak) the differences that matter.

You can also preconfigure the workflow using CLI flags or environment variables:

```bash
uv run main --mode incremental --strategy llm --meta-graph reset --log-level DEBUG
```

| Option                                 | Environment          | Description                                                   |
| -------------------------------------- | -------------------- | ------------------------------------------------------------- |
| `--mode {automatic,incremental}`       | `SQL2MG_MODE`        | Selects automatic or incremental modeling flow.               |
| `--strategy {deterministic,llm}`       | `SQL2MG_STRATEGY`    | Chooses deterministic or LLM-powered HyGM strategy.           |
| `--provider {openai,anthropic,gemini}` | `LLM_PROVIDER`       | Selects LLM provider (auto-detects if not specified).         |
| `--model MODEL_NAME`                   | `LLM_MODEL`          | Specifies LLM model name (uses provider default if not set).  |
| `--meta-graph {auto,skip,reset}`       | `SQL2MG_META_POLICY` | Controls how stored meta graph data is used (default `auto`). |
| `--log-level LEVEL`                    | `SQL2MG_LOG_LEVEL`   | Sets logging verbosity (`DEBUG`, `INFO`, etc.).               |
| `--mapping PATH`                       | (none)               | Generates or edits a mapping JSON file instead of migrating.  |
| `--editor CMD`                         | `EDITOR`             | Editor for opening mapping files (e.g. `vim`, `code --wait`). |

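As a rough illustration of how flags and environment variables like these can be combined, the sketch below uses `argparse` with each environment variable as the flag's default. It assumes that a CLI flag wins over its environment variable; that precedence and the `resolve_options` helper are assumptions made for this example, not the agent's actual code.

```python
import argparse
import os


def resolve_options(argv, env=None):
    """Parse CLI flags, falling back to environment variables.

    Hypothetical helper: the flag and variable names come from the table
    above, but the flag-over-environment precedence is an assumption.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="main")
    parser.add_argument("--mode", choices=["automatic", "incremental"],
                        default=env.get("SQL2MG_MODE"))
    parser.add_argument("--strategy", choices=["deterministic", "llm"],
                        default=env.get("SQL2MG_STRATEGY"))
    # `auto` is the documented default for the meta graph policy.
    parser.add_argument("--meta-graph", choices=["auto", "skip", "reset"],
                        default=env.get("SQL2MG_META_POLICY", "auto"))
    parser.add_argument("--log-level", default=env.get("SQL2MG_LOG_LEVEL"))
    return parser.parse_args(argv)


opts = resolve_options(["--mode", "incremental"], env={"SQL2MG_STRATEGY": "llm"})
print(opts.mode, opts.strategy, opts.meta_graph)  # incremental llm auto
```

Values left as `None` after parsing would be the ones the agent still has to ask for interactively.
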
## Mapping Mode

Use `--mapping` to generate or edit a mapping file that describes how SQL tables and columns map to graph nodes and edges — without running an actual migration.

```bash
# Generate a new mapping from the source database
uv run main --mapping output/mapping.json

# Re-open an existing mapping for editing
uv run main --mapping output/mapping.json
```

When the mapping file does not exist, the agent connects to the source database, analyzes the schema, builds a graph model, writes the mapping JSON, and enters the interactive editor. When the file already exists, it is loaded directly into the editor.

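The load-or-generate behavior described above can be sketched as follows. This is a minimal illustration, not the agent's implementation: `load_or_create_mapping` and the `build_mapping` callable (standing in for schema analysis and graph modeling) are hypothetical names, and the mapping is treated as opaque JSON because its structure is not specified here.

```python
import json
from pathlib import Path


def load_or_create_mapping(path, build_mapping):
    """Return (mapping, created) for the given mapping file path.

    `build_mapping` is a hypothetical zero-argument callable that stands in
    for connecting to the source database and building the graph model.
    """
    mapping_path = Path(path)
    if mapping_path.exists():
        # Existing file: load it directly for editing.
        return json.loads(mapping_path.read_text(encoding="utf-8")), False

    # Missing file: build the mapping and write it before editing starts.
    mapping = build_mapping()
    mapping_path.parent.mkdir(parents=True, exist_ok=True)
    mapping_path.write_text(json.dumps(mapping, indent=2), encoding="utf-8")
    return mapping, True
```

Because the second run finds the file on disk, the same command serves both the "generate" and the "re-open" cases.
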
### Interactive Mapping Editor

Inside the editor you can use slash commands or natural language:

```
Commands:
  /edit    - open the mapping JSON in $EDITOR (vi by default)
  /save    - save changes and exit
  /cancel  - discard changes and exit

Or describe changes in natural language (sent to LLM), e.g.:
  Add a Person label node mapped from the people table
  Rename label Person to User
  Remove the KNOWS relationship
```

The LLM sees both the current and original model state, so requests like "go back to the original names" work correctly. An LLM provider is auto-detected from available API keys for natural language editing; `/edit` always works regardless.

### Docker Usage

Build and run with `Dockerfile.local` for local development:

```bash
docker build -f Dockerfile.local -t memgraph/structured2graph .
docker run -d --rm --net memgql-net --name structured2graph-dev \
  --env-file .env -v $(pwd)/output:/output \
  --entrypoint sleep memgraph/structured2graph infinity
docker exec -it structured2graph-dev uv run main.py --mapping /output/mapping.json
```

> **Note:** If your `.env` file quotes values (e.g. `ANTHROPIC_API_KEY="sk-..."`),
> the agent strips the surrounding quotes automatically so `docker run --env-file`
> works correctly.

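`docker run --env-file` passes values through literally, so a quoted value arrives with its quotes still attached. The quote handling mentioned in the note can be sketched roughly like this; `strip_env_quotes` is a hypothetical helper for illustration, and the agent's actual logic may differ.

```python
def strip_env_quotes(value):
    """Remove one pair of matching surrounding quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        return value[1:-1]
    # Unquoted or mismatched values are returned unchanged.
    return value


print(strip_env_quotes('"sk-..."'))  # sk-...
```
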
## Configuration

Set up your environment variables in `.env`:

```bash
# Select source database (mysql or postgresql)
SOURCE_DB_TYPE=postgresql

# PostgreSQL Database (used when SOURCE_DB_TYPE=postgresql)
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DATABASE=pagila
POSTGRES_USER=username
POSTGRES_PASSWORD=password
POSTGRES_SCHEMA=public

# MySQL Database (used when SOURCE_DB_TYPE=mysql)
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_DATABASE=sakila
MYSQL_USER=username
MYSQL_PASSWORD=password

# Memgraph Database
MEMGRAPH_URL=bolt://localhost:7687
MEMGRAPH_USERNAME=
MEMGRAPH_PASSWORD=
MEMGRAPH_DATABASE=memgraph

# LLM API Keys (for LLM-powered features - choose one or more)
OPENAI_API_KEY=your_openai_key         # For GPT models
# ANTHROPIC_API_KEY=your_anthropic_key # For Claude models
# GOOGLE_API_KEY=your_google_key       # For Gemini models

# LLM Provider Configuration (optional - auto-detects if not set)
# LLM_PROVIDER=openai                  # Options: openai, anthropic, gemini
# LLM_MODEL=gpt-4o-mini                # Specific model name

# Optional migration defaults (override CLI prompts)
SQL2MG_MODE=automatic
SQL2MG_STRATEGY=deterministic
SQL2MG_META_POLICY=auto
SQL2MG_LOG_LEVEL=INFO
```

When switching `SOURCE_DB_TYPE`, update the matching credential block and rerun `uv sync` so that dependencies such as `psycopg2-binary` are installed for PostgreSQL support.

Make sure Memgraph is started with the `--schema-info-enabled=true` flag, because the agent reads schema information from Memgraph via `SHOW SCHEMA INFO`.

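To show how the `SOURCE_DB_TYPE` switch selects a credential block, here is a minimal sketch that builds a SQLAlchemy-style connection URL from the variables in the `.env` example. The `source_db_url` helper and the choice of the `psycopg2` and `pymysql` drivers in the URL scheme are assumptions for illustration; the agent may build its connections differently.

```python
from urllib.parse import quote_plus


def source_db_url(env):
    """Build a connection URL from the credential block matching SOURCE_DB_TYPE."""
    db_type = env.get("SOURCE_DB_TYPE", "").lower()
    if db_type == "postgresql":
        prefix, scheme = "POSTGRES", "postgresql+psycopg2"  # assumed driver
    elif db_type == "mysql":
        prefix, scheme = "MYSQL", "mysql+pymysql"  # assumed driver
    else:
        raise ValueError(f"Unsupported SOURCE_DB_TYPE: {db_type!r}")

    # Percent-encode credentials so characters like '@' do not break the URL.
    user = quote_plus(env[f"{prefix}_USER"])
    password = quote_plus(env[f"{prefix}_PASSWORD"])
    host = env[f"{prefix}_HOST"]
    port = env[f"{prefix}_PORT"]
    database = env[f"{prefix}_DATABASE"]
    return f"{scheme}://{user}:{password}@{host}:{port}/{database}"
```

Only the block that matches `SOURCE_DB_TYPE` is read, which is why the other block can stay in `.env` unused.
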
## Multi-LLM Provider Support

The agent supports multiple LLM providers for AI-powered graph modeling:

### Supported Providers

- **OpenAI** (GPT models) - Default: `gpt-4o-mini`
- **Anthropic** (Claude models) - Default: `claude-sonnet-4-20250514`
- **Google** (Gemini models) - Default: `gemini-1.5-pro`

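Provider auto-detection can be pictured with the sketch below. The API key variables and default models are taken from this README, but the `pick_provider` helper and the detection order (first provider whose key is set) are assumptions for illustration only.

```python
import os

# Provider -> (API key variable, default model), as listed in this README.
PROVIDERS = {
    "openai": ("OPENAI_API_KEY", "gpt-4o-mini"),
    "anthropic": ("ANTHROPIC_API_KEY", "claude-sonnet-4-20250514"),
    "gemini": ("GOOGLE_API_KEY", "gemini-1.5-pro"),
}


def pick_provider(env=None):
    """Return (provider, model), honoring LLM_PROVIDER and LLM_MODEL if set."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER")
    if not provider:
        # Assumed detection order: the first provider whose API key is set.
        provider = next(
            (name for name, (key, _) in PROVIDERS.items() if env.get(key)),
            None,
        )
    if provider not in PROVIDERS:
        raise ValueError("No usable LLM provider configured")
    model = env.get("LLM_MODEL") or PROVIDERS[provider][1]
    return provider, model


print(pick_provider({"ANTHROPIC_API_KEY": "key"}))
# ('anthropic', 'claude-sonnet-4-20250514')
```
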
### Usage Examples

```bash
# Auto-detect provider based on API keys
uv run main --strategy llm

# Use specific provider
uv run main --strategy llm --provider anthropic

# Use specific model
uv run main --strategy llm --provider openai --model gpt-4o

# All options together
uv run main --mode incremental --strategy llm --provider gemini --model gemini-1.5-flash
```

All providers support **structured outputs** for consistent graph model generation. The system automatically validates schemas using Pydantic models.

📖 **[Full Multi-Provider Documentation](docs/MULTI_PROVIDER_SUPPORT.md)**

## Architecture

```
core/hygm/
├── hygm.py                  # Main orchestrator class
├── models/                  # Data models and structures
│   ├── graph_models.py      # Core graph representation
│   ├── llm_models.py        # LLM-specific models
│   ├── operations.py        # Interactive operations
│   └── sources.py           # Source tracking
└── strategies/              # Modeling strategies
    ├── base.py              # Abstract interface
    ├── deterministic.py     # Rule-based modeling
    └── llm.py               # AI-powered modeling
```

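The layout above suggests a strategy pattern: an orchestrator that delegates to interchangeable modeling strategies behind an abstract interface. The sketch below illustrates that idea only. All class names, the schema dictionary format, and the naming rules (one label per table, `HAS_<TABLE>` per foreign key) are hypothetical and are not taken from the actual `core/hygm` code.

```python
from abc import ABC, abstractmethod


def to_label(table):
    """Turn a table name like 'film_actor' into a label like 'FilmActor'."""
    return table.title().replace("_", "")


class ModelingStrategy(ABC):
    """Hypothetical counterpart of the abstract interface in strategies/base.py."""

    @abstractmethod
    def create_model(self, schema):
        """Return a graph model dict for the given SQL schema dict."""


class DeterministicStrategy(ModelingStrategy):
    """Rule-based sketch: one node per table, one relationship per foreign key."""

    def create_model(self, schema):
        nodes = [{"label": to_label(t), "source_table": t} for t in schema["tables"]]
        relationships = [
            {
                "type": f"HAS_{target.upper()}",  # assumed naming rule
                "start": to_label(source),
                "end": to_label(target),
                "via": column,
            }
            for source, column, target in schema["foreign_keys"]
        ]
        return {"nodes": nodes, "relationships": relationships}


class GraphModeler:
    """Hypothetical orchestrator that delegates to the chosen strategy."""

    def __init__(self, strategy):
        self.strategy = strategy

    def model(self, schema):
        return self.strategy.create_model(schema)


schema = {
    "tables": ["film", "language"],
    "foreign_keys": [("film", "language_id", "language")],
}
print(GraphModeler(DeterministicStrategy()).model(schema)["relationships"])
```

An LLM-backed strategy would implement the same `create_model` method, so the orchestrator would not need to change when the strategy does.
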