Metadata-Version: 2.4
Name: ocg
Version: 0.4.6
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Rust
Classifier: Topic :: Database
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Software Development :: Libraries :: Python Modules
License-File: LICENSE
License-File: NOTICE
Summary: 100% openCypher-compliant in-memory graph database — 4 backends, 175+ algorithms, pure Rust
Keywords: graph,database,cypher,opencypher,graph-database,query-language,rust,graph-algorithms,bulk-loader
Author-email: Gregorio Momm <gregoriomomm@gmail.com>
License: Apache-2.0
Requires-Python: >=3.11
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://github.ibm.com/enjoycode/ocg
Project-URL: Homepage, https://github.ibm.com/enjoycode/ocg
Project-URL: Repository, https://github.ibm.com/enjoycode/ocg

# OCG — OpenCypher Graph

**High-performance in-memory graph database with 100% OpenCypher compliance, 4 backends, and 175+ algorithms — pure Rust.**

[![PyPI](https://img.shields.io/pypi/v/ocg)](https://pypi.org/project/ocg)
[![Python](https://img.shields.io/pypi/pyversions/ocg)](https://pypi.org/project/ocg)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![OpenCypher TCK](https://img.shields.io/badge/OpenCypher%20TCK-100%25-brightgreen)](https://opencypher.org)
[![Rust](https://img.shields.io/badge/rust-1.70+-orange.svg)](https://www.rust-lang.org)

## Overview

OCG executes [OpenCypher](https://opencypher.org) queries against in-memory property graphs. It is built in pure Rust and exposed to Python via PyO3 bindings.

- **100% OpenCypher TCK**: 3,897 / 3,897 scenarios passing (0 skipped, 0 failed)
- **4 graph backends**: PropertyGraph, NetworKitRust, RustworkxCore, Graphrs
- **175+ graph algorithms**: centrality, community, pathfinding, spanning trees, flow, coloring, matching, cliques, layout, generators
- **Bulk Loader API**: 10–57x faster batch construction than OpenCypher `CREATE` statements
- **Persistence**: WAL + Parquet snapshots for crash-safe durability
- **Serialization**: save/load graphs to JSON with full metadata
- **Distributed mode** (Rust only): Partitioned storage, Apache Arrow Flight RPC, Kubernetes-native autoscaling (build with `--features distributed`)
- **Python 3.11–3.14**, macOS · Linux (glibc + musl) · Windows

---

## Installation

```bash
pip install ocg
```

Rust:

```toml
[dependencies]
ocg = "0.4.6"
```

---

## Quick Start

### Python

```python
from ocg import Graph

graph = Graph()

# OpenCypher queries
graph.execute("CREATE (a:Person {name: 'Alice', age: 30})")
graph.execute("CREATE (b:Person {name: 'Bob', age: 25})")
graph.execute("MATCH (a:Person), (b:Person) WHERE a.name='Alice' AND b.name='Bob' CREATE (a)-[:KNOWS]->(b)")

result = graph.execute("MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name AS from, b.name AS to")
print(result)  # [{'from': 'Alice', 'to': 'Bob'}]
```

### Bulk Loader (10–57x faster)

Bypasses the OpenCypher parser for large batch operations:

```python
from ocg import Graph

graph = Graph()

node_ids = graph.bulk_create_nodes([
    (["Person"], {"name": "Alice", "age": 30}),
    (["Person"], {"name": "Bob",   "age": 25}),
])

graph.bulk_create_relationships([
    (node_ids[0], node_ids[1], "KNOWS", {"since": 2020}),
])

result = graph.execute("MATCH (a)-[:KNOWS]->(b) RETURN a.name, b.name")
```
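
When source data arrives as plain dictionaries, it has to be reshaped into the tuple forms shown above before it reaches the bulk loader. The following is a minimal sketch, assuming the `(labels, properties)` and `(source_id, target_id, type, properties)` shapes from the example. The helper names `to_node_rows` and `to_relationship_rows` are hypothetical and are not part of OCG:

```python
def to_node_rows(records, label):
    """Turn a list of property dicts into (labels, properties) tuples."""
    return [([label], dict(record)) for record in records]


def to_relationship_rows(pairs, node_ids, rel_type, properties=None):
    """Map (source_index, target_index) pairs onto node IDs, producing
    (source_id, target_id, type, properties) tuples."""
    props = properties or {}
    return [
        (node_ids[src], node_ids[dst], rel_type, dict(props))
        for src, dst in pairs
    ]


people = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
node_rows = to_node_rows(people, "Person")

# Stand-in for the IDs that graph.bulk_create_nodes(node_rows) would return.
example_ids = [100, 101]
rel_rows = to_relationship_rows([(0, 1)], example_ids, "KNOWS", {"since": 2020})
```

The resulting lists can then be passed to `graph.bulk_create_nodes(node_rows)` and `graph.bulk_create_relationships(rel_rows)`, using the real IDs returned by the first call in place of `example_ids`.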

### Serialization

```python
graph.save("my_graph.json")
loaded = Graph.load("my_graph.json")
```

### Persistence (WAL + Snapshots)

For crash-safe durability with write-ahead logging:

```python
from ocg import Graph

# Open persistent graph (creates/loads from directory)
graph = Graph.open("/path/to/data")

# All mutations are logged to WAL
graph.execute("CREATE (:User {name: 'Alice'})")
graph.execute("CREATE (:User {name: 'Bob'})")

# Manual checkpoint (writes snapshot.json, truncates WAL)
nodes, edges = graph.checkpoint()
print(f"Checkpointed: {nodes} nodes, {edges} edges")

# Auto-checkpoint happens after 10,000 operations
```

**Files created:**
- `snapshot.json` — Full graph state (nodes + edges + properties)
- `wal.ndjson` — Write-ahead log (operations since last checkpoint)

On restart, `Graph.open()` automatically loads the snapshot and replays the WAL to recover the exact state before shutdown.
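
The snapshot-plus-replay recovery described above can be illustrated with a toy key-value store. This is a sketch of the general write-ahead-logging technique using only the Python standard library, not OCG's implementation. The class `TinyStore` and its operation format are hypothetical; only the file names `snapshot.json` and `wal.ndjson` are taken from the list above:

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store with a JSON snapshot and an NDJSON write-ahead log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.wal_path = os.path.join(directory, "wal.ndjson")
        self.state = {}
        # Recovery step 1: load the last snapshot, if there is one.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.state = json.load(f)
        # Recovery step 2: replay operations logged since that snapshot.
        if os.path.exists(self.wal_path):
            with open(self.wal_path) as f:
                for line in f:
                    if line.strip():
                        self._apply(json.loads(line))

    def _apply(self, op):
        self.state[op["key"]] = op["value"]

    def put(self, key, value):
        op = {"key": key, "value": value}
        # Log first, then mutate, so every applied change is already on disk.
        with open(self.wal_path, "a") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(op)

    def checkpoint(self):
        # Write the full state to a temp file, swap it in, then truncate the WAL.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp, self.snapshot_path)
        open(self.wal_path, "w").close()
        return len(self.state)


with tempfile.TemporaryDirectory() as d:
    store = TinyStore(d)
    store.put("alice", 30)
    store.checkpoint()              # alice is now in snapshot.json
    store.put("bob", 25)            # bob exists only in wal.ndjson
    recovered = TinyStore(d).state  # simulate a restart
```

After the simulated restart, `recovered` contains both entries: one restored from the snapshot and one replayed from the log.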

### Rust

```rust
use ocg::{PropertyGraph, execute};

let mut graph = PropertyGraph::new();
execute(&mut graph, "CREATE (a:Person {name: 'Alice'})").unwrap();
let result = execute(&mut graph, "MATCH (n:Person) RETURN n.name").unwrap();
```

---

## Graph Backends

| Backend | Class | Description |
|---------|-------|-------------|
| PropertyGraph | `Graph` | Native petgraph-based property graph |
| NetworKitRust | `NetworKitGraph` | Port of NetworKit algorithms to pure Rust |
| RustworkxCore | `RustworkxGraph` | IBM Qiskit rustworkx-core algorithms |
| Graphrs | `GraphrsGraph` | graphrs-based community detection |

All four backends expose identical APIs: OpenCypher execution, bulk loader, 175+ algorithms, and save/load.

---

## Graph Algorithms (175+)

All algorithms are available on all 4 backends.

| Category | Algorithms |
|----------|-----------|
| Centrality | degree, betweenness, closeness, pagerank, eigenvector, katz, harmonic, voterank |
| Pathfinding | bfs, dijkstra, astar, bellman_ford, floyd_warshall, all_pairs, all_simple_paths, all_pairs_all_simple_paths |
| Shortest Paths | single_source, multi_source, k_shortest, average_shortest_path_length |
| Spanning Trees | minimum, maximum, steiner_tree |
| DAG | topological_sort, is_dag, find_cycles, dag_longest_path, transitive_closure, transitive_reduction, dag_to_tree |
| Flow | max_flow, min_cut_capacity |
| Coloring | node_coloring, edge_coloring, chromatic_number |
| Matching | max_weight_matching, max_cardinality_matching |
| Community | louvain, label_propagation, girvan_newman |
| Components | connected_components, strongly_connected, number_weakly_connected, is_connected, is_tree, is_forest |
| Cliques | find_cliques, max_clique, clique_number, node_clique_number, cliques_containing_node |
| Traversal | dfs, bfs_layers, descendants, ancestors |
| Transitivity | triangles, transitivity, clustering, average_clustering, square_clustering |
| Graph Ops | complement, line_graph, cartesian_product, tensor_product, strong_product, lexicographic_product, graph_power |
| Euler | is_eulerian, eulerian_circuit, semieulerian |
| Planar | is_planar |
| Contraction | contract_nodes, quotient_graph |
| Token Swapper | token_swapper |
| Generators | erdos_renyi, barabasi_albert, complete_graph, path_graph, cycle_graph, star_graph, grid_graph, petersen_graph, watts_strogatz, configuration_model, expected_degree_graph |
| Layout | spring, kamada_kawai, spectral, sfdp, hierarchical, bipartite, circular, shell, random |

```python
from ocg import Graph

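graph = Graph()

# Populate a small graph; bulk_create_nodes returns the new node IDs
source_id, target_id = graph.bulk_create_nodes([
    (["Person"], {"name": "Alice"}),
    (["Person"], {"name": "Bob"}),
])
graph.bulk_create_relationships([
    (source_id, target_id, "KNOWS", {"since": 2020}),
])

scores = graph.pagerank(damping=0.85, max_iter=100)
communities = graph.louvain()
path = graph.dijkstra(source_id, target_id)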
```
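
To make the `damping` and `max_iter` parameters concrete, here is a pure-Python sketch of textbook power-iteration PageRank. It is not OCG's implementation, which may differ in convergence criteria and in how nodes without outgoing edges are handled. The function name `pagerank_sketch` and the `tol` parameter are assumptions made for illustration:

```python
def pagerank_sketch(edges, damping=0.85, max_iter=100, tol=1e-9):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Every other node passes a damped, equal share to each of its targets.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


scores = pagerank_sketch([(0, 1), (1, 2), (2, 0), (3, 0)])
```

A higher `damping` gives more weight to link structure and less to the uniform baseline, while `max_iter` caps the work done when the scores are slow to converge.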

---

## Supported OpenCypher Features

### Clauses
- `MATCH`, `OPTIONAL MATCH`, variable-length paths `[*1..3]`
- `CREATE`, `MERGE`, `SET`, `DELETE`, `DETACH DELETE`, `REMOVE`
- `WITH`, `UNWIND`, `RETURN`, `WHERE`
- `ORDER BY`, `SKIP`, `LIMIT`, `DISTINCT`
- `UNION`, `UNION ALL`

### Expressions
- Property access, list indexing, string slicing
- Arithmetic: `+`, `-`, `*`, `/`, `%`, `^`
- Comparison: `=`, `<>`, `<`, `>`, `<=`, `>=`
- Logical: `AND`, `OR`, `NOT`, `XOR`
- String: `STARTS WITH`, `ENDS WITH`, `CONTAINS`, `=~`
- Null: `IS NULL`, `IS NOT NULL`
- List: `IN`, comprehensions, quantifiers

### Functions (60+)
- **String**: `substring`, `trim`, `toLower`, `toUpper`, `split`, `replace`
- **Math**: `abs`, `ceil`, `floor`, `round`, `sqrt`, `sin`, `cos`, `log`
- **List**: `size`, `head`, `tail`, `range`, `reverse`, `keys`
- **Aggregation**: `count`, `sum`, `avg`, `min`, `max`, `collect`
- **Temporal**: `date`, `datetime`, `localDatetime`, `duration`
- **Predicates**: `exists`, `all`, `any`, `none`, `single`

### Procedures
- `db.labels()`, `db.relationshipTypes()`, `db.propertyKeys()`
- `dbms.components()`

---

## Python API Reference

### Core Graph Operations

```python
from ocg import Graph, NetworKitGraph, RustworkxGraph, GraphrsGraph

# Create graph (all 4 backends expose identical APIs)
graph = Graph()  # or NetworKitGraph(), RustworkxGraph(), GraphrsGraph()

# OpenCypher execution
result = graph.execute("MATCH (n:Person) RETURN n.name")  # -> list[dict]

# Bulk operations (10-57x faster than CREATE statements)
node_ids = graph.bulk_create_nodes([  # -> list[int]
    (["Label"], {"prop": "value"}),
    (["Label"], {"prop": "other"}),
])
graph.bulk_create_relationships([
    (node_ids[0], node_ids[1], "REL_TYPE", {"prop": "value"}),
])

# Serialization
graph.save("graph.json")
loaded = Graph.load("graph.json")  # static method

# Persistence (WAL + snapshots)
graph = Graph.open("/data/dir")  # static method
nodes, edges = graph.checkpoint()  # -> tuple[int, int]

# Metadata
node_count = graph.node_count()  # -> int
edge_count = graph.edge_count()  # -> int
```

### Algorithm Methods

All 175+ algorithms are available as methods on all 4 backend classes:

```python
# Centrality
pagerank = graph.pagerank(damping=0.85, max_iter=100)  # -> dict[int, float]
betweenness = graph.betweenness_centrality()  # -> dict[int, float]
closeness = graph.closeness_centrality()  # -> dict[int, float]

# Community Detection
communities = graph.louvain()  # -> dict[int, int] (node_id -> community_id)
labels = graph.label_propagation()  # -> dict[int, int]

# Pathfinding
path = graph.dijkstra(source_id, target_id)  # -> list[int] (node IDs)
all_paths = graph.all_simple_paths(source_id, target_id)  # -> list[list[int]]

# Components
components = graph.connected_components()  # -> list[set[int]]
is_conn = graph.is_connected()  # -> bool

# See "Graph Algorithms (175+)" section for full list
```

---

## TCK Compliance

**3,897 / 3,897 scenarios passing — 100% (0 skipped, 0 failed)**

Validated against the [openCypher Technology Compatibility Kit](https://github.com/opencypher/openCypher).

---

## Development

### Feature Flags

| Feature | Description | Default |
|---------|-------------|---------|
| `python` | Python bindings via PyO3 | ✓ |
| `persistence` | WAL + Parquet snapshots for crash-safe durability | ✓ |
| `distributed` | Partitioned storage, Apache Arrow Flight RPC, autoscaling | ✗ |
| `self-heal` | Automatic partition recovery from Parquet snapshots | ✗ |
| `metrics` | Prometheus metrics collection | ✗ |

Build with specific features:
```bash
# Standalone library (default)
cargo build --release

# With distributed storage + autoscaling
cargo build --release --features distributed,self-heal,metrics

# Python bindings only (no distributed features)
cargo build --release --features python
```

### Build & Test

```bash
# Build
cargo build --release

# Unit tests
cargo test --no-default-features

# OpenCypher TCK
cargo test --test tck_property_graph --no-default-features

# Python wheel (requires maturin)
maturin develop --features python
```

### Installation Troubleshooting

**Linux (glibc 2.17+ required for binary wheels):**
```bash
# Check glibc version
ldd --version

# If glibc is too old, build from source
pip install ocg --no-binary ocg
```
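
The same check can be scripted from Python, which is convenient in CI or install scripts. This sketch relies on the standard library's `platform.libc_ver()`. The helper names `parse_version` and `glibc_supports_wheels` are hypothetical, and the `(2, 17)` default simply mirrors the requirement stated above:

```python
import platform


def parse_version(text):
    """Convert a dotted version string such as '2.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_supports_wheels(minimum=(2, 17)):
    """Return True or False on glibc systems, or None when glibc is not detected."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None  # musl, macOS, Windows, or detection failed
    return parse_version(version) >= minimum


print(glibc_supports_wheels())
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating `"2.9"` as newer than `"2.17"`.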

**Windows (MSVC runtime required):**

Install [Visual C++ Redistributable](https://aka.ms/vs/17/release/vc_redist.x64.exe) if you encounter DLL import errors.

**macOS (universal2 wheels include both x86_64 and arm64):**
```bash
# Works on both Intel and Apple Silicon
pip install ocg
```

---

## Performance Benchmarks

### Bulk Loader vs OpenCypher CREATE

| Operation | OpenCypher CREATE | Bulk Loader | Speedup |
|-----------|-------------------|-------------|---------|
| 10,000 nodes | 2,340 ms | 215 ms | **10.9x** |
| 50,000 nodes | 11,820 ms | 1,050 ms | **11.3x** |
| 100,000 nodes | 24,100 ms | 2,180 ms | **11.1x** |
| 10,000 edges | 1,890 ms | 145 ms | **13.0x** |
| 50,000 edges | 9,450 ms | 680 ms | **13.9x** |

*Measured on Apple M3 Pro, single-threaded. Speedup varies by graph structure (10–57x observed).*
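
The Speedup column is the ratio of the two timings in each row. The short check below recomputes it from the figures in the table; the variable names are illustrative only:

```python
# (operation, OpenCypher CREATE ms, bulk loader ms) copied from the table above
timings = [
    ("10,000 nodes", 2340, 215),
    ("50,000 nodes", 11820, 1050),
    ("100,000 nodes", 24100, 2180),
    ("10,000 edges", 1890, 145),
    ("50,000 edges", 9450, 680),
]

# Speedup = time with CREATE statements / time with the bulk loader
speedups = {
    name: round(create_ms / bulk_ms, 1)
    for name, create_ms, bulk_ms in timings
}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```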

### Algorithm Performance (1M nodes, 5M edges)

| Algorithm | Time | Throughput |
|-----------|------|------------|
| PageRank (100 iterations) | 3.2s | 312k nodes/s |
| Betweenness Centrality | 8.7s | 115k nodes/s |
| Louvain (community) | 1.9s | 526k nodes/s |
| Dijkstra (single-source) | 12ms | 83M nodes/s |
| Connected Components | 420ms | 2.4M nodes/s |

*Single-threaded on Apple M3 Pro. Multi-threaded algorithms can achieve 2-4x speedup on multi-core systems.*

---

## Changelog

See [CHANGELOG.md](https://github.com/ai-of-mine/ocg/blob/main/CHANGELOG.md) for detailed release history.

**Recent highlights:**
- **v0.4.5** — Python 3.11–3.14 support, manylinux wheels for all platforms
- **v0.4.4** — Index support, UNIQUE constraints, TLS on Arrow Flight
- **v0.4.3** — Distributed benchmark tests, WAL replay improvements
- **v0.4.0** — Persistence features (WAL + Parquet snapshots)

---

## Credits

- **[petgraph](https://github.com/petgraph/petgraph)** — core graph data structures (MIT/Apache-2.0)
- **[rustworkx-core](https://github.com/Qiskit/rustworkx)** — graph algorithms (Apache-2.0)
- **[graphrs](https://github.com/malcolmvr/graphrs)** — community detection (MIT)
- **[openCypher TCK](https://github.com/opencypher/openCypher)** — Technology Compatibility Kit (Apache-2.0)
- Algorithm designs inspired by [NetworKit](https://networkit.github.io/) (MIT)

Algorithm implementations (PageRank, Betweenness Centrality, Dijkstra, etc.) are based on published academic work. See the NOTICE file for complete citations.

---

## License

Apache-2.0 — see [LICENSE](LICENSE) and [NOTICE](NOTICE) files.

OpenCypher® and Cypher® are registered trademarks of Neo4j, Inc. This project implements the open [OpenCypher specification](https://opencypher.org) and is not affiliated with Neo4j.

---

## Contributing

Issues and proposals may be submitted via GitHub. Contributions are evaluated on a controlled schedule — pull requests are reviewed at the maintainer's discretion and timeline.

