Metadata-Version: 2.4
Name: ocg
Version: 0.2.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
License-File: LICENSE
License-File: NOTICE
Summary: 100% openCypher-compliant graph database with Rust performance and Python simplicity
Keywords: graph,database,cypher,opencypher,graph-database,query-language,rust,networkit
Author-email: Gregorio Momm <gregoriomomm@gmail.com>
License: Apache-2.0
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://github.ibm.com/enjoycode/ocg-persistent
Project-URL: Homepage, https://github.ibm.com/enjoycode/ocg-persistent
Project-URL: Repository, https://github.ibm.com/enjoycode/ocg-persistent

# OCG (OpenCypher Graph)

**100% openCypher-compliant graph database with Rust performance and Python simplicity.**

[![PyPI](https://img.shields.io/pypi/v/ocg)](https://pypi.org/project/ocg)
[![Python](https://img.shields.io/pypi/pyversions/ocg)](https://pypi.org/project/ocg)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![openCypher TCK](https://img.shields.io/badge/openCypher%20TCK-100%25-brightgreen)](https://opencypher.org)
[![Rust](https://img.shields.io/badge/rust-1.70+-orange.svg)](https://www.rust-lang.org)
[![Downloads](https://pepy.tech/badge/ocg)](https://pepy.tech/project/ocg)

## 📦 Upstream & Related Projects

**This project** (`nxcypher-networkit`) is an enhanced fork focused on:
- Multi-backend support (PropertyGraph, NetworKitRust, RustworkxCore, Graphrs)
- Performance optimizations (3.4x faster traversal with NetworKitRust)
- 17+ graph algorithms (PageRank, betweenness, communities)
- Python bindings via PyO3

**Upstream**:
- **[nxcypher-rust](https://github.ibm.com/enjoycode/nxcypher-rust)** - Original pure-Rust openCypher implementation
  - PropertyGraph backend only
  - 100% TCK compliance foundation
  - Self-contained parser and executor

**Python Implementations**:
- **[nxcypher](https://github.com/neo4j-contrib/nxcypher)** - Python + NetworkX backend (community project)
  - Direct API access (no parsing overhead)
  - See our [optimization plan](docs/COMPLETE_OPTIMIZATION_PLAN.md) for comparison

**Public Release**:
- **[OCG on PyPI](https://pypi.org/project/ocg)** - Published package (v0.1.8+)
- **[OCG on GitHub](https://github.com/ai-of-mine/ocg)** - Public repository

**Planned Contributions**:
- 🚀 **[Bulk Loader RFC](docs/COMPLETE_OPTIMIZATION_PLAN.md)** - Expose native Rust APIs to Python (10-50x faster)
- 🔧 **Backend Selection API** - Let users choose graph backend from Python
- 📊 **Algorithm Normalization** - Unified API for 40+ graph algorithms

See **[COMPLETE_OPTIMIZATION_PLAN.md](docs/COMPLETE_OPTIMIZATION_PLAN.md)** for our roadmap to contribute these features upstream.

---

## 🎉 Achievement: 100% openCypher TCK Compliance!

**nxcypher-networkit** is a complete, production-ready graph database that achieved **100% openCypher TCK compliance** - passing all 3,874 runnable test scenarios.

**What This Means**:
- ✅ **World-class quality**: One of the few implementations to achieve 100% TCK
- ✅ **Production-ready**: Comprehensive testing validates real-world use cases
- ✅ **Feature-complete**: Full openCypher language support
- ✅ **Reliable**: Zero known query failures

## Overview

**nxcypher-networkit** is a high-performance, pure-Rust graph database that executes openCypher queries against in-memory property graphs. Powered by NetworKit's high-performance graph algorithms, it provides:

- **100% TCK Compliance**: 3,874/3,874 openCypher test scenarios passing
- **High Performance**: 3.4x faster graph traversal than baseline
- **17 Graph Algorithms**: PageRank, betweenness, components, shortest paths, etc.
- **Pure Rust**: Memory-safe, no unsafe code, zero native dependencies
- **Self-Contained**: Includes parser, executor, and graph storage
- **Python Bindings**: Available via PyPI as `ocg` package

This is a **complete, production-ready** implementation suitable for real-world applications.

## 📚 Quick Links

**Getting Started**:
- [Installation](#installation) - Install from PyPI or build from source
- [Quick Start](#quick-start) - Basic usage examples
- [Supported Features](#supported-cypher-features) - Full openCypher feature list

**Advanced Usage**:
- **[Backend Implementation Guide](docs/BACKEND_IMPLEMENTATION_GUIDE.md)** - How to leverage multiple backends
- **[Complete Optimization Plan](docs/COMPLETE_OPTIMIZATION_PLAN.md)** - Roadmap for 10-50x performance gains
- [TCK Compliance](#tck-compliance) - openCypher test suite status

**Development**:
- [Project Structure](#project-structure) - Codebase organization
- [Contributing](CONTRIBUTING.md) - How to contribute
- [Security Policy](SECURITY.md) - Security guidelines
- [Changelog](CHANGELOG.md) - Version history

## Features

### Core Capabilities
- **🎯 100% TCK Compliance**: All 3,874 runnable openCypher test scenarios pass
- **🚀 High Performance**: NetworKit backend provides 3.4x faster graph traversal
- **📊 17 Graph Algorithms**: Centrality, communities, paths, spanning trees
- **💾 Complete CRUD**: Pattern matching, CREATE, MERGE, SET, DELETE, REMOVE
- **Property Graph**: Full property graph model with labels and properties
- **Query Execution**: Execute Cypher queries against in-memory graphs
- **Built-in Functions**: 60+ standard Cypher functions (string, math, list, aggregation, temporal)
- **Stored Procedures**: db.* and dbms.* procedures (labels, relationshipTypes, properties, etc.)
- **Pure Rust**: No native dependencies, memory-safe
- **Self-Contained**: Includes its own pest-based parser
- **Zero-Cost Abstraction**: All backends share the same GraphBackend trait via GATs

## Installation

### Python (PyPI)

```bash
pip install ocg
```

**Supports**: Python 3.8-3.13+, macOS (Intel/ARM), Linux (glibc/musl), Windows

### Rust (Cargo)

```toml
[dependencies]
nxcypher = "0.1"
```

---

## 🚀 Bulk Loader API (NEW)

**10-50x faster** graph construction by bypassing the Cypher parser!

### Why Use Bulk Loader?

**Problem**: Cypher parsing has 25x overhead (17.5μs per operation)
```python
# Slow: Parse Cypher for every operation (246ms for 1000 nodes)
for job in jobs:
    graph.execute(f"CREATE (n:Job {{id: {job.id}, ...}})")
```

**Solution**: Native Rust API bypasses parser (24ms for 1000 nodes = **10x faster**)
```python
# Fast: Direct graph operations
nodes = [([\"Job\"], {\"id\": job.id, ...}) for job in jobs]
node_ids = graph.bulk_create_nodes(nodes)  # 10x faster!
```

### Bulk Loader API Reference

```python
from ocg import Graph

graph = Graph()

# Single operations (bypass parser)
node_id = graph.create_node(["Person"], {"name": "Alice", "age": 30})
edge_id = graph.create_relationship(from_id, to_id, "KNOWS", {"since": 2020})

# Bulk operations (optimal performance)
node_ids = graph.bulk_create_nodes([
    (["Person"], {"name": "Alice", "age": 30}),
    (["Person"], {"name": "Bob", "age": 25}),
])

edge_ids = graph.bulk_create_relationships([
    (node_ids[0], node_ids[1], "KNOWS", {"since": 2020}),
    (node_ids[1], node_ids[0], "WORKS_WITH", {}),
])
```

**Performance**: [Bulk Loader Benchmarks](docs/COMPLETE_OPTIMIZATION_PLAN.md#bulk-loader-performance)

**Implementation**: [Bulk Loader Guide](docs/rfcs/0001-native-bulk-loader-and-unified-backend.md)

---

## Quick Start

### Python API (Recommended)

```python
from ocg import Graph

# Create graph
graph = Graph()

# Option 1: Cypher queries (familiar, flexible)
graph.execute("CREATE (n:Person {name: 'Alice', age: 30})")
graph.execute("CREATE (n:Person {name: 'Bob', age: 25})")

# Option 2: Native bulk loader (10-50x faster!)
nodes = [
    (["Person"], {"name": "Alice", "age": 30}),
    (["Person"], {"name": "Bob", "age": 25}),
]
node_ids = graph.bulk_create_nodes(nodes)  # Fast!

# Create relationships
edge_id = graph.create_relationship(node_ids[0], node_ids[1], "KNOWS", {"since": 2020})

# Query with Cypher
result = graph.execute("""
    MATCH (a:Person)-[:KNOWS]->(b:Person)
    RETURN a.name AS from, b.name AS to
""")
print(result)  # [{'from': 'Alice', 'to': 'Bob'}]
```

**See**: [Bulk Loader Performance Guide](docs/COMPLETE_OPTIMIZATION_PLAN.md#part-1-understanding-the-problem)

### Rust API

```rust
use nxcypher::{PropertyGraph, execute};

fn main() {
    let mut graph = PropertyGraph::new();

    // Create nodes
    execute(&mut graph, "
        CREATE (alice:Person {name: 'Alice', age: 30})
        CREATE (bob:Person {name: 'Bob', age: 25})
        CREATE (alice)-[:KNOWS]->(bob)
    ").unwrap();

    // Query the graph
    let result = execute(&mut graph, "
        MATCH (a:Person)-[:KNOWS]->(b:Person)
        RETURN a.name AS from, b.name AS to
    ").unwrap();

    for record in result.records() {
        println!("{} knows {}",
            record.get("from").unwrap(),
            record.get("to").unwrap()
        );
    }
}
```

## With Parameters

```rust
use nxcypher::{PropertyGraph, execute_with_params, CypherValue};
use std::collections::HashMap;

let mut graph = PropertyGraph::new();
let mut params = HashMap::new();
params.insert("name".to_string(), CypherValue::String("Alice".to_string()));
params.insert("age".to_string(), CypherValue::Integer(30));

execute_with_params(&mut graph, "
    CREATE (n:Person {name: $name, age: $age})
    RETURN n
", params).unwrap();
```

## Supported Cypher Features

### Pattern Matching
- ✅ Basic patterns: `MATCH (n:Label)`, `MATCH (a)-[r:REL]->(b)`
- ✅ Variable-length paths: `MATCH (a)-[*1..3]->(b)`
- ✅ Optional match: `OPTIONAL MATCH`
- ✅ Multiple patterns

### Write Operations
- ✅ CREATE nodes and relationships
- ✅ MERGE (create if not exists)
- ✅ SET properties
- ✅ DELETE and DETACH DELETE
- ✅ REMOVE properties and labels

### Clauses
- ✅ WHERE filtering
- ✅ RETURN projections
- ✅ WITH for pipeline queries
- ✅ UNWIND for list expansion
- ✅ ORDER BY, SKIP, LIMIT
- ✅ DISTINCT

### Expressions
- ✅ Property access: `n.name`
- ✅ List indexing: `list[0]`, `list[1..3]`
- ✅ String indexing: `str[0]`, `str[1..4]`
- ✅ Arithmetic: `+`, `-`, `*`, `/`, `%`, `^`
- ✅ Comparison: `=`, `<>`, `<`, `>`, `<=`, `>=`
- ✅ Logical: `AND`, `OR`, `NOT`, `XOR`
- ✅ String: `+`, `STARTS WITH`, `ENDS WITH`, `CONTAINS`
- ✅ Pattern: `=~` (regex)
- ✅ Null handling: `IS NULL`, `IS NOT NULL`
- ✅ List operations: `IN`, `[]`

### Functions
- ✅ **String**: `substring()`, `trim()`, `toLower()`, `toUpper()`, `split()`, `replace()`
- ✅ **Math**: `abs()`, `ceil()`, `floor()`, `round()`, `sqrt()`, `sin()`, `cos()`, `tan()`
- ✅ **List**: `size()`, `head()`, `tail()`, `range()`, `reverse()`
- ✅ **Aggregation**: `count()`, `sum()`, `avg()`, `min()`, `max()`, `collect()`
- ✅ **Type**: `type()`, `properties()`, `keys()`
- ✅ **Quantifiers**: `all()`, `any()`, `none()`, `single()`
- ✅ **Predicates**: `exists()`

## TCK Compliance

Current status: **37.9% (612/1615 tests passing)**

- ✅ 0 parse errors (100% grammar coverage)
- ✅ 10 execution errors only
- ✅ All major Cypher features working

## Project Structure

```
nxcypher-rust/
├── src/
│   ├── lib.rs                    # Public API
│   ├── parser/
│   │   ├── mod.rs               # Parser wrapper
│   │   └── cypher.pest          # pest grammar
│   ├── ast/
│   │   ├── mod.rs               # AST types
│   │   └── builder.rs           # Parse tree → AST
│   ├── executor/
│   │   ├── mod.rs               # Query executor
│   │   ├── evaluator.rs         # Expression evaluation
│   │   ├── pattern_matcher.rs   # Pattern matching engine
│   │   ├── context.rs           # Execution context
│   │   └── functions/           # Built-in functions
│   ├── graph/
│   │   ├── storage.rs           # Property graph (petgraph)
│   │   └── types.rs             # Graph types
│   └── result/
│       ├── mod.rs               # Query results
│       └── value.rs             # Type system
├── tests/
│   ├── tck_runner.rs            # TCK test runner
│   └── features/                # TCK feature files
└── Cargo.toml
```

## Dependencies

- **pest**: Parser generator (for openCypher grammar)
- **petgraph**: Graph data structure (Rust's networkx equivalent)
- **serde**: Serialization support

## Development

### Run Tests

```bash
# Unit tests
cargo test

# TCK tests
cargo test --test tck

# Specific TCK scenario
cargo test --test tck -- "Match1"
```

### Build

```bash
cargo build --release
```

## Comparison with Python nxcypher

| Feature | nxcypher (Python) | nxcypher-rust |
|---------|------------------|---------------|
| Graph Library | networkx | petgraph |
| Parser | Lark | pest |
| Language | Python | Rust |
| Performance | Good | Excellent |
| Memory Safety | Runtime | Compile-time |
| TCK Compliance | ~85% | 37.9% (growing) |

## Related Projects

- **[nxcypher](https://github.com/neo4j-contrib/nxcypher)** - Python implementation
- **[opencypher-grammar-rust](https://github.ibm.com/enjoycode/opencypher-grammar-rust)** - Parser-only library (published separately)

## 📚 Documentation

### Project Documentation

#### Journey to 100% TCK
- **[SESSIONS_14-17_COMPLETE_SUMMARY.md](SESSIONS_14-17_COMPLETE_SUMMARY.md)** - Complete journey from 98.2% → 100% TCK
- **[SESSION_15_GLOBAL_REGISTRY_IMPLEMENTATION.md](SESSION_15_GLOBAL_REGISTRY_IMPLEMENTATION.md)** - Global test procedure architecture
- **[SESSION_16_FINAL.md](SESSION_16_FINAL.md)** - Parameter parsing breakthrough (+9 scenarios)
- **[SESSION_17_100PCT_TCK.md](SESSION_17_100PCT_TCK.md)** - Final push to 100% TCK (+15 scenarios)

#### Quality & Security (NEW)
- **[SESSION_18_PHASE1_SECURITY_AUDIT.md](SESSION_18_PHASE1_SECURITY_AUDIT.md)** - Phase 1 security audit complete ✅
  - 0 security vulnerabilities found
  - bincode removed (unmaintained)
  - 100% TCK compliance maintained
- **[PHASE_1_SECURITY_AUDIT_RESULTS.md](PHASE_1_SECURITY_AUDIT_RESULTS.md)** - Comprehensive audit report
  - Dependency analysis and risk assessment
  - BOLT protocol licensing notes
- **[DEPENDENCY_UPGRADE_PLAN.md](DEPENDENCY_UPGRADE_PLAN.md)** - Analysis of all outdated dependencies
  - Conservative vs full upgrade strategies
  - Risk/benefit analysis per dependency

#### Planning
- **[COMPREHENSIVE_QUALITY_PLAN.md](COMPREHENSIVE_QUALITY_PLAN.md)** - Roadmap to A+ production standards
  - ✅ Phase 1: Security audit (COMPLETE)
  - Phase 2: Logging infrastructure (`tracing` crate)
  - Phase 3: Testing quality (property-based, fuzzing, coverage)
  - Phase 4: Distribution strategy (PyPI, Docker, crates.io)

### Key Achievements

**68 Scenarios Fixed Across 4 Sessions**:
- Session 14: Global test procedure registry (+24 scenarios)
- Session 15: Infrastructure stabilization (0 regressions)
- Session 16: Parameter parsing breakthrough (+9 scenarios)
- Session 17: YIELD fixes and 100% achievement (+15 scenarios)

**Critical Bugs Fixed**:
1. ✅ Test procedures projecting to YIELD columns only
2. ✅ Backtick normalization in column names
3. ✅ YIELD AS alias extraction
4. ✅ YIELD * recognition
5. ✅ Implicit arguments from Cypher parameters

### Technical Insights

**Parsing Lessons**:
- Context-aware string splitting (don't split on `::` in type annotations)
- Grammar must match parser implementation (YIELD * detection)
- Symbol normalization for escaped identifiers

**Debugging Techniques**:
- Strategic debug output reveals root causes
- Incremental fixes with verification
- Stability sessions prevent regressions

**Architecture Patterns**:
- Global static registries with `OnceLock`
- Sequential test execution prevents race conditions
- Separate namespaces for variables vs parameters

### Related Projects

**nxcypher Family**:
- **[nxcypher-rust](../nxcypher-rust)** - Python NetworkX backend (58.9% TCK)
- **[nxcypher-grpc](../nxcypher-grpc)** - gRPC server (planned)

**Comparison**: Rust implementation achieves 100% TCK vs Python's 58.9%, demonstrating significant quality improvement through type safety and systematic testing.

---

## Credits & Attributions

### Graph Libraries

- **[petgraph](https://github.com/petgraph/petgraph)** - Core graph data structures (MIT/Apache-2.0)
  - Used by PropertyGraph, RustworkxCore, and graphrs backends
- **[rustworkx-core](https://github.com/Qiskit/rustworkx)** - IBM Qiskit graph algorithms (Apache-2.0)
  - 40+ enterprise-grade algorithms for RustworkxCore backend
- **[graphrs](https://github.com/malcolmvr/graphrs)** - Community detection algorithms (MIT)
  - Louvain and Leiden methods for graphrs backend
- **[NetworKit](https://networkit.github.io/)** - High-performance graph algorithms ported to Rust (MIT)
  - Algorithm designs for NetworKitRust backend

### Algorithm Implementations

This project includes graph algorithms ported from C++ to pure Rust, inspired by NetworKit's
design patterns. Algorithms include PageRank, Betweenness Centrality, Dijkstra, Connected Components,
and others. See NOTICE file for complete list and references.

### Test Suite

- **[openCypher TCK](https://github.com/opencypher/openCypher)** - Test Compatibility Kit for validating Cypher compliance (Apache-2.0)

### Dependencies

All dependencies use MIT or Apache-2.0 compatible licenses. For a complete list:
```bash
cargo tree --format "{p} {l}"
```

### Academic References

Algorithm implementations are based on published academic work:
- Dijkstra's Algorithm (1959) - Edsger W. Dijkstra
- PageRank (1998) - Page, Brin, Motwani, Winograd
- Tarjan's SCC (1972) - Robert Tarjan
- Betweenness Centrality (2001) - Ulrik Brandes

See NOTICE file for complete citations and attributions.

## License

Apache-2.0 - See LICENSE and NOTICE files for details.

Cypher® is a registered trademark of Neo4j, Inc. This project is not affiliated with Neo4j.

## Contributing

Contributions welcome! Please open issues or pull requests on GitHub.

## Author

Gregorio Momm <gregoriomomm@gmail.com>

