Metadata-Version: 2.4
Name: vectors-db
Version: 0.1.0
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Summary: In-memory vector database with HNSW, BM25, and hybrid search
Keywords: vector-database,hnsw,bm25,hybrid-search,embeddings
Author-email: Nullcline Labs <info@nullcline.com>
License: AGPL-3.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/nullcline-labs/vectors.db
Project-URL: Issues, https://github.com/nullcline-labs/vectors.db/issues
Project-URL: Repository, https://github.com/nullcline-labs/vectors.db

<p align="center">
  <img src="assets/icon.png" width="128" height="128" alt="vectors.db icon">
</p>

# vectors.db

A lightweight, in-memory vector database with HNSW indexing, BM25 full-text search, and hybrid retrieval.

[![CI](https://github.com/nullcline-labs/vectors.db/actions/workflows/ci.yml/badge.svg)](https://github.com/nullcline-labs/vectors.db/actions/workflows/ci.yml)

## Features

- **HNSW vector search** with configurable M, ef_construction, ef_search
- **BM25 keyword search** with Okapi BM25 scoring
- **Hybrid search** combining vector + keyword via RRF or linear fusion
- **Scalar quantization** (f32 -> u8) for memory-efficient SIMD-friendly storage, with optional raw f32 vectors for maximum recall
- **Write-Ahead Log (WAL)** with CRC32 checksums and fsync for crash recovery
- **Encryption at rest** — AES-256-GCM for snapshots and WAL, with key file or env var
- **Auto-compaction** — automatic index rebuild when deleted nodes exceed a configurable threshold (default 20%)
- **WAL streaming replication** — active-passive HA with automatic snapshot sync and real-time WAL streaming
- **Structured audit logging** — WHO/WHAT/WHEN for all mutations, filterable via `RUST_LOG=audit=info`
- **RBAC with collection-scoped multi-tenancy** — Read/Write/Admin roles with optional per-key collection restrictions
- **Prometheus metrics** at `/metrics` with prebuilt **Grafana dashboard**
- **Request timeout** (30s) and **rate limiting** (100 req/s)
- **Batch insert** up to 1000 documents per request
- **Multiple distance metrics**: Cosine, Euclidean, Dot Product

## Quick Start

### Python library

```bash
pip install maturin
cd crates/python && maturin develop --release
```

```python
import vectorsdb

# Persistent database with WAL crash recovery
db = vectorsdb.VectorDB(data_dir="./data")
# or: db = vectorsdb.VectorDB()  # ephemeral (in-memory only)

db.create_collection("docs", dimension=3)

db.insert("docs", "hello world", [1.0, 0.0, 0.0], metadata={"tag": "greeting"})
db.insert("docs", "goodbye moon", [0.0, 1.0, 0.0])

# Vector search
results = db.search("docs", query_embedding=[1.0, 0.0, 0.0], k=5)
print(results[0].text)      # "hello world"
print(results[0].score)     # 1.0
print(results[0].metadata)  # {"tag": "greeting"}

# Keyword search
results = db.search("docs", query_text="moon", k=5)

# Hybrid search (vector + keyword)
results = db.search("docs", query_embedding=[1.0, 0.0, 0.0], query_text="hello", k=5)

# Filtered search
results = db.search("docs", query_embedding=[1.0, 0.0, 0.0], k=5, filter={
    "must": [{"field": "tag", "op": "eq", "value": "greeting"}]
})

# Snapshot + WAL truncation
db.save("docs")       # saves to ./data/docs.vdb, truncates WAL
db.compact()          # saves ALL collections + truncates WAL
```

Full IDE autocomplete and type hints are included via PEP 561 stubs. See [`examples/`](examples/) for more usage patterns.

### REST server

#### From source

```bash
cargo run --release -- --port 3030 --data-dir ./data
```

#### Docker

```bash
docker build -t vectors-db .
docker run -p 3030:3030 -v vectors-data:/data vectors-db
```

#### Docker Compose (primary + standby)

```bash
docker compose up -d                           # primary + standby
docker compose --profile monitoring up -d      # + Prometheus + Grafana
```

Grafana dashboard at [http://localhost:3000](http://localhost:3000), Prometheus at [http://localhost:9090](http://localhost:9090).

#### First steps

```bash
# Create a collection
curl -X POST http://localhost:3030/collections \
  -H "Content-Type: application/json" \
  -d '{"name": "my_collection", "dimension": 3}'

# Insert a document
curl -X POST http://localhost:3030/collections/my_collection/documents \
  -H "Content-Type: application/json" \
  -d '{"text": "hello world", "embedding": [0.1, 0.2, 0.3]}'

# Search
curl -X POST http://localhost:3030/collections/my_collection/search \
  -H "Content-Type: application/json" \
  -d '{"query_embedding": [0.1, 0.2, 0.3], "k": 5}'
```

## API Reference

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check (no auth required) |
| GET | `/metrics` | Prometheus metrics (no auth required) |
| POST | `/collections` | Create a collection |
| GET | `/collections` | List all collections |
| DELETE | `/collections/:name` | Delete a collection |
| POST | `/collections/:name/documents` | Insert a document |
| POST | `/collections/:name/documents/batch` | Batch insert (max 1000) |
| GET | `/collections/:name/documents/:id` | Get document by ID |
| DELETE | `/collections/:name/documents/:id` | Delete document |
| POST | `/collections/:name/search` | Vector, keyword, or hybrid search |
| POST | `/collections/:name/save` | Save collection snapshot to disk |
| POST | `/collections/:name/load` | Load collection snapshot from disk |
| POST | `/admin/compact` | Save all collections and truncate WAL |
| POST | `/admin/promote` | Promote standby to primary (standby only) |

### Create Collection

```json
POST /collections
{
  "name": "my_collection",
  "dimension": 768,
  "m": 16,
  "ef_construction": 200,
  "ef_search": 50,
  "distance_metric": "cosine",
  "store_raw_vectors": false
}
```

`store_raw_vectors` (default `false`): when `true`, stores raw f32 vectors alongside quantized u8 for exact distance reranking (+0.75% recall, 5x RAM). See [Benchmarks](#benchmarks) for detailed comparison.

### Insert Document

```json
POST /collections/:name/documents
{
  "text": "document content",
  "embedding": [0.1, 0.2, ...],
  "metadata": {"key": "value"}
}
```

### Search

```json
POST /collections/:name/search
{
  "query_text": "search query",
  "query_embedding": [0.1, 0.2, ...],
  "k": 10,
  "min_similarity": 0.5,
  "alpha": 0.7,
  "fusion_method": "rrf"
}
```

## Configuration

### Environment Variables

| Variable | Description |
|----------|-------------|
| `VECTORS_DB_API_KEY` | Single bearer token for API authentication. If unset, auth is disabled. |
| `VECTORS_DB_API_KEYS` | JSON array for RBAC with optional collection scoping. See below. |
| `VECTORS_DB_ENCRYPTION_KEY` | 64-character hex string (32 bytes) for AES-256-GCM encryption at rest. |
| `RUST_LOG` | Log level filter (e.g., `vectorsdb_server=info`, `audit=info` for audit events) |

### Multi-Tenant RBAC

Use `VECTORS_DB_API_KEYS` to assign roles and restrict keys to specific collections:

```bash
VECTORS_DB_API_KEYS='[
  {"key": "admin-key", "role": "admin"},
  {"key": "tenant-a", "role": "write", "collections": ["tenantA_*"]},
  {"key": "reader", "role": "read", "collections": ["public", "shared_*"]}
]'
```

- **Roles**: `read` < `write` < `admin` (each inherits lower permissions)
- **`collections`** (optional): restricts the key to collections matching the listed patterns. Supports exact names (`"public"`) and prefix globs (`"tenantA_*"`). Omit for unrestricted access.
- **Admin bypass**: Admin keys always have full access regardless of `collections`.
- `GET /collections` is filtered to only show accessible collections for scoped keys.

### CLI Arguments

| Argument | Default | Description |
|----------|---------|-------------|
| `--port`, `-p` | 3030 | Port to listen on |
| `--data-dir`, `-d` | `./data` | Directory for WAL and snapshots |
| `--snapshot-interval` | 300 | Auto-snapshot interval in seconds (0 = disabled) |
| `--auto-compact-ratio` | 0.2 | Rebuild indices when >N% of nodes are deleted (0.0 = disabled) |
| `--encryption-key-file` | — | Path to encryption key file (32 raw bytes or 64-char hex). Overrides env var. |
| `--wal-strict` | false | Fail startup if WAL replay encounters errors |
| `--replication-port` | 0 (disabled) | TCP port for replication listener (primary mode) |
| `--standby-of` | — | `host:port` of primary to replicate from (standby mode) |
| `--replication-buffer` | 1024 | Broadcast channel capacity for WAL streaming |

## Architecture

```
                        ┌─────────────────────────────────┐
                        │          HTTP API (Axum)         │
                        │  auth · metrics · rate-limit     │
                        └──────────────┬──────────────────┘
                                       │
                        ┌──────────────▼──────────────────┐
                        │           Database               │
                        │   HashMap<String, Collection>    │
                        └──────────────┬──────────────────┘
                                       │
              ┌────────────────────────┼────────────────────────┐
              │                        │                        │
   ┌──────────▼─────────┐  ┌──────────▼─────────┐  ┌──────────▼─────────┐
   │    HNSW Index       │  │   BM25 Index       │  │   Hybrid Search    │
   │  scalar quantized   │  │  inverted index    │  │   RRF / linear     │
   └──────────┬──────────┘  └────────────────────┘  └────────────────────┘
              │
   ┌──────────▼──────────┐
   │    Distance Metrics  │
   │  cos · l2 · dot     │
   └─────────────────────┘

   Persistence:  WAL (append + CRC32 + fsync) → Snapshot (.vdb bincode)
   Replication:  WAL streaming (TCP) → Primary → Standby
```

## Benchmarks

Results on Apple Silicon (M-series), single-threaded, SIFT-128 (1M vectors, 128d, Euclidean):

#### Compact mode (`store_raw_vectors=false`, default)

| ef_search | Recall@10 | QPS | Memory |
|-----------|-----------|-----|--------|
| 10 | 0.7695 | 22,566 | 122 MB |
| 40 | 0.9450 | 9,152 | |
| 120 | 0.9853 | 3,661 | |
| 200 | 0.9898 | 2,325 | |
| 400 | 0.9916 | 1,275 | |

Build: 1,852 inserts/s

#### Exact mode (`store_raw_vectors=true`)

| ef_search | Recall@10 | QPS | Memory |
|-----------|-----------|-----|--------|
| 10 | 0.7716 | 22,940 | 610 MB |
| 40 | 0.9494 | 8,759 | |
| 120 | 0.9924 | 3,674 | |
| 200 | 0.9972 | 2,370 | |
| 400 | 0.9990 | 1,277 | |

Build: 1,912 inserts/s

Compact mode uses **5x less memory** with only ~0.7% recall loss. Exact mode matches hnsw(nmslib) at 0.9990 recall.

#### High-dimensional (768d, 1536d)

Synthetic data at LLM embedding dimensions. Compact vs exact at ef_search=400:

| Dimension | Compact Recall | Exact Recall | Compact QPS | Exact QPS |
|-----------|---------------|--------------|-------------|-----------|
| 768d (100K) | 0.9860 | 0.9993 | 1,209 | 1,311 |
| 1536d (25K) | 0.9880 | 1.0000 | 1,757 | 1,341 |

At high dimensions, exact mode is recommended for maximum recall (+1.3% at 768d). Build speed is comparable between modes thanks to cached dequantization during construction.

#### Filtered search & concurrency

| Benchmark | Result |
|-----------|--------|
| Filtered 50% selectivity | 0.9913 recall, 1,282 QPS |
| Filtered 1% selectivity | 0.9953 recall, 46 QPS |
| 8-thread concurrent | 10,878 QPS (5.0x scaling) |

Run benchmarks:

```bash
cargo bench
```

## License

AGPL-3.0

