Metadata-Version: 2.4
Name: retrievalx
Version: 0.1.3
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Indexing
Requires-Dist: rank-bm25>=0.2.2 ; extra == 'bench'
Requires-Dist: numpy>=1.24 ; extra == 'bench'
Requires-Dist: pytest>=8.0 ; extra == 'dev'
Requires-Dist: maturin>=1.6,<2.0 ; extra == 'dev'
Requires-Dist: ruff>=0.6.0 ; extra == 'dev'
Requires-Dist: mypy>=1.11 ; extra == 'dev'
Provides-Extra: bench
Provides-Extra: dev
License-File: LICENSE
License-File: NOTICE
Summary: The complete BM25 engine for Python: production-scale, Rust-native
Keywords: bm25,information-retrieval,search,rag,ranking
Author: retrievalx maintainers
License: Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Changelog, https://github.com/Saswatsusmoy/retrievalx/releases
Project-URL: Documentation, https://github.com/Saswatsusmoy/retrievalx/tree/main/docs
Project-URL: Homepage, https://github.com/Saswatsusmoy/retrievalx
Project-URL: Issues, https://github.com/Saswatsusmoy/retrievalx/issues
Project-URL: Repository, https://github.com/Saswatsusmoy/retrievalx

# retrievalx

The complete BM25 engine for Python: all major BM25 variants, production-scale internals, Rust-native performance.

## Highlights

- Rust core + PyO3 bindings, distributed as wheels.
- BM25 variants: Okapi BM25, BM25+, BM25L, BM25-Adpt, BM25F, BM25T, ATIRE BM25, and TF-IDF.
- Retrieval strategies: exhaustive DAAT/TAAT (document-at-a-time and term-at-a-time), WAND, Block-Max WAND, MaxScore.
- Incremental updates with tombstones and explicit compaction.
- Persistence with binary snapshots, metadata sidecar, WAL replay, and mmap loading.
- Fusion utilities: reciprocal rank fusion (RRF), linear combination, score normalization.
- Built-in BEIR benchmark, recall degradation report, and scoring variant comparison runner.
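
To make the fusion bullet concrete, here is a minimal pure-Python sketch of reciprocal rank fusion. It does not use the retrievalx fusion API; the function name `reciprocal_rank_fusion`, the constant `k=60` (a commonly used default), and the sample rankings are assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse best-first lists of doc IDs into one ranking (illustrative helper)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1; each list contributes 1 / (k + rank) per document.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from a lexical (BM25) and a dense retriever.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d2", "d3", "d1"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])
```

Because RRF only looks at ranks, it needs no score normalization, which is why it is a common first choice when combining BM25 with a dense retriever.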

## Quickstart

```python
from retrievalx import BM25Index

index = BM25Index.from_documents([
    "rust and python",
    "information retrieval with bm25",
])

print(index.search("rust retrieval", top_k=5))
```
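
For intuition about what a BM25 score measures, the following is a rough reference sketch of Okapi BM25 in pure Python, applied to the same two documents. It is not the retrievalx implementation: whitespace tokenization, the `+1` smoothed IDF, and the defaults `k1=1.2` and `b=0.75` are assumptions, and the library's tokenizer and parameters may differ.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25 (illustrative)."""
    tokenized = [doc.split() for doc in documents]  # assumed tokenizer
    n_docs = len(tokenized)
    avgdl = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.split():
            if term not in tf:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["rust and python", "information retrieval with bm25"]
scores = bm25_scores("rust retrieval", docs)
print(scores)
```

Each document matches exactly one query term here, so the shorter document ends up with the higher score purely through length normalization.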

## Real-world Examples

- `examples/it_ticket_search.py`
- `examples/legal_clause_discovery.py`
- `examples/ecommerce_query_tuning.py`
- `examples/security_log_hunt.py`
- `examples/wal_crash_recovery.py`
- `examples/multilingual_news_monitor.py`
- `examples/production_hybrid_reranking.py`
- `examples/benchmark_retrievalx_vs_rank_bm25.py`

## Build

```bash
./scripts/check_all.sh
```

## Project Policies

- Contributing guide: [CONTRIBUTING.md](CONTRIBUTING.md)
- Security policy: [SECURITY.md](SECURITY.md)
- Support policy: [SUPPORT.md](SUPPORT.md)
- Governance model: [GOVERNANCE.md](GOVERNANCE.md)
- Release process: [RELEASING.md](RELEASING.md)

## License

Apache-2.0

