Metadata-Version: 2.4
Name: getgrip
Version: 0.3.3
Summary: A retrieval engine that learns your vocabulary, remembers what works, and knows when it doesn't have an answer.
Author: Grip Hub
License: Proprietary
Project-URL: Homepage, https://getgrip.dev
Project-URL: Documentation, https://github.com/Grip-Hub/getgrip.dev/blob/main/GUIDE.md
Project-URL: Repository, https://github.com/Grip-Hub/getgrip.dev
Project-URL: Bug Tracker, https://github.com/Grip-Hub/getgrip.dev/issues
Keywords: retrieval,search,rag,code-search,bm25,offline,no-embeddings,no-vector-db
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: fastapi>=0.95
Requires-Dist: uvicorn[standard]>=0.20
Requires-Dist: pydantic>=2.0
Requires-Dist: numpy>=1.20
Requires-Dist: requests>=2.28
Provides-Extra: license
Requires-Dist: cryptography>=41.0; extra == "license"
Provides-Extra: pdf
Requires-Dist: pypdf>=3.0; extra == "pdf"
Provides-Extra: rerank
Requires-Dist: sentence-transformers>=2.2; extra == "rerank"
Provides-Extra: llm
Requires-Dist: openai>=1.0; extra == "llm"
Requires-Dist: anthropic>=0.18; extra == "llm"
Requires-Dist: groq>=0.4; extra == "llm"
Provides-Extra: docs
Requires-Dist: pypdf>=4.0; extra == "docs"
Requires-Dist: python-docx>=1.0; extra == "docs"
Requires-Dist: openpyxl>=3.1; extra == "docs"
Requires-Dist: python-pptx>=0.6; extra == "docs"
Requires-Dist: striprtf>=0.0.26; extra == "docs"
Requires-Dist: odfpy>=1.4; extra == "docs"
Requires-Dist: xlrd>=2.0; extra == "docs"
Requires-Dist: olefile>=0.47; extra == "docs"
Provides-Extra: ocr
Requires-Dist: pytesseract>=0.3.10; extra == "ocr"
Requires-Dist: Pillow>=9.0; extra == "ocr"
Requires-Dist: scikit-image>=0.20; extra == "ocr"
Provides-Extra: ocr-rapid
Requires-Dist: rapidocr>=3.0; extra == "ocr-rapid"
Provides-Extra: vision
Requires-Dist: transformers>=4.36; extra == "vision"
Requires-Dist: timm>=0.9; extra == "vision"
Requires-Dist: Pillow>=9.0; extra == "vision"
Requires-Dist: torch>=2.0; extra == "vision"
Provides-Extra: all
Requires-Dist: cryptography>=41.0; extra == "all"
Requires-Dist: pypdf>=4.0; extra == "all"
Requires-Dist: sentence-transformers>=2.2; extra == "all"
Requires-Dist: openai>=1.0; extra == "all"
Requires-Dist: anthropic>=0.18; extra == "all"
Requires-Dist: groq>=0.4; extra == "all"
Requires-Dist: python-docx>=1.0; extra == "all"
Requires-Dist: openpyxl>=3.1; extra == "all"
Requires-Dist: python-pptx>=0.6; extra == "all"
Requires-Dist: striprtf>=0.0.26; extra == "all"
Requires-Dist: odfpy>=1.4; extra == "all"
Requires-Dist: xlrd>=2.0; extra == "all"
Requires-Dist: olefile>=0.47; extra == "all"
Requires-Dist: pytesseract>=0.3.10; extra == "all"
Requires-Dist: Pillow>=9.0; extra == "all"
Requires-Dist: scikit-image>=0.20; extra == "all"

# GRIP

**Get a grip on your data.**

A knowledge engine that learns your data's vocabulary, reads documents including scanned pages and technical drawings, remembers what works, and tells the LLM when it doesn't have a good answer.

No embedding models. No vector databases. No API keys required. Fully offline.

[getgrip.dev](https://grip-hub.github.io/getgrip.dev/) | [User Guide](https://github.com/Grip-Hub/getgrip.dev/blob/main/GUIDE.md) | [GitHub](https://github.com/Grip-Hub/getgrip.dev)

---

## Install

```bash
pip install getgrip
getgrip                          # starts web UI + API on localhost:7878
```

```bash
# Ingest
curl -X POST localhost:7878/ingest \
  -H "Content-Type: application/json" \
  -d '{"paths": ["/path/to/your/data"]}'

# Search
curl "localhost:7878/search?q=valve+specification&top_k=5"
```

Open `http://localhost:7878` for the web UI.

---

## How it works

GRIP doesn't just search your documents. It learns them.

First query runs a full retrieval pass across every chunk in your corpus. That answer gets cached as a **knowledge artifact** with citations. Second query on the same topic returns instantly — zero LLM calls, sub-millisecond. A related question triggers one LLM call to adapt the cached knowledge. The corpus gets smarter with use.
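That tiering can be sketched in a few lines. The class names, the token-overlap (Jaccard) similarity, and the 0.6 threshold below are illustrative stand-ins, not GRIP's internals:

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    question: str
    answer: str
    citations: list

class KnowledgeCache:
    """Illustrative three-tier lookup: exact hit, similar match, full pass."""

    def __init__(self, similarity_threshold=0.6):
        self.artifacts = {}               # normalized question -> Artifact
        self.threshold = similarity_threshold

    @staticmethod
    def _normalize(q):
        return " ".join(q.lower().split())

    @staticmethod
    def _similarity(a, b):
        # Token overlap stands in for whatever matching GRIP actually uses.
        ta, tb = set(a.split()), set(b.split())
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def lookup(self, query):
        q = self._normalize(query)
        if q in self.artifacts:           # tier 1: exact hit, zero LLM calls
            return "exact", self.artifacts[q]
        best = max(self.artifacts.values(),
                   key=lambda a: self._similarity(q, self._normalize(a.question)),
                   default=None)
        if best and self._similarity(q, self._normalize(best.question)) >= self.threshold:
            return "similar", best        # tier 2: one LLM call to adapt
        return "miss", None               # tier 3: full retrieval pass

    def store(self, query, artifact):
        self.artifacts[self._normalize(query)] = artifact
```

A miss falls through to the full pass, whose answer is then stored so the next related question lands in tier 1 or 2.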

| | Typical RAG | GRIP |
|---|---|---|
| Embedding model | Required | Not needed |
| Vector database | Required | Not needed |
| API keys | Required | Not needed |
| Reads scanned documents | No | Yes (OCR with confidence scoring) |
| Reads technical drawings | No | Yes (visual captioning + rotation OCR) |
| Learns your vocabulary | No | Yes (co-occurrence from your data) |
| Remembers what works | No | Yes (knowledge artifacts) |
| Knows when it doesn't know | No | Yes (confidence: HIGH/MEDIUM/LOW/NONE) |
| Detects stale answers | No | Yes (corpus versioning) |
| Works offline | Rarely | Fully air-gapped |

---

## File format support

**30+ formats:** PDF, Word (.docx), Excel (.xlsx/.xls), PowerPoint (.pptx), RTF, OpenDocument (ODS/ODT/ODP), CSV, Markdown, plain text, and all major code file types.

**Scanned documents:** Detects pages with no selectable text and runs OCR automatically. Three-tier engine fallback: PaddleOCR, RapidOCR, Tesseract. Mixed PDFs (some pages scanned, some digital) handled transparently.

**Technical drawings:** ISO drawings, P&IDs, schematics. Two parallel paths on every visual page: PaddleOCR with confidence-gated rotation (catches angled callouts), and Florence-2 visual captioning for structural descriptions.

**Confidence-tagged OCR:** Every OCR result carries a confidence score. Low-confidence text is tagged so the synthesis engine treats it as approximate.
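A minimal sketch of that tagging step. The 0.6 threshold and the bracketed marker format are assumptions for illustration, not GRIP's actual convention:

```python
def tag_ocr_text(spans, low_threshold=0.6):
    """Wrap low-confidence OCR spans in an explicit marker so downstream
    synthesis can treat them as approximate rather than literal."""
    out = []
    for text, confidence in spans:
        if confidence < low_threshold:
            out.append(f"[low-confidence OCR: {text}]")
        else:
            out.append(text)
    return " ".join(out)
```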

---

## Knowledge artifacts

- **Exact hit** — Same question asked before? Zero LLM calls. Sub-millisecond response from cache with all citations.
- **Similar match** — Related question? One LLM call adapts the cached knowledge. Not a full retrieval pass.
- **Stale detection** — Source documents changed? Artifact flagged stale. Next query triggers a delta update.
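The stale-detection idea can be sketched as an order-independent corpus fingerprint. This is illustrative only; GRIP's actual versioning scheme may differ:

```python
import hashlib

def corpus_version(doc_hashes):
    """Order-independent fingerprint of the corpus: any added, removed,
    or edited source document changes the version."""
    h = hashlib.sha256()
    for d in sorted(doc_hashes):
        h.update(d.encode())
    return h.hexdigest()

def is_stale(artifact_version, current_doc_hashes):
    # An artifact minted under an older corpus version needs a delta update.
    return artifact_version != corpus_version(current_doc_hashes)
```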

---

## Features

- **Co-occurrence expansion** — learns which terms appear together in your data, expands queries automatically
- **Auto-remember** — reinforces successful queries, persists across restarts
- **Session context** — "tell me more" carries context from the previous query
- **Confidence scoring** — HIGH / MEDIUM / LOW / NONE so the LLM knows when to say "I don't know"
- **Exhaustive synthesis** — reads every chunk, not top-k. Nothing skipped
- **Academic citations** — BibTeX parsing, author-year formatting, page references
- **Plugin system** — sources (local, GitHub), chunkers (code-aware, generic), LLMs (Ollama, OpenAI, Anthropic, Groq)
- **9 API endpoints** + directory browser + web UI
- **Fully offline** — no cloud, no telemetry
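As an example of how a caller might use the confidence level, here is a sketch that turns HIGH/MEDIUM/LOW/NONE into prompt guidance for the LLM. The wording and function name are illustrative, not part of GRIP:

```python
def frame_context(confidence, chunks):
    """Map GRIP's four confidence levels to an instruction prepended
    to the retrieved context before it reaches the LLM."""
    guidance = {
        "HIGH": "Answer directly from the context below.",
        "MEDIUM": "Answer from the context, noting any gaps.",
        "LOW": "The context is weak; hedge the answer and cite what supports it.",
        "NONE": "Say you don't know; the context does not cover this question.",
    }
    context = "\n\n".join(chunks)
    return f"{guidance[confidence]}\n\n{context}"
```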

---

## Benchmarks

### BEIR (6 datasets, 2,771 queries)

| Dataset | Corpus | BM25 | GRIP | Delta |
|---------|--------|------|------|-------|
| FEVER | 5,416,568 | 0.509 | **0.808** | +0.299 |
| HotpotQA | 5,233,329 | 0.595 | **0.741** | +0.146 |
| SciFact | 5,183 | 0.665 | **0.682** | +0.017 |
| NQ | 2,681,468 | 0.276 | **0.542** | +0.266 |
| FiQA | 57,638 | 0.232 | **0.347** | +0.116 |
| NFCorpus | 3,633 | 0.311 | **0.344** | +0.034 |

**Average NDCG@10: 0.58** — two-stage pipeline with optional MiniLM reranker (22M params).

### Accuracy (3,000 queries, no cherry-picking)

| Domain | Corpus | Accuracy |
|--------|--------|----------|
| Linux Kernel (code) | 188,209 chunks | **98.7%** |
| Wikipedia (encyclopedia) | 11.2M chunks | **98.5%** |
| Project Gutenberg (prose) | 173,817 chunks | **95.4%** |
| **Combined** | **3,000 queries** | **97.5%** |

### Scaling

Tested from 1,000 to 39.2 million documents. Latency follows a streaming scaling model with R² = 0.999, and growth is sublinear: a 39,219x increase in data cost only a 140x increase in latency.
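Reading those two factors as a power law (latency proportional to N to the power alpha) gives alpha of roughly 0.47, i.e. close to square-root scaling:

```python
import math

# If latency scales as N**alpha, the reported factors pin alpha down:
# 140x latency over a 39,219x increase in data.
alpha = math.log(140) / math.log(39_219)
print(round(alpha, 2))  # 0.47
```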

---

## Integration

GRIP is a JSON API on localhost. Drop it into any existing stack:

```python
# LangChain — replace any vector store retriever
from langchain.schema import BaseRetriever, Document
import requests

class GRIPRetriever(BaseRetriever):
    grip_url: str = "http://localhost:7878"
    top_k: int = 5

    def _get_relevant_documents(self, query: str) -> list[Document]:
        r = requests.get(
            f"{self.grip_url}/search",
            params={"q": query, "top_k": self.top_k},
            timeout=30,
        )
        r.raise_for_status()
        data = r.json()
        return [
            Document(
                page_content=chunk["text"],
                metadata={
                    "source": chunk["source"],
                    "score": chunk["score"],
                    "confidence": data["confidence"],
                },
            )
            for chunk in data["results"]
        ]
```

Works with LangChain, LlamaIndex, or any HTTP client. Python, JavaScript, cURL, CI/CD pipelines.

---

## Docker

```bash
docker run -d -p 7878:7878 \
  -v grip-data:/data \
  -v /your/files:/code \
  griphub/grip:free
```

---

## Optional extras

```bash
pip install getgrip[pdf]        # PDF parsing
pip install getgrip[docs]       # All document formats (docx, xlsx, pptx, rtf, odt...)
pip install getgrip[ocr]        # OCR (pytesseract + Pillow, Apache-2.0)
pip install getgrip[vision]     # Visual pipeline (Florence-2 + OCR)
pip install getgrip[rerank]     # Cross-encoder reranking (MiniLM, 22M params)
pip install getgrip[llm]        # LLM answers (Ollama, OpenAI, Anthropic, Groq)
pip install getgrip[all]        # Everything
```

All extras are optional. Core retrieval works with zero extras installed.

---

## Pricing

All tiers include all features. The free tier has a 10,000 chunk limit (~3,500 files). No credit card. No time limit.

| Tier | Chunks | Price |
|------|--------|-------|
| Free | 10,000 | $0 |
| Personal | 100,000 | $499/year |
| Team | 500,000 | $1,499/year |
| Professional | 5,000,000 | $4,999/year |

One license per deployment. No per-seat fees. No per-query fees. Unlimited users.

Licensed tiers preserve learning data across deletions.

---

[User Guide](https://github.com/Grip-Hub/getgrip.dev/blob/main/GUIDE.md) | [getgrip.dev](https://grip-hub.github.io/getgrip.dev/) | [GitHub](https://github.com/Grip-Hub/getgrip.dev)
