Welcome to VectrixDB
Where vectors come alive
Collections
Browse and manage vector collections
Console
Execute API requests directly
Search
Semantic vector search
Tutorials
Get Started with Demo Data
Load demo data to explore all VectrixDB features: semantic search, keyword search, reranking, and knowledge graphs.
Learn the basics: create a collection, add data, and perform your first search.
- Click Load Demo above
- Go to Collections → click demo
- Browse the points to see the data
- Go to Search and try all search modes
Semantic (dense): Find results by meaning, not just keywords. Ask questions in natural language.
Keyword (sparse): Traditional search that matches exact words. Great for names, codes, or specific terms.
Reranked (hybrid): Dense search with cross-encoder re-ranking for improved accuracy.
Ultimate (ColBERT): Token-level matching for maximum accuracy. Best for complex queries.
Graph: Automatically extracts entities and relationships, enabling graph-based retrieval.
- Auto entity extraction (People, Places, Concepts)
- Relationship mapping between entities
- Graph traversal for connected results
```
POST /api/collections
{
  "name": "my_docs",
  "dimension": 384,
  "metric": "cosine"
}
```
```
POST /api/collections/{name}/points
{
  "points": [{
    "id": "1",
    "text": "Your document text",
    "payload": {"source": "file.pdf"}
  }]
}
```
```
POST /api/collections/{name}/text-search
{
  "text": "your query",
  "mode": "hybrid",
  "limit": 10
}
```
```
POST /api/collections/{name}/text-search
{
  "text": "query",
  "filter": {
    "category": "science"
  }
}
```
Code Tips
Copy-paste Python examples for VectrixDB.
The simplest way to use VectrixDB: one line to add, one line to search.
```python
from vectrixdb import Vectrix

# Create database and add data
db = Vectrix("my_docs", tier="hybrid")
db.add(["Python is great", "JavaScript is fun", "Rust is fast"])

# Search
results = db.search("programming language", mode="hybrid")
print(results.top.text)  # Best match
```
With metadata and filters:
```python
from vectrixdb import Vectrix

db = Vectrix("products", tier="hybrid")
db.add(
    texts=["iPhone 15 Pro", "Samsung Galaxy", "Pixel 8"],
    metadata=[
        {"brand": "Apple", "price": 999},
        {"brand": "Samsung", "price": 899},
        {"brand": "Google", "price": 699}
    ]
)

# Filter by metadata
results = db.search("smartphone", filter={"brand": "Apple"})
print(results.top.text)  # "iPhone 15 Pro"
```
- dense - Vector similarity only
- sparse - BM25 keywords only
- hybrid - Dense + Sparse + Rerank
- ultimate - Hybrid + ColBERT

Examples for each tier:
```python
from vectrixdb import Vectrix

# Simple semantic similarity
db = Vectrix("docs", tier="dense")
db.add(["The quick brown fox", "A fast auburn canine"])
results = db.search("swift fox", mode="dense")
```
```python
from vectrixdb import Vectrix

# Keywords + Meaning + Reranking
db = Vectrix("scifact", tier="hybrid")
db.add(["CRISPR enables precise gene editing", "mRNA vaccines trigger immune response"])

# You can use different modes on hybrid tier
results_hybrid = db.search("gene therapy", mode="hybrid")
results_dense = db.search("immune system", mode="dense")
results_sparse = db.search("CRISPR", mode="sparse")
```
```python
from vectrixdb import Vectrix

# Hybrid + ColBERT late interaction
db = Vectrix("scifact", tier="ultimate")
db.add([
    "Sleep deprivation impairs memory consolidation",
    "Exercise reduces Alzheimer's disease risk",
    "The gut microbiome affects mental health"
])

# Best for complex queries
results = db.search(
    "How does sleep affect brain function?",
    mode="ultimate"
)
```
```python
from vectrixdb import Vectrix

# Auto-extracts entities and relationships
db = Vectrix("biomedical", tier="graph")
db.add([
    "Metformin treats type 2 diabetes.",
    "Metformin may have anticancer properties."
])

# Internally extracts: Metformin --[treats]--> type 2 diabetes
results = db.search("What does metformin treat?")
```
```python
import pandas as pd
from vectrixdb import Vectrix

df = pd.read_csv("data.csv")

db = Vectrix("docs", tier="hybrid")
db.add(
    texts=df["text"].tolist(),
    ids=df["id"].tolist(),
    metadata=df[["category", "author"]].to_dict("records")
)
```
```python
from pathlib import Path
from vectrixdb import Vectrix

db = Vectrix("docs", tier="hybrid")
for f in Path("./docs").glob("**/*.md"):
    content = f.read_text(encoding="utf-8")
    db.add(
        texts=[content],
        ids=[str(f)],
        metadata=[{"filename": f.name}]
    )
```
```python
from vectrixdb import Vectrix

def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

db = Vectrix("docs", tier="hybrid")

# Chunk and add
long_doc = "Very long document text..." * 1000
chunks = chunk_text(long_doc)
db.add(
    texts=chunks,
    ids=[f"chunk_{i}" for i in range(len(chunks))],
    metadata=[{"chunk_idx": i} for i in range(len(chunks))]
)
```
Use language="en" for bundled English models (faster, offline), or custom models.
```python
from vectrixdb import Vectrix

# English models - bundled, no download (~100MB total)
db = Vectrix("docs", tier="hybrid", language="en")

# Multilingual - auto-downloads on first use
db = Vectrix("docs", tier="hybrid")  # or language="multi"
```
```python
# pip install sentence-transformers
from vectrixdb import Vectrix

# Standard models
db = Vectrix("docs", model="sentence-transformers/all-MiniLM-L6-v2")
db = Vectrix("docs", model="sentence-transformers/all-mpnet-base-v2")

# BGE models (high quality)
db = Vectrix("docs", model="BAAI/bge-small-en-v1.5")
db = Vectrix("docs", model="BAAI/bge-large-en-v1.5")

# Multilingual
db = Vectrix("docs", model="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
```
```python
# pip install openai
from vectrixdb import Vectrix
from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

def openai_embed(texts):
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    return [r.embedding for r in response.data]

db = Vectrix(
    "docs",
    tier="hybrid",
    embed_fn=openai_embed,
    dimension=1536
)
db.add(["Document 1", "Document 2"])
results = db.search("query")
```
```python
# pip install cohere
from vectrixdb import Vectrix
import cohere

co = cohere.Client("YOUR_API_KEY")

def cohere_embed(texts):
    response = co.embed(texts=texts, model="embed-english-v3.0")
    return response.embeddings

db = Vectrix(
    "docs",
    tier="hybrid",
    embed_fn=cohere_embed,
    dimension=1024
)
```
Use embedders directly for custom pipelines.
```python
from vectrixdb.models import (
    DenseEmbedder,            # Dense vectors (384 dim)
    SparseEmbedder,           # BM25 sparse vectors
    RerankerEmbedder,         # Cross-encoder reranking
    LateInteractionEmbedder,  # ColBERT/BGE-M3 (128/1024 dim)
    GraphExtractor,           # Knowledge triplet extraction
)
```
```python
from vectrixdb.models import DenseEmbedder

# English (bundled, faster)
embedder = DenseEmbedder(language="en")

# Multilingual (auto-download)
embedder = DenseEmbedder()

# Generate embeddings
vectors = embedder.embed(["Hello world", "How are you"])
print(vectors.shape)  # (2, 384)
```
```python
from vectrixdb.models import RerankerEmbedder

reranker = RerankerEmbedder(language="en")
docs = [
    "AI is artificial intelligence",
    "The weather is sunny",
    "Machine learning is AI",
]
scores = reranker.score("What is AI?", docs)
print(scores)  # e.g. [0.99, 0.01, 0.87]

# Rerank and get sorted results
ranked = reranker.rerank("What is AI?", docs, top_k=5)
```
```python
from vectrixdb.models import LateInteractionEmbedder

# English ColBERT (bundled, 128 dim)
late = LateInteractionEmbedder(language="en")

# Multilingual BGE-M3 (auto-download, 1024 dim)
late = LateInteractionEmbedder()

# Encode query and document
query_emb = late.encode_query("What is machine learning?")
doc_emb = late.encode_document("Machine learning is a subset of AI...")

# MaxSim scoring
score = late.max_sim(query_emb, doc_emb)
print(f"MaxSim: {score:.4f}")
```
```python
from vectrixdb.models import GraphExtractor

extractor = GraphExtractor()
triplets = extractor.extract("CRISPR-Cas9 can edit human DNA.")
for t in triplets:
    print(f"{t.head} --[{t.relation}]--> {t.tail}")
# Output: CRISPR-Cas9 --[can edit]--> human DNA
```
Build RAG agents with VectrixDB as the retriever.
```python
from typing import List

from vectrixdb import Vectrix
from langchain_core.retrievers import BaseRetriever
from langchain_core.documents import Document
from pydantic import Field

class VectrixRetriever(BaseRetriever):
    """LangChain retriever for VectrixDB."""

    db: Vectrix = Field(description="VectrixDB instance")
    k: int = Field(default=4)
    mode: str = Field(default="hybrid")

    class Config:
        arbitrary_types_allowed = True

    def _get_relevant_documents(self, query: str, **kwargs) -> List[Document]:
        results = self.db.search(query, mode=self.mode, limit=self.k)
        return [
            Document(
                page_content=r.text,
                metadata={"id": r.id, "score": r.score, **r.metadata}
            )
            for r in results
        ]
```
```python
from vectrixdb import Vectrix
from langchain_openai import ChatOpenAI
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# Create VectrixDB + retriever
db = Vectrix("docs", tier="hybrid")
db.add(["Your documents..."])
retriever = VectrixRetriever(db=db, k=4, mode="hybrid")

# Create RAG chain
llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based on context:\n{context}"),
    ("human", "{input}")
])
doc_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, doc_chain)

# Use it
result = rag_chain.invoke({"input": "What is...?"})
print(result["answer"])
```
Typical vector store setup:
- Requires OpenAI embeddings
- Single search mode
- No built-in reranking

VectrixDB:
- Bundled embeddings (free!)
- Dense, sparse, hybrid, ultimate
- Built-in cross-encoder reranking
```
# Install
pip install vectrixdb

# Set API key (enables write operations)
# Linux/Mac:
export VECTRIXDB_API_KEY="your-secret-key"
# Windows:
set VECTRIXDB_API_KEY=your-secret-key

# Optional: Read-only API key (view but not modify)
export VECTRIXDB_READ_ONLY_API_KEY="viewer-key"

# Download models (optional - English models bundled)
vectrixdb download-models                          # All multilingual
vectrixdb download-models --type dense             # Dense only
vectrixdb download-models --type late_interaction  # BGE-M3

# Start server with dashboard (CLI)
vectrixdb serve --port 7337

# Start server without dashboard (CLI)
vectrixdb serve --port 7337 --no-dashboard

# Database info
vectrixdb info ./vectrixdb_data
vectrixdb list ./vectrixdb_data

# Check models
vectrixdb models-info
```

Start the server programmatically:

```python
from vectrixdb.api.server import run_server

run_server(host="0.0.0.0", port=7337, db_path="./vectrixdb_data")

# Or with uvicorn directly
import uvicorn
from vectrixdb.api.server import create_app

app = create_app(db_path="./vectrixdb_data")
uvicorn.run(app, host="0.0.0.0", port=7337)
```