Metadata-Version: 2.4
Name: parsica-memory
Version: 2.2.0
Summary: File-based persistent memory for AI agents. Zero dependencies.
Author-email: Antaris Analytics <dev@antarisanalytics.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/Antaris-Analytics-LLC/parsica-memory
Project-URL: Documentation, https://memory.parsica.ai
Project-URL: Repository, https://github.com/Antaris-Analytics-LLC/parsica-memory
Keywords: ai,memory,agents,llm,persistence,recall
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: embeddings
Requires-Dist: openai>=1.0; extra == "embeddings"
Provides-Extra: mcp
Requires-Dist: mcp>=1.0; extra == "mcp"
Provides-Extra: all
Requires-Dist: openai>=1.0; extra == "all"
Requires-Dist: mcp>=1.0; extra == "all"
Provides-Extra: pro
Requires-Dist: openai>=1.0; extra == "pro"
Requires-Dist: mcp>=1.0; extra == "pro"
Dynamic: license-file

# 🧠 parsica-memory

**Persistent, intelligent memory for AI agents.** The flagship package of the [Antaris Analytics](https://antarisanalytics.ai) suite.

[![PyPI version](https://img.shields.io/pypi/v/parsica-memory)](https://pypi.org/project/parsica-memory/)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/)
[![Zero dependencies](https://img.shields.io/badge/dependencies-zero-brightgreen)](https://pypi.org/project/parsica-memory/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)

---

## What Is This?

AI agents are stateless by default. Every spawn is a cold start. `parsica-memory` gives agents a persistent, searchable, intelligent memory store that:

- **Remembers** across sessions, spawns, and restarts
- **Retrieves** the right memories using an 11-layer BM25+ search engine
- **Decays** old memories gracefully so signal stays high
- **Learns** from mistakes, facts, and procedures with specialized memory types
- **Shares** knowledge across multi-agent teams via shared pools
- **Enriches** itself via LLM hooks to dramatically improve recall
- **Surfaces** semantic memories across all sessions automatically (cross-session recall)

No vector database. No API keys required. No external services. Just `pip install` and go.

---

## ⚡ Quick Start

```bash
pip install parsica-memory
```

```python
from parsica_memory import MemorySystem

mem = MemorySystem(workspace="./memory", agent_name="my-agent")
mem.load()

# Store a memory
mem.ingest("Deployed v2.3.1 to production. All checks green.",
           source="deploy-log", session_id="session-123")

# Search with cross-session recall
results = mem.search("production deployment",
                     session_id="session-456",
                     cross_session_recall="semantic")
for r in results:
    print(r.content)

mem.save()
```

That's it. No config files needed.

---

## 📦 Installation

```bash
pip install parsica-memory
```

**Version:** 5.5.1
**Requirements:** Python 3.9+ · Zero external dependencies · stdlib only

---

## What's New in v5.5.1

### Bootstrap File Size Guard
`MemorySystem.check_bootstrap_files()` scans agent workspace files (MEMORY.md, AGENTS.md, SOUL.md, etc.) and warns when they approach or exceed OpenClaw's 35,000-character injection limit. Beyond that limit, files are silently truncated and agents lose memory context without any visible error.

```python
warnings = mem.check_bootstrap_files()
# Returns list of warning strings, empty = all OK
```

`get_health()` now includes `bootstrap_files_ok` in its checks dict and surfaces bootstrap warnings in the result for easy monitoring integration.

### Enricher: ANTARIS_LLM_API_KEY Support
`auto_enricher()` and `anthropic_enricher()` now check `ANTARIS_LLM_API_KEY` first (preferred for OpenClaw users), then fall back to `ANTHROPIC_API_KEY`. No extra configuration needed for OpenClaw deployments.
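The lookup order can be illustrated as follows. This is a sketch of the documented precedence, not package code; `resolve_enricher_api_key` is a hypothetical helper name:

```python
import os
from typing import Optional

def resolve_enricher_api_key() -> Optional[str]:
    """Return the API key the enrichers would use, in documented precedence order."""
    # ANTARIS_LLM_API_KEY wins when both are set (preferred for OpenClaw users),
    # with ANTHROPIC_API_KEY as the fallback.
    return os.environ.get("ANTARIS_LLM_API_KEY") or os.environ.get("ANTHROPIC_API_KEY")
```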

---

## What's New in v5.2

### Cross-Session Recall

Control what memories cross session boundaries:

```python
results = mem.search(
    "API key format",
    session_id="session-B",
    cross_session_recall="semantic"  # "all" | "semantic" | "none"
)
```

- `"all"` — no filtering (default, backward compatible)
- `"semantic"` — other sessions' memories only if classified as semantic (facts, decisions, preferences)
- `"none"` — strict session isolation

The filter is applied **after** BM25 scoring and WindowEntry resolution, so every individual entry is checked correctly.
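The post-scoring filter behaves roughly like the sketch below. Field names (`session_id`, `memory_type`) follow this README's ingest/search parameters; the real pipeline's WindowEntry resolution is more involved:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Entry:
    content: str
    session_id: str
    memory_type: str  # "semantic" or "episodic"

def filter_cross_session(results: List[Entry], current_session: str,
                         mode: str = "all") -> List[Entry]:
    """Drop other-session entries after BM25 scoring, per recall mode."""
    if mode == "all":
        return results                    # no filtering (default)
    kept = []
    for entry in results:
        if entry.session_id == current_session:
            kept.append(entry)            # own session always passes
        elif mode == "semantic" and entry.memory_type == "semantic":
            kept.append(entry)            # facts/decisions cross sessions
        # mode == "none": strict isolation, other sessions are dropped
    return kept
```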

### Auto Memory Type Classification (v5.1)

Memories are automatically classified as `semantic` or `episodic` at ingest time. No manual tagging needed. Classification uses keyword heuristics — facts, decisions, and preferences become `semantic`; events and task logs become `episodic`.
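The actual cue lists are internal; a toy version of the keyword-heuristic idea looks like this (the `SEMANTIC_CUES` set is purely illustrative):

```python
# Illustrative cue words, not the package's real heuristics.
SEMANTIC_CUES = {"is", "prefers", "decided", "always", "never", "means"}

def classify_memory_type(content: str) -> str:
    """Toy heuristic: facts/decisions/preferences -> semantic, events -> episodic."""
    words = set(content.lower().split())
    return "semantic" if words & SEMANTIC_CUES else "episodic"
```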

### Session & Channel Provenance (v5.1)

Every memory tracks where it came from:

```python
mem.ingest("important fact", session_id="session-abc", channel_id="ops-channel")
```

---

## 🔑 Key Features

### 11-Layer Search Engine

Every query runs through a full pipeline:

1. **BM25+ TF-IDF** — baseline relevance with delta floor
2. **Exact Phrase Bonus** — verbatim matches score 1.5×
3. **Field Boosting** — tags 1.2×, category 1.3×, source 1.1×
4. **Rarity & Proper Noun Boost** — rare terms up to 2×, proper nouns 1.5×
5. **Positional Salience** — intro/conclusion windows 1.3×
6. **Semantic Expansion** — PPMI co-occurrence query widening
7. **Intent Reranker** — temporal, entity, howto detection
8. **Qualifier & Negation** — "failed" ≠ "successful"
9. **Clustering Boost** — coherent result groups score higher
10. **Embedding Reranker** — local MiniLM embeddings (no API needed)
11. **Pseudo-Relevance Feedback** — top results refine the query
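To see how a few of these layers compose, here is a simplified scoring sketch using the multipliers listed above. The real pipeline applies each layer with its own normalization; this only illustrates the multiplicative boosting:

```python
def boosted_score(base_bm25: float, *, exact_phrase: bool = False,
                  tag_match: bool = False, category_match: bool = False,
                  proper_noun: bool = False) -> float:
    """Multiply a baseline BM25+ score by the documented layer bonuses."""
    score = base_bm25
    if exact_phrase:
        score *= 1.5   # layer 2: verbatim phrase match
    if tag_match:
        score *= 1.2   # layer 3: tag field boost
    if category_match:
        score *= 1.3   # layer 3: category field boost
    if proper_noun:
        score *= 1.5   # layer 4: proper noun boost
    return score
```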

### Memory Types

| Type | Decay Rate | Importance | Use Case |
|---|---|---|---|
| `episodic` | Normal | 1× | General events |
| `semantic` | Normal | 1× | Facts, decisions — crosses sessions |
| `fact` | Normal | High recall | Verified knowledge |
| `mistake` | 10× slower | 2× | Never forget failures |
| `preference` | 3× slower | 1× | User/agent preferences |
| `procedure` | 3× slower | 1× | How-to knowledge |
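One plausible reading of "half-life" decay combined with the table's slowdown factors: each type's effective half-life is the base `half_life` stretched by its slowdown. The exact curve is internal to the package; this sketch only shows the shape:

```python
# Decay-rate slowdown per type, taken from the table above.
DECAY_SLOWDOWN = {"episodic": 1.0, "semantic": 1.0, "fact": 1.0,
                  "mistake": 10.0, "preference": 3.0, "procedure": 3.0}

def decay_weight(age_days: float, memory_type: str, half_life: float = 7.0) -> float:
    """Exponential half-life decay, stretched by the per-type slowdown factor."""
    effective_half_life = half_life * DECAY_SLOWDOWN[memory_type]
    return 0.5 ** (age_days / effective_half_life)
```

At `half_life=7.0`, a week-old episodic memory retains half its weight, while a week-old mistake barely decays at all.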

### LLM Enrichment

Pass an enricher callable to boost recall quality:

```python
def my_enricher(content: str) -> dict:
    # Call any LLM — returns tags, summary, keywords, search_queries
    return {"tags": [...], "summary": "...", "keywords": [...], "search_queries": [...]}

mem = MemorySystem(workspace="./memory", agent_name="my-agent", enricher=my_enricher)
```

Enriched fields get boosted weights: `search_queries` 3×, `enriched_summary` 2×, `search_keywords` 2×.

### Context Packets

Cold-spawn solution for sub-agents:

```python
packet = mem.build_context_packet(
    task="Deploy the auth service",
    max_tokens=3000,
    include_mistakes=True
)
markdown = packet.render()  # Inject into sub-agent system prompt
```

### Graph Intelligence

Automatic entity extraction and knowledge graph:

```python
path = mem.entity_path("payment-service", "database", max_hops=3)
triples = mem.graph_search(subject="PostgreSQL", relation="used_by")
entity = mem.get_entity("PostgreSQL")
```

### Tiered Storage

| Tier | Age | Behavior |
|---|---|---|
| Hot | 0–3 days | Always loaded |
| Warm | 3–14 days | Loaded on-demand |
| Cold | 14+ days | Requires `include_cold=True` |
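The table's age boundaries map to tiers as in this sketch (the tier names match the table; the boundary handling at exactly 3 and 14 days is an assumption):

```python
def storage_tier(age_days: float) -> str:
    """Map a memory's age to the hot/warm/cold boundaries in the table."""
    if age_days < 3:
        return "hot"    # always loaded
    if age_days < 14:
        return "warm"   # loaded on demand
    return "cold"       # needs include_cold=True at search time
```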

### Input Gating

P0–P3 priority classification drops noise before it enters the store:

```python
mem.ingest_with_gating("ok thanks", source="chat")  # → dropped (P3)
mem.ingest_with_gating("Production outage: auth down", source="incident")  # → stored (P0)
```

### Shared / Team Memory

```python
from parsica_memory import AgentRole

pool = mem.enable_shared_pool(
    pool_dir="./shared",
    pool_name="project-alpha",
    agent_id="worker-1",
    role=AgentRole.WRITER
)
mem.shared_write("Research complete: competitor uses GraphQL", namespace="research")
results = mem.shared_search("competitor API", namespace="research")
```

### MCP Server

```bash
python -m parsica_memory serve --workspace ./memory --agent-name my-agent
```

Works with Claude Desktop and any MCP-compatible client.

---

## 🖥️ CLI

```bash
# Initialize a workspace
python -m parsica_memory init --workspace ./memory --agent-name my-agent

# Check status
python -m parsica_memory status --workspace ./memory

# Rebuild knowledge graph
python -m parsica_memory rebuild-graph --workspace ./memory

# Start MCP server
python -m parsica_memory serve --workspace ./memory --agent-name my-agent
```

---

## 🔧 Core API

```python
from parsica_memory import MemorySystem

mem = MemorySystem(
    workspace="./memory",          # Required
    agent_name="my-agent",         # Required — scopes the store
    half_life=7.0,                 # Decay half-life in days
    enricher=None,                 # LLM enrichment callable
    use_sharding=True,             # Enterprise sharding
    tiered_storage=True,           # Hot/warm/cold tiers
    graph_intelligence=True,       # Entity extraction + graph
    quality_routing=True,          # Follow-up pattern detection
    semantic_expansion=True,       # PPMI query expansion
)

# Lifecycle
mem.load()                         # Load from disk → entry count
mem.save()                         # Save to disk → path
mem.flush()                        # WAL → shards
mem.close()                        # Flush + release

# Ingestion
mem.ingest(content, source=..., session_id=..., channel_id=..., memory_type=...)
mem.ingest_fact(content, source=...)
mem.ingest_mistake(what_happened=..., correction=..., root_cause=..., severity=...)
mem.ingest_preference(content, source=...)
mem.ingest_procedure(content, source=...)
mem.ingest_file(path, category=...)
mem.ingest_directory(dir_path, category=..., pattern="*.md")
mem.ingest_url(url, depth=2, incremental=True)
mem.ingest_data_file(path, format="auto")
mem.ingest_with_gating(content, source=..., context=...)

# Search
mem.search(query, limit=10, session_id=..., cross_session_recall="semantic",
           tags=..., memory_type=..., explain=True, include_cold=False)
mem.search_with_context(query, cooccurrence_boost=True)
mem.recent(limit=20)
mem.on_date("2024-03-15")
mem.between("2024-03-01", "2024-03-31")

# Graph
mem.graph_search(subject=..., relation=..., obj=...)
mem.entity_path(source, target, max_hops=3)
mem.get_entity(canonical)
mem.get_graph_stats()
mem.rebuild_graph()

# Context Packets
mem.build_context_packet(task=..., max_tokens=3000, include_mistakes=True)
mem.build_context_packet_multi(task=..., queries=[...], max_tokens=4000)

# Shared Pool
mem.enable_shared_pool(pool_dir=..., pool_name=..., agent_id=..., role=AgentRole.WRITER)
mem.shared_write(content, namespace=...)
mem.shared_search(query, namespace=...)

# Enrichment
mem.re_enrich(batch_size=50)
mem.set_embedding_fn(fn)

# Maintenance
mem.compact()
mem.consolidate()
mem.compress_old(days=60)
mem.reindex()
mem.forget(topic=..., before_date=...)
mem.delete_source(source_url)
mem.mark_used(memory_ids=[...])
mem.boost_relevance(memory_id, multiplier=1.5)

# Stats & Health
mem.get_stats()  # or mem.stats()
mem.get_health()
mem.get_hot_entries(top_n=10)

# Export / Import
mem.export(output_path, include_metadata=True)
mem.import_from(input_path, merge=True)
mem.validate_data()
mem.migrate_to_v4()
```

---

## 🗺️ Feature Matrix

| Feature | Status | Since |
|---|---|---|
| Core ingestion & search | ✅ | v1.0 |
| Memory types (episodic/fact/mistake/procedure/preference/semantic) | ✅ | v1.0 |
| Temporal decay | ✅ | v1.0 |
| Context packets | ✅ | v1.1 |
| Export / Import | ✅ | v4.2 |
| GCS cloud backend | ✅ | v4.2 |
| LLM enrichment hooks | ✅ | v4.6.5 |
| Tiered storage (hot/warm/cold) | ✅ | v4.7 |
| Web & data file ingestion | ✅ | v4.7 |
| Graph intelligence (entity/relationship) | ✅ | v4.8/v4.9 |
| Shared / team memory pools | ✅ | v4.8 |
| 11-layer search architecture | ✅ | v4.x |
| Co-occurrence / PPMI semantic tier | ✅ | v4.x |
| Input gating (P0–P3 priority) | ✅ | v4.x |
| Hybrid BM25 + semantic embedding search | ✅ | v4.x |
| MCP server | ✅ | v4.9 |
| Auto memory type classification | ✅ | v5.1 |
| Session/channel provenance | ✅ | v5.1 |
| Cross-session memory recall | ✅ | v5.2 |
| doc2query (search query generation) | ✅ | v5.0.2 |
| Recovery system | ✅ | v3.3 |
| CLI tooling | ✅ | v4.x |

---

## 🏗️ Architecture

```
parsica-memory/
├── Core: MemorySystem, MemoryEntry, WAL
├── Storage: ShardManager, TierManager, GCS backend
├── Search: 11-layer BM25+ pipeline
├── Intelligence: EntityExtractor, MemoryGraph, LLM Enricher
├── Multi-Agent: SharedMemoryPool, AgentRoles
├── Context: ContextPacketBuilder
└── Server: MCP server, CLI
```

---

## Part of antaris-suite

`parsica-memory` is the core package of the [antaris-suite](https://github.com/Antaris-Analytics-LLC/antaris-suite) ecosystem:

- **parsica-memory** — persistent memory (this package)
- **antaris-guard** — input validation & safety
- **antaris-context** — context management
- **antaris-router** — intelligent model routing
- **antaris-pipeline** — orchestration pipeline
- **antaris-contracts** — shared type contracts


---

## 📄 License

Apache 2.0

---

## 🔗 Links

- **PyPI:** <https://pypi.org/project/parsica-memory/>
- **GitHub:** <https://github.com/Antaris-Analytics-LLC/antaris-suite>
- **Docs:** <https://docs.antarisanalytics.ai>
- **Website:** <https://antarisanalytics.ai>

---

*Built by [Antaris Analytics LLC](https://antarisanalytics.ai) for production AI agent deployments.*
