Metadata-Version: 2.4
Name: recollect-mcp
Version: 0.5.0
Summary: MCP server for Recollect
License-Expression: MIT
Requires-Python: >=3.12.4
Requires-Dist: humanize>=4.15.0
Requires-Dist: mcp>=1.26.0
Requires-Dist: recollect[openai]
Description-Content-Type: text/markdown

# recollect-mcp

MCP server exposing cognitive memory via the Recollect SDK. 5 tools, 3 resources, server-managed sessions.

## Install

```bash
pip install recollect-mcp
```

## Usage

```bash
# stdio (default)
recollect-mcp

# streamable-http
recollect-mcp --transport streamable-http
```

## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `MEMORY_USER_ID` | Yes | -- | Scopes all operations to this user. Server refuses to start without it. |
| `DATABASE_URL` | No | `postgresql://localhost:5432/memory_sdk` | PostgreSQL connection string. |
| `MEMORY_LLM_PROVIDER` | No | (auto) | LLM provider: `anthropic`, `openai`, `openai-compat`, `pydantic-ai`. Empty = auto-fallback (Anthropic -> OpenAI-compat). |
| `ANTHROPIC_API_KEY` | For default provider | -- | Anthropic API key for LLM extraction. |
| `OPENAI_API_KEY` | For OpenAI provider | -- | OpenAI API key. |
| `ANTHROPIC_MODEL` | No | `claude-haiku-4-5-20251001` | Override Anthropic extraction model. |
| `OPENAI_MODEL` | No | `gpt-5-mini` | Override OpenAI extraction model. |
| `OLLAMA_MODEL` | No | `qwen3.5` | Override Ollama extraction model. |
| `OLLAMA_BASE_URL` | No | `http://localhost:11434/v1` | Ollama API endpoint. |
| `PYDANTIC_AI_MODEL` | For pydantic-ai provider | -- | pydantic-ai model string in `provider:model` format (e.g., `ollama:ministral-3`, `anthropic:claude-haiku-4-5-20251001`). |
| `MEMORY_EXTRACTION_MAX_TOKENS` | No | `4096` | Max tokens for LLM extraction. Increase for reasoning models that consume thinking tokens. |
| `OPENAI_REASONING_EFFORT` | No | `low` | Reasoning effort for OpenAI structured output (`low`, `medium`, `high`). |
| `OPENAI_STRUCTURED_MAX_TOKENS` | No | `1024` | Token cap for OpenAI structured output. Reasoning tokens consume this budget. |
| `MEMORY_CONFIG` | No | -- | Path to custom TOML config file. |
| `HF_HUB_OFFLINE` | No | -- | Set to `1` to skip HuggingFace HTTP checks on startup. Use after the embedding model has been cached locally. |
| `SERVER_HOST` | No | `localhost` | Server bind host (streamable-http transport). |
| `SERVER_PORT` | No | `8000` | Server bind port (streamable-http transport). |
| `MEMORY_RECALL_TOKENS_ENABLED` | No | `true` | Enable recall token disambiguation. |
| `MEMORY_RECALL_TOKENS_TOP_K` | No | `5` | Max related traces for token assessment. |
| `MEMORY_RECALL_TOKENS_THRESHOLD` | No | `0.3` | Min similarity for related trace lookup. |
| `MEMORY_RECALL_TOKENS_STRENGTH_THRESHOLD` | No | `0.1` | Min token strength to activate. |
| `MEMORY_RECALL_TOKENS_SCORE_BONUS` | No | `0.1` | Gated additive bonus per token. |
| `MEMORY_RECALL_TOKENS_REINFORCE_BOOST` | No | `0.1` | Strength increment on activation. |
| `MEMORY_RECALL_TOKENS_DECAY_FACTOR` | No | `0.9` | Inactive token decay per consolidation. |
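The recall-token knobs form a gate-and-reinforce loop: tokens above the strength threshold add a score bonus, activated tokens are boosted, and inactive tokens decay at each consolidation. As a rough sketch of those documented semantics (hypothetical helper names, not the server's actual implementation):

```python
def score_with_tokens(base_score: float, token_strengths: list[float],
                      strength_threshold: float = 0.1,
                      score_bonus: float = 0.1) -> float:
    """Add a gated additive bonus per token whose strength clears the threshold."""
    active = [s for s in token_strengths if s >= strength_threshold]
    return base_score + score_bonus * len(active)


def consolidate(strengths: dict[str, float], activated: set[str],
                reinforce_boost: float = 0.1,
                decay_factor: float = 0.9) -> dict[str, float]:
    """Reinforce tokens that activated this cycle; decay the rest."""
    return {
        token: strength + reinforce_boost if token in activated
        else strength * decay_factor
        for token, strength in strengths.items()
    }
```

With the defaults above, a trace with one active and one sub-threshold token gains a single `0.1` bonus, and an unused token's strength shrinks by 10% per consolidation.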

## Client configuration

Add to `.mcp.json` (Claude Code) or `claude_desktop_config.json` (Claude Desktop):

```json
{
  "mcpServers": {
    "memory": {
      "command": "uvx",
      "args": ["recollect-mcp"],
      "env": {
        "MEMORY_USER_ID": "your-user-id",
        "DATABASE_URL": "postgresql://user@localhost:5432/dbname",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}
```

## Providers

The server requires an LLM provider for concept extraction (entities, tags, persona facts). Set `MEMORY_LLM_PROVIDER` to select explicitly, or omit it for auto-fallback (Anthropic -> OpenAI-compat -> fail). The pydantic-ai provider must be selected explicitly.

| `MEMORY_LLM_PROVIDER` | Provider class | Required env vars |
|------------------------|---------------|-------------------|
| `anthropic` | `AnthropicProvider` | `ANTHROPIC_API_KEY` |
| `openai` | `OpenAIProvider` | `OPENAI_API_KEY` |
| `openai-compat` | `OpenAICompatProvider` | `OLLAMA_MODEL` + `OLLAMA_BASE_URL` |
| (empty/unset) | Auto-fallback | Tries Anthropic, then OpenAI-compat |
| `pydantic-ai` | `PydanticAIProvider` | `PYDANTIC_AI_MODEL` + provider-specific vars (see below) |
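The selection logic in the table can be sketched as follows. This mirrors the documented behavior, not the server's actual code; note that `OLLAMA_MODEL` and `OLLAMA_BASE_URL` have defaults, so the OpenAI-compat branch is the last resort before a runtime failure:

```python
def pick_provider(env: dict[str, str]) -> str:
    """Resolve the LLM provider: explicit choice wins, otherwise
    fall back Anthropic -> OpenAI-compat."""
    explicit = env.get("MEMORY_LLM_PROVIDER", "").strip()
    if explicit:
        return explicit  # "pydantic-ai" is only ever reached this way
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    # OLLAMA_MODEL / OLLAMA_BASE_URL have defaults, so this path is
    # always attempted last; extraction fails at runtime if no model answers.
    return "openai-compat"
```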

### Anthropic (default)

```json
"env": {
  "MEMORY_USER_ID": "your-user-id",
  "MEMORY_LLM_PROVIDER": "anthropic",
  "DATABASE_URL": "postgresql://user@localhost:5432/dbname",
  "ANTHROPIC_API_KEY": "sk-ant-...",
  "HF_HUB_OFFLINE": "1"
}
```

### OpenAI

```json
"env": {
  "MEMORY_USER_ID": "your-user-id",
  "MEMORY_LLM_PROVIDER": "openai",
  "DATABASE_URL": "postgresql://user@localhost:5432/dbname",
  "OPENAI_API_KEY": "sk-...",
  "OPENAI_MODEL": "gpt-5-mini",
  "HF_HUB_OFFLINE": "1"
}
```

### Ollama / OpenAI-compatible

```json
"env": {
  "MEMORY_USER_ID": "your-user-id",
  "MEMORY_LLM_PROVIDER": "openai-compat",
  "DATABASE_URL": "postgresql://user@localhost:5432/dbname",
  "OLLAMA_MODEL": "qwen3:8b",
  "OLLAMA_BASE_URL": "http://localhost:11434/v1",
  "HF_HUB_OFFLINE": "1"
}
```

### pydantic-ai (multi-provider)

Routes calls through pydantic-ai's Agent abstraction. The model string uses pydantic-ai format (`provider:model`). Provider-specific credentials (e.g., `ANTHROPIC_API_KEY`, `OLLAMA_BASE_URL`) are read from the environment by the underlying provider. Provider-specific model overrides like `OLLAMA_MODEL` or `ANTHROPIC_MODEL` are not used -- the model is embedded in `PYDANTIC_AI_MODEL`.
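The `provider:model` split happens at the first colon, which matters for Ollama tags that contain colons themselves. A small illustrative parser (a hypothetical helper, not part of the SDK):

```python
def parse_pydantic_ai_model(spec: str) -> tuple[str, str]:
    """Split a 'provider:model' spec at the first colon, so model names
    that contain colons (e.g. 'ollama:qwen3:8b') survive intact."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model
```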

Ollama via pydantic-ai:

```json
"env": {
  "MEMORY_USER_ID": "your-user-id",
  "MEMORY_LLM_PROVIDER": "pydantic-ai",
  "DATABASE_URL": "postgresql://user@localhost:5432/dbname",
  "PYDANTIC_AI_MODEL": "ollama:ministral-3",
  "OLLAMA_BASE_URL": "http://localhost:11434/v1",
  "HF_HUB_OFFLINE": "1"
}
```

Anthropic via pydantic-ai:

```json
"env": {
  "MEMORY_USER_ID": "your-user-id",
  "MEMORY_LLM_PROVIDER": "pydantic-ai",
  "DATABASE_URL": "postgresql://user@localhost:5432/dbname",
  "PYDANTIC_AI_MODEL": "anthropic:claude-haiku-4-5-20251001",
  "ANTHROPIC_API_KEY": "sk-ant-...",
  "HF_HUB_OFFLINE": "1"
}
```

Reasoning models (Qwen3, DeepSeek-R1) use thinking tokens that count against the extraction budget. If `remember` returns extraction errors, raise the budget by setting `MEMORY_EXTRACTION_MAX_TOKENS` or by increasing `max_tokens` in a custom config file:

```toml
# memory.toml
[extraction]
max_tokens = 8192
pydantic_ai_model = "ollama:ministral-3"   # pydantic-ai provider:model format
```

### Custom configuration

Mount a `memory.toml` in the server's working directory, or set `MEMORY_CONFIG`:

```json
"env": {
  "MEMORY_USER_ID": "your-user-id",
  "DATABASE_URL": "postgresql://user@localhost:5432/dbname",
  "ANTHROPIC_API_KEY": "sk-ant-...",
  "MEMORY_CONFIG": "/path/to/memory.toml"
}
```

## Tools

| Tool | Parameters | Description |
|------|------------|-------------|
| `remember` | `content: str` | Store an experience. LLM extracts entities, concepts, significance, and persona facts. |
| `recall` | `query: str` | Retrieve relevant memories. Returns persona facts as context followed by matching traces. |
| `pin` | `content: str` | Promote a statement to a permanent persona fact (allergies, preferences, relationships). |
| `unpin` | `fact_id: str` | Remove a persona fact. |
| `forget` | `trace_id: str` | Delete a memory trace. |

## Resources

| URI | Description |
|-----|-------------|
| `memory://primer` | Relational graph of persona facts. Read at conversation start for user context. |
| `memory://facts` | All active persona facts with confidence scores and timestamps. |
| `memory://health` | Server and database health status. |

## Requirements

- Python 3.12.4+
- PostgreSQL 17 with pgvector

## License

MIT
