Metadata-Version: 2.4
Name: pydstl
Version: 0.1.0
Summary: Distill scattered evidence into structured knowledge documents
Requires-Python: >=3.12
Requires-Dist: pydantic-ai>=0.1
Requires-Dist: pydantic>=2.0
Requires-Dist: sentence-transformers>=3.0
Requires-Dist: sqlite-vec>=0.1
Provides-Extra: dev
Requires-Dist: pre-commit>=4.0; extra == 'dev'
Requires-Dist: pyright>=1.1; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.8; extra == 'dev'
Description-Content-Type: text/markdown

# pydstl

A Python library that turns scattered knowledge — code reviews, docs, notes — into structured skill documents using vector search and LLM distillation.

## Why

Teams accumulate knowledge in scattered places — PR comments, incident notes, design docs, Slack threads. Most of it never gets consolidated into something reusable. `pydstl` automates the distillation: collect evidence from any source, run it through an LLM, and produce a clean markdown skill file that can feed AI coding assistants, onboarding material, or team wikis.

## Install

```bash
uv add pydstl
```

## Quick Start

```python
from pydstl import Dstl

dstllr = Dstl(db_path="knowledge.db", model="google-gla:gemini-3-flash-preview")

# Collect evidence from anywhere
dstllr.add_evidence(
    "Always wrap network calls in try/except to handle timeouts gracefully",
    source={"author": "alice", "origin": "pr-review", "repo": "acme/api"}
)
dstllr.add_evidence(
    "Use structured logging (not print) in production — it makes debugging 10x easier",
    source={"author": "bob", "origin": "incident-retro", "date": "2024-09-15"}
)
dstllr.add_evidence(
    "Prefer composition over inheritance for service classes",
    source={"author": "carol", "origin": "design-doc"}
)

# Distill into a skill document
path = dstllr.distill(topic="python best practices", output_dir="skills")
# -> skills/python-best-practices.md

dstllr.close()
```

## API

Single class, minimal surface:

```python
dstllr = Dstl(db_path="my.db", model="google-gla:gemini-3-flash-preview")
```

| Method | What it does |
|---|---|
| `add_evidence(content, source?)` | Store text + metadata, embed and index for search |
| `retrieve(query, top_k=5)` | Vector similarity search across all evidence |
| `list_evidence(source_filter?)` | List/filter evidence by source metadata fields |
| `distill(topic?, skill_id?, output_dir?, evidence?)` | Synthesize evidence into a markdown skill doc via LLM |
| `consolidate(documents, topic?, skill_id?, output_dir?)` | Merge multiple skill docs into one |
| `edit_skill(skill_id, instruction?, content?, output_dir?)` | LLM-assisted or manual edit of an existing skill |
| `report_outcome(skill_id, success, notes?)` | Link feedback to a skill for future reference |

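A minimal sketch of the remaining methods, based on the signatures in the table above. The `skill_id` slug, the shape of `retrieve`'s return value, and passing retrieved hits via `evidence=` are assumptions for illustration, not guarantees:

```python
from pydstl import Dstl

dstllr = Dstl(db_path="knowledge.db", model="openai:gpt-4o")

# Vector similarity search over all stored evidence (top_k defaults to 5)
hits = dstllr.retrieve("error handling", top_k=3)

# Filter stored evidence by source metadata fields
pr_notes = dstllr.list_evidence(source_filter={"origin": "pr-review"})

# Distill a hand-picked subset of evidence instead of searching by topic
path = dstllr.distill(topic="error handling", evidence=hits, output_dir="skills")

# Record whether the skill held up in practice, for future distillations
dstllr.report_outcome(skill_id="error-handling", success=True, notes="used in onboarding")

dstllr.close()
```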
**Model-agnostic** — uses [pydantic-ai](https://ai.pydantic.dev) under the hood. Pass any supported model string:

```python
Dstl(db_path="my.db", model="google-gla:gemini-3-flash-preview")        # Gemini
Dstl(db_path="my.db", model="openai:gpt-4o")                            # OpenAI
Dstl(db_path="my.db", model="anthropic:claude-sonnet-4-20250514")       # Anthropic
```

Set the corresponding provider env var (`GEMINI_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`).
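For example, for the Gemini model string above (the key value is a placeholder):

```bash
export GEMINI_API_KEY="your-api-key-here"
```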

**Storage** — SQLite + [sqlite-vec](https://github.com/asg017/sqlite-vec) for vector search. No external database needed.

**Embeddings** — local [sentence-transformers](https://sbert.net) (`all-MiniLM-L6-v2`). No API calls for embeddings.

## Development

```bash
uv sync --all-extras
uv run pytest tests/ -v
```

## License

MIT
