Metadata-Version: 2.4
Name: vezilka-schemas
Version: 0.1.5
Summary: Pydantic models for the scraped data storage for the Vezilka Project
Author-email: Daniel Ilievski <daniel.ilievski.2@students.finki.ukim.mk>
License-Expression: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Dynamic: license-file

# Vezilka Schemas

Pydantic models for managing structured scraped data in the Vezilka project.

## Features

- Type-safe models using Pydantic v2
- Built-in validation for data integrity
- Easy serialization to JSON and dictionaries
- Designed for MongoDB, APIs, and RAG pipelines

## Package Structure

```
vezilka-schemas/
├── __init__.py              # Package exports
└── models.py                # Core Pydantic models
```

## Installation

```bash
pip install vezilka-schemas
```

For development installation and running tests, see [QUICKSTART.md](QUICKSTART.md).

## Models

### Record

Represents a single scraped item.

**Fields:**
- `id`: Unique identifier
- `text`: Full content text
- `type`: Content type (`RecordType`)
- `last_modified_at`: Last modification timestamp
- `meta`: Metadata object (`RecordMeta`)

### RecordMeta

Metadata associated with a record.

**Fields:**
- `source`: Source identifier (domain, filename, etc.)
- `url`: Original URL (optional)
- `tags`: List of tags (optional)
- `labels`: Additional labels (optional)
- `scraped_at`: Scraping timestamp

### RecordType

Enum defining record types:
- `NARRATIVE`: Articles, stories, documentation
- `HUMAN`: Speeches, transcripts, interviews
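The field lists above could map onto Pydantic v2 models roughly as follows. This is a minimal sketch, not the package's actual source: the `"narrative"` enum value, the `Optional[str]` type for `url`, the `List[str]` type for `tags` and `labels`, and the defaults are all assumptions made for illustration, and validation constraints are left out.

```python
from datetime import datetime
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel, Field


class RecordType(str, Enum):
    NARRATIVE = "narrative"  # assumed value; only "human" appears in this README
    HUMAN = "human"


class RecordMeta(BaseModel):
    source: str
    url: Optional[str] = None  # optional per the field list; exact type is assumed
    tags: List[str] = Field(default_factory=list)
    labels: List[str] = Field(default_factory=list)  # element type is assumed
    scraped_at: datetime


class Record(BaseModel):
    id: str
    text: str
    type: RecordType
    last_modified_at: datetime
    meta: RecordMeta


# Pydantic coerces ISO 8601 strings and enum values while validating.
record = Record.model_validate(
    {
        "id": "demo_1",
        "text": "Example text",
        "type": "human",
        "last_modified_at": "2026-01-17T01:40:11",
        "meta": {"source": "example.pdf", "scraped_at": "2026-01-17T01:40:11"},
    }
)
print(record.type, record.meta.tags, record.last_modified_at.year)
```

`model_validate` is the standard Pydantic v2 entry point for building a model from a dictionary; the real package may layer its own helpers on top of it.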

## Usage

### Creating a Record

```python
from vezilka_schemas import Record, RecordMeta, RecordType
from datetime import datetime

# Create metadata
meta = RecordMeta(
    source="mk.wikipedia.org",
    url="https://mk.wikipedia.org/wiki/Ѓаваткол",
    tags=["Историско-географски области", "Битола"],
    labels=[],
    scraped_at=datetime.now(),
)

# Create record
record = Record(
    id="wiki_1068030",
    text="Ѓаваткол е историско-географска област...",
    type=RecordType.NARRATIVE,
    last_modified_at=datetime.now(),
    meta=meta
)

# Serialize to JSON
json_str = record.to_json()

# Convert to dictionary
data_dict = record.to_dict()
```

### Creating from Dictionary

```python
from vezilka_schemas import Record

data = {
    "id": "speech_9f8e7d6c5b4a",
    "text": "Говорник Африм Гаши: Почитувани пратеници...",
    "type": "human",
    "last_modified_at": "2026-02-11",
    "meta": {
        "source": "stenogram_session_12.pdf",
        "url": "",
        "tags": [],
        "labels": [],
        "scraped_at": "2026-01-17T01:40:11"
    }
}

record = Record.from_dict(data)
```

### Validation

The models validate their input on construction. For example, an empty `id` or `text` raises a Pydantic `ValidationError`.
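A validation failure can be caught and inspected as sketched below. To keep the snippet self-contained it uses a hypothetical stand-in model, `RecordStub`, instead of the real `Record`; the `min_length=1` constraint is an assumption about how empty strings might be rejected, not a statement about the package's implementation.

```python
from pydantic import BaseModel, Field, ValidationError


class RecordStub(BaseModel):
    # Hypothetical stand-in; min_length=1 is an assumed way to reject empty strings.
    id: str = Field(min_length=1)
    text: str = Field(min_length=1)


failed_fields = []
try:
    RecordStub(id="", text="")
except ValidationError as exc:
    # Each error entry carries the field path in "loc" and a readable "msg".
    for error in exc.errors():
        failed_fields.append(error["loc"][0])
        print(error["loc"], error["msg"])
```

With the real package, the same `try`/`except ValidationError` pattern should apply when constructing a `Record`.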

## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a pull request.
