Metadata-Version: 2.4
Name: ai-parrot-loaders
Version: 0.1.6
Summary: Document loaders for AI-Parrot RAG pipelines
Author-email: Jesus Lara <jesuslara@phenobarbital.info>
License-Expression: MIT
Project-URL: Homepage, https://github.com/phenobarbital/ai-parrot
Project-URL: Repository, https://github.com/phenobarbital/ai-parrot
Project-URL: Documentation, https://github.com/phenobarbital/ai-parrot/
Keywords: ai,rag,document-loaders,pdf,youtube,agents,asyncio
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: POSIX :: Linux
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Framework :: AsyncIO
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: ai-parrot>=0.24.7
Requires-Dist: decorator>=5
Provides-Extra: youtube
Requires-Dist: pytube==15.0.0; extra == "youtube"
Requires-Dist: youtube_transcript_api==1.0.3; extra == "youtube"
Requires-Dist: yt-dlp>=2026.02.21; extra == "youtube"
Provides-Extra: audio
Requires-Dist: whisperx==3.4.2; extra == "audio"
Requires-Dist: av==16.1.0; extra == "audio"
Requires-Dist: resemblyzer==0.1.4; extra == "audio"
Requires-Dist: pyannote-audio==3.4.0; extra == "audio"
Requires-Dist: pyannote-core==5.0.0; extra == "audio"
Requires-Dist: pyannote-database==5.1.3; extra == "audio"
Requires-Dist: pyannote-metrics==3.2.1; extra == "audio"
Requires-Dist: pyannote-pipeline==3.0.1; extra == "audio"
Requires-Dist: torch-audiomentations==0.12.0; extra == "audio"
Requires-Dist: torch-pitch-shift==1.2.5; extra == "audio"
Requires-Dist: torchmetrics==1.8.2; extra == "audio"
Provides-Extra: pdf
Requires-Dist: paddleocr==3.2.0; extra == "pdf"
Provides-Extra: web
Requires-Dist: beautifulsoup4>=4.12; extra == "web"
Requires-Dist: html2text>=2024.0; extra == "web"
Provides-Extra: ebook
Requires-Dist: ebooklib>=0.19; extra == "ebook"
Provides-Extra: video
Requires-Dist: moviepy==2.2.1; extra == "video"
Requires-Dist: ffmpeg==1.4; extra == "video"
Provides-Extra: document
Requires-Dist: mammoth>=1.11.0; extra == "document"
Provides-Extra: ml
Requires-Dist: torch==2.6.0; extra == "ml"
Requires-Dist: torchaudio==2.6.0; extra == "ml"
Requires-Dist: torchvision==0.21.0; extra == "ml"
Requires-Dist: pytorch-lightning==2.5.5; extra == "ml"
Requires-Dist: pytorch-metric-learning==2.9.0; extra == "ml"
Requires-Dist: nvidia-cudnn-cu12==9.1.0.70; extra == "ml"
Provides-Extra: ml-heavy
Requires-Dist: torch==2.6.0; extra == "ml-heavy"
Requires-Dist: torchaudio==2.6.0; extra == "ml-heavy"
Requires-Dist: numpy<2.2,>=2.1; extra == "ml-heavy"
Requires-Dist: accelerate>=1.1.0; extra == "ml-heavy"
Requires-Dist: bitsandbytes==0.49.2; extra == "ml-heavy"
Requires-Dist: datasets>=3.0.2; extra == "ml-heavy"
Requires-Dist: transformers<=4.51.3,>=4.51.1; extra == "ml-heavy"
Requires-Dist: tensorflow>=2.19.1; extra == "ml-heavy"
Requires-Dist: tf-keras==2.19.0; extra == "ml-heavy"
Requires-Dist: opencv-python==4.10.0.84; extra == "ml-heavy"
Provides-Extra: all
Requires-Dist: ai-parrot-loaders[audio,document,ebook,ml,ml-heavy,pdf,video,web,youtube]; extra == "all"

# AI-Parrot Loaders

**ai-parrot-loaders** provides document loaders for [AI-Parrot](https://pypi.org/project/ai-parrot/) RAG (Retrieval-Augmented Generation) pipelines. Each loader transforms a specific document format into text chunks that can be embedded and searched.

## Installation

```bash
pip install ai-parrot-loaders
```

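Install only the extras you need. Quote the requirement so that shells such as zsh do not treat the brackets as a glob pattern:

```bash
pip install "ai-parrot-loaders[pdf]"
pip install "ai-parrot-loaders[youtube]"
pip install "ai-parrot-loaders[audio]"
pip install "ai-parrot-loaders[web]"

# Several extras at once
pip install "ai-parrot-loaders[pdf,web]"

# Everything
pip install "ai-parrot-loaders[all]"
```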

## Available Extras

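| Extra | Description |
|-------|-------------|
| `pdf` | PDF loading with OCR support (PaddleOCR) |
| `youtube` | YouTube transcript and video download (pytube, youtube-transcript-api, yt-dlp) |
| `audio` | Audio transcription (WhisperX, pyannote) |
| `web` | HTML/web page loading (Beautiful Soup, html2text) |
| `ebook` | EPUB e-book loading (EbookLib) |
| `video` | Video processing (MoviePy, FFmpeg) |
| `document` | Word (`.docx`) document conversion (mammoth) |
| `ml` | PyTorch stack (torch, torchaudio, torchvision, PyTorch Lightning) with the CUDA 12 cuDNN wheel |
| `ml-heavy` | Heavier ML dependencies (Transformers, Accelerate, bitsandbytes, datasets, TensorFlow, OpenCV) |
| `all` | Every extra listed above |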
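
To see which extras are usable in the current environment, the hypothetical helper below probes representative modules for each extra with `importlib.util.find_spec`, which locates a module without importing it. The extra-to-module mapping is an assumption derived from each extra's declared dependencies; it is not an API of `ai-parrot-loaders`.

```python
from importlib.util import find_spec

# Hypothetical mapping: representative import names for each extra's dependencies.
EXTRA_MODULES: dict[str, list[str]] = {
    "pdf": ["paddleocr"],
    "youtube": ["pytube", "youtube_transcript_api", "yt_dlp"],
    "audio": ["whisperx"],
    "web": ["bs4", "html2text"],
    "ebook": ["ebooklib"],
    "video": ["moviepy"],
    "document": ["mammoth"],
}


def missing_extras() -> dict[str, list[str]]:
    """Map each extra to the representative modules that could not be located."""
    missing: dict[str, list[str]] = {}
    for extra, modules in EXTRA_MODULES.items():
        absent = []
        for name in modules:
            try:
                found = find_spec(name) is not None
            except (ImportError, ValueError):
                found = False
            if not found:
                absent.append(name)
        if absent:
            missing[extra] = absent
    return missing


if __name__ == "__main__":
    for extra, absent in missing_extras().items():
        print(f'{extra}: missing {", ".join(absent)} -> pip install "ai-parrot-loaders[{extra}]"')
```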

## Supported Formats

| Loader | Format | Description |
|--------|--------|-------------|
| `TextLoader` | `.txt` | Plain text files |
| `CSVLoader` | `.csv` | CSV files |
| `ExcelLoader` | `.xlsx`, `.xls` | Excel spreadsheets |
| `MSWordLoader` | `.docx` | Microsoft Word documents |
| `HTMLLoader` | `.html` | HTML files |
| `MarkdownLoader` | `.md` | Markdown files |
| `PDFLoader` | `.pdf` | PDF documents |
| `PDFMarkdownLoader` | `.pdf` | PDF to Markdown conversion |
| `PDFTablesLoader` | `.pdf` | PDF table extraction |
| `PowerPointLoader` | `.pptx` | PowerPoint presentations |
| `EpubLoader` | `.epub` | EPUB e-books |
| `WebLoader` | URL | Web pages |
| `YoutubeLoader` | URL | YouTube video transcripts |
| `VimeoLoader` | URL | Vimeo video transcripts |
| `AudioLoader` | `.mp3`, `.wav`, etc. | Audio transcription |
| `VideoLoader` | URL | Video download + transcription |
| `VideoLocalLoader` | `.mp4`, etc. | Local video transcription |
| `DocumentConverterLoader` | multiple | Auto-detect format and convert |
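
The table above implies two routing rules: files are matched by extension, and URLs by host. The sketch below illustrates that idea with a hypothetical `pick_loader_name` function built only from this table; the real `parrot_loaders.factory.get_loader_class` may resolve sources differently, and it defaults `.pdf` to `PDFLoader` even though `PDFMarkdownLoader` and `PDFTablesLoader` also accept PDFs.

```python
from pathlib import PurePath
from urllib.parse import urlparse

# Hypothetical extension -> loader-name mapping, taken from the Supported Formats table.
EXTENSION_LOADERS: dict[str, str] = {
    ".txt": "TextLoader",
    ".csv": "CSVLoader",
    ".xlsx": "ExcelLoader",
    ".xls": "ExcelLoader",
    ".docx": "MSWordLoader",
    ".html": "HTMLLoader",
    ".md": "MarkdownLoader",
    ".pdf": "PDFLoader",  # PDFMarkdownLoader / PDFTablesLoader are alternatives
    ".pptx": "PowerPointLoader",
    ".epub": "EpubLoader",
    ".mp3": "AudioLoader",  # the table says "etc."; only listed suffixes are mapped
    ".wav": "AudioLoader",
    ".mp4": "VideoLocalLoader",
}


def _on_domain(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def pick_loader_name(source: str) -> str:
    """Return the loader name a source would plausibly map to (illustrative only)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        if _on_domain(host, "youtube.com") or host == "youtu.be":
            return "YoutubeLoader"
        if _on_domain(host, "vimeo.com"):
            return "VimeoLoader"
        return "WebLoader"
    suffix = PurePath(source).suffix.lower()
    try:
        return EXTENSION_LOADERS[suffix]
    except KeyError:
        raise ValueError(f"No loader mapped for {suffix or 'files without an extension'}") from None
```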

## Quick Start

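```python
import asyncio

from parrot_loaders.factory import get_loader_class


async def main() -> None:
    # Auto-detect the loader class from the file extension
    LoaderClass = get_loader_class("report.pdf")
    loader = LoaderClass(source="report.pdf")
    documents = await loader.load()  # load() is a coroutine

    for doc in documents:
        print(doc.page_content[:200])  # first 200 characters of each document


asyncio.run(main())
```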
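
Because `load()` is a coroutine, several sources can be loaded concurrently. The sketch below uses `asyncio.gather` and flattens the results in input order. It assumes only that a loader exposes an async `load()` returning documents with `page_content`; `StubLoader` and `Doc` are hypothetical stand-ins so the example runs without the package. With the real library you would pass instances built via `get_loader_class(path)(source=path)`.

```python
import asyncio
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Doc:
    """Hypothetical document type with the page_content attribute used above."""
    page_content: str


class AsyncLoader(Protocol):
    async def load(self) -> list: ...


async def load_all(loaders: Sequence[AsyncLoader]) -> list:
    """Run every loader's load() concurrently and flatten the results in order."""
    batches = await asyncio.gather(*(loader.load() for loader in loaders))
    return [doc for batch in batches for doc in batch]


class StubLoader:
    """Hypothetical loader that simulates I/O with a short sleep."""

    def __init__(self, source: str) -> None:
        self.source = source

    async def load(self) -> list[Doc]:
        await asyncio.sleep(0.01)  # stand-in for file or network I/O
        return [Doc(page_content=f"contents of {self.source}")]


async def main() -> None:
    docs = await load_all([StubLoader("a.txt"), StubLoader("b.md")])
    for doc in docs:
        print(doc.page_content)


if __name__ == "__main__":
    asyncio.run(main())
```

`asyncio.gather` preserves the order of its arguments, so documents come back grouped by source even when some loaders finish earlier than others.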

Or use a specific loader directly:

```python
import asyncio
from parrot_loaders.youtube import YoutubeLoader

async def main():
    loader = YoutubeLoader(source="https://www.youtube.com/watch?v=...")
    return await loader.load()

documents = asyncio.run(main())
```

## Requirements

- Python >= 3.11
- [ai-parrot](https://pypi.org/project/ai-parrot/) >= 0.24.7
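
A hypothetical pre-flight check for these requirements is sketched below. It uses the `ai-parrot>=0.24.7` floor declared in the package's `Requires-Dist` metadata, reads the installed version with `importlib.metadata.version`, and compares only the leading numeric release segment, so pre-release and post-release tags are ignored. For strict PEP 440 comparison, use `packaging.version` instead.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

MIN_PYTHON = (3, 11)
MIN_AI_PARROT = (0, 24, 7)  # from the package metadata: Requires-Dist: ai-parrot>=0.24.7


def version_tuple(text: str) -> tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '0.24.7.post1' -> (0, 24, 7)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"Unparseable version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: tuple[int, ...]) -> bool:
    """Compare release segments after zero-padding them to the same length."""
    parts = version_tuple(installed)
    width = max(len(parts), len(minimum))
    padded = parts + (0,) * (width - len(parts))
    required = minimum + (0,) * (width - len(minimum))
    return padded >= required


def check_requirements() -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python 3.11+ required, found {sys.version.split()[0]}")
    try:
        installed = version("ai-parrot")
    except PackageNotFoundError:
        problems.append("ai-parrot is not installed")
    else:
        if not meets_minimum(installed, MIN_AI_PARROT):
            problems.append(f"ai-parrot>=0.24.7 required, found {installed}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```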

## License

MIT
