Metadata-Version: 2.4
Name: cadar
Version: 0.1.10
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Operating System :: OS Independent
License-File: LICENSE
Summary: Canonicalization and Darija Representation - Bidirectional transliteration for Darija
Keywords: darija,arabic,transliteration,moroccan,nlp,darija-moroccan,romanization
Author: Ouail LAAMIRI
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Bug Tracker, https://github.com/Oit-Technologies/CaDaR/issues
Project-URL: Changelog, https://github.com/Oit-Technologies/CaDaR/blob/main/CHANGELOG.md
Project-URL: Documentation, https://oit-technologies.github.io/CaDaR/
Project-URL: Homepage, https://github.com/Oit-Technologies/CaDaR
Project-URL: Repository, https://github.com/Oit-Technologies/CaDaR

# CaDaR: Moroccan Darija Transliteration

[![PyPI version](https://badge.fury.io/py/cadar.svg)](https://pypi.org/project/cadar/)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**High-performance bidirectional transliteration for Darija (Moroccan Arabic)**

CaDaR converts Moroccan Darija text between Arabic script and Latin script in both directions. It distinguishes long from short vowels and applies Darija-specific writing conventions rather than generic Arabic ones.

## Features

- 🔄 **Bidirectional**: Arabic ↔ Latin transliteration
- 🎯 **Smart Vowels**: Distinguishes long vowels (aa, ee, ii, oo, uu) from short ones (a, e, i)
- 🇲🇦 **Darija Mode**: Readable informal writing style
- 🚀 **Fast**: Rust core with Python bindings
- ✨ **Standardization**: Normalize mixed-script text

## Installation

```bash
pip install cadar
```

## Quick Start

```python
import cadar

# Latin to Arabic
cadar.bizi2ara("salam 3likom")
# Output: "سلام عليكم"

# Arabic to Latin
cadar.ara2bizi("كيفاش داير؟")
# Output: "kifash dayer?"

# Smart vowel handling (v0.1.9+)
cadar.bizi2ara("imken")   # → "يمكن" (short 'e' skipped)
cadar.bizi2ara("salaam")  # → "سالام" (long 'aa' preserved)
cadar.bizi2ara("daba")    # → "دابا" (short 'a' written in Darija mode)
```

## API

### Simple Functions

```python
cadar.bizi2ara(text, darija="Ma")   # Latin → Arabic
cadar.ara2bizi(text, darija="Ma")   # Arabic → Latin
cadar.ara2ara(text, darija="Ma")    # Standardize Arabic
cadar.bizi2bizi(text, darija="Ma")  # Standardize Latin
```

### CaDaR Class

```python
import cadar

# Create one processor and reuse it across calls
processor = cadar.CaDaR(darija="Ma")
processor.bizi2ara("salam")  # Latin → Arabic
processor.ara2bizi("سلام")   # Arabic → Latin
```
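
When input may arrive in either script, a small dispatcher can pick the conversion direction. The sketch below is an illustration under assumptions, not part of CaDaR: `is_arabic_script` and `to_other_script` are hypothetical helpers, the check only covers the basic Arabic Unicode block (U+0600 to U+06FF), and the `processor` argument is assumed to be any object exposing the `ara2bizi` and `bizi2ara` methods shown above.

```python
def is_arabic_script(text):
    """Return True if any character falls in the basic Arabic block.

    Assumption: U+0600..U+06FF is enough for typical Darija text.
    """
    return any("\u0600" <= ch <= "\u06FF" for ch in text)


def to_other_script(processor, text):
    """Convert text to the opposite script.

    `processor` is assumed to expose `ara2bizi` and `bizi2ara`,
    for example an instance created with cadar.CaDaR(darija="Ma").
    """
    if is_arabic_script(text):
        return processor.ara2bizi(text)
    return processor.bizi2ara(text)
```

With the real library this could be called as `to_other_script(cadar.CaDaR(darija="Ma"), "salam")`. Text that mixes both scripts is treated as Arabic here, which may not be what you want for mixed-script input.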

## Smart Vowel Handling (v0.1.9+)

CaDaR treats long and short vowels differently so that the Arabic output reads like natural Darija writing:

**Long vowels (aa, ee, ii, oo, uu)** → Single Arabic letter:
- `"salaam"` → `"سالام"` (long aa)
- `"kaatb"` → `"كاتب"` (long aa in middle)

**Short vowels** → Context-aware:
- `"imken"` → `"يمكن"` (middle 'e' skipped)
- `"salam"` → `"سالام"` (middle 'a' written - Darija mode)
- `"daba"` → `"دابا"` (middle 'a' written)
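
To illustrate the long/short distinction only, here is a minimal sketch that splits a Latin-script word into tokens and labels doubled vowels as long. It is not CaDaR's implementation (the actual logic lives in the Rust core and is context-aware). `classify_vowels` is a hypothetical helper, and treating every non-vowel character as a consonant is a simplifying assumption.

```python
import re

# Doubled vowels are listed first so "aa" is matched before a single "a".
_TOKEN = re.compile(r"aa|ee|ii|oo|uu|[aeiou]|[^aeiou]")


def classify_vowels(word):
    """Label each token of a Latin-script word as long, short, or consonant."""
    labelled = []
    for token in _TOKEN.findall(word.lower()):
        if len(token) == 2:
            kind = "long"        # only doubled vowels produce 2-char tokens
        elif token in "aeiou":
            kind = "short"
        else:
            kind = "consonant"   # assumption: digits like "3" count here too
        labelled.append((token, kind))
    return labelled


print(classify_vowels("salaam"))
# [('s', 'consonant'), ('a', 'short'), ('l', 'consonant'),
#  ('aa', 'long'), ('m', 'consonant')]
```

The sketch only finds where long and short vowels occur. Deciding whether a given short vowel is written or skipped in Arabic script, as in the examples above, is the context-aware part that CaDaR handles.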

**Bidirectional preservation**:
- `"سلام"` → `"slam"` → `"سلام"` ✓ Perfect round-trip
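
Round-trip behavior is easy to check on your own vocabulary. The sketch below assumes nothing about CaDaR beyond two callables, one per direction. `failed_round_trips` is a hypothetical helper, and in practice `to_latin` and `to_arabic` would be `cadar.ara2bizi` and `cadar.bizi2ara`.

```python
def failed_round_trips(words, to_latin, to_arabic):
    """Return (original, latin, back) for every word that does not survive
    an Arabic → Latin → Arabic round trip unchanged."""
    failures = []
    for word in words:
        latin = to_latin(word)
        back = to_arabic(latin)
        if back != word:
            failures.append((word, latin, back))
    return failures
```

With the library installed, `failed_round_trips(my_words, cadar.ara2bizi, cadar.bizi2ara)` returns an empty list when every word in `my_words` is preserved.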

## Use Cases

- **Chat Apps**: Support users in both scripts
- **Search**: Match queries regardless of script
- **NLP**: Normalize Darija text for ML
- **Data Processing**: Standardize mixed-script datasets
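
As a rough sketch of the search use case, the index below stores every token under a script-independent key, so a query in either script hits the same entries. `build_index`, `search`, and the `normalize` callable are hypothetical names introduced for illustration. `normalize` is assumed to map a token in either script to one canonical spelling.

```python
def build_index(documents, normalize):
    """Map each normalized token to the set of document ids containing it.

    `documents` is a dict of {doc_id: text}; `normalize` is any callable
    that returns a script-independent key for a token (an assumption).
    """
    index = {}
    for doc_id, text in documents.items():
        for token in text.split():
            index.setdefault(normalize(token), set()).add(doc_id)
    return index


def search(index, query, normalize):
    """Return ids of documents that contain every token of the query."""
    ids = None
    for token in query.split():
        hits = index.get(normalize(token), set())
        ids = hits if ids is None else ids & hits
    # Copy so callers cannot mutate the sets stored in the index.
    return set(ids) if ids else set()
```

One way to build `normalize` is to send Arabic-script tokens through `cadar.ara2bizi` and Latin-script tokens through `cadar.bizi2bizi`. Whether both paths always yield identical spellings is not confirmed here, so it is worth verifying on a sample of your data first.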

## Documentation

- **Full Docs**: [https://oit-technologies.github.io/CaDaR/](https://oit-technologies.github.io/CaDaR/)
- **GitHub**: [https://github.com/Oit-Technologies/CaDaR](https://github.com/Oit-Technologies/CaDaR)
- **Issues**: [GitHub Issues](https://github.com/Oit-Technologies/CaDaR/issues)

## License

MIT License - See [LICENSE](https://github.com/Oit-Technologies/CaDaR/blob/main/LICENSE)

---

Made with ❤️ for the Darija community

