Metadata-Version: 2.4
Name: alta-acceleration
Version: 0.3.1
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: License :: OSI Approved :: MIT License
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing :: Linguistic
Summary: High-performance Rust acceleration for ALTA Tokenizer (Kinyarwanda BPE)
Keywords: tokenizer,bpe,kinyarwanda,rust,nlp
Author-email: Yali Labs <yalilabs24@gmail.com>
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Issues, https://github.com/Nschadrack/Kin-Tokenizer/issues
Project-URL: Repository, https://github.com/Nschadrack/Kin-Tokenizer

# ALTA Acceleration

High-performance Rust acceleration for [ALTA Tokenizer](https://pypi.org/project/altatk/) — a BPE tokenizer for Kinyarwanda.

This package provides optional Rust-powered functions for faster BPE training, encoding, and text preprocessing. It is automatically detected and used by `altatk` when installed.

## Installation

```bash
pip install alta-acceleration
```

Or install with the main tokenizer:

```bash
pip install altatk
```

## What it accelerates

| Operation | Function | Speedup |
|-----------|----------|---------|
| BPE merge loop | `rust_bpe_train_loop`, `RustBpeTrainer.step` | 3–5x |
| File streaming + pretokenization | `rust_stream_pretokenize_file` | Memory-efficient |
| Text encoding | `rust_encode_text_raw_batch` | 4–5x |
| Text preprocessing | `rust_preprocess_text` | 3–5x |
| Pair counting | `rust_count_pairs` | Parallel via Rayon |

## Usage

You don't need to import this package directly. Once installed, `altatk` detects it automatically and uses the Rust-accelerated functions where available. If not installed, the tokenizer falls back to pure Python — no functionality is lost.

## License

MIT

