Metadata-Version: 2.4
Name: polars-whichlang
Version: 0.1.2
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: General
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Typing :: Typed
Requires-Dist: polars==1.*
License-File: LICENSE
Summary: Language identification plugin for polars
Author-email: Rob Malouf <rmalouf@sdsu.edu>
License-Expression: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: repository, https://github.com/rmalouf/polars_whichlang

# polars-whichlang

[![PyPI version](https://img.shields.io/pypi/v/polars-whichlang.svg)](https://pypi.org/project/polars-whichlang/)


This Polars plugin is a wrapper for [whichlang](https://github.com/quickwit-oss/whichlang),
a very fast and reasonably accurate language identification library written in Rust.

It currently supports the following languages (ISO 639-3 codes in parentheses):

- Arabic (ara)
- Dutch (nld)
- English (eng)
- French (fra)
- German (deu)
- Hindi (hin)
- Italian (ita)
- Japanese (jpn)
- Korean (kor)
- Mandarin (cmn)
- Portuguese (por)
- Russian (rus)
- Spanish (spa)
- Swedish (swe)
- Turkish (tur)
- Vietnamese (vie)

## Installation

```shell
pip install polars-whichlang
```

## Examples

```python
import polars as pl
from polars_whichlang import detect_lang

df = pl.DataFrame(
    {
        "index": [1, 2, 3, 4],
        "text": [
            "This is a test.", 
            "Đây là một bài kiểm tra.", 
            "Dies ist ein Test", 
            "这是一个测试"
        ],
    }
)

print(df.with_columns(detect_lang("text").alias("lang")))
```

```
shape: (4, 3)
┌───────┬──────────────────────────┬──────┐
│ index ┆ text                     ┆ lang │
│ ---   ┆ ---                      ┆ ---  │
│ i64   ┆ str                      ┆ str  │
╞═══════╪══════════════════════════╪══════╡
│ 1     ┆ This is a test.          ┆ eng  │
│ 2     ┆ Đây là một bài kiểm tra. ┆ vie  │
│ 3     ┆ Dies ist ein Test        ┆ deu  │
│ 4     ┆ 这是一个测试               ┆ cmn  │
└───────┴──────────────────────────┴──────┘
```

