Metadata-Version: 2.4
Name: wikipedia-article-transform
Version: 0.4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Rust
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Summary: Extract clean text from Wikipedia article HTML using a Rust core parser
Keywords: wikipedia,html,text-extraction,rust,pyo3
Author: Santhosh Thottingal
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Repository, https://github.com/santhoshtr/wikipedia-article-transform

# wikipedia-article-transform (Python)

Python bindings for the Rust `wikipedia-article-transform` library.

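## Install (from source)

Building from source requires a Rust toolchain. Run the commands below inside an active virtual environment, since `maturin develop` installs the built package into it.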

```sh
pip install maturin
maturin develop --release
```

## Library usage

```python
from wikipedia_article_transform import fetch_article_html, extract

html = fetch_article_html("en", "Rust_(programming_language)")
text = extract(html, format="plain", language="en")
print(text)
```
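The titles in these examples use the underscore form found in Wikipedia URLs. As a minimal sketch, the two calls above can be wrapped so that callers may pass a title written with spaces. `to_page_key` and `article_text` are hypothetical helpers, not part of the library, and the sketch assumes that `extract` returns a string when `format="plain"`.

```python
def to_page_key(title: str) -> str:
    """Convert a title written with spaces to the underscore form used above.

    Hypothetical helper for illustration; not part of the library.
    """
    return title.strip().replace(" ", "_")


def article_text(language: str, title: str) -> str:
    """Fetch an article and return its plain-text extraction.

    Assumption: extract(..., format="plain") returns a str.
    """
    # Imported here so to_page_key stays usable without the compiled extension.
    from wikipedia_article_transform import fetch_article_html, extract

    html = fetch_article_html(language, to_page_key(title))
    return extract(html, format="plain", language=language)
```

For example, `article_text("en", "Liquid oxygen")` would request the same page as the title `"Liquid_oxygen"`.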

## CLI usage

```sh
wikipedia-article-transform fetch --language en --title "Rust_(programming_language)"
wikipedia-article-transform fetch --language ml --title "കേരളം" --format json
wikipedia-article-transform fetch --language en --title "Liquid_oxygen" --format markdown
```
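The same commands can be driven from a Python script. The sketch below builds the argument list from the flags shown above and runs it with `subprocess`. `build_fetch_command` and `run_fetch` are hypothetical helpers, and the sketch assumes the CLI is on `PATH` and writes its result to standard output as UTF-8.

```python
import subprocess
from typing import List, Optional


def build_fetch_command(language: str, title: str, fmt: Optional[str] = None) -> List[str]:
    """Build the argv list for the `fetch` subcommand (hypothetical helper)."""
    cmd = ["wikipedia-article-transform", "fetch", "--language", language, "--title", title]
    if fmt is not None:
        # Only the flags shown in the examples above are used.
        cmd += ["--format", fmt]
    return cmd


def run_fetch(language: str, title: str, fmt: Optional[str] = None) -> str:
    """Run the CLI and return what it prints.

    Assumption: the result is written to stdout, encoded as UTF-8.
    """
    result = subprocess.run(
        build_fetch_command(language, title, fmt),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with titles that contain parentheses or non-ASCII characters, such as `"കേരളം"`.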

