Metadata-Version: 2.4
Name: underthesea_core
Version: 2.0.0
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Summary: Underthesea Core
Home-Page: https://github.com/undertheseanlp/underthesea/tree/main/extensions/underthesea_core
Author-email: Vu Anh <anhv.ict91@gmail.com>
License: GPL-3.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/undertheseanlp/underthesea/tree/main/extensions/underthesea_core
Project-URL: Repository, https://github.com/undertheseanlp/underthesea/

# Underthesea Core

Underthesea Core is a powerful extension of the popular natural language processing library Underthesea, which includes a range of efficient data preprocessing tools and machine learning models for training. Built with Rust for optimal performance, Underthesea Core offers fast processing speeds and is easy to implement, with Python bindings for seamless integration into existing projects. This extension is an essential tool for developers looking to build high-performance NLP systems that deliver accurate and reliable results.

## Usage

CRFFeaturizer

```python
>>> from underthesea_core import CRFFeaturizer
>>> features = ["T[-1]", "T[0]", "T[1]"]
>>> dictionary = set(["sinh viên"])
>>> featurizer = CRFFeaturizer(features, dictionary)
>>> sentences = [[["sinh", "X"], ["viên", "X"], ["đi", "X"], ["học", "X"]]]
>>> featurizer.process(sentences)
[[['T[-1]=BOS', 'T[0]=sinh', 'T[1]=viên'],
  ['T[-1]=sinh', 'T[0]=viên', 'T[1]=đi'],
  ['T[-1]=viên', 'T[0]=đi', 'T[1]=học'],
  ['T[-1]=đi', 'T[0]=học', 'T[1]=EOS']]]
```
