Metadata-Version: 2.1
Name: iranlowo
Version: 0.0.0
Summary: Utility package for analysis & (pre)processing of Yorùbá text
Home-page: https://github.com/Niger-Volta-LTI/iranlowo
Author: Ruoho Ruotsi
Author-email: ruoho.ruotsi@gmail.com
License: MIT
Download-URL: https://github.com/Niger-Volta-LTI/iranlowo/archive/vNone.tar.gz
Description: [![Build Status](https://travis-ci.com/ruohoruotsi/iranlowo.svg?token=DjfQAQyyoxFCdeCmWju3&branch=master)](https://travis-ci.com/ruohoruotsi/iranlowo)
        
        # Ìrànlọ́wọ́
        Ìrànlọ́wọ́ is a set of utilities to analyze and process Yorùbá text for NLP tasks. The initial focus is on supporting diacritic restoration and machine translation.
        
        ## Features
        
        ### ADR (automatic diacritic restoration) tools
        * Strip all diacritics from word-types
        * Verify that text is NFC or NFD
        * Canonicalize a corpus (from MS Word or elsewhere) &rarr; NFC
        * Split long sentences on certain characters like `;`, `:`, etc.
        * Compute a score of diacritic ambiguity in a given corpus
        * Find all variants of each word-type in a given corpus
        * Automatically restore correct diacritics using a pre-trained model
        * Partially strip diacritics from word-types
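        
        The stripping, normalization-check, and variant-finding ideas above can be sketched with nothing but the Python standard library. This is a minimal illustration, not the package's own API: the function names `strip_diacritics`, `strip_tone_marks`, `is_nfc`, and `variants_by_base_form` are hypothetical, and reading "partial" stripping as "drop tone marks, keep under-dots" is an assumption.
        
        ```python
        import unicodedata
        from collections import defaultdict
        
        # Assumption: "tone marks" here means the combining grave and acute accents.
        TONE_MARKS = {"\u0300", "\u0301"}
        
        
        def strip_diacritics(text):
            """Remove every combining mark (tone marks and under-dots)."""
            decomposed = unicodedata.normalize("NFD", text)
            kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
            return unicodedata.normalize("NFC", kept)
        
        
        def strip_tone_marks(text):
            """Remove only the tone marks; under-dots (U+0323) are kept."""
            decomposed = unicodedata.normalize("NFD", text)
            kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
            return unicodedata.normalize("NFC", kept)
        
        
        def is_nfc(text):
            """Return True if the text is already in NFC form."""
            return unicodedata.normalize("NFC", text) == text
        
        
        def variants_by_base_form(tokens):
            """Group NFC-normalized tokens by their fully stripped form."""
            variants = defaultdict(set)
            for token in tokens:
                token = unicodedata.normalize("NFC", token)
                variants[strip_diacritics(token)].add(token)
            return dict(variants)
        ```
        
        A stripped form that maps to more than one variant is a word-type whose diacritics cannot be restored without context, which is the kind of ambiguity the scoring tools measure.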
        
        ### Ready to use webpage scrapers
        * Bíbélì Mímọ́
        * Yoruba Bible - Bible Society of Nigeria
        * Yorùbá Blog
        * BBC Yorùbá
        
        ### Corpus analysis tools
        * Dataset scoring (proximity to correctly diacritized text, LM perplexity, KL divergence)
        * Dataset character distribution
        * Dataset ambiguity statistics &rarr; Lexdif, etc.
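        
        As a rough illustration of two of the statistics listed above, the sketch below computes a character distribution and the KL divergence between two such distributions. It uses only the standard library; the names `char_distribution` and `kl_divergence` and the `epsilon` floor for unseen characters are assumptions made for this example, not the package's implementation.
        
        ```python
        import math
        import unicodedata
        from collections import Counter
        
        
        def char_distribution(text):
            """Relative frequency of each non-whitespace character, after NFC normalization."""
            text = unicodedata.normalize("NFC", text)
            counts = Counter(ch for ch in text if not ch.isspace())
            total = sum(counts.values())
            return {ch: n / total for ch, n in counts.items()}
        
        
        def kl_divergence(p, q, epsilon=1e-9):
            """KL(p || q) in bits; characters missing from q get the small floor epsilon."""
            return sum(
                p_val * math.log2(p_val / q.get(ch, epsilon))
                for ch, p_val in p.items()
                if p_val > 0
            )
        ```
        
        Comparing the distribution of a scraped corpus against that of a trusted, fully diacritized reference gives a quick signal of how far the corpus drifts from correct orthography. The `epsilon` floor is a crude stand-in for proper smoothing.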
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Software Development
Classifier: Topic :: Utilities
Description-Content-Type: text/markdown
