Metadata-Version: 2.4
Name: quranic_nlp
Version: 1.2.1
Summary: quarnic nlp
Home-page: https://github.com/language-ml/hadith-quranic_nlp
Author: SMAJY
Author-email: s.m.aref.j@gmail.com
Maintainer: SMAJY
Maintainer-email: s.m.aref.j@gmail.com
License: MIT
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: Arabic
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing
Requires-Python: >=3.7.0, <3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: spacy
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: openpyxl
Requires-Dist: requests
Requires-Dist: clint
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: maintainer
Dynamic: maintainer-email
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

<!-- <h1 align="center">
  <img src="images/dadmatech.jpeg"  width="150"  />
   Dadmatools
</h1> -->

<h2 align="center">QuranicTools: A Python NLP Library for Quranic NLP</h2>

<div align="center">
  <a href="https://pypi.org/project/quranic-nlp/"><img src="https://shields.io/pypi/v/quranic-nlp.svg"></a>
  <a href=""><img src="https://img.shields.io/badge/license-Apache%202-blue.svg"></a>
</div>

<div align="center">
  <h5>
      Part of Speech Tagging
    <span> | </span>
      Dependency Parsing
    <span> | </span>
      Lemmatizer
    <span> | </span>
      Multilingual Search    <br>
    <span> | </span>
      Quranic Extractions        
    <span> | </span>
      Revelation Order
    <span> | </span> <br>
      Embeddings (coming soon)
    <span> | </span>
      Translations    
  </h5>
</div>

# Quranic NLP

Quranic NLP is a computational toolbox to conduct various syntactic and semantic analyses of Quranic verses. The aim is to put together all available resources contributing to a better understanding/analysis of the Quran for everyone.

Contents:

- [Installation](#installation)
- [Pipeline (dep,pos,lem,root)](#pipeline)
- [Format Inputs](#format-inputs)
- [Example](#example)
- [Contributors](#contributors)
- [Contributing](#contributing)

## Installation

### Step 1 — Install the package

```bash
pip install quranic-nlp
```

### Step 2 — Download the data

The library requires data files (~97MB) that are downloaded separately from GitHub Releases:

```bash
quranic_data
```

Or from Python:

```python
from quranic_nlp.data_requirements import download_data
download_data()
```

Data is downloaded once and stored inside the package directory automatically.

## Development Setup

To set up a local development environment:

```bash
git clone https://github.com/language-ml/hadith-quranic_nlp.git
cd hadith-quranic_nlp
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install -e .
quranic_data
```

## Pipeline

The NLP pipeline contains morphological information e.g., Lemmatizer as well as POS Tagger and Dependancy Parser in a `Spacy`-like pipeline.

```python
from quranic_nlp import language

translation_translator = 'fa#1'
pips = 'dep,pos,root,lemma'
nlp = language.Pipeline(pips, translation_translator)
```

[`Doc`](https://spacy.io/api/doc) object has different extensions.
First, there are `sentences` in `doc` referring to the verses.
Second, there are `ayah` in `doc` which is indicate number ayeh in soure.
Third, there are `surah` in `doc` which is indicate name of soure.
Fourth, there are `revelation_order` in `doc` which is indicate order of revelation of the ayeh.
`doc` which is the list of [`Token`](https://spacy.io/api/token) also has its own extensions.
The pips is information to use from quranic_nlp.
The translation_translator is language for translate quran such that language (fa) or language along with \# along with number books.
For see all translate run below code
```python
from quranic_nlp import utils
utils.print_all_translations()
```
Quranic NLP has its own spacy extensions. If related pipeline is not called, that extension cannot be used.

## Format Inputs

There are three ways to format the input:

1. number surah along with \# along with number ayah.
2. name surah along with \# along with number ayah.
3. search text in quran.

Note The last two calls require access to the internet for an API call.

```python
from quranic_nlp import language

translation_translator = 'fa#1'
pips = 'dep,pos,root,lemma'
nlp = language.Pipeline(pips, translation_translator)

doc = nlp('1#1')
doc = nlp('حمد#1')
doc = nlp('رب العالمین')
```

## Example
Two examples are provided below to demonstrate the usage of the library:

First, Displaying data from Surah Al-Fatiha, Verse 1:
```python

first_doc = nlp('1#1')
```


Second, Displaying data from Surah Aal-i-Imran, Verse 200:
```python

second_doc = nlp('3#200')
```

We have two functional sections that can be used with any input:

1. Verse Information: This section provides detailed information about a specific verse in the Quran. 
The information related to a verse is structured as follows:

    * Verse Text and Meaning: The text of the verse is provided along with its meaning or translation.
    * Similar Verses: Similar verses are included, following the format Surah#Verse, along with the name of the verse. These verses share similarities in content or theme.
    * Verse Order: The order of the verse within the Surah is mentioned.
    Revelation Order: The chronological order of the revelation of the verse is specified.

First, Displaying data from Surah Al-Fatiha, Verse 1:
```python
print(first_doc)
```
```python
بِسْمِ اللَّهِ الرَّحْمَنِ الرَّحِیمِ 
```
```python
print(first_doc._.text)
```
```python
بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ
```
```python
print(first_doc._.surah)
```
```python
فاتحه
```
```python
print(first_doc._.ayah)
```
```python
1
```
```python
print(first_doc._.revelation_order)
```
```python
63
```
```python
print(first_doc._.translations)
```
```python
ستايش خدا را كه پروردگار جهانيان است.
```
```python
print(first_doc._.sim_ayahs)
```
```python
['27#30', '1#3', '55#1', '41#2', '2#163', '59#22', '11#41', '12#92', '7#151', '24#20', '44#42', '6#118', '36#5', '26#191', '26#175', '26#159', '26#140', '26#122', '26#104', '26#68', '26#9', '26#217', '20#5', '19#88', '21#83', '24#10', '25#60', '15#49', '12#64', '19#93', '2#192', '4#96', '4#106', '30#5', '32#6', '9#99', '6#121', '78#37', '19#91', '22#34', '2#218', '19#85', '41#32', '22#28', '12#98', '19#96', '20#8', '19#18', '5#4', '2#64', '39#53', '22#40', '5#74', '3#157', '10#58', '52#28', '22#36', '19#78', '43#84', '50#33', '6#119', '5#98', '17#110', '19#87', '21#26', '9#27', '36#58', '49#10', '26#5', '16#18', '9#104', '7#180', '6#138', '3#129', '23#118', '7#49', '67#29', '20#109', '27#46', '19#92', '43#36', '67#28', '25#59', '19#69', '2#37', '21#112', '43#20', '24#14', '69#52', '56#96', '56#74', '11#73', '3#132', '24#5', '3#89', '42#5', '43#45', '36#15', '57#28', '48#14']
```
Second, Displaying data from Surah Aal-i-Imran, Verse 200:
```python
print(second_doc)
```
```python
يَا أَیُّهَا الَّذِينَ  آمَن اِصْبِر وَ صَابِر وَ رَابِط وَ اِتَّق اللَّهَ لَعَلَّکُمْ تُفْلِحُو اُو اُو اُو اُو اُو نَ 
```
```python
print(second_doc._.text)
```
```python
يَا أَيُّهَا الَّذِينَ آمَنُوا اصْبِرُوا وَصَابِرُوا وَرَابِطُوا وَاتَّقُوا اللَّهَ لَعَلَّكُمْ تُفْلِحُونَ
```
```python
print(second_doc._.surah)
```
```python
آل عمران
```
```python
print(second_doc._.ayah)
```
```python
200
```
```python
print(second_doc._.revelation_order)
```
```python
89
```
```python
print(second_doc._.translations)
```
```python
اى كسانى كه ايمان آورده‌ايد، شكيبا باشيد و ديگران را به شكيبايى فراخوانيد و در جنگها پايدارى كنيد و از خدا بترسيد، باشد كه رستگار شويد.
```
```python
print(second_doc._.sim_ayahs)
```
```python
['3#130', '5#35', '9#119', '33#70', '2#189', '8#45', '59#18', '22#77', '3#102', '3#123', '26#179', '26#163', '26#150', '26#144', '26#131', '26#126', '26#110', '26#108', '2#153', '49#1', '49#10', '2#278', '70#5', '5#90', '33#41', '61#2', '15#69', '57#28', '5#57', '7#69', '5#100', '2#183', '8#29', '5#11', '58#9', '52#16', '47#7', '8#20', '74#7', '5#8', '47#33', '8#27', '8#15', '7#87', '10#63', '61#10', '3#149', '3#100', '9#123', '33#69', '2#104', '2#172', '4#71', '16#127', '23#1', '5#87', '33#56', '62#10', '65#10', '49#12', '4#144', '63#9', '24#27', '2#208', '58#11', '64#14', '5#93', '5#105', '33#9', '27#53', '41#18', '5#88', '9#23', '61#14', '29#59', '16#42', '8#46', '60#13', '8#24', '4#59', '62#9', '49#6', '4#136', '5#51', '4#1', '5#2', '58#12', '4#29', '66#6', '5#94', '33#1', '2#254', '8#69', '87#14', '91#9', '33#49', '5#1', '2#21', '64#16', '9#34']
```

2. Word Information: In this section, you will find information specifically related to the words within a particular verse.
The information related to a verse is structured as follows:
    * Word Text: The actual text of the word.
    * Tag: The part-of-speech tag that describes the word's grammatical category.
    * Dependency: The dependency relationship of the word within the sentence.
    * Lemma: The base or dictionary form of the word.
    * Root: The root form of the word, which captures its core meaning.
    * Head: The head word to which the current word is dependent.
    * Arc Dep: The arc dependency label that represents the grammatical relationship between the head word and the current word.
    * Rel: The semantic or syntactic relationship between the head word and the current word.
First, Displaying data from Surah Al-Fatiha, Verse 1:
I will show third word in verse.
```python
word=first_doc[2]
print(word)
```
```python
اللَّهِ
```
```python
print(word.dep_)
```
```python
نعت
```
```python
print(word.head)
```
```python
رَّحِیمِ
```
```python
print(word.lemma_)
```
```python
ٱللَّه
```
```python
print(word.pos_)
from quranic_nlp import utils
print(utils.POS_UNI_FA[word.pos_])
```
```python
NOUN
اسم
```
```python
print(word._.dep_arc)
```
```python
LTR
```
```python
print(word._.root)
```
```python
اله
```

Second, Displaying data from Surah Aal-i-Imran, Verse 200:
I will show sixth word in verse.
```python
word=second_doc[5]
print(word)
```
```python
اِصْبِر
```
```python
print(word.dep_)
```
```python
الف زینت
```
```python
print(word.head)
```
```python
ا
```
```python
print(word.lemma_)
```
```python
صَبَرَ
```
```python
print(word.pos_)
from quranic_nlp import utils
print(utils.POS_UNI_FA[word.pos_])
```
```python
VERB
فعل
```
```python
print(word._.dep_arc)
```
```python
LTR
```
LTR : Left to Right
RTL : Right to Left
```python
print(word._.root)
```
```python
صبر
```



At the end, to jsonify the results you can use the following:

```python
dictionary = language.to_json(pips, doc)
print(dictionary)
```

```python
[{'id': 1, 'text': بِ, 'root': '', 'lemma': '', 'pos': 'INTJ', 'rel': 'مجرور', 'arc': 'LTR', 'head': سْمِ}, {'id': 2, 'text': سْمِ, 'root': 'سمو', 'lemma': 'ٱسْم', 'pos': 'NOUN', 'rel': 'مضاف الیه ', 'arc': 'LTR', 'head': اللَّهِ}, {'id': 3, 'text': اللَّهِ, 'root': 'اله', 'lemma': 'ٱللَّه', 'pos': 'NOUN', 'rel': 'نعت', 'arc': 'LTR', 'head': رَّحِیمِ}, {'id': 4, 'text': ال, 'root': '', 'lemma': '', 'pos': 'INTJ', 'rel': 'تعریف', 'arc': 'RTL', 'head': رَّحْمَنِ}, {'id': 5, 'text': رَّحْمَنِ, 'root': 'رحم', 'lemma': 'رَّحْمَٰن', 'pos': 'NOUN', 'rel': '', 'arc': None, 'head': رَّحْمَنِ}, {'id': 6, 'text': ال, 'root': '', 'lemma': '', 'pos': 'INTJ', 'rel': 'تعریف', 'arc': 'RTL', 'head': رَّحِیمِ}, {'id': 7, 'text': رَّحِیمِ, 'root': 'رحم', 'lemma': 'رَّحِيم', 'pos': 'NOUN', 'rel': '', 'arc': None, 'head': رَّحِیمِ}]
```

To show the results you can use the following:
```python
from spacy import displacy
displacy.serve(doc, style="dep")
options = {"compact": True, "bg": "#09a3d5",
           "color": "white", "font": "xb-niloofar"}
displacy.serve(doc, style="dep", options=options)

```
![](./docs/image_readme/fig.png "")
![](./docs/image_readme/fig2.png "")

## Contributors

- Seyyed Mohammad Aref Jahanmir
- Alireza Sahebi
- Doratossadat Dastgheyb
- Erfan Mohammadi
- Mahdi Ahmadi
- Ehsaneddin Asgari

📧 Contact: asgari [dot] berkeley [dot] edu

## Contributing

We warmly welcome contributions from the community! Whether you are a researcher, developer, linguist, or simply passionate about the Quran and NLP, there are many ways to get involved:

| Area | How to Help |
|------|-------------|
| New features | New pipeline components, morphological analyses, or language support |
| Data quality | Corrections to POS tags, dependency parses, lemmas, or roots |
| Translations | Add or improve Quranic translations for underrepresented languages |
| Testing | Help increase test coverage |
| Bug reports | Open an issue if something doesn't work as expected |
| Documentation | Clearer examples, tutorials, or API docs |

To contribute, fork the repository, make your changes, and open a pull request. For larger changes, please open an issue first to discuss your idea.

We believe open collaboration leads to better tools for everyone. Every contribution, big or small, is valued and appreciated.

## Bibles
```python
import pickle

with open("./data_utils/bibles.pickle", "rb") as handle:
    bibles_dict = pickle.load(handle)
print(len(bibles_dict))
print(list(bibles_dict.keys())[0:10])
print(bibles_dict['01001009'])
```
```python
38555
['01001001', '01001002', '01001003', '01001004', '01001005', '01001006', '01001007', '01001008', '01001009', '01001010']
{'arb-x-bible-1993': 'وقال الله لتجتمع المياه تحت السماء الى مكان واحد ولتظهر اليابسة . وكان كذلك .', 'eng-x-bible-amplified': 'Then God said : “ Let the waters under the heavens be collected together into one place , and let the dry land appear . ” And it was so .', 'grc-x-bible-accented': 'και ειπεν ο θεος συναχθητω το υδωρ το υποκατω του ουρανου εις συναγωγην μιαν και οφθητω η ξηρα και εγενετο ουτως και συνηχθη το υδωρ το υποκατω του ουρανου εις τας συναγωγας αυτων και ωφθη η ξηρα'}
```
