Metadata-Version: 2.4
Name: kganalytica
Version: 0.1.4
Summary: Knowledge graph construction, comparison, and similarity toolkit for LLM contextual understanding
Author-email: Kithuni <kithuni.21@email.com>, Chamath <chamathg.21@cse.mrt.ac.lk>, Mamta <nallaretnam.21@cse.mrt.ac.lk>, Subavarshana <subavarshanaa.21@cse.mrt.ac.lk>
License: MIT
Keywords: knowledge-graph,llm,evaluation,triplet-analysis,graph-similarity,information-extraction
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: License
Requires-Dist: numpy>=1.24
Requires-Dist: networkx>=3.0
Requires-Dist: pandas>=2.0
Requires-Dist: python-dotenv>=1.0
Requires-Dist: sentence-transformers>=2.2.2
Requires-Dist: torch>=2.0
Requires-Dist: openai
Requires-Dist: google-genai
Requires-Dist: groq
Provides-Extra: graph
Requires-Dist: grakel>=0.1.10; extra == "graph"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=4.0; extra == "dev"
Requires-Dist: ruff>=0.5.0; extra == "dev"
Dynamic: license-file

# kganalytica

**kganalytica** is a Python research library designed for
**constructing, analyzing, and comparing knowledge graphs generated from
natural language text**.

The library enables researchers and developers to transform unstructured
text, such as paragraphs, documents, or model outputs, into **structured
knowledge graphs (KGs)** represented as triplets:

subject → relation → object

Converting text into a graph makes the facts and relationships it
contains explicit, so they can be inspected, counted, and compared
directly.
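
As a minimal sketch of the triplet representation, the snippet below stores each fact as a named tuple and groups the outgoing edges by subject. The `Triple` type and the `to_adjacency` helper are illustrative names chosen here, not part of the kganalytica API.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact: subject -> relation -> object (illustrative type, not the library's)."""
    subject: str
    relation: str
    object: str


def to_adjacency(triples):
    """Group outgoing (relation, object) edges by subject node."""
    adjacency = defaultdict(list)
    for t in triples:
        adjacency[t.subject].append((t.relation, t.object))
    return dict(adjacency)


triples = [
    Triple("Albert Einstein", "born_in", "Germany"),
    Triple("Albert Einstein", "developed", "theory_of_relativity"),
]
graph = to_adjacency(triples)
print(graph)
```

Each subject becomes a node whose edges are labelled by relation, which is the structure that graph-level comparison operates on.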

---

# Why This Library Is Important

Modern large language models generate answers in natural language, but
evaluating whether those answers **faithfully represent the underlying
facts** is challenging.

Traditional evaluation methods rely on:

- surface-level text similarity
- token overlap metrics
- embedding similarity

These methods often fail to capture **whether the factual relationships
are correct**.

Knowledge graphs provide a structured representation of information
where:

- entities represent real-world objects
- relations represent factual connections between entities

By constructing knowledge graphs from text, it becomes possible to
compare **facts rather than sentences**.
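
The gap between surface similarity and factual agreement can be illustrated with a small sketch. The two sentences below differ in a single word, so a token-overlap score is high, yet their triplets share no fact. The triplets are written by hand here as stand-ins for extracted graphs, and `token_jaccard` is a hypothetical helper, not a library function.

```python
def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)


reference = "Albert Einstein was born in Germany"
candidate = "Albert Einstein was born in France"

# Hand-written triplets standing in for graphs extracted from each sentence.
reference_kg = {("Albert Einstein", "born_in", "Germany")}
candidate_kg = {("Albert Einstein", "born_in", "France")}

overlap = token_jaccard(reference, candidate)
shared_facts = reference_kg & candidate_kg

print(f"token overlap: {overlap:.2f}")      # 5 shared words out of 7 distinct
print(f"shared facts: {len(shared_facts)}")  # the two triplets disagree
```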

---

# What This Library Enables

Using **kganalytica**, a paragraph can be converted into a knowledge
graph containing multiple factual triplets.

Example:

Input paragraph:

Albert Einstein was born in Germany.\
He developed the theory of relativity.

Extracted knowledge graph:

Albert Einstein → born_in → Germany\
Albert Einstein → developed → theory_of_relativity

Once two texts are converted into knowledge graphs, the library can
compare them to measure **how similar their factual content is**.

This supports comparing:

- a context against its ground-truth answer
- a ground-truth answer against a model-generated answer
- knowledge graphs extracted by different systems

---

# Core Capabilities

## Knowledge Graph Construction

Convert text paragraphs into structured knowledge graphs using LLM-based
extraction.

Supported inputs include:

- single paragraphs
- multiple paragraphs
- CSV datasets containing text fields
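
For CSV input, one plausible preprocessing step is to pull the text field out of each row before extraction. The sketch below uses only the standard library; the column name `text` and the `read_text_field` helper are assumptions made for illustration and say nothing about how kganalytica reads CSV files internally.

```python
import csv
import io

# In-memory stand-in for a CSV file; the "text" column name is hypothetical.
csv_data = io.StringIO(
    "id,text\n"
    "1,Albert Einstein was born in Germany.\n"
    "2,He developed the theory of relativity.\n"
    "3,\n"
)


def read_text_field(handle, column="text"):
    """Return the non-empty values of one text column, in row order."""
    values = []
    for row in csv.DictReader(handle):
        value = (row.get(column) or "").strip()
        if value:  # skip rows whose text field is empty or missing
            values.append(value)
    return values


paragraphs = read_text_field(csv_data)
print(paragraphs)
```

Each returned paragraph could then be passed to the graph construction step one at a time.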

---

## Triplet-Level Graph Comparison

The library enables detailed comparison between two knowledge graphs by
identifying:

- aligned triplets
- missing triplets
- extra triplets
- entity mismatches
- relation mismatches

This allows precise identification of **where factual differences
occur**.
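
The categories above can be sketched with plain set operations. This is a simplified illustration under stated assumptions, not the library's alignment algorithm: triplets are matched by exact equality after a naive normalization, and a mismatch is any unmatched pair that differs in exactly one position. The names `normalize` and `compare_triplets` are hypothetical.

```python
def normalize(triple):
    """Lowercase each element and replace spaces with underscores (a deliberately simple rule)."""
    return tuple(part.strip().lower().replace(" ", "_") for part in triple)


def compare_triplets(reference, candidate):
    """Classify triplets as aligned, missing, extra, or mismatched in one slot."""
    ref = {normalize(t) for t in reference}
    cand = {normalize(t) for t in candidate}
    report = {
        "aligned": sorted(ref & cand),
        "missing": sorted(ref - cand),  # present only in the reference
        "extra": sorted(cand - ref),    # present only in the candidate
        "entity_mismatches": [],
        "relation_mismatches": [],
    }
    # Pair unmatched triplets that differ in exactly one position.
    # Paired triplets still remain listed under "missing" and "extra".
    for r in report["missing"]:
        for c in report["extra"]:
            differing = [i for i in range(3) if r[i] != c[i]]
            if differing == [1]:
                report["relation_mismatches"].append((r, c))
            elif len(differing) == 1:
                report["entity_mismatches"].append((r, c))
    return report


reference = [
    ("Albert Einstein", "born_in", "Germany"),
    ("Albert Einstein", "developed", "theory_of_relativity"),
    ("Albert Einstein", "won", "Nobel Prize"),
]
candidate = [
    ("Albert Einstein", "born in", "Germany"),               # aligns after normalization
    ("Albert Einstein", "created", "theory_of_relativity"),  # relation differs
    ("Albert Einstein", "won", "Fields Medal"),              # object entity differs
]

report = compare_triplets(reference, candidate)
for category, items in report.items():
    print(category, len(items))
```

Exact matching treats `developed` and `created` as a relation mismatch even though they are near-synonyms, which is one reason to pair this kind of comparison with a semantic similarity measure.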

---

## Semantic and Structural Graph Similarity

kganalytica measures similarity between two knowledge graphs using a
hybrid approach that combines:

**Semantic similarity**\
Triplet meaning is compared using sentence embeddings.

**Structural similarity**\
Graph topology and relational structure are compared using graph
kernels.

This allows similarity measurement beyond simple text matching.
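
The idea of blending the two signals can be sketched as a weighted sum. To keep the example dependency-free, it swaps in much simpler stand-ins: token overlap in place of sentence embeddings, and an out-degree histogram overlap in place of a graph kernel. The weight `alpha` and all function names are assumptions for illustration; the library's actual scoring may differ.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; defined as 1.0 when both are empty."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def semantic_score(kg1, kg2) -> float:
    """Stand-in for embedding similarity: best token overlap for each kg1 triplet, averaged."""
    def tokens(triple):
        return set(" ".join(triple).lower().replace("_", " ").split())

    if not kg1 or not kg2:
        return 0.0
    best = [max(jaccard(tokens(a), tokens(b)) for b in kg2) for a in kg1]
    return sum(best) / len(best)  # note: asymmetric, scored from kg1's side


def structural_score(kg1, kg2) -> float:
    """Stand-in for a graph kernel: overlap of subject out-degree histograms."""
    def degree_histogram(kg):
        out_degree = Counter(subject for subject, _, _ in kg)
        return Counter(out_degree.values())

    h1, h2 = degree_histogram(kg1), degree_histogram(kg2)
    union = sum((h1 | h2).values())  # multiset union (element-wise max)
    return sum((h1 & h2).values()) / union if union else 1.0


def hybrid_similarity(kg1, kg2, alpha: float = 0.5) -> float:
    """Weighted blend; alpha is a hypothetical weight on the semantic part."""
    return alpha * semantic_score(kg1, kg2) + (1 - alpha) * structural_score(kg1, kg2)


kg1 = [
    ("Einstein", "born_in", "Germany"),
    ("Einstein", "developed", "Relativity"),
]
kg2 = [
    ("Albert Einstein", "born in", "Germany"),
    ("Albert Einstein", "created", "Theory of Relativity"),
]

semantic = semantic_score(kg1, kg2)
structural = structural_score(kg1, kg2)
score = hybrid_similarity(kg1, kg2)
print(f"semantic={semantic:.3f} structural={structural:.3f} hybrid={score:.3f}")
```

Both graphs have one subject with two outgoing edges, so the structural stand-in sees them as identical, while the token-based semantic stand-in penalizes the wording differences that an embedding model would largely smooth over.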

---

# Research Applications

The library is particularly useful for:

### LLM Evaluation

Evaluate whether a language model's output preserves the factual
relationships present in the context.

### Knowledge Graph Research

Analyze and compare graph structures generated from different extraction
models.

### Information Extraction Systems

Benchmark triplet extraction pipelines and analyze their errors.

---

# Installation

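kganalytica requires Python 3.10 or later. The optional `graph` extra
(`pip install "kganalytica[graph]"`) additionally installs grakel. To
install the base package directly from PyPI: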

```bash
pip install kganalytica
```

---

# Quick Example

```python
from kganalytica import construct_graph

paragraph = '''
Albert Einstein was born in Germany.
He developed the theory of relativity.
'''

graph = construct_graph(paragraph)

for t in graph.triples:
    print(t.subject, "->", t.relation, "->", t.object)
```

Example output (the exact triplets depend on the extraction model):

Albert Einstein -> born_in -> Germany\
Albert Einstein -> developed -> theory_of_relativity

---

# Graph Similarity Example

```python
from kganalytica import kg_similarity

kg1 = [
    ("Einstein", "born_in", "Germany"),
    ("Einstein", "developed", "Relativity")
]

kg2 = [
    ("Albert Einstein", "born in", "Germany"),
    ("Albert Einstein", "created", "Theory of Relativity")
]

score = kg_similarity(kg1, kg2)

print("Similarity:", score)
```

---

# Project Motivation

The library was developed to support research on **evaluating contextual
understanding in large language models through knowledge graph
comparison**.

By analyzing information at the **triplet level rather than the sentence
level**, researchers can better understand whether an AI system has
captured the correct relationships between entities.

---

# License

MIT License
