Metadata-Version: 2.4
Name: kanoniv
Version: 0.3.2
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Rust
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Dist: pyarrow>=14.0 ; extra == 'bigquery'
Requires-Dist: duckdb>=1.0 ; extra == 'bigquery'
Requires-Dist: google-cloud-bigquery>=3.0 ; extra == 'bigquery'
Requires-Dist: db-dtypes>=1.0 ; extra == 'bigquery'
Requires-Dist: httpx>=0.27 ; extra == 'cloud'
Requires-Dist: pydantic>=2.0 ; extra == 'cloud'
Requires-Dist: pyarrow>=14.0 ; extra == 'databricks'
Requires-Dist: duckdb>=1.0 ; extra == 'databricks'
Requires-Dist: databricks-sql-connector>=3.0 ; extra == 'databricks'
Requires-Dist: pyarrow>=14.0 ; extra == 'dataplane'
Requires-Dist: duckdb>=1.0 ; extra == 'dataplane'
Requires-Dist: snowflake-connector-python>=3.0 ; extra == 'dataplane'
Requires-Dist: pytest>=8.0 ; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24 ; extra == 'dev'
Requires-Dist: respx>=0.22 ; extra == 'dev'
Requires-Dist: httpx>=0.27 ; extra == 'dev'
Requires-Dist: pydantic>=2.0 ; extra == 'dev'
Provides-Extra: bigquery
Provides-Extra: cloud
Provides-Extra: databricks
Provides-Extra: dataplane
Provides-Extra: dev
Summary: Identity resolution as code
License: Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# kanoniv

Identity resolution as code. Define matching rules in YAML, reconcile locally in Python.

[![PyPI](https://img.shields.io/pypi/v/kanoniv)](https://pypi.org/project/kanoniv/)
[![License](https://img.shields.io/pypi/l/kanoniv)](https://github.com/kanoniv/kanoniv)

## Installation

```bash
pip install kanoniv
```

## Quick Start

```python
import kanoniv

# 1. Load your spec
spec = kanoniv.Spec.from_file("kanoniv.yml")

# 2. Validate it (kept under its own name so step 4 does not shadow it)
validation = kanoniv.validate(spec)
validation.raise_on_error()

# 3. Load sources
sources = [
    kanoniv.Source.from_csv("crm", "data/crm_contacts.csv", primary_key="id"),
    kanoniv.Source.from_csv("billing", "data/billing_accounts.csv", primary_key="id"),
]

# 4. Reconcile
result = kanoniv.reconcile(sources, spec)

# 5. Golden records as a DataFrame
df = result.to_pandas()
print(f"{result.cluster_count} entities, {result.merge_rate:.0%} merge rate")
```

Every record in the output DataFrame gets a `kanoniv_id` — a stable identifier that groups duplicate records across sources into a single entity.
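
To see which source records were merged into the same entity, one option is to group the output by `kanoniv_id`. The sketch below uses a hand-built stand-in DataFrame rather than a real reconcile result: only the `kanoniv_id` column comes from the description above, while the `source` and `email` columns and all values are assumptions made for illustration.

```python
import pandas as pd

# Stand-in for the reconcile output. Only `kanoniv_id` is taken from the README;
# the other column names and every value here are hypothetical.
df = pd.DataFrame(
    {
        "kanoniv_id": ["k1", "k1", "k2"],
        "source": ["crm", "billing", "crm"],
        "email": ["ann@example.com", "ann@example.com", "bob@example.com"],
    }
)

# Per entity: how many records it contains and how many distinct sources fed it.
clusters = df.groupby("kanoniv_id").agg(
    records=("source", "size"),
    n_sources=("source", "nunique"),
)

# Entities backed by more than one record are the ones that were merged.
merged = clusters[clusters["records"] > 1]
print(merged)
```

The same grouping works on any DataFrame that carries one row per source record plus an entity identifier column.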

## What the Spec Covers

The YAML spec is the single source of truth for your identity resolution pipeline:

- **Sources** — canonical field mappings from each system
- **Blocking** — composite keys to reduce O(n²) comparisons
- **Scoring** — Fellegi-Sunter probabilistic matching with EM training
- **Normalizers** — email, phone, name, nickname, domain (built-in)
- **Survivorship** — golden record assembly rules (source priority, most complete)
- **Governance** — freshness checks, schema validation, shadow-mode deploys

See the [spec reference](https://kanoniv.com/docs/spec-reference/) for the full schema.
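
For readers new to the blocking and scoring terms above, here is a standalone sketch in plain Python. It does not call kanoniv and is not a description of its internals: the records, the choice of blocking key, and the m/u probabilities are all made-up values (with EM training, those probabilities would be estimated from data instead of hard-coded).

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "email": "ann@acme.com", "last": "Lee", "zip": "94107"},
    {"id": 2, "email": "ann@acme.com", "last": "Lee", "zip": "94107"},
    {"id": 3, "email": "bob@beta.io", "last": "Lee", "zip": "94107"},
    {"id": 4, "email": "cat@corp.com", "last": "Ng", "zip": "10001"},
]

# Blocking: only records that share a composite key (last name + zip) are compared,
# instead of comparing every record with every other record.
blocks = defaultdict(list)
for rec in records:
    blocks[(rec["last"].lower(), rec["zip"])].append(rec)

candidate_pairs = [
    pair for block in blocks.values() for pair in combinations(block, 2)
]

# Fellegi-Sunter: per field, m = P(agree | match) and u = P(agree | non-match).
# These numbers are assumed for the example.
params = {"email": (0.95, 0.01), "last": (0.90, 0.10)}


def match_weight(a, b):
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if a[field] == b[field]:
            total += math.log2(m / u)  # agreement adds evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


scores = {(a["id"], b["id"]): match_weight(a, b) for a, b in candidate_pairs}
for pair, score in sorted(scores.items()):
    print(pair, round(score, 2))
```

With four records there are six possible pairs, but blocking leaves only the three pairs inside the shared `("lee", "94107")` block to score. A pair is then treated as a match when its weight clears a chosen threshold.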

## Source Adapters

```python
# Pandas DataFrame
source = kanoniv.Source.from_pandas("crm", df, primary_key="contact_id")

# CSV file
source = kanoniv.Source.from_csv("billing", "data/billing.csv", primary_key="account_id")

# Warehouse table (requires sqlalchemy)
source = kanoniv.Source.from_warehouse(
    "erp", table="raw.erp_customers", connection_string="postgresql://..."
)

# dbt model (requires sqlalchemy)
source = kanoniv.Source.from_dbt("staging", model="stg_customers")
```

## Validation & Planning

```python
# Validate spec for errors
result = kanoniv.validate(spec)
if not result:
    print(result.errors)

# Preview the execution plan
plan = kanoniv.plan(spec)
print(plan.summary())
```

## Diffing Specs

```python
# Compare two spec versions
diff = kanoniv.diff(spec_v1, spec_v2)
print(diff.summary)
```

## Cloud API (Optional)

For managed reconciliation, monitoring, and collaboration, install with the cloud extra:

```bash
pip install "kanoniv[cloud]"
```

```python
import kanoniv

client = kanoniv.Client(api_key="kn_...")

result = client.resolve(system="crm", external_id="003xxx")
entities = client.entities.search(q="john@acme.com")
```

See the [cloud API docs](https://kanoniv.com/docs/api-reference/) for the full reference.

## Links

- [Documentation](https://kanoniv.com/docs/)
- [Spec Reference](https://kanoniv.com/docs/spec-reference/)
- [GitHub](https://github.com/kanoniv/kanoniv)
- [Examples](https://github.com/kanoniv/kanoniv/tree/master/examples)

## License

Apache-2.0

