Metadata-Version: 2.4
Name: pycharter
Version: 0.0.30
Summary: A Python package for data contract management with five core services: contract parsing, metadata storage, Pydantic generation, JSON Schema conversion, and runtime validation
Author-email: semantic developers <na@example.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/auscheng/pycharter
Project-URL: Documentation, https://auscheng.github.io/pycharter/
Project-URL: Repository, https://github.com/auscheng/pycharter
Project-URL: Issues, https://github.com/auscheng/pycharter/issues
Keywords: semantic,python,pydantic,contract,validation,schema
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0.0
Requires-Dist: jsonschema>=4.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: sqlalchemy>=2.0.0
Requires-Dist: alembic>=1.13.0
Requires-Dist: jsonata-python>=0.6.0
Requires-Dist: httpx>=0.24.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: build>=0.10.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Requires-Dist: jupyterlab>=4.0.0; extra == "dev"
Requires-Dist: ipykernel>=6.0.0; extra == "dev"
Requires-Dist: psycopg2-binary>=2.9.0; extra == "dev"
Requires-Dist: pymongo>=4.0.0; extra == "dev"
Requires-Dist: boto3>=1.26.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5.0; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == "docs"
Requires-Dist: pymdown-extensions>=10.0.0; extra == "docs"
Requires-Dist: mike>=2.0.0; extra == "docs"
Provides-Extra: api
Requires-Dist: fastapi>=0.104.0; extra == "api"
Requires-Dist: uvicorn[standard]>=0.24.0; extra == "api"
Requires-Dist: pydantic-settings>=2.0.0; extra == "api"
Requires-Dist: python-multipart>=0.0.6; extra == "api"
Requires-Dist: PyJWT>=2.8.0; extra == "api"
Requires-Dist: httpx>=0.24.0; extra == "api"
Provides-Extra: ui
Requires-Dist: fastapi>=0.104.0; extra == "ui"
Requires-Dist: uvicorn[standard]>=0.24.0; extra == "ui"
Requires-Dist: httpx>=0.24.0; extra == "ui"
Requires-Dist: aiofiles>=23.0.0; extra == "ui"
Provides-Extra: worker
Requires-Dist: pyspark>=3.5.0; extra == "worker"
Requires-Dist: redis>=5.0.0; extra == "worker"
Provides-Extra: etl
Requires-Dist: sshtunnel>=0.4.0; extra == "etl"
Requires-Dist: boto3>=1.26.0; extra == "etl"
Requires-Dist: google-cloud-storage>=2.0.0; extra == "etl"
Requires-Dist: azure-storage-blob>=12.0.0; extra == "etl"
Requires-Dist: openpyxl>=3.0.0; extra == "etl"
Requires-Dist: lxml>=4.9.0; extra == "etl"
Requires-Dist: pymongo>=4.0.0; extra == "etl"
Dynamic: license-file

# PyCharter

> **Data contract management and validation for Python: define schemas, enforce quality, and run ETL with contracts.**

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

---

## ⚡ Quick start (2 minutes)

Install, define a tiny contract, and validate one record. Copy-paste into a new terminal:

```bash
pip install pycharter
```

```python
from pycharter import from_dict, validate

# 1. Define a minimal contract (schema)
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# 2. Build a validator and validate one record
Person = from_dict(schema, "Person")
result = validate(Person, {"name": "Alice", "age": 30})

print(result.is_valid)   # True
print(result.data.name) # Alice
```

Invalid data returns `result.is_valid == False` and `result.errors` with details. Next: use a **contract file** with `Validator.from_file("contract.yaml")`, or add **coercion/validation rules** and store contracts in a **metadata store** (see [Concepts](#-concepts) and [Installation](#-installation)).

---

## What is PyCharter?

**PyCharter** is a **Data Contract Management and Quality Assurance** platform for Python that lets you define, store, version, enforce, and monitor data contracts throughout your data pipelines. At its core, PyCharter converts JSON Schemas into fully functional Pydantic models. It fully supports the JSON Schema Draft 2020-12 standard, including the standard validation keywords (`minLength`, `maxLength`, `pattern`, `enum`, `minimum`, `maximum`, etc.), and adds extensions for pre-validation coercion and post-validation checks. It handles nested objects, arrays, and custom validators, with all validation logic stored as data (not Python code). PyCharter also includes a quality assurance module for monitoring data quality, tracking violations, and generating quality reports.

PyCharter is a **data contract-as-code** platform that brings structure, validation, and governance to data pipelines and applications. It provides:

- **Data Contract Definition & Management**: Define formal agreements that specify data structure, quality rules, and governance policies
- **Schema Registry**: Centralized storage and versioning of data schemas with support for schema evolution
- **Data Quality Enforcement**: Coercion rules (data transformation) and validation rules (business constraints) to ensure data integrity
- **Data Governance**: Track ownership, stewardship, and enforce data policies across your organization
- **Metadata Management**: Store and retrieve data about data (schemas, ownership, rules, lineage)
- **Runtime Validation**: Validate data against contracts in production pipelines, APIs, and ETL processes

### Data Terminology

PyCharter implements key data management concepts:

- **Data Contract**: Formal specification of data structure, quality rules, and governance policies that define the "contract" between data producers and consumers
- **Schema Registry**: Centralized repository for storing and versioning data schemas, enabling schema discovery and evolution tracking
- **Data Quality**: Coercion (pre-validation transformation) and validation (post-validation checks) rules that ensure data integrity
- **Data Governance**: Ownership tracking, stewardship management, and policy enforcement for data assets
- **Schema Evolution**: Versioning and migration capabilities that allow data structures to evolve over time while maintaining backward compatibility
- **Metadata Management**: Storage and retrieval of data about data, including schemas, ownership information, governance rules, and lineage

### Use Cases

- **Data Pipeline Validation**: Ensure data conforms to contracts before processing in ETL pipelines
- **API Contract Enforcement**: Validate API request/response data against defined contracts
- **Data Integration**: Standardize data formats across systems and services
- **Compliance & Governance**: Track ownership, enforce data policies, and maintain audit trails
- **Schema Registry**: Centralized schema management for microservices and data platforms
- **Data Quality Assurance**: Catch data quality issues early in the pipeline through automated validation

---

## 📐 Concepts

A short mental model so you know what to reach for.

| Concept | What it is | When you use it |
|--------|------------|------------------|
| **Schema** | The shape of the data (JSON Schema): types, required fields, nested objects. | When you only need structure (e.g. “this field is string, that one is integer”). |
| **Data contract** | Schema + **coercion rules** (e.g. string → int) + **validation rules** (e.g. min/max, allowed values) + optional **metadata** (ownership, governance). | When you want one artifact that defines structure, transforms, and business rules. |
| **Metadata store** | A database (SQLite, PostgreSQL, etc.) that stores contracts (and their versions) so many apps can reuse them. | When you have multiple services or pipelines and want a single source of truth. |
| **ETL validation** | Validating data **after extract** (source) and **before load** (target) in a pipeline, using a schema or contract. | When you run ETL and want to reject or quarantine bad rows at stage boundaries. |

Flow from “just validate” to “contracts in a store and ETL”:

```
  ┌─────────────────────────────────────────────────────────────────────────┐
  │  Option A: No database                                                   │
  │  Schema/contract in code or YAML file  →  Validator  →  validate(data)  │
  └─────────────────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────────────────┐
  │  Option B: With metadata store                                           │
  │  Contract in DB  →  Validator(store=..., schema_id=...)  →  validate()  │
  └─────────────────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────────────────┐
  │  Option C: ETL pipeline                                                  │
  │  Extract  →  [optional: validate with contract]  →  Transform  →         │
  │  [optional: validate with contract]  →  Load                            │
  └─────────────────────────────────────────────────────────────────────────┘
```

Start with **Option A** (Quick start above); add **metadata store** when you need versioned, shared contracts; add **ETL validation** when you run pipelines and want contract checks at extract/load.
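
To make the difference between a schema and a data contract concrete, here is a minimal standard-library sketch of the idea: coerce first, check structure second, apply business rules last. The `CONTRACT` layout, the rule registries, and `check_record` are illustrative assumptions, not PyCharter's implementation or file format.

```python
from typing import Any, Callable

# Hypothetical rule registries; PyCharter's real registries may look different.
COERCIONS: dict[str, Callable[[Any], Any]] = {"coerce_to_integer": int}
VALIDATIONS: dict[str, Callable[[Any], bool]] = {"is_positive": lambda value: value > 0}
TYPES = {"string": str, "integer": int}

# Assumed contract layout: schema + coercion rules + validation rules.
CONTRACT = {
    "schema": {
        "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
        "required": ["name", "age"],
    },
    "coercion_rules": {"age": "coerce_to_integer"},
    "validation_rules": {"age": ["is_positive"]},
}


def check_record(contract: dict, record: dict) -> tuple[dict, list[str]]:
    """Return (cleaned_record, errors) after coercion, type checks, and rules."""
    errors: list[str] = []
    cleaned = dict(record)
    schema = contract["schema"]

    # 1. Coercion runs before any validation.
    for name, rule in contract["coercion_rules"].items():
        if name in cleaned:
            try:
                cleaned[name] = COERCIONS[rule](cleaned[name])
            except (TypeError, ValueError):
                errors.append(f"{name}: cannot apply {rule}")

    # 2. Structural checks: required fields and declared types.
    for name in schema["required"]:
        if name not in cleaned:
            errors.append(f"{name}: missing required field")
    for name, spec in schema["properties"].items():
        if name in cleaned and not isinstance(cleaned[name], TYPES[spec["type"]]):
            errors.append(f"{name}: expected {spec['type']}")

    # 3. Business rules run only on values that already have the right type.
    for name, rules in contract["validation_rules"].items():
        value = cleaned.get(name)
        if isinstance(value, TYPES[schema["properties"][name]["type"]]):
            for rule in rules:
                if not VALIDATIONS[rule](value):
                    errors.append(f"{name}: failed {rule}")
    return cleaned, errors


cleaned, errors = check_record(CONTRACT, {"name": "Alice", "age": "30"})
```

A plain schema would reject `"30"` as a non-integer; the contract coerces it first and then applies the business rule, which is the behavior the table describes.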

---

## ✨ Features

- 🚀 **Dynamic Model Generation** - Convert JSON schemas to Pydantic models at runtime
- 📋 **JSON Schema Compliant** - Full support for JSON Schema Draft 2020-12 standard
- 🔄 **Type Coercion** - Automatic type conversion before validation (e.g., string → integer)
- ✅ **Custom Validators** - Built-in and extensible validation rules
- 🏗️ **Nested Structures** - Full support for nested objects and arrays
- 📦 **Multiple Input Formats** - Load schemas from dicts, JSON strings, files, or URLs
- 🎯 **Type Safe** - Full type hints and Pydantic v2 compatibility
- 🔧 **Extensible** - Register custom coercion and validation functions
- 📊 **Data-Driven** - All validation logic stored as JSON data, not Python code
- 📝 **Data Contract Management** - Complete lifecycle management for data contracts with versioning
- 💾 **Schema Registry** - Centralized schema storage with support for PostgreSQL, SQLite, MongoDB, Redis, and InMemory
- 🏛️ **Data Governance** - Track ownership, stewardship, and enforce governance policies
- 🔍 **Metadata Management** - Store and query metadata about your data assets
- 📈 **Schema Evolution** - Version schemas and track changes over time
- ✅ **Quality Assurance** - Monitor data quality, calculate metrics, track violations, and generate reports
- 📊 **Quality Metrics** - Calculate quality scores, accuracy, completeness, and violation rates
- 🚨 **Threshold Alerting** - Set quality thresholds and get alerts when quality degrades

## 📦 Installation

### Core Library

```bash
pip install pycharter
```

### With API Support

```bash
pip install pycharter[api]
```

This installs FastAPI and Uvicorn for running the REST API server.

### With UI Support

```bash
pip install pycharter[ui]
```

This installs the Python dependencies and **pre-built UI static files**, so the UI ships inside the package (similar to how Airflow bundles its web UI).

**After installation, you can immediately start the UI:**

```bash
pycharter ui serve    # Production mode (uses pre-built static files)
```

**For development** (if you have the source code):

```bash
cd ui
npm install          # Install Node.js dependencies
pycharter ui dev     # Development mode with hot reload
```

**Note**: When installed from pip, the UI works immediately without Node.js. For development, Node.js is required. See `ui/INSTALLATION.md` for detailed instructions.

### Database setup (for metadata store, API, and seed data)

If you use the **metadata store** (SQLite or PostgreSQL), the **REST API**, or **seed data**, initialize the database once:

```bash
# Initialize database schema (SQLite default: sqlite:///pycharter.db)
pycharter db init

# Seed reference data (owners, domains, systems, environments, data feeds, compliance frameworks, tags)
pycharter db seed
```

**From the project source** (e.g. after cloning the repo), you can run the full seed script to also load example aviation contracts into the metadata store:

```bash
python bin/seed                    # seed data/seed + data/aviation_examples
python bin/seed --skip-aviation    # seed only data/seed
```

Use `pycharter db init --force` to drop and recreate all tables (SQLite only; **destroys existing data**). See [Configuration Guide](docs/guides/configuration.md) for connection options and migrations.

## 🚀 Quick Start

### Quick Start: ETL Pipelines

Build and run ETL pipelines programmatically (with the `|` operator) or from YAML configs. `Pipeline.run()` is async; use `asyncio.run()` from scripts or `await` in async code.

```python
import asyncio
from pycharter import Pipeline, HTTPExtractor, PostgresLoader, Rename, AddField

# Programmatic pipeline
pipeline = (
    Pipeline(HTTPExtractor(url="https://api.example.com/data"))
    | Rename({"old": "new"})
    | AddField("processed_at", "now()")
    | PostgresLoader(connection_string="...", table="users")
)
result = asyncio.run(pipeline.run())

# Config-driven: explicit files
pipeline = Pipeline.from_config_files(
    extract="configs/extract.yaml",
    load="configs/load.yaml",
    variables={"API_KEY": "secret"}
)

# Config-driven: directory (extract.yaml, transform.yaml, load.yaml)
pipeline = Pipeline.from_config_dir("pipelines/users/")

# Config-driven: single file
pipeline = Pipeline.from_config_file("pipelines/users/pipeline.yaml")

result = asyncio.run(pipeline.run())
```

See **ETL Pipelines** under Core Services for error handling (`error_context`, `ErrorMode`) and variable resolution (`PipelineContext(variables={...})`).
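
If you are curious how `|` chaining with an async `run()` can work in general, the sketch below shows one way to do it with `__or__`. It is a toy model under stated assumptions: `MiniPipeline`, `ListExtractor`, `RenameStep`, and `ListLoader` are hypothetical names invented for this example and are not PyCharter classes.

```python
import asyncio


class ListExtractor:
    """Hypothetical extractor that returns in-memory records."""

    def __init__(self, records):
        self.records = records

    async def extract(self):
        return [dict(record) for record in self.records]


class RenameStep:
    """Hypothetical transformer that renames keys in each record."""

    def __init__(self, mapping):
        self.mapping = mapping

    def transform(self, record):
        return {self.mapping.get(key, key): value for key, value in record.items()}


class ListLoader:
    """Hypothetical loader that appends records to a list."""

    def __init__(self):
        self.loaded = []

    async def load(self, records):
        self.loaded.extend(records)
        return len(records)


class MiniPipeline:
    def __init__(self, extractor, steps=(), loader=None):
        self.extractor = extractor
        self.steps = tuple(steps)
        self.loader = loader

    def __or__(self, other):
        # Return a new pipeline each time so partial pipelines stay reusable.
        if hasattr(other, "load"):
            return MiniPipeline(self.extractor, self.steps, other)
        return MiniPipeline(self.extractor, self.steps + (other,), self.loader)

    async def run(self):
        records = await self.extractor.extract()
        for step in self.steps:
            records = [step.transform(record) for record in records]
        if self.loader is None:
            return records
        return await self.loader.load(records)


loader = ListLoader()
pipeline = (
    MiniPipeline(ListExtractor([{"old": 1}, {"old": 2}]))
    | RenameStep({"old": "new"})
    | loader
)
loaded_count = asyncio.run(pipeline.run())
```

Returning a new object from `__or__` rather than mutating in place is a design choice of this sketch; it lets you branch several pipelines off a shared prefix.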

### Quick Start: Convenience Functions (One-off Use)

```python
from pycharter import from_dict, validate

# Define your JSON schema
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"}
    },
    "required": ["name", "age"]
}

# Generate a Pydantic model (convenience function)
Person = from_dict(schema, "Person")

# Validate data
result = validate(Person, {"name": "Alice", "age": 30, "email": "alice@example.com"})
if result.is_valid:
    print(f"Valid: {result.data.name}")  # Output: Valid: Alice
```

### Production Use: Validator Class (Recommended)

For production code with multiple validations, use the `Validator` class for better performance. Create validators via **factory methods** or from a metadata store:

```python
from pycharter import Validator

# From directory (expects schema.yaml, coercion_rules.yaml, validation_rules.yaml)
validator = Validator.from_dir("data/contracts/user")

# From explicit files (any filenames)
validator = Validator.from_files(
    schema="schemas/user.yaml",
    coercion_rules="rules/coercion.yaml",
    validation_rules="rules/validation.yaml"
)

# From a single contract file
validator = Validator.from_file("user_contract.yaml")

# From dictionaries
validator = Validator.from_dict(schema={...}, coercion_rules={...}, validation_rules={...})

# From metadata store (with database)
validator = Validator(store=store, schema_id="user_schema_v1")

# Validate multiple records efficiently (model is cached)
result1 = validator.validate({"name": "Alice", "age": 30})
result2 = validator.validate({"name": "Bob", "age": 25})

# Batch validation
results = validator.validate_batch([data1, data2, data3])
```

### With Metadata Store

```python
from pycharter import Validator, SQLiteMetadataStore

# Connect to metadata store
store = SQLiteMetadataStore("metadata.db")
store.connect()

# Create validator from store
validator = Validator(store=store, schema_id="user_schema_v1")

# Validate data
result = validator.validate({"name": "Alice", "age": 30})
```

## 📐 API Organization

PyCharter's API is organized into **three tiers** to help you choose the right approach for your use case:

### Tier 1: Primary Interfaces (⭐ Recommended for Production)

**Classes** that provide the best performance and most features:
- **`Validator`** - Primary validation interface (use for multiple validations); create via `from_dir()`, `from_files()`, `from_file()`, `from_dict()` or from store
- **`Pipeline`** - ETL pipeline (programmatic or config-driven); create via `from_config_files()`, `from_config_dir()`, `from_config_file()` or constructor
- **`QualityCheck`** - Primary quality assurance interface
- **`MetadataStoreClient`** - Base class for metadata stores

**When to use**: Production code, batch processing, when you need to validate multiple records or run ETL pipelines.

### Tier 2: Convenience Functions (Quick Start)

**Functions** that make common tasks easy and discoverable:
- **Input helpers**: `from_dict()`, `from_file()`, `from_json()`, `from_url()`
- **Output helpers**: `to_dict()`, `to_file()`, `to_json()`
- **Validation helpers**: `validate_with_store()`, `validate_with_contract()`
- **Contract helpers**: `parse_contract_file()`, `build_contract()`

**When to use**: Quick scripts, one-off validations, exploratory work, learning the library.

### Tier 3: Low-Level Utilities

**Functions** for when you already have models or need fine-grained control:
- **`validate()`** - Validate with existing Pydantic model
- **`validate_batch()`** - Batch validate with existing model
- **`model_to_schema()`** - Core conversion function

**When to use**: Advanced use cases, when you've already generated models, custom workflows.

### Choosing the Right Approach

| Use Case | Recommended Approach | Example |
|----------|---------------------|---------|
| **Production pipeline with multiple validations** | `Validator` class | `validator = Validator(store=store, schema_id="schema"); validator.validate(data)` |
| **Quick one-off validation** | Convenience function | `validate_with_contract("contract.yaml", data)` |
| **You already have a model** | Low-level function | `validate(UserModel, data)` |
| **Batch processing** | `Validator.validate_batch()` | `validator.validate_batch([data1, data2, data3])` |

## 🏗️ Core Services & Data Production Journey

PyCharter provides **eight core services** that work together to support a complete data production journey, from contract specification to quality assurance. Each service plays a critical role in managing data contracts and ensuring data quality throughout your pipeline.

### The Data Production Journey

The typical data production workflow follows this path:

```
1. Data Contract Specification
   ↓
2. Contract Parsing
   ↓
3. Metadata Storage
   ↓
4. Pydantic Model Generation
   ↓
5. Runtime Validation
   ↓
6. Quality Assurance & Monitoring
```

### 1. 📄 Contract Parser (`pycharter.contract_parser`)

**Purpose**: Reads and decomposes data contract files into structured metadata components.

**When to Use**: At the beginning of your data production journey, when you have data contract files (YAML or JSON) that need to be processed and understood.

**How It Works**:
- Accepts data contract files containing schema definitions, governance rules, ownership information, and metadata
- Decomposes the contract into distinct components: `schema`, `governance_rules`, `ownership`, and `metadata`
- Returns a `ContractMetadata` object that separates concerns and makes each component accessible
- Extracts and tracks versions of all components

**Example**:
```python
from pycharter import parse_contract_file, ContractMetadata

# Parse a contract file (YAML or JSON)
metadata = parse_contract_file("data_contract.yaml")

# Access decomposed components
schema = metadata.schema              # JSON Schema definition
governance = metadata.governance_rules # Governance policies
ownership = metadata.ownership         # Owner/team information
metadata_info = metadata.metadata      # Additional metadata
versions = metadata.versions          # Component versions
```

**Contribution to Journey**: The contract parser is the **entry point** that takes raw contract specifications and prepares them for downstream processing. It ensures that contracts are properly structured and that all components (schema, governance, ownership) are separated for independent handling.
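
As a rough illustration of what "decomposing" a contract means, the following standard-library sketch splits a contract document into the components listed above and collects their versions. The JSON layout, the `ParsedContract` dataclass, and `decompose` are assumptions made for this example; the real `ContractMetadata` object and contract file format may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ParsedContract:
    """Hypothetical container mirroring the components described above."""

    schema: dict
    governance_rules: dict = field(default_factory=dict)
    ownership: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)


def decompose(contract: dict) -> ParsedContract:
    """Split a contract dict into components and record each component's version."""
    keys = ("schema", "governance_rules", "ownership", "metadata")
    components = {key: contract.get(key, {}) for key in keys}
    if not components["schema"]:
        raise ValueError("contract has no schema section")
    versions = {
        key: value["version"]
        for key, value in components.items()
        if isinstance(value, dict) and "version" in value
    }
    return ParsedContract(versions=versions, **components)


# Assumed contract layout, written as JSON to stay within the standard library.
raw = json.loads("""
{
  "schema": {"type": "object", "version": "1.0.0",
             "properties": {"name": {"type": "string"}}},
  "ownership": {"owner": "data-team", "team": "engineering"},
  "metadata": {"version": "1.0.0", "description": "User contract"}
}
""")
parsed = decompose(raw)
```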

---

### 1b. 🏗️ Contract Builder (`pycharter.contract_builder`)

**Purpose**: Constructs consolidated data contracts from separate artifacts (schema, coercion rules, validation rules, metadata).

**When to Use**: When you have separate artifacts stored independently and need to combine them into a single consolidated contract for runtime validation or distribution.

**How It Works**:
- Takes separate artifacts (schema, coercion rules, validation rules, metadata, ownership, governance rules)
- Keeps coercion and validation rules alongside the raw schema (they are not merged into it; the `Validator` merges them internally at validation time)
- Tracks versions of all components
- Produces a consolidated contract suitable for runtime validation
- Can build from artifacts directly or retrieve from metadata store

**Example**:
```python
from pycharter import build_contract, build_contract_from_store, ContractArtifacts

# Build from separate artifacts
artifacts = ContractArtifacts(
    schema={"type": "object", "version": "1.0.0", "properties": {...}},
    coercion_rules={"version": "1.0.0", "rules": {"age": "coerce_to_integer"}},
    validation_rules={"version": "1.0.0", "rules": {"age": {"is_positive": {...}}}},
    metadata={"version": "1.0.0", "description": "User contract"},
    ownership={"owner": "data-team", "team": "engineering"},
)

contract = build_contract(artifacts)
# Contract now has:
# - schema (RAW - rules NOT merged into it)
# - coercion_rules, validation_rules (separate)
# - metadata, ownership, governance_rules
# - versions tracking all components

# Or build from metadata store
contract = build_contract_from_store(store, "user_schema_v1")

# Use for validation - Validator merges rules internally
from pycharter import validate_with_contract
result = validate_with_contract(contract, {"name": "Alice", "age": "30"})
```

**Contribution to Journey**: The contract builder is the **consolidation layer** that combines separate artifacts (stored independently in the database) into a single contract artifact. The contract contains **raw schema + separate rules** (not merged). The Validator class handles merging internally during validation, keeping the contract structure clear and editable.
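
The "raw schema + separate rules, merged only at validation time" design can be sketched as follows. This is a simplified model, not the library's code: `consolidate`, `merge_for_validation`, and the `coercion` / `validations` property keys are hypothetical names chosen for illustration.

```python
import copy


def consolidate(schema: dict, coercion_rules: dict, validation_rules: dict) -> dict:
    """Bundle the artifacts without touching the raw schema; record versions."""
    parts = {
        "schema": schema,
        "coercion_rules": coercion_rules,
        "validation_rules": validation_rules,
    }
    contract = {name: copy.deepcopy(part) for name, part in parts.items()}
    contract["versions"] = {
        name: part["version"] for name, part in parts.items() if "version" in part
    }
    return contract


def merge_for_validation(contract: dict) -> dict:
    """Return a schema copy whose properties carry their rules (hypothetical keys)."""
    merged = copy.deepcopy(contract["schema"])
    for name, rule in contract["coercion_rules"].get("rules", {}).items():
        merged["properties"][name]["coercion"] = rule
    for name, rules in contract["validation_rules"].get("rules", {}).items():
        merged["properties"][name]["validations"] = rules
    return merged


contract = consolidate(
    schema={"type": "object", "version": "1.0.0",
            "properties": {"age": {"type": "integer"}}},
    coercion_rules={"version": "1.0.0", "rules": {"age": "coerce_to_integer"}},
    validation_rules={"version": "1.0.0", "rules": {"age": {"is_positive": {}}}},
)
merged = merge_for_validation(contract)
```

Because the merge works on a deep copy, the stored contract stays editable and diff-friendly while the validator gets a single structure to work from.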

---

### 2. 💾 Metadata Store Client (`pycharter.metadata_store`)

**Purpose**: Manages persistent storage and retrieval of decomposed metadata in databases.

**When to Use**: After parsing contracts, when you need to store metadata components (schemas, governance rules, ownership) in a database for versioning, querying, and governance.

**How It Works**:
- Provides methods to store and retrieve schemas, governance rules, ownership information, and metadata
- Supports versioning and querying of stored metadata
- Multiple implementations available: PostgreSQL, SQLite, MongoDB, Redis, and In-Memory (for testing)

**Available Implementations**:
- **PostgresMetadataStore** - For PostgreSQL databases (recommended for production)
- **SQLiteMetadataStore** - For SQLite databases (great for development and small deployments)
- **MongoDBMetadataStore** - For MongoDB databases
- **RedisMetadataStore** - For Redis databases
- **InMemoryMetadataStore** - For testing and development (no persistence)

**Example**:
```python
from pycharter import SQLiteMetadataStore, parse_contract_file

# Parse contract
metadata = parse_contract_file("contract.yaml")

# Use SQLite metadata store (or PostgresMetadataStore, MongoDBMetadataStore, RedisMetadataStore, etc.)
store = SQLiteMetadataStore("metadata.db")
store.connect()

# Store decomposed components
schema_id = store.store_schema("user_schema", metadata.schema, version="1.0")

# Merge ownership and governance into metadata before storing
# Ownership and governance are part of metadata, not separate entities
metadata_dict = metadata.metadata.copy() if metadata.metadata else {}
if metadata.ownership:
    owner = metadata.ownership.get("owner")
    metadata_dict["business_owners"] = [owner] if owner else []
if metadata.governance_rules:
    metadata_dict["governance_rules"] = metadata.governance_rules

# Store metadata once with all information (ownership and governance included)
store.store_metadata(resource_id=schema_id, resource_type="schema", metadata=metadata_dict)

# Store coercion and validation rules
store.store_coercion_rules(schema_id, {"age": "coerce_to_integer"}, version="1.0")
store.store_validation_rules(schema_id, {"age": {"is_positive": {}}}, version="1.0")

# Retrieve later
stored_schema = store.get_schema(schema_id)
coercion_rules = store.get_coercion_rules(schema_id)
validation_rules = store.get_validation_rules(schema_id)
```

**Contribution to Journey**: The metadata store is the **persistence layer** that ensures contracts and their components are versioned, searchable, and accessible across your organization. It enables governance, audit trails, and schema evolution tracking.

**See [Configuration Guide](docs/guides/configuration.md) for database setup and initialization instructions.**

---

### 3. 🏭 Pydantic Generator (`pycharter.pydantic_generator`)

**Purpose**: Dynamically generates fully functional Pydantic models from JSON Schema definitions.

**When to Use**: After storing schemas (or directly from parsed contracts), when you need to generate Python models for type-safe data validation and processing.

**How It Works**:
- Takes JSON Schema definitions (from contracts or metadata store)
- Programmatically generates Pydantic model classes at runtime
- Supports all JSON Schema Draft 2020-12 features plus custom coercions and validations
- Can generate models from dictionaries, JSON strings, files, or URLs
- Optionally generates Python files with model definitions

**Example**:
```python
from pycharter import from_dict, generate_model_file, parse_contract_file, MetadataStoreClient

# Option 1: Generate from parsed contract
metadata = parse_contract_file("contract.yaml")
UserModel = from_dict(metadata.schema, "User")

# Option 2: Generate from stored schema
client = MetadataStoreClient(...)
schema = client.get_schema("user_schema_v1")
UserModel = from_dict(schema, "User")

# Option 3: Generate and save to file
generate_model_file(schema, "user_model.py", "User")
```

**Contribution to Journey**: The Pydantic generator is the **transformation engine** that converts declarative JSON Schema definitions into executable Python models. It bridges the gap between contract specifications (data) and runtime validation (code), enabling type-safe data processing.
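
The general technique of building a class at runtime from a schema can be shown with the standard library alone. The sketch below uses `dataclasses.make_dataclass` as a deliberately simpler analogue: unlike the Pydantic models PyCharter generates, a dataclass does not validate values. `dataclass_from_schema` and the `TYPES` mapping are hypothetical helpers for this example.

```python
from dataclasses import field, fields, make_dataclass
from typing import Optional

# Assumed mapping from JSON Schema primitive types to Python types.
TYPES = {"string": str, "integer": int, "number": float, "boolean": bool}


def dataclass_from_schema(schema: dict, name: str):
    """Create a dataclass whose fields mirror the schema's properties."""
    required = set(schema.get("required", []))
    # Required fields go first: dataclasses forbid non-default fields after defaults.
    ordered = sorted(schema["properties"].items(), key=lambda item: item[0] not in required)
    specs = []
    for prop, spec in ordered:
        py_type = TYPES[spec["type"]]
        if prop in required:
            specs.append((prop, py_type))
        else:
            specs.append((prop, Optional[py_type], field(default=None)))
    return make_dataclass(name, specs)


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}
Person = dataclass_from_schema(schema, "Person")
alice = Person(name="Alice", age=30)
```

The same shape (walk the properties, map types, mark optional fields, call a class factory) carries over to Pydantic, where the factory also attaches validation.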

---

### 4. 🔄 JSON Schema Converter (`pycharter.json_schema_converter`)

**Purpose**: Converts existing Pydantic models back into JSON Schema format (reverse conversion).

**When to Use**: When you have existing Pydantic models and need to generate JSON Schema definitions, or when you want to round-trip between schemas and models.

**How It Works**:
- Takes Pydantic model classes as input
- Generates JSON Schema dictionaries that represent the model structure
- Preserves validation rules, types, and constraints
- Can output to dictionaries, JSON strings, or files

**Example**:
```python
from pycharter import from_dict, to_dict, to_file, to_json
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool = True

# Convert to JSON Schema
schema = to_dict(Product)
json_string = to_json(Product)
to_file(Product, "product_schema.json")

# Now you can use the schema with other services
ProductModel = from_dict(schema, "Product")  # Round-trip
```

**Contribution to Journey**: The JSON Schema converter enables **bidirectional conversion** between models and schemas. It's useful for:
- Generating schemas from existing code
- Round-trip validation (schema → model → schema)
- Integrating with systems that require JSON Schema format
- Documenting existing models as schemas

---

### 5. ✅ Runtime Validator (`pycharter.runtime_validator`)

**Purpose**: Lightweight utility for validating data against generated Pydantic models in production data pipelines.

**When to Use**: In your data processing scripts, ETL pipelines, API endpoints, or any place where you need to validate incoming data against contract specifications.

**API Organization**:

PyCharter provides validation through three tiers:

1. **Tier 1: Validator Class** (⭐ **PRIMARY INTERFACE** - Recommended for production)
   - Best performance for multiple validations (model is cached)
   - Supports all data sources (contract files, directories, stores, dictionaries)
   - Reusable instance for batch processing

2. **Tier 2: Convenience Functions** (Quick start - one-off validations)
   - `validate_with_store()` - Quick validation with metadata store
   - `validate_with_contract()` - Quick validation with contract file/dict
   - `get_model_from_store()` / `get_model_from_contract()` - Get model for reuse

3. **Tier 3: Low-Level Functions** (When you already have a model)
   - `validate()` - Validate single record with existing model
   - `validate_batch()` - Batch validate with existing model

**How It Works**:
- Takes a Pydantic model (generated from a schema) and raw data
- Validates data against the model's constraints
- Returns a `ValidationResult` with validation status, validated data, and errors
- Supports single record and batch validation
- Can be used in strict mode (raises exceptions) or lenient mode (returns results)
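
The difference between strict and lenient mode is easiest to see in a toy version. The sketch below is an assumption-laden model of the idea, not the library's `ValidationResult`: `Result`, `check`, and `check_batch` are hypothetical, and the only rule checked is the presence of required fields.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Result:
    """Hypothetical result object with status, validated data, and errors."""

    is_valid: bool
    data: Optional[dict] = None
    errors: list = field(default_factory=list)


def check(record: dict, required: list, strict: bool = False) -> Result:
    """Lenient mode returns a Result; strict mode raises on the first bad record."""
    errors = [f"missing field: {name}" for name in required if name not in record]
    if errors and strict:
        raise ValueError("; ".join(errors))
    return Result(is_valid=not errors, data=None if errors else dict(record), errors=errors)


def check_batch(records: list, required: list) -> list:
    """Batch validation in lenient mode: one Result per input record."""
    return [check(record, required) for record in records]


results = check_batch([{"name": "Alice", "age": 30}, {"name": "Bob"}], ["name", "age"])
valid_rows = [result.data for result in results if result.is_valid]
```

Lenient mode suits pipelines that quarantine bad rows and keep going; strict mode suits API endpoints where a single invalid payload should fail the request.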

**Example - Validator Class (Recommended)**:
```python
from pycharter import Validator, SQLiteMetadataStore

# Option 1: From directory (schema.yaml, coercion_rules.yaml, validation_rules.yaml)
validator = Validator.from_dir("data/contracts/user")
result = validator.validate({"name": "Alice", "age": 30})

# Option 2: From explicit files
validator = Validator.from_files(schema="schemas/user.yaml", coercion_rules="rules/coercion.yaml")
result = validator.validate({"name": "Alice", "age": 30})

# Option 3: From single contract file
validator = Validator.from_file("user_contract.yaml")
result = validator.validate({"name": "Alice", "age": 30})

# Option 4: From metadata store (with database)
store = SQLiteMetadataStore("metadata.db")
store.connect()
validator = Validator(store=store, schema_id="user_schema_v1")
result = validator.validate({"name": "Alice", "age": 30})

# Batch validation (efficient - model cached)
results = validator.validate_batch([data1, data2, data3])
```

**Example - Convenience Functions (Quick Start)**:
```python
from pycharter import validate_with_store, validate_with_contract, SQLiteMetadataStore

# Quick validation with store
store = SQLiteMetadataStore("metadata.db")
store.connect()
result = validate_with_store(store, "user_schema_v1", {"name": "Alice", "age": 30})

# Quick validation with contract file (no database)
result = validate_with_contract("user_contract.yaml", {"name": "Alice", "age": 30})
```

**Example - Low-Level (When You Have a Model)**:
```python
from pycharter import from_dict, validate, validate_batch

# Generate model (`schema` is a JSON Schema dict, such as the one in the Quick Start)
UserModel = from_dict(schema, "User")

# Validate single record
result = validate(UserModel, {"name": "Alice", "age": 30})

# Batch validate
results = validate_batch(UserModel, [data1, data2, data3])
```

**Performance Tips**:
- ⚡ **For multiple validations**: Use `Validator` class (model is cached)
- ⚡ **For one-off validations**: Convenience functions are fine
- ⚡ **For batch processing**: Use `Validator.validate_batch()` or `validate_batch()`

**Contribution to Journey**: The runtime validator is the **enforcement layer** that ensures data quality in production. It validates actual data against contract specifications, catching violations early and preventing bad data from propagating through your systems. It supports both database-backed workflows (for production systems with metadata stores) and contract-based workflows (for simpler use cases without database dependencies).

---

### 5b. 🔄 ETL Pipelines (`pycharter.etl_generator`)

**Purpose**: Build and run ETL pipelines programmatically (with the `|` operator) or from YAML configs. PyCharter makes no assumptions about project layout: you specify file paths or use a directory with standard filenames.

**When to Use**: When you need to extract, transform, and load data from config-driven or code-defined pipelines (HTTP, files, databases, cloud storage → transforms → Postgres, files, cloud).

**How It Works**:
- **Programmatic**: `Pipeline(extractor) | transformer | loader`; chain with `|`; call `await pipeline.run()`.
- **Config-driven**: Load from explicit files (`from_config_files`), from a directory with `extract.yaml`, `transform.yaml`, `load.yaml` (`from_config_dir`), or from a single `pipeline.yaml` (`from_config_file`).
- **Variables**: Pass `PipelineContext(variables={"API_KEY": "x"})` or `variables={...}` in factory methods; `${VAR}` and `${VAR:-default}` in configs are resolved from these (no built-in `CONTRACT_DIR`).
- **Async**: `run()` is async; use `asyncio.run(pipeline.run())` in scripts or `await pipeline.run()` in async code.
- **Error handling**: Optional `error_context` with `ErrorMode` (STRICT, LENIENT, COLLECT) controls whether extraction/load failures raise or are collected in `result.errors`.
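
To illustrate how `|` chaining and an async `run()` can fit together, here is a minimal standard-library sketch. It is not PyCharter's `Pipeline` class: `SketchPipeline`, `fake_extract`, `rename_user`, and `tag_source` are hypothetical names, and the steps are plain functions rather than transformer or loader objects.

```python
import asyncio


class SketchPipeline:
    """Hypothetical pipeline: one async source plus an ordered list of steps."""

    def __init__(self, source, steps=()):
        self.source = source
        self.steps = list(steps)

    def __or__(self, step):
        # `pipeline | step` returns a new pipeline with the step appended.
        return SketchPipeline(self.source, self.steps + [step])

    async def run(self):
        records = await self.source()
        for step in self.steps:
            records = [step(record) for record in records]
        return records


async def fake_extract():
    # Stands in for an HTTP, file, or database extractor.
    return [{"userName": "alice"}, {"userName": "bob"}]


def rename_user(record):
    return {"name": record["userName"]}


def tag_source(record):
    return {**record, "source": "sketch"}


pipeline = SketchPipeline(fake_extract) | rename_user | tag_source
result = asyncio.run(pipeline.run())
```

Returning a new object from `__or__` keeps each intermediate pipeline reusable, which is one reasonable design choice for this kind of operator.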

**Example**:
```python
import asyncio
from pycharter import Pipeline, PipelineContext, HTTPExtractor, PostgresLoader, Rename, AddField

# Programmatic
pipeline = (
    Pipeline(HTTPExtractor(url="https://api.example.com/users"))
    | Rename({"userName": "name"})
    | AddField("processed_at", "now()")
    | PostgresLoader(connection_string="...", table="users")
)
result = asyncio.run(pipeline.run())

# Config-driven (explicit files)
pipeline = Pipeline.from_config_files(
    extract="configs/extract.yaml",
    load="configs/load.yaml",
    variables={"API_KEY": "secret"}
)

# Config-driven (directory: extract.yaml, transform.yaml, load.yaml)
pipeline = Pipeline.from_config_dir("pipelines/users/")

# Config-driven (single file)
pipeline = Pipeline.from_config_file("pipelines/users/pipeline.yaml")

result = asyncio.run(pipeline.run())
```
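
The `${VAR}` and `${VAR:-default}` placeholders described under **Variables** could be resolved along the following lines. This is a sketch under stated assumptions rather than PyCharter's resolver: `resolve_placeholders` is a hypothetical helper, and raising `KeyError` for a missing variable without a default is an assumed behavior.

```python
import re

# Matches ${NAME} and ${NAME:-default}; the default may be empty.
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")


def resolve_placeholders(text: str, variables: dict) -> str:
    """Replace ${VAR} and ${VAR:-default} using the supplied variables."""

    def _replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in variables:
            return str(variables[name])
        if default is not None:
            return default
        # Assumption: an unresolved variable with no default is an error.
        raise KeyError(f"No value or default for variable {name!r}")

    return _PLACEHOLDER.sub(_replace, text)


resolved = resolve_placeholders(
    "https://${HOST:-api.example.com}/users?key=${API_KEY}",
    {"API_KEY": "secret"},
)
```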

**Exceptions**: Pipeline and config loading use PyCharter’s exception hierarchy: `PyCharterError` (base), `ConfigError`, `ConfigValidationError`, `ExpressionError`. See **Exceptions** under API Reference.

**See** `pycharter/etl_generator/ASYNC_AND_EXECUTION.md` for async usage and error modes.

---

### 6. 🔍 Quality Assurance (`pycharter.quality`)

**Purpose**: Data quality assurance pipeline that polices data according to data contracts, calculates quality metrics, tracks violations, and generates quality reports.

**When to Use**: When you need to:
- Monitor data quality over time
- Calculate quality scores and metrics
- Track and manage data quality violations
- Set quality thresholds and get alerts
- Generate quality reports for governance

**How It Works**:
- Validates data against contracts (using Runtime Validator)
- Calculates quality metrics (accuracy, completeness, violation rates)
- Tracks violations for audit and remediation
- Checks quality thresholds and generates alerts
- Produces comprehensive quality reports
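
As a rough illustration of the metric and threshold steps above, the sketch below derives a violation rate and a 0-100 score from per-record validity flags. The formulas are assumptions made for illustration, not PyCharter's scoring; `SketchThresholds` and `summarize_quality` are hypothetical names, and only the threshold field names are borrowed from this README.

```python
from dataclasses import dataclass


@dataclass
class SketchThresholds:
    # Field names mirror the thresholds used in this README; defaults are illustrative.
    min_overall_score: float = 95.0
    max_violation_rate: float = 0.05


def summarize_quality(validity_flags, thresholds):
    """Derive a violation rate, a 0-100 score, and a pass/fail flag."""
    total = len(validity_flags)
    violations = sum(1 for is_valid in validity_flags if not is_valid)
    violation_rate = violations / total if total else 0.0
    # Assumed scoring: the share of valid records, scaled to 0-100.
    overall_score = 100.0 * (1.0 - violation_rate)
    passed = (
        overall_score >= thresholds.min_overall_score
        and violation_rate <= thresholds.max_violation_rate
    )
    return {"violation_rate": violation_rate, "overall_score": overall_score, "passed": passed}


# 97 valid records and 3 invalid ones.
summary = summarize_quality([True] * 97 + [False] * 3, SketchThresholds())
```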

**Example**:
```python
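from pycharter import QualityCheck, QualityCheckOptions, QualityThresholds, SQLiteMetadataStore

# Define quality thresholds
thresholds = QualityThresholds(
    min_overall_score=95.0,
    max_violation_rate=0.05
)

# Connect to the metadata store that holds the schema
store = SQLiteMetadataStore("metadata.db")
store.connect()

# Run quality check
check = QualityCheck(store=store)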
report = check.run(
    schema_id="user_schema_v1",
    data="data/users.json",
    options=QualityCheckOptions(
        calculate_metrics=True,
        record_violations=True,
        check_thresholds=True,
        thresholds=thresholds
    )
)

print(f"Quality Score: {report.quality_score.overall_score:.2f}/100")
print(f"Passed: {report.passed}")
```

**Contribution to Journey**: The quality assurance module is the **policing layer** that ensures data quality is maintained throughout the pipeline. It provides metrics, tracking, and alerting capabilities that transform PyCharter from a contract management tool into a complete data quality assurance platform.

**See [Quality Module README](pycharter/quality/README.md) for detailed documentation.**

---

### Complete Workflow Example

Here's how all services work together in a complete data production journey:

```python
from pycharter import (
    parse_contract_file,
    SQLiteMetadataStore,
    from_dict,
    Validator,
    to_dict
)

# Step 1: Parse contract specification
metadata = parse_contract_file("user_contract.yaml")

# Step 2: Store metadata in database
store = SQLiteMetadataStore("metadata.db")
store.connect()
schema_id = store.store_schema("user", metadata.schema, version="1.0")

# Merge ownership and governance into metadata before storing
# Ownership and governance are part of metadata, not separate entities
metadata_dict = metadata.metadata.copy() if metadata.metadata else {}
if metadata.ownership:
    owner = metadata.ownership.get("owner")
    metadata_dict["business_owners"] = [owner] if owner else []
if metadata.governance_rules:
    metadata_dict["governance_rules"] = metadata.governance_rules

# Store metadata once with all information (ownership and governance included)
store.store_metadata(resource_id=schema_id, resource_type="schema", metadata=metadata_dict)

# Store coercion and validation rules
store.store_coercion_rules(schema_id, {"age": "coerce_to_integer"}, version="1.0")
store.store_validation_rules(schema_id, {"age": {"is_positive": {}}}, version="1.0")

# Step 3: Generate Pydantic model from stored schema
schema = store.get_schema(schema_id)
UserModel = from_dict(schema, "User")

# Step 4: (Optional) Convert model back to schema for documentation
schema_doc = to_dict(UserModel)

# Step 5: Validate data in production pipeline
# Option A: Using Validator class (recommended for production)
validator = Validator(store=store, schema_id=schema_id)

def process_user_data(raw_data):
    result = validator.validate(raw_data)
    if result.is_valid:
        # Process validated data
        return result.data
    else:
        # Handle validation errors
        raise ValueError(f"Invalid data: {result.errors}")

# Option B: Using convenience function (quick start)
from pycharter import validate_with_store

def process_user_data_quick(raw_data):
    result = validate_with_store(store, schema_id, raw_data)
    if result.is_valid:
        return result.data
    else:
        raise ValueError(f"Invalid data: {result.errors}")
```

---

### 7. 🌐 REST API (`pycharter.api`)

**Purpose**: Expose all PyCharter services as REST API endpoints.

**When to Use**: When you need to use PyCharter from non-Python applications, microservices, or want to provide a web-based interface.

**How It Works**:
- Provides HTTP endpoints for all core services
- Uses FastAPI for automatic OpenAPI/Swagger documentation
- Supports both store-based and contract-based operations
- Handles request/response validation with Pydantic models
- Located at the root level (`api/`) as a separate application
- All endpoints are async-ready for better performance

**Example**:
```bash
# Start the API server (uses PYCHARTER_DATABASE_URL or sqlite:///pycharter.db)
pycharter api

# With host/port
pycharter api --host 0.0.0.0 --port 8080
```

**Endpoints**:
- `POST /api/v1/contracts/parse` - Parse a data contract
- `POST /api/v1/contracts/build` - Build contract from store
- `POST /api/v1/metadata/schemas` - Store a schema
- `GET /api/v1/metadata/schemas/{schema_id}` - Get a schema
- `POST /api/v1/schemas/generate` - Generate Pydantic model
- `POST /api/v1/validation/validate` - Validate data
- `POST /api/v1/validation/validate-batch` - Batch validation

**Documentation**: 
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

See `api/README.md` for complete API documentation.

### Service Integration Summary

| Service | Input | Output | Journey Stage |
|---------|-------|--------|---------------|
| **Contract Parser** | Contract files (YAML/JSON) | `ContractMetadata` | Contract Specification → Parsing |
| **Contract Builder** | Separate artifacts or Store | Consolidated contract | Storage → Consolidation |
| **Metadata Store** | `ContractMetadata` | Stored metadata (DB) | Parsing → Storage |
| **Pydantic Generator** | JSON Schema | Pydantic models | Storage → Model Generation |
| **JSON Schema Converter** | Pydantic models | JSON Schema | (Bidirectional) |
| **Runtime Validator** | Pydantic models + Data | `ValidationResult` | Model Generation → Validation |
| **ETL Pipelines** | Config files or code | `PipelineResult` | Extract → Transform → Load |
| **Quality Assurance** | Contract + Data | `QualityReport` | Validation → Quality Monitoring |

Each service is designed to be **independent** yet **composable**, allowing you to use them individually or together as part of a complete data contract management system.

## 📖 Documentation

- **[Configuration Guide](docs/guides/configuration.md)** - Database connection, `pycharter db init` / `upgrade` / `seed`, migrations, and variable injection
- **[Data Journey Guide](docs/guides/data-journey.md)** - Data production journey: contract specification → storage → validation → quality
- **[Database ERD](docs/guides/database-erd.md)** - Database schema and entity relationship diagrams
- **[Examples & Notebooks](docs/notebooks/README.md)** - Jupyter notebooks (ETL, contracts, validation, quality, metadata store, schema conversion)
- **[REST API](api/README.md)** - API endpoints and usage (install with `pip install "pycharter[api]"`)

## 📚 Usage Examples

### Basic Usage

**Using Convenience Functions** (Quick Start):
```python
from pycharter import from_dict, from_json, from_file

# From dictionary
schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "title": {"type": "string"},
        "published": {"type": "boolean", "default": False}
    }
}
Article = from_dict(schema, "Article")

# From JSON string
schema_json = '{"type": "object", "version": "1.0.0", "properties": {"name": {"type": "string"}}}'
User = from_json(schema_json, "User")

# From file
Product = from_file("product_schema.json", "Product")
```

**Using Validator Class** (Production):
```python
from pycharter import Validator

# From directory or single file
validator = Validator.from_dir("data/contracts/article")
# or: validator = Validator.from_file("article_contract.yaml")
result = validator.validate({"title": "My Article", "published": True})
```

### Nested Objects

```python
from pycharter import from_dict

schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "zipcode": {"type": "string"}
            }
        }
    }
}

Person = from_dict(schema, "Person")
person = Person(
    name="Alice",
    address={
        "street": "123 Main St",
        "city": "New York",
        "zipcode": "10001"
    }
)

print(person.address.city)  # Output: New York
```

### Arrays and Collections

```python
from pycharter import from_dict

schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "tags": {
            "type": "array",
            "items": {"type": "string"}
        },
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"}
                }
            }
        }
    }
}

Cart = from_dict(schema, "Cart")
cart = Cart(
    tags=["python", "pydantic"],
    items=[
        {"name": "Apple", "price": 1.50},
        {"name": "Banana", "price": 0.75}
    ]
)

print(cart.items[0].name)  # Output: Apple
```

### Coercion and Validation

PyCharter supports **coercion** (pre-validation transformation) and **validation** (post-validation checks):

```python
from pycharter import from_dict

schema = {
    "type": "object",
    "version": "1.0.0",
    "properties": {
        "flight_number": {
            "type": "integer",
            "coercion": "coerce_to_integer"  # Convert string/float to int
        },
        "destination": {
            "type": "string",
            "coercion": "coerce_to_string",
            "validations": {
                "min_length": {"threshold": 3},
                "max_length": {"threshold": 3},
                "no_capital_characters": None,
                "only_allow": {"allowed_values": ["abc", "def", "ghi"]}
            }
        },
        "distance": {
            "type": "number",
            "coercion": "coerce_to_float",
            "validations": {
                "greater_than_or_equal_to": {"threshold": 0}
            }
        }
    }
}

Flight = from_dict(schema, "Flight")

# Coercion happens automatically
flight = Flight(
    flight_number="123",    # Coerced to int: 123
    destination="abc",      # Passes all validations
    distance="100.5"        # Coerced to float: 100.5
)
```

## 📋 Standard JSON Schema Support

PyCharter supports the standard JSON Schema Draft 2020-12 validation keywords, including:

| Keyword | Type | Description | Example |
|---------|------|-------------|---------|
| `minLength` | string | Minimum string length | `{"minLength": 3}` |
| `maxLength` | string | Maximum string length | `{"maxLength": 10}` |
| `pattern` | string | Regular expression pattern | `{"pattern": "^[a-z]+$"}` |
| `enum` | any | Allowed values | `{"enum": ["a", "b", "c"]}` |
| `const` | any | Single allowed value | `{"const": "fixed"}` |
| `minimum` | number | Minimum value (inclusive) | `{"minimum": 0}` |
| `maximum` | number | Maximum value (inclusive) | `{"maximum": 100}` |
| `exclusiveMinimum` | number | Minimum value (exclusive) | `{"exclusiveMinimum": 0}` |
| `exclusiveMaximum` | number | Maximum value (exclusive) | `{"exclusiveMaximum": 100}` |
| `multipleOf` | number | Must be multiple of | `{"multipleOf": 2}` |
| `minItems` | array | Minimum array length | `{"minItems": 1}` |
| `maxItems` | array | Maximum array length | `{"maxItems": 10}` |
| `uniqueItems` | array | Array items must be unique | `{"uniqueItems": true}` |

All schemas are validated against the JSON Schema standard before processing, ensuring compliance.

## 🔧 Built-in Coercions (Charter Extensions)

| Coercion | Description |
|----------|-------------|
| `coerce_to_string` | Convert int, float, bool, datetime, dict, list to string |
| `coerce_to_integer` | Convert float, string (numeric), bool, datetime to int |
| `coerce_to_float` | Convert int, string (numeric), bool to float |
| `coerce_to_boolean` | Convert int, string to bool |
| `coerce_to_datetime` | Convert string (ISO format), timestamp to datetime |
| `coerce_to_date` | Convert string (date format), datetime to date (date only, no time) |
| `coerce_to_uuid` | Convert string to UUID |
| `coerce_to_lowercase` | Convert string to lowercase |
| `coerce_to_uppercase` | Convert string to uppercase |
| `coerce_to_stripped_string` | Strip leading and trailing whitespace from string |
| `coerce_to_list` | Convert single value to list `[value]` (preserves None) |
| `coerce_empty_to_null` | Convert empty strings/lists/dicts to None (useful for nullable fields) |
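
To make the table concrete, here is a sketch of what two of these coercions might look like as plain functions. It follows the descriptions above but is not the library's code: the `sketch_` names are hypothetical, and leaving existing lists unchanged is an assumption.

```python
def sketch_coerce_to_list(value):
    """Wrap a single value in a list; keep lists and None unchanged."""
    if value is None or isinstance(value, list):
        return value
    return [value]


def sketch_coerce_empty_to_null(value):
    """Turn empty strings, lists, and dicts into None; pass everything else through."""
    if isinstance(value, (str, list, dict)) and len(value) == 0:
        return None
    return value


wrapped = sketch_coerce_to_list("python")
nulled = sketch_coerce_empty_to_null("")
kept_zero = sketch_coerce_empty_to_null(0)
```

Note that falsy but non-empty values such as `0` and `False` pass through untouched in this sketch, since only empty containers and strings are treated as missing.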

## ✅ Built-in Validations (Charter Extensions)

| Validation | Description | Configuration |
|------------|-------------|---------------|
| `min_length` | Minimum length for strings/arrays | `{"threshold": N}` |
| `max_length` | Maximum length for strings/arrays | `{"threshold": N}` |
| `only_allow` | Only allow specific values | `{"allowed_values": [...]}` |
| `greater_than_or_equal_to` | Numeric minimum | `{"threshold": N}` |
| `less_than_or_equal_to` | Numeric maximum | `{"threshold": N}` |
| `is_positive` | Value must be positive | `{"threshold": 0}` |
| `no_capital_characters` | No uppercase letters | `null` |
| `no_special_characters` | Only alphanumeric and spaces | `null` |
| `non_empty_string` | String must not be empty | `null` |
| `matches_regex` | String must match regex pattern | `{"pattern": "..."}` |
| `is_email` | String must be valid email address | `null` |
| `is_url` | String must be valid URL | `null` |
| `is_alphanumeric` | Only alphanumeric characters (no spaces/special) | `null` |
| `is_numeric_string` | String must be numeric (digits, optional decimal) | `null` |
| `is_unique` | All items in array must be unique | `null` |

> **Note**: Charter extensions (`coercion` and `validations`) are optional and can be used alongside standard JSON Schema keywords. All validation logic is stored as data in the JSON schema, making it fully data-driven.

## 🎨 Custom Coercions and Validations

Extend PyCharter with your own coercion and validation functions:

```python
from pycharter.shared.coercions import register_coercion
from pycharter.shared.validations import register_validation

# Register custom coercion (use a name that does not collide with a built-in)
def coerce_to_title_case(data):
    if isinstance(data, str):
        return data.title()
    return data

register_coercion("coerce_to_title_case", coerce_to_title_case)

# Register custom validation
def must_be_positive(threshold=0):
    def _validate(value, info):
        if value <= threshold:
            raise ValueError(f"Value must be > {threshold}")
        return value
    return _validate

register_validation("must_be_positive", must_be_positive)
```

## 📖 API Reference

PyCharter's API is organized into three tiers to help you choose the right approach:

### Tier 1: Primary Interfaces (Classes - Best Performance)

**Validator** - Primary validation interface (recommended for production)
```python
from pycharter import Validator

# Create validator via factory methods or store
validator = Validator.from_dir("data/contracts/user")
validator = Validator.from_files(schema="schema.yaml", coercion_rules="coercion.yaml")
validator = Validator.from_file("contract.yaml")
validator = Validator.from_dict(schema={...}, coercion_rules={...})
validator = Validator(store=store, schema_id="user_schema")  # from metadata store

# Validate data
result = validator.validate(data)
results = validator.validate_batch([data1, data2])
model = validator.get_model()  # Get the generated Pydantic model
```

**QualityCheck** - Primary quality assurance interface
```python
from pycharter import QualityCheck, QualityCheckOptions

check = QualityCheck(store=store)
report = check.run(schema_id="user_schema", data=data, options=QualityCheckOptions(...))
```

**MetadataStoreClient** - Base class for metadata stores
```python
from pycharter import MetadataStoreClient, SQLiteMetadataStore, PostgresMetadataStore

store = SQLiteMetadataStore("metadata.db")
store.connect()
```

### Tier 2: Convenience Functions (Quick Start)

**Pydantic Generator** - Input type helpers
- **`from_dict(schema: dict, model_name: str = "DynamicModel")`** - Create model from dictionary
- **`from_json(json_string: str, model_name: str = "DynamicModel")`** - Create model from JSON string
- **`from_file(file_path: str, model_name: Optional[str] = None)`** - Create model from file (JSON/YAML)
- **`from_url(url: str, model_name: str = "DynamicModel")`** - Create model from URL
- **`generate_model(schema: dict, model_name: str = "DynamicModel")`** - Advanced: more control
- **`generate_model_file(schema: dict, output_path: str, model_name: str = "DynamicModel")`** - Generate and save to file

**JSON Schema Converter** - Output type helpers
- **`to_dict(model: Type[BaseModel], ...)`** - Convert model to JSON Schema dictionary
- **`to_file(model: Type[BaseModel], file_path: str, ...)`** - Convert model to file
- **`to_json(model: Type[BaseModel], ...)`** - Convert model to JSON string
- **`model_to_schema(model: Type[BaseModel], ...)`** - Advanced: core conversion function

**Runtime Validator** - Data source helpers
- **`validate_with_store(store, schema_id, data, ...)`** - Quick validation with metadata store
- **`validate_batch_with_store(store, schema_id, data_list, ...)`** - Batch validation with store
- **`validate_with_contract(contract, data, ...)`** - Quick validation with contract file/dict
- **`validate_batch_with_contract(contract, data_list, ...)`** - Batch validation with contract
- **`get_model_from_store(store, schema_id, ...)`** - Get model from metadata store
- **`get_model_from_contract(contract, ...)`** - Get model from contract
- **`validate_input(contract, ...)`** - Decorator for function input validation
- **`validate_output(contract, ...)`** - Decorator for function output validation
- **`validate_with_contract_decorator(contract, ...)`** - Decorator for contract-based validation

**Contract Management**
- **`parse_contract(contract_dict: dict)`** - Parse contract dictionary
- **`parse_contract_file(file_path: str)`** - Parse contract file (YAML/JSON)
- **`build_contract(artifacts: ContractArtifacts)`** - Build contract from artifacts
- **`build_contract_from_store(store, schema_id, ...)`** - Build contract from metadata store

### Tier 3: Low-Level Utilities (When You Have Models)

- **`validate(model: Type[BaseModel], data: dict, strict: bool = False)`** - Validate single record
- **`validate_batch(model: Type[BaseModel], data_list: List[dict], strict: bool = False)`** - Batch validate
- **`ValidationResult`** - Result class with `is_valid`, `data`, and `errors` attributes

### Metadata Store Implementations

- **`InMemoryMetadataStore()`** - In-memory store (testing/development)
- **`SQLiteMetadataStore(database_path: str)`** - SQLite database
- **`PostgresMetadataStore(connection_string: str)`** - PostgreSQL database
- **`MongoDBMetadataStore(connection_string: str)`** - MongoDB database
- **`RedisMetadataStore(connection_string: str)`** - Redis database

### Exceptions

PyCharter uses a small exception hierarchy for config and pipeline errors. Catch **`PyCharterError`** to handle any PyCharter failure:

- **`PyCharterError`** - Base for all PyCharter exceptions
- **`ConfigError`** - Config loading/parsing failures (missing file, invalid YAML)
- **`ConfigValidationError`** - Schema validation failures (e.g. missing required `type` field)
- **`ConfigLoadError`** - Config file load errors
- **`ExpressionError`** - Expression evaluation failures (e.g. invalid syntax in AddField)

Pipeline **`run(error_context=...)`** supports **`ErrorMode`**: **STRICT** (raise on failure), **LENIENT** (log and continue), **COLLECT** (append to `result.errors`). Import from `pycharter.shared.errors`.
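
The three modes can be pictured with the standard-library sketch below. It is an illustration only: `SketchErrorMode` and `apply_step` are hypothetical names, and handling failures per record is an assumption rather than a description of how PyCharter applies `ErrorMode` internally.

```python
from enum import Enum
import logging


class SketchErrorMode(Enum):
    STRICT = "strict"
    LENIENT = "lenient"
    COLLECT = "collect"


def apply_step(records, step, mode):
    """Apply `step` to each record, handling failures according to `mode`."""
    outputs, errors = [], []
    for record in records:
        try:
            outputs.append(step(record))
        except Exception as exc:
            if mode is SketchErrorMode.STRICT:
                raise  # Fail fast on the first error.
            if mode is SketchErrorMode.LENIENT:
                logging.warning("Skipping record %r: %s", record, exc)
            else:
                errors.append(f"{record!r}: {exc}")  # COLLECT: keep for later inspection.
    return outputs, errors


def to_int(record):
    return int(record)


collected, collected_errors = apply_step(["1", "x", "3"], to_int, SketchErrorMode.COLLECT)
lenient, lenient_errors = apply_step(["1", "x", "3"], to_int, SketchErrorMode.LENIENT)
```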

## 🎯 Design Principles & Requirements

PyCharter is designed to meet the following core requirements:

### ✅ JSON Schema Standard Compliance

All schemas must abide by conventional JSON Schema syntax and qualify as valid JSON Schema:

- **Validation**: All schemas are validated against JSON Schema Draft 2020-12 standard before processing
- **Standard Keywords**: Full support for all standard validation keywords (minLength, pattern, enum, minimum, maximum, etc.)
- **Compliance**: Uses `jsonschema` library for validation with graceful fallback

### ✅ Data-Driven Validation Logic

All schema information and complex field validation logic is stored as **data**, not Python code:

- **Coercion**: Referenced by name (string) in JSON: `"coercion": "coerce_to_integer"`
- **Validations**: Referenced by name with configuration (dict) in JSON: `"validations": {"min_length": {"threshold": 3}}`
- **No Code Required**: Validation rules are defined entirely in JSON schema files
- **Example**: `{"coercion": "coerce_to_string", "validations": {"min_length": {"threshold": 3}}}`

### ✅ Dynamic Pydantic Model Generation

Models are created dynamically at runtime from JSON schemas:

- **Runtime Generation**: Uses `pydantic.create_model()` to generate models on-the-fly
- **Dynamic Validators**: Field validators are dynamically attached using `field_validator` decorators
- **Multiple Sources**: Models can be created from dicts, JSON strings, files, or URLs
- **No Static Code**: All models are generated from data, not pre-defined classes

### ✅ Nested Schema Support

Full support for nested object schemas and complex structures:

- **Recursive Processing**: Nested objects are recursively processed into their own Pydantic models
- **Arrays of Objects**: Arrays containing nested objects are fully supported
- **Deep Nesting**: Deeply nested structures work correctly with full type safety
- **Type Safety**: Each nested object becomes its own typed Pydantic model
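
The recursive idea can be sketched with the standard library alone. The example below swaps in `dataclasses.make_dataclass` for `pydantic.create_model`, so it only builds nested types and performs no validation or coercion; `build_type` and the generated class names are hypothetical.

```python
from dataclasses import field, fields, is_dataclass, make_dataclass
from typing import Any, List, Optional, get_args

_PRIMITIVES = {"string": str, "integer": int, "number": float, "boolean": bool}


def build_type(schema: dict, name: str):
    """Map a JSON Schema fragment to a Python type, recursing into objects and arrays."""
    kind = schema.get("type")
    if kind == "object":
        members = []
        for prop, sub_schema in schema.get("properties", {}).items():
            # Each nested object becomes its own class, named after its path.
            sub_type = build_type(sub_schema, name + prop.title())
            members.append((prop, Optional[sub_type], field(default=None)))
        return make_dataclass(name, members)
    if kind == "array":
        return List[build_type(schema.get("items", {}), name + "Item")]
    return _PRIMITIVES.get(kind, Any)


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {"type": "object", "properties": {"city": {"type": "string"}}},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

Person = build_type(person_schema, "Person")
field_types = {f.name: f.type for f in fields(Person)}
address_type = get_args(field_types["address"])[0]
person = Person(name="Alice", address=address_type(city="New York"))
```

Unlike a Pydantic model, this sketch does not turn a nested dict into a nested instance automatically, which is why the address object is constructed explicitly.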

### ✅ Extension Fields

Custom fields can be added to JSON Schema to extend functionality:

- **`coercion`**: Pre-validation type conversion (e.g., string → integer)
- **`validations`**: Post-validation custom rules
- **Optional**: Extensions work alongside standard JSON Schema keywords
- **Separated**: Extensions are clearly distinguished from standard JSON Schema

### ✅ Complex Field Validation

Support for both standard and custom field validators:

- **Standard Validators**: minLength, pattern, enum, minimum, maximum, etc. (JSON Schema standard)
- **Custom Validators**: Extensible validation rules via `validations` field
- **Validation Order**: Coercion → Standard Validation → Pydantic Validation → Custom Validations
- **Factory Pattern**: Validators are factory functions that return validation functions
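
The factory pattern and the coercion-before-validation ordering can be sketched as follows. This is a simplified illustration, not PyCharter's code: the registries, `build_field_checker`, and the single-argument validator signature are assumptions, and the standard JSON Schema and Pydantic stages are omitted.

```python
def min_length(threshold):
    """Factory: returns a validator bound to the configured threshold."""
    def _validate(value):
        if len(value) < threshold:
            raise ValueError(f"length must be >= {threshold}")
        return value
    return _validate


def no_capital_characters():
    """Factory with no configuration."""
    def _validate(value):
        if any(ch.isupper() for ch in value):
            raise ValueError("must not contain uppercase letters")
        return value
    return _validate


# Hypothetical name-to-function registries, looked up from schema data.
SKETCH_COERCIONS = {"coerce_to_string": str}
SKETCH_VALIDATIONS = {"min_length": min_length, "no_capital_characters": no_capital_characters}


def build_field_checker(field_schema):
    """Compose coercion (first) and validations (after) from a field's schema entry."""
    coerce = SKETCH_COERCIONS.get(field_schema.get("coercion"), lambda value: value)
    validators = [
        SKETCH_VALIDATIONS[name](**(config or {}))
        for name, config in field_schema.get("validations", {}).items()
    ]

    def _check(value):
        value = coerce(value)
        for validator in validators:
            value = validator(value)
        return value

    return _check


check_destination = build_field_checker({
    "coercion": "coerce_to_string",
    "validations": {"min_length": {"threshold": 3}, "no_capital_characters": None},
})
coerced = check_destination(123)
```

Because each rule is looked up by name and configured from a dict, the whole check can be described in schema data without writing field-specific Python.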

## 🚀 Development Setup

### Quick Setup

```bash
# Run setup script
./setup.sh

# Activate environment
source venv/bin/activate

# Run tests
pytest
```

### Using Make

```bash
make install-dev    # Install package and dev dependencies
make test          # Run tests
make format        # Format code with black and isort
make lint          # Run type checking with mypy
make check         # Run all checks (format, lint, test)
```

**Building the package:** Run `make clean && make build` for a reliable build (clears stale egg-info; see [Publishing](docs/contributing/publishing.md)).

## 🧪 Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=pycharter --cov-report=html

# Run specific test file
pytest tests/test_converter.py

# Run tests matching a pattern
pytest -k "coercion"
```

## 📦 Publishing to PyPI

**Automatic publishing via GitHub Releases (Trusted Publishing - no tokens needed!):**

```bash
# 1. Update version in pyproject.toml
# version = "0.0.21"

# 2. Commit and push
git add pyproject.toml
git commit -m "Bump version to 0.0.21"
git push

# 3. Create GitHub Release (automatically publishes to PyPI)
gh release create v0.0.21 --title "v0.0.21" --notes "Release notes"
```

The workflow automatically:
- ✅ Builds UI
- ✅ Builds Python package
- ✅ Publishes to PyPI (using Trusted Publishing)

**Local build (reliable):** clean first to avoid stale build artifacts, then build:
```bash
make clean && make build
```
The core package builds without Node.js; the UI is included when built (see [Publishing guide](docs/contributing/publishing.md)).

## 📋 JSON Schema Compliance

PyCharter is fully compliant with the **JSON Schema Draft 2020-12** standard:

- All schemas are validated against the standard before processing
- Full support for all standard keywords (minLength, maxLength, pattern, enum, minimum, maximum, etc.)
- Optional extensions (`coercion` and `validations`) work alongside standard keywords
- Strict mode available to enforce standard-only schemas

## 🔗 Requirements

- **Python 3.10+**
- Pydantic >= 2.0.0
- jsonschema >= 4.0.0 (for schema validation)

See [pyproject.toml](pyproject.toml) for full dependencies and optional extras (`api`, `ui`, `dev`, `etl`, etc.).

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- **Homepage**: [https://github.com/auscheng/pycharter](https://github.com/auscheng/pycharter)
- **Repository**: [GitHub](https://github.com/auscheng/pycharter)
- **Issues**: [GitHub Issues](https://github.com/auscheng/pycharter/issues)
- **Documentation**: [Configuration & guides](docs/guides/configuration.md) · [API docs](api/README.md)

---

**Made with ❤️ for the Python community**
