Metadata-Version: 2.4
Name: pydantable
Version: 0.17.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: pydantic>=2.0,<3
Requires-Dist: typing-extensions>=4.7
Requires-Dist: pyarrow>=14.0 ; extra == 'arrow'
Requires-Dist: maturin>=1.4,<2.0 ; extra == 'benchmark'
Requires-Dist: pandas>=2.0 ; extra == 'benchmark'
Requires-Dist: polars>=1.0.0,<2 ; extra == 'benchmark'
Requires-Dist: fastapi>=0.100 ; extra == 'dev'
Requires-Dist: httpx>=0.24 ; extra == 'dev'
Requires-Dist: python-multipart>=0.0.6 ; extra == 'dev'
Requires-Dist: hypothesis>=6.0 ; extra == 'dev'
Requires-Dist: numpy>=1.24 ; extra == 'dev'
Requires-Dist: pyarrow>=14.0 ; extra == 'dev'
Requires-Dist: pytest>=7.0 ; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23 ; extra == 'dev'
Requires-Dist: pytest-cov>=4.0 ; extra == 'dev'
Requires-Dist: pytest-xdist>=3.0 ; extra == 'dev'
Requires-Dist: coverage[toml]>=7.0 ; extra == 'dev'
Requires-Dist: polars>=1.0.0,<2 ; extra == 'dev'
Requires-Dist: sphinx ; extra == 'docs'
Requires-Dist: myst-parser ; extra == 'docs'
Requires-Dist: sphinx-autodoc-typehints ; extra == 'docs'
Requires-Dist: sphinx-copybutton ; extra == 'docs'
Requires-Dist: sphinx-rtd-theme ; extra == 'docs'
Requires-Dist: pandas>=2.0 ; extra == 'pandas'
Requires-Dist: polars>=1.0.0,<2 ; extra == 'polars'
Provides-Extra: arrow
Provides-Extra: benchmark
Provides-Extra: dev
Provides-Extra: docs
Provides-Extra: pandas
Provides-Extra: polars
License-File: LICENSE.md
Summary: Strongly-typed DataFrames for Python, powered by Rust.
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# PydanTable

[![CI](https://github.com/eddiethedean/pydantable/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/eddiethedean/pydantable/actions/workflows/ci.yml?query=branch%3Amain)
[![Documentation](https://readthedocs.org/projects/pydantable/badge/?version=latest)](https://pydantable.readthedocs.io/en/latest/)
[![PyPI version](https://img.shields.io/pypi/v/pydantable)](https://pypi.org/project/pydantable/)
[![Python versions](https://img.shields.io/pypi/pyversions/pydantable)](https://pypi.org/project/pydantable/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Typed dataframe transformations for FastAPI and Pydantic services, backed by a Rust execution core (Polars inside the native extension).**

**Current release: 0.17.0** · Python **3.10+**

---

## At a glance

- **Schemas first:** Pydantic field annotations define column types, nullability (`T | None`), and which expressions are legal. Many mistakes are caught when you build the `Expr`, not only when you run the query.
- **Two entry styles:** `DataFrameModel` (SQLModel-like whole-table class with a generated row model) or `DataFrame[YourSchema](data)` with any Pydantic `BaseModel` schema (see the sketch after this list).
- **Polars-shaped API:** `select`, `with_columns`, `filter`, `join`, `group_by`, windows, reshape helpers — semantics are documented in the [interface contract](https://pydantable.readthedocs.io/en/latest/INTERFACE_CONTRACT.html), not guaranteed identical to Polars on every edge case.
- **Optional extras:** `pydantable[polars]` for `to_polars()`; `pydantable[arrow]` for `read_parquet` / `read_ipc`, `to_arrow` / `ato_arrow`, and `pa.Table` / `RecordBatch` constructors.
- **Optional façades:** `pydantable.pandas` and `pydantable.pyspark` swap naming/imports; execution runs on the same in-process core (not a real Spark or pandas backend).
- **Service-ready:** Sync and async materialization (`collect`, `to_dict`, `acollect`, `ato_dict`, …), [FastAPI](https://pydantable.readthedocs.io/en/latest/FASTAPI.html) patterns, and trusted ingest modes for bulk JSON or Arrow.
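
The quick start below uses `DataFrameModel`; here is a minimal sketch of the second entry style. It assumes `DataFrame` is importable from the top-level package and accepts the same columnar-dict input as the quick start (the schema and field names are illustrative):

```python
from pydantic import BaseModel

from pydantable import DataFrame


class Measurement(BaseModel):
    sensor_id: int
    reading: float | None


# Sketch only: assumes DataFrame[Schema] takes the columnar-dict input
# shown in the quick start, as with DataFrameModel.
df = DataFrame[Measurement]({"sensor_id": [1, 2], "reading": [0.5, None]})
clean = df.filter(df.reading > 0.1).select("sensor_id")
print(clean.to_dict())
```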

---

## Documentation

The **canonical manual** is on Read the Docs: **[https://pydantable.readthedocs.io/en/latest/](https://pydantable.readthedocs.io/en/latest/)**

| Topic | Read the Docs |
|--------|----------------|
| **Home / overview** | [Documentation home](https://pydantable.readthedocs.io/en/latest/index.html) |
| **Changelog & versions** | [Changelog](https://pydantable.readthedocs.io/en/latest/changelog.html) |
| **`DataFrameModel`** (inputs, transforms, collisions, materialization) | [DataFrameModel](https://pydantable.readthedocs.io/en/latest/DATAFRAMEMODEL.html) |
| **Column types** (scalars, structs, `list[T]`, maps, trusted ingest) | [Supported data types](https://pydantable.readthedocs.io/en/latest/SUPPORTED_TYPES.html) |
| **FastAPI** (routers, bodies, async, multipart) | [FastAPI integration](https://pydantable.readthedocs.io/en/latest/FASTAPI.html) |
| **Execution** (`collect`, `to_dict`, `to_polars`, `to_arrow`, async) | [Execution](https://pydantable.readthedocs.io/en/latest/EXECUTION.html) |
| **Semantics** (nulls, joins, windows, reshape) | [Interface contract](https://pydantable.readthedocs.io/en/latest/INTERFACE_CONTRACT.html) |
| **Roadmap** (shipped **0.17.0**, planned **0.18+**, path to v1.0.0) | [Roadmap](https://pydantable.readthedocs.io/en/latest/ROADMAP.html) |
| **Why not Polars alone?** | [Why not just use Polars?](https://pydantable.readthedocs.io/en/latest/WHY_NOT_POLARS.html) |
| **Pandas-style API** (`pydantable.pandas`) | [Pandas UI](https://pydantable.readthedocs.io/en/latest/PANDAS_UI.html) |
| **PySpark-style API** (`pydantable.pyspark`) | [PySpark UI](https://pydantable.readthedocs.io/en/latest/PYSPARK_UI.html) · [Parity matrix](https://pydantable.readthedocs.io/en/latest/PYSPARK_PARITY.html) |
| **Polars parity** | [Scorecard](https://pydantable.readthedocs.io/en/latest/PARITY_SCORECARD.html) · [Workflows](https://pydantable.readthedocs.io/en/latest/POLARS_WORKFLOWS.html) · [Transformation roadmap](https://pydantable.readthedocs.io/en/latest/POLARS_TRANSFORMATIONS_ROADMAP.html) |
| **Contributors** | [Developer guide](https://pydantable.readthedocs.io/en/latest/DEVELOPER.html) |
| **Architecture plan** | [Plan document](https://pydantable.readthedocs.io/en/latest/pydantable_plan.html) |
| **Python API (autodoc)** | [API reference](https://pydantable.readthedocs.io/en/latest/api/index.html) |

---

## Install

```bash
pip install pydantable
```

**Optional dependencies** (same package, feature extras):

```bash
pip install 'pydantable[polars]'   # to_polars()
pip install 'pydantable[arrow]'    # read_parquet/read_ipc, to_arrow, Table/RecordBatch constructors
```
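
A short sketch of what the extras unlock, using the `User` model from the quick start below; the method names come from the docs, but treat the snippet as illustrative rather than a verified signature:

```python
from pydantable import DataFrameModel


class User(DataFrameModel):
    id: int
    age: int | None


df = User({"id": [1, 2], "age": [20, None]})

# With 'pydantable[polars]' installed: hand the frame to Polars.
pl_df = df.to_polars()

# With 'pydantable[arrow]' installed: Arrow interchange for Parquet / IPC pipelines.
table = df.to_arrow()
```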

**From a git checkout**, you need a Rust toolchain and a build of the native extension (e.g. with [Maturin](https://www.maturin.rs/)):

```bash
pip install .
# editable: maturin develop --manifest-path pydantable-core/Cargo.toml
```

Full setup, `make check-full`, and release notes: [Developer guide](https://pydantable.readthedocs.io/en/latest/DEVELOPER.html).

---

## Quick start

```python
from pydantable import DataFrameModel

class User(DataFrameModel):
    id: int
    age: int | None

df = User({"id": [1, 2], "age": [20, None]})
df2 = df.with_columns(age2=df.age * 2)
df3 = df2.select("id", "age2")
df4 = df3.filter(df3.age2 > 10)

# Columnar dict (good for JSON APIs)
print(df4.to_dict())
# {'age2': [40], 'id': [1]}

# List of Pydantic row models (default collect)
for row in df4.collect():
    print(row.id, row.age2)
```

**Materialization:** [`collect()`](https://pydantable.readthedocs.io/en/latest/DATAFRAMEMODEL.html) → `list` of row models; [`to_dict()`](https://pydantable.readthedocs.io/en/latest/EXECUTION.html) / `collect(as_lists=True)` → `dict[str, list]`; `to_polars()` / `to_arrow()` when the matching extra is installed. **Async:** `acollect`, `ato_dict`, `ato_polars`, `ato_arrow` offload blocking work from the event loop ([Execution](https://pydantable.readthedocs.io/en/latest/EXECUTION.html), [FastAPI](https://pydantable.readthedocs.io/en/latest/FASTAPI.html)).
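
A hedged sketch of the async side in a FastAPI route, reusing the `User` model from the quick start above; the endpoint shape and the `>=` comparison are illustrative, while `ato_dict()` and its off-loop behaviour come from the Execution docs:

```python
from fastapi import FastAPI

from pydantable import DataFrameModel


class User(DataFrameModel):
    id: int
    age: int | None


app = FastAPI()


@app.get("/adults")
async def adults() -> dict[str, list]:
    df = User({"id": [1, 2, 3], "age": [20, None, 35]})
    # ato_dict() materializes off the event loop (per the Execution docs);
    # the route itself is only a sketch.
    return await df.filter(df.age >= 18).ato_dict()
```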

**Alternate import styles** (same engine):

```python
from pydantable.pandas import DataFrameModel as PandasDataFrameModel
from pydantable.pyspark import DataFrameModel as PySparkDataFrameModel
from pydantable import DataFrameModel as DefaultDataFrameModel
```

More examples: [FastAPI](https://pydantable.readthedocs.io/en/latest/FASTAPI.html), [Polars-style workflows](https://pydantable.readthedocs.io/en/latest/POLARS_WORKFLOWS.html).

**Validation policy:** Constructors validate strictly by default. For messy row lists, pass `ignore_errors=True` plus `on_validation_errors=callback`; the callback receives the failed rows (`row_index`, `row`, Pydantic `errors`). Trusted bulk paths use `trusted_mode` (`off` / `shape_only` / `strict`). Details: [DataFrameModel](https://pydantable.readthedocs.io/en/latest/DATAFRAMEMODEL.html), [Supported types](https://pydantable.readthedocs.io/en/latest/SUPPORTED_TYPES.html).
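
A minimal sketch of that hook, assuming row-list input and the keyword names above; the exact callback signature is an assumption, not taken from the docs:

```python
from pydantable import DataFrameModel


class User(DataFrameModel):
    id: int
    age: int | None


def log_bad_row(row_index, row, errors):
    # Assumed per-row signature: the docs only say the callback receives
    # the failed row's index, the row itself, and the Pydantic errors.
    print(f"row {row_index} rejected: {errors}")


rows = [{"id": 1, "age": 20}, {"id": "not-an-int", "age": None}]
df = User(rows, ignore_errors=True, on_validation_errors=log_bad_row)
```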

---

## Expression & API surface

Typed **`Expr`** objects build a Rust AST. Highlights:

- **Globals in `select`:** `global_sum`, `global_mean`, `global_count`, `global_min`, `global_max`, `global_row_count()`. In the PySpark façade, `F.count()` with no argument returns the row count.
- **Windows:** `row_number`, `rank`, `dense_rank`, `window_sum`, `window_mean`, `window_min`, `window_max`, `lag`, `lead` with `Window.partitionBy(...).orderBy(..., nulls_last=...)`; framed `rowsBetween` / `rangeBetween` where supported ([window semantics](https://pydantable.readthedocs.io/en/latest/WINDOW_SQL_SEMANTICS.html)); a sketch follows this list.
- **Temporal & strings:** `strptime`, `unix_timestamp`, `cast` to `date`/`datetime`, `dt_*` parts, `strip` / `lower` / `upper`, `str_replace`, `strip_prefix` / `suffix` / `chars`, list helpers (`list_len`, `list_get`, …).
- **Maps (string keys):** `map_len`, `map_get`, `map_contains_key`, `map_keys`, `map_values`, `map_entries`, `map_from_entries`, `element_at`; `binary_len` for `bytes` columns.
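
The sketch below illustrates the window bullet above using the PySpark façade names. The function names and the `Window.partitionBy(...).orderBy(...)` spec come from the docs, but the import locations, `withColumn`, and the `.over(...)` attachment are assumptions borrowed from PySpark conventions:

```python
from pydantable.pyspark import DataFrameModel
from pydantable.pyspark.sql import Window
from pydantable.pyspark.sql import functions as F


class Sale(DataFrameModel):
    region: str
    amount: float


df = Sale({"region": ["east", "east", "west"], "amount": [10.0, 30.0, 20.0]})

# Number rows within each region, ordered by amount (nulls sorted last).
w = Window.partitionBy("region").orderBy("amount", nulls_last=True)
numbered = df.withColumn("rn", F.row_number().over(w))
```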

PySpark-named wrappers: `pydantable.pyspark.sql.functions` mirrors much of the above ([parity table](https://pydantable.readthedocs.io/en/latest/PYSPARK_PARITY.html)).

---

## Recent releases

**0.17.0** — Tighter docs and tests for **`map_get` / `map_contains_key`** after PyArrow **`map<utf8, …>`** ingest; more **`pyspark.sql.functions`** thin wrappers (`str_replace`, `regexp_replace`, `strip_*`, `strptime`, `binary_len`, `list_*`). Non-string map keys (`dict[int, T]`, etc.) remain future work ([Roadmap](https://pydantable.readthedocs.io/en/latest/ROADMAP.html) **Later**).

**0.16.x** — Arrow interchange (`read_parquet` / `read_ipc`, `to_arrow` / `ato_arrow`, Table/RecordBatch constructors), FastAPI multipart and deployment docs, map-column arithmetic `TypeError` fix, `DataFrame[Schema](pa.Table)` constructor fix.

Older highlights: **0.15.0** async materialization and Arrow map ingest; **0.14.0** window null ordering and FastAPI `TestClient` coverage. Full history: [Changelog](https://pydantable.readthedocs.io/en/latest/changelog.html).

---

## Development

From a clone with a `.venv`, `pip install -e ".[dev]"`, and a built extension:

```bash
make check-full              # Ruff, mypy, Rust fmt / clippy / tests
PYTHONPATH=python pytest -q  # integration tests (see DEVELOPER.md)
```

Rust tests need the Makefile `PYO3_PYTHON` / `PYTHONPATH` wiring: `make rust-test`. Details: [Developer guide](https://pydantable.readthedocs.io/en/latest/DEVELOPER.html).

---

## License

MIT

