Metadata-Version: 2.4
Name: arpeggio-shredder
Version: 0.1.9
License: AGPL-3.0-or-later OR Commercial
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: description
Dynamic: description-content-type
Dynamic: license
Dynamic: license-file

# arpeggio-shredder

Reversible shredding (flattening) and unshredding (rebuilding) of semi-structured JSONL to and from Apache Arrow RecordBatches.

This package provides a **thin Python binding** over a C++ core that converts JSON documents into a columnar “atoms” representation suitable for Arrow / Parquet workflows, while preserving the ability to reconstruct the original documents exactly.

The scope is intentionally narrow: **flattening with reversibility**, not general JSON processing.

## Key properties

- **Reversible**: JSON → Arrow atoms → JSON
- **Columnar-first**: output is optimized for Arrow-native pipelines
- **Deterministic identity**: optional object and transaction tagging
- **Minimal Python surface**: Arrow `RecordBatch` in, `RecordBatch` out
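
To make the reversibility property concrete, here is a minimal pure-Python sketch of the round trip. It is not this package's API and not its atom schema: the function names `shred` and `unshred`, and the `(doc_id, path, kind, value)` atom layout, are assumptions made up for illustration. The real implementation works on Arrow `RecordBatch` objects in the C++ core.

```python
import json


def shred(doc, doc_id):
    """Flatten one JSON value into a list of (doc_id, path, kind, value) atoms.

    Hypothetical layout for illustration only. Empty containers get their own
    atom so that they survive the round trip.
    """
    atoms = []

    def walk(node, path):
        if isinstance(node, dict):
            if not node:
                atoms.append((doc_id, path, "empty_object", None))
            for key, child in node.items():
                walk(child, path + (key,))
        elif isinstance(node, list):
            if not node:
                atoms.append((doc_id, path, "empty_array", None))
            for index, child in enumerate(node):
                walk(child, path + (index,))
        else:
            atoms.append((doc_id, path, "scalar", node))

    walk(doc, ())
    return atoms


def unshred(atoms):
    """Rebuild one JSON value from atoms emitted by shred(), in emission order."""
    root = None
    for _doc_id, path, kind, value in atoms:
        leaf = {"empty_object": {}, "empty_array": []}.get(kind, value)
        if not path:
            root = leaf  # the whole document is a scalar or an empty container
            continue
        if root is None:
            # String steps address object keys, integer steps address array slots.
            root = {} if isinstance(path[0], str) else []
        node = root
        for step, next_step in zip(path, path[1:]):
            container = {} if isinstance(next_step, str) else []
            if isinstance(node, dict):
                node = node.setdefault(step, container)
            else:
                if step == len(node):  # first atom seen for this array element
                    node.append(container)
                node = node[step]
        if isinstance(node, dict):
            node[path[-1]] = leaf
        else:
            node.append(leaf)  # array elements arrive in index order
    return root


# One JSONL line, shredded and rebuilt.
line = '{"id": 7, "tags": ["a", "b"], "meta": {"ok": true, "empty": {}, "n": null}}'
doc = json.loads(line)
atoms = shred(doc, doc_id=0)
assert unshred(atoms) == doc
```

Each atom is a flat row, which is why this kind of representation maps naturally onto columns. The path and kind columns carry enough structure to rebuild nesting, empty containers, and nulls.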

## Native extension

The package ships with a **prebuilt native extension (`.so`)** built against Apache Arrow and exposed via pybind11.

- No runtime compilation
- No system Arrow installation required
- Shared libraries are bundled into the wheel

## Platform support

- **OS**: Linux (manylinux-compatible)
- **Architecture**: x86_64
- **Python**: CPython 3.12
- **ABI**: glibc (manylinux)

Other platforms are not currently supported.
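
The constraints above can be checked before installation or import with a short standard-library sketch. The helper names `is_supported` and `current_platform_supported` are hypothetical and not part of this package. The check does not verify the glibc requirement, so a musl-based Linux system would still pass it.

```python
import platform
import sys


def is_supported(system, machine, implementation, version):
    """Return True if the given values match the platform list above.

    Hypothetical helper: compares OS, architecture, interpreter, and the
    major.minor Python version. The C library (glibc) is not checked.
    """
    return (
        system == "Linux"
        and machine == "x86_64"
        and implementation == "CPython"
        and tuple(version[:2]) == (3, 12)
    )


def current_platform_supported():
    """Apply is_supported() to the running interpreter."""
    return is_supported(
        platform.system(),
        platform.machine(),
        platform.python_implementation(),
        sys.version_info,
    )


if __name__ == "__main__":
    print("supported" if current_platform_supported() else "not supported")
```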

## Relationship to the C++ project

This package is the **Python distribution layer** for the Shredder C++ project.  
It does not include the full C++ documentation, tests, or build system.

## License

This package is dual-licensed:

- **AGPL-3.0-or-later** for open-source use and networked deployments
- **Commercial license** for proprietary or closed-source use

Commercial licensing is available via https://arpeggio.one/shop.
