Metadata-Version: 2.4
Name: sigilyx
Version: 0.2.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries
Classifier: Typing :: Typed
Requires-Dist: polars>=0.20
Requires-Dist: pyarrow>=14.0 ; extra == 'all'
Requires-Dist: pandas>=1.5 ; extra == 'all'
Requires-Dist: pyarrow>=14.0 ; extra == 'arrow'
Requires-Dist: pandas>=1.5 ; extra == 'pandas'
Requires-Dist: pyarrow>=14.0 ; extra == 'pandas'
Provides-Extra: all
Provides-Extra: arrow
Provides-Extra: pandas
License-File: LICENSE
Summary: SigilYX — High-performance YXDB file reader and writer.
Keywords: yxdb,alteryx,polars,dataframe,etl,arrow
Author: Sigilweaver
License: AGPL-3.0-only
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Changelog, https://github.com/Sigilweaver/sigilyx/blob/main/CHANGELOG.md
Project-URL: Documentation, https://sigilweaver.app/sigilyx/
Project-URL: Homepage, https://sigilweaver.app/sigilyx/
Project-URL: Issues, https://github.com/Sigilweaver/sigilyx/issues
Project-URL: Repository, https://github.com/Sigilweaver/sigilyx

# sigilyx

*High-performance YXDB reader and writer for Python.*

[![PyPI](https://img.shields.io/pypi/v/sigilyx)](https://pypi.org/project/sigilyx/)
[![Python](https://img.shields.io/pypi/pyversions/sigilyx)](https://pypi.org/project/sigilyx/)
[![License: AGPL-3.0](https://img.shields.io/badge/license-AGPL--3.0-blue)](https://github.com/Sigilweaver/sigilyx/blob/main/LICENSE)

YXDB is the native binary format used by [Alteryx](https://www.alteryx.com/) Designer. `sigilyx` reads and writes `.yxdb` files using [Polars](https://pola.rs/) DataFrames, [PyArrow](https://arrow.apache.org/docs/python/) Tables, or [Pandas](https://pandas.pydata.org/) DataFrames.

The core is written in Rust and uses parallel LZF decompression, SIMD UTF-16→UTF-8 transcoding, and direct Arrow array construction. No Alteryx Designer installation is required.
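
As a rough illustration of what the transcoding step does, the snippet below converts a UTF-16 buffer to UTF-8 with the Python standard library. It is a minimal sketch, not the Rust implementation: the helper name `utf16_to_utf8` is hypothetical, and little-endian byte order is an assumption made for the example.

```python
def utf16_to_utf8(raw: bytes) -> bytes:
    """Transcode a UTF-16 buffer (assumed little-endian) to UTF-8."""
    # Decode to a Python str, then re-encode as UTF-8.
    return raw.decode("utf-16-le").encode("utf-8")


sample = "Zürich".encode("utf-16-le")   # 12 bytes: two per character
print(len(sample), len(utf16_to_utf8(sample)))
```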

## Installation

```bash
pip install sigilyx                     # Polars only (default)
pip install "sigilyx[arrow]"            # + PyArrow
pip install "sigilyx[pandas]"           # + Pandas + PyArrow
pip install "sigilyx[all]"              # all extras
```

Requires Python 3.9+. Pre-built wheels are available for Windows, macOS, and Linux (x64 and ARM).
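
To confirm which optional backends an environment can actually use, a check along these lines may help. It is a sketch using only the standard library; `available_backends` is a hypothetical helper, not part of `sigilyx`, and the module names are the ones pulled in by the extras listed above.

```python
from importlib.util import find_spec


def available_backends() -> dict:
    """Report which DataFrame backends are importable, without importing them."""
    modules = ("polars", "pyarrow", "pandas")
    return {name: find_spec(name) is not None for name in modules}


for name, present in available_backends().items():
    print(f"{name}: {'installed' if present else 'missing'}")
```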

## Quick Start

```python
import polars as pl
import sigilyx  # importing registers pl.read_yxdb(), df.yxdb, etc.

# Read
df = pl.read_yxdb("data.yxdb")

# Write
df.yxdb.write("output.yxdb")
```

## API

### Polars Integration

Importing `sigilyx` registers official [Polars namespace plugins](https://docs.pola.rs/api/python/stable/reference/api.html) and top-level IO aliases. No further setup calls are needed beyond `import sigilyx`.

```python
import polars as pl
import sigilyx

# Top-level IO (mirrors pl.read_parquet / pl.scan_parquet style)
df = pl.read_yxdb("data.yxdb")           # returns pl.DataFrame
lf = pl.scan_yxdb("data.yxdb")           # returns pl.LazyFrame (IO plugin)

# Namespace API on DataFrame / LazyFrame
df.yxdb.write("output.yxdb")             # pl.DataFrame → .yxdb file
lf.yxdb.sink("output.yxdb")              # pl.LazyFrame → .yxdb file (streaming)
```

### Reading

```python
import sigilyx as yx

# Polars DataFrame — fastest, zero-copy via Arrow C Data Interface
df = yx.read_yxdb("data.yxdb")

# PyArrow Table
table = yx.read_yxdb_arrow("data.yxdb")

# Pandas DataFrame (via PyArrow)
pdf = yx.read_yxdb_pandas("data.yxdb")
```

### Writing

```python
import sigilyx as yx

# Polars DataFrame
yx.write_yxdb("output.yxdb", df)

# PyArrow Table
yx.write_yxdb_arrow("output.yxdb", table)

# Pandas DataFrame
yx.write_yxdb_pandas("output.yxdb", pdf)
```

### Streaming / Batched Read

Iterate over large files with constant memory usage:

```python
import sigilyx as yx

# Basic iteration: each batch is a Polars DataFrame.
# `process` is a placeholder for your own per-batch handler.
for batch in yx.read_yxdb_batches("data.yxdb", batch_size=100_000):
    process(batch)

# Column projection — only materialise the columns you need
for batch in yx.read_yxdb_batches("data.yxdb", columns=["Id", "Name", "Amount"]):
    process(batch)

# Row limit — stop after N total rows
for batch in yx.read_yxdb_batches("data.yxdb", n_rows=5_000):
    process(batch)
```
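
A typical `process` step folds each batch into a running result, so that only one batch is held in memory at a time. The sketch below shows that pattern with a hypothetical stand-in generator, `fake_batches`, which yields plain lists of dicts instead of Polars DataFrames; with `sigilyx` the loop would iterate over `yx.read_yxdb_batches(...)` instead. The `Amount` column is illustrative.

```python
from typing import Dict, Iterator, List

Row = Dict[str, float]


def fake_batches(n_rows: int, batch_size: int) -> Iterator[List[Row]]:
    """Hypothetical stand-in for a batched reader: yields lists of rows."""
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        yield [{"Amount": float(i)} for i in range(start, stop)]


def mean_amount(batches: Iterator[List[Row]]) -> float:
    """Compute a mean across batches, keeping only running totals."""
    total, count = 0.0, 0
    for batch in batches:
        total += sum(row["Amount"] for row in batch)
        count += len(batch)
    return total / count if count else float("nan")


print(mean_amount(fake_batches(n_rows=10, batch_size=4)))
```

Only `total` and `count` outlive each iteration, which is what keeps memory use independent of file size.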

### Lazy Scan

```python
import polars as pl
import sigilyx as yx

# Returns a Polars LazyFrame backed by a native Rust streaming reader.
# Only the YXDB header is read on construction; data streams on .collect().
lf = yx.scan("data.yxdb")

result = lf.filter(pl.col("amount") > 100).collect()

# Projection pushdown — only selected columns are materialised in Rust
top10 = lf.select("id", "name").head(10).collect()
```

> **Pushdown support:** projection (`select` / `with_columns`) and row limit
> (`n_rows` / `.head()`) are pushed down to the Rust reader.
> Predicate pushdown is not possible because YXDB rows are LZF-compressed with
> no block-level statistics, so `.filter()` is applied after the scan.
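
The difference between the two kinds of pushdown can be pictured with a toy reader. This is only a conceptual sketch in plain Python, not the Rust reader: `toy_scan`, `filter_after_scan`, and their parameters are hypothetical. Projection and the row limit are applied inside the reader, so unselected columns are never built and reading stops early, while the predicate can only run on rows that have already been decoded.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]


def toy_scan(rows: Iterable[Row],
             columns: Optional[List[str]] = None,
             n_rows: Optional[int] = None) -> Iterator[Row]:
    """Hypothetical reader: applies projection and row limit while reading."""
    for row in islice(rows, n_rows):            # row limit: stop reading early
        if columns is None:
            yield row
        else:
            yield {c: row[c] for c in columns}  # projection: skip other columns


def filter_after_scan(scanned: Iterable[Row],
                      predicate: Callable[[Row], bool]) -> List[Row]:
    """The predicate sees only rows that were already decoded."""
    return [row for row in scanned if predicate(row)]


source = [{"id": i, "name": f"n{i}", "amount": i * 50} for i in range(6)]
scanned = toy_scan(source, columns=["id", "amount"], n_rows=4)
print(filter_after_scan(scanned, lambda r: r["amount"] > 100))
```

In this toy the limit runs before the filter, which corresponds to a query shaped like `lf.head(4).filter(...)`.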

### Metadata

```python
import sigilyx as yx

# Inspect schema without reading any row data
fields = yx.read_yxdb_fields("data.yxdb")
for f in fields:
    print(f.name, f.field_type, f.size)

# Record count from the file header — no data read
n = yx.record_count("data.yxdb")
```

## Field Types

| YXDB Type | Polars / Arrow Type | Notes |
|-----------|---------------------|-------|
| `Bool` | `Boolean` | |
| `Byte` | `Int16` | Unsigned byte stored as Int16 |
| `Int16` | `Int16` | |
| `Int32` | `Int32` | |
| `Int64` | `Int64` | |
| `Float` | `Float32` | |
| `Double` | `Float64` | |
| `FixedDecimal` | `Decimal` | Precision and scale preserved |
| `String` | `String` / `Utf8` | Fixed-width, ASCII/Latin-1 |
| `WString` | `String` / `Utf8` | Fixed-width, UTF-16 decoded |
| `V_String` | `String` / `LargeUtf8` | Variable-length, ASCII/Latin-1 |
| `V_WString` | `String` / `LargeUtf8` | Variable-length, UTF-16 decoded |
| `Date` | `Date` | Days since Unix epoch |
| `DateTime` | `Datetime(us)` | Microsecond precision |
| `Time` | `Time` | Nanosecond precision |
| `Blob` | `Binary` / `LargeBinary` | Variable-length binary |
| `SpatialObj` | `Binary` / `LargeBinary` | Geometry as ISO WKB or raw SHP bytes |
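
The temporal rows in the table describe integer offsets. As a small sketch of what those units mean, the following standard-library helpers (hypothetical names, not part of `sigilyx`) convert raw offsets into Python objects. `datetime.time` only has microsecond resolution, so the nanosecond helper truncates.

```python
from datetime import date, datetime, time, timedelta

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1)


def days_to_date(days: int) -> date:
    """Date: days since the Unix epoch."""
    return EPOCH_DATE + timedelta(days=days)


def micros_to_datetime(us: int) -> datetime:
    """Datetime(us): microseconds since the Unix epoch (naive, no time zone)."""
    return EPOCH_DATETIME + timedelta(microseconds=us)


def nanos_to_time(ns: int) -> time:
    """Time: nanoseconds since midnight, truncated to microseconds."""
    us = ns // 1_000
    seconds, micro = divmod(us, 1_000_000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, micro)


print(days_to_date(19_723))               # 2024-01-01
print(micros_to_datetime(1_500_000))      # 1970-01-01 00:00:01.500000
print(nanos_to_time(3_600_000_000_000))   # 01:00:00
```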

## Performance

Median of 100 runs on 100,000 rows. SigilYX (Python) vs. pure-Python yxdb-py:

| Shape | SigilYX | yxdb-py | Speedup |
|-------|--------:|--------:|--------:|
| Narrow (2 cols) | 2.8 ms | 309 ms | **111×** |
| Mixed (8 cols) | 22.2 ms | 4,333 ms | **195×** |
| String-heavy (5 cols) | 52.2 ms | 10,659 ms | **204×** |
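
The speedup column is the ratio of the two medians. Recomputing it from the rounded figures in the table, as sketched below, reproduces the mixed and string-heavy values; the narrow case comes out at roughly 110×, so the published 111× presumably reflects unrounded timings.

```python
# Median timings in milliseconds, copied from the table above:
# (SigilYX, yxdb-py) per data shape.
timings_ms = {
    "Narrow (2 cols)": (2.8, 309.0),
    "Mixed (8 cols)": (22.2, 4333.0),
    "String-heavy (5 cols)": (52.2, 10659.0),
}


def speedup(sigilyx_ms: float, baseline_ms: float) -> float:
    """Speedup = baseline time divided by SigilYX time."""
    return baseline_ms / sigilyx_ms


for shape, (fast, slow) in timings_ms.items():
    print(f"{shape}: {speedup(fast, slow):.0f}x")
```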

SigilYX (Python) vs open-source implementations in other languages:

| Shape | SigilYX | Best C++ | Go | .NET |
|-------|--------:|---------:|---:|-----:|
| Narrow (2 cols) | **2.8 ms** | 2.2 ms | 4.5 ms | 8.7 ms |
| Numeric (5 cols) | **5.1 ms** | 4.3 ms | 7.2 ms | 11.6 ms |
| Mixed (8 cols) | **22.2 ms** | 39.9 ms | 130.3 ms | 108.4 ms |
| String-heavy (5 cols) | **52.2 ms** | 85.3 ms | 344.6 ms | 204.6 ms |

Full methodology and results: [PERFORMANCE.md](https://github.com/Sigilweaver/sigilyx/blob/main/PERFORMANCE.md)

## Links

- **GitHub:** https://github.com/Sigilweaver/sigilyx
- **Documentation:** https://sigilweaver.app/sigilyx/
- **Rust crate (crates.io):** https://crates.io/crates/sigilyx
- **Changelog:** https://github.com/Sigilweaver/sigilyx/blob/main/CHANGELOG.md
- **Issues:** https://github.com/Sigilweaver/sigilyx/issues

## License

[GNU Affero General Public License v3.0](https://github.com/Sigilweaver/sigilyx/blob/main/LICENSE) (AGPL-3.0-only).

