Metadata-Version: 2.4
Name: python-calamine-reducto
Version: 0.7.0
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
License-File: LICENSE
Summary: Reducto fork of python-calamine — Python binding for calamine (Rust xlsx/xls/ods reader) with CellGrid, Rust SSF formatting, and style bindings
Author-email: Reducto <support@reducto.ai>, Dmitriy <dimastbk@proton.me>
License-Expression: MIT
Requires-Python: >=3.11
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: homepage, https://github.com/reductoai/python-calamine
Project-URL: source, https://github.com/reductoai/python-calamine

# python-calamine (Reducto fork)

> **Private fork** of [python-calamine](https://github.com/dimastbk/python-calamine) with
> CellGrid, style/layout bindings, and performance optimizations for spreadsheet processing.

## Provenance

```
dimastbk/python-calamine (upstream, v0.6.1)
  └─ reductoai/python-calamine (this repo — CellGrid + style/layout bindings)
```

The `upstream/master` branch is a frozen snapshot of dimastbk's code.
Our additions live on `main`.

## What we added

### Style & layout bindings (`workbook.rs`)

- **`get_sheet_styles(name)`** → list of `(row, col, style_dict)` tuples
  - Font: bold, italic, size, color, name, underline, strikethrough
  - Fill: pattern, fg_color, bg_color
  - Borders: top, bottom, left, right styles
  - Number format string
- **`get_sheet_layout(name)`** → dict with row heights, column widths, defaults
- **`get_sheet_bounds(name)`** → `(max_filled_row, max_filled_col)` computed in Rust
- **`get_sheet_grid(name, rows, cols, include_styles=True)`** → `CellGrid`
  (builds entire grid in a single Rust call)
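
Because `get_sheet_styles(name)` returns a flat list of `(row, col, style_dict)` tuples, callers may want a lookup keyed by cell position. The following is a minimal sketch that uses hand-written sample data in place of a real workbook. `index_styles` is a hypothetical helper, not part of the package, and the `"bold"` and `"number_format"` key names are assumptions about the dict layout:

```python
def index_styles(styles):
    """Build a {(row, col): style_dict} lookup from (row, col, style_dict) tuples."""
    return {(row, col): style for row, col, style in styles}


# Sample data shaped like the documented return value of get_sheet_styles(name).
# The key names inside each dict are assumed for illustration only.
sample_styles = [
    (0, 0, {"bold": True, "number_format": "General"}),
    (0, 1, {"bold": False, "number_format": "0.00"}),
]

by_cell = index_styles(sample_styles)
bold_cells = sorted(pos for pos, style in by_cell.items() if style.get("bold"))
print(bold_cells)  # [(0, 0)]
```

With the real package, `sample_styles` would be replaced by the list returned from `get_sheet_styles(name)`.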

### CellGrid (`grid.rs`) — Rust-backed 2D grid

Replaces numpy structured arrays. All cell data stays in Rust Vecs and only
crosses to Python on demand.

**Types:**
- `CellGrid` — the main 2D grid (values, styles, merges, header, colors, formulas)
- `CellProxy` — lightweight single-cell accessor (supports `cell.value` and `cell["value"]`)
- `CellGridView` — zero-copy sub-grid view for slicing

**Access patterns:**
- `grid[r, c]` → `CellProxy`
- `grid[r1:r2, c1:c2]` → `CellGridView`
- `grid["field_name"]` → numpy 2D array of that field
- `grid["field_name"] = value` → broadcast or copy from numpy
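
The `grid[r, c]` pattern can be used to read one row cell by cell. This is a minimal sketch: `row_values` is a hypothetical helper, and `FakeGrid` and `FakeCell` are stand-ins that mimic only the documented `grid[r, c]` → `cell.value` behaviour so the snippet runs without a workbook. With the real package you would pass a `CellGrid` instead.

```python
from dataclasses import dataclass


@dataclass
class FakeCell:
    """Stand-in for CellProxy: exposes only the `.value` attribute."""
    value: object


class FakeGrid:
    """Stand-in for CellGrid backed by a list of lists (illustration only)."""

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, key):
        r, c = key  # supports only the grid[r, c] form
        return FakeCell(self._rows[r][c])


def row_values(grid, row, n_cols):
    """Collect `cell.value` for each column of one row via grid[r, c]."""
    return [grid[row, c].value for c in range(n_cols)]


grid = FakeGrid([["id", "name"], [1, "alpha"]])
print(row_values(grid, 1, 2))  # [1, 'alpha']
```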

**Grid operations:**
- `vstack(other)` / `hstack(other)` — concatenation
- `slice_copy(r1, r2, c1, c2)` — independent copy of region
- `copy()` — deep copy
- `CellGrid.empty(rows, cols)` — create empty grid
- `masked_copy_from(source, mask, ...)` — boolean mask copy

**Formatting:**
- `apply_formatting(callback)` — calls Python SSF per cell
- `apply_raw_as_formatted()` — `str(raw)` for large sheets

### Performance

| File | openpyxl | calamine + CellGrid | Speedup |
|------|----------|---------------------|---------|
| 7 MB, 26k cells | 711 ms | 256 ms | 2.8x |
| 22 MB, 4.2M cells | 491 s | 8.2 s | 60x |

## Calamine dependency

This repo depends on `reductoai/calamine` (our private fork with styles):

```toml
# Cargo.toml
calamine = { git = "https://github.com/reductoai/calamine", branch = "main", features = ["chrono"] }
```

For local development, override with a path dep:

```toml
calamine = { path = "../calamine", features = ["chrono"] }
```

## Syncing with upstream

```bash
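# One-time setup, if the remote is not configured yet:
#   git remote add upstream https://github.com/dimastbk/python-calamine.git
git fetch upstream  # upstream = dimastbk/python-calamine
git merge upstream/master
# Resolve the Cargo.toml conflict by keeping our git dependency on calamine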
```

## Key files

| File | What |
|------|------|
| `src/types/grid.rs` | CellGrid, CellProxy, CellGridView |
| `src/types/workbook.rs` | get_sheet_grid, get_sheet_styles, get_sheet_layout, get_sheet_bounds |
| `src/types/sheet.rs` | CalamineSheet, CalamineCellIterator |
| `src/types/cell.rs` | CellValue enum → Python type conversion |
| `src/lib.rs` | PyO3 module registration |
| `python/python_calamine/__init__.py` | Python re-exports |

---

# python-calamine

Python binding for Rust's [calamine](https://github.com/tafia/calamine) library
for reading Excel and ODF files.

### Built with
* [calamine](https://github.com/tafia/calamine)
* [pyo3](https://github.com/PyO3/pyo3)
* [maturin](https://github.com/PyO3/maturin)

### Example
```python
from python_calamine import CalamineWorkbook

workbook = CalamineWorkbook.from_path("file.xlsx")
workbook.sheet_names
# ["Sheet1", "Sheet2"]

workbook.get_sheet_by_name("Sheet1").to_python()
# [
# ["1",  "2",  "3",  "4",  "5",  "6",  "7"],
# ["1",  "2",  "3",  "4",  "5",  "6",  "7"],
# ["1",  "2",  "3",  "4",  "5",  "6",  "7"],
# ]
```
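
Since `to_python()` returns a list of rows, a common next step is to treat the first row as a header. The sketch below uses hand-written sample rows rather than a real file, and `rows_to_records` is a hypothetical helper, not part of the library:

```python
def rows_to_records(rows):
    """Treat the first row as a header and map each later row to a dict."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Sample data shaped like the list of rows returned by to_python().
rows = [
    ["id", "name"],
    [1, "alpha"],
    [2, "beta"],
]
print(rows_to_records(rows))
# [{'id': 1, 'name': 'alpha'}, {'id': 2, 'name': 'beta'}]
```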

### Development

#### Prerequisites
- Rust toolchain ([rustup](https://rustup.rs/))
- Python 3.11+
- maturin (`pip install maturin`)

#### Build

```bash
git clone git@github.com:reductoai/python-calamine.git
cd python-calamine

# Create venv (or use existing one)
python3 -m venv .venv && source .venv/bin/activate

# Build and install into venv
maturin develop --release    # ~4s incremental, ~30s clean

# Or build into a specific venv (e.g. api4's):
/path/to/api4/.venv/bin/maturin develop --release
```

#### Workflow for Rust changes

1. Edit files in `src/types/`
2. Run `maturin develop --release`
3. Test in Python — the venv's `python_calamine` package is updated in-place
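
As a quick sanity check for step 3, you can confirm the package is installed in the active venv. This is a sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper, and the default distribution name is taken from this package's metadata, on the assumption that `maturin develop` installs it under that name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="python-calamine-reducto"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version())
```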

## Credits

Thanks to [dimastbk](https://github.com/dimastbk/python-calamine) for the original
python-calamine bindings.

