Metadata-Version: 2.4
Name: opensheet-core
Version: 0.1.1
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Office/Business :: Financial :: Spreadsheet
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Dist: pytest ; extra == 'dev'
Requires-Dist: pytest-cov ; extra == 'dev'
Requires-Dist: openpyxl ; extra == 'dev'
Provides-Extra: dev
License-File: LICENSE
Summary: A fast, memory-efficient spreadsheet I/O library for Python, powered by Rust
Keywords: xlsx,excel,spreadsheet,streaming,rust,pyo3,openpyxl
Author-email: 0xNadr <hi@nader.info>
Maintainer-email: 0xNadr <hi@nader.info>
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Changelog, https://github.com/0xNadr/opensheet-core/blob/main/CHANGELOG.md
Project-URL: Homepage, https://github.com/0xNadr/opensheet-core
Project-URL: Issues, https://github.com/0xNadr/opensheet-core/issues
Project-URL: Repository, https://github.com/0xNadr/opensheet-core

<p align="center">
  <img src="https://raw.githubusercontent.com/0xNadr/opensheet-core/main/assets/banner.svg" alt="OpenSheet Core — Fast, memory-efficient spreadsheet I/O for Python, powered by Rust" width="100%">
</p>

<p align="center">
  <a href="https://github.com/0xNadr/opensheet-core/actions/workflows/ci.yml"><img src="https://github.com/0xNadr/opensheet-core/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://pypi.org/project/opensheet-core/"><img src="https://img.shields.io/pypi/v/opensheet-core.svg" alt="PyPI"></a>
  <a href="https://github.com/0xNadr/opensheet-core/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License: MIT"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.9%E2%80%933.13-blue.svg" alt="Python 3.9–3.13"></a>
  <a href="https://codecov.io/gh/0xNadr/opensheet-core"><img src="https://codecov.io/gh/0xNadr/opensheet-core/branch/main/graph/badge.svg" alt="Coverage"></a>
</p>

<p align="center">
  <a href="#features">Features</a> &nbsp;&bull;&nbsp;
  <a href="#benchmarks">Benchmarks</a> &nbsp;&bull;&nbsp;
  <a href="#installation">Installation</a> &nbsp;&bull;&nbsp;
  <a href="#quick-start">Quick Start</a> &nbsp;&bull;&nbsp;
  <a href="#api-reference">API</a> &nbsp;&bull;&nbsp;
  <a href="#roadmap">Roadmap</a> &nbsp;&bull;&nbsp;
  <a href="#contributing">Contributing</a>
</p>

---

## Why OpenSheet Core?

Existing Python spreadsheet libraries force you to choose between performance, memory efficiency, broad format support, and easy installation. OpenSheet Core eliminates that tradeoff with a native Rust core exposed through a clean Python API — installable with a single `pip install`.

## Features

- **Streaming XLSX reader** — row-by-row iteration without loading the entire file into memory
- **Streaming XLSX writer** — write millions of rows with constant memory usage
- **Formula support** — read and write formulas with optional cached values
- **Date/time support** — read and write `datetime.date` and `datetime.datetime` cells with automatic Excel serial number conversion
- **Merged cells** — read and write merged cell ranges
- **Typed cell extraction** — strings, numbers, booleans, dates, datetimes, formulas, and empty cells are returned as native Python types
- **Context manager support** — Pythonic `with` statement for safe resource management
- **Cross-platform** — tested on Linux, macOS, and Windows across Python 3.9–3.13
- **Zero Python dependencies** — single native extension, no dependency tree to manage

## Benchmarks

Benchmarked against [openpyxl](https://openpyxl.readthedocs.io/) 3.1.5 on a 100,000-row x 10-column dataset (1M cells):

| Operation | OpenSheet Core | openpyxl | Speedup | Memory |
|-----------|---------------|----------|---------|--------|
| **Write** | 2.3s | 20.8s | **9x faster** | **~300x less** |
| **Read** | 0.46s | 14.3s | **31x faster** | — |

> Run it yourself: `python benchmarks/benchmark.py`

## Installation

```bash
pip install opensheet-core
```

### From source (requires Rust toolchain)

```bash
pip install maturin
git clone https://github.com/0xNadr/opensheet-core
cd opensheet-core
maturin develop --release
```

## Quick Start

### Reading an XLSX file

```python
from opensheet_core import read_xlsx, read_sheet

# Read all sheets
sheets = read_xlsx("report.xlsx")
for sheet in sheets:
    print(f"Sheet: {sheet['name']}")
    for row in sheet["rows"]:
        print(row)  # List of typed Python values

# Read a specific sheet
rows = read_sheet("report.xlsx", sheet_name="Data")
```

### Writing an XLSX file

```python
from opensheet_core import XlsxWriter

with XlsxWriter("output.xlsx") as writer:
    writer.add_sheet("Data")
    writer.write_row(["Name", "Age", "Active"])
    writer.write_row(["Alice", 30, True])
    writer.write_row(["Bob", 25, False])
```

### Writing dates

```python
import datetime
from opensheet_core import XlsxWriter

with XlsxWriter("output.xlsx") as writer:
    writer.add_sheet("Events")
    writer.write_row(["Event", "Date", "Timestamp"])
    writer.write_row(["Launch", datetime.date(2025, 3, 15), datetime.datetime(2025, 3, 15, 14, 30)])
```

### Merging cells

```python
from opensheet_core import XlsxWriter

with XlsxWriter("output.xlsx") as writer:
    writer.add_sheet("Report")
    writer.write_row(["Title spanning three columns", "", ""])
    writer.write_row(["A", "B", "C"])
    writer.merge_cells("A1:C1")
```

### Writing formulas

```python
from opensheet_core import XlsxWriter, Formula

with XlsxWriter("output.xlsx") as writer:
    writer.add_sheet("Budget")
    writer.write_row(["Item", "Cost"])
    writer.write_row(["Rent", 1200])
    writer.write_row(["Food", 400])
    writer.write_row(["Total", Formula("SUM(B2:B3)", cached_value=1600)])
```

## API Reference

### `read_xlsx(path: str) -> list[dict]`

Reads an XLSX file and returns a list of dicts with `"name"` (str), `"rows"` (list of lists), and `"merges"` (list of range strings like `"A1:C1"`). Each cell is a typed Python value (`str`, `int`, `float`, `bool`, `datetime.date`, `datetime.datetime`, `Formula`, or `None`).

### `read_sheet(path, sheet_name=None, sheet_index=None) -> list[list]`

Reads a single sheet by name or index. Returns the first sheet by default.

### `sheet_names(path: str) -> list[str]`

Returns the list of sheet names in a workbook.

### `XlsxWriter(path: str)`

Streaming XLSX writer. Use as a context manager.

| Method | Description |
|--------|-------------|
| `add_sheet(name: str)` | Create a new worksheet |
| `write_row(values: list)` | Write a row of values to the current sheet |
| `merge_cells(range: str)` | Merge a range of cells (e.g. `"A1:C1"`) |
| `close()` | Finalize and close the file |

### `Formula(formula: str, cached_value=None)`

Represents a spreadsheet formula. Pass as a cell value when writing, and received when reading cells that contain formulas.

## Architecture

```
┌──────────────────────────┐
│      Python API          │  ← opensheet_core (PyO3 bindings)
├──────────────────────────┤
│      Rust Core           │  ← Streaming parser & writer
│  ┌────────┐ ┌──────────┐ │
│  │ Reader │ │  Writer  │ │
│  │ (SAX)  │ │ (Stream) │ │
│  └────────┘ └──────────┘ │
├──────────────────────────┤
│  quick-xml  │    zip     │  ← Dependencies
└──────────────────────────┘
```

## Feature Comparison vs openpyxl

OpenSheet Core is designed to be a faster, memory-efficient alternative to openpyxl for the most common spreadsheet workflows. Here's where we stand:

### What we already do better

| | OpenSheet Core | openpyxl |
|---|---|---|
| **Write 1M cells** | ~0.7s | ~1.8s |
| **Read 1M cells** | ~0.9s | ~2.4s |
| **Memory usage** | Constant (streaming) | ~50x file size |
| **Python dependencies** | Zero | Several |
| **Architecture** | Rust streaming core | Pure Python DOM |

### Feature coverage

| Category | Feature | openpyxl | OpenSheet Core |
|----------|---------|:--------:|:--------------:|
| **Formats** | .xlsx read/write | Yes | Yes |
| | .xlsm (macro-enabled) | Yes | Planned |
| | .xltx/.xltm (templates) | Yes | — |
| **Cell Types** | Strings, numbers, booleans | Yes | Yes |
| | Dates and datetimes | Yes | Yes |
| | Formulas with cached values | Yes | Yes |
| | Rich text | Yes | Planned |
| | Error values | Yes | Planned |
| **Styling** | Fonts (name, size, bold, italic, color) | Yes | Planned |
| | Fill (solid, pattern, gradient) | Yes | Planned |
| | Borders (14 styles) | Yes | Planned |
| | Alignment (horizontal, vertical, wrap, rotation) | Yes | Planned |
| | Number formats (30+ builtins + custom) | Yes | Date/datetime only |
| | Named styles | Yes | Planned |
| | Conditional formatting (6 rule types) | Yes | Planned |
| **Worksheet** | Merged cells | Yes | Yes |
| | Freeze panes | Yes | Planned |
| | Auto-filter | Yes | Planned |
| | Column widths / row heights | Yes | Planned |
| | Data validation (7 types) | Yes | Planned |
| | Sheet protection | Yes | Planned |
| | Row/column insert/delete | Yes | — |
| | Print settings | Yes | Planned |
| | Row/column grouping | Yes | — |
| **Workbook** | Named ranges / defined names | Yes | Planned |
| | Document properties | Yes | Planned |
| | Workbook protection | Yes | — |
| | Multiple sheet states (hidden, veryHidden) | Yes | Planned |
| **Charts** | 12+ chart types (bar, line, pie, scatter, etc.) | Yes | Planned |
| | 3D variants and combined charts | Yes | — |
| **Images** | Embed PNG/JPEG | Yes | Planned |
| **Tables** | Structured tables with styles | Yes | Planned |
| **Pivot Tables** | Read/preserve existing | Yes | — |
| **VBA/Macros** | Preserve on load (.xlsm) | Yes | Planned |
| **Integration** | Pandas DataFrame I/O | Yes | Planned |
| | NumPy type support | Yes | Planned |
| **Performance** | Streaming read (constant memory) | Yes (read_only mode) | Yes (default) |
| | Streaming write (constant memory) | Yes (write_only mode) | Yes (default) |

> **Legend:** Yes = implemented, Planned = on the roadmap, — = not planned for now

### Our approach

We are not trying to clone openpyxl. We are building a **fast, safe, memory-efficient core** for the most common Excel workflows. The goal is to cover the ~80% of features that people use day-to-day, while being 2–3x faster and using orders of magnitude less memory. Streaming is the default, not an opt-in mode.

## Roadmap

### Done

- [x] XLSX reading with typed cell extraction
- [x] Streaming XLSX writing with low memory usage
- [x] Formula read/write support with cached values
- [x] Date/time cell support with automatic serial number conversion
- [x] Merged cell metadata (read and write)
- [x] Python bindings via PyO3
- [x] Type stubs (`.pyi`) and `py.typed` marker for IDE autocomplete
- [x] CI across Linux, macOS, Windows (Python 3.9–3.13)
- [x] Prebuilt wheels on PyPI
- [x] Benchmarks vs openpyxl
- [x] Runnable benchmark script (`python benchmarks/benchmark.py`)
- [x] Zero Python dependencies

### Phase 1 — Core usability (next)

- [ ] Basic cell styling (fonts, fills, borders, alignment)
- [ ] Number formats (currency, percentage, custom format strings)
- [ ] Column widths and row heights
- [ ] Freeze panes
- [ ] Auto-filter
- [ ] Pandas integration (`read_xlsx_df` / `to_xlsx`)

### Phase 2 — Broader compatibility

- [ ] Named ranges / defined names
- [ ] Data validation
- [ ] Comments and hyperlinks
- [ ] .xlsm read support (preserve macros)
- [ ] Sheet protection
- [ ] Structured tables with styles
- [ ] Multiple sheet states (hidden, veryHidden)

### Phase 3 — Rich content and ecosystem

- [ ] Charts (bar, line, pie, scatter — most common types)
- [ ] Image embedding (PNG, JPEG)
- [ ] Conditional formatting
- [ ] Document and custom properties
- [ ] NumPy type support
- [ ] Broader test corpus and fuzzing
- [ ] Security hardening (XML attack prevention)

## Project Status

**v0.1.0** — functional streaming reader and writer with formula, date/time, and merged cell support, 37 passing tests, and prebuilt wheels on PyPI. The API may change before 1.0.

## Contributing

Contributions are welcome! Here are some great ways to get involved:

- Report bugs or real-world spreadsheet edge cases
- Submit representative sample files for testing
- Suggest benchmark scenarios
- Improve documentation
- Open PRs for roadmap items

## License

[MIT](LICENSE)

---

<p align="center">
  <sub>Built with Rust and PyO3 &nbsp;|&nbsp; Open digital infrastructure for the Python ecosystem</sub>
</p>

