Metadata-Version: 2.2
Name: disco-data-logger
Version: 0.1.5
Summary: Data logger for Disco simulations
Author: Michiel Jansen
License: Apache-2.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: C++
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Requires-Dist: numpy>=1.23
Requires-Dist: python-graphblas>=2025.2.0
Requires-Dist: disco-tools>=0.1
Requires-Dist: pyarrow>=14
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: pytest-cov>=5; extra == "dev"
Requires-Dist: mypy>=1.10; extra == "dev"
Requires-Dist: cibuildwheel>=2.20; extra == "dev"
Requires-Dist: pybind11-stubgen>=2.5; extra == "dev"
Description-Content-Type: text/markdown

# 🧾 disco-data-logger

**High-performance, C++/NumPy-backed data logger**  
for **Disco** discrete-event and Monte Carlo simulation programs.

[![PyPI](https://img.shields.io/pypi/v/disco-data-logger.svg)](https://pypi.org/project/disco-data-logger/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)
[![Build](https://github.com/michielmj/disco-data-logger/actions/workflows/build.yml/badge.svg)](https://github.com/michielmj/disco-data-logger/actions)
[![Tests](https://github.com/michielmj/disco-data-logger/actions/workflows/test.yml/badge.svg)](https://github.com/michielmj/disco-data-logger/actions)

---

## Overview

`disco-data-logger` provides a **fast, compressed, and lightweight data recording layer**  
for large-scale **Disco** simulations and other computational experiments.  

It is optimized for capturing **sparse numerical state updates** and **accumulators** during simulation runs,  
and writing them efficiently to disk as **Zstandard-compressed segment files**.  
Each simulation entity or measurement can log its data independently through labeled streams.

It combines:
- A **C++/pybind11 core** for high-throughput buffering and compression.
- A **Python API** for easy stream registration and control.
- Built-in **Parquet export** for analysis and aggregation after runs.

---

## ✨ Features

- **Sparse vector logging** powered by [`graphblas.Vector`](https://pypi.org/project/python-graphblas/).
- **Fixed-point quantization** for compact and deterministic encoding.
- **Buffered, lock-free write path** (ring buffer + writer thread).
- **Zstandard compression** (vendored, no external dependencies).
- **Segment rotation** for large simulation outputs.
- **JSON metadata** for each stream (`organisation`, `model`, `experiment`, …).
- **Periodic vector streams** that emit state snapshots or accumulator sums once per period.
- **Integrated Parquet export** for post-run analytics.
- **Arrow-based collector** to filter finished loggers and emit RecordBatches directly.
- **Apache-2.0-licensed** and designed for in-cluster (on-disk/in-memory) use.

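To illustrate what fixed-point quantization of sparse updates means in practice, here is a minimal, pure-Python sketch. It is not the library's implementation: `SCALE`, `quantize`, and `dequantize` are hypothetical names chosen for this example, and the resolution of three decimal places is an assumption. The point is only that each logged value becomes an integer step count, which is compact to store and decodes to the same float every time, with a round-trip error of at most half a step.

```python
# Illustrative only: SCALE, quantize, and dequantize are hypothetical names,
# not part of the disco-data-logger API.
SCALE = 1_000  # assumed resolution: three decimal places


def quantize(value: float, scale: int = SCALE) -> int:
    """Map a float to the nearest fixed-point integer step."""
    return round(value * scale)


def dequantize(step: int, scale: int = SCALE) -> float:
    """Recover the approximate float from its integer step."""
    return step / scale


# A sparse update stores only the indices that changed.
sparse_update = {3: 0.12345, 17: -2.5, 42: 1000.0004}

encoded = {index: quantize(value) for index, value in sparse_update.items()}
decoded = {index: dequantize(step) for index, step in encoded.items()}

# The round-trip error is bounded by half a quantization step.
max_error = max(abs(decoded[i] - sparse_update[i]) for i in sparse_update)
```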
---

## 🚀 Installation

```bash
pip install disco-data-logger
```

`pyarrow` is a required dependency and is installed automatically, so Parquet export works out of the box.

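If you want to confirm that the package and its `pyarrow` dependency were resolved in the active environment, a small check with the standard-library `importlib.metadata` module is one option. The helper name `installed_version` is made up for this sketch; only `metadata.version` and `metadata.PackageNotFoundError` are real standard-library names.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report the versions that pip resolved for this environment.
for name in ("disco-data-logger", "pyarrow"):
    print(name, installed_version(name) or "not installed")
```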
---

## 📚 Documentation

- [Collector](docs/collector.md) – decode completed loggers, filter streams with
  `label_selector`, and write Arrow `RecordBatch` outputs efficiently.
- [Periodic vector stream logging](docs/periodic_vector_stream.md) – step-by-step guide for
  configuring `periodicity`, choosing between `state` and `accumulator` modes, and verifying
  the emitted sparse data.
- [ENGINEERING_SPEC.md](ENGINEERING_SPEC.md) – project history, motivation, and architectural
  overview.

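As a rough mental model of the difference between the `state` and `accumulator` modes, the following pure-Python sketch groups timestamped updates into periods. It rests on assumptions: `emit_periodic` is a hypothetical helper rather than the library's API, `state` is taken to mean "the last value written in the period wins", and `accumulator` is taken to mean "values written in the period are summed". The linked guide is the authority on the actual behaviour.

```python
from collections import defaultdict


def emit_periodic(events, periodicity, mode):
    """Group (time, index, value) events into one sparse dict per period.

    Hypothetical helper for illustration; events are assumed to arrive in
    time order.
    """
    periods = defaultdict(dict)
    for time, index, value in events:
        period = int(time // periodicity)
        bucket = periods[period]
        if mode == "accumulator":
            # Sum every value logged for this index within the period.
            bucket[index] = bucket.get(index, 0.0) + value
        elif mode == "state":
            # Keep only the most recent value for this index.
            bucket[index] = value
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return dict(sorted(periods.items()))


events = [(0.2, 1, 5.0), (0.7, 1, 2.0), (1.1, 4, 3.0), (1.9, 1, 1.0)]
state = emit_periodic(events, periodicity=1.0, mode="state")
sums = emit_periodic(events, periodicity=1.0, mode="accumulator")
```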
---

## 🛠️ Development

Set up a virtual environment (for example, `python -m venv .venv && source .venv/bin/activate`),
then install the project in editable mode with its development extras:

```bash
pip install -e '.[dev]'
```

> [!NOTE]
> The `tools` namespace is provided by the separate [`disco-tools`](https://pypi.org/project/disco-tools/)
> dependency. After installing it for the first time, deactivate and reactivate your
> environment (e.g., `deactivate` followed by `source .venv/bin/activate`) so that `pytest` can
> discover the package.

Once the environment is ready, run the unit tests with:

```bash
pytest
```

---

## License

This project is licensed under the Apache License, Version 2.0.

You are free to use, modify, and distribute this software, including
for commercial purposes, subject to the terms of the license.

### Contributions

By contributing to this project, you agree that your contributions
will be licensed under the Apache License 2.0 and that you have the
right to submit the work under those terms.

This project uses a Contributor License Agreement (CLA) to ensure
clarity of intellectual property and patent rights.