Metadata-Version: 2.4
Name: flowrider
Version: 0.1.4
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Rust
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: System :: Distributed Computing
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Unix
Classifier: Environment :: GPU
Requires-Dist: numpy
License-File: LICENSE
Summary: High-performance PyTorch-compatible streaming dataset with distributed caching for on-the-fly remote dataset fetching
Keywords: pytorch,dataset,streaming,distributed,machine-learning,deep-learning,cache,mds
Home-Page: https://github.com/fpgaminer/flowrider
Author-email: fpgaminer <fpgaminer@bitcoin-mining.com>
License: Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Project-URL: Bug Tracker, https://github.com/fpgaminer/flowrider/issues
Project-URL: Documentation, https://github.com/fpgaminer/flowrider#readme
Project-URL: Homepage, https://github.com/fpgaminer/flowrider
Project-URL: Source, https://github.com/fpgaminer/flowrider

# Flowrider

## WARNING: FOR PERSONAL USE ONLY, NOT PRODUCTION READY

## Overview
Inspired by MosaicML's `streaming` library (https://github.com/mosaicml/streaming), this library provides a PyTorch `IterableDataset` implementation that streams data from cloud storage. It is compatible with distributed training and can cache data to disk.
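
To make the idea concrete, here is a minimal, standard-library-only sketch of the two concepts mentioned above: splitting shards across ranks and caching fetched shards on disk. This is not Flowrider's API. The names `shards_for_rank` and `CachedShardReader` are hypothetical, and a local directory stands in for cloud storage.

```python
import shutil
from pathlib import Path
from typing import Iterable, Iterator, List


def shards_for_rank(shards: List[str], rank: int, world_size: int) -> List[str]:
    """Round-robin split so each rank reads a disjoint subset of shards."""
    return shards[rank::world_size]


class CachedShardReader:
    """Hypothetical helper: fetches a shard into a local cache on first access."""

    def __init__(self, remote_dir: Path, cache_dir: Path) -> None:
        self.remote_dir = remote_dir  # stand-in for a cloud bucket
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetches = 0  # counts cache misses, for illustration only

    def local_path(self, shard: str) -> Path:
        cached = self.cache_dir / shard
        if not cached.exists():
            # Copy to a temporary name first, then rename, so a partially
            # written file is never mistaken for a complete cached shard.
            tmp = self.cache_dir / (shard + ".tmp")
            shutil.copyfile(self.remote_dir / shard, tmp)
            tmp.replace(cached)
            self.fetches += 1
        return cached

    def iter_samples(self, shards: Iterable[str]) -> Iterator[bytes]:
        """Yield one sample per line from each shard, fetching as needed."""
        for shard in shards:
            with open(self.local_path(shard), "rb") as f:
                for line in f:
                    yield line.rstrip(b"\n")
```

A second pass over the same shards reads from the cache directory and does not touch the "remote" source again, which is the property a disk cache is meant to provide during multi-epoch training.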



## Testing

`cargo test --no-default-features --features auto-initialize`


## NOTE

- Logging has to go through `env_logger`, even though there are ways to forward logs to Python's logger. Forwarding to Python's logger requires holding the GIL. Since a background thread does work (and may log), that creates a minefield of deadlocks or of background threads being blocked from making progress.
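
As a practical consequence, log verbosity is likely controlled through an environment variable rather than Python's `logging` configuration. The sketch below assumes the extension initialises `env_logger` with its default settings, in which case the filter is read from `RUST_LOG`. That is an assumption about this project, not something confirmed here.

```python
import os

# Assumption: env_logger is initialised with its defaults, so it reads the
# log filter from RUST_LOG at the moment the logger is set up.
# setdefault keeps any value the user already exported in their shell.
os.environ.setdefault("RUST_LOG", "info")

# Import the extension only after the variable is set, so the setting is
# visible whenever the logger is initialised. Left commented out so this
# snippet runs without the package installed.
# import flowrider
```

Equivalently, the variable can be set in the shell before launching the training script.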


## Local install

```bash
maturin develop -r
```

