Metadata-Version: 2.4
Name: deep-variance
Version: 1.0.0
Summary: GPU Virtual Memory Stitching SDK: CUDA VMM allocator with chunk caching and DLPack tensors for PyTorch
Author: Deep Variance
License-Expression: MIT
Project-URL: Homepage, https://deepvariance.com/
Project-URL: Documentation, https://docs.google.com/document/d/14Znj73Mz68CMi78BTi2IQNOFi2Moko1wh0xlv8IraaA/edit?usp=sharing
Project-URL: Repository, https://github.com/deepvariance
Keywords: cuda,gpu,virtual-memory,vmm,pytorch,dlpack,memory
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=1.9.0
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: pytest-timeout; extra == "dev"
Provides-Extra: obfuscate
Requires-Dist: cython>=3; extra == "obfuscate"
Dynamic: license-file

# deepvariance-vmm-sdk

**GPU Virtual Memory Management SDK**

CUDA virtual memory management (VMM) with physical chunk caching and DLPack-backed PyTorch tensors.

Use this package on machines with a **CUDA-capable GPU**. The package is distributed as a **source distribution (sdist)**, so it compiles in your environment against the CUDA and PyTorch versions you have installed.
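Conceptually, a VMM allocator reserves virtual address space and maps physical chunks into it, keeping freed chunks in a cache so they can be remapped without new driver calls. The sketch below illustrates only the chunk-rounding and cache-reuse idea in plain Python; it is not the SDK's implementation, and the 2 MiB granularity is just a typical value.

```python
CHUNK = 2 * 1024 * 1024  # illustrative 2 MiB physical-chunk granularity

def round_up(nbytes: int, chunk: int = CHUNK) -> int:
    """Round a request up to a whole number of physical chunks."""
    return ((nbytes + chunk - 1) // chunk) * chunk

class ChunkCache:
    """Toy cache: keep freed chunks for reuse instead of releasing them."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.free_chunks = []   # cached, currently unmapped chunks
        self.cached_bytes = 0

    def acquire(self, chunk: int = CHUNK):
        if self.free_chunks:            # cache hit: no new allocation needed
            self.cached_bytes -= chunk
            return self.free_chunks.pop()
        return object()                 # stand-in for a fresh physical chunk

    def release(self, handle, chunk: int = CHUNK):
        if self.cached_bytes + chunk <= self.max_bytes:
            self.free_chunks.append(handle)   # keep for the next allocation
            self.cached_bytes += chunk
        # else: a real allocator would free the chunk back to the driver
```

In the real SDK, this reuse is what `set_cache_limit` bounds and `cache_stats` reports on.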

## Requirements

- **Python** 3.9+
- **PyTorch** (any version with CUDA support; install the build that matches your CUDA, e.g. `pip install torch`)
- **CUDA** toolkit and driver (e.g. via `module load cuda` on HPC clusters)
- **C++ compiler** (e.g. `g++`, `clang++`) — set `CXX` if needed

## Install

1. Install PyTorch for your CUDA version (see [pytorch.org](https://pytorch.org)):

   ```bash
   pip install torch
   ```

2. On HPC, load CUDA if it’s not in your path:

   ```bash
   module load cuda   # or your site’s module name
   ```

3. Install from PyPI (source build on your machine):

   ```bash
   pip install deepvariance-vmm-sdk
   ```

   Or install from a source tree:

   ```bash
   pip install -e .
   ```

The build uses your environment’s C++ compiler (`CXX` or `g++`/`clang++`) and the CUDA/PyTorch configuration already present, so it works with different CUDA and PyTorch versions.

## Optional: environment check

Before building or at runtime you can check that the environment is suitable:

```python
from dv_vmm import check_environment

report = check_environment()
for name, (ok, msg) in report.items():
    print(f"{name}: {'ok' if ok else 'MISSING'} — {msg}")
```

To try loading CUDA via environment modules (e.g. `module load cuda`) when it’s not visible:

```python
from dv_vmm import ensure_cuda_visible
ensure_cuda_visible(use_module=True)
```

## Documentation and how to run

**For detailed step-by-step run instructions**, see **pip_dv_vmm.pdf** in the project. A summary is also in [DOCUMENTATION.md](DOCUMENTATION.md).

Summary:

1. **Environment**  
   Ensure Python 3.9+, PyTorch (CUDA build), CUDA driver/toolkit, and a C++ compiler (`g++` or `clang++`). On HPC, run `module load cuda` (or your site’s CUDA module) before installing or running.

2. **Install**  
   Install PyTorch for your CUDA version, then install deepvariance-vmm-sdk (see [Install](#install)). For a verbose, non-isolated build (e.g. on a cluster):

   ```bash
   python -m pip install -v --no-build-isolation --no-cache-dir deepvariance-vmm-sdk
   ```

3. **Check environment**  
   Run the env check script or use the API:

   ```bash
   deepvariance-vmm-sdk-check
   deepvariance-vmm-sdk-check --module-load   # try loading CUDA module if not visible
   ```

4. **Use the API**  
   Allocate GPU tensors via virtual memory stitching with `vmm_empty` / `vmm_empty_nd`; optionally tune cache with `set_cache_limit` and `cache_stats` (see [Usage](#usage)).

## Usage

```python
import torch
from dv_vmm import vmm_empty, vmm_empty_nd, set_cache_limit, cache_stats

# Allocate a 1D tensor (1M float32 elements) on the default CUDA device
t = vmm_empty(1_000_000, dtype=torch.float32, device="cuda:0")

# Allocate with a shape
t = vmm_empty_nd((100, 1000), dtype=torch.float32)

# Optional: set cache limit (e.g. 2 GB per pool)
set_cache_limit(device_id=0, chunk_bytes=0, max_bytes=2 * 1024**3)

# Inspect cache stats
stats = cache_stats()
```

### Analytics (opt-in)

Usage telemetry is **disabled by default**. Enable it to collect allocation counts,
latencies, and error rates locally (SQLite) or to a remote endpoint:

```python
from dv_vmm import enable_analytics, disable_analytics, is_analytics_enabled, analytics_summary

enable_analytics()           # start background telemetry worker
print(is_analytics_enabled()) # True

# ... use vmm_empty / vmm_empty_nd as normal ...

print(analytics_summary())   # returns a dict with counts/latency stats
disable_analytics()          # flush and stop worker
```

Or via environment variables before import:

```bash
export DEEP_VARIANCE_ANALYTICS=1      # auto-enable on import
export DEEP_VARIANCE_NO_TELEMETRY=1   # hard opt-out (overrides the above)
```
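The precedence described above (hard opt-out beats auto-enable) can be sketched as a simple check. This is an illustrative stand-alone function, not the SDK's actual import-time code:

```python
import os

def analytics_enabled_at_import(env=os.environ) -> bool:
    """Hard opt-out (DEEP_VARIANCE_NO_TELEMETRY=1) always wins."""
    if env.get("DEEP_VARIANCE_NO_TELEMETRY") == "1":
        return False
    return env.get("DEEP_VARIANCE_ANALYTICS") == "1"
```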

## Build customization

- **C++ compiler**: set `CXX` (e.g. `export CXX=clang++`).
- **C++ flags**: set `DEEP_VARIANCE_CXXFLAGS` (space-separated).
- **NVCC flags**: set `DEEP_VARIANCE_NVCCFLAGS` (space-separated).
- **Cython obfuscation**: set `DEEP_VARIANCE_OBFUSCATE=1` to compile Python sources
  to C extensions via Cython (requires `pip install deepvariance-vmm-sdk[obfuscate]`):
  ```bash
  pip install deepvariance-vmm-sdk[obfuscate]
  DEEP_VARIANCE_OBFUSCATE=1 python -m build --wheel
  # or via Makefile:
  make build-wheel
  ```
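For example, a build with a custom compiler and extra flags might look like this; the flag values are illustrative, not recommendations:

```bash
export CXX=clang++
export DEEP_VARIANCE_CXXFLAGS="-O3"
export DEEP_VARIANCE_NVCCFLAGS="-lineinfo"
pip install --no-build-isolation deepvariance-vmm-sdk
```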

## Development

Install dev dependencies and run the test suite:

```bash
pip install -e ".[dev]"
pytest                          # 29 non-GPU unit tests
pytest --run-cuda-live          # + 16 CUDA-live tests (requires A100/CUDA GPU)
make benchmark                  # CIFAR-10 baseline vs VMM throughput benchmark
```

## Publishing to PyPI (maintainers)

Publishing is **wheel-only** via GitHub Actions with OIDC trusted publishing
(no API tokens stored). The workflow lives in [.github/workflows/publish.yml](.github/workflows/publish.yml)
and triggers on a pushed version tag.

```bash
# Build a wheel locally (uses DEEP_VARIANCE_OBFUSCATE=1 if cython is installed)
make build-wheel

# Local fallback: publish manually (CI normally publishes on tag push)
make publish          # runs twine upload dist/*.whl
```

To release:

1. Bump `version` in [pyproject.toml](pyproject.toml) and [dv_vmm/\_\_init\_\_.py](dv_vmm/__init__.py).
2. `git tag v<version> && git push origin v<version>` — CI builds and publishes `deepvariance-vmm-sdk` automatically.

## License

MIT
