Metadata-Version: 2.4
Name: distensor
Version: 0.2.0
Summary: Decentralized distributed training with explicit 4D parallelism control
Project-URL: Homepage, https://distensor.com
Project-URL: Documentation, https://distensor.com/docs
Project-URL: Repository, https://github.com/don-arash/distensor
Project-URL: Issues, https://github.com/don-arash/distensor/issues
Author: disTensor Team
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: deep-learning,distributed,machine-learning,parallelism,pytorch,training
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: grpcio>=1.60
Requires-Dist: numpy>=1.24
Requires-Dist: protobuf>=4.25
Requires-Dist: safetensors>=0.4
Requires-Dist: torch>=2.0
Provides-Extra: cloud
Requires-Dist: alembic>=1.13; extra == 'cloud'
Requires-Dist: asyncpg>=0.29; extra == 'cloud'
Requires-Dist: authlib>=1.3; extra == 'cloud'
Requires-Dist: fastapi>=0.109; extra == 'cloud'
Requires-Dist: httpx>=0.27; extra == 'cloud'
Requires-Dist: opentelemetry-api>=1.22; extra == 'cloud'
Requires-Dist: opentelemetry-exporter-otlp>=1.22; extra == 'cloud'
Requires-Dist: opentelemetry-instrumentation-fastapi>=0.43b0; extra == 'cloud'
Requires-Dist: opentelemetry-instrumentation-grpc>=0.43b0; extra == 'cloud'
Requires-Dist: opentelemetry-sdk>=1.22; extra == 'cloud'
Requires-Dist: prometheus-client>=0.19; extra == 'cloud'
Requires-Dist: psycopg2-binary>=2.9; extra == 'cloud'
Requires-Dist: pydantic>=2.0; extra == 'cloud'
Requires-Dist: python-multipart>=0.0.6; extra == 'cloud'
Requires-Dist: redis>=5.0; extra == 'cloud'
Requires-Dist: sqlalchemy>=2.0; extra == 'cloud'
Requires-Dist: uvicorn>=0.27; extra == 'cloud'
Requires-Dist: websockets>=12.0; extra == 'cloud'
Provides-Extra: dev
Requires-Dist: absl-py>=2.0; extra == 'dev'
Requires-Dist: commitizen>=4.1; extra == 'dev'
Requires-Dist: docker>=7.0; extra == 'dev'
Requires-Dist: grpcio-tools>=1.60; extra == 'dev'
Requires-Dist: mypy-protobuf>=3.0; extra == 'dev'
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: myst-parser>=2.0; extra == 'dev'
Requires-Dist: pre-commit>=4.0; extra == 'dev'
Requires-Dist: pydata-sphinx-theme>=0.15; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.8; extra == 'dev'
Requires-Dist: sphinx-design>=0.5; extra == 'dev'
Requires-Dist: sphinxcontrib-mermaid>=0.9; extra == 'dev'
Provides-Extra: trainer
Provides-Extra: worker
Requires-Dist: psutil>=5.9; extra == 'worker'
Provides-Extra: worker-cuda
Requires-Dist: psutil>=5.9; extra == 'worker-cuda'
Provides-Extra: worker-mlx
Requires-Dist: mlx>=0.16; (platform_machine == 'arm64' and sys_platform == 'darwin') and extra == 'worker-mlx'
Requires-Dist: psutil>=5.9; extra == 'worker-mlx'
Description-Content-Type: text/markdown

<p align="left">
  <img src="docs/_static/logo.svg" alt="disTensor" width="560">
</p>

[![PyPI](https://img.shields.io/pypi/v/distensor.svg)](https://pypi.org/project/distensor/)
[![Python](https://img.shields.io/pypi/pyversions/distensor.svg)](https://pypi.org/project/distensor/)
[![CI](https://github.com/don-arash/distensor/actions/workflows/ci.yml/badge.svg)](https://github.com/don-arash/distensor/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/license-Apache%202.0-green.svg)](LICENSE)
[![Docs](https://img.shields.io/badge/docs-distensor.com-blue)](https://distensor.com/docs)

**Decentralized distributed training with explicit 4D parallelism control.**

disTensor trains large models across heterogeneous consumer compute. You declare the parallelism strategy — data, tensor, pipeline, and (soon) context — and disTensor handles orchestration, synchronization, and fault tolerance while keeping familiar PyTorch semantics.

> **Status:** alpha. APIs and wire formats may change between minor releases.

## Why disTensor

- **Explicit 4D parallelism.** No auto-magic sharding: you define `StageMapping` and `ShardingSpec`, and the system executes them predictably.
- **Driver-mediated execution.** Your process owns the training loop and data pipeline. The hub coordinates topology but stays off the critical path.
- **Control/data plane split.** gRPC for orchestration (`:50051`), direct P2P for tensor transfers. Workers sync without round-tripping through the hub.
- **Heterogeneous compute.** CUDA, Apple Silicon (MPS/MLX), and CPU backends with auto-detected capabilities.
- **Cloud or standalone.** Run a local hub for experiments, or point at the managed cloud for persistence, auth, billing, and a web console.

## Installation

```bash
# Trainer SDK (PyPI)
pip install distensor

# Compute provider — macOS Apple Silicon (one-liner installer)
curl -fsSL https://api.distensor.com/install/macos.sh | DISTENSOR_API_KEY=dt_p_... bash

# Provider node app, install manually
pip install distensor-server distensor-tui

# Development (editable, from source)
git clone https://github.com/don-arash/distensor
cd distensor && make dev
```

Role-specific extras: `distensor[trainer]`, `distensor[worker]`, `distensor[worker-cuda]`, `distensor[worker-mlx]`, `distensor[cloud]`, `distensor[dev]`. Install one with, for example, `pip install "distensor[worker-cuda]"`.

## Quick Start

```python
import distensor as ds
from distensor.core import ParallelConfig
from distensor.nn import DistributedModel, StageMapping, ShardingSpec, Shard, Replicate

ds.init(mode="cloud", api_key="dt_u_...")

# Inspect available compute before committing
mesh = ds.query_mesh()
print(f"Workers online: {mesh.available_workers}")

# Declare parallelism explicitly
parallel = ParallelConfig(data_parallel=4, tensor_parallel=2, pipeline_parallel=2)

stages = StageMapping([
    ["model.embed"],
    ["model.layers.[0-5]"],
    ["model.layers.[6-11]"],
    ["model.norm", "lm_head"],
])

tp_spec = ShardingSpec({
    "model.layers.*.self_attn.q_proj.weight": Shard(dim=0),
    "model.layers.*.self_attn.o_proj.weight": Shard(dim=1),
    "model.layers.*.input_layernorm.weight": Replicate(),
})

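# `my_module` and `data_loader` are placeholders for your own PyTorch module
# and data loader; they are not defined in this snippet.
model = DistributedModel(my_module, stage_mapping=stages, tp_spec=tp_spec)
trainer = ds.Trainer(model, parallel=parallel)
trainer.fit(data_loader, epochs=1)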
```
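
To make the `tp_spec` above more concrete, here is a minimal sketch of what each placement implies for per-worker weight shapes. It does not use distensor itself: the `Shard` and `Replicate` classes below are local stand-ins, and `local_shape` is a hypothetical helper. It assumes that patterns follow `fnmatch`-style globbing, that the first matching rule wins, and that unmatched parameters are replicated; the real `ShardingSpec` may behave differently. PyTorch stores `nn.Linear` weights as `(out_features, in_features)`, so `Shard(dim=0)` splits output features and `Shard(dim=1)` splits input features.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Shard:
    """Local stand-in for a 'split along dim' placement (not distensor.nn.Shard)."""
    dim: int


@dataclass(frozen=True)
class Replicate:
    """Local stand-in for a 'full copy on every TP worker' placement."""


RULES = {
    "model.layers.*.self_attn.q_proj.weight": Shard(dim=0),
    "model.layers.*.self_attn.o_proj.weight": Shard(dim=1),
    "model.layers.*.input_layernorm.weight": Replicate(),
}


def local_shape(name, shape, rules, tp):
    """Return the per-worker shape of parameter `name` under tensor parallelism.

    Assumptions: glob matching via fnmatch, first match wins,
    unmatched parameters are treated as replicated.
    """
    for pattern, placement in rules.items():
        if fnmatchcase(name, pattern):
            if isinstance(placement, Shard):
                if shape[placement.dim] % tp != 0:
                    raise ValueError(
                        f"{name}: dim {placement.dim} of size "
                        f"{shape[placement.dim]} is not divisible by tp={tp}"
                    )
                sharded = list(shape)
                sharded[placement.dim] //= tp
                return tuple(sharded)
            return tuple(shape)  # Replicate: every worker holds the full tensor
    return tuple(shape)  # no rule matched: assumed replicated


if __name__ == "__main__":
    for name, shape in [
        ("model.layers.3.self_attn.q_proj.weight", (4096, 4096)),
        ("model.layers.3.self_attn.o_proj.weight", (4096, 4096)),
        ("model.layers.3.input_layernorm.weight", (4096,)),
    ]:
        print(name, shape, "->", local_shape(name, shape, RULES, tp=2))
```

With `tp=2`, the `q_proj` weight becomes `(2048, 4096)` per worker, the `o_proj` weight becomes `(4096, 2048)`, and the layer-norm weight keeps its full `(4096,)` shape. Pairing an output-sharded projection with an input-sharded one is the usual reason a single AllReduce per attention block is enough.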

See the [docs](https://distensor.com/docs) for full training examples, mesh planning, and the cloud API reference.

## Architecture

disTensor uses a hub-and-spoke topology. The hub coordinates, workers compute and sync directly, and the cloud layer adds persistence and a web surface on top.

| Component | Package | Description |
|-----------|---------|-------------|
| **Hub** | `distensor.hub` | Control plane — node / job / topology registries, gRPC services (mesh, job, training, checkpoint, sync) |
| **Cloud API** | `distensor.cloud` | Production HTTP layer — FastAPI + Postgres + Redis + Prometheus. Auth, billing, worker registry, telemetry |
| **Node** | `distensor.node` | Compute worker daemon — model shard execution, P2P data plane, CUDA / MPS / CPU backends |
| **Trainer** | `distensor.training` | Orchestration — `Trainer.fit()`, pipeline scheduling (GPipe, 1F1B), shard loading, execution plans |
| **Sync** | `distensor.sync` | 4D parallel coordination — TP AllReduce, PP transfers, DP gradient sync, worker topology |
| **Optimizer** | `distensor.optim` | Distributed optimizer — ZeRO-3, gradient sync with bucketing, LR schedulers, distributed state dict |
| **Console** | `ui/` | React web UI for trainers, providers, and admins |
| **Node App** | `distributions/` | Provider surface — `distensor-server` supervisor + `distensor` TUI dashboard |

See [Architecture.md](Architecture.md) for diagrams, execution flows, and design principles.
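
The Trainer row above mentions GPipe and 1F1B pipeline scheduling. As a rough illustration of how the two differ, the sketch below generates the per-stage operation order for each, using the textbook (non-interleaved) formulations. It is not distensor's scheduler: the function names are hypothetical, and the backward ordering chosen for GPipe is an assumption.

```python
def gpipe_schedule(num_microbatches):
    """GPipe-style order for one stage: all forwards, then all backwards.

    The backward order (same as forward order) is an assumption of this sketch.
    """
    forwards = [("F", m) for m in range(num_microbatches)]
    backwards = [("B", m) for m in range(num_microbatches)]
    return forwards + backwards


def one_f_one_b_schedule(stage, num_stages, num_microbatches):
    """Non-interleaved 1F1B order for one stage (0 = first stage).

    Warm-up forwards fill the pipeline, the steady state alternates one
    forward with one backward, and cool-down drains remaining backwards.
    """
    warmup = min(num_stages - stage - 1, num_microbatches)
    steady = num_microbatches - warmup
    ops = []
    fwd = bwd = 0
    for _ in range(warmup):
        ops.append(("F", fwd))
        fwd += 1
    for _ in range(steady):
        ops.append(("F", fwd))
        fwd += 1
        ops.append(("B", bwd))
        bwd += 1
    for _ in range(warmup):
        ops.append(("B", bwd))
        bwd += 1
    return ops


def peak_in_flight(ops):
    """Largest number of microbatches whose activations are held at once."""
    live = peak = 0
    for kind, _ in ops:
        live += 1 if kind == "F" else -1
        peak = max(peak, live)
    return peak


if __name__ == "__main__":
    stages, microbatches = 4, 8
    print("GPipe peak in flight:", peak_in_flight(gpipe_schedule(microbatches)))
    for s in range(stages):
        ops = one_f_one_b_schedule(s, stages, microbatches)
        print(f"1F1B stage {s}: peak in flight = {peak_in_flight(ops)}")
```

Both schedules run the same forward and backward passes. What changes is how many microbatches a stage must keep activations for: under these assumptions the GPipe order holds all of them, while 1F1B holds at most `num_stages - stage`.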

## Parallelism

| Mode | | Strategy |
|------|:---:|----------|
| Data Parallel | DP | Replicate model across workers, split data batches |
| Tensor Parallel | TP | Shard individual layers across workers |
| Pipeline Parallel | PP | Split model into sequential stages across workers |
| Context Parallel | CP | Split sequence dimension across workers *(coming soon)* |
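
The modes in the table compose multiplicatively. The sketch below shows the arithmetic under the conventional assumption of one worker per (DP, PP, TP) coordinate. The rank ordering (TP fastest, then PP, then DP) and the names `Coord`, `world_size`, and `rank_to_coord` are illustrative choices, not distensor's actual topology logic.

```python
from typing import NamedTuple


class Coord(NamedTuple):
    dp: int  # which model replica
    pp: int  # which pipeline stage within the replica
    tp: int  # which tensor shard within the stage


def world_size(dp, tp, pp):
    """Workers needed when every (dp, pp, tp) coordinate gets its own worker."""
    return dp * tp * pp


def rank_to_coord(rank, dp, tp, pp):
    """Map a flat rank to grid coordinates; TP varies fastest (assumed ordering)."""
    if not 0 <= rank < world_size(dp, tp, pp):
        raise ValueError(f"rank {rank} is outside a {dp}x{pp}x{tp} grid")
    return Coord(dp=rank // (tp * pp), pp=(rank // tp) % pp, tp=rank % tp)


if __name__ == "__main__":
    dp, tp, pp = 4, 2, 2
    print("workers required:", world_size(dp, tp, pp))
    for rank in (0, 1, 2, 5, 15):
        print(rank, rank_to_coord(rank, dp, tp, pp))
```

For `data_parallel=4`, `tensor_parallel=2`, `pipeline_parallel=2`, this gives 16 workers. Placing TP ranks next to each other is a common choice because TP traffic is the most latency-sensitive, but on a heterogeneous mesh the best ordering depends on the actual links between workers.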

## Development

```bash
make dev                  # Full bootstrap (uv sync + proto generation)
make test                 # Run full test suite
make lint                 # Ruff linting
make format               # Auto-format

make cloud-up             # Start Postgres, Redis, MinIO, Prometheus
make cloud-api            # Run HTTP API server (:8000)
make cloud-hub            # Run gRPC hub server (:50051)

make console-dev          # Vite dev server (:5173, proxies /api to :8000)

make sandbox-up           # Docker sandbox (coordinator + 8 workers)
make sandbox-cloud-up     # Full cloud stack in Docker

make help                 # All available targets
```

Contributions welcome — see [CONTRIBUTING.md](CONTRIBUTING.md). Commit messages follow [Conventional Commits](https://www.conventionalcommits.org/); releases are cut by commitizen and published to PyPI via GitHub Actions Trusted Publishing on `v*` tags.

## Community & Support

- **Documentation:** [distensor.com/docs](https://distensor.com/docs)
- **Issues & bugs:** [GitHub Issues](https://github.com/don-arash/distensor/issues)
- **Discussions:** [GitHub Discussions](https://github.com/don-arash/distensor/discussions)
- **Changelog:** [CHANGELOG.md](CHANGELOG.md)

## License

Apache 2.0 — see [LICENSE](LICENSE).
