Metadata-Version: 2.4
Name: quarterbit
Version: 14.0.0
Summary: AXIOM - High-performance optimizer for deep learning with extreme memory efficiency
Home-page: https://quarterbit.dev
Author: Clouthier Simulation Labs
Author-email: Clouthier Simulation Labs <info@quarterbit.dev>
License: Commercial
Project-URL: Homepage, https://quarterbit.dev
Project-URL: Documentation, https://quarterbit.dev/docs
Keywords: optimizer,adam,deep-learning,pytorch,gpu,memory-efficient,compression,axiom
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# QuarterBit - AXIOM Optimizer

**Memory-efficient optimizer for LLM training**

A drop-in Adam replacement with 1333x optimizer-state compression. Train larger language models on the same hardware.

## Features

- **1333x Memory Compression** - Train GPT/LLaMA/Gemma on smaller GPUs
- **16% Better Convergence** - Outperforms AdamW on the GPT-2 WikiText benchmark
- **Production Ready** - Gradient clipping, NaN detection, checkpointing
- **Two Tiers** - AXIOM (default) and AXIOM_2 (for 3B+ models on an 8 GB GPU)

## Requirements

- **Python 3.12+** (Windows or Linux)
- **PyTorch 2.0+** with CUDA
- **NVIDIA GPU** - Pascal or newer (GTX 10xx, RTX 20/30/40, T4, A100, H100)
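
To check these requirements programmatically, here is a minimal sketch using only standard PyTorch calls (Pascal GPUs report CUDA compute capability 6.x):

```python
import sys
import torch

print(f"Python: {sys.version_info.major}.{sys.version_info.minor}")
print(f"PyTorch: {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    # Pascal (GTX 10xx) reports 6.x; every newer generation reports higher
    ok = (major, minor) >= (6, 0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}, "
          f"{'supported' if ok else 'too old (Pascal or newer required)'}")
```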

## Installation

```bash
# PyTorch required (install first)
pip install torch --index-url https://download.pytorch.org/whl/cu121

# Install QuarterBit
pip install quarterbit
```
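
If the install succeeded, importing the optimizer class (used in the Quick Start below) should work:

```bash
python -c "from quarterbit import AXIOM; print('QuarterBit OK')"
```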

## Quick Start

```python
from quarterbit import AXIOM

# Create optimizer (drop-in Adam replacement)
optimizer = AXIOM(model.parameters(), lr=1e-4)

# Training loop
for batch in dataloader:
    optimizer.zero_grad()
    loss = model(batch).loss
    loss.backward()
    optimizer.step(loss=loss.item())  # Pass loss for adaptive learning
```

## Two Optimizer Tiers

### AXIOM (Default)
Standard mode with 1333x optimizer-state compression. Use this for most training.

```python
from quarterbit import AXIOM

opt = AXIOM(model.parameters(), lr=1e-4)
```

### AXIOM_2 (Large Models)
Compresses both the optimizer state AND the gradients. Train 3B+ models on an 8 GB GPU.

```python
from quarterbit import AXIOM_2

opt = AXIOM_2(model.parameters(), lr=5e-3)
opt.register_hooks()  # IMPORTANT: Call before training loop

for batch in dataloader:
    loss = model(batch).loss
    loss.backward()  # Gradients compressed automatically
    opt.step(loss.item())
    opt.zero_grad()
```

**Memory comparison for a 2.8B-parameter model:**
| Optimizer | Gradients | Optimizer State | Total |
|-----------|-----------|-----------|-------|
| Adam | 11.2 GB | 22 GB | 33 GB |
| AXIOM | 11.2 GB | 16 MB | 11.2 GB |
| AXIOM_2 | 16 MB | 16 MB | 32 MB |
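
To print the corresponding breakdown for your own model, both tiers expose a `memory_usage()` helper (listed in the API reference below); a minimal usage sketch:

```python
opt = AXIOM_2(model.parameters(), lr=5e-3)
opt.memory_usage()  # prints this optimizer's footprint vs. uncompressed Adam
```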

## API Reference

### AXIOM / AXIOM_2

```python
AXIOM(
    params,                    # Model parameters
    lr=0.001,                  # Learning rate
    weight_decay=0.01,         # Decoupled weight decay
    max_grad_norm=None,        # Gradient clipping (None = disabled)
    detect_anomaly=True,       # Raise error on NaN/Inf gradients
)

# Methods
optimizer.step(loss)           # Pass loss.item() for adaptive learning
optimizer.zero_grad()          # Clear gradients
optimizer.get_lr()             # Get current learning rate
optimizer.set_lr(lr)           # Set learning rate
optimizer.state_dict()         # For checkpointing
optimizer.load_state_dict(d)   # Restore checkpoint
optimizer.memory_usage()       # Print memory comparison
```
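
Because `get_lr()` / `set_lr()` expose the learning rate directly, external schedules are easy to bolt on. A minimal linear-warmup sketch (the schedule is illustrative, not part of the library; `model` and `dataloader` as in the Quick Start):

```python
base_lr = 1e-4
warmup_steps = 500
optimizer = AXIOM(model.parameters(), lr=base_lr)

for step, batch in enumerate(dataloader):
    # Linear warmup to base_lr, then constant; swap in any schedule you like
    optimizer.set_lr(base_lr * min(1.0, (step + 1) / warmup_steps))

    optimizer.zero_grad()
    loss = model(batch).loss
    loss.backward()
    optimizer.step(loss=loss.item())
```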

### AXIOM_2 Additional Methods

```python
optimizer.register_hooks()     # Enable gradient compression (call once)
optimizer.remove_hooks()       # Disable gradient compression
```
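
Since `register_hooks()` changes how gradients are materialized, it is worth guaranteeing the matching `remove_hooks()` runs even if training aborts; the try/finally below is our convention, not a library requirement:

```python
opt = AXIOM_2(model.parameters(), lr=5e-3)
opt.register_hooks()  # gradients are compressed from here on
try:
    for batch in dataloader:
        loss = model(batch).loss
        loss.backward()
        opt.step(loss.item())
        opt.zero_grad()
finally:
    opt.remove_hooks()  # restore normal, uncompressed .grad behavior
```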

## Checkpointing

```python
# Save
checkpoint = {
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'epoch': epoch,
}
torch.save(checkpoint, 'checkpoint.pt')

# Load
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
```
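
To resume mid-run, continue from the saved epoch after restoring state; a sketch (`num_epochs` is a placeholder for your own epoch budget):

```python
start_epoch = checkpoint['epoch'] + 1  # continue after the last completed epoch

for epoch in range(start_epoch, num_epochs):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = model(batch).loss
        loss.backward()
        optimizer.step(loss=loss.item())
```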

## Extensions

### AXIOM_CHECKPOINT (Activation Compression)

Reduce activation memory during training (85% savings).

```python
from quarterbit import AXIOM_CHECKPOINT

actcp = AXIOM_CHECKPOINT(max_slots=32, max_n=1024*1024)

# In forward pass
actcp.store(hidden_states, slot=layer_idx)

# In backward pass
hidden_states = actcp.restore(slot=layer_idx)

# Memory stats
print(actcp.memory_stats())
```
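
One way to wire `store`/`restore` into an existing model without editing layer code is PyTorch's saved-tensor hooks. The sketch below is our own illustrative integration, assuming `restore` round-trips whatever `store` compressed; real code would need a smarter slot policy and would typically skip parameter tensors:

```python
import torch
from quarterbit import AXIOM_CHECKPOINT

actcp = AXIOM_CHECKPOINT(max_slots=32, max_n=1024 * 1024)
next_slot = 0

def pack(t):
    global next_slot
    # Compress only float tensors that fit a slot; pass everything else through
    if t.is_floating_point() and t.numel() <= 1024 * 1024 and next_slot < 32:
        slot = next_slot
        next_slot += 1
        actcp.store(t, slot=slot)
        return slot
    return t

def unpack(obj):
    # Ints are slot indices; anything else was saved uncompressed
    return actcp.restore(slot=obj) if isinstance(obj, int) else obj

with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    loss = model(batch).loss
loss.backward()
next_slot = 0  # free the slots for the next training step
```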

### AXIOM_DDP (Gradient Compression for Distributed)

Reduce all-reduce bandwidth for distributed training (128x compression).

```python
from quarterbit import AXIOM_DDP

gc = AXIOM_DDP(n=total_params, top_k_percent=6.25)

# Compress before all-reduce
vals, idx, count = gc.compress(all_gradients)

# After all-reduce
full_grads = gc.decompress(vals, idx, count)

# Stats
print(gc.stats())
```
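
Before enabling this in a distributed job, a local round-trip is a cheap way to see how much gradient signal the top-k step keeps; a sketch, assuming `compress`/`decompress` operate on a flat 1-D tensor as above:

```python
import torch
from quarterbit import AXIOM_DDP

# Flatten every gradient into one vector, as in the snippet above
all_gradients = torch.cat(
    [p.grad.flatten() for p in model.parameters() if p.grad is not None]
)

gc = AXIOM_DDP(n=all_gradients.numel(), top_k_percent=6.25)
vals, idx, count = gc.compress(all_gradients)
recovered = gc.decompress(vals, idx, count)

# Share of the gradient norm retained by the top-k selection
kept = (recovered.norm() / all_gradients.norm()).item()
print(f"kept {count} of {all_gradients.numel()} entries, {kept:.1%} of gradient norm")
```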

### DDP Helper Functions

```python
from quarterbit import compress_gradients_for_ddp, decompress_gradients_for_ddp

# Compress all model gradients
vals, idx, count, compressor = compress_gradients_for_ddp(model)

# ... distributed all-reduce on vals, idx ...

# Decompress back to model
decompress_gradients_for_ddp(model, vals, idx, count, compressor)
```

## Building from Source

### Requirements
- Python 3.12+
- CUDA Toolkit 12.x
- Cython
- Visual Studio Build Tools (Windows) or GCC (Linux)

### Windows Build

```batch
cd C:\QuarterBit\quarterbit

:: 1. Build CUDA kernels
cd quarterbit && build.bat all && cd ..

:: 2. Build IP-protected wheel
python build_ip_protected.py

:: Output: dist\quarterbit-14.0.0-cp312-cp312-win_amd64.whl
```

### Linux Build (WSL)

```bash
cd /mnt/c/QuarterBit/quarterbit

# 1. Build CUDA kernels
cd quarterbit && bash build.sh all && cd ..

# 2. Build IP-protected wheel
python3 build_ip_protected.py

# 3. Fix platform tag for PyPI
python3 -m wheel tags --platform-tag manylinux_2_17_x86_64 dist/*.whl

# Output: dist/quarterbit-14.0.0-cp312-cp312-manylinux_2_17_x86_64.whl
```

### Upload to PyPI

```bash
# Windows
python -m twine upload dist/*.whl -u __token__ -p $PYPI_TOKEN

# Linux (WSL)
python3 -m twine upload dist/*.whl -u __token__ -p $PYPI_TOKEN
```

## Supported Models

AXIOM is optimized for **language models**:
- GPT-2, GPT-Neo, GPT-J
- LLaMA, LLaMA 2, LLaMA 3
- Gemma, Gemma 2
- Mistral, Mixtral
- Phi, Phi-2, Phi-3
- BERT, RoBERTa (fine-tuning)

## License

A commercial license is required for production use.
Research and evaluation use is free.

**https://quarterbit.dev**

---

Copyright 2026 Clouthier Simulation Labs. All rights reserved.
