Metadata-Version: 2.4
Name: quarterbit
Version: 25.0.1
Summary: Memory-efficient LLM training. AXIOM trains 70B models on a single H100 and 7B models on consumer GPUs.
Home-page: https://quarterbit.dev
Author: Clouthier Simulation Labs
Author-email: Clouthier Simulation Labs <info@quarterbit.dev>
Project-URL: Homepage, https://quarterbit.dev
Project-URL: Documentation, https://quarterbit.dev/docs
Keywords: optimizer,adam,deep-learning,pytorch,gpu,memory-efficient,compression,axiom
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# QuarterBit AXIOM

**Train large models on single GPUs. 15-17x memory compression.**

AXIOM enables memory-efficient training for any PyTorch model: train a 70B model on a single H100, or a 13B model on a free Kaggle T4.

## Installation

```bash
pip install quarterbit
```

**Requirements:** Python 3.11+, PyTorch 2.0+, NVIDIA GPU

## Quick Start

```python
from quarterbit import axiom
from torch.utils.data import DataLoader

# Create your data loaders (train_dataset / val_dataset are your own
# torch Datasets; see the full example below for building them)
train_loader = DataLoader(train_dataset, batch_size=4)
val_loader = DataLoader(val_dataset, batch_size=4)

# Train with AXIOM (~70% greater improvement, ~2x less memory than standard AdamW)
results = axiom(
    model='gpt2-large',
    train_loader=train_loader,
    val_loader=val_loader,
    steps=1000,
)

print(f"Best PPL: {results['best_ppl']:.2f}")
```

Output:
```
Loading gpt2-large with AXIOM (error_bits=4)...
AXIOM layers configured: 51
Effective LR: 0.0001 x 10.0 = 0.001

Training 0.77B model for 1000 steps...
Step   100/1000 | Loss: 3.2451 | 1200 tok/s | Peak: 3.6GB
Step   200/1000 | Loss: 2.8934 | 1180 tok/s | Peak: 3.6GB
  -> Val PPL: 24.5
...

TRAINING COMPLETE
Steps: 1000
Peak VRAM: 3.6 GB
Val PPL: 45.2 -> 18.3 (59.5%)
```
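
The `Effective LR` line in the log reflects the internal scaling mentioned under `lr` in the configuration section below: the rate you pass is multiplied by a fixed factor before it reaches the optimizer. The 10.0 multiplier here is read off this run's output, not a documented constant:

```python
# Effective learning rate as reported in the log above; quarterbit
# applies the scaling internally, so you only ever pass the base lr.
base_lr = 1e-4
effective_lr = base_lr * 10.0  # -> 0.001, matching "Effective LR" above
```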

## Full Example

```python
from quarterbit import axiom
from datasets import load_dataset
from transformers import AutoTokenizer
from torch.utils.data import DataLoader, TensorDataset

# Load and tokenize WikiText-2; validation uses the held-out split
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
train_texts = [t for t in dataset["train"]["text"] if len(t) > 50][:2000]
val_texts = [t for t in dataset["validation"]["text"] if len(t) > 50][:100]

def encode(texts):
    # Tokenize to padded input_ids suitable for TensorDataset batching
    return tokenizer(texts, truncation=True, max_length=128,
                     padding=True, return_tensors="pt")["input_ids"]

train_loader = DataLoader(
    TensorDataset(encode(train_texts)),
    batch_size=4, shuffle=True,
)
val_loader = DataLoader(
    TensorDataset(encode(val_texts)),
    batch_size=4,
)

# Train
results = axiom(
    model='gpt2-large',
    train_loader=train_loader,
    val_loader=val_loader,
    steps=2000,
    lr=1e-4,
    log_interval=50,
    eval_interval=100,
    early_stopping=True,
    early_stopping_patience=5,
)
```

## Memory Savings

| Model | Standard (FP16+AdamW) | AXIOM | Compression |
|-------|----------------------|-------|-------------|
| 7B | 84 GB | 5.5 GB | 15x |
| 13B | 156 GB | 9 GB | 17x |
| 70B | 840 GB | 53 GB | 16x |
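
The `Standard` column is consistent with the usual mixed-precision AdamW accounting of 12 bytes per parameter: FP16 weights and gradients plus FP32 first and second moments. A back-of-envelope check (our arithmetic, not taken from quarterbit):

```python
# 2 B FP16 weights + 2 B FP16 grads + 4 B + 4 B FP32 AdamW moments
BYTES_PER_PARAM = 2 + 2 + 4 + 4  # = 12

for billions in (7, 13, 70):
    print(f"{billions}B params -> ~{billions * BYTES_PER_PARAM} GB standard")
# 7B -> 84 GB, 13B -> 156 GB, 70B -> 840 GB, matching the table above.
```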

## Configuration Options

```python
results = axiom(
    model='gpt2-large',           # HuggingFace model name or loaded model (sketch below)
    train_loader=train_loader,    # Training DataLoader
    val_loader=val_loader,        # Validation DataLoader (optional)
    steps=1000,                   # Number of training steps
    lr=1e-4,                      # Learning rate (scaled internally)
    error_bits=4,                 # Compression level (4 recommended)
    log_interval=50,              # Log every N steps
    eval_interval=100,            # Evaluate every N steps
    early_stopping=True,          # Enable early stopping
    early_stopping_patience=3,    # Stop after N evals without improvement
    checkpoint_interval=500,      # Save checkpoint every N steps (0=disabled)
    checkpoint_dir="checkpoints", # Directory for checkpoints
    verbose=True,                 # Print progress
    device="cuda",                # Device to use
    gradient_checkpointing=False, # Reduce memory for large models
)
```
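
Per the `model` comment above, `axiom` also accepts an already-loaded model rather than a name string. A minimal sketch of that path, assuming any Transformers `nn.Module` works and reusing the loaders from the full example:

```python
# Minimal sketch, assuming axiom() accepts a pre-instantiated model as
# the `model` comment indicates; train_loader / val_loader come from
# the full example above.
from transformers import AutoModelForCausalLM
from quarterbit import axiom

model = AutoModelForCausalLM.from_pretrained("gpt2-large")

results = axiom(
    model=model,                  # nn.Module instead of a name string
    train_loader=train_loader,
    val_loader=val_loader,
    steps=1000,
    gradient_checkpointing=True,  # worth enabling for larger models
)
```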

## Supported Models

Works with **any HuggingFace model**, plus custom PyTorch modules:

- All HuggingFace Transformers (NLP, Vision, Audio, Multimodal)
- LLaMA, Mistral, Mixtral, Qwen, Yi, Phi, Gemma
- GPT-2, GPT-J, GPT-NeoX, OPT
- ViT, CLIP, Whisper, LLaVA
- Custom PyTorch models (see the sketch below)
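
For the custom-model path, a hedged sketch: `TinyLM` below is our own illustration, not part of quarterbit, and we assume `axiom` treats any `nn.Module` whose forward takes token IDs the same way it treats a Transformers model.

```python
import torch.nn as nn
from quarterbit import axiom

# TinyLM is a hypothetical stand-in for "custom PyTorch models"; the
# assumption is that axiom() accepts any nn.Module plus the same loaders.
class TinyLM(nn.Module):
    def __init__(self, vocab_size=50257, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, input_ids):
        return self.lm_head(self.embed(input_ids))

results = axiom(model=TinyLM(), train_loader=train_loader, steps=500)
```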

## CLI

```bash
quarterbit login      # Login via browser
quarterbit status     # Show license and usage
quarterbit activate   # Activate with license key
```

## License

**Free account required** - Sign up at [quarterbit.dev](https://quarterbit.dev)

| Tier | Price | GPU Hours |
|------|-------|-----------|
| Free | $0 | 5/month |
| Academic | $0 | 10/month |
| Pro | $49/mo | Unlimited |
| Team | $299/mo | Unlimited |

## Links

- **Website**: [quarterbit.dev](https://quarterbit.dev)
- **Docs**: [quarterbit.dev/docs](https://quarterbit.dev/docs)

---

**Clouthier Simulation Labs** | Copyright 2026
