Metadata-Version: 2.4
Name: custom-transformer
Version: 0.1.0
Summary: Transformer implementations for language modeling and ASR, with numpy-based attention primitives.
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0
Requires-Dist: numpy
Requires-Dist: torchaudio
Requires-Dist: tqdm
Requires-Dist: pyyaml
Requires-Dist: wandb
Requires-Dist: torchmetrics
Requires-Dist: torchinfo
Requires-Dist: tokenizers
Requires-Dist: pandas
Requires-Dist: matplotlib
Requires-Dist: seaborn
Provides-Extra: dev
Requires-Dist: Cython>=3.0; extra == "dev"
Requires-Dist: pytest; extra == "dev"

# custom-transformer

A PyTorch-based transformer library for language modeling and automatic speech recognition (ASR). It includes both a pure-NumPy multi-head attention implementation (`mytorch`) and a full PyTorch transformer toolkit (`transformerlib`) with training, decoding, and data-loading utilities.

## Installation

```bash
pip install custom-transformer
```

## Features

### `mytorch` — NumPy Attention Primitives

Pure NumPy implementations intended for education and prototyping:

- `Linear` — fully-connected layer with forward and backward passes
- `Softmax` — numerically stable softmax activation
- `ScaledDotProductAttention` — attention mechanism with optional masking
- `MultiHeadAttention` — multi-head attention with split/concat head logic

```python
from mytorch.nn import MultiHeadAttention
mha = MultiHeadAttention(d_model=512, num_heads=8)
```
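
A forward pass can then be run directly on NumPy arrays. The call below is a minimal sketch: the exact `forward` signature (argument names, mask handling) is an assumption and may differ from the implementation.

```python
import numpy as np
from mytorch.nn import MultiHeadAttention

mha = MultiHeadAttention(d_model=512, num_heads=8)

# Batch of 4 sequences of length 10 in a 512-dimensional model space
query = np.random.randn(4, 10, 512)
key = np.random.randn(4, 10, 512)
value = np.random.randn(4, 10, 512)

out = mha.forward(query, key, value)  # assumed signature: (query, key, value[, mask])
print(out.shape)                      # expected: (4, 10, 512)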

### `transformerlib` — PyTorch Transformer Toolkit

Full-featured transformer models and training infrastructure:

**Models**
- `DecoderOnlyTransformer` — GPT-style causal language model
- `EncoderDecoderTransformer` — encoder-decoder for sequence-to-sequence tasks (e.g., ASR)
- Pre-LN architecture with sinusoidal positional encoding
- Weight tying, layer dropout, and mixed precision support
- `from_pretrained_decoder` for initializing encoder-decoder models from pretrained decoder weights
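
A rough sketch of the warm-start path: the arguments passed to `from_pretrained_decoder` below are illustrative assumptions, not the confirmed API.

```python
from transformerlib.model import DecoderOnlyTransformer, EncoderDecoderTransformer

# Pretrained causal LM (in practice, weights would be loaded from a checkpoint)
lm = DecoderOnlyTransformer(
    num_layers=6, d_model=512, num_heads=8,
    d_ff=2048, dropout=0.1, max_len=512, num_classes=10000,
)

# Hypothetical call: reuse the LM's decoder weights and add a fresh speech encoder
asr_model = EncoderDecoderTransformer.from_pretrained_decoder(
    lm,                    # pretrained decoder-only model (assumed first argument)
    num_encoder_layers=6,  # illustrative encoder settings
    feat_dim=80,
)
```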

**Data**
- `LMDataset` — text dataset with tokenization, SOS/EOS framing, and collation (see the sketch after this list)
- `ASRDataset` — speech dataset with filterbank features, global MVN normalization, and SpecAugment
- `H4Tokenizer` — BPE tokenizer wrapper (char, 1k, 5k, 10k vocab sizes included)
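
A hypothetical sketch of wiring the tokenizer into a dataset; the constructor arguments shown here are assumptions for illustration only.

```python
from transformerlib.data import H4Tokenizer, LMDataset

# Hypothetical arguments: the real constructors may take different parameters.
tokenizer = H4Tokenizer(token_type="1k")  # assumed choices: char, 1k, 5k, 10k
train_set = LMDataset(
    partition="train",                    # hypothetical split name
    tokenizer=tokenizer,
    config={"root": "data/lm"},           # hypothetical config dict
)
```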

**Training**
- `LMTrainer` — language model training with gradient accumulation, mixed precision, and WandB logging (see the sketch after this list)
- `ASRTrainer` — ASR training with CTC + cross-entropy joint loss
- `ProgressiveTrainer` — curriculum learning with gradual layer unfreezing and data subsetting
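
The outline below shows roughly how a trainer might be driven; beyond the `LMTrainer` name itself, the constructor arguments and method names are assumptions.

```python
from transformerlib.trainers import LMTrainer

# Hypothetical wiring: `model` and `train_set` are built as in the other examples,
# and `val_set` is a hypothetical validation split.
trainer = LMTrainer(
    model=model,
    train_dataset=train_set,
    val_dataset=val_set,
    config={"epochs": 10},  # hypothetical config dict
)
trainer.train()             # assumed entry point
```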

**Decoding**
- `SequenceGenerator` — greedy search, beam search, and nucleus sampling (see the sketch after this list)
- Language model shallow fusion for ASR recognition
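
A hypothetical decoding sketch: `SequenceGenerator`'s constructor and method names below are assumptions used only to illustrate the flow.

```python
import torch
from transformerlib.decoding import SequenceGenerator

# Hypothetical API: `model` and `tokenizer` are built as in the other examples.
generator = SequenceGenerator(model=model, tokenizer=tokenizer, max_len=256)

prompt = torch.tensor([[1, 42, 17]])                      # SOS + two prompt tokens (made-up ids)
greedy_out = generator.generate_greedy(prompt)            # assumed greedy search entry point
beam_out = generator.generate_beam(prompt, beam_width=5)  # assumed beam search entry point
```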

## Quick Start

```python
from transformerlib.model import DecoderOnlyTransformer

model = DecoderOnlyTransformer(
    num_layers=6,
    d_model=512,
    num_heads=8,
    d_ff=2048,
    dropout=0.1,
    max_len=512,
    num_classes=10000
)
```
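
A quick smoke test on random token ids. The assumption here is that the model's forward pass accepts a `(batch, seq_len)` tensor of token ids and returns per-position logits; check the actual interface before relying on this shape.

```python
import torch

tokens = torch.randint(0, 10000, (2, 128))  # batch of 2 sequences, 128 tokens each
logits = model(tokens)                      # assumed output shape: (2, 128, 10000)
```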

```python
from transformerlib.model import EncoderDecoderTransformer

model = EncoderDecoderTransformer(
    num_encoder_layers=6,
    num_decoder_layers=6,
    d_model=512,
    num_heads=8,
    d_ff=2048,
    dropout=0.1,
    max_len=512,
    num_classes=1000,
    feat_dim=80
)
```
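
Since `torchinfo` is already a dependency, either model can be inspected with a layer summary:

```python
from torchinfo import summary

# Print a per-layer breakdown of modules and parameter counts
summary(model, depth=2)
```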

## Architecture

```
mytorch/
  nn/
    linear.py
    activation.py
    scaled_dot_product_attention.py
    multi_head_attention.py

transformerlib/
  model/        — masks, positional encoding, sublayers, encoder/decoder layers, transformers
  data/         — tokenizer, LM dataset, ASR dataset
  trainers/     — base trainer, LM trainer, ASR trainer, progressive trainer
  decoding/     — sequence generator (greedy, beam, sampling)
  utils/        — optimizer and LR scheduler factories
```

## Requirements

- Python >= 3.9
- PyTorch >= 2.0
- torchaudio
- numpy, tqdm, pyyaml, wandb, torchmetrics, torchinfo, tokenizers, pandas, matplotlib, seaborn

## License

MIT
