Metadata-Version: 2.4
Name: viettelcloud-aiplatform
Version: 0.4.0
Summary: Viettel Cloud AI Platform SDK for ML training, optimization, and model registry.
Project-URL: Homepage, https://github.com/viettelcloud/aiplatform-sdk
Project-URL: Documentation, https://docs.viettelcloud.vn/aiplatform
Project-URL: Source, https://github.com/viettelcloud/aiplatform-sdk
Author: Viettel AI Platform Team
License-Expression: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Keywords: ai,aiplatform,kubeflow,llm,model training,trainer,viettelcloud
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: aiplatform-sdk>=1.0.5
Requires-Dist: kubeflow-katib-api>=0.19.0
Requires-Dist: kubeflow-trainer-api>=2.0.0
Requires-Dist: kubernetes>=27.2.0
Requires-Dist: pydantic>=2.10.0
Provides-Extra: docker
Requires-Dist: docker>=6.1.3; extra == 'docker'
Provides-Extra: hub
Requires-Dist: model-registry>=0.3.0; extra == 'hub'
Provides-Extra: podman
Requires-Dist: podman>=5.6.0; extra == 'podman'
Description-Content-Type: text/markdown

# Viettel Cloud AI Platform SDK

[![SDK Version](https://img.shields.io/badge/SDK-v0.3.0-blue)](https://github.com/viettelcloud/aiplatform-sdk/releases/tag/0.3.0)

> **Attribution:** This project is based on [Kubeflow SDK](https://github.com/kubeflow/sdk),
> Copyright 2024-2025 The Kubeflow Authors, licensed under Apache License 2.0.
> See [NOTICE](./NOTICE) for full attribution.

## Overview

The Viettel Cloud AI Platform SDK is a set of unified Pythonic APIs for running AI workloads at any scale
without needing to learn Kubernetes. It provides simple, consistent APIs across the AI Platform
ecosystem, so users can focus on building AI applications rather than managing complex
infrastructure.

### SDK Benefits

- **Unified Experience**: Single SDK to interact with multiple AI platform components through consistent Python APIs
- **Simplified AI Workloads**: Abstract away Kubernetes complexity and work effortlessly using familiar Python APIs
- **Built for Scale**: Scale any AI workload from a local laptop to a large-scale production
  cluster with thousands of GPUs using the same APIs
- **Rapid Iteration**: Reduced friction between development and production environments
- **Local Development**: First-class support for local development without a Kubernetes cluster,
  requiring only a `pip` installation

## Get Started

### Install SDK

```bash
pip install -U viettelcloud-aiplatform
```

### Run your first PyTorch distributed job

```python
from viettelcloud.aiplatform.trainer import TrainerClient, CustomTrainer, TrainJobTemplate

def get_torch_dist(learning_rate: str, num_epochs: str):
    import os
    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="gloo")
    print("PyTorch Distributed Environment")
    print(f"WORLD_SIZE: {dist.get_world_size()}")
    print(f"RANK: {dist.get_rank()}")
    print(f"LOCAL_RANK: {os.environ['LOCAL_RANK']}")

    lr = float(learning_rate)
    epochs = int(num_epochs)
    loss = 1.0 - (lr * 2) - (epochs * 0.01)

    if dist.get_rank() == 0:
        print(f"loss={loss}")

# Create the TrainJob template
template = TrainJobTemplate(
    runtime="torch-distributed",
    trainer=CustomTrainer(
        func=get_torch_dist,
        func_args={"learning_rate": "0.01", "num_epochs": "5"},
        num_nodes=3,
        resources_per_node={"cpu": 2},
    ),
)

# Create the TrainJob
job_id = TrainerClient().train(**template)

# Wait for TrainJob to complete
TrainerClient().wait_for_job_status(job_id)

# Print TrainJob logs
print("\n".join(TrainerClient().get_job_logs(name=job_id)))
```
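
The objective in `get_torch_dist` is synthetic, so it can be sanity-checked on a laptop before submitting a job. The sketch below repeats the same arithmetic in plain Python, with no cluster or `torch.distributed` involved. `synthetic_loss` is a hypothetical helper name used only for illustration; it is not part of the SDK.

```python
def synthetic_loss(learning_rate: str, num_epochs: str) -> float:
    """Same arithmetic as get_torch_dist, without torch.distributed."""
    # Arguments are strings, matching the func_args used in the example above.
    lr = float(learning_rate)
    epochs = int(num_epochs)
    return 1.0 - (lr * 2) - (epochs * 0.01)


# Values taken from func_args in the TrainJob template.
loss = synthetic_loss("0.01", "5")
print(f"loss={loss:.2f}")  # prints "loss=0.93"
```

The rank-0 process in the TrainJob should report the same value, up to floating-point formatting, since every node computes the loss from identical arguments.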

### Optimize hyperparameters for your training

```python
from viettelcloud.aiplatform.optimizer import OptimizerClient, Search, TrialConfig

# Create OptimizationJob with the same template
optimization_id = OptimizerClient().optimize(
    trial_template=template,
    trial_config=TrialConfig(num_trials=10, parallel_trials=2),
    search_space={
        "learning_rate": Search.loguniform(0.001, 0.1),
        "num_epochs": Search.choice([5, 10, 15]),
    },
)

print(f"OptimizationJob created: {optimization_id}")
```
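
To build intuition for the search space, the sketch below draws values the way a log-uniform and a categorical parameter are commonly defined, then scores them with the synthetic loss from the training example. It is plain Python for illustration only: `sample_loguniform`, `sample_trial`, and `synthetic_loss` are hypothetical helpers, and the real search algorithm is chosen by the optimizer backend, not by this code.

```python
import math
import random


def sample_loguniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_trial(rng: random.Random) -> dict:
    """One random point from the search space shown above."""
    return {
        "learning_rate": sample_loguniform(rng, 0.001, 0.1),
        "num_epochs": rng.choice([5, 10, 15]),
    }


def synthetic_loss(learning_rate: float, num_epochs: int) -> float:
    """Same arithmetic as the loss in get_torch_dist."""
    return 1.0 - (learning_rate * 2) - (num_epochs * 0.01)


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [sample_trial(rng) for _ in range(10)]  # mirrors num_trials=10
best = min(trials, key=lambda t: synthetic_loss(**t))
print(f"best trial: {best}")
```

Sampling in log space spreads trials evenly across orders of magnitude, so the range 0.001 to 0.01 receives about as many draws as 0.01 to 0.1. That is the usual reason to prefer a log-uniform distribution for learning rates.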

### Manage models with Model Registry

**Install Model Registry support:**
```bash
pip install 'viettelcloud-aiplatform[hub]'
```

```python
from viettelcloud.aiplatform.hub import ModelRegistryClient

client = ModelRegistryClient("https://model-registry.example.com", author="Your Name")

# Register a model
model = client.register_model(
    name="my-model",
    uri="s3://bucket/path/to/model",
    version="v1.0.0",
    model_format_name="pytorch",
    model_format_version="2.0",
    version_description="My trained model"
)

# Get a registered model
model = client.get_model("my-model")

# List all models
for model in client.list_models():
    print(f"Model: {model.name}")

# List model versions
for version in client.list_model_versions("my-model"):
    print(f"Version: {version.name}")
```

## Architecture

```
viettelcloud/
└── aiplatform/
    ├── trainer/     → TrainerClient (distributed training with PyTorch, JAX, DeepSpeed)
    ├── optimizer/   → OptimizerClient (hyperparameter tuning via Katib)
    ├── hub/         → ModelRegistryClient (model artifact management)
    └── common/      → Shared types, constants, utilities
```

## Local Development

The Trainer client supports local development without needing a Kubernetes cluster.

### Backend Comparison

| Backend | Use Case | Requirements | Install |
|---------|----------|--------------|---------|
| **Kubernetes** (default) | Production training | K8s cluster + CRDs | `pip install viettelcloud-aiplatform` |
| **Container** | Isolated local dev | Docker or Podman | `pip install 'viettelcloud-aiplatform[docker]'` or `pip install 'viettelcloud-aiplatform[podman]'` |
| **LocalProcess** | Quick prototyping | Python only | `pip install viettelcloud-aiplatform` |
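
As a rough aid for choosing between the rows above, the following sketch checks which container runtime is on `PATH` and prints a matching install command. `suggest_install` is a hypothetical helper, not part of the SDK; the extras names (`docker`, `podman`) are the ones declared in the package metadata.

```python
import shutil
from typing import Callable, Optional


def suggest_install(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return a pip command matching the container runtime found on PATH."""
    base = "viettelcloud-aiplatform"
    for runtime in ("docker", "podman"):
        if which(runtime):
            # Quote the requirement so shells such as zsh do not expand the brackets.
            return f"pip install '{base}[{runtime}]'"
    # No container runtime found: the base package covers the other backends.
    return f"pip install {base}"


print(suggest_install())
```

The lookup function is passed in as a parameter so the helper can be exercised without Docker or Podman installed.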

### Quick Start with Container Backend

```python
from viettelcloud.aiplatform.trainer import TrainerClient, ContainerBackendConfig, CustomTrainer

# Placeholder training function; replace with your own training code
def train_fn():
    print("Hello from the container backend")

# Switch to local container execution
client = TrainerClient(backend_config=ContainerBackendConfig())

# Your training runs locally in isolated containers
job_id = client.train(trainer=CustomTrainer(func=train_fn))
```

## Supported Components

| Component             | Status    | Version Support | Description                                                |
| --------------------- | --------- | --------------- | ---------------------------------------------------------- |
| **Trainer**           | Available | v2.0.0+         | Train and fine-tune AI models with various frameworks      |
| **Optimizer (Katib)** | Available | v0.19.0+        | Hyperparameter optimization                                |
| **Model Registry**    | Available | v0.3.0+         | Manage model artifacts, versions, and ML artifact metadata |

## Documentation

### SDK Documentation

- **[Project Overview](./docs/project-overview-pdr.md)**: Vision, scope, and key decisions
- **[System Architecture](./docs/system-architecture.md)**: Backend design and component interactions
- **[Codebase Summary](./docs/codebase-summary.md)**: Module structure and key classes
- **[Code Standards](./docs/code-standards.md)**: Development guidelines
- **[Project Roadmap](./docs/project-roadmap.md)**: Future plans and version compatibility

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

This project includes code from the Kubeflow SDK project. See [NOTICE](NOTICE) for attribution.
