🎨 vLLM-Omni Configuration

🤖 Model Selection

Download models from HuggingFace or from ModelScope (a mirror commonly used in China)
Generate images from text prompts (Text-to-Image)
Model ID: Tongyi-MAI/Z-Image-Turbo
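A minimal sketch of resolving the model source to a download command, assuming the standard `huggingface-cli download` and `modelscope download` CLIs; the helper name and fallback logic are illustrative, not part of vLLM-Omni:

```python
# Sketch: build the CLI command that would fetch the model weights,
# preferring HuggingFace and optionally using the ModelScope mirror.
# The function name and the use_modelscope flag are assumptions.

def resolve_download_command(model_id: str, use_modelscope: bool = False) -> list[str]:
    """Return the download command for the given model ID."""
    if use_modelscope:
        # ModelScope mirror, typically faster for users in China
        return ["modelscope", "download", "--model", model_id]
    # Default: the official HuggingFace CLI
    return ["huggingface-cli", "download", model_id]

print(resolve_download_command("Tongyi-MAI/Z-Image-Turbo"))
```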

Server Configuration

Running the server as a subprocess requires a local vLLM-Omni installation
Path to virtual environment with vLLM-Omni installed
Server port (default: 8091)
Required for gated models (e.g. Stable Audio). Get a token from your HuggingFace account settings
🎮 GPU Settings

CUDA_VISIBLE_DEVICES setting
For LLMs, this splits the model across GPUs; for diffusion models, it runs parallel workers, each of which needs the full model in memory
Fraction of GPU memory to use (0.1-1.0)
Reduce GPU memory by offloading to CPU (slower but uses less VRAM)
Faster inference but slower startup (2-5 min compilation on first run)
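The GPU options above can be validated before launch. A small sketch, assuming the 0.1-1.0 memory-fraction bound stated above; the function names are illustrative:

```python
# Sketch: sanity-check the GPU settings before building the launch
# command. Clamping mirrors the documented 0.1-1.0 range.

def clamp_gpu_memory_fraction(fraction: float) -> float:
    """Keep the GPU memory fraction inside the documented 0.1-1.0 range."""
    return min(1.0, max(0.1, fraction))

def gpu_env(devices: str) -> dict[str, str]:
    """CUDA_VISIBLE_DEVICES controls which GPUs the server may see."""
    return {"CUDA_VISIBLE_DEVICES": devices}

print(clamp_gpu_memory_fraction(1.5))  # clamped to 1.0
print(gpu_env("0,1"))
```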

Generation Parameters


Command Preview

The command is editable for customization; copy it to run manually. "Start Server" uses the settings above; click "Reset" to restore the defaults.
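A sketch of how the previewed command might be assembled. The flag names follow vLLM's CLI conventions (`--port`, `--gpu-memory-utilization`); vLLM-Omni's actual flags may differ, so treat them as assumptions:

```python
# Sketch: assemble the launch command shown in the preview box.
# Flag names are borrowed from vLLM's CLI and may differ in vLLM-Omni.

def build_command(model_id: str, port: int = 8091,
                  gpu_memory_utilization: float = 0.9) -> list[str]:
    return [
        "vllm", "serve", model_id,
        "--port", str(port),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]

print(" ".join(build_command("Tongyi-MAI/Z-Image-Turbo")))
```

Building the command as an argument list (rather than one string) avoids shell-quoting issues if it is later passed to a subprocess.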
🎨 Image Generation
Random seed for reproducibility: the same seed and prompt produce the same output.
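The reproducibility property can be demonstrated with any seeded pseudo-random generator; here is a minimal sketch using Python's standard library (not vLLM-Omni's own sampler):

```python
import random

# Sketch: two generators seeded identically produce identical
# sequences, which is what makes a fixed seed + prompt reproducible.

a = random.Random(42)
b = random.Random(42)
assert [a.random() for _ in range(3)] == [b.random() for _ in range(3)]
print("same seed -> same sequence")
```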