NEON
Teaching robots to see time. Open-source Vision‑Language‑Action model for humanoid whole‑body control.
A frozen 7B video model provides spatiotemporal understanding. A tiny 6M action decoder translates that into 29 joint positions across 16 timesteps. Only 0.08% of parameters train.
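The freeze-and-decode pattern can be sketched in a few lines of PyTorch. This is a minimal illustration, not Neon's actual classes: the backbone is a stand-in module, and the layer sizes here are assumptions, so the parameter count is illustrative rather than the exact ~6M.

```python
import torch
import torch.nn as nn

class TinyActionDecoder(nn.Module):
    """Illustrative small decoder: pooled backbone features -> action chunk."""
    def __init__(self, feat_dim: int, dof: int = 29, horizon: int = 16):
        super().__init__()
        self.dof, self.horizon = dof, horizon
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 1024),   # hidden width is an assumption
            nn.ReLU(),
            nn.Linear(1024, dof * horizon),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # (batch, feat_dim) -> (batch, horizon, dof) joint-position targets
        return self.net(features).view(-1, self.horizon, self.dof)

backbone = nn.Linear(128, 3584)          # stand-in for the frozen 7B video model
for p in backbone.parameters():
    p.requires_grad = False              # backbone stays frozen; only the decoder trains

decoder = TinyActionDecoder(feat_dim=3584)
trainable = sum(p.numel() for p in decoder.parameters())
print(f"trainable decoder params: {trainable:,}")
```

Only the decoder's parameters receive gradients, which is what keeps the trainable fraction tiny relative to the backbone.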
14 modalities: cameras, audio, LiDAR, tactile, IMU, depth, force
Qwen2.5‑Omni 7B frozen. Sees temporal context via video, not single frames
Vision + proprioception + audio features merged into unified representation
~6M params. MLP, Flow Matching, DiT, or gated Ensemble. Parameter Golf v2
29 DoF × 16 steps @ 20Hz = 800ms of planned whole‑body motion
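The arithmetic on this card can be spelled out in two lines of plain Python (nothing Neon-specific):

```python
# 16 future timesteps of 29 joint positions, executed at 20 Hz.
DOF, HORIZON, CONTROL_HZ = 29, 16, 20

chunk_values = DOF * HORIZON               # scalars per predicted action chunk
planned_ms = HORIZON / CONTROL_HZ * 1000   # motion covered per inference, in ms
print(chunk_values, planned_ms)            # 464 scalars, 800.0 ms
```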
| Model | Backbone | Temporal? | DoF | Decoder |
|---|---|---|---|---|
| Neon | Qwen2.5-Omni 7B | ✅ Video | 29 | ~6M |
| RT-2 | PaLI-X 55B | ❌ Single frame | 7 | 55B (all) |
| Octo | Custom 27M | ❌ Single frame | 7 | 27M |
| π₀ | PaliGemma 3B | ❌ Single frame | 7 | 3B (all) |
| OpenVLA | Llama 2 7B | ❌ Single frame | 7 | 7B (all) |
| GR-2 | Custom 3B | ⚠️ Limited | 32 | 3B (all) |
14 typed channels flow into the model simultaneously — the most of any open VLA.
`image.camera_head` · `image.camera_hand` · `state` · `eef_state` · `lidar` · `audio` · `depth` · `tactile` · `imu` · `force` · `gps` · `segmentation`

Multi-source dataset mixing with automatic embodiment mapping. Every dataset becomes one unified stream.
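Embodiment mapping can be pictured as a per-dataset rename onto the typed channel namespace. This sketch is hypothetical: the dataset names, raw keys, and mapping table below are illustrative, and the real channel registry lives in Neon's configs.

```python
# Illustrative embodiment maps: each source dataset's raw keys are renamed
# onto the unified typed-channel namespace before mixing.
CHANNEL_MAP = {
    "franka_tabletop": {
        "rgb": "image.camera_head",
        "wrist_rgb": "image.camera_hand",
        "joint_pos": "state",
    },
    "g1_teleop": {
        "head_cam": "image.camera_head",
        "qpos": "state",
        "lidar_scan": "lidar",
    },
}

def to_unified(dataset: str, sample: dict) -> dict:
    """Rename a raw sample's keys into the unified channel namespace.

    Keys without a mapping are dropped; missing channels simply stay absent.
    """
    mapping = CHANNEL_MAP[dataset]
    return {mapping[k]: v for k, v in sample.items() if k in mapping}

print(to_unified("g1_teleop", {"head_cam": "img", "qpos": [0.0], "extra": 1}))
```

After this step every sample, whatever its source robot, speaks the same channel vocabulary, which is what makes multi-dataset mixing a simple concatenation.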
Tabletop manipulation with Franka arms. The foundation.
Diverse manipulation across environments and objects.
Bimanual manipulation at scale. Two hands, infinite tasks.
Natural language robot commands with paired audio samples.
NVIDIA's synthetic humanoid demonstrations for sim2real.
Cosmos Predict2.5 visual augmentation pipeline.
Pick a preset. Launch. Parameter Golf v2 is baked in — ReLU² activation, RMSNorm, soft-capped gradients.
RTX 3090 · Qwen 3B · arms only
L4 · Qwen 3B · arms only
A100 · Qwen 7B · arms only
A100 80G · 7B · full 29 DoF
L40S · Cosmos 8B · physics
A100 80G · gated MLP+Flow+DiT
A100 80G · ALL 14 modalities
A100 80G · DiT 8-layer head
- `load_in_4bit=True` — full precision OOMs even on an A100
- `push_to_hub=True` always — HF Jobs storage is ephemeral
- `paged_adamw_8bit` saves ~30% VRAM vs standard AdamW

MuJoCo, NVIDIA Newton, and Isaac Sim backends. From single-env debugging to 4096+ GPU-parallel training.
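The memory-saving knobs above correspond to standard Hugging Face options. How exactly they plumb into Neon's launch scripts is an assumption; this sketch just groups them the way you would pass them to a transformers-style loader and trainer.

```python
# Hedged sketch: the flag names are standard transformers/bitsandbytes
# options; wiring them into Neon's presets is assumed, not documented here.
model_kwargs = {
    "load_in_4bit": True,  # 4-bit weights; full precision OOMs even on an A100
}
training_kwargs = {
    "optim": "paged_adamw_8bit",  # ~30% less optimizer VRAM than standard AdamW
    "push_to_hub": True,          # HF Jobs storage is ephemeral, so always push
}
print(model_kwargs, training_kwargs)
```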
Full G1 simulation with raycast LiDAR, stereo cameras, EEF tracking, and teleop support.
NVIDIA Newton warp solver. Run thousands of environments on a single GPU for RL and data collection.
NVIDIA Isaac Sim/Lab integration with MJCF→USD conversion and domain randomization.
Record episodes in sim or real-world with the same API. LeRobot v3 format, all 14 modalities.
World generation → Physics simulation → Visual augmentation → Export. GPU-parallel across 4096+ environments simultaneously.
Procedural kitchens, offices, warehouses with randomized furniture, lighting, and materials via Marble scene composer.
NVIDIA Newton warp solver runs thousands of physics simulations simultaneously on a single GPU. Differentiable sim for gradient-based policy optimization.
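What "differentiable sim for gradient-based policy optimization" buys you can be shown with a toy example. This is a 1-D point mass in plain PyTorch, not Newton itself: because every physics step is differentiable, the loss gradient flows back through the whole rollout into the policy parameters.

```python
import torch

dt = 0.05
target = torch.tensor(1.0)
gain = torch.zeros(1, requires_grad=True)  # trivial "policy": force = gain * error
opt = torch.optim.Adam([gain], lr=0.1)

for _ in range(200):
    pos, vel = torch.zeros(1), torch.zeros(1)
    for _ in range(40):                    # differentiable 40-step rollout
        force = gain * (target - pos)      # policy output
        vel = vel + force * dt             # semi-implicit Euler physics step
        pos = pos + vel * dt
    loss = (pos - target).pow(2).mean()    # miss distance at rollout end
    opt.zero_grad()
    loss.backward()                        # gradient flows through the dynamics
    opt.step()
```

The same pattern scales up when the inner loop is a GPU-parallel differentiable solver instead of two lines of Euler integration.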
Cosmos Transfer 2.5 transforms sim renders into photorealistic frames. Bridge the domain gap without real-world data collection.
NVIDIA Kimodo generates G1-compatible whole-body motions from text. "Walk to the table and pick up the cup" → 29-DoF trajectory.
Automatic task descriptions + scripted arm trajectories + language instructions. Combinatorial explosion of training scenarios.
Everything exports to NeonLeRobotWriter — video, joints, audio, LiDAR, depth, all 14 modalities in one unified dataset.
From architecture to autonomy — the path to a robot that understands you.
Action heads, data soup, video backbone, 168 tests, PyPI package
14 modalities, 4 head types (MLP/Flow/DiT/Ensemble), StateRelativeHead, NeonBench, 200+ tests
G1 hardware integration, real-world evaluation, safety systems
Multi-camera support, monocular depth estimation, stereo fusion
Reinforcement learning fine-tuning, sim2real via Cosmos DreamGen
Autonomous G1 manipulation from voice commands. The dream.
Neon runs across the full compute spectrum — from cloud GPUs for training to edge devices on the robot.
NeonBench evaluates generalization — not memorization. Every test uses environments and tasks the model has never seen during training.
— Neon Soul Manifesto
Neon is open-source and community-driven. Star the repo, report issues, contribute code, or just say hi.
6-page technical report with full mathematical derivations, proofs, pseudocode, and ablation studies.