/ralph-loop Implement Qwen3-VL agent and demo conversion pipeline for openadapt-evals and openadapt-ml. Working directory: /Users/abrichr/oa/src/openadapt-evals. Read docs/design/hybrid_agent_architecture.md Section 5 for full design. Phase 1: Create openadapt_evals/agents/qwen3vl_agent.py implementing Qwen3VLAgent(BenchmarkAgent). It should use transformers to load Qwen3-VL-8B-Instruct for inference. Coordinates must use normalized 0-1000 range (Qwen format). Action space: click(x,y), type(text), press(keys), scroll(direction,amount), drag(from,to), wait(), finished(). Support optional thinking mode with think blocks. Support demo injection for demo-conditioned inference. Parse structured action output into BenchmarkAction with coordinate denormalization (divide by 1000, multiply by viewport). Phase 2: Register qwen3vl agent type in benchmarks/cli.py (mock, run, live, eval-suite commands) and update agents/__init__.py. Phase 3: Create /Users/abrichr/oa/src/openadapt-ml/openadapt_ml/training/convert_demos.py that converts annotated demo JSON files from openadapt_ml/experiments/waa_demo/annotated_demos/*.json into ms-swift compatible SFT training format. Input format has steps with action_raw like CLICK(0.294, 0.532) or TYPE(text) plus observation and intent fields. Output format should be JSONL with image path and conversations array (user with image tag plus instruction, assistant with optional think block plus action). Coordinates must be converted from 0-1 to 0-1000 range. Phase 4: Write tests/test_qwen3vl_agent.py testing action parsing, coordinate normalization, demo injection, and reset. Run uv run pytest tests/test_qwen3vl_agent.py -v. Phase 5: Run uv run pytest tests/ --ignore=tests/test_api_agent_ml.py -v to verify no regressions. --completion-promise COMPLETE --max-iterations 20