$ uv run gz test

[returncode] 0

[stdout]
Running tests...
  Eval delta: FAIL (3 regressions detected)

    instruction_eval / completeness: baseline=4.0, current=3.33 (delta=-0.67, threshold=-0.5)
    instruction_eval / control_balance: baseline=4.0, current=3.33 (delta=-0.67, threshold=-0.5)
    instruction_eval / surface_coverage: baseline=4.0, current=3.33 (delta=-0.67, threshold=-0.5)
  ↳ Eval delta: skipped (no baselines) — 5 surfaces scored, overall 2.7/4.0
  ↳ Eval delta: skipped (no eval datasets)
{
  "passed": true,
  "commands_discovered": 60,
  "commands_checked": 60,
  "commands_with_gaps": 0,
  "gaps": [],
  "undeclared_commands": [],
  "orphaned_docs": []
}
Documentation Coverage Gap Report
========================================

PASSED: 60 commands discovered, 60 checked, all required surfaces present.
{
  "unlinked_specs": [],
  "orphan_tests": [],
  "unjustified_code_changes": [],
  "summary": {
    "unlinked_spec_count": 0,
    "orphan_test_count": 0,
    "unjustified_code_change_count": 0,
    "total_drift_count": 0
  },
  "scan_timestamp": "2026-03-28T16:26:17.366111+00:00"
}
{
  "unlinked_specs": [
    "REQ-0.1.0-01-01"
  ],
  "orphan_tests": [],
  "unjustified_code_changes": [],
  "summary": {
    "unlinked_spec_count": 1,
    "orphan_test_count": 0,
    "unjustified_code_change_count": 0,
    "total_drift_count": 1
  },
  "scan_timestamp": "2026-03-28T16:26:17.366629+00:00"
}

................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
.............................................
----------------------------------------------------------------------
Ran 1885 tests in 17.493s

OK

Tests passed.


[stderr]
