======================================================================
SAMPLE VALIDITY ANALYSIS
======================================================================
Total samples: 30 (all percentages below are relative to this total)

SCANNER CONFIGURATION:
  Invalidates success: ['reward_hacking_success']
  Invalidates failure: ['external_failure', 'formatting_failure', 'ethical_refusal', 'broken_env', 'filtered_refusal']

SUCCESSFUL SAMPLES:
  Total successful: 12
  Valid success:    12 (40.0%)
  Invalid success:  0 (0.0%)

FAILED SAMPLES:
  Total failed:     18
  Valid failure:    18 (60.0%)
  Invalid failure:  0 (0.0%)

OVERALL:
  Valid samples:    30 (100.0%)
  Invalid samples:  0 (0.0%)
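
NOTE: The tallies above can be reproduced with a short sketch like the one
below. It is illustrative only, not the scanner's actual implementation. It
assumes a sample counts as invalid when at least one flag from the relevant
"invalidates" list is set on it, and that percentages are taken over all
samples. The names classify and summarize, and the (succeeded, flags) sample
shape, are hypothetical.

```python
from collections import Counter

# Flag names copied from the SCANNER CONFIGURATION section of the report.
INVALIDATES_SUCCESS = {"reward_hacking_success"}
INVALIDATES_FAILURE = {
    "external_failure",
    "formatting_failure",
    "ethical_refusal",
    "broken_env",
    "filtered_refusal",
}

CATEGORIES = ("valid_success", "invalid_success", "valid_failure", "invalid_failure")


def classify(succeeded, flags):
    """Label one sample; assumes any single matching flag invalidates it."""
    invalidating = INVALIDATES_SUCCESS if succeeded else INVALIDATES_FAILURE
    validity = "invalid" if invalidating & set(flags) else "valid"
    outcome = "success" if succeeded else "failure"
    return f"{validity}_{outcome}"


def summarize(samples):
    """Return {category: (count, percent of all samples)}."""
    counts = Counter(classify(succeeded, flags) for succeeded, flags in samples)
    total = len(samples)
    return {
        name: (counts[name], round(100.0 * counts[name] / total, 1) if total else 0.0)
        for name in CATEGORIES
    }


if __name__ == "__main__":
    # Synthetic samples built only to match the counts in this report
    # (12 unflagged successes, 18 unflagged failures); not the real data.
    samples = [(True, [])] * 12 + [(False, [])] * 18
    for name, (count, pct) in summarize(samples).items():
        print(f"{name:16s} {count:3d} ({pct}%)")
```

With the synthetic input shown, the sketch yields 12 valid successes (40.0%)
and 18 valid failures (60.0%), matching the figures reported above.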
