scoreboard — mission success gates (the judgment layer)

Purpose

Defines the scored mission objectives: for each one, how it is judged — reduction, direction,
pass/fail gate, weight, human meaning, and the after: startup-skip window. The catalog
(metric_catalog’s metrics.yaml) says what is measured; this file says how it is judged.
These two files own separate concerns. Three entries today: tracking error, coverage, throttle fraction.

Structure

A single top-level variables: list. Each item is one scored objective with:

name — the logged variable key (matched by leaf name against the raw log).
weight — relative importance in the overall verdict (p_e 3 > coverage 2 > throttle 1).
success: — the judgment block: reduction, direction, gate, after, label, meaning.
(optional) threshold / reductions: — for diagnostic variables that need a band/floor reduction (e.g. s_min_J’s frac_below at 0.025).

Key knobs/sections

key path	what it controls	default/example	note
`variables[].name`	the logged signal this gate scores	`p_e`, `area_coverage_frac`, `s_min_J`	matched by leaf name (last dotted segment) — see completeness bar below
`variables[].weight`	relative weight in the verdict	`3` / `2` / `1`	p_e is the tightest, highest-weight gate
`success.reduction`	how the per-step signal collapses to one number	`p99` / `final` / `frac_below`	reductions live in data_analyzer (`Reductions` enum)
`success.direction`	which way is good	`lower` / `higher`	`lower` for errors/throttle, `higher` for coverage
`success.gate`	the pass/fail threshold	`0.2` m / `0.99` / `null`	`null` = diagnostic only, never fails the run
`success.after`	startup-skip window (seconds)	`90.0`	excludes the startup transient; steps inside it are recorded but NOT scored
`threshold` + `reductions`	band/floor for diagnostics	`0.025`, `[min, frac_below]`	`s_min_J` floor mirrors `controller.arm.null_space.freeze_floor`
`success.label` / `success.meaning`	human-facing strings for the console verdict	”camera tracking error”	per the no-internal-nomenclature rule — every metric explained

Who reads it

pre_run_loader — ScoreboardSpecLoader parses this into ScoreboardSpec objects (sibling of MetricSpecLoader, which parses metrics.yaml). pre_run_loader.py:117.
validation/run_scoreboard.py — the console tool that prints the verdict (the model output; reports do NOT auto-inject scoreboard content).
logger — scoreboard_metric_coverage uses this file as the log-completeness bar (NOT the full metrics.yaml catalog), matched by leaf variable name.
parameter_loader / metric_catalog — adjacent config owners (built matrices, the metric catalog) that this judgment layer pairs with.

Footguns

Two files, two concerns — don't merge them

metrics.yaml (metric_catalog) = the variable catalog, what is measured. scoreboard.yaml = the
judgment layer, how it is judged. MetricSpecLoader and ScoreboardSpecLoader are separate parsers.
A gate change belongs here; a new logged signal belongs in the catalog. (analysis/INSIGHTS.md
[pre_run_loader, config]; .claude/rules/MEASUREMENT.md.)

Completeness bar = the scoreboard set, NOT the full catalog

logger’s scoreboard_metric_coverage matches raw log keys against this file, not metrics.yaml.
The full catalog has include:true metrics that are config-conditional (e.g. reactive-scorer scores
absent in POSE mode), so they can’t be a universal bar. Adding an entry here makes it a hard
write-completeness requirement (save_raw_log(require_complete=True)). (analysis/INSIGHTS.md [logger, io].)

The frac_below threshold must stay in sync with the controller floor

s_min_J’s threshold: 0.025 mirrors controller.arm.null_space.freeze_floor (the speed-throttle
floor) — and the catalog’s derate_fraction threshold mirrors the same value. All three must move
together when the controller floor is retuned. (YAMLs_by_domain/INSIGHTS.md [config, baseline].)

after: 90.0 is load-bearing, not cosmetic

All gates exclude the first 90 s (startup transient). This is the operational window where the arm has
acquired its first target and the controller has settled. TASK_10 tried 15 s and full-helix coverage
regressed 0.998 → 0.78. Metrics inside the skip are recorded but not scored. (YAMLs_by_domain/INSIGHTS.md
[config, baseline]; matches camera_guidance.targeting.startup.time = 90.0.)

Reduction semantics are not obvious — read the enum

final = the LAST logged value (coverage marking is monotone by design, so a late-achieved 0.99 passes).
frac_below scores the share of steps below threshold and is a diagnostic here (gate: null, never
fails). p99 allows 1% of steps above the gate. The Reductions enum abs-maps most signals — see
data_analyzer’s magnitude footgun before adding a new reduction. (analysis/INSIGHTS.md [data_analyzer, math].)

Keep it lean — one in, one out

The scoreboard is deliberately the minimum-viable scorecard (3 entries since CHAIN_5). The companion
metrics.yaml enforces “retire one metric per addition”; the same discipline keeps this file from
accreting cruft. Don’t add a fourth gate without retiring or strongly justifying it.
(YAMLs_by_domain/INSIGHTS.md [history].)

metric_catalog · pre_run_loader · parameter_loader · logger · data_analyzer · breve_controller · robot · terminology · current_sota

For the exhaustive key list, read YAMLs_by_domain/scoreboard.yaml itself — this page is a navigable reference, not a dump.

tags	measurement, config
aliases	scoreboard

Quartz 5

Explorer

scoreboard

scoreboard — mission success gates (the judgment layer)

Structure

Key knobs/sections

Who reads it

Footguns

Graph View

Table of Contents

Backlinks

Quartz 5

Explorer

scoreboard

scoreboard — mission success gates (the judgment layer)

Structure

Key knobs/sections

Who reads it

Footguns

Related

Graph View

Table of Contents

Backlinks