signal_measurement — the log-to-number facade

Purpose

The one place to turn a logged run into a number. Every function delegates to the same
SignalAnalysis / DataSummary path the orchestrator sweep uses, so the value you get is, by
construction, the pipeline’s own definition of the metric, not a second, hand-rolled one. This is
the enforceable core of the measurement contract (.claude/rules/MEASUREMENT.md).

Role in the system

  • A thin facade over data_analyzer (math primitives + DataSummary/SignalAnalysis/LogView) and
    statistics (correlation/effect-size/dispersion), imported as _da and _stats.
  • The shims are deliberately thin: the heavy lifting lives in data_analyzer (the permanent home);
    the facade only gives validation scripts a friendly, discoverable entry point. New durable idioms
    land in analysis/ and get a shim HERE, never the reverse.
  • Consumed by every validator and diagnostic that needs a metric: the five validate_*_baseline.py,
    run_scoreboard.py, dwell_episodes.py, omega_b_diagnostic.py, the fragility/divergence studies.
  • __all__ (validation/signal_measurement.py:47) IS the public surface — both from … import measure
    and import … as sm styles read off it; everything else is implementation detail.

Inputs / Outputs

  • source — whatever you have: an NPZ path, a live Logger, or an existing LogView
    (_as_view adapts all three — :73).
  • key — a catalog metric name (z_b, p_e, s_min_J, …); its metrics.yaml spec decides the
    Delta (e.g. pointing → versine) and units. You do not pass the delta.
  • windows — optional after / before seconds (t >= after, t <= before); after=90 is the
    operational window the sweep uses.
  • Out — floats (measure), per-step arrays (series, diff_norm), Package records (regress,
    episode_contrast, ensemble_dispersion, payload_diff), and index/run sets.
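
As a minimal sketch of the window semantics above (not the facade's own code), the after / before bounds can be read as an inclusive boolean mask over the time vector. The helper name window_mask and the sample time grid are assumptions made for illustration.

```python
import numpy as np

def window_mask(t, after=None, before=None):
    """Inclusive window: keep steps with after <= t <= before (hypothetical helper)."""
    t = np.asarray(t, dtype=float)
    keep = np.ones(t.shape, dtype=bool)
    if after is not None:
        keep &= t >= after      # lower bound is inclusive
    if before is not None:
        keep &= t <= before     # upper bound is inclusive
    return keep

t = np.arange(0.0, 121.0, 30.0)        # 0, 30, 60, 90, 120 s
print(window_mask(t, after=90))        # only the 90 s and 120 s steps survive
```

Leaving either bound as None keeps that side of the run open, which is how an after-only window such as after=90 would behave under this reading.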

Key functions

  • measure(source, key, reduction, *, after, before, mask) — canonical scalar reduction; identical to the sweep — :88
  • series(source, key, *, after, before) — the per-step magnitude signal the pipeline plots/reduces — :109
  • as_pointing_deg(versine) — DISPLAY-ONLY versine → angle in degrees — :120
  • view_from_npz(path) — saved NPZ → canonical LogView (the one-liner) — :68
  • divergence_series / local_growth_rate / first_divergence_index — cross-run separation, growth rate, split onset — :148 / :165 / :180
  • diff_norm(values, n, prepend) / step_norm(source, key, …) — per-step jump ‖diff(x)‖: raw-array and keyed — :192 / :197
  • contiguous_runs / merge_runs / drop_short_runs / longest_run_length — episode-run spans/merge/filter/length — :217 / :237 / :242 / :232
  • threshold_crossings / first_crossing / quantized_stall_mask — crossings + dwell-stall mask — :222 / :247 / :227
  • correlate / regress / lead_lag / effect_size / episode_contrast — keyed two-signal stats + Cliff’s delta — :252 / :262 / :271 / :281 / :286
  • ensemble_dispersion / event_effect_size — across-seed dispersion + seeded event-study contrast — :295 / :300
  • back_half_pkpk / percentile_of / time_to_extreme / near_singular_fraction — tail amplitude, arbitrary-q, extreme timing, band fraction — :313 / :319 / :324 / :329
  • point_cloud_distance / arclength_projection — spatial: cloud→curve distance / projection — :336 / :342
  • available_reductions() — the reduction vocabulary — :347
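
To make the episode-run vocabulary concrete, here is a small stand-in for what "contiguous runs of a boolean mask" and "drop short runs" can mean. It is a sketch only: the names runs_of_true and drop_short, and the half-open (start, stop) convention, are assumptions and may differ from the real contiguous_runs / drop_short_runs signatures.

```python
import numpy as np

def runs_of_true(mask):
    """Return (start, stop) index pairs, stop exclusive, one per run of True."""
    m = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], m, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), stops.tolist()))

def drop_short(runs, min_len):
    """Keep only runs that span at least min_len steps."""
    return [(a, b) for a, b in runs if b - a >= min_len]

mask = [False, True, True, False, True, False, True, True, True]
runs = runs_of_true(mask)
print(runs)                              # three runs of length 2, 1 and 3
print(drop_short(runs, 2))               # the single-step run is removed
print(max(b - a for a, b in runs))       # longest run length
```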

Footguns

Measure ONLY through this facade — never re-derive a metric

Re-computing “the z_b error” with raw numpy invents a SECOND definition of the number; that is exactly
how a chord got mislabelled as the pointing error (Jun 16). The facade routes through the same reducer
the sweep calls, so its number is the pipeline’s by construction. Parity is pinned to 1e-12 by
validation/tests/test_signal_measurement.py. (.claude/rules/MEASUREMENT.md)
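
The shape of such a parity check can be sketched with stand-ins. Both functions below are hypothetical placeholders (the real test compares the facade against the sweep's reducer); the point is only that a facade which delegates cannot drift from the pipeline's number.

```python
import numpy as np

def pipeline_rms(signal):
    """Stand-in for the sweep's reducer (hypothetical)."""
    x = np.asarray(signal, dtype=float)
    return float(np.sqrt(np.nanmean(x * x)))

def facade_measure(signal, reduction="rms"):
    """Stand-in facade: looks up the pipeline reducer and delegates to it."""
    reducers = {"rms": pipeline_rms}
    return reducers[reduction](signal)

def test_facade_matches_pipeline():
    rng = np.random.default_rng(0)
    signal = rng.random(1000)
    # Same code path on both sides, so the difference stays within the pin.
    assert abs(facade_measure(signal) - pipeline_rms(signal)) <= 1e-12

test_facade_matches_pipeline()
```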

z_b / z_e ARE the versine 1 − cos θ, not a chord or an angle

This is the catalog’s pointing-error definition. The chord ‖z_b − z_b_des‖ = 2 sin(θ/2) is ~12× larger
(the chord-to-versine ratio is 1/sin(θ/2)) and is NOT what the controller regulates;
utils.geometry.angle_between is a different Delta (successive samples). Show degrees with
as_pointing_deg for humans, but the stored/reduced metric stays the versine; do not rms() the
degrees (the nonlinear map does not commute with rms). (MEASUREMENT.md)
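
A short numeric check of the two claims above, written as a sketch. The versine value 0.0147 is the sweep rms quoted in the examples on this page; versine_to_deg assumes the display conversion is the mathematical inverse θ = arccos(1 − v), which is an assumption about as_pointing_deg rather than its confirmed implementation.

```python
import numpy as np

def versine(theta):
    return 1.0 - np.cos(theta)

def chord(theta):
    return 2.0 * np.sin(theta / 2.0)

def versine_to_deg(v):
    """Display-only inverse: theta = arccos(1 - v), in degrees (assumed form)."""
    return np.degrees(np.arccos(1.0 - np.asarray(v, dtype=float)))

def rms(x):
    return float(np.sqrt(np.mean(np.square(x))))

v = 0.0147
theta = np.arccos(1.0 - v)
ratio = chord(theta) / versine(theta)      # equals 1 / sin(theta / 2)
print(f"theta = {np.degrees(theta):.2f} deg, chord/versine = {ratio:.1f}")

# rms does not commute with the nonlinear versine -> degrees map:
vs = np.array([0.001, 0.03])               # synthetic versine samples
deg_of_rms = float(versine_to_deg(rms(vs)))
rms_of_deg = rms(versine_to_deg(vs))
print(f"deg(rms(v)) = {deg_of_rms:.2f}, rms(deg(v)) = {rms_of_deg:.2f}")
```

The two printed angles differ, which is why the reduction is applied to the versine first and the conversion to degrees is left for display.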

The shims delegate — keep the math in analysis/, not here

Each function is a thin wrapper over _da/_stats. Adding the implementation here instead of in
data_analyzer inverts the contract and the next refactor loses it. Promote new idioms into
analysis/, shim here, add a parity test. (validation/INSIGHTS.md)

Keyed vs raw correlation

correlate takes catalog KEYS and pulls their series (use it for logged metrics). For two DERIVED raw
arrays (e.g. ‖actual omega_b‖ vs arm joint-rate) there is no catalog key — call the primitive
pearson_r from data_analyzer/statistics directly. (validation/INSIGHTS.md)
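
For orientation, the raw-array case looks roughly like the sketch below. It uses np.corrcoef as a stand-in because the signature of the project's pearson_r is not shown on this page; pearson_r_sketch and the synthetic arrays are assumptions for illustration only.

```python
import numpy as np

def pearson_r_sketch(x, y):
    """NaN-aware Pearson r for two equal-length raw arrays (stand-in)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = np.isfinite(x) & np.isfinite(y)   # drop steps where either side is missing
    return float(np.corrcoef(x[ok], y[ok])[0, 1])

rng = np.random.default_rng(0)
joint_rate = rng.random(500)                             # synthetic derived array
rate_norm = 2.0 * joint_rate + 0.1 * rng.normal(size=500)
rate_norm[10] = np.nan                                   # one missing step
r = pearson_r_sketch(rate_norm, joint_rate)
print(f"r = {r:.3f}")
```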

Pseudocode (the measure path — why the number is the pipeline’s)

view    = windowed(as_view(source), after, before)   # NPZ/Logger/LogView → masked LogView
spec    = DataSummary().spec_with_overrides(key)      # the catalog MetricSpec (z_b → pointing/versine)
signal  = SignalAnalysis(view).metric_signal(key)     # the SAME per-step signal the sweep builds
return    Reductions[reduction].apply(signal)         # the SAME reduction the sweep applies

Using it — examples & vocabulary

(The how-to, moved here from MEASUREMENT.md so it loads only when you open this page; the rule keeps just the enforceable three.)

import numpy as np
from validation.signal_measurement import measure, series, as_pointing_deg, view_from_npz

measure("logs/.../run.npz", "z_b", "rms", after=90)        # op base-pointing versine rms (sweep's 0.0147)
measure(result.logger, "p_e", "p99", after=90)             # op EE-tracking p99, no NPZ round-trip
measure(view, "s_min_J", "frac_below", after=90)           # near-singular fraction (catalog threshold)
zb = series("logs/.../run.npz", "z_b", after=90)
# median (not mean) so the printed statistic matches its label
print(f"median base pointing ≈ {float(np.median(as_pointing_deg(zb))):.2f} deg")
# band-restricted: p_e p99 only over steps where s_min_G is in the singularity band
measure(view, "p_e", "p99", after=90, mask={"signal": "s_min_G", "lo": 0.025, "hi": 0.049})

Reductions (the reduction arg): rms, p99, p95, median, mean, max, min, final, frac_below, cumulative, count (+ frac_true/count for booleans, ptp peak-to-peak). available_reductions() lists them; the Reductions enum lives in data_analyzer.
Deltas (catalog-chosen — you don’t pass this): POINTING_ERROR (versine, simultaneous), ERROR (Euclidean actual − desired), ANGLE_STEP (angle_between successive samples).
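
To illustrate what a few of these reduction names compute, here is a toy registry built on NumPy. It is a sketch, not the Reductions enum from data_analyzer: REDUCTIONS, reduce_signal, and the explicit threshold argument to frac_below are assumptions (in the facade the threshold comes from the catalog).

```python
import numpy as np

# Toy, NaN-aware versions of some reduction names (hypothetical stand-ins).
REDUCTIONS = {
    "rms": lambda x: np.sqrt(np.nanmean(np.square(x))),
    "p99": lambda x: np.nanpercentile(x, 99),
    "median": np.nanmedian,
    "max": np.nanmax,
    "ptp": lambda x: np.nanmax(x) - np.nanmin(x),   # peak-to-peak
    "final": lambda x: x[-1],
}

def reduce_signal(signal, reduction):
    """Apply one named reduction to a per-step signal and return a float."""
    x = np.asarray(signal, dtype=float)
    return float(REDUCTIONS[reduction](x))

def frac_below(signal, threshold):
    """Fraction of finite steps strictly below threshold."""
    x = np.asarray(signal, dtype=float)
    x = x[np.isfinite(x)]
    return float(np.mean(x < threshold))

print(reduce_signal([3.0, 4.0], "rms"))
print(reduce_signal([1.0, np.nan, 5.0], "ptp"))
print(frac_below([0.001, 0.01, 0.02, np.nan], 0.005))
```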

What each metric MEANS (the catalog’s call, not yours):

key               measures                  definition
z_b, z_e          pointing error (versine)  1 − cos θ, SIMULTANEOUS actual & desired axis
p_e, p_c          position error            ‖actual − desired‖
omega_b, v_c      rate / velocity error     ‖actual − desired‖
s_min_J, s_min_G  smallest singular value   logged scalar (catalog frac_below for bands)

Reusable primitives (fragility / sensitivity / cross-run; all NaN-aware; live in data_analyzer, shimmed here — reach for THESE, never hand-roll):

from validation.signal_measurement import (
    divergence_series, first_divergence_index, local_growth_rate,   # cross-run divergence
    ensemble_dispersion, episode_contrast, correlate, regress,      # dispersion / episode / two-signal
    back_half_pkpk, near_singular_fraction,                         # tail amplitude / band fraction
)
t, D = divergence_series(run_a, run_b, "p_c")    # cross-run separation over time
lam = local_growth_rate(t, D, smooth=31)         # local growth rate of that separation
near = series(run, "s_min_J") < 0.005            # near-singular steps
c = episode_contrast(arm_speed, near)            # c.delta = Cliff's delta

Full roster = the Key functions list above. Domain-specific idioms (mesh coverage, SVD manipulability, schedule oracles) stay with their own code, not the facade — see validation/data_tricks.md.
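
For readers unfamiliar with Cliff's delta, the statistic reported by episode_contrast, a minimal sketch of the textbook definition follows: the fraction of (inside, outside) pairs where the inside sample is larger, minus the fraction where it is smaller. The function name, the sign convention (inside minus outside), and the synthetic data are assumptions; use the facade for real measurements.

```python
import numpy as np

def cliffs_delta(inside, outside):
    """Cliff's delta in [-1, 1]: P(inside > outside) - P(inside < outside)."""
    a = np.asarray(inside, dtype=float)
    b = np.asarray(outside, dtype=float)
    a = a[np.isfinite(a)]
    b = b[np.isfinite(b)]
    diff = a[:, None] - b[None, :]          # all pairwise differences
    return float((np.sum(diff > 0) - np.sum(diff < 0)) / diff.size)

speed = np.array([0.1, 0.2, 0.9, 1.1, 0.15, 1.0])          # synthetic signal
near = np.array([False, False, True, True, False, True])   # episode mask
delta = cliffs_delta(speed[near], speed[~near])
print(delta)    # every in-episode sample exceeds every out-of-episode sample
```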

Equations & references

  • The enforceable three rules + figures/tmp_ policy: .claude/rules/MEASUREMENT.md (thin — points back here for the how-to).
  • Golden parity test (facade == sweep to 1e-12) + value pins: validation/tests/test_signal_measurement.py.
  • Reusable-primitive narrative + the “Hand-roll census” ledger: validation/INSIGHTS.md.

data_analyzer · statistics · metric_catalog · metrics · scoreboard · orchestrator · logger · terminology