data_analyzer — signal math for the analysis pipeline

Purpose

The math library downstream of the NPZ handoff: LogView (masked log access), SignalAnalysis
(per-key metric evaluation + plot-axis building), DataSummary (YAML-driven reductions + comparison
tables), the Reductions/Deltas enums, and a reusable data-primitives block.
It is the single definition of every metric number that a figure, table, or scoreboard gate reports.

Role in the system

  • Consumed by orchestrator (the only module that reaches it) → feeds plotter (axes) and
    star_reporter (summary/comparison frames). Modules never cross boundaries: this one reduces; it
    does not plot or render.
  • Reads saved logs produced by logger (LogStore → NPZ); LogView wraps the loaded LogEntry tree.
  • Specs come from the catalog layer metric_catalog (re-exported here for back-compat); the inference
    layer statistics (correlation/regression/dispersion) is also re-exported from here.
  • The measurement facade validation/signal_measurement.py is only a thin shim over functions here; that is
    what makes its number the pipeline’s number by construction (see terminology, the MEASUREMENT contract).
  • Pointing error is the versine 1 − cos θ (Deltas.POINTING_ERROR), and it lives here, not in GNC.
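The two paired deltas can be sketched as below, assuming actual and desired arrive as (N, 3) arrays of direction vectors, one row per step. DeltasSketch is a hypothetical stand-in, not the module's Deltas enum, and the real apply() signature may differ.

```python
from enum import Enum

import numpy as np


class DeltasSketch(Enum):
    """Hypothetical stand-in for Deltas: paired (actual, desired) -> error signal."""

    ERROR = "error"
    POINTING_ERROR = "pointing_error"

    def apply(self, actual: np.ndarray, desired: np.ndarray) -> np.ndarray:
        actual = np.asarray(actual, dtype=float)
        desired = np.asarray(desired, dtype=float)
        if self is DeltasSketch.ERROR:
            # Plain component-wise difference: actual - desired.
            return actual - desired
        # Versine of the angle between the two directions, one value per step:
        # 1 - cos(theta), with cos(theta) = a.d / (|a| |d|) clipped against round-off.
        # A zero-length vector gives NaN here (with a RuntimeWarning).
        dot = np.einsum("ij,ij->i", actual, desired)
        norms = np.linalg.norm(actual, axis=1) * np.linalg.norm(desired, axis=1)
        cos_theta = np.clip(dot / norms, -1.0, 1.0)
        return 1.0 - cos_theta


a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
d = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(DeltasSketch.POINTING_ERROR.apply(a, d))  # [0. 1.]
```

Aligned directions give 0, and perpendicular ones give 1, so the versine grows monotonically with the angle and needs no sign handling.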

Inputs / Outputs

  • In: a saved log (NPZ path / Logger / LogView), the metric catalog, and run-spec analysis options
    (delta specs, band masks, key_variables).
  • Out: per-step signals (np.ndarray), scalar reductions (float), and presentation DataFrames
    (summary tables, comparison tables, peak-alignment frames) plus prepared PlotAxis objects.

Key methods / functions

  • LogView.values / values_at — masked values by leaf name / dotted path — analysis/data_analyzer.py:848 / :844
  • LogView.with_mask / include_between — compose a step mask / time window — :741 / :763
  • SignalAnalysis.metric_values — a key’s per-step values, collapsing paired → error signal — :1032
  • SignalAnalysis.signal_from_values — norm-or-component, abs unless the reduction is signed — :1057
  • SignalAnalysis.window_reduction — the scalar a sweep/gate reports — :1086
  • SignalAnalysis.plot_axis — build a panel’s PlotAxis (lines/quivers/hlines/mode-shading) — :1428
  • Reductions.apply / Deltas.apply — the reduction & delta math — :603 / :503
  • DataSummary.summary_frame / comparison_table_frames — per-run / cross-variant tables — :2061 / :2121
  • band_mask — half-open [lo, hi) step mask over RAW full-length indices — :1774
  • payload_diff — NaN-aware key-by-key max-abs diff (the byte-identical refactor gate) — :227
  • local_growth_rate — λ(t) = d/dt ln D(t), a chaos vs ill-conditioning diagnostic — :276
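Two of the smaller helpers above can be sketched in a few lines. Both functions are hypothetical stand-ins: the NaN convention in payload_diff_sketch (NaN in the same slot on both sides counts as equal; a one-sided NaN, a shape mismatch, or a key on one side only counts as inf) is an assumption, and local_growth_rate_sketch assumes D(t) > 0 sampled on increasing t, using plain finite differences.

```python
import numpy as np


def payload_diff_sketch(a: dict, b: dict) -> dict:
    """Hypothetical NaN-aware, key-by-key max-abs diff (stand-in for payload_diff).

    Assumed convention: matching NaNs are equal; a NaN on one side only, a shape
    mismatch, or a key present on one side only is reported as inf.
    """
    out = {}
    for key in sorted(set(a) | set(b)):
        if key not in a or key not in b:
            out[key] = float("inf")
            continue
        x = np.asarray(a[key], dtype=float)
        y = np.asarray(b[key], dtype=float)
        if x.shape != y.shape:
            out[key] = float("inf")
            continue
        nan_x, nan_y = np.isnan(x), np.isnan(y)
        if np.any(nan_x != nan_y):
            out[key] = float("inf")
            continue
        finite = ~nan_x
        # initial=0.0 keeps an all-NaN (or empty) array from raising.
        out[key] = float(np.max(np.abs(x[finite] - y[finite]), initial=0.0))
    return out


def local_growth_rate_sketch(t: np.ndarray, d: np.ndarray) -> np.ndarray:
    """lambda(t) = d/dt ln D(t), by finite differences on the sampled t grid."""
    return np.gradient(np.log(np.asarray(d, dtype=float)), np.asarray(t, dtype=float))


t = np.linspace(0.0, 4.0, 9)
print(local_growth_rate_sketch(t, np.exp(0.5 * t)))  # ~0.5 at every sample
```

Under this convention a byte-identical refactor passes when every per-key value is 0.0, and a pure exponential D(t) = e^(0.5 t) gives a flat λ(t) of 0.5.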

Footguns

Most reductions reduce |x|, but the signed members do NOT

signal_from_values abs-maps first, so MAX = max|x|. The is_signed members (MIN, FRAC_BELOW,
PTP) read the raw signed sample: MIN is the signed minimum, FRAC_BELOW tests < threshold,
PTP = max(x) − min(x) (not max|x| − min|x|). Abs-mapping these would silently corrupt them: a sample of −3 is
below a threshold of 0 on the signed signal, but |−3| = 3 is not.
(analysis/INSIGHTS.md [math])
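The signed/unsigned split can be sketched with a hypothetical four-member enum. ReductionsSketch and signal_from_values_sketch are illustrations only; the real Reductions has more members, and the real signal_from_values also handles the norm-or-component step, which this 1-D sketch omits.

```python
from enum import Enum
from typing import Optional

import numpy as np


class ReductionsSketch(Enum):
    """Hypothetical subset of Reductions, modelling only the signed/unsigned split."""

    MAX = "max"
    MIN = "min"
    FRAC_BELOW = "frac_below"
    PTP = "ptp"

    @property
    def is_signed(self) -> bool:
        # Signed members read the raw samples; every other member sees |x|.
        return self in (ReductionsSketch.MIN, ReductionsSketch.FRAC_BELOW, ReductionsSketch.PTP)

    def apply(self, signal: np.ndarray, threshold: Optional[float] = None) -> float:
        x = np.asarray(signal, dtype=float)
        if self is ReductionsSketch.MAX:
            return float(np.max(x))
        if self is ReductionsSketch.MIN:
            return float(np.min(x))
        if self is ReductionsSketch.FRAC_BELOW:
            if threshold is None:
                raise ValueError("FRAC_BELOW needs a threshold")
            return float(np.mean(x < threshold))
        return float(np.max(x) - np.min(x))  # PTP on whatever signal it is given


def signal_from_values_sketch(values: np.ndarray, reduction: ReductionsSketch) -> np.ndarray:
    """Abs-map unless the reduction is signed (1-D scalar series only)."""
    x = np.asarray(values, dtype=float)
    return x if reduction.is_signed else np.abs(x)


x = np.array([-3.0, 1.0, 2.0])
for r in ReductionsSketch:
    print(r.name, r.apply(signal_from_values_sketch(x, r), threshold=0.0))
print("PTP after abs:", ReductionsSketch.PTP.apply(np.abs(x)))
```

The last line shows the trap: abs-mapping first turns the signed peak-to-peak of 5.0 into 2.0, and FRAC_BELOW against 0 would drop from 1/3 to 0.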

v_sat is the infinity-norm clip flag, and v_max must ride on the spec

v_sat = (max_i |v_i| >= v_max): true as soon as any single component reaches v_max (the infinity norm), NOT the Euclidean norm. v_max is per-run and is
not threaded from cfg to analysis; it must arrive on the metric spec (via masked_reduction_items /
spec_with_overrides). A missing v_max raises hard at reduction time. (analysis/INSIGHTS.md [config])
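A minimal sketch of the clip flag, assuming the spec is a plain dict (the real spec object is whatever masked_reduction_items / spec_with_overrides produce). v_sat_sketch is hypothetical, and raising KeyError is an assumption; the source only says a missing v_max fails hard.

```python
import numpy as np


def v_sat_sketch(v: np.ndarray, spec: dict) -> np.ndarray:
    """Per-step clip flag: True where max_i |v_i| >= v_max (infinity norm).

    `spec` is a plain dict here (hypothetical). The exception type is an assumption.
    """
    v_max = spec.get("v_max")
    if v_max is None:
        # No silent default: a per-run limit that did not ride on the spec is an error.
        raise KeyError("v_max missing from metric spec")
    v = np.atleast_2d(np.asarray(v, dtype=float))
    return np.max(np.abs(v), axis=1) >= v_max


v = np.array([[0.8, 0.8, 0.0],    # Euclidean norm ~1.13, but no component reaches 1.0
              [1.0, 0.0, 0.0],    # exactly at the limit
              [0.3, -1.2, 0.0]])  # a negative component past the limit
print(v_sat_sketch(v, {"v_max": 1.0}))  # [False  True  True]
```

The first row is the case the footgun is about: a Euclidean-norm test would flag it, the infinity-norm flag does not.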

Band masks must come from a fresh full-length view, or step alignment is lost

band_mask is built over RAW indices from a fresh LogView(view.log_entry), never the post-windowed /
extracted array. The caller ANDs it onto the active view with with_mask(). The only hardcoded band is
SMIN_BAND = (0.025, 0.049) (derate onset / Tikhonov onset); every other mask field is YAML.
(analysis/INSIGHTS.md [io][config])
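The alignment rule can be sketched as follows, assuming the band bounds are compared against a full-length per-step signal such as s_min_G; band_mask_sketch and the sample arrays are hypothetical, and the real band_mask takes a LogView rather than a bare array.

```python
import numpy as np

SMIN_BAND = (0.025, 0.049)  # the one hardcoded band (derate / Tikhonov onset)


def band_mask_sketch(full_series: np.ndarray, band: tuple) -> np.ndarray:
    """Half-open [lo, hi) mask over the RAW, full-length step index (hypothetical)."""
    lo, hi = band
    s = np.asarray(full_series, dtype=float)
    return (s >= lo) & (s < hi)


s_min = np.array([0.10, 0.04, 0.03, 0.02, 0.045, 0.06])   # one value per raw step
window = np.array([False, False, True, True, True, True])  # active view's mask, e.g. a time window

# Right: build the band on the full-length signal, then AND it onto the active mask.
combined = window & band_mask_sketch(s_min, SMIN_BAND)
print(np.flatnonzero(combined))  # [2 4], raw step indices

# Wrong: building the band on the windowed array yields positions inside the window,
# which no longer line up with raw step numbers.
print(np.flatnonzero(band_mask_sketch(s_min[window], SMIN_BAND)))  # [0 2]
```

Both masks stay the same length as the log, which is what lets with_mask() combine them with a plain AND.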

An empty (never-occurring) band returns NaN, not a crash

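A masked array with 0 rows breaks numpy’s -1 reshape: numpy cannot infer the unknown dimension of a size-0
array when the other dimension is 0. series_matrix therefore passes the explicit column count, so an empty
band becomes a zero-row array and the reduction returns NaN (an absent point) instead of raising.
(analysis/INSIGHTS.md [footgun])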
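The failure and the fix can be reproduced in isolation. series_matrix_sketch and max_norm_or_nan are hypothetical helpers, and the real series_matrix signature is not given here; the sketch only shows why a fixed column count survives an empty selection.

```python
import numpy as np


def series_matrix_sketch(selected: np.ndarray, n_cols: int) -> np.ndarray:
    """Shape selected samples as (rows, n_cols), always passing the column count.

    With n_cols fixed, -1 is inferable even for an empty selection: size 0
    reshapes to (0, n_cols). reshape(0, -1) on the same array raises ValueError.
    """
    return np.asarray(selected, dtype=float).reshape(-1, n_cols)


def max_norm_or_nan(matrix: np.ndarray) -> float:
    """Max row norm; an empty band is an absent point, reported as NaN."""
    if matrix.shape[0] == 0:
        return float("nan")  # np.max on an empty array would raise instead
    return float(np.max(np.linalg.norm(matrix, axis=1)))


data = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 1.0]])
never = np.array([False, False])  # a band that never occurs
print(max_norm_or_nan(series_matrix_sketch(data[never], 3)))  # nan
print(max_norm_or_nan(series_matrix_sketch(data, 3)))         # 5.0
```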

Unit "" not NaN, and tuple-ify list variant values

units_for_item coalesces a missing unit to "": a NaN unit makes pivot_table(dropna=True) silently drop the whole
row. And comparison_variant_headers tuple-ifies a list variant_value (a multi-leaf sweep cascade)
before drop_duplicates, which cannot hash a list. (analysis/INSIGHTS.md [footgun])
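Both pandas behaviours can be reproduced on a toy frame. unit_or_empty and hashable_variant are hypothetical helpers standing in for what units_for_item and comparison_variant_headers are described as doing; the column names are assumptions.

```python
import math

import pandas as pd


def unit_or_empty(unit) -> str:
    """Coalesce a missing unit (None/NaN) to "" so pivot_table keeps the row."""
    if unit is None or (isinstance(unit, float) and math.isnan(unit)):
        return ""
    return str(unit)


def hashable_variant(value):
    """Tuple-ify a list variant value (multi-leaf sweep) so it can be hashed."""
    return tuple(value) if isinstance(value, list) else value


rows = pd.DataFrame(
    {
        "metric": ["a", "b", "a", "b"],
        "unit": ["m", float("nan"), "m", float("nan")],
        "variant": ["v1", "v1", "v2", "v2"],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)
pivot_args = dict(index=["metric", "unit"], columns="variant", values="value", aggfunc="first")

raw = rows.pivot_table(**pivot_args)
fixed = rows.assign(unit=rows["unit"].map(unit_or_empty)).pivot_table(**pivot_args)
print(raw.index.get_level_values("metric").tolist())    # ['a']: the NaN-unit rows are gone
print(fixed.index.get_level_values("metric").tolist())  # ['a', 'b']

headers = pd.DataFrame({"variant_value": [[0.1, 0.2], [0.1, 0.2], 0.5]})
deduped = headers.assign(variant_value=headers["variant_value"].map(hashable_variant)).drop_duplicates()
print(len(deduped))  # 2
```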

Re-exports here are a deliberate cycle-break, not stray imports

statistics.py (inference) and metric_catalog.py (catalog access) were split out Jun 18 and are
re-exported here with # noqa: E402,F401 so the facade/tests keep importing from data_analyzer.
statistics.py imports the handful of helpers it needs (finite_numeric_samples, percentile,
Reductions, longest_run_length) lazily to avoid a module-level cycle. (analysis/INSIGHTS.md [history])

Pseudocode (key → reported scalar)

spec   = metric_spec(key)                       # from the catalog
values = values_for_spec(spec)                  # derived p_ce / v_sat, else raw log values
signal = paired? POINTING_ERROR|ERROR(values)   # versine if spec.pointing, else actual−desired
       : norm-or-first-component(values)
signal = signal if reduction.is_signed else |signal|
view   = window_for_item(view, item)            # time/index window AND band_mask (full-length)
value  = reduction.apply(signal[view], threshold=spec.threshold)   # the sweep/gate number
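The pseudocode above can be made runnable under stated assumptions. SpecSketch, signal_for, and reported_scalar are hypothetical; the catalog lookup and values_for_spec are left out, a 2-D signal is collapsed with the Euclidean norm (the real code may take the first component instead), and the step mask stands for the window AND band mask already built over raw indices.

```python
from dataclasses import dataclass
from typing import Callable, Optional

import numpy as np


@dataclass
class SpecSketch:
    """Hypothetical metric spec carrying only what the key -> scalar path needs."""

    paired: bool = False
    pointing: bool = False
    threshold: Optional[float] = None


def signal_for(values, spec: SpecSketch, signed: bool) -> np.ndarray:
    """Build the full-length per-step signal before any windowing."""
    if spec.paired:
        actual, desired = (np.asarray(v, dtype=float) for v in values)
        if spec.pointing:
            cos = np.sum(actual * desired, axis=1) / (
                np.linalg.norm(actual, axis=1) * np.linalg.norm(desired, axis=1)
            )
            signal = 1.0 - np.clip(cos, -1.0, 1.0)  # versine 1 - cos(theta)
        else:
            signal = actual - desired  # actual - desired
    else:
        signal = np.asarray(values, dtype=float)
    if signal.ndim == 2:
        signal = np.linalg.norm(signal, axis=1)  # assumption: norm, not first component
    return signal if signed else np.abs(signal)


def reported_scalar(values, spec: SpecSketch, reduce: Callable, signed: bool, step_mask) -> float:
    """Select steps from the full-length signal with a raw-index mask, then reduce."""
    signal = signal_for(values, spec, signed)
    selected = signal[np.asarray(step_mask, dtype=bool)]  # time/index window AND band mask
    if selected.size == 0:
        return float("nan")  # empty band: absent point
    return reduce(selected, spec.threshold)


def peak(s: np.ndarray, _threshold) -> float:
    """Stand-in for an unsigned MAX reduction."""
    return float(np.max(s))


actual = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
desired = np.tile([1.0, 0.0, 0.0], (3, 1))
spec = SpecSketch(paired=True, pointing=True)
print(reported_scalar((actual, desired), spec, peak, False, [True, True, False]))  # 1.0
```

Keeping the signal full-length until the final selection is what lets one raw-index mask serve every key.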

Equations & references

  • Pointing versine 1 − cos θ (sheet §7 metrics) is implemented here as Deltas.POINTING_ERROR,
    not in GNC — see [[current_sota#7]] and the MEASUREMENT contract.
  • The SMIN_BAND singularity band onsets (s_min_G) come from the derate stack [[current_sota#6]].
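The versine is easier to read against angle thresholds once expanded for small θ. The identity is standard trigonometry, and the numbers are only a worked illustration, not values taken from the sheet.

```latex
1 - \cos\theta = 2\sin^2\!\left(\tfrac{\theta}{2}\right) \approx \frac{\theta^2}{2} \quad (\theta \ll 1),
\qquad
\theta = \arccos(1 - v) \approx \sqrt{2v}

v = 10^{-4} \;\Rightarrow\; \theta \approx \sqrt{2 \times 10^{-4}} \approx 1.41 \times 10^{-2}\,\text{rad} \approx 0.81^\circ
```

Because the versine is quadratic in θ near zero, halving the pointing angle cuts the reported number by roughly a factor of four.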

orchestrator · plotter · star_reporter · logger · metric_catalog · statistics ·
signal_measurement · runner · breve_controller · terminology