Doctoral Research · Space Robotics Inspection with a Free-Flying Space Manipulator
A Doctoral Research Journal Aerospace Engineering

The originality gate

Policy. New code must pass an originality test: no GNC function pair may reach ≥ 0.90 multiset Jaccard (structural near-duplication) unless the pair is on the allowlist. The policy is enforced by a fast pytest test that is not marked slow, so it runs in the default pytest -m "not slow" suite.

The what and how of the score live in ast_duplication_detector.md; this doc is the policy and the living allowlist log.
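For orientation only, here is a minimal sketch of a multiset Jaccard score. It assumes each function is summarized as a multiset of AST node-type names; that feature choice, and the helper names node_type_multiset and multiset_jaccard, are illustrative assumptions rather than the detector's actual implementation, which ast_duplication_detector.md describes.

```python
import ast
from collections import Counter


def node_type_multiset(source: str) -> Counter:
    """Count AST node-type names in a snippet (an assumed stand-in feature)."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def multiset_jaccard(a: Counter, b: Counter) -> float:
    """Sum of per-key minimum counts divided by sum of per-key maximum counts."""
    keys = set(a) | set(b)
    intersection = sum(min(a[k], b[k]) for k in keys)
    union = sum(max(a[k], b[k]) for k in keys)
    return intersection / union if union else 1.0


# Hypothetical snippets: g is f with a rename and a changed constant.
f = "def f(x):\n    return x + 1\n"
g = "def g(y):\n    return y + 2\n"
h = "def h(items):\n    return [i * 2 for i in items if i]\n"

j_fg = multiset_jaccard(node_type_multiset(f), node_type_multiset(g))
j_fh = multiset_jaccard(node_type_multiset(f), node_type_multiset(h))
```

Under this assumed feature, a copy-paste-rename such as f and g has an identical node-type multiset, while a structurally different function such as h scores lower.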

Why a whole-tree scan, not a diff

The gate scans the whole active tree (GNC/) and fails on any un-allowlisted high-J pair; it does not try to diff “new” functions against main. Reasons:

- It matches the repo’s existing “pin the acceptable state, fail on regression” lineage, the same shape as the five validate_*_baseline.py gold-standard validators. The allowlist is the pinned state; an un-allowlisted high-J pair is the regression.
- There is no changed-files/diff plumbing anywhere in the repo; mapping a git diff hunk back to a function qualname is fragile (rename detection, line-range → qualname) and would be net-new machinery.
- A whole-tree scan also catches the case a diff-on-new-functions gate misses: editing an old function until it resembles another.
- It is cheap: the detector runs in ~1–2 s on GNC/, so the gate is not marked slow.
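The third reason can be illustrated with a small sketch of an all-pairs scan. The qualnames, feature counts, and the scan_all_pairs helper below are hypothetical, and the score is a plain multiset Jaccard over assumed per-function feature counts, not the detector's real pipeline.

```python
from collections import Counter
from itertools import combinations


def multiset_jaccard(a: Counter, b: Counter) -> float:
    """Sum of per-key minimum counts divided by sum of per-key maximum counts."""
    keys = set(a) | set(b)
    union = sum(max(a[k], b[k]) for k in keys)
    return sum(min(a[k], b[k]) for k in keys) / union if union else 1.0


def scan_all_pairs(functions: dict, threshold: float = 0.90) -> list:
    """Score every unordered pair of functions; return those at/above threshold."""
    hits = []
    for name_a, name_b in combinations(sorted(functions), 2):
        j = multiset_jaccard(functions[name_a], functions[name_b])
        if j >= threshold:
            hits.append((name_a, name_b, j))
    return hits


# Hypothetical tree state before an edit: two clearly different functions.
before = {
    "GNC.a.f": Counter({"Assign": 3, "Call": 2, "Return": 1}),
    "GNC.b.g": Counter({"Assign": 1, "For": 1, "Return": 1}),
}
# The old function g is then edited until it has the same structure as f.
after = dict(before)
after["GNC.b.g"] = Counter({"Assign": 3, "Call": 2, "Return": 1})

hits_before = scan_all_pairs(before)
hits_after = scan_all_pairs(after)
```

Neither function is new in the second state, so a gate that only inspected newly added functions would have nothing to compare, whereas the all-pairs scan reports the pair.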

The threshold

ORIGINALITY_FAIL = 0.90   # an un-allowlisted pair at/above this FAILS the suite
ORIGINALITY_WARN = 0.85   # advisory only (printed by check_originality.py); never fails
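As a hedged sketch of how the two constants partition scores for a pair that is not on the allowlist: the classify helper and its band labels are hypothetical, and only the two threshold values come from this document.

```python
ORIGINALITY_FAIL = 0.90   # un-allowlisted pair at/above this fails the suite
ORIGINALITY_WARN = 0.85   # advisory only; never fails


def classify(j: float) -> str:
    """Map an un-allowlisted pair's score to a band (labels are illustrative)."""
    if j >= ORIGINALITY_FAIL:
        return "fail"
    if j >= ORIGINALITY_WARN:
        return "warn"
    return "ok"


bands = {j: classify(j) for j in (0.827, 0.85, 0.89, 0.90, 0.97)}
```

Both comparisons are inclusive, matching the "at/above" wording on the constants.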

0.90 is deliberately generous: the live GNC maximum is 0.827 (after the Jun-22 dedup), so the fail line sits a clear ~0.07 above every current pair and the gate starts green. A pair only reaches ≥0.90 by being nearly a copy-paste-rename — the regime where a human almost always agrees “this is the same code twice.” Below ~0.85 the detector’s known false-positive families (boilerplate getters, value/derivative twins) dominate, so 0.90 sits comfortably above the noise.

A future violation means a newly-added or newly-edited function is a structural near-duplicate (≥0.90) of an existing one and is not adjudicated. Two ways to clear it:

1. Dedupe (preferred): extract the shared structure (a helper, a base method, a Template-Method hook) so the J drops below the line.
2. Allowlist-with-justification: if the duplication is deliberate, add the frozenset({qualnameA, qualnameB}) pair to ALLOWLIST in validation/tests/test_originality.py and add a dated entry to the log below. The PR that adds an allowlist entry is the review checkpoint.
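A minimal sketch of how an allowlist of frozenset pairs could be consulted, assuming the detector yields (qualname_a, qualname_b, J) tuples. The qualnames, the unadjudicated helper, and the tuple shape are hypothetical; the real ALLOWLIST lives in validation/tests/test_originality.py and may be structured differently.

```python
ORIGINALITY_FAIL = 0.90

# Hypothetical entry; real entries are listed in the justification log.
ALLOWLIST = {
    frozenset({"GNC.example_a.propagate", "GNC.example_b.propagate"}),
}


def unadjudicated(pairs, allowlist=ALLOWLIST, threshold=ORIGINALITY_FAIL):
    """Return pairs at/above the fail line that are not on the allowlist."""
    return [
        (a, b, j)
        for a, b, j in pairs
        if j >= threshold and frozenset({a, b}) not in allowlist
    ]


# Stand-in detector output (assumed shape): (qualname_a, qualname_b, J).
pairs = [
    ("GNC.example_b.propagate", "GNC.example_a.propagate", 0.95),  # allowlisted
    ("GNC.example_c.step", "GNC.example_d.step", 0.93),            # not allowlisted
    ("GNC.example_e.get", "GNC.example_f.get", 0.89),              # below the line
]

violations = unadjudicated(pairs)
```

Because a frozenset is unordered, the allowlisted pair matches regardless of which qualname the detector reports first. A gate test along these lines would then assert that the list of violations is empty.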

The allowlist

Allowlist justification log (living — one dated bullet per entry)

All five seeds are below the 0.90 fail line today; they are seeded so that (a) check_originality.py shows them as adjudicated rather than unexplained, and (b) if a deliberate fork legitimately drifts ≥0.90 later, the gate does not false-alarm.

How to run

See also