Walkthrough: \(\boldsymbol z_a\) is forced, not chosen — and it is the same object as the reconstruction

Scratch-pad companion to derivation_7dof.md §4 and dynamics_modifications_7dof.md §3.3, §5. The point of this note: tighten the story so that the inertia-weighted covector \(\boldsymbol z_a\) stops looking like a lucky choice and starts looking like the only answer — and so that M4 (the reconstruction section) and M5 (the covector) are revealed as two faces of one variational principle. Everything here is elementary linear algebra; no SVD, no pseudoinverse.

1 · The question

Section 4 of derivation_7dof.md introduces the self-motion covector as a choice:

“The normalization fixes one degree of freedom of \(\boldsymbol z_a \in \mathbb R^{13}\), leaving a twelve-parameter family of admissible covectors. Among them, take the inertia-weighted choice \(\boldsymbol z_a^T = \hat{\boldsymbol k}^T\boldsymbol M/(\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k})\).”

A referee reads “among them, take” and asks the obvious question: why that one? The honest answer is better than “it has nice properties.” The honest answer is that the two properties we actually need already determine \(\boldsymbol z_a\) uniquely. There is no family to choose from once the requirements are stated. Let us see that.

2 · The two requirements

We want a covector (a row vector, a linear functional on the generalized-velocity space) \(\boldsymbol z_a^T : \mathbb R^{13} \to \mathbb R\), reading \(v_n = \boldsymbol z_a^T\boldsymbol x\), that does exactly two jobs.

Requirement (i) — normalization. It reads unity on a unit of self-motion: \[ \boldsymbol z_a^T\hat{\boldsymbol k} = 1 . \] This just sets the scale of the coordinate \(v_n\), so that \(\boldsymbol x = \hat{\boldsymbol k}\) registers as \(v_n = 1\) rather than as some arbitrary number.

Requirement (ii) — dynamic consistency. The set the controller reconstructs on, \(\{v_n = 0\} = \ker \boldsymbol z_a^T\), is the \(\boldsymbol M\)-orthogonal complement of the self-motion: \[ \ker \boldsymbol z_a^T \;=\; \{\,\boldsymbol x \in \mathbb R^{13} : \hat{\boldsymbol k}^T\boldsymbol M\boldsymbol x = 0\,\} . \] This is what “dynamically consistent” means in Khatib’s sense: the task motion carries no kinetic-energy cross-term with the self-motion (we read off the block-diagonal \(\hat{\boldsymbol M}\) from exactly this in derivation_7dof.md §4c). Equivalently, \(\{v_n=0\}\) is the horizontal subspace of the mechanical connection.
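A quick numerical reading of requirement (ii), as a sketch only: the matrix `M` and the vector `k_hat` below are random stand-ins (assumptions, not the thesis model), with `k_hat` normalized to unit length for convenience. The check is that a velocity in the \(\boldsymbol M\)-orthogonal complement of the self-motion contributes no kinetic-energy cross-term when self-motion is added to it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 13  # dimension of the generalized-velocity space in this note

# Hypothetical stand-ins: a random symmetric positive definite "inertia" M
# and a random unit vector k_hat playing the role of the self-motion.
A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)
k_hat = rng.standard_normal(n)
k_hat /= np.linalg.norm(k_hat)

# Strip the M-component along k_hat, so that k_hat^T M x_h = 0.
x = rng.standard_normal(n)
x_h = x - k_hat * (k_hat @ M @ x) / (k_hat @ M @ k_hat)


def kinetic_energy(v):
    """Kinetic energy 0.5 * v^T M v for a generalized velocity v."""
    return 0.5 * v @ M @ v


s = 0.7  # arbitrary amount of self-motion stacked on top of x_h
cross_term = k_hat @ M @ x_h
energy_total = kinetic_energy(x_h + s * k_hat)
energy_split = kinetic_energy(x_h) + s**2 * kinetic_energy(k_hat)

print("cross term:", cross_term)
print("energy splits additively:", np.isclose(energy_total, energy_split))
```

The additive split of the energy is the block-diagonal structure in miniature: one block for the task motion, one scalar block for the self-motion.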

That is the whole specification. Two requirements. Now watch them collapse to a formula.

3 · The derivation (two lines)

Proposition. On \(\Omega\), the unique covector satisfying (i) and (ii) is \[ \boxed{\;\boldsymbol z_a \;=\; \frac{\boldsymbol M\hat{\boldsymbol k}}{\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k}}\;} \]

Scratch work. Requirement (ii) is a statement about kernels. Notice that the right-hand side \(\{\boldsymbol x : \hat{\boldsymbol k}^T\boldsymbol M\boldsymbol x = 0\}\) is itself the kernel of a covector — namely the row vector \(\hat{\boldsymbol k}^T\boldsymbol M\). So (ii) says: the covector \(\boldsymbol z_a^T\) and the covector \(\hat{\boldsymbol k}^T\boldsymbol M\) have the same kernel. And two linear functionals with the same kernel can only differ by a scalar. That is the entire idea; the rest is bookkeeping.

Proof. First, the covector \(\hat{\boldsymbol k}^T\boldsymbol M\) is nonzero: since \(\boldsymbol M\) is symmetric positive definite and \(\hat{\boldsymbol k} \neq \boldsymbol 0\), the vector \(\boldsymbol M\hat{\boldsymbol k}\) is nonzero, so its transpose is a nonzero row. Its kernel is therefore a hyperplane (dimension \(12\) in \(\mathbb R^{13}\)). By requirement (ii) the covector \(\boldsymbol z_a^T\) has that same hyperplane as its kernel.

Now invoke the elementary fact that two nonzero linear functionals on a vector space with the same kernel are proportional. (If \(\ker\phi = \ker\psi\) is a hyperplane \(H\), pick any \(\boldsymbol w \notin H\); every \(\boldsymbol x\) decomposes as \(\boldsymbol x = t\boldsymbol w + \boldsymbol h\) with \(\boldsymbol h \in H\), so \(\phi(\boldsymbol x) = t\,\phi(\boldsymbol w)\) and \(\psi(\boldsymbol x) = t\,\psi(\boldsymbol w)\); hence \(\phi = [\phi(\boldsymbol w)/\psi(\boldsymbol w)]\,\psi\).) Applying it here, there is a scalar \(c\) with \[ \boldsymbol z_a^T \;=\; c\,\hat{\boldsymbol k}^T\boldsymbol M , \qquad\text{equivalently}\qquad \boldsymbol z_a \;=\; c\,\boldsymbol M\hat{\boldsymbol k} \] (using \(\boldsymbol M = \boldsymbol M^T\)). This consumes requirement (ii) completely; the inertia weighting is not an input, it is forced.

It remains to fix \(c\), which is what requirement (i) is for. Substituting, \[ 1 \;=\; \boldsymbol z_a^T\hat{\boldsymbol k} \;=\; c\,\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k} , \qquad\text{so}\qquad c \;=\; \frac{1}{\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k}} , \] where the denominator is strictly positive on \(\Omega\) because \(\boldsymbol M\) is positive definite and \(\hat{\boldsymbol k} \neq \boldsymbol 0\) — so \(c\) is well defined everywhere we operate. Therefore \(\boldsymbol z_a = \boldsymbol M\hat{\boldsymbol k}/(\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k})\), and it is unique. \(\blacksquare\)
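The proposition can be sanity-checked numerically. The sketch below assumes a random symmetric positive definite `M` and a random unit `k_hat` (both hypothetical stand-ins for the real model). It imposes requirements (i) and (ii) as thirteen linear conditions on an unknown covector and solves them without ever using the closed form; the solution should coincide with \(\boldsymbol M\hat{\boldsymbol k}/(\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k})\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 13
A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)        # hypothetical SPD inertia
k_hat = rng.standard_normal(n)
k_hat /= np.linalg.norm(k_hat)     # hypothetical self-motion direction

# Closed form from the proposition.
z_a = M @ k_hat / (k_hat @ M @ k_hat)

# Basis H of the hyperplane {x : k_hat^T M x = 0}: in a complete QR of the
# column M k_hat, the trailing 12 columns of Q are orthogonal to M k_hat.
Q, _ = np.linalg.qr((M @ k_hat).reshape(n, 1), mode="complete")
H = Q[:, 1:]                       # shape (13, 12)

# Requirement (ii): z vanishes on every column of H   (12 equations).
# Requirement (i):  z reads 1 on k_hat                (1 equation).
conditions = np.vstack([H.T, k_hat])
rhs = np.concatenate([np.zeros(n - 1), [1.0]])
z_solved = np.linalg.solve(conditions, rhs)

normalization = z_a @ k_hat
kernel_residual = np.max(np.abs(z_a @ H))

print("normalization:", normalization)
print("kernel residual:", kernel_residual)
print("solved covector matches closed form:", np.allclose(z_solved, z_a))
```

The linear system is square and nonsingular, which is the numerical counterpart of "there is no family left to choose from".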

The moral. There was never a twelve-parameter family to choose from. Requirement (ii) pins the direction of \(\boldsymbol z_a\) (it must be \(\boldsymbol M\hat{\boldsymbol k}\)), and requirement (i) pins its length. The phrase “we take the inertia-weighted choice” should become “the two requirements force the inertia weighting.” This is strictly stronger and removes the only soft spot in §4.

4 · The bonus: this is also your reconstruction (M4 = M5)

Here is the part worth putting in the thesis, because it unifies two modifications that the write-ups currently treat as separate. The reconstruction problem (M4) is: given a task velocity \(\boldsymbol y \in \mathbb R^{12}\), recover a generalized velocity \(\boldsymbol x \in \mathbb R^{13}\) with \(\boldsymbol\Gamma\boldsymbol x = \boldsymbol y\). The map \(\boldsymbol\Gamma\) is wide (\(12\times 13\)), so the solution is not unique and “recover” needs a selection principle. Use the physical one: among all consistent \(\boldsymbol x\), take the one of least kinetic energy.

Claim. The minimum-kinetic-energy reconstruction is exactly the \(\{v_n = 0\}\) section.

Proof. We solve the constrained optimization \[ \min_{\boldsymbol x}\ \tfrac12\boldsymbol x^T\boldsymbol M\boldsymbol x \quad\text{subject to}\quad \boldsymbol\Gamma\boldsymbol x = \boldsymbol y . \] Form the Lagrangian \(\mathcal L = \tfrac12\boldsymbol x^T\boldsymbol M\boldsymbol x - \boldsymbol\lambda^T(\boldsymbol\Gamma\boldsymbol x - \boldsymbol y)\) with multiplier \(\boldsymbol\lambda \in \mathbb R^{12}\). Because the objective is strictly convex (\(\boldsymbol M\) is positive definite) and the constraint is linear, a stationary point is the unique minimizer. The stationarity condition is \(\partial\mathcal L/\partial\boldsymbol x = \boldsymbol M\boldsymbol x - \boldsymbol\Gamma^T\boldsymbol\lambda = \boldsymbol 0\), so the optimizer has the form \[ \boldsymbol x^\star \;=\; \boldsymbol M^{-1}\boldsymbol\Gamma^T\boldsymbol\lambda . \] Imposing the constraint gives \(\boldsymbol\Gamma\boldsymbol M^{-1}\boldsymbol\Gamma^T\boldsymbol\lambda = \boldsymbol y\); the matrix \(\boldsymbol\Gamma\boldsymbol M^{-1}\boldsymbol\Gamma^T\) is invertible on \(\Omega\) (it is \(12\times 12\), and positive definite since \(\boldsymbol\Gamma\) has full row rank and \(\boldsymbol M^{-1}\) is positive definite), so \(\boldsymbol\lambda = (\boldsymbol\Gamma\boldsymbol M^{-1}\boldsymbol\Gamma^T)^{-1}\boldsymbol y\) and \[ \boldsymbol x^\star \;=\; \underbrace{\boldsymbol M^{-1}\boldsymbol\Gamma^T(\boldsymbol\Gamma\boldsymbol M^{-1}\boldsymbol\Gamma^T)^{-1}}_{\textstyle \bar{\boldsymbol\Gamma}}\,\boldsymbol y . \] This \(\bar{\boldsymbol\Gamma}\) is the \(\boldsymbol M\)-weighted (dynamically consistent) generalized inverse. Now read off the self-motion content of \(\boldsymbol x^\star\). Because \(\boldsymbol x^\star = \boldsymbol M^{-1}\boldsymbol\Gamma^T\boldsymbol\lambda\), \[ v_n \;=\; \boldsymbol z_a^T\boldsymbol x^\star \;=\; \frac{\hat{\boldsymbol k}^T\boldsymbol M}{\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k}}\,\boldsymbol M^{-1}\boldsymbol\Gamma^T\boldsymbol\lambda \;=\; \frac{\hat{\boldsymbol k}^T\boldsymbol\Gamma^T\boldsymbol\lambda}{\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k}} \;=\; \frac{(\boldsymbol\Gamma\hat{\boldsymbol k})^T\boldsymbol\lambda}{\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k}} \;=\; 0 , \] where \(\boldsymbol M\) and \(\boldsymbol M^{-1}\) cancel and the last step uses \(\boldsymbol\Gamma\hat{\boldsymbol k} = \boldsymbol 0\) (Proposition 1). So the least-kinetic-energy reconstruction automatically satisfies \(v_n = 0\): it is the section of §5.2. \(\blacksquare\)
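The claim is easy to probe numerically. The following is a minimal sketch under stated assumptions: `M` and `Gamma` are random stand-ins with the dimensions used in this note, and `k_hat` is taken as the unit vector spanning the null space of `Gamma`. None of these names come from the production code. It builds \(\boldsymbol x^\star\) from the two linear solves of the proof, then checks the constraint, the value of \(v_n\), and the energy penalty paid by any other consistent solution \(\boldsymbol x^\star + t\hat{\boldsymbol k}\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 13, 12
A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)            # hypothetical SPD inertia
Gamma = rng.standard_normal((m, n))    # hypothetical full-row-rank task map

# Self-motion direction: the last column of a complete QR of Gamma^T is
# orthogonal to every row of Gamma, so Gamma @ k_hat = 0.
Q, _ = np.linalg.qr(Gamma.T, mode="complete")
k_hat = Q[:, -1]

z_a = M @ k_hat / (k_hat @ M @ k_hat)

# Minimum-kinetic-energy reconstruction via the two solves from the proof.
y = rng.standard_normal(m)
Minv_Gt = np.linalg.solve(M, Gamma.T)        # M^{-1} Gamma^T
lam = np.linalg.solve(Gamma @ Minv_Gt, y)    # Lagrange multiplier
x_star = Minv_Gt @ lam

constraint_residual = np.max(np.abs(Gamma @ x_star - y))
v_n = z_a @ x_star


def kinetic_energy(v):
    """Kinetic energy 0.5 * v^T M v for a generalized velocity v."""
    return 0.5 * v @ M @ v


# Any other consistent solution differs from x_star by a multiple of k_hat.
t = 0.3
energy_gap = kinetic_energy(x_star + t * k_hat) - kinetic_energy(x_star)
expected_gap = 0.5 * t**2 * (k_hat @ M @ k_hat)

print("constraint residual:", constraint_residual)
print("v_n on the reconstruction:", v_n)
print("energy gap equals pure self-motion energy:",
      np.isclose(energy_gap, expected_gap))
```

The energy gap contains no term linear in `t`, which is the same missing cross-term that requirement (ii) demands.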

Why lstsq injected the ghost, in one line. The production lstsq/min-norm path solves the same constrained problem but with the Euclidean objective \(\tfrac12\boldsymbol x^T\boldsymbol x\) in place of \(\tfrac12\boldsymbol x^T\boldsymbol M\boldsymbol x\). Repeating the computation with \(\boldsymbol M \to \boldsymbol E\) (the identity) gives \(\boldsymbol x_E = \boldsymbol\Gamma^T(\boldsymbol\Gamma\boldsymbol\Gamma^T)^{-1}\boldsymbol y\). Since \(\boldsymbol x_E\) lies in the range of \(\boldsymbol\Gamma^T\) and \(\boldsymbol\Gamma\hat{\boldsymbol k} = \boldsymbol 0\), the orthogonality it enforces is \(\hat{\boldsymbol k}^T\boldsymbol x_E = 0\): Euclidean-orthogonal to \(\hat{\boldsymbol k}\), not \(\boldsymbol M\)-orthogonal. But \(v_n = \hat{\boldsymbol k}^T\boldsymbol M\boldsymbol x_E/(\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k})\), and Euclidean orthogonality does not imply \(\hat{\boldsymbol k}^T\boldsymbol M\boldsymbol x_E = 0\) (it would only if \(\hat{\boldsymbol k}\) were an eigenvector of \(\boldsymbol M\)). The two metrics disagree by exactly the inertia weighting, and that disagreement is the non-decaying \(v_n\) we measured (mean \(|v_n| = 0.157\)). The fix was never a damping hack; it was using the right inner product.
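The metric mismatch shows up already in two dimensions. The toy numbers below are chosen for hand-checking and are not the 13-dimensional system of this note; `M`, `Gamma`, `k_hat` and `y` are all illustrative assumptions. The sketch compares the Euclidean minimum-norm solution returned by `np.linalg.lstsq` with the inertia-weighted one.

```python
import numpy as np

# Toy 2-D stand-ins (assumptions for illustration only).
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # SPD "inertia" with off-diagonal coupling
Gamma = np.array([[1.0, 0.0]])    # wide task map R^2 -> R^1
k_hat = np.array([0.0, 1.0])      # spans the null space: Gamma @ k_hat = 0
y = np.array([1.0])

z_a = M @ k_hat / (k_hat @ M @ k_hat)

# Euclidean minimum-norm solution: what lstsq returns for a wide system.
x_E, *_ = np.linalg.lstsq(Gamma, y, rcond=None)

# Inertia-weighted (minimum-kinetic-energy) solution.
Minv_Gt = np.linalg.solve(M, Gamma.T)
x_M = Minv_Gt @ np.linalg.solve(Gamma @ Minv_Gt, y)

euclid_dot = k_hat @ x_E          # Euclidean orthogonality to k_hat
v_n_euclid = z_a @ x_E            # self-motion content seen by z_a
v_n_inertia = z_a @ x_M

print("k_hat . x_E        :", euclid_dot)
print("v_n, Euclidean path:", v_n_euclid)
print("v_n, inertia path  :", v_n_inertia)
```

By hand: \(\boldsymbol x_E = (1, 0)\) is Euclidean-orthogonal to \(\hat{\boldsymbol k}\) yet reads \(v_n = 1/2\), while the inertia-weighted solution \((1, -1/2)\) reads \(v_n = 0\). The off-diagonal entry of `M` is the whole effect; with a diagonal `M` the two solutions would coincide in this toy case.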

The moral. M4 and M5 are one idea. The covector \(\boldsymbol z_a\) (M5) is the measurement of self-motion in the kinetic-energy metric; the reconstruction section (M4) is the selection of generalized velocity in the same metric. They share the matrix \(\boldsymbol M\) because they are the dual statements of one variational principle — minimum kinetic energy subject to the task constraint. Present them together and the whole 7-DOF story has a single spine.

5 · The citation, told honestly

The temptation is to attribute the inertia weighting to “Khatib eq. 18.” That is the wrong equation, and the looser attribution invites a referee to check. Here is the accurate account, from reading Khatib (1987) §VI–VII directly.

The dynamically consistent inverse \(\bar{\boldsymbol J}\) and its null-space projector are in Khatib 1987, §VI, eqs. 51–52 and 55 — not eq. 18 (eq. 18 is the operational-space inertia congruence \(\boldsymbol\Lambda = \boldsymbol J^{-T}\boldsymbol M\boldsymbol J^{-1}\), a different statement). Cite eqs. 51–52 and the remark beneath eq. 52.
The phrase “dynamically consistent inverse” does not appear in the 1987 paper. What Khatib writes there is descriptive: \(\bar{\boldsymbol J}\) “is actually a generalized inverse … that minimizes the manipulator’s instantaneous kinetic energy” (p. 49, under eq. 52), and the inverse “consistent with the dynamic equations” (p. 50, §VII). The fixed noun phrase is a later crystallization — its canonical locus is Featherstone & Khatib 1997, titled literally “Load independence of the dynamically consistent inverse of the Jacobian matrix.” Use Featherstone–Khatib 1997 when you want the term, Khatib 1987 eqs. 51–52 when you want the construction.
The lineage predates 1987: the operational-space / kinetic-energy construction traces to Khatib’s 1980 docteur-ingénieur thesis (ENSAE Toulouse) and his 1983 paper “Dynamic control of manipulators in operational space” (6th CISM–IFToMM). Cite these for the true origin.

What is genuinely yours. Khatib’s \(\bar{\boldsymbol J}\) is a \(7\times 6\) matrix object he never computes — by his own summary (p. 52) the scheme “avoids the explicit evaluation of any generalized inverse or pseudo-inverse.” Two things are your own contribution:

The scalar-valued covector \(\boldsymbol z_a\), the specialization to a one-dimensional null space, derived in two lines from its defining properties (§3 above) rather than lifted as a quoted matrix formula. This is the right thing to show rather than cite.
You compute the section explicitly and reconstruct on it every step, where Khatib uses \(\bar{\boldsymbol J}\) only as a derivation device. Same object, opposite implementation stance — worth a sentence if a referee asks why your treatment looks different from his.

6 · Edits this note implies

If you agree with the above, the concrete changes are small and local:

derivation_7dof.md §4 — replace “Among them, take the inertia-weighted choice …” with the two-requirement Proposition of §3 (it is shorter than the current three-property justification and strictly stronger: uniqueness, not preference). Keep §4(a)–(c) as the consequences of the now-forced \(\boldsymbol z_a\).
dynamics_modifications_7dof.md §3.3 — the line “This inertia weighting is the same dynamically-consistent device as Giordano’s thesis equation (5.22)” is fine, but add the one-sentence variational characterization (§4 above) so M4 and M5 read as one principle; and correct the Khatib pointer to eqs. 51–52.
§5.1 of dynamics_modifications already says “the min-norm convention was itself the ghost injector” — strengthen it with the one-line metric argument from §4 (Euclidean vs \(\boldsymbol M\) orthogonality), which makes the claim a proof rather than an assertion.
References note — add the eq-52 pointer, Featherstone–Khatib 1997 for the term, and the 1980/1983 origin.

None of this touches deriv_7dof.tex.