Doctoral Research · Space Robotics Inspection with a Free-Flying Space Manipulator
A Doctoral Research Journal Aerospace Engineering

Walkthrough v2 — \(M\)-orthogonality and projectors, from scratch, ending at \(\boldsymbol z_a\)

Companion to 7dof_walkthrough.md, derivation_7dof.md §4, and dynamics_modifications_7dof.md §3.3/§5. This note assumes nothing about inner products or projectors and builds both from the ground up, then walks the “projector route” to the self-motion covector \(\boldsymbol z_a = \boldsymbol M\hat{\boldsymbol k}/(\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k})\) slowly enough that every “why” is answered. A single 3-dimensional toy is carried through every step so you can check the whole chain by hand. The algebra was also verified numerically over 2000 random trials (recipe at the end).

You said you’re stuck on two words: “\(M\)-orthogonal” and “projector / idempotent.” They are the only two ideas in the whole derivation. Get them and the rest is bookkeeping. So we spend most of this note on the two ideas and only then assemble the argument.


0 · Standing assumptions (the small print that everything leans on)

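Everything below is true on the operating region \(\Omega\), where: \(\boldsymbol M\) is the \(13\times 13\) generalized inertia matrix, symmetric positive-definite; \(\boldsymbol\Gamma\) is \(12\times 13\) with full row rank \(12\); and consequently \(\ker\boldsymbol\Gamma\) is one-dimensional, spanned by \(\hat{\boldsymbol k}\).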

Why this box matters for you specifically. Your thesis lives near singularities. If \(\boldsymbol\Gamma\) ever drops to rank \(11\), \(\ker\boldsymbol\Gamma\) becomes two-dimensional, \(\boldsymbol N\) is no longer rank one, and the factorization \(\boldsymbol N = \hat{\boldsymbol k}\,\boldsymbol w^T\) collapses — the entire construction of \(\boldsymbol z_a\) is undefined there. That is not a flaw in the math; it is the reason the freeze-floor mitigation exists. State this assumption out loud before a committee asks, because they will.


Part I — What “\(M\)-orthogonal” means

1 · The motivating question: when are two motions independent?

A motion of the whole robot is one vector \(\boldsymbol x \in \mathbb R^{13}\) — base linear velocity, base angular velocity, seven joint rates, all stacked. Hand someone two motions \(\boldsymbol u\) and \(\boldsymbol v\) and ask: are they independent, or do they interfere?

Your reflex is to compute the dot product \(\boldsymbol u^T\boldsymbol v\) and call them independent if it is zero. But look at what that reflex secretly assumes. The dot product \(\boldsymbol u^T\boldsymbol v = \sum_i u_i v_i\) adds up the thirteen coordinates as if they were the same kind of thing on the same footing — as if one radian of shoulder rotation were interchangeable with one meter of base drift. For a robot that is physically wrong: those motions cost different amounts of energy and live on different axes. The honest notion of “independent” has to come from the physics, not from the accident of how we stacked numbers into a column. That is the entire reason \(M\)-orthogonality exists.

2 · An inner product is a chosen ruler-and-protractor

Two geometric questions matter in any vector space: how long is this vector? (a ruler) and what is the angle between these two — in particular, are they perpendicular? (a protractor). Remarkably, a single device answers both: an inner product \(\langle\boldsymbol u,\boldsymbol v\rangle\), a rule that eats two vectors and returns a number. From it, length is \(\|\boldsymbol u\| = \sqrt{\langle\boldsymbol u,\boldsymbol u\rangle}\) and “perpendicular” is just \(\langle\boldsymbol u,\boldsymbol v\rangle = 0\).

The ordinary dot product is one particular choice of inner product — the one that declares the coordinate axes mutually perpendicular and each one unit long. It is the default ruler, not a law of nature. We are free to pick a different ruler, and when we do, lengths and angles rearrange. The liberating point:

“Perpendicular” is not a property of two vectors. It is a property of two vectors together with a chosen inner product. Change the ruler and what counts as a right angle changes too.

3 · A symmetric positive-definite \(\boldsymbol M\) is a ruler

Here is how a matrix becomes a ruler. For SPD \(\boldsymbol M\), the rule \[ \langle\boldsymbol u,\boldsymbol v\rangle_{\boldsymbol M} \;=\; \boldsymbol u^T\boldsymbol M\boldsymbol v \] is a genuine inner product — the \(\boldsymbol M\)-inner product — with its own length, the \(\boldsymbol M\)-norm \(\|\boldsymbol u\|_{\boldsymbol M} = \sqrt{\boldsymbol u^T\boldsymbol M\boldsymbol u}\). Both requirements on \(\boldsymbol M\) earn their keep. Symmetry makes the pairing symmetric, \(\langle\boldsymbol u,\boldsymbol v\rangle_{\boldsymbol M} = \langle\boldsymbol v,\boldsymbol u\rangle_{\boldsymbol M}\) (an honest protractor reads the same angle both ways). Positive-definiteness makes \(\|\boldsymbol u\|_{\boldsymbol M}^2 = \boldsymbol u^T\boldsymbol M\boldsymbol u > 0\) for nonzero \(\boldsymbol u\), so only the zero vector has zero length. The ordinary dot product is the special case \(\boldsymbol M = \boldsymbol E\) (identity): “the ruler that treats every axis as unit and mutually square.” A general SPD \(\boldsymbol M\) is just a different, equally legitimate ruler.

Definition. Two vectors are \(\boldsymbol M\)-orthogonal when \[ \langle\boldsymbol u,\boldsymbol v\rangle_{\boldsymbol M} \;=\; \boldsymbol u^T\boldsymbol M\boldsymbol v \;=\; 0 . \] That is the whole definition. It is “perpendicular,” measured with the \(\boldsymbol M\)-protractor. The only new ingredient versus ordinary perpendicularity is the matrix \(\boldsymbol M\) wedged into the middle of the product.

4 · Why this \(\boldsymbol M\) is the physically correct ruler: it measures energy

For our robot \(\boldsymbol M\) is the generalized inertia matrix, and it is built to measure energy. The kinetic energy of a motion \(\boldsymbol x\) is \[ T \;=\; \tfrac12\,\boldsymbol x^T\boldsymbol M\boldsymbol x \;=\; \tfrac12\,\|\boldsymbol x\|_{\boldsymbol M}^2 . \] So the squared \(\boldsymbol M\)-norm is twice the kinetic energy. The \(\boldsymbol M\)-ruler measures motions in the currency the dynamics actually cares about. Now expand the energy of a combined motion: \[ \tfrac12(\boldsymbol u+\boldsymbol v)^T\boldsymbol M(\boldsymbol u+\boldsymbol v) = \underbrace{\tfrac12\boldsymbol u^T\boldsymbol M\boldsymbol u}_{T(\boldsymbol u)} + \underbrace{\tfrac12\boldsymbol v^T\boldsymbol M\boldsymbol v}_{T(\boldsymbol v)} + \underbrace{\boldsymbol u^T\boldsymbol M\boldsymbol v}_{\text{cross term}} . \] The cross term is exactly \(\langle\boldsymbol u,\boldsymbol v\rangle_{\boldsymbol M}\). When it vanishes, that is, when \(\boldsymbol u\) and \(\boldsymbol v\) are \(\boldsymbol M\)-orthogonal, the combined energy is just the sum of the two separate energies, \(T(\boldsymbol u+\boldsymbol v) = T(\boldsymbol u)+T(\boldsymbol v)\). A Pythagorean theorem in energy. The two motions borrow no energy from each other; that is the dynamically meaningful sense of “independent.” This is why, downstream, “\(\boldsymbol M\)-perpendicular to \(\hat{\boldsymbol k}\)” is the right notion: it collects exactly the motions that are energetically decoupled from the self-motion.

5 · A 2×2 example you can check in your head

Let the ruler be \(\boldsymbol M = \begin{bmatrix}2&0\\0&1\end{bmatrix}\) (“the first axis is heavy, costing twice the energy per unit”). Take \[ \boldsymbol a = \begin{bmatrix}1\\2\end{bmatrix}, \qquad \boldsymbol b = \begin{bmatrix}1\\-1\end{bmatrix} . \] Euclidean: \(\boldsymbol a^T\boldsymbol b = (1)(1)+(2)(-1) = -1 \neq 0\), so not perpendicular to ordinary eyes. Now the \(\boldsymbol M\)-protractor: \(\boldsymbol M\boldsymbol b = \begin{bmatrix}2\\-1\end{bmatrix}\), so \[ \boldsymbol a^T\boldsymbol M\boldsymbol b = (1)(2)+(2)(-1) = 0 . \] The same two arrows are skew to Euclidean eyes and at a perfect right angle to \(\boldsymbol M\)-eyes. Nothing about the arrows changed; only the ruler did.
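If you would rather let the machine do the arithmetic, here is a minimal sketch of the same check, together with the energy Pythagoras of §4. It assumes NumPy; the helper names `m_inner` and `kinetic_energy` are made up for this note and are not from the project code.

```python
import numpy as np

M = np.array([[2.0, 0.0],
              [0.0, 1.0]])          # the ruler from the 2x2 example
a = np.array([1.0, 2.0])
b = np.array([1.0, -1.0])

def m_inner(u, v, M):
    """M-inner product <u, v>_M = u^T M v."""
    return u @ M @ v

def kinetic_energy(x, M):
    """T(x) = 1/2 x^T M x."""
    return 0.5 * m_inner(x, x, M)

euclid = a @ b                      # ordinary dot product
cross = m_inner(a, b, M)            # cross term of the energy expansion

# With a zero cross term the energies add: T(a + b) = T(a) + T(b).
energy_gap = (kinetic_energy(a + b, M)
              - kinetic_energy(a, M) - kinetic_energy(b, M))

print(euclid, cross, energy_gap)
```

Here `euclid` comes out as \(-1\) while `cross` and `energy_gap` are both \(0\): skew on graph paper, square in the energy metric.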

6 · The picture, and the one trap

\(\boldsymbol M\) acts like a stretch-and-tilt of space (its eigenvectors are the principal axes, its eigenvalues the stretch factors). Two arrows that are genuinely at a right angle in the stretched (energy) world look skew when you draw them on ordinary graph paper. That is all \(\boldsymbol M\)-orthogonality is: a right angle in the stretched world.

The trap that caused a real bug. Euclidean-perpendicular (\(\boldsymbol u^T\boldsymbol v = 0\)) does not imply \(\boldsymbol M\)-perpendicular (\(\boldsymbol u^T\boldsymbol M\boldsymbol v = 0\)). They are different equations. The naive minimum-Euclidean-norm reconstruction makes its residual Euclidean-perpendicular to the task rows — the wrong right angle — which leaves a nonzero, energetically-coupled component along \(\hat{\boldsymbol k}\). That leaked component is the non-decaying \(v_n\) you measured (\(\bar{|v_n|} = 0.157\)). The cure is to switch rulers.


Part II — The keystone: why \(\boldsymbol M\)-orthogonality is the right relation

Before any projector machinery, here is the one theorem that explains why this whole story is about \(\boldsymbol M\)-orthogonality and not something else. It is also the bridge between your two modifications: the reconstruction (M4) and the covector (M5) are the same idea.

Theorem (minimum-energy reconstruction \(=\) \(\boldsymbol M\)-orthogonal residual). Among all generalized velocities \(\boldsymbol x\) consistent with a task command \(\boldsymbol z\) (i.e. \(\boldsymbol\Gamma\boldsymbol x = \boldsymbol z\)), the one of least kinetic energy is \[ \boldsymbol x^\star = \bar{\boldsymbol\Gamma}\boldsymbol z, \qquad \bar{\boldsymbol\Gamma} = \boldsymbol M^{-1}\boldsymbol\Gamma^T(\boldsymbol\Gamma\boldsymbol M^{-1}\boldsymbol\Gamma^T)^{-1}, \] and it carries zero self-motion: \(\hat{\boldsymbol k}^T\boldsymbol M\boldsymbol x^\star = 0\), i.e. \(v_n = 0\).

Proof. Minimize \(\tfrac12\boldsymbol x^T\boldsymbol M\boldsymbol x\) subject to \(\boldsymbol\Gamma\boldsymbol x = \boldsymbol z\). Form the Lagrangian \(\mathcal L = \tfrac12\boldsymbol x^T\boldsymbol M\boldsymbol x - \boldsymbol\lambda^T(\boldsymbol\Gamma\boldsymbol x - \boldsymbol z)\) with multiplier \(\boldsymbol\lambda \in \mathbb R^{12}\). Stationarity, \(\partial\mathcal L/\partial\boldsymbol x = \boldsymbol M\boldsymbol x - \boldsymbol\Gamma^T\boldsymbol\lambda = \boldsymbol 0\), gives the crucial structural fact \[ \boldsymbol x^\star = \boldsymbol M^{-1}\boldsymbol\Gamma^T\boldsymbol\lambda \qquad\text{(every optimizer lies in } \boldsymbol M^{-1}\,\mathrm{row}(\boldsymbol\Gamma)\text{).} \] Impose the constraint to solve for \(\boldsymbol\lambda\), then read off the self-motion content using \(\boldsymbol\Gamma\hat{\boldsymbol k} = \boldsymbol 0\): \[ \hat{\boldsymbol k}^T\boldsymbol M\boldsymbol x^\star = \hat{\boldsymbol k}^T\boldsymbol M\,\boldsymbol M^{-1}\boldsymbol\Gamma^T\boldsymbol\lambda = \hat{\boldsymbol k}^T\boldsymbol\Gamma^T\boldsymbol\lambda = (\boldsymbol\Gamma\hat{\boldsymbol k})^T\boldsymbol\lambda = 0. \qquad\blacksquare \]
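A numerical sanity check of the theorem, as a sketch only. It assumes NumPy; the small \(\boldsymbol\Gamma\) and \(\boldsymbol M\) are illustrative values (they coincide with the 3-dimensional toy used in Part IV), and the task command `z` is an arbitrary choice. The code solves the constraint for \(\boldsymbol\lambda\), forms \(\boldsymbol x^\star\), and compares its energy with the other solutions \(\boldsymbol x^\star + t\hat{\boldsymbol k}\).

```python
import numpy as np

Gamma = np.array([[1.0, -1.0, 0.0],
                  [0.0,  0.0, 1.0]])
M = np.diag([2.0, 1.0, 1.0])
k = np.array([1.0, 1.0, 0.0])        # spans ker(Gamma)
z = np.array([0.3, -0.7])            # arbitrary task command (assumed)

Minv = np.linalg.inv(M)
# Constraint Gamma x* = z with x* = M^-1 Gamma^T lambda gives
# (Gamma M^-1 Gamma^T) lambda = z.
lam = np.linalg.solve(Gamma @ Minv @ Gamma.T, z)
x_star = Minv @ Gamma.T @ lam

def energy(x):
    """Kinetic energy 1/2 x^T M x."""
    return 0.5 * x @ M @ x

self_motion = k @ M @ x_star         # k^T M x*, predicted to vanish

# Every other solution of Gamma x = z has the form x* + t k.
extra = [energy(x_star + t * k) - energy(x_star)
         for t in (-1.0, -0.1, 0.1, 1.0)]
print(self_motion, extra)
```

Because the cross term \(t\,\hat{\boldsymbol k}^T\boldsymbol M\boldsymbol x^\star\) is zero, each entry of `extra` is just \(\tfrac12 t^2\,\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k} > 0\): adding self-motion can only cost energy.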

The moral. “Least kinetic energy” and “residual \(\boldsymbol M\)-orthogonal to the self-motion” are the same condition. That is why the dynamically meaningful reconstruction lands on \(\{v_n = 0\}\) automatically, and why \(\boldsymbol M\) (not the identity) is the relation that matters. The projector \(\boldsymbol N\) we study next is precisely the operator that extracts the part of any motion this reconstruction throws away.


Part III — What a “projector” is (idempotents)

7 · Idempotent: doing it twice equals doing it once

A square matrix is idempotent if \(\boldsymbol P^2 = \boldsymbol P\). The feeling behind the equation: applying the operation a second time changes nothing. The picture is a shadow. A pencil casts a shadow on the table; the shadow of the shadow is the same shadow — it already lies flat. That is \(\boldsymbol P^2 = \boldsymbol P\). The only scalars with \(p^2 = p\) are \(0\) (collapse) and \(1\) (do nothing); idempotent matrices are the rich middle, acting like \(1\) on some directions and \(0\) on others.

8 · A projector splits the space: range \(\oplus\) kernel

Every idempotent \(\boldsymbol P\) on \(\mathbb R^n\) carves the space into two complementary pieces, \[ \mathbb R^n = \mathrm{range}(\boldsymbol P)\ \oplus\ \ker(\boldsymbol P), \] and projects onto \(\mathrm{range}(\boldsymbol P)\) along \(\ker(\boldsymbol P)\). The range is the screen (where shadows land); the kernel is the direction of projection (what gets crushed, the way each point slides to reach the screen). Proof that this splits the space: for any \(\boldsymbol x\), write \(\boldsymbol x = \boldsymbol P\boldsymbol x + (\boldsymbol x - \boldsymbol P\boldsymbol x)\). The first piece is in the range. The second is in the kernel, because \(\boldsymbol P(\boldsymbol x - \boldsymbol P\boldsymbol x) = \boldsymbol P\boldsymbol x - \boldsymbol P^2\boldsymbol x = \boldsymbol P\boldsymbol x - \boldsymbol P\boldsymbol x = \boldsymbol 0\) — and that step is where idempotency does its work.

Two facts we will use:

9 · Oblique by default; perpendicular is a luxury

Crucial warning. Nothing above says the screen and the projection direction are perpendicular. The split \(\mathrm{range}(\boldsymbol P) \oplus \ker(\boldsymbol P)\) only needs them to be complementary (fill the space, meet at \(\boldsymbol 0\)). Whether they meet at a right angle is a separate, extra condition that generic projectors fail.

A projector whose range and kernel are not perpendicular is oblique — think of a low afternoon sun casting a long slanted shadow; each point slides to the table along a slanted ray. The tidy special case (sun straight overhead, shadow straight down) is the orthogonal projector, and algebraically that is exactly the symmetric case \(\boldsymbol P = \boldsymbol P^T\).

This is the heart of why \(\boldsymbol z_a\) is subtle. Our \(\boldsymbol N = \boldsymbol E_{13} - \bar{\boldsymbol\Gamma}\boldsymbol\Gamma\) carries \(\boldsymbol M\) inside \(\bar{\boldsymbol\Gamma}\), so it is an oblique projector to Euclidean eyes. Its range and kernel are perpendicular in the \(\boldsymbol M\)-metric, not the Euclidean one. The single embedded \(\boldsymbol M^{-1}\) is the obliquity knob.

10 · The complementary projector \(\boldsymbol E - \boldsymbol P\)

If \(\boldsymbol P\) is a projector, so is \(\boldsymbol Q = \boldsymbol E - \boldsymbol P\), and it is the mirror image: it projects onto \(\ker(\boldsymbol P)\) along \(\mathrm{range}(\boldsymbol P)\). Three one-liners we will use directly:

11 · A concrete oblique projector (numbers)

Take \(\boldsymbol P = \begin{bmatrix}1&1\\0&0\end{bmatrix}\). Idempotent: \(\boldsymbol P^2 = \boldsymbol P\) (check the top-left entry: \(1\cdot1 + 1\cdot0 = 1\)). Range: every output has second coordinate \(0\), so \(\mathrm{range}(\boldsymbol P) = \mathrm{span}\{(1,0)^T\}\), the \(x\)-axis. Kernel: \(\boldsymbol P\boldsymbol x = \boldsymbol 0\) means \(x_1 + x_2 = 0\), so \(\ker(\boldsymbol P) = \mathrm{span}\{(1,-1)^T\}\). Now the punchline: \((1,0)\cdot(1,-1) = 1 \neq 0\) — screen and slide-direction meet at \(45^\circ\), not a right angle. Watch it act: \(\boldsymbol P(0,1)^T = (1,0)^T\). The point \((0,1)\) did not drop straight down to the origin (that would be the orthogonal projection \(\begin{bmatrix}1&0\\0&0\end{bmatrix}\)); it slid diagonally along \((1,-1)\) to land at \((1,0)\). Oblique projection, in miniature.
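The same miniature can be poked at numerically. A minimal sketch, assuming NumPy; the variable names are illustrative. It checks idempotency, splits a vector into its range and kernel pieces, and measures the angle between screen and slide direction.

```python
import numpy as np

P = np.array([[1.0, 1.0],
              [0.0, 0.0]])
Q = np.eye(2) - P                    # complementary projector E - P

x = np.array([0.0, 1.0])
on_screen = P @ x                    # piece in range(P)
crushed = x - on_screen              # piece in ker(P); equals Q @ x

range_dir = np.array([1.0, 0.0])     # spans range(P)
kernel_dir = np.array([1.0, -1.0])   # spans ker(P)
cos_angle = (range_dir @ kernel_dir
             / (np.linalg.norm(range_dir) * np.linalg.norm(kernel_dir)))
angle_deg = np.degrees(np.arccos(cos_angle))

print(on_screen, crushed, angle_deg)
```

The point \((0,1)\) splits as \((1,0) + (-1,1)\), and the angle is \(45^\circ\). Note that \(\boldsymbol P \neq \boldsymbol P^T\), which is the algebraic fingerprint of obliqueness.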

12 · Rank-one projectors are forced to be \(\boldsymbol k\,\boldsymbol w^T\) with \(\boldsymbol w^T\boldsymbol k = 1\)

This is the structural fact \(\boldsymbol z_a\) rests on. Any rank-one matrix is a column times a row, \(\boldsymbol P = \boldsymbol k\,\boldsymbol w^T\) (that is what “rank one” means — every column is a multiple of one direction \(\boldsymbol k\), and \(\boldsymbol w^T\) records the multipliers). Its range is \(\mathrm{span}\{\boldsymbol k\}\), because \(\boldsymbol P\boldsymbol x = \boldsymbol k\,(\boldsymbol w^T\boldsymbol x)\) is always a multiple of \(\boldsymbol k\). Now demand it be a projector. Using that \(\boldsymbol w^T\boldsymbol k\) is a scalar: \[ \boldsymbol P^2 = (\boldsymbol k\,\boldsymbol w^T)(\boldsymbol k\,\boldsymbol w^T) = \boldsymbol k\,(\boldsymbol w^T\boldsymbol k)\,\boldsymbol w^T = (\boldsymbol w^T\boldsymbol k)\,\boldsymbol P, \] so \(\boldsymbol P^2 = \boldsymbol P\) forces \((\boldsymbol w^T\boldsymbol k - 1)\boldsymbol P = \boldsymbol 0\), and since \(\boldsymbol P \neq \boldsymbol 0\), \[ \boxed{\ \boldsymbol w^T\boldsymbol k = 1\ }. \] A rank-one projector is exactly an outer product normalized so the covector reads \(1\) on its own column. (A ruler must measure its own unit as one unit.) Note the gauge freedom: \(\boldsymbol k \mapsto \alpha\boldsymbol k,\ \boldsymbol w \mapsto \alpha^{-1}\boldsymbol w\) leaves both \(\boldsymbol P\) and \(\boldsymbol w^T\boldsymbol k = 1\) unchanged — so the directions of \(\boldsymbol k\) and \(\boldsymbol w\) are pinned by \(\boldsymbol P\), but their individual lengths are a choice. This is why we may fix \(\hat{\boldsymbol k}\) to unit length and then solve for the length of \(\boldsymbol w\).
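A quick numerical illustration of the normalization and of the gauge freedom. This is a sketch assuming NumPy; the vectors `k`, `w_raw` and the factor `alpha` are arbitrary values picked for the example.

```python
import numpy as np

k = np.array([1.0, 2.0, -1.0])
w_raw = np.array([3.0, 1.0, 1.0])    # arbitrary covector, w_raw^T k = 4
w = w_raw / (w_raw @ k)              # rescaled so that w^T k = 1

P_raw = np.outer(k, w_raw)           # rank one, but not yet a projector
P = np.outer(k, w)                   # rank-one projector k w^T

alpha = 3.7                          # gauge: k -> alpha k, w -> w / alpha
P_gauge = np.outer(alpha * k, w / alpha)

print(np.allclose(P @ P, P), np.allclose(P_gauge, P))
```

Without the normalization, squaring only reproduces the matrix up to the scalar \(\boldsymbol w^T\boldsymbol k\) (here \(4\)); with it, \(\boldsymbol P^2 = \boldsymbol P\), and rescaling \(\boldsymbol k\) and \(\boldsymbol w\) in opposite directions leaves \(\boldsymbol P\) untouched.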


Part IV — The argument: \(\boldsymbol N = \boldsymbol E_{13} - \bar{\boldsymbol\Gamma}\boldsymbol\Gamma\) down to \(\boldsymbol z_a\)

We now assemble Parts I–III. We carry a 3D toy alongside so you can verify every line.

The toy. Take \(n = 3\) with task dimension \(2\) (so \(\ker\boldsymbol\Gamma\) is \(3-2 = 1\) dimensional, the smallest case that shows the rank arithmetic). Let \[ \boldsymbol\Gamma = \begin{bmatrix}1&-1&0\\0&0&1\end{bmatrix}, \qquad \boldsymbol M = \begin{bmatrix}2&0&0\\0&1&0\\0&0&1\end{bmatrix}. \] Then \(\boldsymbol\Gamma\boldsymbol x = \boldsymbol 0\) means \(x_1 = x_2,\ x_3 = 0\), so \(\hat{\boldsymbol k} = (1,1,0)^T\) spans the kernel. (\(\boldsymbol M\) is diagonal but \(\hat{\boldsymbol k}\) is not axis-aligned, which is what makes the \(\boldsymbol M\)-tilt visible. Using the un-normalized \(\hat{\boldsymbol k}\) keeps the arithmetic clean; the projector \(\boldsymbol N = \hat{\boldsymbol k}\,\boldsymbol z_a^T\) is unaffected, because rescaling \(\hat{\boldsymbol k}\) by \(\alpha\) rescales \(\boldsymbol z_a\) by \(\alpha^{-1}\) (the gauge freedom of §12) and \(\boldsymbol z_a^T\hat{\boldsymbol k}=1\) still holds.)

Step 2 — \(\boldsymbol N\) is idempotent, and rank \(1\)

First, \(\boldsymbol P := \bar{\boldsymbol\Gamma}\boldsymbol\Gamma\) is itself a projector, because \(\bar{\boldsymbol\Gamma}\) is a right inverse, \(\boldsymbol\Gamma\bar{\boldsymbol\Gamma} = \boldsymbol E_{12}\): \[ \boldsymbol P^2 = \bar{\boldsymbol\Gamma}(\boldsymbol\Gamma\bar{\boldsymbol\Gamma})\boldsymbol\Gamma = \bar{\boldsymbol\Gamma}\,\boldsymbol E_{12}\,\boldsymbol\Gamma = \bar{\boldsymbol\Gamma}\boldsymbol\Gamma = \boldsymbol P . \] Then \(\boldsymbol N = \boldsymbol E_{13} - \boldsymbol P\) is its complementary projector (§10), idempotent. Its rank is \[ \mathrm{rank}\,\boldsymbol N = 13 - \mathrm{rank}\,\boldsymbol P = 13 - 12 = 1 . \] That “\(1\)” is the redundancy: \(13\) ways to move, \(12\) the task can see, one left over, the self-motion.

Toy. \(\boldsymbol M^{-1} = \mathrm{diag}(\tfrac12,1,1)\). Working it out (all by hand): \(\boldsymbol\Gamma\boldsymbol M^{-1}\boldsymbol\Gamma^T = \begin{bmatrix}3/2&0\\0&1\end{bmatrix}\), so \(\bar{\boldsymbol\Gamma} = \boldsymbol M^{-1}\boldsymbol\Gamma^T(\boldsymbol\Gamma\boldsymbol M^{-1}\boldsymbol\Gamma^T)^{-1} = \begin{bmatrix}1/3&0\\-2/3&0\\0&1\end{bmatrix}\). Check \(\boldsymbol\Gamma\bar{\boldsymbol\Gamma} = \boldsymbol E_2\). Then \[ \boldsymbol N = \boldsymbol E_3 - \bar{\boldsymbol\Gamma}\boldsymbol\Gamma = \begin{bmatrix}2/3&1/3&0\\2/3&1/3&0\\0&0&0\end{bmatrix}. \] Two identical nonzero rows and a zero row: rank \(1\), visibly. And \(\boldsymbol N^2 = \boldsymbol N\) (check the top-left: \(\tfrac23\cdot\tfrac23 + \tfrac13\cdot\tfrac23 = \tfrac49+\tfrac29 = \tfrac69 = \tfrac23\)).
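The hand computation can be cross-checked in a few lines. A minimal sketch, assuming NumPy; the variable names (`Gamma_bar`, `N`) are chosen for this note. Explicit inverses are fine at this size, though a solver would be the more usual choice in production code.

```python
import numpy as np

Gamma = np.array([[1.0, -1.0, 0.0],
                  [0.0,  0.0, 1.0]])
M = np.diag([2.0, 1.0, 1.0])

Minv = np.linalg.inv(M)
# M-weighted right inverse: M^-1 Gamma^T (Gamma M^-1 Gamma^T)^-1
Gamma_bar = Minv @ Gamma.T @ np.linalg.inv(Gamma @ Minv @ Gamma.T)
N = np.eye(3) - Gamma_bar @ Gamma

print(Gamma_bar)
print(N)
print(np.linalg.matrix_rank(N, tol=1e-9))
```

The printed matrices should match the fractions above to rounding. The explicit `tol` in `matrix_rank` is a deliberate choice, so that floating-point dust in the two nearly identical rows is not counted as a second direction.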

Step 3 — \(\mathrm{range}\,\boldsymbol N = \ker\boldsymbol\Gamma = \mathrm{span}\{\hat{\boldsymbol k}\}\)

For any idempotent, the range is the fixed-point set (§8). Here \(\boldsymbol N\boldsymbol x = \boldsymbol x\) means \(\bar{\boldsymbol\Gamma}\boldsymbol\Gamma\boldsymbol x = \boldsymbol 0\); and because \(\bar{\boldsymbol\Gamma}\) has full column rank (it has the left inverse \(\boldsymbol\Gamma\), so \(\bar{\boldsymbol\Gamma}\boldsymbol y = \boldsymbol 0 \Rightarrow \boldsymbol y = \boldsymbol 0\)), this is equivalent to \(\boldsymbol\Gamma\boldsymbol x = \boldsymbol 0\). Hence \(\mathrm{range}\,\boldsymbol N = \ker\boldsymbol\Gamma\) exactly (not merely \(\subseteq\) — the injectivity of \(\bar{\boldsymbol\Gamma}\) is what upgrades the inclusion to equality). And \(\ker\boldsymbol\Gamma = \mathrm{span}\{\hat{\boldsymbol k}\}\) by the standing assumption. Sanity: \[ \boldsymbol N\hat{\boldsymbol k} = \hat{\boldsymbol k} - \bar{\boldsymbol\Gamma}(\boldsymbol\Gamma\hat{\boldsymbol k}) = \hat{\boldsymbol k} - \boldsymbol 0 = \hat{\boldsymbol k}. \] \(\boldsymbol N\) fixes the self-motion — the self-motion is literally the screen \(\boldsymbol N\) projects onto.

Toy. Columns of \(\boldsymbol N\) are \((2/3,2/3,0)^T,\ (1/3,1/3,0)^T,\ \boldsymbol 0\) — all multiples of \((1,1,0)^T = \hat{\boldsymbol k}\). And \(\boldsymbol N\hat{\boldsymbol k} = (2/3+1/3,\ 2/3+1/3,\ 0)^T = (1,1,0)^T = \hat{\boldsymbol k}\). ✓

Step 4 — rank one \(\Rightarrow\) \(\boldsymbol N = \hat{\boldsymbol k}\,\boldsymbol w^T\)

By §12, a rank-one matrix whose range is \(\mathrm{span}\{\hat{\boldsymbol k}\}\) must be the outer product \(\boldsymbol N = \hat{\boldsymbol k}\,\boldsymbol w^T\) for some unknown row \(\boldsymbol w^T\). Everything from here is the hunt for \(\boldsymbol w\), because — looking ahead — \(\boldsymbol z_a^T = \boldsymbol w^T\).

Step 5 — idempotency pins the scale: \(\boldsymbol w^T\hat{\boldsymbol k} = 1\)

Feed \(\boldsymbol N = \hat{\boldsymbol k}\,\boldsymbol w^T\) into \(\boldsymbol N\hat{\boldsymbol k} = \hat{\boldsymbol k}\) from Step 3: \(\boldsymbol N\hat{\boldsymbol k} = \hat{\boldsymbol k}(\boldsymbol w^T\hat{\boldsymbol k}) = (\boldsymbol w^T\hat{\boldsymbol k})\hat{\boldsymbol k}\), and comparing with \(\hat{\boldsymbol k}\) forces the scalar \(\boldsymbol w^T\hat{\boldsymbol k} = 1\). (Same condition as §12.) This fixes the length of \(\boldsymbol w\), not yet its direction.

Pause — a callout the verification made sharp. Everything in Steps 2–5 holds for any right inverse of \(\boldsymbol\Gamma\), not just the \(\boldsymbol M\)-weighted one. Idempotency, rank one, \(\mathrm{range}\,\boldsymbol N = \ker\boldsymbol\Gamma\), and \(\boldsymbol w^T\hat{\boldsymbol k} = 1\) would all hold if you used the Euclidean inverse instead. So Steps 2–5 do not determine \(\boldsymbol z_a\). The inertia weighting enters at exactly one place — Step 6 — and that is the whole reason the answer is \(\boldsymbol M\hat{\boldsymbol k}\) rather than \(\hat{\boldsymbol k}\). (We make this concrete in Part V.)

Step 6 — the heart: \(\ker\boldsymbol N\) is the \(\boldsymbol M\)-orthogonal complement of \(\hat{\boldsymbol k}\)

This locates the direction of \(\boldsymbol w\). Four deliberate sub-moves.

(a) \(\ker\boldsymbol N = \mathrm{range}\,\bar{\boldsymbol\Gamma}\). Complementary projectors swap range and kernel (§10): \(\ker\boldsymbol N = \mathrm{range}(\boldsymbol P) = \mathrm{range}(\bar{\boldsymbol\Gamma}\boldsymbol\Gamma) = \mathrm{range}(\bar{\boldsymbol\Gamma})\) (the last equality because \(\boldsymbol\Gamma\) is onto). Directly: any \(\bar{\boldsymbol\Gamma}\boldsymbol\xi\) is killed by \(\boldsymbol N\), since \(\boldsymbol N\bar{\boldsymbol\Gamma}\boldsymbol\xi = \bar{\boldsymbol\Gamma}\boldsymbol\xi - \bar{\boldsymbol\Gamma}(\boldsymbol\Gamma\bar{\boldsymbol\Gamma})\boldsymbol\xi = \boldsymbol 0\).

(b) \(\mathrm{range}\,\bar{\boldsymbol\Gamma} = \boldsymbol M^{-1}\,\mathrm{row}(\boldsymbol\Gamma)\). Read \(\bar{\boldsymbol\Gamma} = \boldsymbol M^{-1}\boldsymbol\Gamma^T(\boldsymbol\Gamma\boldsymbol M^{-1}\boldsymbol\Gamma^T)^{-1}\) right to left. The trailing \((\boldsymbol\Gamma\boldsymbol M^{-1}\boldsymbol\Gamma^T)^{-1}\) is an invertible \(12\times 12\) map, hence a bijection of \(\mathbb R^{12}\) onto itself — it cannot shrink the reachable set, only re-coordinate it. So as \(\boldsymbol\xi\) sweeps \(\mathbb R^{12}\), \(\boldsymbol\Gamma^T(\cdots)\boldsymbol\xi\) sweeps the full column space of \(\boldsymbol\Gamma^T\), which is the row space of \(\boldsymbol\Gamma\). Then \(\boldsymbol M^{-1}\) is applied out front. Note where the lone \(\boldsymbol M^{-1}\) sits — after the row space is formed.

(c) \(\mathrm{row}(\boldsymbol\Gamma) = (\mathrm{span}\,\hat{\boldsymbol k})^{\perp}\). Fundamental theorem of linear algebra: the row space is the Euclidean orthogonal complement of the null space. Intuitively, \(\boldsymbol\Gamma\boldsymbol x = \boldsymbol 0\) says every row of \(\boldsymbol\Gamma\) is perpendicular to \(\boldsymbol x\). Since \(\ker\boldsymbol\Gamma = \mathrm{span}\{\hat{\boldsymbol k}\}\), \[ \mathrm{row}(\boldsymbol\Gamma) = \{\,\boldsymbol y : \hat{\boldsymbol k}^T\boldsymbol y = 0\,\} \quad\text{(ordinary, Euclidean perpendicularity).} \]

(d) The climax — \(\boldsymbol M^{-1}\) turns Euclidean-\(\perp\) into \(\boldsymbol M\)-\(\perp\). Combine (a)–(c): \(\ker\boldsymbol N = \boldsymbol M^{-1}\{\boldsymbol y : \hat{\boldsymbol k}^T\boldsymbol y = 0\}\). A vector \(\boldsymbol x\) is in this set iff \(\boldsymbol x = \boldsymbol M^{-1}\boldsymbol y\) with \(\hat{\boldsymbol k}^T\boldsymbol y = 0\) — equivalently \(\boldsymbol y = \boldsymbol M\boldsymbol x\) satisfies \(\hat{\boldsymbol k}^T\boldsymbol y = 0\). Substitute: \[ \boldsymbol x \in \ker\boldsymbol N \iff \hat{\boldsymbol k}^T\boldsymbol M\boldsymbol x = 0 . \] That is exactly \(\boldsymbol M\)-orthogonality of \(\boldsymbol x\) to \(\hat{\boldsymbol k}\). So \[ \boxed{\ \ker\boldsymbol N = \{\,\boldsymbol x : \hat{\boldsymbol k}^T\boldsymbol M\boldsymbol x = 0\,\} = (\mathrm{span}\,\hat{\boldsymbol k})^{\perp_{\boldsymbol M}}\ } \] The single factor of \(\boldsymbol M^{-1}\), sitting out front in \(\bar{\boldsymbol\Gamma}\), is the only agent that carries us from Euclidean geometry to energy geometry. (Also: \(\boldsymbol M\hat{\boldsymbol k} \neq \boldsymbol 0\) because \(\boldsymbol M\) is SPD and \(\hat{\boldsymbol k} \neq \boldsymbol 0\), so \(\hat{\boldsymbol k}^T\boldsymbol M(\cdot)\) is a nonzero functional and its kernel is a genuine \(12\)-dimensional hyperplane — we need that for Step 7.)

Toy. \(\ker\boldsymbol N\) from \(\boldsymbol N\) above is \(\{\boldsymbol x : \tfrac23 x_1 + \tfrac13 x_2 = 0\} = \{2x_1 + x_2 = 0\}\). Compare \(\hat{\boldsymbol k}^T\boldsymbol M = (1,1,0)\,\mathrm{diag}(2,1,1) = (2,1,0)\), so the \(\boldsymbol M\)-orthogonal complement is \(\{2x_1 + x_2 = 0\}\), the same set. Meanwhile the Euclidean complement would be \(\{x_1 + x_2 = 0\}\), which is different. The tilt is right there in the numbers: \(2x_1+x_2\) versus \(x_1+x_2\).

Step 7 — two descriptions of \(\ker\boldsymbol N\) force \(\boldsymbol w \parallel \boldsymbol M\hat{\boldsymbol k}\)

From the factored form, \(\boldsymbol N\boldsymbol x = \hat{\boldsymbol k}(\boldsymbol w^T\boldsymbol x) = \boldsymbol 0 \iff \boldsymbol w^T\boldsymbol x = 0\), so \(\ker\boldsymbol N = (\mathrm{span}\,\boldsymbol w)^{\perp}\). From Step 6, \(\ker\boldsymbol N = (\mathrm{span}\,\boldsymbol M\hat{\boldsymbol k})^{\perp}\), using \(\hat{\boldsymbol k}^T\boldsymbol M = (\boldsymbol M\hat{\boldsymbol k})^T\) (this is where \(\boldsymbol M = \boldsymbol M^T\) is load-bearing). The same hyperplane has been written as “perpendicular to \(\boldsymbol w\)” and “perpendicular to \(\boldsymbol M\hat{\boldsymbol k}\).” A hyperplane has a unique normal direction, so \[ \boldsymbol w = c\,\boldsymbol M\hat{\boldsymbol k} \quad\text{for some scalar } c . \]

Close — fix the scale, read off \(\boldsymbol z_a\)

Impose the Step 5 normalization on \(\boldsymbol w = c\,\boldsymbol M\hat{\boldsymbol k}\): \[ \boldsymbol w^T\hat{\boldsymbol k} = c\,\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k} = 1 \;\Rightarrow\; c = \frac{1}{\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k}} , \] where the denominator is strictly positive (twice the self-motion’s kinetic energy, \(\boldsymbol M\) SPD), so \(c\) never blows up. Therefore \[ \boxed{\ \boldsymbol z_a^T = \boldsymbol w^T = \frac{\hat{\boldsymbol k}^T\boldsymbol M}{\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k}}, \qquad \boldsymbol z_a = \frac{\boldsymbol M\hat{\boldsymbol k}}{\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k}}\ } \] In words: to read how much self-motion is in a velocity \(\boldsymbol x\), take the \(\boldsymbol M\)-weighted (kinetic-energy) overlap of \(\boldsymbol x\) with \(\hat{\boldsymbol k}\), normalized so pure self-motion reads \(1\).

Toy. \(\boldsymbol M\hat{\boldsymbol k} = (2,1,0)^T\), \(\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k} = 3\), so \(\boldsymbol z_a^T = (2/3,\,1/3,\,0)\). Check the outer product: \(\hat{\boldsymbol k}\,\boldsymbol z_a^T = (1,1,0)^T(2/3,1/3,0) = \begin{bmatrix}2/3&1/3&0\\2/3&1/3&0\\0&0&0\end{bmatrix} = \boldsymbol N\). ✓ And \(\boldsymbol z_a^T\hat{\boldsymbol k} = 2/3 + 1/3 = 1\). ✓ The whole chain closes by hand.
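To see the closed form and the long route agree without trusting the hand arithmetic, here is a sketch assuming NumPy. The test vector `x_perp` is an arbitrary point chosen to satisfy \(2x_1 + x_2 = 0\), and `k_unit` is introduced only to illustrate the rescaling behaviour.

```python
import numpy as np

Gamma = np.array([[1.0, -1.0, 0.0],
                  [0.0,  0.0, 1.0]])
M = np.diag([2.0, 1.0, 1.0])
k = np.array([1.0, 1.0, 0.0])          # un-normalized kernel vector of Gamma

# Projector built the long way, through the M-weighted right inverse.
Minv = np.linalg.inv(M)
Gamma_bar = Minv @ Gamma.T @ np.linalg.inv(Gamma @ Minv @ Gamma.T)
N = np.eye(3) - Gamma_bar @ Gamma

# Covector from the boxed closed form.
z_a = M @ k / (k @ M @ k)

# Step 6 in numbers: a vector with k^T M x = 0 is annihilated by N.
x_perp = np.array([1.0, -2.0, 5.0])    # satisfies 2*x1 + x2 = 0

# Rescaling k leaves the projector k z_a^T unchanged;
# z_a itself rescales by the inverse factor.
k_unit = k / np.linalg.norm(k)
z_a_unit = M @ k_unit / (k_unit @ M @ k_unit)

print(z_a, N @ x_perp)
```

The outer product `np.outer(k, z_a)` reproduces `N`, and so does `np.outer(k_unit, z_a_unit)`: the projector does not care which length of \(\hat{\boldsymbol k}\) you picked.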


Part V — The falsifier: why \(\boldsymbol M\) was necessary (and the common wrong answer)

Run the same toy with the Euclidean right inverse \(\bar{\boldsymbol\Gamma}_E = \boldsymbol\Gamma^T(\boldsymbol\Gamma\boldsymbol\Gamma^T)^{-1}\). You get \(\bar{\boldsymbol\Gamma}_E = \begin{bmatrix}1/2&0\\-1/2&0\\0&1\end{bmatrix}\) and \[ \boldsymbol N_E = \boldsymbol E_3 - \bar{\boldsymbol\Gamma}_E\boldsymbol\Gamma = \begin{bmatrix}1/2&1/2&0\\1/2&1/2&0\\0&0&0\end{bmatrix} = \hat{\boldsymbol k}\,\boldsymbol w_E^T, \quad \boldsymbol w_E^T = (1/2,\,1/2,\,0). \] Check what survives: \(\boldsymbol N_E\) is still idempotent, still rank one, still fixes \(\hat{\boldsymbol k}\), and still satisfies \(\boldsymbol w_E^T\hat{\boldsymbol k} = 1\). Steps 2–5 all pass. But its kernel is \(\{x_1 + x_2 = 0\}\) — the Euclidean complement of \(\hat{\boldsymbol k}\) — so its normal is \(\boldsymbol w_E \parallel \hat{\boldsymbol k}\), giving \[ \boldsymbol z_{a,\text{wrong}}^T = \frac{\hat{\boldsymbol k}^T}{\hat{\boldsymbol k}^T\hat{\boldsymbol k}} = (1/2,\,1/2,\,0) \;\neq\; (2/3,\,1/3,\,0) = \boldsymbol z_a^T . \]

Common wrong answer (the muscle-memory trap). \[ \boldsymbol z_{a,\text{wrong}} = \frac{\hat{\boldsymbol k}}{\hat{\boldsymbol k}^T\hat{\boldsymbol k}} \qquad\text{(Euclidean; what \texttt{lstsq} gives).} \] The correct answer replaces the numerator \(\hat{\boldsymbol k}\) with \(\boldsymbol M\hat{\boldsymbol k}\) and the denominator \(\hat{\boldsymbol k}^T\hat{\boldsymbol k}\) with \(\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k}\). The two factors of \(\boldsymbol M\) are the entire difference, and omitting them is exactly what produces the leaked \(v_n\).

Notice in the toy how the tilt redistributes weight: the heavy first axis gets \(2/3\) in the correct \(\boldsymbol z_a\) versus \(1/2\) in the Euclidean one. This contrast is the answer to the examiner’s “how do you know the \(\boldsymbol M\) was necessary?” — both inverses pass Steps 2–5; only the \(\boldsymbol M\)-weighted one gives a residual that is energetically decoupled from the self-motion.
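The falsifier is easy to replay side by side. A sketch assuming NumPy; `right_inverse` is a helper written for this note (not a library function), with the weight `W` set to \(\boldsymbol M\) for the correct inverse and to the identity for the Euclidean one.

```python
import numpy as np

Gamma = np.array([[1.0, -1.0, 0.0],
                  [0.0,  0.0, 1.0]])
M = np.diag([2.0, 1.0, 1.0])
k = np.array([1.0, 1.0, 0.0])

def right_inverse(Gamma, W):
    """W-weighted right inverse W^-1 Gamma^T (Gamma W^-1 Gamma^T)^-1."""
    Winv = np.linalg.inv(W)
    return Winv @ Gamma.T @ np.linalg.inv(Gamma @ Winv @ Gamma.T)

Gbar_M = right_inverse(Gamma, M)            # inertia-weighted
Gbar_E = right_inverse(Gamma, np.eye(3))    # Euclidean (W = identity)
N_M = np.eye(3) - Gbar_M @ Gamma
N_E = np.eye(3) - Gbar_E @ Gamma

z_a = M @ k / (k @ M @ k)                   # correct covector
z_wrong = k / (k @ k)                       # Euclidean look-alike

# Self-motion content k^T M x of each reconstruction, per unit task command.
leak_M = k @ M @ Gbar_M
leak_E = k @ M @ Gbar_E

print(leak_M, leak_E)
```

Both projectors pass the Steps 2–5 checks, but `leak_M` is zero while `leak_E` is \((1/2,\ 0)\): in this toy, the Euclidean reconstruction carries self-motion for any command with a nonzero first component.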


Part VI — Capstone: in what sense is this oblique \(\boldsymbol N\) “orthogonal”?

\(\boldsymbol N\) is oblique to Euclidean eyes (\(\boldsymbol N \neq \boldsymbol N^T\)). The clean characterization is that it is an orthogonal projector in the energy metric — i.e. \(\boldsymbol N\) is self-adjoint with respect to \(\langle\cdot,\cdot\rangle_{\boldsymbol M}\), which means \[ \boldsymbol M\boldsymbol N = \boldsymbol N^T\boldsymbol M . \] (Equivalently \(\langle\boldsymbol N\boldsymbol a, \boldsymbol b\rangle_{\boldsymbol M} = \langle\boldsymbol a, \boldsymbol N\boldsymbol b\rangle_{\boldsymbol M}\) for all \(\boldsymbol a,\boldsymbol b\).) Its range \(\mathrm{span}\{\hat{\boldsymbol k}\}\) and kernel \((\mathrm{span}\,\hat{\boldsymbol k})^{\perp_{\boldsymbol M}}\) are \(\boldsymbol M\)-orthogonal — a genuine right angle in the metric the physics uses. That one identity is the one-sentence answer to “your projector isn’t symmetric, so how is it orthogonal?”

Toy. \(\boldsymbol M\boldsymbol N = \mathrm{diag}(2,1,1)\begin{bmatrix}2/3&1/3&0\\2/3&1/3&0\\0&0&0\end{bmatrix} = \begin{bmatrix}4/3&2/3&0\\2/3&1/3&0\\0&0&0\end{bmatrix}\), which is symmetric — and equals \(\boldsymbol N^T\boldsymbol M\). So \(\boldsymbol N\) is \(\boldsymbol M\)-self-adjoint. By contrast \(\boldsymbol N_E\) from Part V is Euclidean-symmetric (\(\boldsymbol N_E = \boldsymbol N_E^T\)) but not \(\boldsymbol M\)-self-adjoint: \(\boldsymbol M\boldsymbol N_E = \begin{bmatrix}1&1&0\\1/2&1/2&0\\0&0&0\end{bmatrix} \neq \boldsymbol N_E^T\boldsymbol M\). Each projector is the clean orthogonal one in its own metric; ours must be clean in the energy metric, which is why it looks skew on graph paper.
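The self-adjointness test is a one-liner worth keeping around. A minimal sketch, assuming NumPy; `is_self_adjoint` is a hypothetical helper name, and the two projectors are typed in directly from the toy.

```python
import numpy as np

M = np.diag([2.0, 1.0, 1.0])
N = np.array([[2/3, 1/3, 0.0],
              [2/3, 1/3, 0.0],
              [0.0, 0.0, 0.0]])      # M-weighted projector from Part IV
N_E = np.array([[0.5, 0.5, 0.0],
                [0.5, 0.5, 0.0],
                [0.0, 0.0, 0.0]])    # Euclidean projector from Part V

def is_self_adjoint(P, W, tol=1e-12):
    """True if W P = P^T W, i.e. P is self-adjoint in the W-inner product."""
    return bool(np.allclose(W @ P, P.T @ W, atol=tol))

E = np.eye(3)
print(is_self_adjoint(N, M), is_self_adjoint(N, E))      # energy vs Euclidean
print(is_self_adjoint(N_E, M), is_self_adjoint(N_E, E))
```

Each projector passes in its own metric and fails in the other, which is the capstone statement in executable form.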


Part VII — Numeric check recipe (and what was verified)

You can reproduce the whole thing at full scale:

  1. Draw a random \(13\times 13\) Gaussian \(\boldsymbol A\); set \(\boldsymbol M = \boldsymbol A\boldsymbol A^T + 13\,\boldsymbol E\) (guaranteed SPD).
  2. Draw a random \(12\times 13\) Gaussian \(\boldsymbol\Gamma\) (full row rank almost surely).
  3. SVD \(\boldsymbol\Gamma = \boldsymbol U\boldsymbol\Sigma\boldsymbol V^T\); take \(\hat{\boldsymbol k} =\) last column of \(\boldsymbol V\) (check \(\|\boldsymbol\Gamma\hat{\boldsymbol k}\| \approx 0\)).
  4. Form \(\bar{\boldsymbol\Gamma} = \boldsymbol M^{-1}\boldsymbol\Gamma^T(\boldsymbol\Gamma\boldsymbol M^{-1}\boldsymbol\Gamma^T)^{-1}\) (check \(\boldsymbol\Gamma\bar{\boldsymbol\Gamma} = \boldsymbol E_{12}\)) and \(\boldsymbol N = \boldsymbol E_{13} - \bar{\boldsymbol\Gamma}\boldsymbol\Gamma\).
  5. Verify, to \(\sim 10^{-7}\): (a) \(\boldsymbol N^2 = \boldsymbol N\); (b) \(\mathrm{rank}\,\boldsymbol N = 1\); (c) \(\boldsymbol N\hat{\boldsymbol k} = \hat{\boldsymbol k}\); (d) \(\hat{\boldsymbol k}^T\boldsymbol M\bar{\boldsymbol\Gamma} = \boldsymbol 0\) (kernel is \(\boldsymbol M\)-orthogonal to \(\hat{\boldsymbol k}\)); (e) with \(\boldsymbol w^T = \hat{\boldsymbol k}^T\boldsymbol M/(\hat{\boldsymbol k}^T\boldsymbol M\hat{\boldsymbol k})\), that \(\hat{\boldsymbol k}\boldsymbol w^T = \boldsymbol N\) and \(\boldsymbol w^T\hat{\boldsymbol k} = 1\); (f) the capstone \(\boldsymbol M\boldsymbol N = \boldsymbol N^T\boldsymbol M\).
  6. Falsifier run: repeat with \(\bar{\boldsymbol\Gamma}_E = \boldsymbol\Gamma^T(\boldsymbol\Gamma\boldsymbol\Gamma^T)^{-1}\). Confirm (a)(b)(c) and \(\boldsymbol w_E^T\hat{\boldsymbol k}=1\) still hold, but \(\hat{\boldsymbol k}^T\boldsymbol M\bar{\boldsymbol\Gamma}_E \neq \boldsymbol 0\) and \(\boldsymbol w_E \parallel \hat{\boldsymbol k}\) (not \(\boldsymbol M\hat{\boldsymbol k}\)).

This was run over 2000 random trials: every equality in step 5 held; the falsifier behaved exactly as predicted (with the Euclidean inverse, \(\boldsymbol M\)-orthogonality fails and the normal collapses onto \(\hat{\boldsymbol k}\)). If you want it pinned in the repo, I can add it as validation/test_projector_za.py.
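One way the recipe could be scripted is sketched below. It assumes NumPy and is not the script behind the 2000-trial result quoted above; the function name `check_trial`, the seed, and the reduced trial count of 200 are choices made for this sketch.

```python
import numpy as np

def check_trial(rng, n=13, m=12, tol=1e-7):
    """One random trial of recipe steps 1-6; returns (checks, falsifier_leak)."""
    A = rng.standard_normal((n, n))
    M = A @ A.T + n * np.eye(n)                  # step 1: SPD by construction
    Gamma = rng.standard_normal((m, n))          # step 2: full row rank a.s.
    _, _, Vt = np.linalg.svd(Gamma)              # step 3: full SVD, Vt is n x n
    k = Vt[-1]                                   # last column of V, unit norm

    Minv = np.linalg.inv(M)                      # step 4
    Gbar = Minv @ Gamma.T @ np.linalg.inv(Gamma @ Minv @ Gamma.T)
    N = np.eye(n) - Gbar @ Gamma
    w = M @ k / (k @ M @ k)

    def close(X, Y):
        return bool(np.allclose(X, Y, atol=tol))

    checks = {                                   # step 5
        "Gamma k = 0": close(Gamma @ k, 0.0),
        "right inverse": close(Gamma @ Gbar, np.eye(m)),
        "(a) idempotent": close(N @ N, N),
        "(b) rank one": bool(np.linalg.matrix_rank(N, tol=tol) == 1),
        "(c) N k = k": close(N @ k, k),
        "(d) k^T M Gbar = 0": close(k @ M @ Gbar, 0.0),
        "(e) N = k w^T": close(np.outer(k, w), N) and close(w @ k, 1.0),
        "(f) M N = N^T M": close(M @ N, N.T @ M),
    }

    # Step 6: with the Euclidean inverse, M-orthogonality should fail.
    Gbar_E = Gamma.T @ np.linalg.inv(Gamma @ Gamma.T)
    falsifier_leak = float(np.linalg.norm(k @ M @ Gbar_E))
    return checks, falsifier_leak

rng = np.random.default_rng(0)
trials = [check_trial(rng) for _ in range(200)]
all_pass = all(all(checks.values()) for checks, _ in trials)
min_leak = min(leak for _, leak in trials)
print(all_pass, min_leak)
```

Raise the trial count to 2000 to mirror the run described above. Note that `np.linalg.svd` returns \(\boldsymbol V^T\), so the last column of \(\boldsymbol V\) is the last row `Vt[-1]`, and the default full SVD is what makes that row the null direction.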


Part VIII — How this connects back to your derivation

This is the projector route to the same \(\boldsymbol z_a\) that 7dof_walkthrough.md §3 reaches by the two-property argument, and that Part II reaches variationally. Three lenses, one object:

Citation, kept honest (from 7dof_walkthrough.md §5): the construction is Khatib 1987 eqs. 51–52, 55 (not eq. 18); the phrase “dynamically consistent inverse” is Featherstone–Khatib 1997; origin is Khatib’s 1980 thesis / 1983 CISM–IFToMM. Yours is the scalar \(1\)-D specialization and the explicit computation of the section.

Nothing here touches deriv_7dof.tex.