IBBT: Informed Batch Belief Trees for Motion Planning Under Uncertainty

Authors: Zheng, Tsiotras · Year: 2024 · Venue: arXiv preprint / IEEE conference (Georgia Tech, School of Aerospace Engineering)
Raw: md

Summary

IBBT is a sampling-based belief-space motion planner for systems with motion noise, state-dependent measurement noise, and non-convex obstacles, subject to per-step collision chance constraints. Its core idea is to decompose the stochastic planning problem into (i) a deterministic planning problem solved by a Rapidly-exploring Random Graph (RRG) that yields a nominal-trajectory graph, and (ii) a belief-tree search over that graph guided by an admissible, informed cost-to-go heuristic obtained by value iteration on the deterministic graph. By combining batch sampling with ordered, heuristic-guided search, IBBT avoids the exhaustive expansion of non-dominated belief nodes performed by RRBT, finding first solutions faster and with lower cost while retaining the anytime, incremental property.

Key Claims

Decomposing belief-space planning into a deterministic RRG sub-problem plus a belief-tree graph search lets the deterministic nominal cost-to-go serve as an admissible heuristic for the stochastic problem.
Admissibility via Jensen’s inequality: because the stage cost J(x_k,u_k) is convex, J(x̄_k,ū_k) ≤ E[J(x_k,u_k)], so the nominal path cost underestimates the expected cost and the value-iteration cost-to-go v_i.h is an admissible (and informed) heuristic.
IBBT retains RRBT’s partial ordering of belief nodes — needed because belief-space planning lacks the optimal-substructure property (no total ordering of paths by cost) — so it can still find plans that detour to information-rich regions to reduce uncertainty.
Empirically (400 problems: 10 environments × 20 start–goal queries × 2 environment models), IBBT finds the first solution faster and at lower average cost than both RRBT and Monte Carlo Motion Planning (MCMP), and shows better cost-vs-time performance. Example (DI-IR): first-solution time 0.324 s (IBBT) vs 0.668 s (RRBT) vs 0.940 s (MCMP); improved cost 28.91 vs 35.83 vs 32.15.
IBBT is anytime and incremental: reusing value-iteration results and graph structure between batches accelerates convergence; RRBT and IBBT are anytime whereas MCMP is not.

Method

Regime: This is a generic robot belief-space planner, not a space-manipulator paper. It is demonstrated on a double-integrator and a Dubins-vehicle model; neither free-flying nor free-floating manipulator dynamics appear. The relevance is methodological (the planning/risk layer), not dynamical.

Problem. Minimize expected cost (1) subject to boundary conditions (2), per-step chance constraint (3), motion model (4), and sensing model (5):

min_{u_k} E[ Σ_{k=0}^{N-1} J(x_k,u_k) ]
x_0 ~ N(x̄_s, P_0), x̄_N = x̄_g
P(x_k ∈ X_obs) < δ, k = 0,…,N
x_{k+1} = f(x_k,u_k,w_k); y_k = h(x_k,v_k)

with w_k, v_k i.i.d. standard Gaussian and independent; J convex in (x_k,u_k), e.g. quadratic xᵀQ_k x + uᵀR_k u.

Plan structure. A plan is a nominal trajectory plus a feedback controller. A Connect(x̄ᵃ,x̄ᵇ) primitive returns (X̄ᵃᵇ, Ūᵃᵇ, Kᵃᵇ): nominal state/control sequences and finite-time-LQR feedback gains, with control u_k = ū_k + K_k x̂_k (Eq. 7). A Kalman filter estimates the error state x̌_k = x_k − x̄_k.

Covariance propagation (linearization along nominal trajectory, LTV system, Eq. 10–14). The error dynamics use Jacobians A,B,G,C,D. The Kalman update gives estimation-error covariance P̃_k (Eq. 13a–13c) and the estimate covariance P̂_k (Eq. 14), with the key decomposition P_k = P̂_k + P̃_k (using E[x̂_k x̃_kᵀ]=0). P_k is the state covariance used for chance-constraint evaluation.

Informed heuristic. Value iteration VI(G) computes the true deterministic cost-to-go v_i.h for every vertex of the nominal-trajectory graph; by Jensen’s inequality this is an admissible, informed cost-to-go for the stochastic problem. Belief nodes are ranked by total cost n.f = n.c + n.h.

Partial ordering (Eq. 19). For belief nodes n_a, n_b at the same vertex, n_a ≺ n_b (n_a dominates n_b) iff (n_a.f < n_b.f) ∧ (n_a.P < n_b.P) ∧ (n_a.P̃ < n_b.P̃).

Algorithm. Algorithm 1 (IBBT) repeats: RRG adds a batch of m=20 samples; VI recomputes cost-to-go (warm-started from the previous iteration); the belief queue is updated and pruned (Prune(Q,Cost) drops nodes with n.f > current solution cost); GraphSearch (Algorithm 2) grows the belief tree via Pop (lowest n.f), Propagate (covariance propagation + Monte-Carlo chance-constraint check + cost), and Append (add if non-dominated, prune dominated descendants). Search terminates when the goal belief node is popped (solution) or Q empties (no solution on the current graph).

Note on notation. The markdown’s Kalman-gain Eq. (13a) renders as L_k = P̃_k − C_kᵀ(...)⁻¹; in standard Kalman form this should read L_k = P̃_{k⁻} C_kᵀ (C_k P̃_{k⁻} C_kᵀ + D_k D_kᵀ)⁻¹. The stray minus signs are OCR/transcription artifacts of subscripts (k⁻) and the multiplication; the intended quantities are the predicted-covariance Kalman gain. Flagging for the pencil-and-paper reader.

Relevance to thesis

The thesis’s risk layer needs to plan in belief space under motion and sensing uncertainty with collision chance constraints — exactly IBBT’s setting. The decomposition into a deterministic nominal-trajectory graph plus an admissible-heuristic belief search is directly transferable: for a free-flying manipulator the deterministic layer would solve nominal guidance (fully-actuated 6-DOF base + arm), and the belief layer would propagate covariance under a tracking controller and enforce chance constraints. The Jensen-inequality admissibility argument (convex stage cost ⇒ nominal cost underestimates expected cost) is a clean, regime-independent result worth reusing. The state-dependent measurement model (information-rich regions, line-of-sight landmarks) maps onto inspection/sensing-aware planning where pose-estimation quality depends on geometry — relevant to active perception for inspection. Caveat: IBBT assumes Gaussian beliefs and linearization along the nominal trajectory, and does not address the dynamic coupling specific to manipulators.

Connections

Topics: belief_space_planning · chance_constraints · sampling_based_motion_planning · covariance_propagation · informed_heuristic_search

Key Equations / Quotes

“The original stochastic motion planning problem is divided into a deterministic motion planning problem and a graph search problem.” (Abstract)

“Since J(x_k,u_k) is a convex function, via Jensen’s inequality, we have J(x̄_k,ū_k) ≤ E[J(x_k,u_k)] … Thus, Σ J(x̄_k,ū_k) is an underestimate of the actual cost, and v_i.h is an admissible cost-to-go heuristic for the belief tree search problem.” (Sec. V)

Per-step chance constraint (Eq. 3): P(x_k ∈ X_obs) < δ, k = 0,…,N, with δ = 0.1 in all experiments.

Covariance decomposition: P_k = P̂_k + P̃_k.

Partial ordering (Eq. 19): n_a ≺ n_b ⟺ (n_a.f < n_b.f) ∧ (n_a.P < n_b.P) ∧ (n_a.P̃ < n_b.P̃).

Open Questions

Asymptotic optimality / completeness of IBBT is not established; the conclusion lists “studying the asymptotic property of IBBT” as future work.
Extension to non-Gaussian noise and to replanning in changing environments is left open.
How does the per-step chance constraint P(x_k ∈ X_obs) < δ relate to a joint (trajectory-level) collision probability — the paper enforces it pointwise, leaving the union-bound conservatism / risk allocation unaddressed.
Sensitivity of the linearization assumption (system stays near nominal under feedback) for strongly nonlinear or strongly coupled dynamics is not characterized.

Quartz 5

Explorer

zheng2024informed