Chance-Constrained Dynamic Programming with Application to Risk-Aware Robotic Space Exploration

Authors: Masahiro Ono, Marco Pavone, Yoshiaki Kuwata, J. (Bob) Balaram · Year: 2015 · Venue: Autonomous Robots, 39(4):555–571

Raw full-text conversion pending — the LAN/marker cluster is down (2026-07-04). Built from the abstract + §1 (intro/contributions) and §3 (problem + Boole reformulation) read in full, with §2 (preliminaries) and §4–5 (algorithm/experiments) read in part, from the author-hosted PDF via firecrawl (web.stanford.edu/~pavone/papers/Ono.Pavone.ea.AURO14.pdf); the scraped PDF's display equations are partly mangled, so equations below are transcribed conservatively. File the PDF to Docs/raw/pdf/ono2015chance.pdf and run marker when the cluster returns.

Summary

This is the canonical space-robotics precedent for joint chance-constrained planning with violation-probability semantics (co-authored by Pavone’s Stanford ASL and JPL). Standard constrained dynamic programming (and constrained MDP theory) requires constraints to share the additive structure of the cost — an expectation of a sum of one-stage terms — which a joint chance constraint does not have. The paper bridges this gap: it conservatively reformulates the joint chance constraint, via Boole’s inequality, into a bound on the expectation of a sum of indicator random variables (now additive), folds that into the cost through a Lagrangian dual, solves the primal for a fixed dual variable by ordinary Bellman recursion, and optimizes the single dual variable by Brent’s root-finding (exponentially convergent). Suboptimality bounds on the primal and dual objectives give rigorous stopping criteria. It is demonstrated on path planning, a Mars entry-descent-landing (EDL) problem, and a Lunar landing problem, using real Mars terrain with ~4 million discrete states per step (simulation only).

Key Claims

Joint chance constraints break additive constrained DP. Existing constrained-MDP tools assume additive constraint structure; a joint chance constraint (probability the whole trajectory stays feasible) is non-additive, so those tools do not apply directly.
Boole + indicator reformulation (conservative). Define $I_{k} (x_{k}) = 1$ if $x_{k} \in / X_{k}$ (state infeasible at step $k$ ), else $0$ . By Boole’s inequality, bounding $E {\sum_{k = 1}^{N} I_{k} (x_{k}) ∣ x_{0}} \leq Δ$ is a sufficient condition for the joint chance constraint $Pr {⋀_{k} x_{k} \in X_{k} ∣ x_{0}} \geq 1 - Δ$ (Theorem 1: a feasible solution of the approximation is feasible for the original). The reformulated constraint is additive, hence a standard constrained MDP.
Small, quantified conservatism. Under an independent-failure assumption, the Boole gap is $O (Δ^{2})$ ; since mission risk bounds $Δ$ are small, the added conservatism is moderate, and (per prior work) much smaller than competing approximations.
Dual DP with root-finding. For a fixed dual variable $λ$ , the Lagrangian is minimized by ordinary DP (Bellman recursion); the scalar dual variable is then optimized by Brent’s method, which converges exponentially and whose per-iteration complexity is independent of the primal problem size — typically 10–30 iterations.
Rigorous suboptimality bounds. Because the dual objective is generally non-differentiable, they derive explicit bounds on the suboptimality of both primal and dual objective values, yielding principled stopping criteria for the root-finder.
Validated on real space problems. Path planning, Mars EDL, and Lunar landing; Mars runs use real terrain with ~4M discrete states/step, supporting the claims of computational efficiency and low conservatism (near-optimality). All results are in simulation.
Violation-probability semantics only. The formulation bounds the probability of constraint violation and carries no notion of violation severity / tail magnitude — there is no CVaR or coherent-risk content. This is exactly the chance-constraint pole of the CVaR-vs-chance comparison.

Method

Discrete-time stochastic system $x_{k + 1} = f (x_{k}, u_{k}, w_{k})$ , $u_{k} \in U_{k} (x_{k})$ , $w_{k} \sim p_{k} (w_{k})$ , state fully observed, horizon $N$ . Objective: minimize $E {g_{N} (x_{N}) + \sum_{k = 0}^{N - 1} g_{k} (x_{k}, μ_{k} (x_{k}))}$ over policy $μ = {μ_{0}, \dots, μ_{N - 1}}$ subject to the joint chance constraint at risk bound $Δ \in [0, 1]$ . The reformulation replaces the joint constraint by $E {\sum_{k = 1}^{N} I_{k} (x_{k}) ∣ x_{0}} \leq Δ$ and applies Lagrangian duality; the dual function is concave even when the primal is nonconvex, and weak duality bounds the gap. The chain of inequalities (Theorem 1 proof) is: $Pr {⋀_{k} x_{k} \in X_{k}} = 1 - Pr {⋁_{k} x_{k} \in / X_{k}} \geq 1 - \sum_{k} Pr {x_{k} \in / X_{k}} = 1 - E {\sum_{k} I_{k}}$ .

Regime. Planetary EDL / rover / lander guidance — a genuine spacecraft-GNC precedent, but not a manipulator and carrying no base-arm dynamic coupling; it is neither free-flying nor free-floating in our sense. Its transferable content is entirely at the risk/planning layer.

Relevance to thesis

This anchors the chance-constraint side of the CVaR-vs-chance-constraint comparison with a real space precedent: on-board risk management for space missions, formulated (as MER EDL was, at 91%/96% success bounds) as a bound on violation probability. Two things carry over to risk-aware view scoring. (i) The Boole/indicator + Lagrangian-dual DP machinery is a template for imposing a whole-sortie joint chance constraint on an inspection trajectory (many viewpoints, many keep-out surfaces) while keeping the per-step DP/Bellman structure — and it exposes the risk-allocation question (how Boole splits one budget $Δ$ across steps), the DP analogue of the allocation in ren2022chance. (ii) Its central limitation is the thesis motivation: chance constraints are blind to violation severity; later JPL CVaR work (Ahmadi/Ono) cites precisely this probability-only limitation as the reason to move to coherent/tail risk. For a free-flying manipulator inspecting under camera-pose uncertainty, where a severe collision is categorically worse than a marginal one, that is the gap CVaR (conditional_value_at_risk) fills.

Connections

Topics: chance_constraints · motion_planning · model_predictive_control · receding_horizon_control Sources: ren2022chance (per-mode chance constraints + risk allocation) · dixit2023risk (coherent-risk generalization) · majumdar2017how (why probability-only is tail-blind) · conditional_value_at_risk

Key Equations / Quotes

Joint chance constraint and its Boole/indicator sufficient condition (Problem 1 → Problem 2):

Pr {⋀_{k = 1}^{N} x_{k} \in X_{k} x_{0}} \geq 1 - Δ ⟸ E {\sum_{k = 1}^{N} I_{k} (x_{k}) x_{0}} \leq Δ, I_{k} (x_{k}) = 1 [x_{k} \in / X_{k}]

“we (conservatively) reformulate a joint chance constraint as a constraint on the expectation of a summation of indicator random variables, which can be incorporated into the cost function by considering a dual formulation… the primal variables can be optimized by standard dynamic programming, while the dual variable is optimized by a root-finding algorithm that converges exponentially.” (Abstract)

“risk requirements are formulated as bounds on the probability of success, as exemplified by the 91% and 96% bounds… for MER EDL.” (§1)

Open Questions

Boole’s inequality allocates one budget $Δ$ across $N$ steps; for a multi-view inspection sortie with many keep-out surfaces, how should the per-step/per-surface budget be split without compounding conservatism (cf. the $\sum_{k} π_{k} ϵ_{k} = ϵ$ allocation in ren2022chance)?
The method needs a discretized state space (4M states/step here); a 6-DOF free-flying base plus redundant arm has a far larger continuous state — does the dual-DP approach survive without prohibitive discretization?
What is the right CVaR analogue of this joint (whole-trajectory) chance constraint, and does the conservatism ordering survive a Boole-style decomposition of a CVaR budget?

Inspection with a Free-Flying Space Manipulator

Explorer

ono2015chance