Quantitative temporal logic for safe and robust planning, learning, and control
Abstract
Temporal logics enable formal specification of robotic tasks with explicit timing constraints. Signal Temporal Logic (STL) operates over continuous-time signals with real-valued predicates and has established quantitative robustness measures. Time Window Temporal Logic (TWTL) expresses sequential tasks with bounded time horizons through explicit time windows, but has previously lacked quantitative semantics. Traditional min-max robustness for STL depends only on the most critical time points, which creates non-smooth optimization landscapes. This dissertation introduces the first quantitative semantics for TWTL, developing both traditional min-max and Arithmetic-Geometric Mean (AGM) robustness. AGM robustness evaluates specification satisfaction holistically across all time points while maintaining soundness guarantees. We make three main contributions. First, we develop AGM robustness semantics for TWTL using arithmetic means for mixed-sign values and geometric means for uniform-sign values. We prove soundness, introduce interval semantics for partial trajectories, and present efficient incremental monitoring algorithms. We likewise introduce interval semantics for STL AGM robustness, so that the robustness value can be monitored over partial trajectories. Second, we present RRT^η, which integrates AGM robustness into sampling-based motion planning for STL specifications. We introduce interval semantics for trajectory prefixes, Direction of Increasing AGM Satisfaction (DIAS) vectors for gradient-like guidance, and Fulfillment Priority Logic (FPL) for principled multi-objective composition. We prove probabilistic completeness and asymptotic optimality of the planning algorithm. Experiments demonstrate that AGM-based methods discover feasible solutions where traditional robustness approaches fail, with computational advantages from FPL-based composition.
Third, we develop Accelerated Proximal Policy Optimization (APPO), which combines a hybrid policy architecture with TWTL-based reward shaping for reinforcement learning under delayed rewards. We introduce temporal logic progress measures for credit assignment. We prove monotonic improvement with a bounded optimality gap of 2ςγα²/(1-γ)², where α is the mixing parameter, γ is the discount factor, and ς bounds the expected advantage, and we prove that the optimal policy is preserved under general value function approximation. Experiments demonstrate successful learning on tasks with sparse, delayed rewards where standard RL approaches struggle, including game-playing and sequential manipulation tasks. AGM robustness provides a unifying framework across temporal logic formalisms and application domains, with smooth optimization landscapes, formal guarantees, and demonstrated benefits on diverse robotic systems.
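The contrast the abstract draws between min-max and AGM robustness can be illustrated with a minimal sketch for a single conjunction. It is not taken from the dissertation. It assumes robustness values normalized to [-1, 1] and follows the conjunction rule commonly given for STL AGM robustness (geometric mean when every subformula is satisfied, arithmetic mean of the negative parts otherwise). The function names `and_min` and `and_agm` are hypothetical, and the TWTL definitions in the dissertation may differ in detail.

```python
import math


def and_min(values):
    """Traditional conjunction robustness: only the worst value matters."""
    return min(values)


def and_agm(values):
    """AGM-style conjunction (assumed form) over values normalized to [-1, 1]."""
    m = len(values)
    if all(v > 0 for v in values):
        # Every subformula is satisfied: geometric mean of (1 + v), shifted back.
        return math.prod(1.0 + v for v in values) ** (1.0 / m) - 1.0
    # At least one subformula is not satisfied: arithmetic mean of negative parts.
    return sum(min(v, 0.0) for v in values) / m


# Two satisfying cases that share the same worst value (hypothetical numbers).
strong = [0.2, 0.9, 0.9]
weak = [0.2, 0.3, 0.3]
# One violating case with mixed signs.
mixed = [0.5, -0.5]

print(and_min(strong), and_min(weak))  # identical under min-max
print(and_agm(strong), and_agm(weak))  # AGM ranks `strong` above `weak`
print(and_agm(mixed))                  # negative, so the sign still signals violation
```

In this sketch the min-based score cannot distinguish `strong` from `weak`, while the AGM-style score rewards the larger margins on the non-critical subformulas. The sign of the AGM-style score still agrees with satisfaction or violation, which is the soundness property the abstract refers to.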
Description
2026
License
Attribution 4.0 International