-
Inexact Policy Iteration Methods for Large-Scale Markov Decision Processes
Authors:
Matilde Gargiani,
Robin Sieber,
Efe Balta,
Dominic Liao-McPherson,
John Lygeros
Abstract:
We consider inexact policy iteration (iPI) methods for large-scale infinite-horizon discounted MDPs with finite spaces, a variant of policy iteration in which the policy evaluation step is implemented inexactly using an iterative solver for linear systems. In the classical dynamic programming literature, a similar principle is deployed in optimistic policy iteration, where an a priori fixed number of iterations of value iteration is used to inexactly solve the policy evaluation step. Inspired by the connection between policy iteration and semismooth Newton's method, we investigate a class of iPI methods that mimic the inexact variants of semismooth Newton's method by adopting a parametric stopping condition to regulate the level of inexactness of the policy evaluation step. For this class of methods we discuss local and global convergence properties and derive a practical range of values for the stopping-condition parameter that provide contraction guarantees. Our analysis is general and therefore encompasses a variety of iterative solvers for policy evaluation, including the standard value iteration as well as more sophisticated ones such as GMRES. As underlined by our analysis, the selection of the inner solver is of fundamental importance for the performance of the overall method. We therefore consider different iterative methods to solve the policy evaluation step and analyze their applicability and contraction properties when used for policy evaluation. We show that the contraction properties of these methods tend to be enhanced by the specific structure of policy evaluation and that there is margin for substantial improvement in terms of convergence rate. Finally, we study the numerical performance of different instances of inexact policy iteration on large-scale MDPs for the design of health policies to control the spread of infectious diseases in epidemiology.
Submitted 9 April, 2024;
originally announced April 2024.
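The abstract above describes policy iteration with an inexactly solved evaluation step governed by a parametric stopping condition. A minimal sketch of that idea follows, assuming a cost-minimizing MDP, plain value-iteration sweeps as the inner solver, and a stopping rule of the form "linear residual <= alpha * Bellman residual". The function name, the array layout, and the value of `alpha` are illustrative assumptions, not the authors' implementation or their recommended parameter range.

```python
import numpy as np

def inexact_policy_iteration(P, g, gamma, alpha=0.1, tol=1e-8, max_outer=1000):
    """Sketch of inexact policy iteration (hypothetical helper, cost minimization).

    P: (A, S, S) array, P[a, s, t] = probability of moving s -> t under action a.
    g: (S, A) array of stage costs. gamma: discount factor in (0, 1).
    alpha: assumed stopping-condition parameter for the inner solver.
    """
    n_states = P.shape[1]
    states = np.arange(n_states)
    v = np.zeros(n_states)
    pi = np.zeros(n_states, dtype=int)
    for _ in range(max_outer):
        # Policy improvement: greedy policy with respect to the current v.
        q = g + gamma * np.einsum("aij,j->ia", P, v)
        pi = q.argmin(axis=1)
        bellman_res = np.linalg.norm(v - q.min(axis=1), np.inf)
        if bellman_res <= tol:
            break
        P_pi = P[pi, states, :]   # row s is the transition row of action pi[s]
        g_pi = g[states, pi]
        # Inexact policy evaluation: sweep v <- g_pi + gamma * P_pi v until the
        # residual of (I - gamma * P_pi) v = g_pi is small relative to the
        # Bellman residual measured at the start of this outer iteration.
        while True:
            v = g_pi + gamma * P_pi @ v
            lin_res = np.linalg.norm(g_pi + gamma * P_pi @ v - v, np.inf)
            if lin_res <= alpha * bellman_res:
                break
    return v, pi
```

Each inner sweep shrinks the linear residual by at least a factor `gamma` in the infinity norm, so the inner loop terminates; swapping the sweep for a Krylov solver would change only the body of that loop.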
-
A Log-domain Interior Point Method for Convex Quadratic Games
Authors:
Bingqi Liu,
Dominic Liao-McPherson
Abstract:
In this paper, we propose an equilibrium-seeking algorithm for finding generalized Nash equilibria of non-cooperative monotone convex quadratic games. Specifically, we recast the Nash equilibrium-seeking problem as a variational inequality problem that we solve using a log-domain interior point method, and we provide a general-purpose solver based on this algorithm. This approach is suitable for non-potential, general-sum games and does not require extensive structural assumptions. We demonstrate the efficiency and versatility of our method using three benchmark games and show that our algorithm is especially effective on small- to medium-scale problems.
Submitted 20 March, 2024;
originally announced March 2024.
-
Stability and Robustness of Distributed Suboptimal Model Predictive Control
Authors:
Giuseppe Belgioioso,
Dominic Liao-McPherson,
Mathias Hudoba de Badyn,
Nicolas Pelzmann,
John Lygeros,
Florian Dörfler
Abstract:
In distributed model predictive control (MPC), the control input at each sampling time is computed by solving a large-scale optimal control problem (OCP) over a finite horizon using distributed algorithms. Typically, such algorithms require several (virtually, infinite) communication rounds between the subsystems to converge, which is a major drawback both computationally and from an energetic perspective (for wireless systems). Motivated by these challenges, we propose a suboptimal distributed MPC scheme in which the total communication burden is distributed also in time, by maintaining a running solution estimate for the large-scale OCP and updating it at each sampling time. We demonstrate that, under some regularity conditions, the resulting suboptimal MPC control law recovers the qualitative robust stability properties of optimal MPC, if the communication budget at each sampling time is large enough.
Submitted 27 March, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
-
Inexact GMRES Policy Iteration for Large-Scale Markov Decision Processes
Authors:
Matilde Gargiani,
Dominic Liao-McPherson,
Andrea Zanelli,
John Lygeros
Abstract:
Policy iteration enjoys a local quadratic rate of contraction, but its iterations are computationally expensive for Markov decision processes (MDPs) with a large number of states. In light of the connection between policy iteration and the semismooth Newton method and taking inspiration from the inexact variants of the latter, we propose inexact policy iteration, a new class of methods for large-scale finite MDPs with local contraction guarantees. We then design an instance based on the deployment of GMRES for the approximate policy evaluation step, which we call inexact GMRES policy iteration. Finally, we demonstrate the superior practical performance of inexact GMRES policy iteration on an MDP with 10000 states, where it achieves a 5.8× and 2.2× speedup with respect to policy iteration and optimistic policy iteration, respectively.
Submitted 8 November, 2022;
originally announced November 2022.
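The approximate policy evaluation step mentioned above amounts to solving the linear system (I - gamma * P_pi) v = g_pi for a fixed policy. A small sketch using SciPy's `gmres` with its default tolerance follows. The function name and the dense matrix layout are assumptions for illustration, and the stopping rule that couples the GMRES tolerance to the outer iteration is left out.

```python
import numpy as np
from scipy.sparse.linalg import gmres

def evaluate_policy_gmres(P_pi, g_pi, gamma, v0=None):
    """Approximately solve (I - gamma * P_pi) v = g_pi with GMRES.

    P_pi: (S, S) row-stochastic transition matrix of a fixed policy.
    g_pi: (S,) stage costs under that policy.
    v0:   optional warm start, e.g. the previous value estimate.
    Returns the approximate value vector and SciPy's convergence flag
    (0 means the default tolerance was reached).
    """
    n = P_pi.shape[0]
    A = np.eye(n) - gamma * P_pi
    v, info = gmres(A, g_pi, x0=v0)
    return v, info
```

For large state spaces the matrix would typically be stored in a sparse format or passed as a linear operator so that GMRES only needs matrix-vector products.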
-
Online Feedback Equilibrium Seeking
Authors:
Giuseppe Belgioioso,
Dominic Liao-McPherson,
Mathias Hudoba de Badyn,
Saverio Bolognani,
Roy S. Smith,
John Lygeros,
Florian Dörfler
Abstract:
This paper proposes a unifying design framework for dynamic feedback controllers that track solution trajectories of time-varying generalized equations, such as local minimizers of nonlinear programs or competitive equilibria (e.g., Nash) of non-cooperative games. Inspired by the feedback optimization paradigm, the core idea of the proposed approach is to re-purpose classic iterative algorithms for solving generalized equations (e.g., Josephy--Newton, forward-backward splitting) as dynamic feedback controllers by integrating online measurements of the continuous-time nonlinear plant. Sufficient conditions for closed-loop stability and robustness of the algorithm-plant cyber-physical interconnection are derived in a sampled-data setting by combining and tailoring results from (monotone) operator, fixed-point, and nonlinear systems theory. Numerical simulations on smart building automation and competitive supply-chain management are presented to support the theoretical findings.
Submitted 14 February, 2024; v1 submitted 21 October, 2022;
originally announced October 2022.
-
Dynamic Programming Through the Lens of Semismooth Newton-Type Methods (Extended Version)
Authors:
Matilde Gargiani,
Andrea Zanelli,
Dominic Liao-McPherson,
Tyler Summers,
John Lygeros
Abstract:
Policy iteration and value iteration are at the core of many (approximate) dynamic programming methods. For Markov Decision Processes with finite state and action spaces, we show that they are instances of semismooth Newton-type methods to solve the Bellman equation. In particular, we prove that policy iteration is equivalent to the exact semismooth Newton method and enjoys a local quadratic convergence rate. This finding is corroborated by extensive numerical evidence in the fields of control and operations research, which confirms that policy iteration generally requires few iterations to achieve convergence even when the number of policies is vast. We then show that value iteration is an instance of the fixed-point iteration method. In this spirit, we develop a novel locally accelerated version of value iteration with global convergence guarantees and negligible extra computational costs.
Submitted 24 June, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
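The equivalence stated in the abstract above can be illustrated with a short calculation. The notation is assumed here rather than taken from the paper: a cost-minimizing MDP with discount factor $\gamma$, and for a policy $\pi$ the stage-cost vector $g_\pi$ and transition matrix $P_\pi$.

```latex
% Bellman operator and Bellman residual (assumed notation)
T v = \min_{\pi} \left( g_\pi + \gamma P_\pi v \right), \qquad r(v) = v - T v .

% At iterate v_k, let \pi_k be a greedy policy, so T v_k = g_{\pi_k} + \gamma P_{\pi_k} v_k .
% One element of the generalized Jacobian of r at v_k is
J_k = I - \gamma P_{\pi_k} .

% The corresponding Newton step is
v_{k+1} = v_k - J_k^{-1} r(v_k)
        = v_k - (I - \gamma P_{\pi_k})^{-1} \left( (I - \gamma P_{\pi_k}) v_k - g_{\pi_k} \right)
        = (I - \gamma P_{\pi_k})^{-1} g_{\pi_k} .
```

The last expression is the value of the greedy policy $\pi_k$, which is exactly what the policy evaluation step of policy iteration computes. Value iteration, by contrast, is the fixed-point iteration $v_{k+1} = T v_k$.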
-
Sampled-Data Online Feedback Equilibrium Seeking: Stability and Tracking
Authors:
Giuseppe Belgioioso,
Dominic Liao-McPherson,
Mathias Hudoba de Badyn,
Saverio Bolognani,
John Lygeros,
Florian Dörfler
Abstract:
This paper proposes a general framework for constructing feedback controllers that drive complex dynamical systems to "efficient" steady-state (or slowly varying) operating points. Efficiency is encoded using generalized equations, which can model a broad spectrum of useful objectives, such as optimality or equilibria (e.g., Nash or Wardrop) in noncooperative games. The core idea of the proposed approach is to directly implement iterative solution (or equilibrium-seeking) algorithms in closed loop with physical systems. Sufficient conditions for closed-loop stability and robustness are derived; these also serve as the first closed-loop stability results for sampled-data feedback-based optimization. Numerical simulations of smart building automation and game-theoretic robotic swarm coordination support the theoretical results.
Submitted 16 September, 2021; v1 submitted 25 March, 2021;
originally announced March 2021.
-
Feasibility Governor for Linear Model Predictive Control
Authors:
Terrence Skibik,
Dominic Liao-McPherson,
Torbjørn Cunis,
Ilya Kolmanovsky,
Marco M. Nicotra
Abstract:
This paper introduces the Feasibility Governor (FG): an add-on unit that enlarges the region of attraction of Model Predictive Control by manipulating the reference to ensure that the underlying optimal control problem remains feasible. The FG is developed for linear systems subject to polyhedral state and input constraints. Offline computations using polyhedral projection algorithms are used to construct the feasibility set. Online implementation relies on the solution of a convex quadratic program that guarantees recursive feasibility. The closed-loop system is shown to satisfy constraints, achieve asymptotic stability, and exhibit zero-offset tracking.
Submitted 8 March, 2021;
originally announced March 2021.
-
A Feasibility Governor for Enlarging the Region of Attraction of Linear Model Predictive Controllers
Authors:
Dominic Liao-McPherson,
Terrence Skibik,
Torbjørn Cunis,
Ilya Kolmanovsky,
Marco M. Nicotra
Abstract:
This paper proposes a method for enlarging the region of attraction of Linear Model Predictive Controllers (MPC) when tracking piecewise-constant references in the presence of pointwise-in-time constraints. It consists of an add-on unit, the Feasibility Governor (FG), that manipulates the reference command so as to ensure that the optimal control problem that underlies the MPC feedback law remains feasible. Offline polyhedral projection algorithms based on multi-objective linear programming are employed to compute the set of feasible states and reference commands. Online, the action of the FG is computed by solving a convex quadratic program. The closed-loop system is shown to satisfy constraints, be asymptotically stable, exhibit zero-offset tracking, and display finite-time convergence of the reference.
Submitted 3 November, 2020;
originally announced November 2020.
-
An Analysis of Closed-Loop Stability for Linear Model Predictive Control Based on Time-Distributed Optimization
Authors:
Dominic Liao-McPherson,
Terrence Skibik,
Jordan Leung,
Ilya Kolmanovsky,
Marco M. Nicotra
Abstract:
Time-distributed Optimization (TDO) is an approach for reducing the computational burden of Model Predictive Control (MPC). When using TDO, optimization iterations are distributed over time by maintaining a running solution estimate and updating it at each sampling instant. In this paper, TDO applied to input-constrained linear MPC is studied in detail, and analytic expressions for the system gains and a bound on the number of optimization iterations per sampling instant required to guarantee closed-loop stability are derived. Further, it is shown that the closed-loop stability of TDO-based MPC can be guaranteed using multiple mechanisms, including increasing the number of solver iterations, preconditioning the optimal control problem, adjusting the MPC cost matrices, and reducing the length of the receding horizon. These results in a linear system setting also provide insights and guidelines that could be more broadly applicable, e.g., to nonlinear MPC.
Submitted 23 February, 2021; v1 submitted 25 September, 2020;
originally announced September 2020.
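The mechanism described above, a running solution estimate that receives only a few optimization iterations per sampling instant, can be sketched as follows. This is a hedged illustration, not the paper's algorithm: it assumes a condensed linear MPC problem with box input constraints and uses projected gradient steps as the optimizer, and the names `condensed_qp`, `tdo_step`, and `closed_loop` are hypothetical.

```python
import numpy as np

def condensed_qp(A, B, Q, R, N):
    """Return H, G such that the horizon cost is 0.5*U'HU + (G x0)'U + const."""
    n, m = B.shape
    Phi = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(N)])
    Gam = np.zeros((N * n, N * m))
    for i in range(N):
        for j in range(i + 1):
            Gam[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
    Qb, Rb = np.kron(np.eye(N), Q), np.kron(np.eye(N), R)
    return Gam.T @ Qb @ Gam + Rb, Gam.T @ Qb @ Phi

def tdo_step(U, x, H, G, u_max, iters, step):
    """Apply a fixed number of projected gradient iterations, warm-started at U."""
    for _ in range(iters):
        U = np.clip(U - step * (H @ U + G @ x), -u_max, u_max)
    return U

def closed_loop(A, B, Q, R, N, x0, u_max, iters, T):
    """Simulate the plant together with the running solution estimate."""
    H, G = condensed_qp(A, B, Q, R, N)
    step = 1.0 / np.linalg.eigvalsh(H).max()   # 1/L step for the quadratic cost
    m = B.shape[1]
    U = np.zeros(N * m)                        # running solution estimate
    x = np.asarray(x0, dtype=float)
    traj = [x]
    for _ in range(T):
        U = tdo_step(U, x, H, G, u_max, iters, step)  # only `iters` iterations
        x = A @ x + B @ U[:m]                         # apply the first input
        traj.append(x)
    return np.array(traj)
```

The parameter `iters` plays the role of the per-sample iteration budget. This sketch makes no stability claim for any particular value of it.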
-
Sensitivity-based Warmstarting for Nonlinear Model Predictive Control with Polyhedral State and Control Constraints
Authors:
Dominic Liao-McPherson,
Marco M. Nicotra,
Asen L. Dontchev,
Ilya V. Kolmanovsky,
Vladimir M. Veliov
Abstract:
Model predictive control (MPC) is of increasing interest in applications for constrained control of multivariable systems. However, one of the major obstacles to its broader use is the computation time and effort required to solve a possibly non-convex optimal control problem (OCP) online. This paper introduces a sensitivity-based warmstarting strategy for systems with nonlinear dynamics and polyhedral constraints with the goal of reducing the computational footprint of MPC controllers. It predicts changes in the solution of the parameterized OCP as the parameter varies, by calculating the semiderivative of the solution mapping. The main novelty of the paper is that the polyhedrality of the constraints allows us to avoid imposing any constraint qualification conditions or strict complementarity assumptions. A numerical study featuring MPC applied to unmanned aerial vehicles illustrates the proposed approach.
Submitted 27 September, 2019; v1 submitted 26 June, 2019;
originally announced June 2019.
-
A Semismooth Predictor Corrector Method for Suboptimal Model Predictive Control
Authors:
Dominic Liao-McPherson,
Marco Nicotra,
Ilya Kolmanovsky
Abstract:
Suboptimal model predictive control is a technique that can reduce the computational cost of model predictive control (MPC) by exploiting its robustness to incomplete optimization. Instead of solving the optimal control problem exactly, this method maintains an estimate of the optimal solution and updates it at each sampling instant. The resulting controller can be viewed as a dynamic compensator that runs in parallel with the plant. This paper explores the use of the semismooth predictor-corrector method to implement suboptimal MPC. The dynamic interconnection of the combined plant-optimizer system is studied using the input-to-state stability framework, and sufficient conditions for closed-loop asymptotic stability and constraint enforcement are derived using small-gain arguments. Numerical simulations demonstrate the efficacy of the scheme.
Submitted 7 May, 2019;
originally announced May 2019.
-
Time Distributed Optimization for Model Predictive Control: Stability, Robustness, and Constraint Satisfaction
Authors:
Dominic Liao-McPherson,
Marco Nicotra,
Ilya Kolmanovsky
Abstract:
Time distributed optimization is an implementation strategy that can significantly reduce the computational burden of model predictive control by exploiting its robustness to incomplete optimization. When using this strategy, optimization iterations are distributed over time by maintaining a running solution estimate for the optimal control problem and updating it at each sampling instant. The resulting controller can be viewed as a dynamic compensator which is placed in closed-loop with the plant. This paper presents a general systems theoretic analysis framework for time distributed optimization. The coupled plant-optimizer system is analyzed using input-to-state stability concepts and sufficient conditions for stability and constraint satisfaction are derived. When applied to time distributed sequential quadratic programming, the framework significantly extends the existing theoretical analysis for the real-time iteration scheme. Numerical simulations are presented that demonstrate the effectiveness of the scheme.
Submitted 27 October, 2019; v1 submitted 6 March, 2019;
originally announced March 2019.
-
FBstab: A Stabilized Semismooth Quadratic Programming Algorithm with Applications in Model Predictive Control
Authors:
Dominic Liao-McPherson,
Ilya Kolmanovsky
Abstract:
This paper introduces the proximally stabilized Fischer-Burmeister method (FBstab), a new algorithm for convex quadratic programming that synergistically combines the proximal point algorithm with a primal-dual semismooth Newton-type method. FBstab is numerically robust, easy to warmstart, handles degenerate primal-dual solutions, detects infeasibility/unboundedness, and requires only that the Hessian matrix be positive semidefinite. We outline the algorithm, provide convergence and convergence rate proofs, report some numerical results from model predictive control benchmarks, and also include experimental results. We show that FBstab is competitive with, and often superior to, state-of-the-art methods, has attractive scaling properties, and is especially promising for model predictive control applications.
Submitted 19 May, 2019; v1 submitted 13 January, 2019;
originally announced January 2019.
-
A Semismooth Predictor Corrector Method for Real-Time Constrained Parametric Optimization with Applications in Model Predictive Control
Authors:
Dominic Liao-McPherson,
Marco Nicotra,
Ilya Kolmanovsky
Abstract:
Real-time optimization problems are ubiquitous in control and estimation, and are typically parameterized by incoming measurement data and/or operator commands. This paper proposes solving parameterized constrained nonlinear programs using a semismooth predictor-corrector (SSPC) method. Nonlinear complementarity functions are used to reformulate the first order necessary conditions of the optimization problem into a parameterized non-smooth root-finding problem. Starting from an approximate solution, a semismooth Euler-Newton algorithm is proposed for tracking the trajectory of the primal-dual solution as the parameter varies over time. Active set changes are naturally handled by the SSPC method, which only requires the solution of linear systems of equations. The paper establishes conditions under which the solution trajectories of the root-finding problem are well behaved and provides sufficient conditions for ensuring boundedness of the tracking error. Numerical case studies featuring the application of the SSPC method to nonlinear model predictive control are reported and demonstrate the advantages of the proposed method.
Submitted 4 December, 2018;
originally announced December 2018.
-
A Regularized and Smoothed Fischer-Burmeister Method for Quadratic Programming with Applications to Model Predictive Control
Authors:
Dominic Liao-McPherson,
Mike Huang,
Ilya Kolmanovsky
Abstract:
This paper considers solving convex quadratic programs (QPs) in a real-time setting using a regularized and smoothed Fischer-Burmeister method (FBRS). The Fischer-Burmeister function is used to map the optimality conditions of the quadratic program to a nonlinear system of equations, which is solved using Newton's method. Regularization and smoothing are applied to improve the practical performance of the algorithm, and a merit function is used to globalize convergence. FBRS is simple to code, easy to warmstart, robust to early termination, and has attractive theoretical properties, making it appealing for real-time and embedded applications. Numerical experiments using several predictive control examples show that the proposed method is competitive with other state-of-the-art solvers.
Submitted 9 July, 2018;
originally announced July 2018.
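The abstract above maps the QP optimality conditions to a system of equations via the Fischer-Burmeister function. The sketch below uses one common sign convention, phi(a, b) = a + b - sqrt(a^2 + b^2), which is zero exactly when a >= 0, b >= 0, and a*b = 0. The `eps` smoothing term, the inequality form A z <= b, and the function names are assumptions for illustration and may differ from the paper's formulation.

```python
import numpy as np

def fischer_burmeister(a, b, eps=0.0):
    """Elementwise Fischer-Burmeister function; eps > 0 gives a smoothed variant."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return a + b - np.sqrt(a**2 + b**2 + eps**2)

def kkt_residual(H, f, A, b, z, lam, eps=0.0):
    """Residual of the KKT conditions of: min 0.5 z'Hz + f'z  s.t.  A z <= b.

    Stationarity is H z + f + A' lam = 0. Complementarity between the
    multipliers lam and the slacks b - A z is encoded with the FB function.
    """
    stationarity = H @ z + f + A.T @ lam
    complementarity = fischer_burmeister(lam, b - A @ z, eps)
    return np.concatenate([stationarity, complementarity])
```

A Newton-type method would drive `kkt_residual` to zero. With `eps = 0` the residual is nonsmooth where a multiplier and its slack are both zero, which is one motivation for smoothing.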