-
Intrinsic Successive Convexification: Trajectory Optimization on Smooth Manifolds
Authors:
Spencer Kraisler,
Mehran Mesbahi,
Behcet Acikmese
Abstract:
A fundamental issue at the core of trajectory optimization on smooth manifolds is handling the implicit manifold constraint within the dynamics. The conventional approach is to enforce the dynamic model as a constraint. However, we show this approach leads to significantly redundant operations, as well as being heavily dependent on the state space representation. Specifically, we propose an intrinsic successive convexification methodology for optimal control on smooth manifolds. This so-called iSCvx is then applied to a representative example involving attitude trajectory optimization for a spacecraft subject to non-convex constraints.
Submitted 16 March, 2025;
originally announced March 2025.
-
Estimation-Aware Trajectory Optimization with Set-Valued Measurement Uncertainties
Authors:
Aditya Deole,
Mehran Mesbahi
Abstract:
In this paper, an optimization-based framework for generating estimation-aware trajectories is presented. In this setup, measurement (output) uncertainties are state-dependent and set-valued. Enveloping ellipsoids are employed to characterize state-dependent uncertainties with unknown distributions. The concept of regularity for set-valued output maps is then introduced, facilitating the formulation of the estimation-aware trajectory generation problem. Specifically, it is demonstrated that for output-regular maps, one can utilize a set-valued observability measure that is concave with respect to the finite horizon state trajectories. By maximizing this measure, estimation-aware trajectories can then be synthesized for a broad class of systems. Trajectory planning routines are also examined in this work, by which the observability measure is optimized for systems with locally linearized dynamics. To illustrate the effectiveness of the proposed approach, representative examples in the context of trajectory planning with vision-based estimation are presented. Moreover, the paper presents estimation-aware planning for an uncooperative Target-Rendezvous problem, where an Ego-satellite employs an onboard machine learning (ML)-based estimation module to realize the rendezvous trajectory.
Submitted 10 May, 2025; v1 submitted 15 January, 2025;
originally announced January 2025.
-
Multi Agent Reinforcement Learning for Sequential Satellite Assignment Problems
Authors:
Joshua Holder,
Natasha Jaques,
Mehran Mesbahi
Abstract:
Assignment problems are a classic combinatorial optimization problem in which a group of agents must be assigned to a group of tasks such that maximum utility is achieved while satisfying assignment constraints. Given the utility of each agent completing each task, polynomial-time algorithms exist to solve a single assignment problem in its simplest form. However, in many modern-day applications such as satellite constellations, power grids, and mobile robot scheduling, assignment problems unfold over time, with the utility for a given assignment depending heavily on the state of the system. We apply multi-agent reinforcement learning to this problem, learning the value of assignments by bootstrapping from a known polynomial-time greedy solver and then learning from further experience. We then choose assignments using a distributed optimal assignment mechanism rather than by selecting them directly. We demonstrate that this algorithm is theoretically justified and avoids pitfalls experienced by other RL algorithms in this setting. Finally, we show that our algorithm significantly outperforms other methods in the literature, even while scaling to realistic scenarios with hundreds of agents and tasks.
Submitted 20 December, 2024;
originally announced December 2024.
-
Policy Optimization in Control: Geometry and Algorithmic Implications
Authors:
Shahriar Talebi,
Yang Zheng,
Spencer Kraisler,
Na Li,
Mehran Mesbahi
Abstract:
This survey explores the geometric perspective on policy optimization within the realm of feedback control systems, emphasizing the intrinsic relationship between control design and optimization. By adopting a geometric viewpoint, we aim to provide a nuanced understanding of how various ``complete parameterizations'' -- referring to the policy parameters together with its Riemannian geometry -- of control design problems influence the stability and performance of local search algorithms. The paper is structured to address key themes such as policy parameterization, the topology and geometry of stabilizing policies, and their implications for various (non-convex) dynamic performance measures. We focus on a few iconic control design problems, including the Linear Quadratic Regulator (LQR), Linear Quadratic Gaussian (LQG) control, and $\mathcal{H}_\infty$ control. In particular, we first discuss the topology and Riemannian geometry of stabilizing policies, distinguishing between their static and dynamic realizations. Expanding on this geometric perspective, we then explore structural properties of the aforementioned performance measures and their interplay with the geometry of stabilizing policies in the presence of policy constraints; along the way, we address issues such as spurious stationary points, symmetries of dynamic feedback policies, and (non-)smoothness of the corresponding performance measures. We conclude the survey with algorithmic implications of policy optimization in feedback design.
Submitted 6 June, 2024;
originally announced June 2024.
-
Six-Degree-of-Freedom Aircraft Landing Trajectory Planning with Runway Alignment
Authors:
Taewan Kim,
Abhinav G. Kamath,
Niyousha Rahimi,
Jasper Corleis,
Behçet Açıkmeşe,
Mehran Mesbahi
Abstract:
This paper presents a numerical optimization algorithm for generating approach and landing trajectories for a six-degree-of-freedom (6-DoF) aircraft. We improve on the existing research on aircraft landing trajectory generation by formulating the trajectory optimization problem with additional real-world operational constraints, including 6-DoF aircraft dynamics, runway alignment, constant wind field, and obstacle avoidance, to obtain a continuous-time nonconvex optimal control problem. Particularly, the runway alignment constraint enforces the trajectory of the aircraft to be aligned with the runway only during the final approach phase. This is a novel feature that is essential for preventing an approach that is either too steep or too shallow. The proposed method models the runway alignment constraint through a multi-phase trajectory planning scheme, imposing alignment conditions exclusively during the final approach phase. We compare this formulation with the existing state-triggered constraint formulation for runway alignment. To solve the formulated problem, we design a novel sequential convex programming algorithm called xPTR that extends the penalized trust-region (PTR) algorithm by incorporating an extrapolation step to expedite convergence. We validate the proposed method through extensive numerical simulations, including a Monte Carlo study, to evaluate the robustness of the algorithm to varying initial conditions.
Submitted 10 June, 2025; v1 submitted 26 May, 2024;
originally announced May 2024.
-
Output-feedback Synthesis Orbit Geometry: Quotient Manifolds and LQG Direct Policy Optimization
Authors:
Spencer Kraisler,
Mehran Mesbahi
Abstract:
We consider direct policy optimization for the linear-quadratic Gaussian (LQG) setting. Over the past few years, it has been recognized that the landscape of dynamic output-feedback controllers of relevance to LQG has an intricate geometry, particularly pertaining to the existence of degenerate stationary points, that hinders gradient methods. In order to address these challenges, in this paper, we adopt a system-theoretic coordinate-invariant Riemannian metric for the space of dynamic output-feedback controllers and develop a Riemannian gradient descent for direct LQG policy optimization. We then proceed to prove that the orbit space of such controllers, modulo the coordinate transformation, admits a Riemannian quotient manifold structure. This geometric structure--that is of independent interest--provides an effective approach to derive direct policy optimization algorithms for LQG with a local linear rate convergence guarantee. Subsequently, we show that the proposed approach exhibits significantly faster and more robust numerical performance as compared with ordinary gradient descent.
Submitted 15 August, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Data-Guided Regulator for Adaptive Nonlinear Control
Authors:
Niyousha Rahimi,
Mehran Mesbahi
Abstract:
This paper addresses the problem of designing a data-driven feedback controller for complex nonlinear dynamical systems in the presence of time-varying disturbances with unknown dynamics. Such disturbances are modeled as the "unknown" part of the system dynamics. The goal is to achieve finite-time regulation of system states through direct policy updates while also generating informative data that can subsequently be used for data-driven stabilization or system identification. First, we expand upon the notion of "regularizability" and characterize this system characteristic for a linear time-varying representation of the nonlinear system with locally-bounded higher-order terms. "Rapid-regularizability" then gauges the extent to which a system can be regulated in finite time, in contrast to its asymptotic behavior. We then propose the Data-Guided Regulation for Adaptive Nonlinear Control (DG-RAN) algorithm, an online iterative synthesis procedure that utilizes discrete time-series data from a single trajectory for regulating system states and identifying disturbance dynamics. The effectiveness of our approach is demonstrated on a 6-DOF power descent guidance problem in the presence of adverse environmental disturbances.
Submitted 20 November, 2023;
originally announced November 2023.
-
An Active-Sensing Approach for Bearing-based Target Localization
Authors:
Beniamino Pozzan,
Giulia Michieletto,
Mehran Mesbahi,
Angelo Cenedese
Abstract:
Characterized by a cross-disciplinary nature, the bearing-based target localization task involves estimating the position of an entity of interest by a group of agents capable of collecting noisy bearing measurements. In this work, this problem is tackled by resting both on the weighted least-squares estimation approach and on the active-sensing control paradigm. Indeed, we propose an iterative algorithm that provides an estimate of the target position under the assumption of Gaussian noise distribution, which can be considered valid when more specific information is missing. Then, we present a control law for the seeker agents that aims at minimizing the localization uncertainty by optimizing the covariance matrix associated with the estimated target position. The validity of the designed bearing-based target localization solution is confirmed by the results of an extensive Monte Carlo simulation campaign.
Submitted 16 November, 2023;
originally announced November 2023.
-
Consensus on Lie groups for the Riemannian Center of Mass
Authors:
Spencer Kraisler,
Shahriar Talebi,
Mehran Mesbahi
Abstract:
In this paper, we develop a consensus algorithm for distributed computation of the Riemannian center of mass (RCM) on Lie Groups. The algorithm is built upon a distributed optimization reformulation that allows developing an intrinsic, distributed (without relying on a consensus subroutine), and a computationally efficient protocol for the RCM computation. The novel idea for developing this fast distributed algorithm is to utilize a Riemannian version of distributed gradient flow combined with a gradient tracking technique. We first guarantee that, under certain conditions, the limit point of our algorithm is the RCM point of interest. We then provide a proof of global convergence in the Euclidean setting, that can be viewed as a "geometric" dynamic consensus that converges to the average from arbitrary initial points. Finally, we proceed to showcase the superior convergence properties of the proposed approach as compared with other classes of consensus optimization-based algorithms for the RCM computation.
Submitted 15 August, 2023;
originally announced August 2023.
-
Data-driven Optimal Filtering for Linear Systems with Unknown Noise Covariances
Authors:
Shahriar Talebi,
Amirhossein Taghvaei,
Mehran Mesbahi
Abstract:
This paper examines learning the optimal filtering policy, known as the Kalman gain, for a linear system with unknown noise covariance matrices using noisy output data. The learning problem is formulated as a stochastic policy optimization problem, aiming to minimize the output prediction error. This formulation provides a direct bridge between data-driven optimal control and, its dual, optimal filtering. Our contributions are twofold. Firstly, we conduct a thorough convergence analysis of the stochastic gradient descent algorithm, adopted for the filtering problem, accounting for biased gradients and stability constraints. Secondly, we carefully leverage a combination of tools from linear system theory and high-dimensional statistics to derive bias-variance error bounds that scale logarithmically with problem dimension, and, in contrast to subspace methods, the length of output trajectories only affects the bias term.
Submitted 26 October, 2023; v1 submitted 28 May, 2023;
originally announced May 2023.
-
Optimization-based Constrained Funnel Synthesis for Systems with Lipschitz Nonlinearities via Numerical Optimal Control
Authors:
Taewan Kim,
Purnanand Elango,
Taylor P. Reynolds,
Behçet Açıkmeşe,
Mehran Mesbahi
Abstract:
This paper presents a funnel synthesis algorithm for computing controlled invariant sets and feedback control gains around a given nominal trajectory for dynamical systems with locally Lipschitz nonlinearities and bounded disturbances. The resulting funnel synthesis problem involves a differential linear matrix inequality (DLMI) whose solution satisfies a Lyapunov condition that implies invariance and attractivity properties. Due to these properties, the proposed method can balance maximization of initial invariant funnel size, i.e., size of the funnel entry, and minimization of the size of the attractive funnel for attenuating the effect of disturbance. To solve the resulting funnel synthesis problem with the DLMI as constraints, we employ a numerical optimal control approach that uses a multiple shooting method to convert the problem into a finite dimensional semidefinite programming problem. This framework does not require piecewise linear system matrices and funnel parameters, which is typically assumed in recent related work. We illustrate the proposed funnel synthesis method with a numerical example.
Submitted 1 July, 2023; v1 submitted 18 March, 2023;
originally announced March 2023.
-
Duality-Based Stochastic Policy Optimization for Estimation with Unknown Noise Covariances
Authors:
Shahriar Talebi,
Amirhossein Taghvaei,
Mehran Mesbahi
Abstract:
Duality of control and estimation allows mapping recent advances in data-guided control to the estimation setup. This paper formalizes and utilizes such a mapping to consider learning the optimal (steady-state) Kalman gain when process and measurement noise statistics are unknown. Specifically, building on the duality between synthesizing optimal control and estimation gains, the filter design problem is formalized as direct policy learning. In this direction, the duality is used to extend existing theoretical guarantees of direct policy updates for Linear Quadratic Regulator (LQR) to establish global convergence of the Gradient Descent (GD) algorithm for the estimation problem--while addressing subtle differences between the two synthesis problems. Subsequently, a Stochastic Gradient Descent (SGD) approach is adopted to learn the optimal Kalman gain without the knowledge of noise covariances. The results are illustrated via several numerical examples.
Submitted 6 March, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies
Authors:
Bin Hu,
Kaiqing Zhang,
Na Li,
Mehran Mesbahi,
Maryam Fazel,
Tamer Başar
Abstract:
Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis, popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently-developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems such as the linear quadratic regulator (LQR), $\mathcal{H}_\infty$ control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.
Submitted 10 October, 2022;
originally announced October 2022.
-
Vector-valued Privacy-Preserving Average Consensus
Authors:
Lulu Pan,
Haibin Shao,
Yang Lu,
Mehran Mesbahi,
Dewei Li,
Yugeng Xi
Abstract:
Achieving average consensus without disclosing sensitive information can be a critical concern for multi-agent coordination. This paper examines privacy-preserving average consensus (PPAC) for vector-valued multi-agent networks. In particular, a set of agents with vector-valued states aim to collaboratively reach an exact average consensus of their initial states, while each agent's initial state cannot be disclosed to other agents. We show that the vector-valued PPAC problem can be solved via associated matrix-weighted networks with the higher-dimensional agent state. Specifically, a novel distributed vector-valued PPAC algorithm is proposed by lifting the agent-state to higher-dimensional space and designing the associated matrix-weighted network with dynamic, low-rank, positive semi-definite coupling matrices to both conceal the vector-valued agent state and guarantee that the multi-agent network asymptotically converges to the average consensus. Essentially, the convergence analysis can be transformed into the average consensus problem on switching matrix-weighted networks. We show that the exact average consensus can be guaranteed and the initial agents' states can be kept private if each agent has at least one "legitimate" neighbor. The algorithm, involving only basic matrix operations, is computationally more efficient than cryptography-based approaches and can be implemented in a fully distributed manner without relying on a third party. Numerical simulation is provided to illustrate the effectiveness of the proposed algorithm.
Submitted 22 September, 2022;
originally announced September 2022.
-
Structural Adaptivity of Directed Networks
Authors:
Lulu Pan,
Haibin Shao,
Mehran Mesbahi,
Dewei Li,
Yugeng Xi
Abstract:
Network structure plays a critical role in functionality and performance of network systems. This paper examines structural adaptivity of diffusively coupled, directed multi-agent networks that are subject to diffusion performance. Inspired by the observation that the link redundancy in a network may degrade its diffusion performance, a distributed data-driven neighbor selection framework is proposed to adaptively adjust the network structure for improving the diffusion performance of exogenous influence over the network. Specifically, each agent is allowed to interact with only a specific subset of neighbors while global reachability from exogenous influence to all agents of the network is maintained. Both continuous-time and discrete-time directed networks are examined. For each of the two cases, we first examine the reachability properties encoded in the eigenvectors of perturbed variants of graph Laplacian or SIA matrix associated with directed networks, respectively. Then, an eigenvector-based rule for neighbor selection is proposed to derive a reduced network, on which the diffusion performance is enhanced. Finally, motivated by the necessity of distributed and data-driven implementation of the neighbor selection rule, quantitative connections between eigenvectors of the perturbed graph Laplacian and SIA matrix and relative rate of change in agent state are established, respectively. These connections immediately enable a data-driven inference of the reduced neighbor set for each agent using only locally accessible data. As an immediate extension, we further discuss the distributed data-driven construction of directed spanning trees of directed networks using the proposed neighbor selection framework. Numerical simulations are provided to demonstrate the theoretical results.
Submitted 28 August, 2022;
originally announced August 2022.
-
To charge in-flight or not: an inquiry into parallel-hybrid electric aircraft configurations via optimal control
Authors:
Mengyuan Wang,
Mehran Mesbahi
Abstract:
We examine two configurations for parallel hybrid electric aircraft, one with, and one without, a mechanical connection between the engines and the electric motors. For these two designs, we then review the power allocation problem in the context of aircraft energy management for a 19-seat conceptual Hybrid Electric Aircraft. We then represent the original optimal control problem as a finite-dimensional optimization and validate the second-order sufficient conditions for global optimality of the obtained solution. This is then followed by a sensitivity analysis of the fuel consumption on the initial aircraft weight and flight endurance. Our simulation and theoretical results clarify the limited benefit of charging the battery in-flight for this class of hybrid electric aircraft to reduce CO$_2$ emissions.
Submitted 18 August, 2022;
originally announced August 2022.
-
Vertiport Selection in Hybrid Air-Ground Transportation Networks via Mathematical Programs with Equilibrium Constraints
Authors:
Yue Yu,
Mengyuan Wang,
Mehran Mesbahi,
Ufuk Topcu
Abstract:
Urban air mobility is a concept that promotes aerial modes of transport in urban areas. In these areas, the location and capacity of the vertiports--where the travelers embark and disembark the aircraft--not only affect the flight delays of the aircraft, but can also aggravate the congestion of ground vehicles by creating extra ground travel demands. We introduce a mathematical model for selecting the location and capacity of the vertiports that minimizes the traffic congestion in hybrid air-ground transportation networks. Our model is based on a mathematical program with bilinear equilibrium constraints. Furthermore, we show how to compute a global optimal solution of this mathematical program by solving a mixed integer linear program. We demonstrate our results via the Anaheim transportation network model, which contains more than 400 nodes and 900 links.
Submitted 1 July, 2022; v1 submitted 10 March, 2022;
originally announced March 2022.
-
Policy Optimization over Submanifolds for Linearly Constrained Feedback Synthesis
Authors:
Shahriar Talebi,
Mehran Mesbahi
Abstract:
In this paper, we study linearly constrained policy optimization over the manifold of Schur stabilizing controllers, equipped with a Riemannian metric that emerges naturally in the context of optimal control problems. We provide extrinsic analysis of a generic constrained smooth cost function, which subsequently facilitates subsuming any such constrained problem into this framework. By studying the second order geometry of this manifold, we provide a Newton-type algorithm that relies on neither the exponential mapping nor a retraction, while ensuring local convergence guarantees. The algorithm hinges instead upon the developed stability certificate and the linear structure of the constraints. We then apply our methodology to two well-known constrained optimal control problems. Finally, several numerical examples showcase the performance of the proposed algorithm.
△ Less
Submitted 26 October, 2023; v1 submitted 26 January, 2022;
originally announced January 2022.
-
Discrete-Time Linear-Quadratic Regulation via Optimal Transport
Authors:
Mathias Hudoba de Badyn,
Erik Miehling,
Dylan Janak,
Behçet Açıkmeşe,
Mehran Mesbahi,
Tamer Başar,
John Lygeros,
Roy S. Smith
Abstract:
In this paper, we consider a discrete-time stochastic control problem with uncertain initial and target states. We first discuss the connection between optimal transport and stochastic control problems of this form. Next, we formulate a linear-quadratic regulator problem where the initial and terminal states are distributed according to specified probability densities. A closed-form solution for the optimal transport map in the case of linear time-varying systems is derived, along with an algorithm for computing the optimal map. Two numerical examples pertaining to swarm deployment demonstrate the practical applicability of the model and the performance of the numerical method.
Submitted 6 September, 2021;
originally announced September 2021.
-
Distributed Neighbor Selection in Multi-agent Networks
Authors:
Haibin Shao,
Lulu Pan,
Mehran Mesbahi,
Yugeng Xi,
Dewei Li
Abstract:
Achieving consensus via nearest neighbor rules is an important prerequisite for multi-agent networks to accomplish collective tasks. A common assumption in the consensus setup is that each agent interacts with all its neighbors. This paper examines whether network functionality and performance can be maintained, and even enhanced, when agents interact only with a subset of their respective (available) neighbors. As shown in the paper, the answer to this inquiry is affirmative. In this direction, we show that by exploring the monotonicity property of the Laplacian eigenvectors, a neighbor selection rule with guaranteed performance enhancements can be realized for consensus-type networks. For distributed implementation, a quantitative connection between entries of Laplacian eigenvectors and the "relative rate of change" in the state between neighboring agents is further established; this connection facilitates a distributed algorithm for each agent to identify "favorable" neighbors to interact with. Multi-agent networks with and without external influence are examined, as well as extensions to signed networks. This paper underscores the utility of Laplacian eigenvectors in the context of distributed neighbor selection, providing novel insights into distributed data-driven control of multi-agent systems.
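For readers unfamiliar with the baseline that this abstract starts from, the following is a minimal sketch of a standard discrete-time nearest-neighbor consensus iteration in which every agent interacts with all of its neighbors. It is not the paper's neighbor selection rule. The path graph, the initial states, the step size `eps`, and all function names are assumptions chosen purely for illustration.

```python
# Discrete-time consensus on an undirected graph (illustrative sketch).
# Each agent moves toward its neighbors:
#   x_i <- x_i + eps * sum_j (x_j - x_i),  j ranging over neighbors of i.

def consensus_step(x, neighbors, eps):
    """Apply one synchronous nearest-neighbor update to the state list x."""
    return [
        xi + eps * sum(x[j] - xi for j in neighbors[i])
        for i, xi in enumerate(x)
    ]

def run_consensus(x0, neighbors, eps, steps):
    """Iterate the update rule for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = consensus_step(x, neighbors, eps)
    return x

# Hypothetical 4-node path graph 0 - 1 - 2 - 3, stored as adjacency lists.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x0 = [1.0, 3.0, 5.0, 7.0]

# eps < 1 / (max degree) is a sufficient condition for convergence on a
# connected undirected graph; here the max degree is 2, so eps = 0.3 works.
x_final = run_consensus(x0, neighbors, eps=0.3, steps=500)
average = sum(x0) / len(x0)
```

Because the update is symmetric across each edge, the sum of the states is preserved at every step, so on a connected graph the agents agree on the average of the initial states.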
Submitted 22 June, 2022; v1 submitted 26 July, 2021;
originally announced July 2021.
-
Cluster Consensus on Matrix-weighted Switching Networks
Authors:
Lulu Pan,
Haibin Shao,
Mehran Mesbahi,
Dewei Li,
Yugeng Xi
Abstract:
This paper examines the cluster consensus problem of multi-agent systems on matrix-weighted switching networks. Necessary and/or sufficient conditions under which cluster consensus can be achieved are obtained, and a quantitative characterization of the steady state of the cluster consensus is provided as well. Specifically, if the underlying network switches amongst a finite number of networks, a necessary condition for cluster consensus of multi-agent systems on switching matrix-weighted networks is first presented; it is shown that the steady state of the system lies in the intersection of the null spaces of the matrix-valued Laplacians corresponding to all switching networks. Second, if the underlying network switches amongst an infinite number of networks, the matrix-weighted integral network is employed to provide sufficient conditions for cluster consensus and a quantitative characterization of the corresponding steady state of the multi-agent system, using null space analysis of the matrix-valued Laplacian of the integral network associated with the switching networks. In particular, conditions for bipartite consensus under matrix-weighted switching networks are examined. Simulation results are finally provided to demonstrate the theoretical analysis.
Submitted 20 July, 2021; v1 submitted 20 July, 2021;
originally announced July 2021.
-
Data-Driven Structured Policy Iteration for Homogeneous Distributed Systems
Authors:
Siavash Alemzadeh,
Shahriar Talebi,
Mehran Mesbahi
Abstract:
Control of networked systems, comprised of interacting agents, is often achieved through modeling the underlying interactions. Constructing accurate models of such interactions--in the meantime--can become prohibitive in applications. Data-driven control methods avoid such complications by directly synthesizing a controller from the observed data. In this paper, we propose an algorithm referred to as Data-driven Structured Policy Iteration (D2SPI), for synthesizing an efficient feedback mechanism that respects the sparsity pattern induced by the underlying interaction network. In particular, our algorithm uses temporary "auxiliary" communication links in order to enable the required information exchange on a (smaller) sub-network during the "learning phase" -- links that will be removed subsequently for the final distributed feedback synthesis. We then proceed to show that the learned policy results in a stabilizing structured policy for the entire network. Our analysis is then followed by showing the stability and convergence of the proposed distributed policies throughout the learning phase, exploiting a construct referred to as the "Patterned monoid." The performance of D2SPI is then demonstrated using representative simulation scenarios.
Submitted 16 November, 2023; v1 submitted 21 March, 2021;
originally announced March 2021.
-
On Controllability and Persistency of Excitation in Data-Driven Control: Extensions of Willems' Fundamental Lemma
Authors:
Yue Yu,
Shahriar Talebi,
Henk J. van Waarde,
Ufuk Topcu,
Mehran Mesbahi,
Behçet Açıkmeşe
Abstract:
Willems' fundamental lemma asserts that all trajectories of a linear time-invariant system can be obtained from a finite number of measured ones, assuming that controllability and a persistency of excitation condition hold. We show that these two conditions can be relaxed. First, we prove that the controllability condition can be replaced by a condition on the controllable subspace, unobservable subspace, and a certain subspace associated with the measured trajectories. Second, we prove that the persistency of excitation requirement can be relaxed if the degree of a certain minimal polynomial is tightly bounded. Our results show that data-driven predictive control using online data is equivalent to model predictive control, even for uncontrollable systems. Moreover, our results significantly reduce the amount of data needed in identifying homogeneous multi-agent systems.
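The persistency of excitation condition mentioned above is commonly stated as a rank condition: a sequence is persistently exciting of order L if the Hankel matrix with L block rows built from it has full row rank. The sketch below checks that classical condition for a scalar input. It does not implement the relaxed conditions proved in the paper. The function names and the two sample input sequences are hypothetical, and exact rational arithmetic is used only to avoid floating-point rank ambiguity.

```python
from fractions import Fraction

def hankel(u, depth):
    """Hankel matrix with `depth` rows built from the scalar sequence u."""
    cols = len(u) - depth + 1
    return [[Fraction(u[i + j]) for j in range(cols)] for i in range(depth)]

def rank(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return r

def is_persistently_exciting(u, order):
    """True if the depth-`order` Hankel matrix of u has full row rank."""
    return rank(hankel(u, order)) == order

u_rich = [1, 0, 0, 1, 0]    # hypothetical input with varied content
u_const = [1, 1, 1, 1, 1]   # constant input: every Hankel row is identical
```

With these sequences, `is_persistently_exciting(u_rich, 3)` holds, while the constant input is persistently exciting of order 1 only.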
Submitted 9 April, 2021; v1 submitted 4 February, 2021;
originally announced February 2021.
-
Adaptive Traffic Control with Deep Reinforcement Learning: Towards State-of-the-art and Beyond
Authors:
Siavash Alemzadeh,
Ramin Moslemi,
Ratnesh Sharma,
Mehran Mesbahi
Abstract:
In this work, we study adaptive data-guided traffic planning and control using Reinforcement Learning (RL). We shift from the plain use of classic methods towards the state of the art in the deep RL community. We embed several recent techniques in our algorithm that improve the original Deep Q-Networks (DQN) for discrete control and discuss the traffic-related interpretations that follow. We propose a novel DQN-based algorithm for Traffic Control (called TC-DQN+) as a tool for fast and more reliable traffic decision-making. We introduce a new form of reward function, which is further discussed using illustrative examples with comparisons to traditional traffic control methods.
Submitted 21 July, 2020;
originally announced July 2020.
-
Deep Learning-based Resource Allocation for Infrastructure Resilience
Authors:
Siavash Alemzadeh,
Hesam Talebiyan,
Shahriar Talebi,
Leonardo Duenas-Osorio,
Mehran Mesbahi
Abstract:
From an optimization point of view, resource allocation is one of the cornerstones of research for addressing limiting factors commonly arising in applications such as power outages and traffic jams. In this paper, we take a data-driven approach to estimate an optimal nodal restoration sequence for immediate recovery of the infrastructure networks after natural disasters such as earthquakes. We generate data from td-INDP, a high-fidelity simulator of optimal restoration strategies for interdependent networks, and employ deep neural networks to approximate those strategies. Despite the fact that the underlying problem is NP-complete, the restoration sequences obtained by our method are observed to be nearly optimal. In addition, by training multiple models---the so-called estimators---for a variety of resource availability levels, our proposed method balances a trade-off between resource utilization and restoration time. Decision-makers can use our trained models to allocate resources more efficiently after contingencies, and in turn, improve the community resilience. Besides their predictive power, such trained estimators unravel the effect of interdependencies among different nodal functionalities in the restoration strategies. We showcase our methodology by the real-world interdependent infrastructure of Shelby County, TN.
Submitted 11 July, 2020;
originally announced July 2020.
-
Graph-theoretic optimization for edge consensus
Authors:
Mathias Hudoba de Badyn,
Dillon R. Foight,
Daniel Calderone,
Mehran Mesbahi,
Roy S. Smith
Abstract:
We consider network structures that optimize the $\mathcal{H}_2$ norm of weighted, time scaled consensus networks, under a minimal representation of such consensus networks described by the edge Laplacian. We show that a greedy algorithm can be used to find the minimum-$\mathcal{H}_2$ norm spanning tree, as well as how to choose edges to optimize the $\mathcal{H}_2$ norm when edges are added back to a spanning tree. In the case of edge consensus with a measurement model considering all edges in the graph, we show that adding edges between slow nodes in the graph provides the smallest increase in the $\mathcal{H}_2$ norm.
Submitted 29 June, 2020;
originally announced June 2020.
-
Policy Gradient-based Algorithms for Continuous-time Linear Quadratic Control
Authors:
Jingjing Bu,
Afshin Mesbahi,
Mehran Mesbahi
Abstract:
We consider the continuous-time Linear-Quadratic-Regulator (LQR) problem in terms of optimizing a real-valued matrix function over the set of feedback gains. The results developed are in parallel to those in Bu et al. [1] for discrete-time LTI systems. In this direction, we characterize several analytical properties (smoothness, coerciveness, quadratic growth) that are crucial in the analysis of gradient-based algorithms. We also point out similarities and distinctive features of the continuous-time setup in comparison with its discrete-time analogue. First, we examine three types of well-posed flows for direct policy update in LQR: gradient flow, natural gradient flow and the quasi-Newton flow. The coercive property of the corresponding cost function suggests that these flows admit unique solutions, while the gradient dominated property indicates that the underlying Lyapunov functionals decay at an exponential rate; quadratic growth, on the other hand, guarantees that the trajectories of these flows are exponentially stable in the sense of Lyapunov. We then discuss the forward Euler discretization of these flows, realized as gradient descent, natural gradient descent and quasi-Newton iteration. We present stepsize criteria for gradient descent and natural gradient descent, guaranteeing that both algorithms converge linearly to the global optima. An optimal stepsize for the quasi-Newton iteration is also proposed, guaranteeing a $Q$-quadratic convergence rate--and in the meantime--recovering the Kleinman-Newton iteration. Lastly, we examine LQR state feedback synthesis with a sparsity pattern. In this case, we develop the necessary formalism and insights for projected gradient descent, allowing us to guarantee a sublinear rate of convergence to a first-order stationary point.
Submitted 12 June, 2020;
originally announced June 2020.
-
A Note on Nesterov's Accelerated Method in Nonconvex Optimization: a Weak Estimate Sequence Approach
Authors:
Jingjing Bu,
Mehran Mesbahi
Abstract:
We present a variant of accelerated gradient descent algorithms, adapted from Nesterov's optimal first-order methods, for weakly-quasi-convex and weakly-quasi-strongly-convex functions. We show that by tweaking the so-called estimate sequence method, the derived algorithm achieves the optimal convergence rate for weakly-quasi-convex and weakly-quasi-strongly-convex functions in terms of oracle complexity. In particular, for a weakly-quasi-convex function with Lipschitz continuous gradient, we require $O(\frac{1}{\sqrt{\varepsilon}})$ iterations to acquire an $\varepsilon$-solution; for weakly-quasi-strongly-convex functions, the iteration complexity is $O\left( \ln\left(\frac{1}{\varepsilon}\right) \right)$. Furthermore, we discuss the implications of these algorithms for the linear quadratic optimal control problem.
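As background for the method this abstract adapts, here is a minimal sketch of the classical Nesterov accelerated gradient iteration on a convex quadratic. It is the textbook convex-case scheme with momentum (k - 1)/(k + 2), not the weak estimate sequence variant developed in the paper. The test function, its coefficients `a`, the starting point, and the iteration count are all assumptions made for illustration.

```python
def f(x, a):
    """Convex quadratic f(x) = 0.5 * sum_i a_i * x_i**2, minimized at x = 0."""
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

def grad(x, a):
    """Gradient of f: component i equals a_i * x_i."""
    return [ai * xi for ai, xi in zip(a, x)]

def nesterov(x0, a, steps):
    """Classical accelerated gradient with momentum (k - 1) / (k + 2)."""
    lipschitz = max(a)          # Lipschitz constant of the gradient of f
    x_prev = list(x0)
    x = list(x0)
    for k in range(1, steps + 1):
        beta = (k - 1) / (k + 2)
        # Extrapolate along the previous displacement, then take a
        # gradient step of size 1 / lipschitz from the extrapolated point.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad(y, a)
        x_prev = x
        x = [yi - gi / lipschitz for yi, gi in zip(y, g)]
    return x

a = [1.0, 10.0, 100.0]          # hypothetical curvature along each axis
x0 = [1.0, 1.0, 1.0]
x_acc = nesterov(x0, a, steps=2000)
```

For convex functions with Lipschitz gradient, this scheme is known to reduce the objective gap at a rate proportional to 1/k^2, which corresponds to the $O(1/\sqrt{\varepsilon})$ iteration count quoted in the abstract.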
Submitted 15 June, 2020;
originally announced June 2020.
-
Performance and design of consensus on matrix-weighted and time scaled graphs
Authors:
Dillon R. Foight,
Mathias Hudoba de Badyn,
Mehran Mesbahi
Abstract:
In this paper, we consider the $\mathcal{H}_2$-norm of networked systems with multi-time scale consensus dynamics and vector-valued agent states. This allows us to explore how measurement and process noise affect consensus on matrix-weighted graphs by examining edge-state consensus. In particular, we highlight an interesting case where the influences of the weighting and scaling on the $\mathcal{H}_2$ norm can be separated in the design problem. We then consider optimization algorithms for updating the time scale parameters and matrix weights in order to minimize network response to injected noise. Finally, we present an application to formation control for multi-vehicle systems.
Submitted 24 December, 2020; v1 submitted 5 June, 2020;
originally announced June 2020.
-
From noisy data to feedback controllers: non-conservative design via a matrix S-lemma
Authors:
Henk J. van Waarde,
M. Kanat Camlibel,
Mehran Mesbahi
Abstract:
We propose a new method to obtain feedback controllers of an unknown dynamical system directly from noisy input/state data. The key ingredient of our design is a new matrix S-lemma that will be proven in this paper. We provide both strict and non-strict versions of this S-lemma, which are of interest in their own right. Thereafter, we will apply these results to data-driven control. In particular, we will derive non-conservative design methods for quadratic stabilization, $\mathcal{H}_2$ and $\mathcal{H}_\infty$ control, all in terms of data-based linear matrix inequalities. In contrast to previous work, the dimensions of our decision variables are independent of the time horizon of the experiment. Our approach thus enables control design from large data sets.
Submitted 9 December, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
On Regularizability and its Application to Online Control of Unstable LTI Systems
Authors:
Shahriar Talebi,
Siavash Alemzadeh,
Niyousha Rahimi,
Mehran Mesbahi
Abstract:
Learning, say through direct policy updates, often requires assumptions such as knowing a priori that the initial policy (gain) is stabilizing, or that persistently exciting (PE) input-output data is available. In this paper, we examine online regulation of (possibly unstable) partially unknown linear systems with no prior access to either an initial stabilizing controller or PE input-output data; we instead leverage the knowledge of the input matrix for online regulation. First, we introduce and characterize the notion of "regularizability" for linear systems, which gauges the extent to which a system can be regulated in finite time, in contrast to its asymptotic behavior (commonly characterized by stabilizability/controllability). Next, having access only to the input matrix, we propose the Data-Guided Regulation (DGR) synthesis procedure that -- as its name suggests -- regulates the underlying state while also generating informative data that can subsequently be used for data-driven stabilization or system identification. We further improve the computational performance of DGR via a rank-one update and demonstrate its utility in online regulation of the X-29 aircraft.
Submitted 19 January, 2022; v1 submitted 29 May, 2020;
originally announced June 2020.
-
$\mathcal{H}_2$ performance of series-parallel networks: A compositional perspective
Authors:
Mathias Hudoba de Badyn,
Mehran Mesbahi
Abstract:
We examine the $\mathcal{H}_2$ norm of matrix-weighted leader-follower consensus on series-parallel networks. By using an extension of electrical network theory on matrix-valued resistances, voltages and currents, we show that the computation of the $\mathcal{H}_2$ norm can be performed efficiently by decomposing the network into atomic elements and composition rules. Lastly, we examine the problem of efficiently adapting the matrix-valued edge weights to optimize the $\mathcal{H}_2$ norm of the network.
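The idea of decomposing a series-parallel network into atomic elements and composition rules can be illustrated with the familiar scalar case of electrical resistances. The sketch below evaluates the effective resistance of a nested series/parallel expression. It is only a scalar analogue: the matrix-valued rules of the paper and their relation to the $\mathcal{H}_2$ norm are not reproduced here, and the tuple-based network encoding and function names are hypothetical.

```python
def series(r1, r2):
    """Two resistances in series add."""
    return r1 + r2

def parallel(r1, r2):
    """Two resistances in parallel combine as the product over the sum."""
    return (r1 * r2) / (r1 + r2)

def effective_resistance(tree):
    """Evaluate a nested (op, left, right) expression.

    Leaves are positive numbers (single-edge resistances); `op` is either
    "series" or "parallel". The recursion mirrors the decomposition of a
    series-parallel network into atomic edges.
    """
    if isinstance(tree, (int, float)):
        return float(tree)
    op, left, right = tree
    r_left = effective_resistance(left)
    r_right = effective_resistance(right)
    if op == "series":
        return series(r_left, r_right)
    if op == "parallel":
        return parallel(r_left, r_right)
    raise ValueError(f"unknown composition rule: {op!r}")

# Hypothetical network: a 1-ohm edge in series with two parallel 2-ohm edges.
network = ("series", 1.0, ("parallel", 2.0, 2.0))
r_eff = effective_resistance(network)
```

Each composition touches every edge once, so the evaluation cost grows linearly with the number of edges rather than requiring a factorization of the whole network.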
Submitted 24 December, 2020; v1 submitted 11 March, 2020;
originally announced March 2020.
-
Global Convergence of Policy Gradient Algorithms for Indefinite Least Squares Stationary Optimal Control
Authors:
Jingjing Bu,
Mehran Mesbahi
Abstract:
We consider policy gradient algorithms for the indefinite least squares stationary optimal control, e.g., linear-quadratic-regulator (LQR) with indefinite state and input penalization matrices. Such a setup has important applications in control design with conflicting objectives, such as linear quadratic dynamic games. We show the global convergence of gradient, natural gradient and quasi-Newton policies for this class of indefinite least squares problems.
Submitted 10 February, 2020;
originally announced February 2020.
-
Consensus on Matrix-weighted Time-varying Networks
Authors:
Lulu Pan,
Haibin Shao,
Mehran Mesbahi,
Yugeng Xi,
Dewei Li
Abstract:
This paper examines the consensus problem on time-varying matrix-weighted undirected networks. First, we introduce the matrix-weighted integral network for the analysis of such networks. Under mild assumptions on the switching pattern of the time-varying network, necessary and/or sufficient conditions under which average consensus can be achieved are then provided in terms of the null space of the matrix-valued Laplacian of the corresponding integral network. In particular, for periodic matrix-weighted time-varying networks, necessary and sufficient conditions for reaching average consensus are obtained from an algebraic perspective. Moreover, we show that if the integral network with period $T>0$ has a positive spanning tree over the time span $[0,T)$, average consensus for the node states is achieved. Simulation results are provided to demonstrate the theoretical analysis.
Submitted 30 January, 2020;
originally announced January 2020.
-
On the Controllability of Matrix-weighted Networks
Authors:
Lulu Pan,
Haibin Shao,
Mehran Mesbahi,
Yugeng Xi,
Dewei Li
Abstract:
This letter examines the controllability of consensus dynamics on matrix-weighted networks from a graph-theoretic perspective. Unlike scalar-weighted networks, the rank of the weight matrices introduces additional intricacies into characterizing the dimension of the controllable subspace for such networks. Specifically, we investigate how the definiteness of weight matrices influences the dimension of the controllable subspace. In this direction, graph-theoretic characterizations of the lower and upper bounds on the dimension of the controllable subspace are provided by employing, respectively, distance partition and almost equitable partition of matrix-weighted networks. Furthermore, the structure of an uncontrollable input for such networks is examined. Examples are then provided to demonstrate the theoretical results.
Submitted 12 January, 2020;
originally announced January 2020.
-
Data-driven parameterizations of suboptimal LQR and H2 controllers
Authors:
Henk J. van Waarde,
Mehran Mesbahi
Abstract:
In this paper we design suboptimal control laws for an unknown linear system on the basis of measured data. We focus on the suboptimal linear quadratic regulator problem and the suboptimal H2 control problem. For both problems, we establish conditions under which a given data set contains sufficient information for controller design. We follow up by providing a data-driven parameterization of all suboptimal controllers. We will illustrate our results by numerical simulations, which will reveal an interesting trade-off between the number of collected data samples and the achieved controller performance.
Submitted 7 May, 2020; v1 submitted 16 December, 2019;
originally announced December 2019.
-
Global Convergence of Policy Gradient for Sequential Zero-Sum Linear Quadratic Dynamic Games
Authors:
Jingjing Bu,
Lillian J. Ratliff,
Mehran Mesbahi
Abstract:
We propose projection-free sequential algorithms for linear-quadratic dynamic games. These policy gradient based algorithms are akin to the Stackelberg leadership model and can be extended to model-free settings. We show that if the leader performs natural gradient descent/ascent, then the proposed algorithm has a global sublinear convergence to the Nash equilibrium. Moreover, if the leader adopts a quasi-Newton policy, the algorithm enjoys a global quadratic convergence. Along the way, we examine and clarify the intricacies of adopting sequential policy updates for LQ games, namely, issues pertaining to stabilization, indefinite cost structure, and circumventing projection steps.
Submitted 11 November, 2019;
originally announced November 2019.
-
Time scale design for network resilience
Authors:
Dillon R. Foight,
Mathias Hudoba de Badyn,
Mehran Mesbahi
Abstract:
In this paper we consider the $\mathcal{H}_2$-norm of networked systems with multi-time scale consensus dynamics. We develop a general framework for such systems that allows for edge weighting, independent agent-based time scales, as well as measurement and process noise. From this general system description, we highlight an interesting case where the influences of the weighting and scaling can be separated in the design problem. We then consider the design of the time scale parameters for minimizing the $\mathcal{H}_2$-norm for the purpose of network resilience.
Submitted 17 September, 2019;
originally announced September 2019.
-
Augmented State Feedback for Improving Observability of Linear Systems with Nonlinear Measurements
Authors:
Atiye Alaeddini,
Kristi A. Morgansen,
Mehran Mesbahi
Abstract:
This paper is concerned with the design of an augmented state feedback controller for finite-dimensional linear systems with nonlinear observation dynamics. Most of the theoretical results in the area of (optimal) feedback design are based on the assumption that the state is available for measurement. In this paper, we focus on finding a feedback control that avoids state trajectories with undesir…
▽ More
This paper is concerned with the design of an augmented state feedback controller for finite-dimensional linear systems with nonlinear observation dynamics. Most of the theoretical results in the area of (optimal) feedback design are based on the assumption that the state is available for measurement. In this paper, we focus on finding a feedback control that avoids state trajectories with undesirable observability properties. In particular, we introduce an optimal control problem that specifically considers an index of observability in the control synthesis. The resulting cost functional is a combination of LQR-like quadratic terms and an index of observability. The main contribution of the paper is presenting a control synthesis procedure that on one hand, provides closed loop asymptotic stability, and addresses the observability of the system--as a transient performance criteria--on the other.
Submitted 29 August, 2019;
originally announced August 2019.
-
Strong Structural Controllability of Signed Networks
Authors:
Shima Sadat Mousavi,
Mohammad Haeri,
Mehran Mesbahi
Abstract:
In this paper, we discuss the controllability of a family of linear time-invariant (LTI) networks defined on a signed graph. In this direction, we introduce the notion of positive and negative signed zero forcing sets for the controllability analysis of positive and negative eigenvalues of system matrices with the same sign pattern. A sufficient combinatorial condition that ensures the strong structural controllability of signed networks is then proposed. Moreover, an upper bound on the maximum multiplicity of positive and negative eigenvalues associated with a signed graph is provided.
Submitted 10 October, 2019; v1 submitted 15 August, 2019;
originally announced August 2019.
-
LQR through the Lens of First Order Methods: Discrete-time Case
Authors:
Jingjing Bu,
Afshin Mesbahi,
Maryam Fazel,
Mehran Mesbahi
Abstract:
We consider the Linear-Quadratic-Regulator (LQR) problem in terms of optimizing a real-valued matrix function over the set of feedback gains. Such a setup facilitates examining the implications of a natural initial-state independent formulation of LQR in designing first order algorithms. It is shown that this cost function is smooth and coercive, and we provide an alternate means of noting its gradient dominated property. In the process, we provide a number of analytic observations on the LQR cost when directly analyzed in terms of the feedback gain. We then examine three types of well-posed flows for LQR: gradient flow, natural gradient flow and the quasi-Newton flow. The coercive property suggests that these flows admit unique solutions, while the gradient dominated property indicates that the corresponding Lyapunov functionals decay at an exponential rate; we also prove that these flows are exponentially stable in the sense of Lyapunov. We then discuss the forward Euler discretization of these flows, realized as gradient descent, natural gradient descent and the quasi-Newton iteration. We present stepsize criteria for gradient descent and natural gradient descent, guaranteeing that both algorithms converge linearly to the global optima. An optimal stepsize for the quasi-Newton iteration is also proposed, guaranteeing a $Q$-quadratic convergence rate--and in the meantime--recovering the Hewer algorithm.
Submitted 29 July, 2019; v1 submitted 21 July, 2019;
originally announced July 2019.
-
Fast Trajectory Optimization via Successive Convexification for Spacecraft Rendezvous with Integer Constraints
Authors:
Danylo Malyuta,
Taylor P. Reynolds,
Michael Szmuk,
Behcet Acikmese,
Mehran Mesbahi
Abstract:
In this paper we present a fast method based on successive convexification for generating fuel-optimized spacecraft rendezvous trajectories in the presence of mixed-integer constraints. A recently developed paradigm of state-triggered constraints allows us to efficiently embed a subset of discrete decision constraints into the continuous optimization framework of successive convexification. As a result, we are able to solve difficult trajectory optimization problems at interactive speeds, as opposed to a mixed-integer programming approach that would require significantly more solution time and computing power. Our method is applied to the real problem of transposition and docking of the Apollo command and service module with the lunar module. We demonstrate that, within seconds, we are able to obtain trajectories that are up to 90 percent more fuel efficient (saving up to 45 kg of fuel) than non-optimization based Apollo-era design targets. Our trajectories take explicit account of minimum thrust pulse width and plume impingement constraints. Both of these constraints are naturally mixed-integer, but we handle them as state-triggered constraints. In its current state, our algorithm will serve as a useful off-line design tool for rapid trajectory trade studies.
Submitted 11 June, 2019;
originally announced June 2019.
-
Strong Structural Controllability of Networks under Time-Invariant and Time-Varying Topological Perturbations
Authors:
Shima Sadat Mousavi,
Mohammad Haeri,
Mehran Mesbahi
Abstract:
This paper investigates the robustness of strong structural controllability for linear time-invariant and linear time-varying directed networks with respect to structural perturbations, including edge deletions and additions. In this direction, we introduce a new construct referred to as a perfect graph associated with a network with a given set of control nodes. Tight upper bounds on the number of edges that can be added to, or removed from, a network while ensuring strong structural controllability are then derived. Moreover, we obtain a characterization of critical edge-sets, the maximal sets of edges any subset of which can be respectively added to, or removed from, a network while preserving strong structural controllability. In addition, procedures for combining networks to obtain strongly structurally controllable network-of-networks are proposed. Finally, controllability conditions are proposed for networks whose edge weights, as well as their structures, can vary over time.
Submitted 21 May, 2020; v1 submitted 22 April, 2019;
originally announced April 2019.
-
Dual Quaternion Based Powered Descent Guidance with State-Triggered Constraints
Authors:
Taylor P. Reynolds,
Michael Szmuk,
Danylo Malyuta,
Mehran Mesbahi,
Behcet Acikmese,
John M. Carson III
Abstract:
This paper presents a numerical algorithm for computing 6-degree-of-freedom free-final-time powered descent guidance trajectories. The trajectory generation problem is formulated using a unit dual quaternion representation of the rigid body dynamics, and several standard path constraints. Our formulation also includes a special line of sight constraint that is enforced only within a specified band of slant ranges relative to the landing site, a novel feature that is especially relevant to Terrain and Hazard Relative Navigation. We use the newly introduced state-triggered constraints to formulate these range constraints in a manner that is amenable to real-time implementations. The resulting non-convex optimal control problem is solved iteratively as a sequence of convex second-order cone programs that locally approximate the non-convex problem. Each second-order cone program is solved using a customizable interior point method solver. Also introduced are a scaling method and a new heuristic technique that guide the convergence process towards dynamic feasibility. To demonstrate the capabilities of our algorithm, two numerical case studies are presented. The first studies the effect of including a slant-range-triggered line of sight constraint on the resulting trajectories. The second study performs a Monte Carlo analysis to assess the algorithm's robustness to initial conditions and real-time performance.
Submitted 19 April, 2019;
originally announced April 2019.
-
On Topological Properties of the Set of Stabilizing Feedback Gains
Authors:
Jingjing Bu,
Afshin Mesbahi,
Mehran Mesbahi
Abstract:
This work presents a fairly complete account of various topological and metrical aspects of feedback stabilization for single-input-single-output (SISO) continuous and discrete time linear-time-invariant (LTI) systems. In particular, we prove that the set of stabilizing output feedback gains for a SISO system with $n$ states has at most $\lceil{\frac{n}{2}}\rceil$ connected components. Furthermore, our analysis yields an algorithm for determining intervals of stabilizing gains for general continuous and discrete LTI systems; the proposed algorithm also computes the number of unstable roots in each unstable interval. Along the way, we also make a number of observations on the set of stabilizing state feedback gains for MIMO systems.
Submitted 17 April, 2019;
originally announced April 2019.
-
Nonlinear Observability via Koopman Analysis: Characterizing the Role of Symmetry
Authors:
Afshin Mesbahi,
Jingjing Bu,
Mehran Mesbahi
Abstract:
This paper considers the observability of nonlinear systems from a Koopman operator theoretic perspective--and in particular--the effect of symmetry on observability. We first examine an infinite-dimensional linear system (constructed using independent Koopman eigenfunctions) such that its observability is equivalent to the observability of the original nonlinear system. Next, we derive an analytic relation between symmetry and nonlinear observability; it is shown that symmetry in the nonlinear dynamics is reflected in the symmetry of the corresponding Koopman eigenfunctions, as well as the presence of repeated Koopman eigenvalues. We then proceed to show that the loss of observability in symmetric nonlinear systems can be traced back to the presence of these repeated eigenvalues. In the case where we have a sufficient number of measurements, the nonlinear system remains unobservable when these functions have symmetries that mirror those of the dynamics. The proposed observability framework provides insights into the minimum number of measurements needed to make an unobservable nonlinear system observable. The proposed results are then applied to a network of nano-electromechanical oscillators coupled via a symmetric interaction topology.
Submitted 10 February, 2020; v1 submitted 17 April, 2019;
originally announced April 2019.
-
On Topological and Metrical Properties of Stabilizing Feedback Gains: the MIMO Case
Authors:
Jingjing Bu,
Afshin Mesbahi,
Mehran Mesbahi
Abstract:
In this paper, we discuss various topological and metrical aspects of the set of stabilizing static feedback gains for multiple-input-multiple-output (MIMO) linear-time-invariant (LTI) systems, in both continuous and discrete time. Recently, connectivity properties of this set (for continuous time) have been reported in the literature, along with a discussion on how this connectivity is affected by restricting the feedback gain to linear subspaces. We show that, analogous to the continuous-time case, one can construct instances where the set of stabilizing feedback gains for discrete-time LTI systems has exponentially many connected components.
Submitted 4 April, 2019;
originally announced April 2019.
-
Efficient Computation of H2 Performance on Series-Parallel Networks
Authors:
Mathias Hudoba de Badyn,
Mehran Mesbahi
Abstract:
Series-parallel networks are a class of graphs on which many NP-hard problems have tractable solutions. In this paper, we examine performance measures on leader-follower consensus on series-parallel networks. We show that a distributed computation of the $\mathcal{H}_2$ norm can be done efficiently on this system by exploiting a decomposition of the network into atomic elements and composition rules. Lastly, we examine the problem of adaptively re-weighting the network to optimize the $\mathcal{H}_2$ norm, and show that it can be done with similar complexity.
Submitted 25 April, 2020; v1 submitted 13 March, 2019;
originally announced March 2019.
-
Successive Convexification for 6-DoF Powered Descent Guidance with Compound State-Triggered Constraints
Authors:
Michael Szmuk,
Taylor P. Reynolds,
Behcet Acikmese,
Mehran Mesbahi,
John M. Carson III
Abstract:
This paper introduces a continuous formulation for compound state-triggered constraints, which are generalizations of the recently introduced state-triggered constraints. State-triggered constraints are different from ordinary constraints found in optimal control in that they use a state-dependent trigger condition to enable or disable a constraint condition, and can be expressed as continuous functions that are readily handled by successive convexification. Compound state-triggered constraints go a step further, giving designers the ability to compose trigger and constraint conditions using Boolean "and" and "or" operations. Simulations of the 6-degree-of-freedom (DoF) powered descent guidance problem obtained using successive convexification are presented to illustrate the utility of state-triggered and compound state-triggered constraints. The examples employ a velocity-triggered angle of attack constraint to alleviate aerodynamic loads, and a collision avoidance constraint to avoid large geological formations. In particular, the velocity-triggered angle of attack constraint demonstrates the ability of state-triggered constraints to introduce new constraint phases to the solution without resorting to combinatorial techniques.
Submitted 8 January, 2019;
originally announced January 2019.
-
Distributed Q-Learning for Dynamically Decoupled Systems
Authors:
Siavash Alemzadeh,
Mehran Mesbahi
Abstract:
Control of large-scale networked systems often necessitates the availability of complex models for the interactions amongst the agents. However, in many applications, building accurate models of agents or interactions amongst them might be infeasible or computationally prohibitive due to the curse of dimensionality or the complexity of these interactions. In the meantime, data-guided control methods can circumvent model complexity by directly synthesizing the controller from the observed data. In this paper, we propose a distributed Q-learning algorithm to design a feedback mechanism based on a given underlying graph structure parameterizing the agents' interaction network. We assume that the distributed nature of the system arises from the cost function of the corresponding control problem and show that for the specific case of identical dynamically decoupled systems, the learned controller converges to the optimal Linear Quadratic Regulator (LQR) controller for each subsystem. We provide a convergence analysis and verify the result with an example.
Submitted 19 March, 2019; v1 submitted 24 September, 2018;
originally announced September 2018.