Nonconvex Linear System Identification with Minimal State Representation

Authors: Uday Kiran Reddy Tadipatri, Benjamin D. Haeffele, Joshua Agterberg, Ingvar Ziemann, René Vidal

Abstract: Low-order linear System IDentification (SysID) addresses the challenge of estimating the parameters of a linear dynamical system from finite samples of observations and control inputs with minimal state representation. Traditional approaches often utilize Hankel-rank minimization, which relies on convex relaxations that can require numerous, costly singular value decompositions (SVDs) to optimize. In this work, we propose two nonconvex reformulations to tackle low-order SysID: (i) Burer-Monteiro (BM) factorization of the Hankel matrix for efficient nuclear norm minimization, and (ii) optimizing directly over system parameters for real, diagonalizable systems with an atomic norm style decomposition. These reformulations circumvent the need for repeated heavy SVD computations, significantly improving computational efficiency. Moreover, we prove that optimizing directly over the system parameters yields lower statistical error rates, and lower sample complexities that do not scale linearly with trajectory length like in Hankel-nuclear norm minimization. Additionally, while our proposed formulations are nonconvex, we provide theoretical guarantees of achieving global optimality in polynomial time. Finally, we demonstrate algorithms that solve these nonconvex programs and validate our theoretical claims on synthetic data.

Submitted 3 June, 2025; v1 submitted 26 April, 2025; originally announced April 2025.

Comments: Accepted to the 7th Annual Conference on Learning for Dynamics and Control (L4DC) 2025. The full version including appendix
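
To illustrate why Hankel rank serves as a proxy for the minimal state dimension, the following is a minimal NumPy sketch, not the authors' algorithm. It assumes a hypothetical order-3 single-input single-output system with a real diagonal state matrix; the helper names `markov_parameters` and `hankel_from_sequence` are invented for this example. It builds the Hankel matrix of the impulse response and counts its numerically nonzero singular values.

```python
import numpy as np

def markov_parameters(A, B, C, horizon):
    """Return the impulse response [C B, C A B, ..., C A^(horizon-1) B] of a SISO system."""
    params = []
    Ak = np.eye(A.shape[0])
    for _ in range(horizon):
        params.append((C @ Ak @ B).item())
        Ak = Ak @ A
    return np.array(params)

def hankel_from_sequence(h, rows):
    """Arrange a sequence h into a Hankel matrix with entry (i, j) equal to h[i + j]."""
    cols = len(h) - rows + 1
    return np.array([[h[i + j] for j in range(cols)] for i in range(rows)])

# Hypothetical order-3 system with real, distinct eigenvalues (assumption for illustration).
A = np.diag([0.9, 0.5, -0.3])
B = np.ones((3, 1))
C = np.ones((1, 3))

h = markov_parameters(A, B, C, horizon=19)
H = hankel_from_sequence(h, rows=10)  # 10 x 10 Hankel matrix

# The numerical rank of H matches the state dimension of this minimal realization.
singular_values = np.linalg.svd(H, compute_uv=False)
hankel_rank = int(np.sum(singular_values > 1e-8 * singular_values[0]))
print("numerical Hankel rank:", hankel_rank)
```

Note that this rank check itself uses one SVD; the cost the abstract refers to comes from repeating such decompositions inside an iterative convex solver.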

arXiv:2501.16639 [pdf, ps, other]

Finite Sample Analysis of Subspace Identification Methods

Authors: Jiabao He, Ingvar Ziemann, Cristian R. Rojas, S. Joe Qin, Håkan Hjalmarsson

Abstract: As one of the mainstream approaches in system identification, subspace identification methods (SIMs) are known for their simple parameterization for MIMO systems and robust numerical properties. However, a comprehensive statistical analysis of SIMs remains an open problem. Amid renewed focus on identifying state-space models in the non-asymptotic regime, this work presents a finite sample analysis for a large class of open-loop SIMs. It establishes high-probability upper bounds for system matrices obtained via SIMs, and reveals that convergence rates for estimating Markov parameters and system matrices are $\mathcal{O}(1/\sqrt{N})$ up to logarithmic terms, in line with classical asymptotic results. Following the key steps of SIMs, we arrive at the above results by a three-step procedure. In Step 1, we begin with a parsimonious SIM (PARSIM) that uses least-squares regression to estimate multiple high-order ARX models in parallel. Leveraging a recent analysis of an individual ARX model, we obtain a union error bound for a bank of ARX models. Step 2 involves model reduction via weighted singular value decomposition (SVD), where we consider different data-dependent weighting matrices and use robustness results for SVD to obtain error bounds on extended controllability and observability matrices, respectively. The final Step 3 focuses on deriving error bounds for system matrices, where two different realization algorithms, the MOESP type and the Larimore type, are considered. Although our study initially focuses on PARSIM, the methodologies apply broadly across many variants of SIMs.

Submitted 17 January, 2025; originally announced January 2025.
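
As a rough illustration of the least-squares regression underlying Step 1, here is a minimal sketch that fits a single first-order ARX model. This is not PARSIM itself: it does not build the bank of high-order models, nor the SVD and realization steps. The coefficients `a_true` and `b_true`, the noise level, and the sample size are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical first-order ARX model: y[t] = a*y[t-1] + b*u[t-1] + e[t].
a_true, b_true = 0.7, 1.5
N = 4000
u = rng.standard_normal(N)        # white-noise input (open-loop experiment)
e = 0.1 * rng.standard_normal(N)  # small measurement/process noise
y = np.zeros(N)
for t in range(1, N):
    y[t] = a_true * y[t - 1] + b_true * u[t - 1] + e[t]

# Stack the regressors [y[t-1], u[t-1]] and solve the least-squares problem.
Phi = np.column_stack([y[:-1], u[:-1]])
theta_hat, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
a_hat, b_hat = theta_hat
print("estimated (a, b):", a_hat, b_hat)
```

Repeating the experiment with larger `N` should shrink the estimation error roughly like $1/\sqrt{N}$, which is the rate the abstract reports up to logarithmic terms.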

arXiv:2410.11227 [pdf, other]

Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples

Authors: Thomas T. Zhang, Bruce D. Lee, Ingvar Ziemann, George J. Pappas, Nikolai Matni

Abstract: A driving force behind the diverse applicability of modern machine learning is the ability to extract meaningful features across many sources. However, many practical domains involve data that are non-identically distributed across sources, and statistically dependent within its source, violating vital assumptions in existing theoretical studies. Toward addressing these issues, we establish statistical guarantees for learning general $\textit{nonlinear}$ representations from multiple data sources that admit different input distributions and possibly dependent data. Specifically, we study the sample-complexity of learning $T+1$ functions $f_\star^{(t)} \circ g_\star$ from a function class $\mathcal F \times \mathcal G$, where $f_\star^{(t)}$ are task specific linear functions and $g_\star$ is a shared nonlinear representation. A representation $\hat g$ is estimated using $N$ samples from each of $T$ source tasks, and a fine-tuning function $\hat f^{(0)}$ is fit using $N'$ samples from a target task passed through $\hat g$. We show that when $N \gtrsim C_{\mathrm{dep}} (\mathrm{dim}(\mathcal F) + \mathrm{C}(\mathcal G)/T)$, the excess risk of $\hat f^{(0)} \circ \hat g$ on the target task decays as $\nu_{\mathrm{div}} \big(\frac{\mathrm{dim}(\mathcal F)}{N'} + \frac{\mathrm{C}(\mathcal G)}{N T} \big)$, where $C_{\mathrm{dep}}$ denotes the effect of data dependency, $\nu_{\mathrm{div}}$ denotes an (estimatable) measure of $\textit{task-diversity}$ between the source and target tasks, and $\mathrm C(\mathcal G)$ denotes the complexity of the representation class $\mathcal G$. In particular, our analysis reveals: as the number of tasks $T$ increases, both the sample requirement and risk bound converge to that of $r$-dimensional regression as if $g_\star$ had been given, and the effect of dependency only enters the sample requirement, leaving the risk bound matching the iid setting.

Submitted 14 October, 2024; originally announced October 2024.

Comments: Appeared at ICML 2024

arXiv:2409.13421 [pdf, other]

State space models, emergence, and ergodicity: How many parameters are needed for stable predictions?

Authors: Ingvar Ziemann, Nikolai Matni, George J. Pappas

Abstract: How many parameters are required for a model to execute a given task? It has been argued that large language models, pre-trained via self-supervised learning, exhibit emergent capabilities such as multi-step reasoning as their number of parameters reaches a critical scale. In the present work, we explore whether this phenomenon can analogously be replicated in a simple theoretical model. We show that the problem of learning linear dynamical systems -- a simple instance of self-supervised learning -- exhibits a corresponding phase transition. Namely, for every non-ergodic linear system there exists a critical threshold such that a learner using fewer parameters than said threshold cannot achieve bounded error for large sequence lengths. Put differently, in our model we find that tasks exhibiting substantial long-range correlation require a certain critical number of parameters -- a phenomenon akin to emergence. We also investigate the role of the learner's parametrization and consider a simple version of a linear dynamical system with hidden state -- an imperfectly observed random walk in $\mathbb{R}$. For this situation, we show that there exists no learner using a linear filter which can successfully learn the random walk unless the filter length exceeds a certain threshold depending on the effective memory length and horizon of the problem.

Submitted 20 September, 2024; originally announced September 2024.

arXiv:2409.06437 [pdf, ps, other]

A Short Information-Theoretic Analysis of Linear Auto-Regressive Learning

Authors: Ingvar Ziemann

Abstract: In this note, we give a short information-theoretic proof of the consistency of the Gaussian maximum likelihood estimator in linear auto-regressive models. Our proof yields nearly optimal non-asymptotic rates for parameter recovery and works without any invocation of stability in the case of finite hypothesis classes.

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2404.17331 [pdf, ps, other]

Finite Sample Analysis for a Class of Subspace Identification Methods

Authors: Jiabao He, Ingvar Ziemann, Cristian R. Rojas, Håkan Hjalmarsson

Abstract: While subspace identification methods (SIMs) are appealing due to their simple parameterization for MIMO systems and robust numerical realizations, a comprehensive statistical analysis of SIMs remains an open problem, especially in the non-asymptotic regime. In this work, we provide a finite sample analysis for a class of SIMs, which reveals that the convergence rates for estimating Markov parameters and system matrices are $\mathcal{O}(1/\sqrt{N})$, in line with classical asymptotic results. Based on the observation that the model format in classical SIMs becomes non-causal because of a projection step, we choose a parsimonious SIM that bypasses the projection step and strictly enforces a causal model to facilitate the analysis, where a bank of ARX models are estimated in parallel. Leveraging recent results from finite sample analysis of an individual ARX model, we obtain an overall error bound of an array of ARX models and proceed to derive error bounds for system matrices via robustness results for the singular value decomposition.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.09030 [pdf, other]

Active Learning for Control-Oriented Identification of Nonlinear Systems

Authors: Bruce D. Lee, Ingvar Ziemann, George J. Pappas, Nikolai Matni

Abstract: Model-based reinforcement learning is an effective approach for controlling an unknown system. It is based on a longstanding pipeline familiar to the control community in which one performs experiments on the environment to collect a dataset, uses the resulting dataset to identify a model of the system, and finally performs control synthesis using the identified model. As interacting with the system may be costly and time consuming, targeted exploration is crucial for developing an effective control-oriented model with minimal experimentation. Motivated by this challenge, recent work has begun to study finite sample data requirements and sample efficient algorithms for the problem of optimal exploration in model-based reinforcement learning. However, existing theory and algorithms are limited to model classes which are linear in the parameters. Our work instead focuses on models with nonlinear parameter dependencies, and presents the first finite sample analysis of an active learning algorithm suitable for a general class of nonlinear dynamics. In certain settings, the excess control cost of our algorithm achieves the optimal rate, up to logarithmic factors. We validate our approach in simulation, showcasing the advantage of active, control-oriented exploration for controlling nonlinear systems.

Submitted 13 August, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.07937 [pdf, ps, other]

Rate-Optimal Non-Asymptotics for the Quadratic Prediction Error Method

Authors: Charis Stamouli, Ingvar Ziemann, George J. Pappas

Abstract: We study the quadratic prediction error method -- i.e., nonlinear least squares -- for a class of time-varying parametric predictor models satisfying a certain identifiability condition. While this method is known to asymptotically achieve the optimal rate for a wide range of problems, there have been no non-asymptotic results matching these optimal rates outside of a select few, typically linear, model classes. By leveraging modern tools from learning with dependent data, we provide the first rate-optimal non-asymptotic analysis of this method for our more general setting of nonlinearly parametrized model classes. Moreover, we show that our results can be applied to a particular class of identifiable AutoRegressive Moving Average (ARMA) models, resulting in the first optimal non-asymptotic rates for identification of ARMA models.

Submitted 15 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: 38 pages, added acknowledgements

arXiv:2309.03873 [pdf, ps, other]

A Tutorial on the Non-Asymptotic Theory of System Identification

Authors: Ingvar Ziemann, Anastasios Tsiamis, Bruce Lee, Yassir Jedra, Nikolai Matni, George J. Pappas

Abstract: This tutorial serves as an introduction to recently developed non-asymptotic methods in the theory of -- mainly linear -- system identification. We emphasize tools we deem particularly useful for a range of problems in this domain, such as the covering technique, the Hanson-Wright Inequality and the method of self-normalized martingales. We then employ these tools to give streamlined proofs of the performance of various least-squares based estimators for identifying the parameters in autoregressive models. We conclude by sketching out how the ideas presented herein can be extended to certain nonlinear identification problems.

Submitted 16 June, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

arXiv:2303.15637 [pdf, other]

The Fundamental Limitations of Learning Linear-Quadratic Regulators

Authors: Bruce D. Lee, Ingvar Ziemann, Anastasios Tsiamis, Henrik Sandberg, Nikolai Matni

Abstract: We present a local minimax lower bound on the excess cost of designing a linear-quadratic controller from offline data. The bound is valid for any offline exploration policy that consists of a stabilizing controller and an energy bounded exploratory input. The derivation leverages a relaxation of the minimax estimation problem to Bayesian estimation, and an application of Van Trees' inequality. We show that the bound aligns with system-theoretic intuition. In particular, we demonstrate that the lower bound increases when the optimal control objective value increases. We also show that the lower bound increases when the system is poorly excitable, as characterized by the spectrum of the controllability gramian of the system mapping the noise to the state and the $\mathcal{H}_\infty$ norm of the system mapping the input to the state. We further show that for some classes of systems, the lower bound may be exponential in the state dimension, demonstrating exponential sample complexity for learning the linear-quadratic regulator offline.

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2212.09508 [pdf, ps, other]

A note on the smallest eigenvalue of the empirical covariance of causal Gaussian processes

Authors: Ingvar Ziemann

Abstract: We present a simple proof for bounding the smallest eigenvalue of the empirical covariance in a causal Gaussian process. Along the way, we establish a one-sided tail inequality for Gaussian quadratic forms using a causal decomposition. Our proof only uses elementary facts about the Gaussian distribution and the union bound. We conclude with an example in which we provide a performance guarantee for least squares identification of a vector autoregression.

Submitted 27 October, 2023; v1 submitted 19 December, 2022; originally announced December 2022.
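
The two quantities this abstract connects, the smallest eigenvalue of the empirical covariance and the least-squares estimate of a vector autoregression, can be computed as in the sketch below. It is an illustration under assumed values, not the note's proof or its example: the matrix `A_true`, the unit-variance Gaussian noise, and the trajectory length `T` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stable VAR(1): x[t+1] = A_true @ x[t] + w[t], with w[t] ~ N(0, I).
A_true = np.array([[0.6, 0.2],
                   [0.0, 0.5]])
T = 5000
X = np.zeros((T + 1, 2))
for t in range(T):
    X[t + 1] = A_true @ X[t] + rng.standard_normal(2)

past, future = X[:-1], X[1:]

# Smallest eigenvalue of the empirical covariance (1/T) * sum_t x[t] x[t]^T.
emp_cov = past.T @ past / T
lambda_min = np.linalg.eigvalsh(emp_cov)[0]  # eigvalsh returns ascending eigenvalues

# Least squares: lstsq solves past @ M ~ future, so the estimate of A is M transposed.
A_hat = np.linalg.lstsq(past, future, rcond=None)[0].T
est_error = np.linalg.norm(A_hat - A_true, 2)
print("lambda_min:", lambda_min, "spectral-norm error:", est_error)
```

A lower bound on `lambda_min` is what keeps the least-squares problem well conditioned, which is why such eigenvalue bounds translate into guarantees on `est_error`.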

arXiv:2209.05423 [pdf, other]

Statistical Learning Theory for Control: A Finite Sample Perspective

Authors: Anastasios Tsiamis, Ingvar Ziemann, Nikolai Matni, George J. Pappas

Abstract: This tutorial survey provides an overview of recent non-asymptotic advances in statistical learning theory as relevant to control and system identification. While there has been substantial progress across all areas of control, the theory is most well-developed when it comes to linear system identification and learning for the linear quadratic regulator, which are the focus of this manuscript. From a theoretical perspective, much of the labor underlying these advances has been in adapting tools from modern high-dimensional statistics and learning theory. While highly relevant to control theorists interested in integrating tools from machine learning, the foundational material has not always been easily accessible. To remedy this, we provide a self-contained presentation of the relevant material, outlining all the key ideas and the technical machinery that underpin recent results. We also present a number of open problems and future directions.

Submitted 27 April, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

Comments: Survey Paper, Submitted to Control Systems Magazine. Second version contains additional motivation for finite sample statistics and more detailed comparison with classical literature

arXiv:2205.14035 [pdf, ps, other]

Learning to Control Linear Systems can be Hard

Authors: Anastasios Tsiamis, Ingvar Ziemann, Manfred Morari, Nikolai Matni, George J. Pappas

Abstract: In this paper, we study the statistical difficulty of learning to control linear systems. We focus on two standard benchmarks, the sample complexity of stabilization, and the regret of the online learning of the Linear Quadratic Regulator (LQR). Prior results state that the statistical difficulty for both benchmarks scales polynomially with the system state dimension up to system-theoretic quantities. However, this does not reveal the whole picture. By utilizing minimax lower bounds for both benchmarks, we prove that there exist non-trivial classes of systems for which learning complexity scales dramatically, i.e. exponentially, with the system dimension. This situation arises in the case of underactuated systems, i.e. systems with fewer inputs than states. Such systems are structurally difficult to control and their system theoretic quantities can scale exponentially with the system dimension dominating learning complexity. Under some additional structural assumptions (bounding systems away from uncontrollability), we provide qualitatively matching upper bounds. We prove that learning complexity can be at most exponential with the controllability index of the system, that is the degree of underactuation.

Submitted 27 May, 2022; originally announced May 2022.

Comments: Accepted to COLT 2022

Showing 1–13 of 13 results for author: Ziemann, I