
Safe exploration in reproducing kernel Hilbert spaces

Authors: Abdullah Tokmak, Kiran G. Krishnan, Thomas B. Schön, Dominik Baumann

Abstract: Popular safe Bayesian optimization (BO) algorithms learn control policies for safety-critical systems in unknown environments. However, most algorithms make a smoothness assumption, which is encoded by a known bounded norm in a reproducing kernel Hilbert space (RKHS). The RKHS is a potentially infinite-dimensional space, and it remains unclear how to reliably obtain the RKHS norm of an unknown function. In this work, we propose a safe BO algorithm capable of estimating the RKHS norm from data. We provide statistical guarantees on the RKHS norm estimation, integrate the estimated RKHS norm into existing confidence intervals and show that we retain theoretical guarantees, and prove safety of the resulting safe BO algorithm. We apply our algorithm to safely optimize reinforcement learning policies on physics simulators and on a real inverted pendulum, demonstrating improved performance, safety, and scalability compared to the state-of-the-art.

Submitted 13 March, 2025; originally announced March 2025.

Comments: Accepted to AISTATS 2025

arXiv:2410.13526 [pdf]

Generative Adversarial Synthesis of Radar Point Cloud Scenes

Authors: Muhammad Saad Nawaz, Thomas Dallmann, Torsten Schoen, Dirk Heberling

Abstract: For the validation and verification of automotive radars, datasets of realistic traffic scenarios are required, which, however, are laborious to acquire. In this paper, we introduce radar scene synthesis using GANs as an alternative to real dataset acquisition and simulation-based approaches. We train a PointNet++ based GAN model to generate realistic radar point cloud scenes and use a binary classifier to evaluate the performance of scenes generated using this model against a test set of real scenes. We demonstrate that our GAN model achieves similar performance (~87%) to the real scenes test set.

Submitted 17 October, 2024; originally announced October 2024.

Comments: ICMIM 2024; 7th IEEE MTT Conference

arXiv:2409.01163 [pdf, other]

PACSBO: Probably approximately correct safe Bayesian optimization

Authors: Abdullah Tokmak, Thomas B. Schön, Dominik Baumann

Abstract: Safe Bayesian optimization (BO) algorithms promise to find optimal control policies without knowing the system dynamics while at the same time guaranteeing safety with high probability. In exchange for those guarantees, popular algorithms require a smoothness assumption: a known upper bound on a norm in a reproducing kernel Hilbert space (RKHS). The RKHS is a potentially infinite-dimensional space, and it is unclear how to, in practice, obtain an upper bound of an unknown function in its corresponding RKHS. In response, we propose an algorithm that estimates an upper bound on the RKHS norm of an unknown function from data and investigate its theoretical properties. Moreover, akin to Lipschitz-based methods, we treat the RKHS norm as a local rather than a global object, and thus reduce conservatism. Integrating the RKHS norm estimation and the local interpretation of the RKHS norm into a safe BO algorithm yields PACSBO, an algorithm for probably approximately correct safe Bayesian optimization, for which we provide numerical and hardware experiments that demonstrate its applicability and benefits over popular safe BO algorithms.

Submitted 2 September, 2024; originally announced September 2024.

Comments: Accepted to the Symposium on Systems Theory in Data and Optimization (SysDO 2024). This is a preprint of the final version, which is to appear in Lecture Notes in Control and Information Sciences - Proceedings

arXiv:2408.12266 [pdf, other]

Accounts of using the Tustin-Net architecture on a rotary inverted pendulum

Authors: Stijn van Esch, Fabio Bonassi, Thomas B. Schön

Abstract: In this report we investigate the use of the Tustin neural network architecture (Tustin-Net) for the identification of a physical rotary inverse pendulum. This physics-based architecture is of particular interest as it builds on the known relationship between velocities and positions. We here aim at discussing the advantages, limitations and performance of Tustin-Nets compared to first-principles grey-box models on a real physical apparatus, showing how, with a standard training procedure, the former can hardly achieve the same accuracy as the latter. To address this limitation, we present a training strategy based on transfer learning that yields Tustin-Nets that are competitive with the first-principles model, without requiring extensive knowledge of the setup as the latter.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2403.05860 [pdf, other]

doi 10.1109/LCSYS.2024.3403473

On the equivalence of direct and indirect data-driven predictive control approaches

Authors: Per Mattsson, Fabio Bonassi, Valentina Breschi, Thomas B. Schön

Abstract: Recently, several direct Data-Driven Predictive Control (DDPC) methods have been proposed, advocating the possibility of designing predictive controllers from historical input-output trajectories without the need to identify a model. In this work, we show that these approaches are equivalent to an indirect approach. Reformulating the direct methods in terms of estimated parameters and covariance matrices allows us to give new insights into how they work in comparison with, for example, Subspace Predictive Control (SPC). In particular, we show that for unconstrained problems the direct methods are equivalent to SPC with a reduced weight on the tracking cost. Via a numerical experiment, motivated by the reformulation, we also illustrate why the performance of direct DDPC methods with fixed regularization tends to degrade as the number of training samples increases.

Submitted 20 May, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2402.04080 [pdf, other]

Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning

Authors: Ruoqi Zhang, Ziwei Luo, Jens Sjölund, Thomas B. Schön, Per Mattsson

Abstract: This paper presents advanced techniques of training diffusion policies for offline reinforcement learning (RL). At the core is a mean-reverting stochastic differential equation (SDE) that transfers a complex action distribution into a standard Gaussian and then samples actions conditioned on the environment state with a corresponding reverse-time SDE, like a typical diffusion policy. We show that such an SDE has a solution that we can use to calculate the log probability of the policy, yielding an entropy regularizer that improves the exploration of offline datasets. To mitigate the impact of inaccurate value functions from out-of-distribution data points, we further propose to learn the lower confidence bound of Q-ensembles for more robust policy improvement. By combining the entropy-regularized diffusion policy with Q-ensembles in offline RL, our method achieves state-of-the-art performance on most tasks in D4RL benchmarks. Code is available at https://github.com/ruoqizzz/Entropy-Regularized-Diffusion-Policy-with-QEnsemble.

Submitted 8 January, 2025; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2312.06211 [pdf, other]

Structured state-space models are deep Wiener models

Authors: Fabio Bonassi, Carl Andersson, Per Mattsson, Thomas B. Schön

Abstract: The goal of this paper is to provide a system identification-friendly introduction to the Structured State-space Models (SSMs). These models have recently become popular in the machine learning community since, owing to their parallelizability, they can be efficiently and scalably trained to tackle extremely long sequence classification and regression problems. Interestingly, SSMs appear as an effective way to learn deep Wiener models, which allows us to reframe SSMs as an extension of a model class commonly used in system identification. In order to stimulate a fruitful exchange of ideas between the machine learning and system identification communities, we deem it useful to summarize the recent contributions on the topic in a structured and accessible form. Finally, we highlight future research directions for which this community could provide impactful contributions.

Submitted 20 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: © 2024 the authors. This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND

arXiv:2306.16042 [pdf, other]

Guarantees for data-driven control of nonlinear systems using semidefinite programming: A survey

Authors: Tim Martin, Thomas B. Schön, Frank Allgöwer

Abstract: This survey presents recent research on determining control-theoretic properties and designing controllers with rigorous guarantees using semidefinite programming for nonlinear systems for which no mathematical models but measured trajectories are available. Data-driven control techniques have been developed to circumvent time-consuming modelling by first principles and because of the increasing availability of data. Recently, this research field has gained increased attention by the application of Willems' fundamental lemma, which provides a fertile ground for the development of data-driven control schemes with guarantees for linear time-invariant systems. While the fundamental lemma can be generalized to further system classes, there does not exist a comparable data-based system representation for nonlinear systems. At the same time, nonlinear systems constitute the majority of practical systems. Moreover, they include additional challenges such as data-based surrogate models that prevent system analysis and controller design by convex optimization. Therefore, a variety of data-driven control approaches has been developed with different required prior insights into the system to ensure a guaranteed inference. In this survey, we will discuss developments in the context of data-driven control for nonlinear systems. In particular, we will focus on methods based on system representations providing guarantees from finite data, while the analysis and the controller design boil down to convex optimization problems given as SDP. Thus, these approaches achieve reasonable advances compared to the state-of-the-art system analysis and controller design by models from system identification. Specifically, the paper covers system representations based on extensions of Willems' fundamental lemma, set membership, kernel techniques, the Koopman operator, and feedback linearization.

Submitted 3 November, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.03953 [pdf, other]

doi 10.1017/dce.2024.12

Rao-Blackwellized Particle Smoothing for Simultaneous Localization and Mapping

Authors: Manon Kok, Arno Solin, Thomas B. Schön

Abstract: Simultaneous localization and mapping (SLAM) is the task of building a map representation of an unknown environment while at the same time using it for positioning. A probabilistic interpretation of the SLAM task allows for incorporating prior knowledge and for operation under uncertainty. Contrary to the common practice of computing point estimates of the system states, we capture the full posterior density through approximate Bayesian inference. This dynamic learning task falls under state estimation, where the state-of-the-art is in sequential Monte Carlo methods that tackle the forward filtering problem. In this paper, we introduce a framework for probabilistic SLAM using particle smoothing that does not only incorporate observed data in current state estimates, but it also back-tracks the updated knowledge to correct for past drift and ambiguities in both the map and in the states. Our solution can efficiently handle both dense and sparse map representations by Rao-Blackwellization of conditionally linear and conditionally linearized models. We show through simulations and real-world experiments how the principles apply to radio (BLE/Wi-Fi), magnetic field, and visual SLAM. The proposed solution is general, efficient, and works well under confounding noise.

Submitted 5 June, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: 23 pages, 7 figures

Journal ref: DCE 5 (2024) e15

arXiv:2304.00559 [pdf, ps, other]

On the trade-off between event-based and periodic state estimation under bandwidth constraints

Authors: Dominik Baumann, Thomas B. Schön

Abstract: Event-based methods carefully select when to transmit information to enable high-performance control and estimation over resource-constrained communication networks. However, they come at a cost. For instance, event-based communication induces a higher computational load and increases the complexity of the scheduling problem. Thus, in some cases, allocating available slots to agents periodically in circular order may be superior. In this article, we discuss, for a specific example, when the additional complexity of event-based methods is beneficial. We evaluate our analysis in a synthetic example and on 20 simulated cart-pole systems.

Submitted 2 April, 2023; originally announced April 2023.

Comments: 6 pages

arXiv:2303.05043 [pdf, other]

doi 10.1109/LSP.2023.3275499

Invertible Kernel PCA with Random Fourier Features

Authors: Daniel Gedon, Antônio H. Ribeiro, Niklas Wahlström, Thomas B. Schön

Abstract: Kernel principal component analysis (kPCA) is a widely studied method to construct a low-dimensional data representation after a nonlinear transformation. The prevailing method to reconstruct the original input signal from kPCA (an important task for denoising) requires us to solve a supervised learning problem. In this paper, we present an alternative method where the reconstruction follows naturally from the compression step. We first approximate the kernel with random Fourier features. Then, we exploit the fact that the nonlinear transformation is invertible in a certain subdomain. Hence, the name invertible kernel PCA (ikPCA). We experiment with different data modalities and show that ikPCA performs similarly to kPCA with supervised reconstruction on denoising tasks, making it a strong alternative.

Submitted 9 March, 2023; originally announced March 2023.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2301.12832 [pdf, other]

Deep networks for system identification: a Survey

Authors: Gianluigi Pillonetto, Aleksandr Aravkin, Daniel Gedon, Lennart Ljung, Antônio H. Ribeiro, Thomas B. Schön

Abstract: Deep learning is a topic of considerable current interest. The availability of massive data collections and powerful software resources has led to an impressive amount of results in many application areas that reveal essential but hidden properties of the observations. System identification learns mathematical descriptions of dynamic systems from input-output data and can thus benefit from the advances of deep neural networks to enrich the possible range of models to choose from. For this reason, we provide a survey of deep learning from a system identification perspective. We cover a wide spectrum of topics to enable researchers to understand the methods, providing rigorous practical and theoretical insights into the benefits and challenges of using them. The main aim of the identified model is to predict new data from previous observations. This can be achieved with different deep learning based modelling techniques and we discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks. Their parameters have to be estimated from past data trying to optimize the prediction performance. For this purpose, we discuss a specific set of first-order optimization tools that has emerged as efficient. The survey then draws connections to the well-studied area of kernel-based methods. They control the data fit by regularization terms that penalize models not in line with prior assumptions. We illustrate how to cast them in deep architectures to obtain deep kernel-based methods. The success of deep learning also resulted in surprising empirical observations, like the counter-intuitive behaviour of models with many parameters. We discuss the role of overparameterized models, including their connection to kernels, as well as implicit regularization mechanisms which affect generalization, specifically the interesting phenomena of benign overfitting ...

Submitted 30 January, 2023; originally announced January 2023.

arXiv:2212.13890 [pdf, other]

ECG-Based Electrolyte Prediction: Evaluating Regression and Probabilistic Methods

Authors: Philipp Von Bachmann, Daniel Gedon, Fredrik K. Gustafsson, Antônio H. Ribeiro, Erik Lampa, Stefan Gustafsson, Johan Sundström, Thomas B. Schön

Abstract: Objective: Imbalances of the electrolyte concentration levels in the body can lead to catastrophic consequences, but accurate and accessible measurements could improve patient outcomes. While blood tests provide accurate measurements, they are invasive and the laboratory analysis can be slow or inaccessible. In contrast, an electrocardiogram (ECG) is a widely adopted tool which is quick and simple to acquire. However, the problem of estimating continuous electrolyte concentrations directly from ECGs is not well-studied. We therefore investigate if regression methods can be used for accurate ECG-based prediction of electrolyte concentrations. Methods: We explore the use of deep neural networks (DNNs) for this task. We analyze the regression performance across four electrolytes, utilizing a novel dataset containing over 290000 ECGs. For improved understanding, we also study the full spectrum from continuous predictions to binary classification of extreme concentration levels. To enhance clinical usefulness, we finally extend to a probabilistic regression approach and evaluate different uncertainty estimates. Results: We find that the performance varies significantly between different electrolytes, which is clinically justified in the interplay of electrolytes and their manifestation in the ECG. We also compare the regression accuracy with that of traditional machine learning models, demonstrating superior performance of DNNs. Conclusion: Discretization can lead to good classification performance, but does not help solve the original problem of predicting continuous concentration levels. While probabilistic regression demonstrates potential practical usefulness, the uncertainty estimates are not particularly well-calibrated. Significance: Our study is a first step towards accurate and reliable ECG-based prediction of electrolyte concentration levels.

Submitted 21 December, 2022; originally announced December 2022.

Comments: Code and trained models are available at https://github.com/philippvb/ecg-electrolyte-regression

arXiv:2211.05639 [pdf, other]

Gaussian inference for data-driven state-feedback design of nonlinear systems

Authors: Tim Martin, Thomas B. Schön, Frank Allgöwer

Abstract: Data-driven control of nonlinear systems with rigorous guarantees is a challenging problem as it usually calls for nonconvex optimization and often requires knowledge of the true basis functions of the system dynamics. To tackle these drawbacks, this work is based on a data-driven polynomial representation of general nonlinear systems exploiting Taylor polynomials. Thereby, we design state-feedback laws that render a known equilibrium point globally asymptotically stable while operating with respect to a desired quadratic performance criterion. The calculation of the polynomial state feedback boils down to a single sum-of-squares optimization problem, and hence to computationally tractable linear matrix inequalities. Moreover, we examine state-input data in the presence of Gaussian noise by Bayesian inference to overcome the conservatism of deterministic noise characterizations from recent data-driven control approaches for Gaussian noise.

Submitted 24 March, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: Final version, accepted for presentation at the 22nd IFAC World Congress, 2023

arXiv:2205.12695 [pdf, other]

Surprises in adversarially-trained linear regression

Authors: Antônio H. Ribeiro, Dave Zachariah, Thomas B. Schön

Abstract: State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is an effective approach to defend against such examples. It is formulated as a min-max problem, searching for the best solution when the training data was corrupted by the worst-case attacks. For linear regression problems, adversarial training can be formulated as a convex problem. We use this reformulation to make two technical contributions: First, we formulate the training problem as an instance of robust regression to reveal its connection to parameter-shrinking methods, specifically that $\ell_\infty$-adversarial training produces sparse solutions. Secondly, we study adversarial training in the overparameterized regime, i.e. when there are more parameters than data. We prove that adversarial training with small disturbances gives the solution with the minimum-norm that interpolates the training data. Ridge regression and lasso approximate such interpolating solutions as their regularization parameter vanishes. By contrast, for adversarial training, the transition into the interpolation regime is abrupt and for non-zero values of disturbance. This result is proved and illustrated with numerical examples.

Submitted 20 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

arXiv:2205.06306 [pdf, other]

doi 10.1109/TSP.2023.3245720

Probabilistic Estimation of Instantaneous Frequencies of Chirp Signals

Authors: Zheng Zhao, Simo Särkkä, Jens Sjölund, Thomas B. Schön

Abstract: We present a continuous-time probabilistic approach for estimating the chirp signal and its instantaneous frequency function when the true forms of these functions are not accessible. Our model represents these functions by non-linearly cascaded Gaussian processes represented as non-linear stochastic differential equations. The posterior distribution of the functions is then estimated with stochastic filters and smoothers. We compute a (posterior) Cramér-Rao lower bound for the Gaussian process model, and derive a theoretical upper bound for the estimation error in the mean squared sense. The experiments show that the proposed method outperforms a number of state-of-the-art methods on synthetic data. We also show that the method works out-of-the-box for two real-world datasets.

Submitted 13 February, 2023; v1 submitted 12 May, 2022; originally announced May 2022.

Comments: Accepted for publication in IEEE Transactions on Signal Processing

arXiv:2204.06274 [pdf, other]

doi 10.1109/TSP.2023.3246228

Overparameterized Linear Regression under Adversarial Attacks

Authors: Antônio H. Ribeiro, Thomas B. Schön

Abstract: We study the error of linear regression in the face of adversarial attacks. In this framework, an adversary changes the input to the regression model in order to maximize the prediction error. We provide bounds on the prediction error in the presence of an adversary as a function of the parameter norm and the error in the absence of such an adversary. We show how these bounds make it possible to study the adversarial error using analysis from non-adversarial setups. The obtained results shed light on the robustness of overparameterized linear models to adversarial attacks. Adding features might be either a source of additional robustness or brittleness. On the one hand, we use asymptotic results to illustrate how double-descent curves can be obtained for the adversarial error. On the other hand, we derive conditions under which the adversarial error can grow to infinity as more features are added, while at the same time, the test error goes to zero. We show this behavior is caused by the fact that the norm of the parameter vector grows with the number of features. It is also established that $\ell_\infty$ and $\ell_2$-adversarial attacks might behave fundamentally differently due to how the $\ell_1$ and $\ell_2$-norms of random projections concentrate. We also show how our reformulation allows for solving adversarial training as a convex optimization problem. This fact is then exploited to establish similarities between adversarial training and parameter-shrinking methods and to study how the training might affect the robustness of the estimated models.

Submitted 27 January, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

arXiv:2104.13853 [pdf, other]

Learning deep autoregressive models for hierarchical data

Authors: Carl R. Andersson, Niklas Wahlström, Thomas B. Schön

Abstract: We propose a model for hierarchical structured data as an extension to the stochastic temporal convolutional network. The proposed model combines an autoregressive model with a hierarchical variational autoencoder and downsampling to achieve superior computational complexity. We evaluate the proposed model on two different types of sequential data: speech and handwritten text. The results are promising with the proposed model achieving state-of-the-art performance.

Submitted 1 July, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

arXiv:2103.08782 [pdf, other]

doi 10.1109/LCSYS.2021.3090349

Data to Controller for Nonlinear Systems: An Approximate Solution

Authors: Johannes N. Hendriks, James R. Z. Holdsworth, Adrian G. Wills, Thomas B. Schon, Brett Ninness

Abstract: This paper considers the problem of determining an optimal control action based on observed data. We formulate the problem assuming that the system can be modelled by a nonlinear state-space model, but where the model parameters, state and future disturbances are not known and are treated as random variables. Central to our formulation is that the joint distribution of these unknown objects is conditioned on the observed data. Crucially, as new measurements become available, this joint distribution continues to evolve so that control decisions are made accounting for uncertainty as evidenced in the data. The resulting problem is intractable, which we obviate by providing approximations that result in finite dimensional deterministic optimisation problems. The proposed approach is demonstrated in simulation on a nonlinear system.

Submitted 30 June, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

Comments: in IEEE Control Systems Letters, 2021

arXiv:2103.00930 [pdf, other]

doi 10.23919/FUSION49751.2022.9841369

Unsupervised dynamic modeling of medical image transformation

Authors: Niklas Gunnarsson, Peter Kimstrand, Jens Sjölund, Thomas B. Schön

Abstract: Spatiotemporal imaging has applications in e.g. cardiac diagnostics, surgical guidance, and radiotherapy monitoring. In this paper, we explain the temporal motion by identifying the underlying dynamics, only based on the sequential images. Our dynamical model maps the inputs of observed high-dimensional sequential images to a low-dimensional latent space wherein a linear relationship between a hidden state process and the lower-dimensional representation of the inputs holds. For this, we use a conditional variational auto-encoder (CVAE) to nonlinearly map the higher-dimensional image to a lower-dimensional space, wherein we model the dynamics with a linear Gaussian state-space model (LG-SSM). The model, a modified version of the Kalman variational auto-encoder, is end-to-end trainable, and the weights, both in the CVAE and LG-SSM, are simultaneously updated by maximizing the evidence lower bound of the marginal likelihood. In contrast to the original model, we explain the motion with a spatial transformation from one image to another. This results in sharper reconstructions and the possibility of transferring auxiliary information, such as segmentation, through the image sequence. Our experiments, on cardiac ultrasound time series, show that the dynamic model outperforms traditional image registration in execution time, at a similar performance. Further, our model offers the possibility to impute and extrapolate for missing samples.

Submitted 7 November, 2022; v1 submitted 1 March, 2021; originally announced March 2021.

Comments: published in 2022 25th International Conference on Information Fusion (FUSION)

arXiv:2102.07757 [pdf, other]

How Convolutional Neural Networks Deal with Aliasing

Authors: Antônio H. Ribeiro, Thomas B. Schön

Abstract: The convolutional neural network (CNN) remains an essential tool in solving computer vision problems. Standard convolutional architectures consist of stacked layers of operations that progressively downscale the image. Aliasing is a well-known side-effect of downsampling that may take place: it causes high-frequency components of the original signal to become indistinguishable from its low-frequency components. While downsampling takes place in the max-pooling layers or in the strided-convolutions in these models, there is no explicit mechanism that prevents aliasing from taking place in these layers. Due to the impressive performance of these models, it is natural to suspect that they, somehow, implicitly deal with this distortion. The question we aim to answer in this paper is simply: "how and to what extent do CNNs counteract aliasing?" We explore the question by means of two examples: in the first, we assess the CNN's capability of distinguishing oscillations at the input, showing that the redundancies in the intermediate channels play an important role in succeeding at the task; in the second, we show that an image classifier CNN, while in principle capable of implementing anti-aliasing filters, does not prevent aliasing from taking place in the intermediate layers.

Submitted 15 February, 2021; originally announced February 2021.

Comments: To appear in the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2012.06341 [pdf, other]

Beyond Occam's Razor in System Identification: Double-Descent when Modeling Dynamics

Authors: Antônio H. Ribeiro, Johannes N. Hendriks, Adrian G. Wills, Thomas B. Schön

Abstract: System identification aims to build models of dynamical systems from data. Traditionally, choosing the model requires the designer to balance between two goals of conflicting nature; the model must be rich enough to capture the system dynamics, but not so flexible that it learns spurious random effects from the dataset. It is typically observed that the model validation performance follows a U-shaped curve as the model complexity increases. Recent developments in machine learning and statistics, however, have observed situations where a "double-descent" curve subsumes this U-shaped model-performance curve, with a second decrease in performance occurring beyond the point where the model has reached the capacity of interpolating - i.e., (near) perfectly fitting - the training data. To the best of our knowledge, such phenomena have not been studied within the context of dynamic systems. The present paper aims to answer the question: "Can such a phenomenon also be observed when estimating parameters of dynamic systems?" We show that the answer is yes, verifying such behavior experimentally both for artificially generated and real-world datasets.

Submitted 6 August, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: To appear in the Proceedings of the 19th IFAC Symposium in System Identification (2021)

arXiv:2012.05072 [pdf, ps, other]

Variational System Identification for Nonlinear State-Space Models

Authors: Jarrad Courts, Adrian Wills, Thomas Schön, Brett Ninness

Abstract: This paper considers parameter estimation for nonlinear state-space models, which is an important but challenging problem. We address this challenge by employing a variational inference (VI) approach, which is a principled method that has deep connections to maximum likelihood estimation. This VI approach ultimately provides estimates of the model as solutions to an optimisation problem, which is deterministic, tractable and can be solved using standard optimisation tools. A specialisation of this approach for systems with additive Gaussian noise is also detailed. The proposed method is examined numerically on a range of simulated and real examples focusing on the robustness to parameter initialisation; additionally, favourable comparisons are performed against state-of-the-art alternatives.

Submitted 14 September, 2022; v1 submitted 8 December, 2020; originally announced December 2020.

arXiv:2012.04136 [pdf, other]

Deep Energy-Based NARX Models

Authors: Johannes N. Hendriks, Fredrik K. Gustafsson, Antônio H. Ribeiro, Adrian G. Wills, Thomas B. Schön

Abstract: This paper is directed towards the problem of learning nonlinear ARX models based on system input-output data. In particular, our interest is in learning a conditional distribution of the current output based on a finite window of past inputs and outputs. To achieve this, we consider the use of so-called energy-based models, which have been developed in allied fields for learning unknown distributions based on data. This energy-based model relies on a general function to describe the distribution, and here we consider a deep neural network for this purpose. The primary benefit of this approach is that it is capable of learning both simple and highly complex noise models, which we demonstrate on simulated and experimental data.

Submitted 7 December, 2020; originally announced December 2020.

arXiv:2003.14162 [pdf, other]

Deep State Space Models for Nonlinear System Identification

Authors: Daniel Gedon, Niklas Wahlström, Thomas B. Schön, Lennart Ljung

Abstract: Deep state space models (SSMs) are an actively researched model class for temporal models developed in the deep learning community which have a close connection to classic SSMs. The use of deep SSMs as a black-box identification model can describe a wide range of dynamics due to the flexibility of deep neural networks. Additionally, the probabilistic nature of the model class allows the uncertainty of the system to be modelled. In this work a deep SSM class and its parameter learning algorithm are explained in an effort to extend the toolbox of nonlinear identification methods with a deep learning based method. Six recent deep SSMs are evaluated in a first unified implementation on nonlinear system identification benchmarks.

Submitted 18 June, 2021; v1 submitted 31 March, 2020; originally announced March 2020.

arXiv:2003.10819 [pdf, ps, other]

Registration by tracking for sequential 2D MRI

Authors: Niklas Gunnarsson, Jens Sjölund, Thomas B. Schön

Abstract: Our anatomy is in constant motion. With modern MR imaging it is possible to record this motion in real-time during an ongoing radiation therapy session. In this paper we present an image registration method that exploits the sequential nature of 2D MR images to estimate the corresponding displacement field. The method employs several discriminative correlation filters that independently track specific points. Together with a sparse-to-dense interpolation scheme we can then estimate the displacement field. The discriminative correlation filters are trained online, and our method is modality agnostic. For the interpolation scheme we use a neural network with normalized convolutions that is trained using synthetic diffeomorphic displacement fields. The method is evaluated on a segmented cardiac dataset and when compared to two conventional methods we observe an improved performance. This improvement is especially pronounced when it comes to the detection of larger motions of small objects.

Submitted 24 March, 2020; originally announced March 2020.

Comments: Currently under review for a conference

arXiv:1910.00463 [pdf, other]

doi 10.1109/LSP.2019.2943995

A Fast and Robust Algorithm for Orientation Estimation using Inertial Sensors

Authors: Manon Kok, Thomas B. Schön

Abstract: We present a novel algorithm for online, real-time orientation estimation. Our algorithm integrates gyroscope data and corrects the resulting orientation estimate for integration drift using accelerometer and magnetometer data. This correction is computed, at each time instance, using a single gradient descent step with fixed step length. This fixed step length results in robustness against model errors, e.g. caused by large accelerations or by short-term magnetic field disturbances, which we numerically illustrate using Monte Carlo simulations. Our algorithm estimates a three-dimensional update to the orientation rather than the entire orientation itself. This reduces the computational complexity by approximately 1/3 with respect to the state of the art. It also improves the quality of the resulting estimates, specifically when the orientation corrections are large. We illustrate the efficacy of the algorithm using experimental data.

Submitted 1 October, 2019; originally announced October 2019.

Comments: 9 pages, 2 figures

Journal ref: IEEE Signal Processing Letters, 2019

arXiv:1909.01730 [pdf, other]

Deep Convolutional Networks in System Identification

Authors: Carl Andersson, Antônio H. Ribeiro, Koen Tiels, Niklas Wahlström, Thomas B. Schön

Abstract: Recent developments within deep learning are relevant for nonlinear system identification problems. In this paper, we establish connections between the deep learning and the system identification communities. It has recently been shown that convolutional architectures are at least as capable as recurrent architectures when it comes to sequence modeling tasks. Inspired by these results we explore the explicit relationships between the recently proposed temporal convolutional network (TCN) and two classic system identification model structures: Volterra series and block-oriented models. We end the paper with an experimental study where we provide results on two real-world problems, the well-known Silverbox dataset and a newer dataset originating from ground vibration experiments on an F-16 fighter aircraft.

Submitted 19 November, 2019; v1 submitted 4 September, 2019; originally announced September 2019.

Comments: Accepted to Conference on Decision and Control, The first two authors contributed equally

arXiv:1909.01238 [pdf, other]

Stochastic quasi-Newton with line-search regularization

Authors: Adrian Wills, Thomas Schön

Abstract: In this paper we present a novel quasi-Newton algorithm for use in stochastic optimisation. Quasi-Newton methods have had an enormous impact on deterministic optimisation problems because they afford rapid convergence and computationally attractive algorithms. In essence, this is achieved by learning the second-order (Hessian) information based on observing first-order gradients. We extend these ideas to the stochastic setting by employing a highly flexible model for the Hessian and inferring its value based on observing noisy gradients. In addition, we propose a stochastic counterpart to standard line-search procedures and demonstrate the utility of this combination on maximum likelihood identification for general nonlinear state space models.

Submitted 3 September, 2019; originally announced September 2019.

arXiv:1905.00820 [pdf, other]

doi 10.1016/j.automatica.2020.109158

On the smoothness of nonlinear system identification

Authors: Antônio H. Ribeiro, Koen Tiels, Jack Umenberger, Thomas B. Schön, Luis A. Aguirre

Abstract: We shed new light on the \textit{smoothness} of optimization problems arising in prediction error parameter estimation of linear and nonlinear systems. We show that for regions of the parameter space where the model is not contractive, the Lipschitz constant and $β$-smoothness of the objective function might blow up exponentially with the simulation length, making it hard to numerically find minima within those regions or, even, to escape from them. In addition to providing theoretical understanding of this problem, this paper also proposes the use of multiple shooting as a viable solution. The proposed method minimizes the error between a prediction model and the observed values. Rather than running the prediction model over the entire dataset, multiple shooting splits the data into smaller subsets and runs the prediction model over each subset, making the simulation length a design parameter and making it possible to solve problems that would be infeasible using a standard approach. The equivalence to the original problem is obtained by including constraints in the optimization. The new method is illustrated by estimating the parameters of nonlinear systems with chaotic or unstable behavior, as well as neural networks. We also present a comparative analysis of the proposed method with multi-step-ahead prediction error minimization.

Submitted 7 August, 2020; v1 submitted 2 May, 2019; originally announced May 2019.

Journal ref: Automatica, vol. 121, 109158, Nov. 2020

arXiv:1904.01949 [pdf, other]

doi 10.1038/s41467-020-15432-4

Automatic diagnosis of the 12-lead ECG using a deep neural network

Authors: Antônio H. Ribeiro, Manoel Horta Ribeiro, Gabriela M. M. Paixão, Derick M. Oliveira, Paulo R. Gomes, Jéssica A. Canazart, Milton P. S. Ferreira, Carl R. Andersson, Peter W. Macfarlane, Wagner Meira Jr., Thomas B. Schön, Antonio Luiz P. Ribeiro

Abstract: The role of automatic electrocardiogram (ECG) analysis in clinical practice is limited by the accuracy of existing models. Deep Neural Networks (DNNs) are models composed of stacked transformations that learn tasks by examples. This technology has recently achieved striking success in a variety of tasks and there are great expectations on how it might improve clinical practice. Here we present a DNN model trained on a dataset with more than 2 million labeled exams analyzed by the Telehealth Network of Minas Gerais and collected under the scope of the CODE (Clinical Outcomes in Digital Electrocardiology) study. The DNN outperforms cardiology resident medical doctors in recognizing 6 types of abnormalities in 12-lead ECG recordings, with F1 scores above 80% and specificity over 99%. These results indicate ECG analysis based on DNNs, previously studied in a single-lead setup, generalizes well to 12-lead exams, taking the technology closer to the standard clinical practice.

Submitted 14 April, 2020; v1 submitted 1 April, 2019; originally announced April 2019.

Comments: A preliminary version of this work titled "Automatic Diagnosis of Short-Duration 12-Lead ECG using a Deep Convolutional Network" was presented in the Machine Learning for Health Workshop at NeurIPS 2018 and was made available under a different identifier: arXiv:1811.12194. The current version subsumes all previous versions

Journal ref: Nature Communications 11, article number: 1760 (2020)

arXiv:1903.02250 [pdf, other]

Nonlinear input design as optimal control of a Hamiltonian system

Authors: Jack Umenberger, Thomas B. Schön

Abstract: We propose an input design method for a general class of parametric probabilistic models, including nonlinear dynamical systems with process noise. The goal of the procedure is to select inputs such that the parameter posterior distribution concentrates about the true value of the parameters; however, exact computation of the posterior is intractable. By representing (samples from) the posterior as trajectories from a certain Hamiltonian system, we transform the input design task into an optimal control problem. The method is illustrated via numerical examples, including MRI pulse sequence design.

Submitted 6 March, 2019; originally announced March 2019.

arXiv:1811.12194 [pdf, other]

Automatic Diagnosis of Short-Duration 12-Lead ECG using a Deep Convolutional Network

Authors: Antônio H. Ribeiro, Manoel Horta Ribeiro, Gabriela Paixão, Derick Oliveira, Paulo R. Gomes, Jéssica A. Canazart, Milton Pifano, Wagner Meira Jr., Thomas B. Schön, Antonio Luiz Ribeiro

Abstract: We present a model for predicting electrocardiogram (ECG) abnormalities in short-duration 12-lead ECG signals which outperformed medical doctors in the 4th year of their cardiology residency. Such exams can provide a full evaluation of heart activity and have not been studied in previous end-to-end machine learning papers. Using the database of a large telehealth network, we built a novel dataset with more than 2 million ECG tracings, orders of magnitude larger than those used in previous studies. Moreover, our dataset is more realistic, as it consists of 12-lead ECGs recorded during standard in-clinic exams. Using this data, we trained a residual neural network with 9 convolutional layers to map 7 to 10 second ECG signals to 6 classes of ECG abnormalities. Future work should extend these results to cover a larger range of ECG abnormalities, which could improve the accessibility of this diagnostic tool and avoid wrong diagnoses from medical doctors.

Submitted 17 February, 2019; v1 submitted 28 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/82

arXiv:1808.05889 [pdf, ps, other]

Data Consistency Approach to Model Validation

Authors: Andreas Svensson, Dave Zachariah, Petre Stoica, Thomas B. Schön

Abstract: In scientific inference problems, the underlying statistical modeling assumptions have a crucial impact on the end results. There exist, however, only a few automatic means for validating these fundamental modelling assumptions. The contribution in this paper is a general criterion to evaluate the consistency of a set of statistical models with respect to observed data. This is achieved by automatically gauging the models' ability to generate data that is similar to the observed data. Importantly, the criterion follows from the model class itself and is therefore directly applicable to a broad range of inference problems with varying data types, ranging from independent univariate data to high-dimensional time-series. The proposed data consistency criterion is illustrated, evaluated and compared to several well-established methods using three synthetic and two real data sets.

Submitted 20 May, 2019; v1 submitted 17 August, 2018; originally announced August 2018.

Journal ref: IEEE Access, 7(1):59788-59796, 2019

arXiv:1801.08383 [pdf, other]

Data-Driven Impulse Response Regularization via Deep Learning

Authors: Carl Andersson, Niklas Wahlström, Thomas B. Schön

Abstract: We consider the problem of impulse response estimation of stable linear single-input single-output systems. It is a well-studied problem where flexible non-parametric models recently offered a leap in performance compared to the classical finite-dimensional model structures. Inspired by this development and the success of deep learning we propose a new flexible data-driven model. Our experiments indicate that the new model is capable of exploiting even more of the hidden patterns that are present in the input-output data as compared to the non-parametric models.

Submitted 11 October, 2018; v1 submitted 25 January, 2018; originally announced January 2018.

arXiv:1712.02675 [pdf, other]

How consistent is my model with the data? Information-Theoretic Model Check

Authors: Andreas Svensson, Dave Zachariah, Thomas B. Schön

Abstract: The choice of model class is fundamental in statistical learning and system identification, no matter whether the class is derived from physical principles or is a generic black-box. We develop a method to evaluate the specified model class by assessing its capability of reproducing data that is similar to the observed data record. This model check is based on the information-theoretic properties of models viewed as data generators and is applicable to e.g. sequential data and nonlinear dynamical models. The method can be understood as a specific two-sided posterior predictive test. We apply the information-theoretic model check to both synthetic and real data and compare it with a classical whiteness test.

Submitted 19 December, 2017; v1 submitted 7 December, 2017; originally announced December 2017.

Comments: The title has been updated, but no other significant changes have been made from the previous version

arXiv:1711.10765 [pdf, other]

Learning nonlinear state-space models using smooth particle-filter-based likelihood approximations

Authors: Andreas Svensson, Fredrik Lindsten, Thomas B. Schön

Abstract: When classical particle filtering algorithms are used for maximum likelihood parameter estimation in nonlinear state-space models, a key challenge is that estimates of the likelihood function and its derivatives are inherently noisy. The key idea in this paper is to run a particle filter based on a current parameter estimate, but then use the output from this particle filter to re-evaluate the likelihood function approximation also for other parameter values. This results in a (local) deterministic approximation of the likelihood and any standard optimization routine can be applied to find the maximum of this local approximation. By iterating this procedure we eventually arrive at a final parameter estimate.

Submitted 29 November, 2017; originally announced November 2017.

arXiv:1710.04009 [pdf, other]

Regularized parametric system identification: a decision-theoretic formulation

Authors: Johan Wågberg, Dave Zachariah, Thomas B. Schön

Abstract: Parametric prediction error methods constitute a classical approach to the identification of linear dynamic systems with excellent large-sample properties. A more recent regularized approach, inspired by machine learning and Bayesian methods, has also gained attention. Methods based on this approach estimate the system impulse response with excellent small-sample properties. In several applications, however, it is desirable to obtain a compact representation of the system in the form of a parametric model. By viewing the identification of such models as a decision, we develop a decision-theoretic formulation of the parametric system identification problem that bridges the gap between the classical and regularized approaches above. Using the output-error model class as an illustration, we show that this decision-theoretic approach leads to a regularized method that is robust to small sample-sizes as well as overparameterization.

Submitted 11 October, 2017; originally announced October 2017.

Comments: 10 pages, 8 figures

arXiv:1706.01042 [pdf, ps, other]

Optimal controller/observer gains of discounted-cost LQG systems

Authors: Hildo Bijl, Thomas B. Schön

Abstract: The linear-quadratic-Gaussian (LQG) control paradigm is well-known in literature. The strategy of minimizing the cost function is available, both for the case where the state is known and where it is estimated through an observer. The situation is different when the cost function has an exponential discount factor, also known as a prescribed degree of stability. In this case, the optimal control strategy is only available when the state is known. This paper builds on from that result, deriving an optimal control strategy when working with an estimated state. Expressions for the resulting optimal expected cost are also given.

Submitted 7 December, 2018; v1 submitted 4 June, 2017; originally announced June 2017.

arXiv:1704.06053 [pdf, other]

doi 10.1561/2000000094

Using Inertial Sensors for Position and Orientation Estimation

Authors: Manon Kok, Jeroen D. Hol, Thomas B. Schön

Abstract: In recent years, MEMS inertial sensors (3D accelerometers and 3D gyroscopes) have become widely available due to their small size and low cost. Inertial sensor measurements are obtained at high sampling rates and can be integrated to obtain position and orientation information. These estimates are accurate on a short time scale, but suffer from integration drift over longer time scales. To overcome this issue, inertial sensors are typically combined with additional sensors and models. In this tutorial we focus on the signal processing aspects of position and orientation estimation using inertial sensors. We discuss different modeling choices and a selected number of important algorithms. The algorithms include optimization-based smoothing and filtering as well as computationally cheaper extended Kalman filter and complementary filter implementations. The quality of their estimates is illustrated using both experimental and simulated data.

Submitted 10 June, 2018; v1 submitted 20 April, 2017; originally announced April 2017.

Comments: 90 pages, 38 figures

Journal ref: Foundations and Trends in Signal Processing, Vol. 11: No. 1-2, Pages 1-153, 2017

arXiv:1703.02419 [pdf, ps, other]

doi 10.1016/j.ymssp.2017.10.033

Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo

Authors: Thomas B. Schön, Andreas Svensson, Lawrence Murray, Fredrik Lindsten

Abstract: Probabilistic modeling provides the capability to represent and manipulate uncertainty in data, models, predictions and decisions. We are concerned with the problem of learning probabilistic models of dynamical systems from measured data. Specifically, we consider learning of probabilistic nonlinear state-space models. There is no closed-form solution available for this problem, implying that we are forced to use approximations. In this tutorial we will provide a self-contained introduction to one of the state-of-the-art methods, the particle Metropolis-Hastings algorithm, which has proven to offer a practical approximation. This is a Monte Carlo based method, where the particle filter is used to guide a Markov chain Monte Carlo method through the parameter space. One of the key merits of the particle Metropolis-Hastings algorithm is that it is guaranteed to converge to the "true solution" under mild assumptions, despite being based on a particle filter with only a finite number of particles. We will also provide a motivating numerical example illustrating the method using a modeling language tailored for sequential Monte Carlo methods. The intention of modeling languages of this kind is to open up the power of sophisticated Monte Carlo methods, including particle Metropolis-Hastings, to a large group of users without requiring them to know all the underlying mathematical details.

Submitted 15 December, 2017; v1 submitted 7 March, 2017; originally announced March 2017.

Comments: Thomas B. Schön, Andreas Svensson, Lawrence Murray and Fredrik Lindsten, 2018. Probabilistic learning of nonlinear dynamical systems using sequential Monte Carlo. In Mechanical Systems and Signal Processing, Volume 104, pp. 866-883

arXiv:1604.00169 [pdf, other]

A sequential Monte Carlo approach to Thompson sampling for Bayesian optimization

Authors: Hildo Bijl, Thomas B. Schön, Jan-Willem van Wingerden, Michel Verhaegen

Abstract: Bayesian optimization through Gaussian process regression is an effective method of optimizing an unknown function for which every measurement is expensive. It approximates the objective function and then recommends a new measurement point to try out. This recommendation is usually selected by optimizing a given acquisition function. After a sufficient number of measurements, a recommendation about the maximum is made. However, a key realization is that the maximum of a Gaussian process is not a deterministic point, but a random variable with a distribution of its own. This distribution cannot be calculated analytically. Our main contribution is an algorithm, inspired by sequential Monte Carlo samplers, that approximates this maximum distribution. Subsequently, by taking samples from this distribution, we enable Thompson sampling to be applied to (armed-bandit) optimization problems with a continuous input space. All this is done without requiring the optimization of a nonlinear acquisition function. Experiments have shown that the resulting optimization method has a competitive performance at keeping the cumulative regret limited.

Submitted 16 May, 2017; v1 submitted 1 April, 2016; originally announced April 2016.

arXiv:1603.09157 [pdf, ps, other]

Linear System Identification via EM with Latent Disturbances and Lagrangian Relaxation

Authors: Jack Umenberger, Johan Wågberg, Ian R. Manchester, Thomas B. Schön

Abstract: In the application of the Expectation Maximization algorithm to identification of dynamical systems, internal states are typically chosen as latent variables, for simplicity. In this work, we propose a different choice of latent variables, namely, system disturbances. Such a formulation elegantly handles the problematic case of singular state space models, and is shown, under certain circumstances, to improve the fidelity of bounds on the likelihood, leading to convergence in fewer iterations. To access these benefits we develop a Lagrangian relaxation of the nonconvex optimization problems that arise in the latent disturbances formulation, and proceed via semidefinite programming.

Submitted 30 March, 2016; originally announced March 2016.

Comments: 21 pages, 4 figures

arXiv:1603.06443 [pdf, other]

A Scalable and Distributed Solution to the Inertial Motion Capture Problem

Authors: Manon Kok, Sina Khoshfetrat Pakazad, Thomas B. Schön, Anders Hansson, Jeroen D. Hol

Abstract: In inertial motion capture, a multitude of body segments are equipped with inertial sensors, consisting of 3D accelerometers and 3D gyroscopes. Using an optimization-based approach to solve the motion capture problem allows for natural inclusion of biomechanical constraints and for modeling the connection of the body segments at the joint locations. The computational complexity of solving this problem grows both with the length of the data set and with the number of sensors and body segments considered. In this work, we present a scalable and distributed solution to this problem using tailored message passing, capable of exploiting the structure that is inherent in the problem. As a proof-of-concept we apply our algorithm to data from a lower body configuration.

Submitted 18 August, 2016; v1 submitted 21 March, 2016; originally announced March 2016.

Comments: 14 pages, 5 figures. In proceedings of the 19th International Conference on Information Fusion, pp. 1348-1355, Heidelberg, Germany, July 2016

arXiv:1603.05486 [pdf, other]

A flexible state space model for learning nonlinear dynamical systems

Authors: Andreas Svensson, Thomas B. Schön

Abstract: We consider a nonlinear state-space model with the state transition and observation functions expressed as basis function expansions. The coefficients in the basis function expansions are learned from data. Using a connection to Gaussian processes we also develop priors on the coefficients, for tuning the model flexibility and to prevent overfitting to data, akin to a Gaussian process state-space model. The priors can alternatively be seen as a regularization, and helps the model in generalizing the data without sacrificing the richness offered by the basis function expansion. To learn the coefficients and other unknown parameters efficiently, we tailor an algorithm using state-of-the-art sequential Monte Carlo methods, which comes with theoretical guarantees on the learning. Our approach indicates promising results when evaluated on a classical benchmark as well as real data.

Submitted 28 March, 2017; v1 submitted 17 March, 2016; originally announced March 2016.

Journal ref: Automatica 80(2017), page 189-199

arXiv:1602.02524 [pdf, ps, other]

doi 10.1016/j.automatica.2016.01.030

Mean and variance of the LQG cost function

Authors: Hildo Bijl, Jan Willem van Wingerden, Thomas B. Schön, Michel Verhaegen

Abstract: Linear Quadratic Gaussian (LQG) systems are well-understood and methods to minimize the expected cost are readily available. Less is known about the statistical properties of the resulting cost function. The contribution of this paper is a set of analytic expressions for the mean and variance of the LQG cost function. These expressions are derived using two different methods, one using solutions to Lyapunov equations and the other using only matrix exponentials. Both the discounted and the non-discounted cost function are considered, as well as the finite-time and the infinite-time cost function. The derived expressions are successfully applied to an example system to reduce the probability of the cost exceeding a given threshold.

Submitted 8 February, 2016; originally announced February 2016.

Journal ref: Automatica, Volume 67, May 2016, Pages 216-223

arXiv:1601.08068 [pdf, other]

System Identification through Online Sparse Gaussian Process Regression with Input Noise

Authors: Hildo Bijl, Thomas B. Schön, Jan-Willem van Wingerden, Michel Verhaegen

Abstract: There has been a growing interest in using non-parametric regression methods like Gaussian Process (GP) regression for system identification. GP regression does traditionally have three important downsides: (1) it is computationally intensive, (2) it cannot efficiently implement newly obtained measurements online, and (3) it cannot deal with stochastic (noisy) input points. In this paper we present an algorithm tackling all these three issues simultaneously. The resulting Sparse Online Noisy Input GP (SONIG) regression algorithm can incorporate new noisy measurements in constant runtime. A comparison has shown that it is more accurate than similar existing regression algorithms. When applied to non-linear black-box system modeling, its performance is competitive with existing non-linear ARX models.

Submitted 15 August, 2017; v1 submitted 29 January, 2016; originally announced January 2016.

arXiv:1601.05257 [pdf, other]

doi 10.1109/JSEN.2016.2569160

Magnetometer calibration using inertial sensors

Authors: Manon Kok, Thomas B. Schön

Abstract: In this work we present a practical algorithm for calibrating a magnetometer for the presence of magnetic disturbances and for magnetometer sensor errors. To allow for combining the magnetometer measurements with inertial measurements for orientation estimation, the algorithm also corrects for misalignment between the magnetometer and the inertial sensor axes. The calibration algorithm is formulated as the solution to a maximum likelihood problem and the computations are performed offline. The algorithm is shown to give good results using data from two different commercially available sensor units. Using the calibrated magnetometer measurements in combination with the inertial sensors to determine the sensor's orientation is shown to lead to significantly improved heading estimates.

Submitted 14 July, 2016; v1 submitted 20 January, 2016; originally announced January 2016.

Comments: 19 pages, 8 figures

Journal ref: IEEE Sensors Journal, Volume 16, Issue 14, Pages 5679-5689, 2016

arXiv:1510.00563 [pdf, other]

doi 10.1109/CAMSAP.2015.7383841

Nonlinear State Space Model Identification Using a Regularized Basis Function Expansion

Authors: Andreas Svensson, Thomas B. Schön, Arno Solin, Simo Särkkä

Abstract: This paper is concerned with black-box identification of nonlinear state space models. By using a basis function expansion within the state space model, we obtain a flexible structure. The model is identified using an expectation maximization approach, where the states and the parameters are updated iteratively in such a way that a maximum likelihood estimate is obtained. We use recent particle methods with sound theoretical properties to infer the states, whereas the model parameters can be updated using closed-form expressions by exploiting the fact that our model is linear in the parameters. Not to over-fit the flexible model to the data, we also propose a regularization scheme without increasing the computational burden. Importantly, this opens up for systematic use of regularization in nonlinear state space models. We conclude by evaluating our proposed approach on one simulation example and two real-data problems.

Submitted 2 October, 2015; originally announced October 2015.

Comments: Accepted to the 6th IEEE international workshop on computational advances in multi-sensor adaptive processing (CAMSAP), Cancun, Mexico, December 2015

arXiv:1502.03697 [pdf, other]

Nonlinear state space smoothing using the conditional particle filter

Authors: Andreas Svensson, Thomas B. Schön, Manon Kok

Abstract: To estimate the smoothing distribution in a nonlinear state space model, we apply the conditional particle filter with ancestor sampling. This gives an iterative algorithm in a Markov chain Monte Carlo fashion, with asymptotic convergence results. The computational complexity is analyzed, and our proposed algorithm is successfully applied to the challenging problem of sensor fusion between ultra-wideband and accelerometer/gyroscope measurements for indoor positioning. It appears to be a competitive alternative to existing nonlinear smoothing algorithms, in particular the forward filtering-backward simulation smoother.

Submitted 16 September, 2015; v1 submitted 12 February, 2015; originally announced February 2015.

Comments: Accepted for the 17th IFAC Symposium on System Identification (SYSID), Beijing, China, October 2015

Showing 1–50 of 56 results for author: Schoen, T