-
Bayes and Biased Estimators Without Hyper-parameter Estimation: Comparable Performance to the Empirical-Bayes-Based Regularized Estimator
Authors:
Yue Ju,
Bo Wahlberg,
Håkan Hjalmarsson
Abstract:
Regularized system identification has become a significant complement to more classical system identification. It has been numerically shown that kernel-based regularized estimators often perform better than the maximum likelihood estimator in terms of minimizing mean squared error (MSE). However, regularized estimators often require hyper-parameter estimation. This paper focuses on ridge regressi…
▽ More
Regularized system identification has become a significant complement to more classical system identification. It has been numerically shown that kernel-based regularized estimators often perform better than the maximum likelihood estimator in terms of minimizing mean squared error (MSE). However, regularized estimators often require hyper-parameter estimation. This paper focuses on ridge regression and the regularized estimator by employing the empirical Bayes hyper-parameter estimator. We utilize the excess MSE to quantify the MSE difference between the empirical-Bayes-based regularized estimator and the maximum likelihood estimator for large sample sizes. We then exploit the excess MSE expressions to develop both a family of generalized Bayes estimators and a family of closed-form biased estimators. They have the same excess MSE as the empirical-Bayes-based regularized estimator but eliminate the need for hyper-parameter estimation. Moreover, we conduct numerical simulations to show that the performance of these new estimators is comparable to the empirical-Bayes-based regularized estimator, while computationally, they are more efficient.
△ Less
Submitted 14 March, 2025;
originally announced March 2025.
-
Learning the Step-size Policy for the Limited-Memory Broyden-Fletcher-Goldfarb-Shanno Algorithm
Authors:
Lucas N. Egidio,
Anders Hansson,
Bo Wahlberg
Abstract:
We consider the problem of how to learn a step-size policy for the Limited-Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm. This is a limited computational memory quasi-Newton method widely used for deterministic unconstrained optimization but currently avoided in large-scale problems for requiring step sizes to be provided at each iteration. Existing methodologies for the step size sel…
▽ More
We consider the problem of how to learn a step-size policy for the Limited-Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm. This is a limited computational memory quasi-Newton method widely used for deterministic unconstrained optimization but currently avoided in large-scale problems for requiring step sizes to be provided at each iteration. Existing methodologies for the step size selection for L-BFGS use heuristic tuning of design parameters and massive re-evaluations of the objective function and gradient to find appropriate step-lengths. We propose a neural network architecture with local information of the current iterate as the input. The step-length policy is learned from data of similar optimization problems, avoids additional evaluations of the objective function, and guarantees that the output step remains inside a pre-defined interval. The corresponding training procedure is formulated as a stochastic optimization problem using the backpropagation through time algorithm. The performance of the proposed method is evaluated on the training of classifiers for the MNIST database for handwritten digits and for CIFAR-10. The results show that the proposed algorithm outperforms heuristically tuned optimizers such as ADAM, RMSprop, L-BFGS with a backtracking line search, and L-BFGS with a constant step size. The numerical results also show that a learned policy can be used as a warm-start to train new policies for different problems after a few additional training steps, highlighting its potential use in multiple large-scale optimization problems.
△ Less
Submitted 9 February, 2021; v1 submitted 3 October, 2020;
originally announced October 2020.
-
Task-similarity Aware Meta-learning through Nonparametric Kernel Regression
Authors:
Arun Venkitaraman,
Anders Hansson,
Bo Wahlberg
Abstract:
This paper investigates the use of nonparametric kernel-regression to obtain a tasksimilarity aware meta-learning algorithm. Our hypothesis is that the use of tasksimilarity helps meta-learning when the available tasks are limited and may contain outlier/ dissimilar tasks. While existing meta-learning approaches implicitly assume the tasks as being similar, it is generally unclear how this task-si…
▽ More
This paper investigates the use of nonparametric kernel-regression to obtain a tasksimilarity aware meta-learning algorithm. Our hypothesis is that the use of tasksimilarity helps meta-learning when the available tasks are limited and may contain outlier/ dissimilar tasks. While existing meta-learning approaches implicitly assume the tasks as being similar, it is generally unclear how this task-similarity could be quantified and used in the learning. As a result, most popular metalearning approaches do not actively use the similarity/dissimilarity between the tasks, but rely on availability of huge number of tasks for their working. Our contribution is a novel framework for meta-learning that explicitly uses task-similarity in the form of kernels and an associated meta-learning algorithm. We model the task-specific parameters to belong to a reproducing kernel Hilbert space where the kernel function captures the similarity across tasks. The proposed algorithm iteratively learns a meta-parameter which is used to assign a task-specific descriptor for every task. The task descriptors are then used to quantify the task-similarity through the kernel function. We show how our approach conceptually generalizes the popular meta-learning approaches of model-agnostic meta-learning (MAML) and Meta-stochastic gradient descent (Meta-SGD) approaches. Numerical experiments with regression tasks show that our algorithm outperforms these approaches when the number of tasks is limited, even in the presence of outlier or dissimilar tasks. This supports our hypothesis that task-similarity helps improve the metalearning performance in task-limited and adverse settings.
△ Less
Submitted 12 October, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
-
On Training and Evaluation of Neural Network Approaches for Model Predictive Control
Authors:
Rebecka Winqvist,
Arun Venkitaraman,
Bo Wahlberg
Abstract:
The contribution of this paper is a framework for training and evaluation of Model Predictive Control (MPC) implemented using constrained neural networks. Recent studies have proposed to use neural networks with differentiable convex optimization layers to implement model predictive controllers. The motivation is to replace real-time optimization in safety critical feedback control systems with le…
▽ More
The contribution of this paper is a framework for training and evaluation of Model Predictive Control (MPC) implemented using constrained neural networks. Recent studies have proposed to use neural networks with differentiable convex optimization layers to implement model predictive controllers. The motivation is to replace real-time optimization in safety critical feedback control systems with learnt mappings in the form of neural networks with optimization layers. Such mappings take as the input the state vector and predict the control law as the output. The learning takes place using training data generated from off-line MPC simulations. However, a general framework for characterization of learning approaches in terms of both model validation and efficient training data generation is lacking in literature. In this paper, we take the first steps towards developing such a coherent framework. We discuss how the learning problem has similarities with system identification, in particular input design, model structure selection and model validation. We consider the study of neural network architectures in PyTorch with the explicit MPC constraints implemented as a differentiable optimization layer using CVXPY. We propose an efficient approach of generating MPC input samples subject to the MPC model constraints using a hit-and-run sampler. The corresponding true outputs are generated by solving the MPC offline using OSOP. We propose different metrics to validate the resulting approaches. Our study further aims to explore the advantages of incorporating domain knowledge into the network structure from a training and evaluation perspective. Different model structures are numerically tested using the proposed framework in order to obtain more insights in the properties of constrained neural networks based MPC.
△ Less
Submitted 8 May, 2020;
originally announced May 2020.
-
Learning sparse linear dynamic networks in a hyper-parameter free setting
Authors:
Arun Venkitaraman,
Håkan Hjalmarsson,
Bo Wahlberg
Abstract:
We address the issue of estimating the topology and dynamics of sparse linear dynamic networks in a hyperparameter-free setting. We propose a method to estimate the network dynamics in a computationally efficient and parameter tuning-free iterative framework known as SPICE (Sparse Iterative Covariance Estimation). The estimated dynamics directly reveal the underlying topology. Our approach does no…
▽ More
We address the issue of estimating the topology and dynamics of sparse linear dynamic networks in a hyperparameter-free setting. We propose a method to estimate the network dynamics in a computationally efficient and parameter tuning-free iterative framework known as SPICE (Sparse Iterative Covariance Estimation). The estimated dynamics directly reveal the underlying topology. Our approach does not assume that the network is undirected and is applicable even with varying noise levels across the modules of the network. We also do not assume any explicit prior knowledge on the network dynamics. Numerical experiments with realistic dynamic networks illustrate the usefulness of our method.
△ Less
Submitted 26 November, 2019;
originally announced November 2019.
-
Recursive Prediction of Graph Signals with Incoming Nodes
Authors:
Arun Venkitaraman,
Saikat Chatterjee,
Bo Wahlberg
Abstract:
Kernel and linear regression have been recently explored in the prediction of graph signals as the output, given arbitrary input signals that are agnostic to the graph. In many real-world problems, the graph expands over time as new nodes get introduced. Keeping this premise in mind, we propose a method to recursively obtain the optimal prediction or regression coefficients for the recently propos…
▽ More
Kernel and linear regression have been recently explored in the prediction of graph signals as the output, given arbitrary input signals that are agnostic to the graph. In many real-world problems, the graph expands over time as new nodes get introduced. Keeping this premise in mind, we propose a method to recursively obtain the optimal prediction or regression coefficients for the recently propose Linear Regression over Graphs (LRG), as the graph expands with incoming nodes. This comes as a natural consequence of the structure C(W)= of the regression problem, and obviates the need to solve a new regression problem each time a new node is added. Experiments with real-world graph signals show that our approach results in good prediction performance which tends to be close to that obtained from knowing the entire graph apriori.
△ Less
Submitted 26 November, 2019;
originally announced November 2019.
-
Evaluation of Spectral Learning for the Identification of Hidden Markov Models
Authors:
Robert Mattila,
Cristian R. Rojas,
Bo Wahlberg
Abstract:
Hidden Markov models have successfully been applied as models of discrete time series in many fields. Often, when applied in practice, the parameters of these models have to be estimated. The currently predominating identification methods, such as maximum-likelihood estimation and especially expectation-maximization, are iterative and prone to have problems with local minima. A non-iterative metho…
▽ More
Hidden Markov models have successfully been applied as models of discrete time series in many fields. Often, when applied in practice, the parameters of these models have to be estimated. The currently predominating identification methods, such as maximum-likelihood estimation and especially expectation-maximization, are iterative and prone to have problems with local minima. A non-iterative method employing a spectral subspace-like approach has recently been proposed in the machine learning literature. This paper evaluates the performance of this algorithm, and compares it to the performance of the expectation-maximization algorithm, on a number of numerical examples. We find that the performance is mixed; it successfully identifies some systems with relatively few available observations, but fails completely for some systems even when a large amount of observations is available. An open question is how this discrepancy can be explained. We provide some indications that it could be related to how well-conditioned some system parameters are.
△ Less
Submitted 22 July, 2015;
originally announced July 2015.
-
How to monitor and mitigate stair-casing in l1 trend filtering
Authors:
Cristian R. Rojas,
Bo Wahlberg
Abstract:
In this paper we study the estimation of changing trends in time-series using $\ell_1$ trend filtering. This method generalizes 1D Total Variation (TV) denoising for detection of step changes in means to detecting changes in trends, and it relies on a convex optimization problem for which there are very efficient numerical algorithms. It is known that TV denoising suffers from the so-called stair-…
▽ More
In this paper we study the estimation of changing trends in time-series using $\ell_1$ trend filtering. This method generalizes 1D Total Variation (TV) denoising for detection of step changes in means to detecting changes in trends, and it relies on a convex optimization problem for which there are very efficient numerical algorithms. It is known that TV denoising suffers from the so-called stair-case effect, which leads to detecting false change points. The objective of this paper is to show that $\ell_1$ trend filtering also suffers from a certain stair-case problem. The analysis is based on an interpretation of the dual variables of the optimization problem in the method as integrated random walk. We discuss consistency conditions for $\ell_1$ trend filtering, how to monitor their fulfillment, and how to modify the algorithm to avoid the stair-case false detection problem.
△ Less
Submitted 1 December, 2014;
originally announced December 2014.
-
Approximate Regularization Path for Nuclear Norm Based H2 Model Reduction
Authors:
Niclas Blomberg,
Cristian R. Rojas,
Bo Wahlberg
Abstract:
This paper concerns model reduction of dynamical systems using the nuclear norm of the Hankel matrix to make a trade-off between model fit and model complexity. This results in a convex optimization problem where this trade-off is determined by one crucial design parameter. The main contribution is a methodology to approximately calculate all solutions up to a certain tolerance to the model reduct…
▽ More
This paper concerns model reduction of dynamical systems using the nuclear norm of the Hankel matrix to make a trade-off between model fit and model complexity. This results in a convex optimization problem where this trade-off is determined by one crucial design parameter. The main contribution is a methodology to approximately calculate all solutions up to a certain tolerance to the model reduction problem as a function of the design parameter. This is called the regularization path in sparse estimation and is a very important tool in order to find the appropriate balance between fit and complexity. We extend this to the more complicated nuclear norm case. The key idea is to determine when to exactly calculate the optimal solution using an upper bound based on the so-called duality gap. Hence, by solving a fixed number of optimization problems the whole regularization path up to a given tolerance can be efficiently computed. We illustrate this approach on some numerical examples.
△ Less
Submitted 22 July, 2014;
originally announced July 2014.
-
On change point detection using the fused lasso method
Authors:
Cristian R. Rojas,
Bo Wahlberg
Abstract:
In this paper we analyze the asymptotic properties of l1 penalized maximum likelihood estimation of signals with piece-wise constant mean values and/or variances. The focus is on segmentation of a non-stationary time series with respect to changes in these model parameters. This change point detection and estimation problem is also referred to as total variation denoising or l1 -mean filtering and…
▽ More
In this paper we analyze the asymptotic properties of l1 penalized maximum likelihood estimation of signals with piece-wise constant mean values and/or variances. The focus is on segmentation of a non-stationary time series with respect to changes in these model parameters. This change point detection and estimation problem is also referred to as total variation denoising or l1 -mean filtering and has many important applications in most fields of science and engineering. We establish the (approximate) sparse consistency properties, including rate of convergence, of the so-called fused lasso signal approximator (FLSA). We show that this only holds if the sign of the corresponding consecutive changes are all different, and that this estimator is otherwise incapable of correctly detecting the underlying sparsity pattern. The key idea is to notice that the optimality conditions for this problem can be analyzed using techniques related to brownian bridge theory.
△ Less
Submitted 21 January, 2014;
originally announced January 2014.
-
An ADMM Algorithm for a Class of Total Variation Regularized Estimation Problems
Authors:
Bo Wahlberg,
Stephen Boyd,
Mariette Annergren,
Yang Wang
Abstract:
We present an alternating augmented Lagrangian method for convex optimization problems where the cost function is the sum of two terms, one that is separable in the variable blocks, and a second that is separable in the difference between consecutive variable blocks. Examples of such problems include Fused Lasso estimation, total variation denoising, and multi-period portfolio optimization with tr…
▽ More
We present an alternating augmented Lagrangian method for convex optimization problems where the cost function is the sum of two terms, one that is separable in the variable blocks, and a second that is separable in the difference between consecutive variable blocks. Examples of such problems include Fused Lasso estimation, total variation denoising, and multi-period portfolio optimization with transaction costs. In each iteration of our method, the first step involves separately optimizing over each variable block, which can be carried out in parallel. The second step is not separable in the variables, but can be carried out very efficiently. We apply the algorithm to segmentation of data based on changes inmean (l_1 mean filtering) or changes in variance (l_1 variance filtering). In a numerical example, we show that our implementation is around 10000 times faster compared with the generic optimization solver SDPT3.
△ Less
Submitted 8 March, 2012;
originally announced March 2012.
-
On l_1 Mean and Variance Filtering
Authors:
Bo Wahlberg,
Cristian R. Rojas,
Mariette Annergren
Abstract:
This paper addresses the problem of segmenting a time-series with respect to changes in the mean value or in the variance. The first case is when the time data is modeled as a sequence of independent and normal distributed random variables with unknown, possibly changing, mean value but fixed variance. The main assumption is that the mean value is piecewise constant in time, and the task is to est…
▽ More
This paper addresses the problem of segmenting a time-series with respect to changes in the mean value or in the variance. The first case is when the time data is modeled as a sequence of independent and normal distributed random variables with unknown, possibly changing, mean value but fixed variance. The main assumption is that the mean value is piecewise constant in time, and the task is to estimate the change times and the mean values within the segments. The second case is when the mean value is constant, but the variance can change. The assumption is that the variance is piecewise constant in time, and we want to estimate change times and the variance values within the segments. To find solutions to these problems, we will study an l_1 regularized maximum likelihood method, related to the fused lasso method and l_1 trend filtering, where the parameters to be estimated are free to vary at each sample. To penalize variations in the estimated parameters, the l_1-norm of the time difference of the parameters is used as a regularization term. This idea is closely related to total variation denoising. The main contribution is that a convex formulation of this variance estimation problem, where the parametrization is based on the inverse of the variance, can be formulated as a certain l_1 mean estimation problem. This implies that results and methods for mean estimation can be applied to the challenging problem of variance segmentation/estimation.
△ Less
Submitted 25 November, 2011;
originally announced November 2011.