-
A Relaxation Approach to Feature Selection for Linear Mixed Effects Models
Authors:
Aleksei Sholokhov,
James V. Burke,
Damian F. Santomauro,
Peng Zheng,
Aleksandr Aravkin
Abstract:
Linear Mixed-Effects (LME) models are a fundamental tool for modeling correlated data, including cohort studies, longitudinal data analysis, and meta-analysis. Design and analysis of variable selection methods for LMEs is more difficult than for linear regression because LME models are nonlinear. In this work we propose a relaxation strategy and optimization methods that enable a wide range of variable selection methods for LMEs using both convex and nonconvex regularizers, including $\ell_1$, Adaptive-$\ell_1$, SCAD, and $\ell_0$. The computational framework only requires the proximal operator for each regularizer to be available, and the implementation is available in an open source Python package pysr3, consistent with the sklearn standard. The numerical results on simulated data sets indicate that the proposed strategy improves on the state of the art for both accuracy and compute time. The variable selection techniques are also validated on a real example using a data set on bullying victimization.
Submitted 13 May, 2022;
originally announced May 2022.
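The abstract above notes that the framework only needs a proximal operator for each regularizer. As a minimal sketch of what such operators look like for the $\ell_1$ and $\ell_0$ penalties, assuming NumPy: the function names `prox_l1` and `prox_l0` and the parameter `step` are hypothetical and are not taken from pysr3.

```python
import numpy as np

def prox_l1(x, step):
    """Soft-thresholding: proximal operator of step * ||x||_1."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - step, 0.0)

def prox_l0(x, step):
    """Hard-thresholding: a proximal operator of step * ||x||_0.

    Keeping an entry costs `step`; zeroing it costs x_i^2 / 2,
    so an entry survives only when x_i^2 > 2 * step.
    """
    x = np.asarray(x, dtype=float)
    return np.where(x**2 > 2.0 * step, x, 0.0)

x = np.array([3.0, -0.5, 0.2, -2.0])
soft = prox_l1(x, 1.0)   # shrinks every entry toward zero by 1
hard = prox_l0(x, 1.0)   # keeps large entries unchanged, zeros the rest
```

Soft-thresholding shrinks the surviving entries, while hard-thresholding leaves them untouched; this is one reason convex and nonconvex regularizers can select variables differently.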
-
Convergence of the Gradient Sampling Algorithm on Directionally Lipschitz Functions
Authors:
James V. Burke,
Qiuying Lin
Abstract:
The convergence theory for the gradient sampling algorithm is extended to directionally Lipschitz functions. Although directionally Lipschitz functions are not necessarily locally Lipschitz, they are almost everywhere differentiable and well approximated by gradients and so are a natural candidate for the application of the gradient sampling algorithm. The main obstacle to this extension is the potential unboundedness or emptiness of the Clarke subdifferential at points of interest. The convergence analysis we present provides one path to addressing these issues. In particular, we recover the usual convergence theory when the function is locally Lipschitz. Moreover, if the algorithm does not drive a certain measure of criticality to zero, then the iterates must converge to a point at which either the Clarke subdifferential is empty or the direction of steepest descent is degenerate in the sense that it does lie in the interior of the domain of the regular subderivative.
Submitted 10 July, 2021;
originally announced July 2021.
-
IRLS for Sparse Recovery Revisited: Examples of Failure and a Remedy
Authors:
Aleksandr Y. Aravkin,
James V. Burke,
Daiwei He
Abstract:
Compressed sensing is a central topic in signal processing with myriad applications, where the goal is to recover a signal from as few observations as possible. Iterative re-weighting is one of the fundamental tools to achieve this goal. This paper re-examines the iteratively reweighted least squares (IRLS) algorithm for sparse recovery proposed by Daubechies, DeVore, Fornasier, and Güntürk in "Iteratively reweighted least squares minimization for sparse recovery", Communications on Pure and Applied Mathematics, 63 (2010) 1--38. Under the null space property of order $K$, the authors show that their algorithm converges to the unique $k$-sparse solution for $k$ strictly bounded above by a value strictly less than $K$, and this $k$-sparse solution coincides with the unique $\ell_1$ solution. On the other hand, it is known that, for $k$ less than or equal to $K$, the $k$-sparse and $\ell_1$ solutions are unique and coincide. The authors emphasize that their proof method does not apply for $k$ sufficiently close to $K$, and remark that they were unsuccessful in finding an example where the algorithm fails for these values of $k$.
In this note we construct a family of examples where the Daubechies-DeVore-Fornasier-Güntürk IRLS algorithm fails for $k=K$, and provide a modification to their algorithm that provably converges to the unique $k$-sparse solution for $k$ less than or equal to $K$ while preserving the local linear rate. The paper includes numerical studies of this family as well as the modified IRLS algorithm, testing their robustness under perturbations and to parameter selection.
Submitted 15 October, 2019;
originally announced October 2019.
-
A study of convex convex-composite functions via infimal convolution with applications
Authors:
James V. Burke,
Tim Hoheisel,
Quang V. Nguyen
Abstract:
In this note we provide a full conjugacy and subdifferential calculus for convex convex-composite functions in finite-dimensional space. Our approach, based on infimal convolution and cone-convexity, is straightforward and yields the desired results under a verifiable Slater-type condition, with relaxed monotonicity and without lower semicontinuity assumptions on the functions in play. The versatility of our findings is illustrated by a series of applications in optimization and matrix analysis, including conic programming, matrix-fractional, variational Gram, and spectral functions.
Submitted 21 August, 2019; v1 submitted 18 July, 2019;
originally announced July 2019.
-
Robust Singular Smoothers For Tracking Using Low-Fidelity Data
Authors:
Jonathan Jonker,
Aleksandr Aravkin,
James V. Burke,
Gianluigi Pillonetto,
Sarah Webster
Abstract:
Tracking underwater autonomous platforms is often difficult because of noisy, biased, and discretized input data. Classic filters and smoothers based on standard assumptions of Gaussian white noise break down when presented with any of these challenges. Robust models (such as the Huber loss) and constraints (e.g. maximum velocity) are used to attenuate these issues. Here, we consider robust smoothing with singular covariance, which covers bias and correlated noise, as well as many specific model types, such as those used in navigation. In particular, we show how to combine singular covariance models with robust losses and state-space constraints in a unified framework that can handle very low-fidelity data. A noisy, biased, and discretized navigation dataset from a submerged, low-cost inertial measurement unit (IMU) package, with ultra short baseline (USBL) data for ground truth, provides an opportunity to stress-test the proposed framework with promising results. We show how robust modeling elements improve our ability to analyze the data, and present batch processing results for 10 minutes of data with three different frequencies of available USBL position fixes (gaps of 30 seconds, 1 minute, and 2 minutes). The results suggest that the framework can be extended to real-time tracking using robust windowed estimation.
Submitted 22 May, 2019;
originally announced May 2019.
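The abstract above cites the Huber loss as an example of a robust model. As a minimal sketch of why it attenuates outliers, assuming NumPy: the function name `huber` and the threshold name `kappa` are illustrative choices, not taken from the paper. The loss is quadratic for small residuals and grows only linearly beyond the threshold.

```python
import numpy as np

def huber(r, kappa=1.0):
    """Huber loss: 0.5 * r^2 if |r| <= kappa, else kappa * (|r| - 0.5 * kappa)."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= kappa, 0.5 * r**2, kappa * (a - 0.5 * kappa))

residuals = np.array([0.5, -1.0, 10.0])
losses = huber(residuals, kappa=1.0)
# A least-squares penalty would charge 0.5 * 10^2 = 50 for the last residual;
# the Huber loss charges 9.5, so a single outlier pulls the fit far less.
```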
-
Variational Properties of Matrix Functions via the Generalized Matrix-Fractional Function
Authors:
James V. Burke,
Yuan Gao,
Tim Hoheisel
Abstract:
We show that many important convex matrix functions can be represented as the partial infimal projection of the generalized matrix fractional (GMF) and a relatively simple convex function. This representation provides conditions under which such functions are closed and proper as well as formulas for the ready computation of both their conjugates and subdifferentials. Special attention is given to support and indicator functions. Particular instances yield all weighted Ky Fan norms and squared gauges on $\mathbb R^{n\times m}$, and as an example we show that all variational Gram functions are representable as squares of gauges. Other instances yield weighted sums of the Frobenius and nuclear norms. The scope of applications is large and the range of variational properties and insight is fascinating and fundamental. An important byproduct of these representations is that they lay the foundation for a smoothing approach to many matrix functions on the interior of the domain of the GMF function, which opens the door to a range of unexplored optimization methods.
Submitted 9 May, 2019; v1 submitted 3 July, 2018;
originally announced July 2018.
-
Line Search and Trust-Region Methods for Convex-Composite Optimization
Authors:
James V. Burke,
Abraham Engle
Abstract:
We consider descent methods for solving non-finite valued nonsmooth convex-composite optimization problems that employ Gauss-Newton subproblems to determine the iteration update. Specifically, we establish the global convergence properties for descent methods that use a backtracking line search, a weak Wolfe line search, or a trust-region update. All of these approaches are designed to exploit the structure associated with convex-composite problems.
Submitted 10 September, 2019; v1 submitted 13 June, 2018;
originally announced June 2018.
-
Strong Metric (Sub)regularity of KKT Mappings for Piecewise Linear-Quadratic Convex-Composite Optimization
Authors:
James V. Burke,
Abraham Engle
Abstract:
This work concerns the local convergence theory of Newton and quasi-Newton methods for convex-composite optimization: minimize $f(x) := h(c(x))$, where $h$ is an infinite-valued proper convex function and $c$ is $C^2$-smooth. We focus on the case where $h$ is infinite-valued piecewise linear-quadratic and convex. Such problems include nonlinear programming, mini-max optimization, estimation of nonlinear dynamics with non-Gaussian noise as well as many modern approaches to large-scale data analysis and machine learning. Our approach embeds the optimality conditions for convex-composite optimization problems into a generalized equation. We establish conditions for strong metric subregularity and strong metric regularity of the corresponding set-valued mappings. This allows us to extend classical convergence of Newton and quasi-Newton methods to the broader class of non-finite valued piecewise linear-quadratic convex-composite optimization problems. In particular we establish local quadratic convergence of the Newton method under conditions that parallel those in nonlinear programming when $h$ is non-finite valued piecewise linear.
Submitted 16 June, 2018; v1 submitted 2 May, 2018;
originally announced May 2018.
-
Gradient Sampling Methods for Nonsmooth Optimization
Authors:
James V. Burke,
Frank E. Curtis,
Adrian S. Lewis,
Michael L. Overton,
Lucas E. A. Simões
Abstract:
This paper reviews the gradient sampling methodology for solving nonsmooth, nonconvex optimization problems. An intuitively straightforward gradient sampling algorithm is stated and its convergence properties are summarized. Throughout this discussion, we emphasize the simplicity of gradient sampling as an extension of the steepest descent method for minimizing smooth objectives. We then provide overviews of various enhancements that have been proposed to improve practical performance, as well as of several extensions that have been made in the literature, such as to solve constrained problems. The paper also includes clarification of certain technical aspects of the analysis of gradient sampling algorithms, most notably related to the assumptions one needs to make about the set of points at which the objective is continuously differentiable. Finally, we discuss possible future research directions.
Submitted 29 April, 2018;
originally announced April 2018.
-
Inexact Sequential Quadratic Optimization with Penalty Parameter Updates Within the QP Solve: Extended Version
Authors:
James V. Burke,
Frank E. Curtis,
Hao Wang,
Jiashan Wang
Abstract:
This paper focuses on the design of sequential quadratic optimization (commonly known as SQP) methods for solving large-scale nonlinear optimization problems. The most computationally demanding aspect of such an approach is the computation of the search direction during each iteration, for which we consider the use of matrix-free methods. In particular, we develop a method that requires an inexact solve of a single QP subproblem to establish the convergence of the overall SQP method. It is known that SQP methods can be plagued by poor behavior of the global convergence mechanism. To confront this issue, we propose the use of an exact penalty function with a dynamic penalty parameter updating strategy to be employed within the subproblem solver in such a way that the resulting search direction predicts progress toward both feasibility and optimality. We present our parameter updating strategy and prove that, under reasonable assumptions, the strategy does not modify the penalty parameter unnecessarily. We also discuss a matrix-free subproblem solver in which our updating strategy can be incorporated. We close the paper with a discussion of the results of numerical experiments that illustrate the benefits of our proposed techniques.
Submitted 26 February, 2020; v1 submitted 25 March, 2018;
originally announced March 2018.
-
Fast Robust Methods for Singular State-Space Models
Authors:
Jonathan Jonker,
Aleksandr Y. Aravkin,
James V. Burke,
Gianluigi Pillonetto,
Sarah Webster
Abstract:
State-space models are used in a wide range of time series analysis formulations. Kalman filtering and smoothing are work-horse algorithms in these settings. While classic algorithms assume Gaussian errors to simplify estimation, recent advances use a broader range of optimization formulations to allow outlier-robust estimation, as well as constraints to capture prior information.
Here we develop methods on state-space models where either innovations or error covariances may be singular. These models frequently arise in navigation (e.g. for 'colored noise' models or deterministic integrals) and are ubiquitous in auto-correlated time series models such as ARMA. We reformulate all state-space models (singular as well as nonsingular) as constrained convex optimization problems, and develop an efficient algorithm for this reformulation. The convergence rate is locally linear, with constants that do not depend on the conditioning of the problem.
Numerical comparisons show that the new approach outperforms competing approaches for nonsingular models, including state of the art interior point (IP) methods. IP methods converge at superlinear rates; we expect them to dominate. However, the steep rate of the proposed approach (independent of problem conditioning) combined with cheap iterations wins against IP in a run-time comparison. We therefore suggest that the proposed approach be the default choice for estimating state space models outside of the Gaussian context, regardless of whether the error covariances are singular or not.
Submitted 28 June, 2018; v1 submitted 7 March, 2018;
originally announced March 2018.
-
Convex Geometry of the Generalized Matrix-Fractional Function
Authors:
James V. Burke,
Yuan Gao,
Tim Hoheisel
Abstract:
Generalized matrix-fractional (GMF) functions are a class of matrix support functions introduced by Burke and Hoheisel as a tool for unifying a range of seemingly divergent matrix optimization problems associated with inverse problems, regularization and learning. In this paper we dramatically simplify the support function representation for GMF functions as well as the representation of their subdifferentials. These new representations allow the ready computation of a range of important related geometric objects whose formulations were previously unavailable.
Submitted 3 March, 2017;
originally announced March 2017.
-
Foundations of gauge and perspective duality
Authors:
Alexandre Y. Aravkin,
James V. Burke,
Dmitriy Drusvyatskiy,
Michael P. Friedlander,
Kellie MacPhee
Abstract:
We revisit the foundations of gauge duality and demonstrate that it can be explained using a modern approach to duality based on a perturbation framework. We therefore put gauge duality and Fenchel-Rockafellar duality on equal footing, including explaining gauge dual variables as sensitivity measures, and showing how to recover primal solutions from those of the gauge dual. This vantage point allows a direct proof that optimal solutions of the Fenchel-Rockafellar dual of the gauge dual are precisely the primal solutions rescaled by the optimal value. We extend the gauge duality framework to the setting in which the functional components are general nonnegative convex functions, including problems with piecewise linear quadratic functions and constraints that arise from generalized linear models used in regression.
Submitted 18 June, 2018; v1 submitted 28 February, 2017;
originally announced February 2017.
-
Generalized Kalman Smoothing: Modeling and Algorithms
Authors:
A. Y. Aravkin,
J. V. Burke,
L. Ljung,
A. Lozano,
G. Pillonetto
Abstract:
State-space smoothing has found many applications in science and engineering. Under linear and Gaussian assumptions, smoothed estimates can be obtained using efficient recursions, for example Rauch-Tung-Striebel and Mayne-Fraser algorithms. Such schemes are equivalent to linear algebraic techniques that minimize a convex quadratic objective function with structure induced by the dynamic model.
These classical formulations fall short in many important circumstances. For instance, smoothers obtained using quadratic penalties can fail when outliers are present in the data, and cannot track impulsive inputs and abrupt state changes. Motivated by these shortcomings, generalized Kalman smoothing formulations have been proposed in the last few years, replacing quadratic models with more suitable, often nonsmooth, convex functions. In contrast to classical models, these general estimators require use of iterated algorithms, and these have received increased attention from control, signal processing, machine learning, and optimization communities.
In this survey we show that the optimization viewpoint provides the control and signal processing community great freedom in the development of novel modeling and inference frameworks for dynamical systems. We discuss general statistical models for dynamic systems, making full use of nonsmooth convex penalties and constraints, and providing links to important models in signal processing and machine learning. We also survey optimization techniques for these formulations, paying close attention to dynamic problem structure. Modeling concepts and algorithms are illustrated with numerical examples.
Submitted 25 September, 2016; v1 submitted 20 September, 2016;
originally announced September 2016.
-
Level-set methods for convex optimization
Authors:
Aleksandr Y. Aravkin,
James V. Burke,
Dmitriy Drusvyatskiy,
Michael P. Friedlander,
Scott Roy
Abstract:
Convex optimization problems arising in applications often have favorable objective functions and complicated constraints, thereby precluding first-order methods from being immediately applicable. We describe an approach that exchanges the roles of the objective and constraint functions, and instead approximately solves a sequence of parametric level-set problems. A zero-finding procedure, based on inexact function evaluations and possibly inexact derivative information, leads to an efficient solution scheme for the original problem. We describe the theoretical and practical properties of this approach for a broad range of problems, including low-rank semidefinite optimization, sparse optimization, and generalized linear models for inference.
Submitted 3 February, 2016;
originally announced February 2016.
-
Variational Analysis of Convexly Generated Spectral Max Functions
Authors:
James V. Burke,
Julia Eaton
Abstract:
The spectral abscissa is the largest real part of an eigenvalue of a matrix and the spectral radius is the largest modulus. Both are examples of spectral max functions---the maximum of a real-valued function over the spectrum of a matrix. These mappings arise in the control and stabilization of dynamical systems. In 2001, Burke and Overton characterized the regular subdifferential of the spectral abscissa and showed that the spectral abscissa is subdifferentially regular in the sense of Clarke when all active eigenvalues are nonderogatory. In this paper we develop new techniques to obtain these results for the more general class of convexly generated spectral max functions. In particular, we extend the Burke-Overton subdifferential regularity result to this class. These techniques allow us to obtain new variational results for the spectral radius.
Submitted 7 November, 2016; v1 submitted 11 November, 2015;
originally announced November 2015.
-
Matrix-Free Solvers for Exact Penalty Subproblems
Authors:
James V. Burke,
Frank E. Curtis,
Hao Wang,
Jiashan Wang
Abstract:
We present two matrix-free methods for approximately solving exact penalty subproblems that arise when solving large-scale optimization problems. The first approach is a novel iterative re-weighting algorithm (IRWA), which iteratively minimizes quadratic models of relaxed subproblems while automatically updating a relaxation vector. The second approach is based on alternating direction augmented Lagrangian (ADAL) technology applied to our setting. The main computational costs of each algorithm are the repeated minimizations of convex quadratic functions which can be performed matrix-free. We prove that both algorithms are globally convergent under loose assumptions, and that each requires at most $O(1/\varepsilon^2)$ iterations to reach $\varepsilon$-optimality of the objective function.
Numerical experiments exhibit the ability of both algorithms to efficiently find inexact solutions. Moreover, in certain cases, IRWA is shown to be more reliable than ADAL.
Submitted 9 February, 2014;
originally announced February 2014.
-
Generalized system identification with stable spline kernels
Authors:
Aleksandr Y. Aravkin,
James V. Burke,
Gianluigi Pillonetto
Abstract:
Regularized least-squares approaches have been successfully applied to linear system identification. Recent approaches use quadratic penalty terms on the unknown impulse response defined by stable spline kernels, which control model space complexity by leveraging regularity and bounded-input bounded-output stability. This paper extends linear system identification to a wide class of nonsmooth stable spline estimators, where regularization functionals and data misfits can be selected from a rich set of piecewise linear-quadratic (PLQ) penalties. This class includes the 1-norm, Huber, and Vapnik, in addition to the least-squares penalty.
By representing penalties through their conjugates, the modeler can specify any piecewise linear-quadratic penalty for misfit and regularizer, as well as inequality constraints on the response. The interior-point solver we implement (IPsolve) is locally quadratically convergent, with $O(\min(m,n)^2(m+n))$ arithmetic operations per iteration, where $n$ is the number of unknown impulse response coefficients and $m$ is the number of observed output measurements. IPsolve is competitive with available alternatives for system identification. This is shown by a comparison with TFOCS, libSVM, and the FISTA algorithm. The code is open source (https://github.com/saravkin/IPsolve).
The impact of the approach for system identification is illustrated with numerical experiments featuring robust formulations for contaminated data, relaxation systems, nonnegativity and unimodality constraints on the impulse response, and sparsity promoting regularization. Incorporating constraints yields particularly significant improvements.
Submitted 25 July, 2018; v1 submitted 30 September, 2013;
originally announced September 2013.
-
Robust and Trend Following Student's t Kalman Smoothers
Authors:
Aleksandr Y. Aravkin,
James V. Burke,
Gianluigi Pillonetto
Abstract:
We present a Kalman smoothing framework based on modeling errors using the heavy tailed Student's t distribution, along with algorithms, convergence theory, open-source general implementation, and several important applications. The computational effort per iteration grows linearly with the length of the time series, and all smoothers allow nonlinear process and measurement models.
Robust smoothers form an important subclass of smoothers within this framework. These smoothers work in situations where measurements are highly contaminated by noise or include data unexplained by the forward model. Highly robust smoothers are developed by modeling measurement errors using the Student's t distribution, and outperform the recently proposed L1-Laplace smoother in extreme situations with data containing 20% or more outliers.
A second special application we consider in detail is tracking sudden changes in the state. The corresponding smoother is developed by modeling process noise using the Student's t distribution.
These features can be used separately or in tandem, and we present a general smoother algorithm and open source implementation, together with convergence analysis that covers a wide range of smoothers. A key ingredient of our approach is a technique to deal with the non-convexity of the Student's t loss function. Numerical results for linear and nonlinear models illustrate the performance of the new smoothers for robust and tracking applications, as well as for mixed problems that have both types of features.
Submitted 22 March, 2013;
originally announced March 2013.
-
Kalman smoothing and block tridiagonal systems: new connections and numerical stability results
Authors:
Aleksandr Y. Aravkin,
Bradley B. Bell,
James V. Burke,
Gianluigi Pillonetto
Abstract:
The Rauch-Tung-Striebel (RTS) and the Mayne-Fraser (MF) algorithms are two of the most popular smoothing schemes to reconstruct the state of a dynamic linear system from measurements collected on a fixed interval. Another (less popular) approach is the Mayne (M) algorithm introduced in his original paper under the name of Algorithm A. In this paper, we analyze these three smoothers from an optimization and algebraic perspective, revealing new insights on their numerical stability properties. In doing this, we re-interpret classic recursions as matrix decomposition methods for block tridiagonal matrices.
First, we show that the classic RTS smoother is an implementation of the forward block tridiagonal (FBT) algorithm (also known as the Thomas algorithm) for particular block tridiagonal systems. We study the numerical stability properties of this scheme, connecting the condition number of the full system to properties of the individual blocks encountered during standard recursion. Second, we study the M smoother, and prove it is equivalent to a backward block tridiagonal (BBT) algorithm with a stronger stability guarantee than RTS. Third, we illustrate how the MF smoother solves a block tridiagonal system, and prove that it has the same numerical stability properties as RTS (but not those of M). Finally, we present a new hybrid RTS/M (FBT/BBT) smoothing scheme, which is faster than MF, and has the same numerical stability guarantees as RTS and MF.
Submitted 24 July, 2013; v1 submitted 21 March, 2013;
originally announced March 2013.
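The forward block tridiagonal recursion mentioned in the abstract above can be sketched in NumPy as follows. This is only an illustrative sketch, not the authors' implementation: the function name `forward_block_tridiagonal_solve`, the argument layout (lists of diagonal and sub-diagonal blocks), and the assumption that the super-diagonal blocks are the transposes of the sub-diagonal ones are all choices made here for illustration.

```python
import numpy as np

def forward_block_tridiagonal_solve(diag, sub, rhs):
    """Solve C x = rhs for a symmetric block tridiagonal matrix C.

    diag : list of N arrays of shape (n, n), the diagonal blocks.
    sub  : list of N-1 arrays of shape (n, n); sub[k] is block (k+1, k),
           and block (k, k+1) is assumed to be sub[k].T.
    rhs  : list of N arrays of shape (n,).
    Returns an array of shape (N, n).
    """
    N = len(diag)
    d = [np.array(diag[0], dtype=float)]  # modified diagonal blocks
    r = [np.array(rhs[0], dtype=float)]   # modified right-hand sides

    # Forward pass: eliminate the sub-diagonal blocks one at a time.
    for k in range(1, N):
        # G = sub[k-1] @ inv(d[k-1]), computed with a solve instead of an inverse.
        G = np.linalg.solve(d[k - 1].T, sub[k - 1].T).T
        d.append(diag[k] - G @ sub[k - 1].T)
        r.append(rhs[k] - G @ r[k - 1])

    # Backward pass: back-substitute through the upper block bidiagonal system.
    x = [None] * N
    x[N - 1] = np.linalg.solve(d[N - 1], r[N - 1])
    for k in range(N - 2, -1, -1):
        x[k] = np.linalg.solve(d[k], r[k] - sub[k].T @ x[k + 1])
    return np.array(x)
```

The cost is one pass forward and one pass backward over the blocks, so the work grows linearly with the number of blocks `N`.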
-
Linear system identification using stable spline kernels and PLQ penalties
Authors:
Aleksandr Y. Aravkin,
James V. Burke,
Gianluigi Pillonetto
Abstract:
The classical approach to linear system identification is given by parametric Prediction Error Methods (PEM). In this context, model complexity is often unknown so that a model order selection step is needed to suitably trade off bias and variance. Recently, a different approach to linear system identification has been introduced, where model order determination is avoided by using a regularized least squares framework. In particular, the penalty term on the impulse response is defined by so-called stable spline kernels. They embed information on regularity and BIBO stability, and depend on a small number of parameters which can be estimated from data. In this paper, we provide new nonsmooth formulations of the stable spline estimator. In particular, we consider linear system identification problems in a very broad context, where regularization functionals and data misfits can come from a rich set of piecewise linear quadratic functions. Moreover, our analysis includes polyhedral inequality constraints on the unknown impulse response. For any formulation in this class, we show that interior point methods can be used to solve the system identification problem, with complexity $O(n^3)+O(mn^2)$ in each iteration, where $n$ and $m$ are the number of impulse response coefficients and measurements, respectively. The usefulness of the framework is illustrated via a numerical experiment where output measurements are contaminated by outliers.
Submitted 12 March, 2013;
originally announced March 2013.
-
Optimization viewpoint on Kalman smoothing, with applications to robust and sparse estimation
Authors:
Aleksandr Y. Aravkin,
James V. Burke,
Gianluigi Pillonetto
Abstract:
In this paper, we present the optimization formulation of the Kalman filtering and smoothing problems, and use this perspective to develop a variety of extensions and applications. We first formulate classic Kalman smoothing as a least squares problem, highlight special structure, and show that the classic filtering and smoothing algorithms are equivalent to a particular algorithm for solving this problem. Once this equivalence is established, we present extensions of Kalman smoothing to systems with nonlinear process and measurement models, systems with linear and nonlinear inequality constraints, systems with outliers in the measurements or sudden changes in the state, and systems where the sparsity of the state sequence must be accounted for. All extensions preserve the computational efficiency of the classic algorithms, and most of the extensions are illustrated with numerical examples, which are part of an open source Kalman smoothing Matlab/Octave package.
Submitted 11 March, 2013; v1 submitted 8 March, 2013;
originally announced March 2013.
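As a small illustration of the least squares view of Kalman smoothing described in the abstract above, the sketch below stacks the process and measurement residuals of a scalar linear model into one least squares problem. The scalar model, the function name `kalman_smoother_least_squares`, and its parameters (`g`, `h`, `q`, `r`, `x0`) are assumptions introduced here for illustration; the sketch uses a generic dense solver and does not exploit the tridiagonal structure of the normal equations that the classic recursions rely on.

```python
import numpy as np

def kalman_smoother_least_squares(z, g, h, q, r, x0):
    """Smooth a scalar linear-Gaussian model by solving one least squares problem.

    Assumed model (illustrative):
        x_k = g * x_{k-1} + w_k,  w_k ~ N(0, q),  with known initial state x0
        z_k = h * x_k     + v_k,  v_k ~ N(0, r)
    The smoothed states minimize
        sum_k (x_k - g x_{k-1})^2 / q + sum_k (z_k - h x_k)^2 / r.
    """
    z = np.asarray(z, dtype=float)
    N = z.size
    A = np.zeros((2 * N, N))
    b = np.zeros(2 * N)
    sq, sr = 1.0 / np.sqrt(q), 1.0 / np.sqrt(r)
    for k in range(N):
        # Whitened process residual: (x_k - g * x_{k-1}) / sqrt(q).
        A[k, k] = sq
        if k > 0:
            A[k, k - 1] = -g * sq
        else:
            b[k] = g * x0 * sq  # the known initial state moves to the right-hand side
        # Whitened measurement residual: (z_k - h * x_k) / sqrt(r).
        A[N + k, k] = h * sr
        b[N + k] = z[k] * sr
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

Here `A.T @ A` is tridiagonal, which is the special structure that makes a linear-time solve possible in place of the dense `lstsq` call.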
-
Convex vs nonconvex approaches for sparse estimation: GLasso, Multiple Kernel Learning and Hyperparameter GLasso
Authors:
Aleksandr Y. Aravkin,
James V. Burke,
Alessandro Chiuso,
Gianluigi Pillonetto
Abstract:
The popular Lasso approach for sparse estimation can be derived via marginalization of a joint density associated with a particular stochastic model. A different marginalization of the same probabilistic model leads to a different non-convex estimator where hyperparameters are optimized. Extending these arguments to problems where groups of variables have to be estimated, we study a computational scheme for sparse estimation that differs from the Group Lasso. Although the underlying optimization problem defining this estimator is non-convex, an initialization strategy based on a univariate Bayesian forward selection scheme is presented. This also allows us to define an effective non-convex estimator where only one scalar variable is involved in the optimization process. Theoretical arguments, independent of the correctness of the priors entering the sparse model, are included to clarify the advantages of this non-convex technique in comparison with other convex estimators. Numerical experiments are also used to compare the performance of these approaches.
Submitted 26 February, 2013; v1 submitted 26 February, 2013;
originally announced February 2013.
-
The connection between Bayesian estimation of a Gaussian random field and RKHS
Authors:
Aleksandr Y. Aravkin,
Bradley M. Bell,
James V. Burke,
Gianluigi Pillonetto
Abstract:
Reconstruction of a function from noisy data is often formulated as a regularized optimization problem over an infinite-dimensional reproducing kernel Hilbert space (RKHS). The solution describes the observed data and has a small RKHS norm. When the data fit is measured using a quadratic loss, this estimator has a known statistical interpretation. Given the noisy measurements, the RKHS estimate represents the posterior mean (minimum variance estimate) of a Gaussian random field with covariance proportional to the kernel associated with the RKHS. In this paper, we provide a statistical interpretation when more general losses are used, such as absolute value, Vapnik or Huber. Specifically, for any finite set of sampling locations (including where the data were collected), the MAP estimate for the signal samples is given by the RKHS estimate evaluated at these locations.
Submitted 17 July, 2013; v1 submitted 22 January, 2013;
originally announced January 2013.
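The quadratic-loss case recalled in the abstract above can be checked numerically with a short sketch. Everything named here is an assumption made for illustration: the Gaussian kernel and its length scale, the function names, the unnormalized sum-of-squares loss, and the identification of the regularization weight `gamma` with the ratio `sigma2 / lam` of noise variance to prior scale.

```python
import numpy as np

def gaussian_kernel(a, b, length_scale=0.5):
    """Gaussian (squared-exponential) kernel matrix between 1-D point sets a and b."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return np.exp(-0.5 * ((a - b) / length_scale) ** 2)

def rkhs_estimate(x_train, y, x_query, gamma):
    """Minimizer of sum_i (y_i - f(x_i))^2 + gamma * ||f||_H^2, evaluated at x_query.

    By the representer theorem f = sum_i c_i k(., x_i) with c = (K + gamma I)^{-1} y.
    """
    K = gaussian_kernel(x_train, x_train)
    c = np.linalg.solve(K + gamma * np.eye(len(y)), y)
    return gaussian_kernel(x_query, x_train) @ c

def gp_posterior_mean(x_train, y, x_query, lam, sigma2):
    """Posterior mean of a zero-mean Gaussian field with covariance lam * k,
    observed with independent Gaussian noise of variance sigma2."""
    cov_yy = lam * gaussian_kernel(x_train, x_train) + sigma2 * np.eye(len(y))
    cov_fy = lam * gaussian_kernel(x_query, x_train)
    return cov_fy @ np.linalg.solve(cov_yy, y)
```

Under these assumptions, calling `rkhs_estimate` with `gamma = sigma2 / lam` and `gp_posterior_mean` with the same `lam` and `sigma2` gives the same values at any query locations, up to floating point error.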
-
Sparse/Robust Estimation and Kalman Smoothing with Nonsmooth Log-Concave Densities: Modeling, Computation, and Theory
Authors:
Aleksandr Y. Aravkin,
James V. Burke,
Gianluigi Pillonetto
Abstract:
We introduce a class of quadratic support (QS) functions, many of which play a crucial role in a variety of applications, including machine learning, robust statistical inference, sparsity promotion, and Kalman smoothing. Well known examples include the l2, Huber, l1 and Vapnik losses. We build on a dual representation for QS functions using convex analysis, revealing the structure necessary for a QS function to be interpreted as the negative log of a probability density, and providing the foundation for statistical interpretation and analysis of QS loss functions. For a subclass of QS functions called piecewise linear quadratic (PLQ) penalties, we also develop efficient numerical estimation schemes. These components form a flexible statistical modeling framework for a variety of learning applications, together with a toolbox of efficient numerical methods for inference. In particular, for PLQ densities, interior point (IP) methods can be used. IP methods solve nonsmooth optimization problems by working directly with smooth systems of equations characterizing their optimality. The efficiency of the IP approach depends on the structure of particular applications. We consider the class of dynamic inverse problems using Kalman smoothing, where the aim is to reconstruct the state of a dynamical system with known process and measurement models starting from noisy output samples. In the classical case, Gaussian errors are assumed in the process and measurement models. The extended framework allows arbitrary PLQ densities to be used, and the proposed IP approach solves the generalized Kalman smoothing problem while maintaining the linear complexity in the size of the time series, just as in the Gaussian case. This extends the computational efficiency of classic algorithms to a much broader nonsmooth setting, and includes many recently proposed robust and sparse smoothers as special cases.
Submitted 2 May, 2013; v1 submitted 19 January, 2013;
originally announced January 2013.
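To make the dual representation mentioned in the abstract above concrete, the sketch below writes the scalar Huber loss as a supremum of a concave quadratic over an interval, $\rho(x) = \sup_{u \in [-\kappa, \kappa]} \{ux - u^2/2\}$, and compares it with the familiar closed form. The function names and the restriction to the scalar case are assumptions made for illustration; the general QS class in the paper allows more general sets and quadratic terms.

```python
import numpy as np

def huber_closed_form(x, kappa=1.0):
    """Huber loss: quadratic for |x| <= kappa, linear beyond."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= kappa,
                    0.5 * x ** 2,
                    kappa * np.abs(x) - 0.5 * kappa ** 2)

def huber_dual(x, kappa=1.0):
    """Huber loss via its dual form sup_{|u| <= kappa} (u * x - u^2 / 2).

    The objective is a concave quadratic in u, so the maximizer over the
    interval is the unconstrained maximizer u = x clipped to [-kappa, kappa].
    """
    x = np.asarray(x, dtype=float)
    u = np.clip(x, -kappa, kappa)
    return u * x - 0.5 * u ** 2
```

Replacing the interval by the whole real line recovers the quadratic loss $x^2/2$, since the clipping is then never active.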
-
Smoothing Dynamic Systems with State-Dependent Covariance Matrices
Authors:
Aleksandr Y. Aravkin,
James V. Burke
Abstract:
Kalman filtering and smoothing algorithms are used in many areas, including tracking and navigation, medical applications, and financial trend filtering. One of the basic assumptions required to apply the Kalman smoothing framework is that error covariance matrices are known and given. In this paper, we study a general class of inference problems where covariance matrices can depend functionally on unknown parameters. In the Kalman framework, this allows modeling situations where covariance matrices may depend functionally on the state sequence being estimated. We present an extended formulation and generalized Gauss-Newton (GGN) algorithm for inference in this context. When applied to dynamic systems inference, we show the algorithm can be implemented to preserve the computational efficiency of the classic Kalman smoother. The new approach is illustrated with a synthetic numerical example.
Submitted 20 March, 2014; v1 submitted 19 November, 2012;
originally announced November 2012.
-
Variational properties of value functions
Authors:
Aleksandr Y. Aravkin,
James V. Burke,
Michael P. Friedlander
Abstract:
Regularization plays a key role in a variety of optimization formulations of inverse problems. A recurring theme in regularization approaches is the selection of regularization parameters, and their effect on the solution and on the optimal value of the optimization problem. The sensitivity of the value function to the regularization parameter can be linked directly to the Lagrange multipliers. This paper characterizes the variational properties of the value functions for a broad class of convex formulations, which are not all covered by standard Lagrange multiplier theory. An inverse function theorem is given that links the value functions of different regularization formulations (not necessarily convex). These results have implications for the selection of regularization parameters, and the development of specialized algorithms. Numerical examples illustrate the theoretical results.
Submitted 23 May, 2013; v1 submitted 15 November, 2012;
originally announced November 2012.
-
Epi-convergent Smoothing with Applications to Convex Composite Functions
Authors:
James V. Burke,
Tim Hoheisel
Abstract:
Smoothing methods have become part of the standard tool set for the study and solution of nondifferentiable and constrained optimization problems as well as a range of other variational and equilibrium problems. In this note we synthesize and extend recent results due to Beck and Teboulle on infimal convolution smoothing for convex functions with those of X. Chen on gradient consistency for nonconvex functions. We use epi-convergence techniques to define a notion of epi-smoothing that allows us to tap into the rich variational structure of the subdifferential calculus for nonsmooth, nonconvex, and nonfinite-valued functions. As an illustration of the versatility and range of epi-smoothing techniques, the results are applied to the general constrained optimization problem, of which nonlinear programming is a special case.
Submitted 31 August, 2012;
originally announced August 2012.
-
A statistical and computational theory for robust and sparse Kalman smoothing
Authors:
Aleksandr Y. Aravkin,
James V. Burke,
Gianluigi Pillonetto
Abstract:
Kalman smoothers reconstruct the state of a dynamical system starting from noisy output samples. While the classical estimator relies on quadratic penalization of process deviations and measurement errors, extensions that exploit Piecewise Linear Quadratic (PLQ) penalties have been recently proposed in the literature. These new formulations include smoothers robust with respect to outliers in the data, and smoothers that keep better track of fast system dynamics, e.g. jumps in the state values. In addition to L2, well known examples of PLQ penalties include the L1, Huber and Vapnik losses. In this paper, we use a dual representation for PLQ penalties to build a statistical modeling framework and a computational theory for Kalman smoothing.
We develop a statistical framework by establishing conditions required to interpret PLQ penalties as negative logs of true probability densities. Then, we present a computational framework, based on interior-point methods, that solves the Kalman smoothing problem with PLQ penalties and maintains the linear complexity in the size of the time series, just as in the L2 case. The framework presented extends the computational efficiency of the Mayne-Fraser and Rauch-Tung-Striebel algorithms to a much broader non-smooth setting, and includes many known robust and sparse smoothers as special cases.
Submitted 11 November, 2011;
originally announced November 2011.
-
Robust and Trend-following Kalman Smoothers using Student's t
Authors:
Aleksandr Y. Aravkin,
James V. Burke,
Gianluigi Pillonetto
Abstract:
We propose two nonlinear Kalman smoothers that rely on Student's t distributions. The T-Robust smoother finds the maximum a posteriori likelihood (MAP) solution for Gaussian process noise and Student's t observation noise, and is extremely robust against outliers, outperforming the recently proposed l1-Laplace smoother in extreme situations (e.g. 50% or more outliers). The second estimator, which we call the T-Trend smoother, is able to follow sudden changes in the process model, and is derived as a MAP solver for a model with Student's t-process noise and Gaussian observation noise. We design specialized methods to solve both problems which exploit the special structure of the Student's t-distribution, and provide a convergence theory. Both smoothers can be implemented with only minor modifications to an existing L2 smoother implementation. Numerical results for linear and nonlinear models illustrating both robust and fast tracking applications are presented.
Submitted 11 November, 2011; v1 submitted 21 January, 2010;
originally announced January 2010.