Search | arXiv e-print repository

A Relaxation Approach to Feature Selection for Linear Mixed Effects Models

Authors: Aleksei Sholokhov, James V. Burke, Damian F. Santomauro, Peng Zheng, Aleksandr Aravkin

Abstract: Linear Mixed-Effects (LME) models are a fundamental tool for modeling correlated data, including cohort studies, longitudinal data analysis, and meta-analysis. Design and analysis of variable selection methods for LMEs is more difficult than for linear regression because LME models are nonlinear. In this work we propose a relaxation strategy and optimization methods that enable a wide range of variable selection methods for LMEs using both convex and nonconvex regularizers, including $\ell_1$, Adaptive-$\ell_1$, CAD, and $\ell_0$. The computational framework only requires the proximal operator for each regularizer to be available, and the implementation is available in an open source Python package pysr3, consistent with the sklearn standard. The numerical results on simulated data sets indicate that the proposed strategy improves on the state of the art for both accuracy and compute time. The variable selection techniques are also validated on a real example using a data set on bullying victimization.

Submitted 13 May, 2022; originally announced May 2022.

Comments: 29 pages, 6 figures

MSC Class: 62F35; 65K10; 49M15
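
The abstract above notes that the framework only needs the proximal operator of each regularizer. As a minimal illustration of what such an operator looks like (this is not the pysr3 implementation; the function names `prox_l1` and `prox_l0` and the unit step size are assumptions made for this sketch), the proximal operators of $\lambda\|x\|_1$ and $\lambda\|x\|_0$ can be written in plain Python:

```python
import math

def prox_l1(v, lam):
    """Soft-thresholding: argmin_x lam*||x||_1 + 0.5*||x - v||^2 (hypothetical helper)."""
    # Shrink each entry toward zero by lam, keeping its sign.
    return [math.copysign(max(abs(vi) - lam, 0.0), vi) for vi in v]

def prox_l0(v, lam):
    """Hard-thresholding: a minimizer of lam*||x||_0 + 0.5*||x - v||^2 (hypothetical helper)."""
    # Keeping v_i costs lam; zeroing it costs 0.5*v_i^2. Ties are resolved to zero.
    return [vi if 0.5 * vi * vi > lam else 0.0 for vi in v]

shrunk = prox_l1([3.0, -2.0, 0.5], 1.0)   # small entries are set to zero, large ones shrink
kept = prox_l0([3.0, -2.0, 0.5], 1.0)     # large entries are kept unchanged
```

The convex $\ell_1$ operator shrinks every surviving entry, while the nonconvex $\ell_0$ operator leaves survivors untouched; a solver that only calls a `prox` function can switch between them without other changes.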

arXiv:2107.04918 [pdf, ps, other]

Convergence of the Gradient Sampling Algorithm on Directionally Lipschitz Functions

Authors: James V. Burke, Qiuying Lin

Abstract: The convergence theory for the gradient sampling algorithm is extended to directionally Lipschitz functions. Although directionally Lipschitz functions are not necessarily locally Lipschitz, they are almost everywhere differentiable and well approximated by gradients and so are a natural candidate for the application of the gradient sampling algorithm. The main obstacle to this extension is the potential unboundedness or emptiness of the Clarke subdifferential at points of interest. The convergence analysis we present provides one path to addressing these issues. In particular, we recover the usual convergence theory when the function is locally Lipschitz. Moreover, if the algorithm does not drive a certain measure of criticality to zero, then the iterates must converge to a point at which either the Clarke subdifferential is empty or the direction of steepest descent is degenerate in the sense that it does not lie in the interior of the domain of the regular subderivative.

Submitted 10 July, 2021; originally announced July 2021.

MSC Class: 49J22; 65K05; 65K10; 90C26

arXiv:1910.07095 [pdf, other]

IRLS for Sparse Recovery Revisited: Examples of Failure and a Remedy

Authors: Aleksandr Y. Aravkin, James V. Burke, Daiwei He

Abstract: Compressed sensing is a central topic in signal processing with myriad applications, where the goal is to recover a signal from as few observations as possible. Iterative re-weighting is one of the fundamental tools to achieve this goal. This paper re-examines the iteratively reweighted least squares (IRLS) algorithm for sparse recovery proposed by Daubechies, Devore, Fornasier, and Güntürk in \emph{Iteratively reweighted least squares minimization for sparse recovery}, {\sf Communications on Pure and Applied Mathematics}, {\bf 63}(2010) 1--38. Under the null space property of order $K$, the authors show that their algorithm converges to the unique $k$-sparse solution for $k$ strictly bounded above by a value strictly less than $K$, and this $k$-sparse solution coincides with the unique $\ell_1$ solution. On the other hand, it is known that, for $k$ less than or equal to $K$, the $k$-sparse and $\ell_1$ solutions are unique and coincide. The authors emphasize that their proof method does not apply for $k$ sufficiently close to $K$, and remark that they were unsuccessful in finding an example where the algorithm fails for these values of $k$. In this note we construct a family of examples where the Daubechies-Devore-Fornasier-Güntürk IRLS algorithm fails for $k=K$, and provide a modification to their algorithm that provably converges to the unique $k$-sparse solution for $k$ less than or equal to $K$ while preserving the local linear rate. The paper includes numerical studies of this family as well as the modified IRLS algorithm, testing their robustness under perturbations and to parameter selection.

Submitted 15 October, 2019; originally announced October 2019.

Comments: 10 pages, 5 figures

MSC Class: 80M50; 60G35; 65C60

arXiv:1907.08318 [pdf, ps, other]

A study of convex convex-composite functions via infimal convolution with applications

Authors: James V. Burke, Tim Hoheisel, Quang V. Nguyen

Abstract: In this note we provide a full conjugacy and subdifferential calculus for convex convex-composite functions in finite-dimensional space. Our approach, based on infimal convolution and cone-convexity, is straightforward and yields the desired results under a verifiable Slater-type condition, with relaxed monotonicity and without lower semicontinuity assumptions on the functions in play. The versatility of our findings is illustrated by a series of applications in optimization and matrix analysis, including conic programming, matrix-fractional, variational Gram, and spectral functions.

Submitted 21 August, 2019; v1 submitted 18 July, 2019; originally announced July 2019.

Comments: 30 pages

MSC Class: 52A41; 65K10; 90C25; 90C46

arXiv:1905.09373 [pdf, other]

Robust Singular Smoothers For Tracking Using Low-Fidelity Data

Authors: Jonathan Jonker, Aleksandr Aravkin, James V. Burke, Gianluigi Pillonetto, Sarah Webster

Abstract: Tracking underwater autonomous platforms is often difficult because of noisy, biased, and discretized input data. Classic filters and smoothers based on standard assumptions of Gaussian white noise break down when presented with any of these challenges. Robust models (such as the Huber loss) and constraints (e.g. maximum velocity) are used to attenuate these issues. Here, we consider robust smoothing with singular covariance, which covers bias and correlated noise, as well as many specific model types, such as those used in navigation. In particular, we show how to combine singular covariance models with robust losses and state-space constraints in a unified framework that can handle very low-fidelity data. A noisy, biased, and discretized navigation dataset from a submerged, low-cost inertial measurement unit (IMU) package, with ultra short baseline (USBL) data for ground truth, provides an opportunity to stress-test the proposed framework with promising results. We show how robust modeling elements improve our ability to analyze the data, and present batch processing results for 10 minutes of data with three different frequencies of available USBL position fixes (gaps of 30 seconds, 1 minute, and 2 minutes). The results suggest that the framework can be extended to real-time tracking using robust windowed estimation.

Submitted 22 May, 2019; originally announced May 2019.

Comments: 9 pages, 9 figures, to be included in Robotics: Science and Systems 2019
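
The abstract above names the Huber loss as a robust alternative to the Gaussian (least-squares) model. The following is a small sketch of that loss only, not of the paper's smoother; the function names and the threshold `kappa=1.0` are assumptions chosen for illustration:

```python
def huber(r, kappa=1.0):
    """Huber loss: quadratic for |r| <= kappa, linear beyond it (hypothetical helper)."""
    a = abs(r)
    if a <= kappa:
        return 0.5 * r * r
    return kappa * a - 0.5 * kappa * kappa

def total_huber(residuals, kappa=1.0):
    """Sum of Huber losses over a list of residuals."""
    return sum(huber(r, kappa) for r in residuals)

def total_quadratic(residuals):
    """Least-squares loss, 0.5 * sum of squared residuals, for comparison."""
    return sum(0.5 * r * r for r in residuals)

# Two small residuals and one outlier: the outlier dominates the quadratic
# loss far more strongly than it dominates the Huber loss.
residuals = [0.1, -0.2, 10.0]
quadratic_value = total_quadratic(residuals)
huber_value = total_huber(residuals)
```

Because the Huber penalty grows only linearly for large residuals, a single outlier contributes much less to the objective than it would under a quadratic penalty, which is the usual motivation for using it with noisy data.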

arXiv:1807.01187 [pdf, ps, other]

Variational Properties of Matrix Functions via the Generalized Matrix-Fractional Function

Authors: James V. Burke, Yuan Gao, Tim Hoheisel

Abstract: We show that many important convex matrix functions can be represented as the partial infimal projection of the generalized matrix fractional (GMF) and a relatively simple convex function. This representation provides conditions under which such functions are closed and proper as well as formulas for the ready computation of both their conjugates and subdifferentials. Special attention is given to support and indicator functions. Particular instances yield all weighted Ky Fan norms and squared gauges on $\mathbb R^{n\times m}$, and as an example we show that all variational Gram functions are representable as squares of gauges. Other instances yield weighted sums of the Frobenius and nuclear norms. The scope of applications is large and the range of variational properties and insight is fascinating and fundamental. An important byproduct of these representations is that they lay the foundation for a smoothing approach to many matrix functions on the interior of the domain of the GMF function, which opens the door to a range of unexplored optimization methods.

Submitted 9 May, 2019; v1 submitted 3 July, 2018; originally announced July 2018.

MSC Class: 68Q25; 68R10; 68U05

arXiv:1806.05218 [pdf, ps, other]

Line Search and Trust-Region Methods for Convex-Composite Optimization

Authors: James V. Burke, Abraham Engle

Abstract: We consider descent methods for solving non-finite valued nonsmooth convex-composite optimization problems that employ Gauss-Newton subproblems to determine the iteration update. Specifically, we establish the global convergence properties for descent methods that use a backtracking line search, a weak Wolfe line search, or a trust-region update. All of these approaches are designed to exploit the structure associated with convex-composite problems.

Submitted 10 September, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

arXiv:1805.01073 [pdf, ps, other]

Strong Metric (Sub)regularity of KKT Mappings for Piecewise Linear-Quadratic Convex-Composite Optimization

Authors: James V. Burke, Abraham Engle

Abstract: This work concerns the local convergence theory of Newton and quasi-Newton methods for convex-composite optimization: minimize f(x):=h(c(x)), where h is an infinite-valued proper convex function and c is C^2-smooth. We focus on the case where h is infinite-valued piecewise linear-quadratic and convex. Such problems include nonlinear programming, mini-max optimization, estimation of nonlinear dynamics with non-Gaussian noise as well as many modern approaches to large-scale data analysis and machine learning. Our approach embeds the optimality conditions for convex-composite optimization problems into a generalized equation. We establish conditions for strong metric subregularity and strong metric regularity of the corresponding set-valued mappings. This allows us to extend classical convergence of Newton and quasi-Newton methods to the broader class of non-finite valued piecewise linear-quadratic convex-composite optimization problems. In particular we establish local quadratic convergence of the Newton method under conditions that parallel those in nonlinear programming when h is non-finite valued piecewise linear.

Submitted 16 June, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

arXiv:1804.11003 [pdf, ps, other]

Gradient Sampling Methods for Nonsmooth Optimization

Authors: James V. Burke, Frank E. Curtis, Adrian S. Lewis, Michael L. Overton, Lucas E. A. Simões

Abstract: This paper reviews the gradient sampling methodology for solving nonsmooth, nonconvex optimization problems. An intuitively straightforward gradient sampling algorithm is stated and its convergence properties are summarized. Throughout this discussion, we emphasize the simplicity of gradient sampling as an extension of the steepest descent method for minimizing smooth objectives. We then provide overviews of various enhancements that have been proposed to improve practical performance, as well as of several extensions that have been made in the literature, such as to solve constrained problems. The paper also includes clarification of certain technical aspects of the analysis of gradient sampling algorithms, most notably related to the assumptions one needs to make about the set of points at which the objective is continuously differentiable. Finally, we discuss possible future research directions.

Submitted 29 April, 2018; originally announced April 2018.

Comments: Submitted to: Special Methods for Nonsmooth Optimization (Springer, 2018), edited by A. Bagirov, M. Gaudioso, N. Karmitsa and M. Mäkelä

arXiv:1803.09224 [pdf, other]

Inexact Sequential Quadratic Optimization with Penalty Parameter Updates Within the QP Solve: Extended Version

Authors: James V. Burke, Frank E. Curtis, Hao Wang, Jiashan Wang

Abstract: This paper focuses on the design of sequential quadratic optimization (commonly known as SQP) methods for solving large-scale nonlinear optimization problems. The most computationally demanding aspect of such an approach is the computation of the search direction during each iteration, for which we consider the use of matrix-free methods. In particular, we develop a method that requires an inexact solve of a single QP subproblem to establish the convergence of the overall SQP method. It is known that SQP methods can be plagued by poor behavior of the global convergence mechanism. To confront this issue, we propose the use of an exact penalty function with a dynamic penalty parameter updating strategy to be employed within the subproblem solver in such a way that the resulting search direction predicts progress toward both feasibility and optimality. We present our parameter updating strategy and prove that, under reasonable assumptions, the strategy does not modify the penalty parameter unnecessarily. We also discuss a matrix-free subproblem solver in which our updating strategy can be incorporated. We close the paper with a discussion of the results of numerical experiments that illustrate the benefits of our proposed techniques.

Submitted 26 February, 2020; v1 submitted 25 March, 2018; originally announced March 2018.

arXiv:1803.02525 [pdf, other]

Fast Robust Methods for Singular State-Space Models

Authors: Jonathan Jonker, Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto, Sarah Webster

Abstract: State-space models are used in a wide range of time series analysis formulations. Kalman filtering and smoothing are work-horse algorithms in these settings. While classic algorithms assume Gaussian errors to simplify estimation, recent advances use a broader range of optimization formulations to allow outlier-robust estimation, as well as constraints to capture prior information. Here we develop methods on state-space models where either innovations or error covariances may be singular. These models frequently arise in navigation (e.g. for `colored noise' models or deterministic integrals) and are ubiquitous in auto-correlated time series models such as ARMA. We reformulate all state-space models (singular as well as nonsingular) as constrained convex optimization problems, and develop an efficient algorithm for this reformulation. The convergence rate is {\it locally linear}, with constants that do not depend on the conditioning of the problem. Numerical comparisons show that the new approach outperforms competing approaches for {\it nonsingular} models, including state of the art interior point (IP) methods. IP methods converge at superlinear rates; we expect them to dominate. However, the steep rate of the proposed approach (independent of problem conditioning) combined with cheap iterations wins against IP in a run-time comparison. We therefore suggest that the proposed approach be the {\it default choice} for estimating state space models outside of the Gaussian context, regardless of whether the error covariances are singular or not.

Submitted 28 June, 2018; v1 submitted 7 March, 2018; originally announced March 2018.

Comments: 11 pages, 4 figures

MSC Class: 62F35; 65K10; 49M15

arXiv:1703.01363 [pdf, ps, other]

Convex Geometry of the Generalized Matrix-Fractional Function

Authors: James V. Burke, Yuan Gao, Tim Hoheisel

Abstract: Generalized matrix-fractional (GMF) functions are a class of matrix support functions introduced by Burke and Hoheisel as a tool for unifying a range of seemingly divergent matrix optimization problems associated with inverse problems, regularization and learning. In this paper we dramatically simplify the support function representation for GMF functions as well as the representation of their subdifferentials. These new representations allow the ready computation of a range of important related geometric objects whose formulations were previously unavailable.

Submitted 3 March, 2017; originally announced March 2017.

arXiv:1702.08649 [pdf, other]

Foundations of gauge and perspective duality

Authors: Alexandre Y. Aravkin, James V. Burke, Dmitriy Drusvyatskiy, Michael P. Friedlander, Kellie MacPhee

Abstract: We revisit the foundations of gauge duality and demonstrate that it can be explained using a modern approach to duality based on a perturbation framework. We therefore put gauge duality and Fenchel-Rockafellar duality on equal footing, including explaining gauge dual variables as sensitivity measures, and showing how to recover primal solutions from those of the gauge dual. This vantage point allows a direct proof that optimal solutions of the Fenchel-Rockafellar dual of the gauge dual are precisely the primal solutions rescaled by the optimal value. We extend the gauge duality framework to the setting in which the functional components are general nonnegative convex functions, including problems with piecewise linear quadratic functions and constraints that arise from generalized linear models used in regression.

Submitted 18 June, 2018; v1 submitted 28 February, 2017; originally announced February 2017.

Comments: 29 pages

arXiv:1609.06369 [pdf, ps, other]

Generalized Kalman Smoothing: Modeling and Algorithms

Authors: A. Y. Aravkin, J. V. Burke, L. Ljung, A. Lozano, G. Pillonetto

Abstract: State-space smoothing has found many applications in science and engineering. Under linear and Gaussian assumptions, smoothed estimates can be obtained using efficient recursions, for example Rauch-Tung-Striebel and Mayne-Fraser algorithms. Such schemes are equivalent to linear algebraic techniques that minimize a convex quadratic objective function with structure induced by the dynamic model. These classical formulations fall short in many important circumstances. For instance, smoothers obtained using quadratic penalties can fail when outliers are present in the data, and cannot track impulsive inputs and abrupt state changes. Motivated by these shortcomings, generalized Kalman smoothing formulations have been proposed in the last few years, replacing quadratic models with more suitable, often nonsmooth, convex functions. In contrast to classical models, these general estimators require use of iterated algorithms, and these have received increased attention from control, signal processing, machine learning, and optimization communities. In this survey we show that the optimization viewpoint provides the control and signal processing community great freedom in the development of novel modeling and inference frameworks for dynamical systems. We discuss general statistical models for dynamic systems, making full use of nonsmooth convex penalties and constraints, and providing links to important models in signal processing and machine learning. We also survey optimization techniques for these formulations, paying close attention to dynamic problem structure. Modeling concepts and algorithms are illustrated with numerical examples.

Submitted 25 September, 2016; v1 submitted 20 September, 2016; originally announced September 2016.

Comments: 29 pages, 11 figures

MSC Class: 62F35; 65K10; 49M15

arXiv:1602.01506 [pdf, other]

Level-set methods for convex optimization

Authors: Aleksandr Y. Aravkin, James V. Burke, Dmitriy Drusvyatskiy, Michael P. Friedlander, Scott Roy

Abstract: Convex optimization problems arising in applications often have favorable objective functions and complicated constraints, thereby precluding first-order methods from being immediately applicable. We describe an approach that exchanges the roles of the objective and constraint functions, and instead approximately solves a sequence of parametric level-set problems. A zero-finding procedure, based on inexact function evaluations and possibly inexact derivative information, leads to an efficient solution scheme for the original problem. We describe the theoretical and practical properties of this approach for a broad range of problems, including low-rank semidefinite optimization, sparse optimization, and generalized linear models for inference.

Submitted 3 February, 2016; originally announced February 2016.

Comments: 38 pages

arXiv:1511.03687 [pdf, ps, other]

Variational Analysis of Convexly Generated Spectral Max Functions

Authors: James V. Burke, Julia Eaton

Abstract: The spectral abscissa is the largest real part of an eigenvalue of a matrix and the spectral radius is the largest modulus. Both are examples of spectral max functions---the maximum of a real-valued function over the spectrum of a matrix. These mappings arise in the control and stabilization of dynamical systems. In 2001, Burke and Overton characterized the regular subdifferential of the spectral abscissa and showed that the spectral abscissa is subdifferentially regular in the sense of Clarke when all active eigenvalues are nonderogatory. In this paper we develop new techniques to obtain these results for the more general class of convexly generated spectral max functions. In particular, we extend the Burke-Overton subdifferential regularity result to this class. These techniques allow us to obtain new variational results for the spectral radius.

Submitted 7 November, 2016; v1 submitted 11 November, 2015; originally announced November 2015.
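
The definitions in the abstract above (spectral abscissa as the largest real part of an eigenvalue, spectral radius as the largest modulus) can be made concrete with a small sketch. To stay dependency-free it is restricted to 2x2 real matrices, where the eigenvalues have a closed form; the function names are hypothetical and the sketch says nothing about the variational analysis in the paper:

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] via the quadratic formula (2x2 only)."""
    half_trace = (a + d) / 2.0
    det = a * d - b * c
    root = cmath.sqrt(half_trace * half_trace - det)
    return (half_trace + root, half_trace - root)

def spectral_abscissa(eigs):
    """Largest real part over the spectrum."""
    return max(z.real for z in eigs)

def spectral_radius(eigs):
    """Largest modulus over the spectrum."""
    return max(abs(z) for z in eigs)

# A rotation generator has purely imaginary eigenvalues +i and -i.
rotation = eigenvalues_2x2(0.0, 1.0, -1.0, 0.0)
# An upper-triangular matrix has its diagonal entries as eigenvalues.
triangular = eigenvalues_2x2(-1.0, 100.0, 0.0, -2.0)
```

Both quantities are maxima of a real-valued function (real part, modulus) over the spectrum, which is the sense in which the abstract calls them spectral max functions.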

arXiv:1402.1917 [pdf, other]

doi 10.1137/130950239

Matrix-Free Solvers for Exact Penalty Subproblems

Authors: James V. Burke, Frank E. Curtis, Hao Wang, Jiashan Wang

Abstract: We present two matrix-free methods for approximately solving exact penalty subproblems that arise when solving large-scale optimization problems. The first approach is a novel iterative re-weighting algorithm (IRWA), which iteratively minimizes quadratic models of relaxed subproblems while automatically updating a relaxation vector. The second approach is based on alternating direction augmented Lagrangian (ADAL) technology applied to our setting. The main computational costs of each algorithm are the repeated minimizations of convex quadratic functions which can be performed matrix-free. We prove that both algorithms are globally convergent under loose assumptions, and that each requires at most $O(1/\varepsilon^2)$ iterations to reach $\varepsilon$-optimality of the objective function. Numerical experiments exhibit the ability of both algorithms to efficiently find inexact solutions. Moreover, in certain cases, IRWA is shown to be more reliable than ADAL.

Submitted 9 February, 2014; originally announced February 2014.

Comments: 33 pages, 8 figures

MSC Class: 49M20; 49M29; 49M37; 65K05; 65K10; 90C06; 90C20; 90C25

Journal ref: SIAM Journal on Optimization, 25(1):261-294, 2015

arXiv:1309.7857 [pdf, other]

Generalized system identification with stable spline kernels

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: Regularized least-squares approaches have been successfully applied to linear system identification. Recent approaches use quadratic penalty terms on the unknown impulse response defined by stable spline kernels, which control model space complexity by leveraging regularity and bounded-input bounded-output stability. This paper extends linear system identification to a wide class of nonsmooth stable spline estimators, where regularization functionals and data misfits can be selected from a rich set of piecewise linear-quadratic (PLQ) penalties. This class includes the 1-norm, Huber, and Vapnik, in addition to the least-squares penalty. By representing penalties through their conjugates, the modeler can specify any piecewise linear-quadratic penalty for misfit and regularizer, as well as inequality constraints on the response. The interior-point solver we implement (IPsolve) is locally quadratically convergent, with $O(\min(m,n)^2(m+n))$ arithmetic operations per iteration, where $n$ is the number of unknown impulse response coefficients and $m$ is the number of observed output measurements. IPsolve is competitive with available alternatives for system identification. This is shown by a comparison with TFOCS, libSVM, and the FISTA algorithm. The code is open source (https://github.com/saravkin/IPsolve). The impact of the approach for system identification is illustrated with numerical experiments featuring robust formulations for contaminated data, relaxation systems, nonnegativity and unimodality constraints on the impulse response, and sparsity promoting regularization. Incorporating constraints yields particularly significant improvements.

Submitted 25 July, 2018; v1 submitted 30 September, 2013; originally announced September 2013.

Comments: 23 pages, 6 figures

MSC Class: 62F35; 65K10

arXiv:1303.5588 [pdf, other]

Robust and Trend Following Student's t Kalman Smoothers

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: We present a Kalman smoothing framework based on modeling errors using the heavy tailed Student's t distribution, along with algorithms, convergence theory, open-source general implementation, and several important applications. The computational effort per iteration grows linearly with the length of the time series, and all smoothers allow nonlinear process and measurement models. Robust smoothers form an important subclass of smoothers within this framework. These smoothers work in situations where measurements are highly contaminated by noise or include data unexplained by the forward model. Highly robust smoothers are developed by modeling measurement errors using the Student's t distribution, and outperform the recently proposed L1-Laplace smoother in extreme situations with data containing 20% or more outliers. A second special application we consider in detail allows tracking sudden changes in the state. It is developed by modeling process noise using the Student's t distribution, and the resulting smoother can track sudden changes in the state. These features can be used separately or in tandem, and we present a general smoother algorithm and open source implementation, together with convergence analysis that covers a wide range of smoothers. A key ingredient of our approach is a technique to deal with the non-convexity of the Student's t loss function. Numerical results for linear and nonlinear models illustrate the performance of the new smoothers for robust and tracking applications, as well as for mixed problems that have both types of features.

Submitted 22 March, 2013; originally announced March 2013.

Comments: 23 pages, 7 figures

MSC Class: 62F35; 65K10

arXiv:1303.5237 [pdf, ps, other]

Kalman smoothing and block tridiagonal systems: new connections and numerical stability results

Authors: Aleksandr Y. Aravkin, Bradley B. Bell, James V. Burke, Gianluigi Pillonetto

Abstract: The Rauch-Tung-Striebel (RTS) and the Mayne-Fraser (MF) algorithms are two of the most popular smoothing schemes to reconstruct the state of a dynamic linear system from measurements collected on a fixed interval. Another (less popular) approach is the Mayne (M) algorithm introduced in his original paper under the name of Algorithm A. In this paper, we analyze these three smoothers from an optimization and algebraic perspective, revealing new insights on their numerical stability properties. In doing this, we re-interpret classic recursions as matrix decomposition methods for block tridiagonal matrices. First, we show that the classic RTS smoother is an implementation of the forward block tridiagonal (FBT) algorithm (also known as Thomas algorithm) for particular block tridiagonal systems. We study the numerical stability properties of this scheme, connecting the condition number of the full system to properties of the individual blocks encountered during standard recursion. Second, we study the M smoother, and prove it is equivalent to a backward block tridiagonal (BBT) algorithm with a stronger stability guarantee than RTS. Third, we illustrate how the MF smoother solves a block tridiagonal system, and prove that it has the same numerical stability properties as RTS (but not those of M). Finally, we present a new hybrid RTS/M (FBT/BBT) smoothing scheme, which is faster than MF, and has the same numerical stability guarantees as RTS and MF.

Submitted 24 July, 2013; v1 submitted 21 March, 2013; originally announced March 2013.

Comments: 11 pages, no figures

MSC Class: 65F05; 65F50; 49M15

arXiv:1303.2827 [pdf, other]

Linear system identification using stable spline kernels and PLQ penalties

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: The classical approach to linear system identification is given by parametric Prediction Error Methods (PEM). In this context, model complexity is often unknown so that a model order selection step is needed to suitably trade off bias and variance. Recently, a different approach to linear system identification has been introduced, where model order determination is avoided by using a regularized least squares framework. In particular, the penalty term on the impulse response is defined by so-called stable spline kernels. They embed information on regularity and BIBO stability, and depend on a small number of parameters which can be estimated from data. In this paper, we provide new nonsmooth formulations of the stable spline estimator. In particular, we consider linear system identification problems in a very broad context, where regularization functionals and data misfits can come from a rich set of piecewise linear quadratic functions. Moreover, our analysis includes polyhedral inequality constraints on the unknown impulse response. For any formulation in this class, we show that interior point methods can be used to solve the system identification problem, with complexity $O(n^3)+O(mn^2)$ in each iteration, where $n$ and $m$ are the number of impulse response coefficients and measurements, respectively. The usefulness of the framework is illustrated via a numerical experiment where output measurements are contaminated by outliers.

Submitted 12 March, 2013; originally announced March 2013.

Comments: 8 pages, 2 figures

MSC Class: 47N30; 65K10

arXiv:1303.1993 [pdf, other]

Optimization viewpoint on Kalman smoothing, with applications to robust and sparse estimation

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: In this paper, we present the optimization formulation of the Kalman filtering and smoothing problems, and use this perspective to develop a variety of extensions and applications. We first formulate classic Kalman smoothing as a least squares problem, highlight special structure, and show that the classic filtering and smoothing algorithms are equivalent to a particular algorithm for solving this problem. Once this equivalence is established, we present extensions of Kalman smoothing to systems with nonlinear process and measurement models, systems with linear and nonlinear inequality constraints, systems with outliers in the measurements or sudden changes in the state, and systems where the sparsity of the state sequence must be accounted for. All extensions preserve the computational efficiency of the classic algorithms, and most of the extensions are illustrated with numerical examples, which are part of an open source Kalman smoothing Matlab/Octave package.

Submitted 11 March, 2013; v1 submitted 8 March, 2013; originally announced March 2013.

Comments: 46 pages, 11 figures

MSC Class: 62F35; 65K10

arXiv:1302.6434 [pdf, other]

Convex vs nonconvex approaches for sparse estimation: GLasso, Multiple Kernel Learning and Hyperparameter GLasso

Authors: Aleksandr Y. Aravkin, James V. Burke, Alessandro Chiuso, Gianluigi Pillonetto

Abstract: The popular Lasso approach for sparse estimation can be derived via marginalization of a joint density associated with a particular stochastic model. A different marginalization of the same probabilistic model leads to a different non-convex estimator where hyperparameters are optimized. Extending these arguments to problems where groups of variables have to be estimated, we study a computational scheme for sparse estimation that differs from the Group Lasso. Although the underlying optimization problem defining this estimator is non-convex, an initialization strategy based on a univariate Bayesian forward selection scheme is presented. This also allows us to define an effective non-convex estimator where only one scalar variable is involved in the optimization process. Theoretical arguments, independent of the correctness of the priors entering the sparse model, are included to clarify the advantages of this non-convex technique in comparison with other convex estimators. Numerical experiments are also used to compare the performance of these approaches.

Submitted 26 February, 2013; v1 submitted 26 February, 2013; originally announced February 2013.

Comments: 50 pages, 12 figures

MSC Class: 62F35; 65K10; 47N30

arXiv:1301.5288 [pdf, other]

The connection between Bayesian estimation of a Gaussian random field and RKHS

Authors: Aleksandr Y. Aravkin, Bradley M. Bell, James V. Burke, Gianluigi Pillonetto

Abstract: Reconstruction of a function from noisy data is often formulated as a regularized optimization problem over an infinite-dimensional reproducing kernel Hilbert space (RKHS). The solution describes the observed data and has a small RKHS norm. When the data fit is measured using a quadratic loss, this estimator has a known statistical interpretation. Given the noisy measurements, the RKHS estimate represents the posterior mean (minimum variance estimate) of a Gaussian random field with covariance proportional to the kernel associated with the RKHS. In this paper, we provide a statistical interpretation when more general losses are used, such as absolute value, Vapnik or Huber. Specifically, for any finite set of sampling locations (including where the data were collected), the MAP estimate for the signal samples is given by the RKHS estimate evaluated at these locations.

Submitted 17 July, 2013; v1 submitted 22 January, 2013; originally announced January 2013.

Comments: 8 pages, 2 figures

MSC Class: 47N30; 65K10

arXiv:1301.4566 [pdf, other]

Sparse/Robust Estimation and Kalman Smoothing with Nonsmooth Log-Concave Densities: Modeling, Computation, and Theory

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: We introduce a class of quadratic support (QS) functions, many of which play a crucial role in a variety of applications, including machine learning, robust statistical inference, sparsity promotion, and Kalman smoothing. Well known examples include the l2, Huber, l1 and Vapnik losses. We build on a dual representation for QS functions using convex analysis, revealing the structure necessary for a QS function to be interpreted as the negative log of a probability density, and providing the foundation for statistical interpretation and analysis of QS loss functions. For a subclass of QS functions called piecewise linear quadratic (PLQ) penalties, we also develop efficient numerical estimation schemes. These components form a flexible statistical modeling framework for a variety of learning applications, together with a toolbox of efficient numerical methods for inference. In particular, for PLQ densities, interior point (IP) methods can be used. IP methods solve nonsmooth optimization problems by working directly with smooth systems of equations characterizing their optimality. The efficiency of the IP approach depends on the structure of particular applications. We consider the class of dynamic inverse problems using Kalman smoothing, where the aim is to reconstruct the state of a dynamical system with known process and measurement models starting from noisy output samples. In the classical case, Gaussian errors are assumed in the process and measurement models. The extended framework allows arbitrary PLQ densities to be used, and the proposed IP approach solves the generalized Kalman smoothing problem while maintaining the linear complexity in the size of the time series, just as in the Gaussian case. This extends the computational efficiency of classic algorithms to a much broader nonsmooth setting, and includes many recently proposed robust and sparse smoothers as special cases.

Submitted 2 May, 2013; v1 submitted 19 January, 2013; originally announced January 2013.

Comments: 41 pages, 4 figures

MSC Class: 62F35; 65K10

arXiv:1211.4601 [pdf, other]

Smoothing Dynamic Systems with State-Dependent Covariance Matrices

Authors: Aleksandr Y. Aravkin, James V. Burke

Abstract: Kalman filtering and smoothing algorithms are used in many areas, including tracking and navigation, medical applications, and financial trend filtering. One of the basic assumptions required to apply the Kalman smoothing framework is that error covariance matrices are known and given. In this paper, we study a general class of inference problems where covariance matrices can depend functionally on unknown parameters. In the Kalman framework, this allows modeling situations where covariance matrices may depend functionally on the state sequence being estimated. We present an extended formulation and generalized Gauss-Newton (GGN) algorithm for inference in this context. When applied to dynamic systems inference, we show the algorithm can be implemented to preserve the computational efficiency of the classic Kalman smoother. The new approach is illustrated with a synthetic numerical example.

Submitted 20 March, 2014; v1 submitted 19 November, 2012; originally announced November 2012.

Comments: 8 pages, 1 figure

MSC Class: 62F35; 65K10

arXiv:1211.3724 [pdf, other]

doi 10.1137/120899157

Variational properties of value functions

Authors: Aleksandr Y. Aravkin, James V. Burke, Michael P. Friedlander

Abstract: Regularization plays a key role in a variety of optimization formulations of inverse problems. A recurring theme in regularization approaches is the selection of regularization parameters, and their effect on the solution and on the optimal value of the optimization problem. The sensitivity of the value function to the regularization parameter can be linked directly to the Lagrange multipliers. This paper characterizes the variational properties of the value functions for a broad class of convex formulations, which are not all covered by standard Lagrange multiplier theory. An inverse function theorem is given that links the value functions of different regularization formulations (not necessarily convex). These results have implications for the selection of regularization parameters, and the development of specialized algorithms. Numerical examples illustrate the theoretical results.

Submitted 23 May, 2013; v1 submitted 15 November, 2012; originally announced November 2012.

Comments: 30 pages

Journal ref: SIAM Journal on Optimization, 23(3):1689-1717, 2013

arXiv:1208.6591 [pdf, ps, other]

Epi-convergent Smoothing with Applications to Convex Composite Functions

Authors: James V. Burke, Tim Hoheisel

Abstract: Smoothing methods have become part of the standard tool set for the study and solution of nondifferentiable and constrained optimization problems as well as a range of other variational and equilibrium problems. In this note we synthesize and extend recent results due to Beck and Teboulle on infimal convolution smoothing for convex functions with those of X. Chen on gradient consistency for nonconvex functions. We use epi-convergence techniques to define a notion of epi-smoothing that allows us to tap into the rich variational structure of the subdifferential calculus for nonsmooth, nonconvex, and nonfinite-valued functions. As an illustration of the versatility and range of epi-smoothing techniques, the results are applied to the general constrained optimization for which nonlinear programming is a special case.

Submitted 31 August, 2012; originally announced August 2012.

MSC Class: 49J52; 49J53; 90C26; 90C30; 90C46

arXiv:1111.2730 [pdf, other]

A statistical and computational theory for robust and sparse Kalman smoothing

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: Kalman smoothers reconstruct the state of a dynamical system starting from noisy output samples. While the classical estimator relies on quadratic penalization of process deviations and measurement errors, extensions that exploit Piecewise Linear Quadratic (PLQ) penalties have been recently proposed in the literature. These new formulations include smoothers robust with respect to outliers in the data, and smoothers that keep better track of fast system dynamics, e.g. jumps in the state values. In addition to L2, well known examples of PLQ penalties include the L1, Huber and Vapnik losses. In this paper, we use a dual representation for PLQ penalties to build a statistical modeling framework and a computational theory for Kalman smoothing. We develop a statistical framework by establishing conditions required to interpret PLQ penalties as negative logs of true probability densities. Then, we present a computational framework, based on interior-point methods, that solves the Kalman smoothing problem with PLQ penalties and maintains the linear complexity in the size of the time series, just as in the L2 case. The framework presented extends the computational efficiency of the Mayne-Fraser and Rauch-Tung-Striebel algorithms to a much broader non-smooth setting, and includes many known robust and sparse smoothers as special cases.

Submitted 11 November, 2011; originally announced November 2011.

Comments: 8 pages

MSC Class: 62F35; 65K10

arXiv:1001.3907 [pdf, other]

Robust and Trend-following Kalman Smoothers using Student's t

Authors: Aleksandr Y. Aravkin, James V. Burke, Gianluigi Pillonetto

Abstract: We propose two nonlinear Kalman smoothers that rely on Student's t distributions. The T-Robust smoother finds the maximum a posteriori likelihood (MAP) solution for Gaussian process noise and Student's t observation noise, and is extremely robust against outliers, outperforming the recently proposed l1-Laplace smoother in extreme situations (e.g. 50% or more outliers). The second estimator, which we call the T-Trend smoother, is able to follow sudden changes in the process model, and is derived as a MAP solver for a model with Student's t-process noise and Gaussian observation noise. We design specialized methods to solve both problems which exploit the special structure of the Student's t-distribution, and provide a convergence theory. Both smoothers can be implemented with only minor modifications to an existing L2 smoother implementation. Numerical results for linear and nonlinear models illustrating both robust and fast tracking applications are presented.

Submitted 11 November, 2011; v1 submitted 21 January, 2010; originally announced January 2010.

Comments: 7 pages, 4 figures

MSC Class: 62F35; 65K10

Showing 1–30 of 30 results for author: Burke, J V