Search | arXiv e-print repository

doi 10.1137/23M1549687

Approximating Higher-Order Derivative Tensors Using Secant Updates

Abstract: Quasi-Newton methods employ an update rule that gradually improves the Hessian approximation using the already available gradient evaluations. We propose higher-order secant updates which generalize this idea to higher-order derivatives, approximating for example third derivatives (which are tensors) from given Hessian evaluations. Our generalization is based on the observation that quasi-Newton updates are least-change updates satisfying the secant equation, with different methods using different norms to measure the size of the change. We present a full characterization for least-change updates in weighted Frobenius norms (satisfying an analogue of the secant equation) for derivatives of arbitrary order. Moreover, we establish convergence of the approximations to the true derivative under standard assumptions and explore the quality of the generated approximations in numerical experiments.

Submitted 15 August, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

MSC Class: 90C53; 65D25

Journal ref: SIAM Journal on Optimization Vol. 34 Iss. 1 (2024) pp. 893-917
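The least-change idea in the secant-updates abstract above can be illustrated in its simplest setting: the plain (unweighted) Frobenius norm with no symmetry constraint, where the matrix case reduces to Broyden's classical "good" update. This is a minimal sketch assuming NumPy; `least_change_secant_update` is a hypothetical helper, and it is not the paper's update, which characterises least-change updates in weighted Frobenius norms.

```python
import numpy as np


def least_change_secant_update(T, s, G):
    """Frobenius-norm least-change update of a derivative approximation T.

    T has order k + 1 (a matrix for Hessians, a 3-tensor for third derivatives).
    The updated tensor must satisfy the secant-type condition (T_new @ s) == G,
    where G is the observed change of the order-k derivative along the step s.
    Each fibre along the last axis then has to meet one linear equation, and
    the smallest correction doing so is a multiple of s. No symmetry is imposed.
    """
    residual = G - T @ s  # T @ s contracts the last axis of T with s
    return T + np.multiply.outer(residual, s) / (s @ s)


rng = np.random.default_rng(0)
n = 4
s = rng.standard_normal(n)  # step x_new - x_old

# Order 2: update a Hessian approximation B from a gradient difference y.
B = np.eye(n)
y = rng.standard_normal(n)
B_new = least_change_secant_update(B, s, y)
print(np.allclose(B_new @ s, y))  # True: the secant equation B_new s = y holds

# Order 3: update a third-derivative approximation from a Hessian difference.
T = np.zeros((n, n, n))
dH = rng.standard_normal((n, n))
T_new = least_change_secant_update(T, s, dH)
print(np.allclose(T_new @ s, dH))  # True
```

Any other tensor satisfying the same condition differs from this one by a term E with E s = 0, and such an E is Frobenius-orthogonal to the rank-one correction, so no other feasible update is closer to T. Weighted norms or a symmetry requirement would change the formula.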

arXiv:2106.13434

Binary Matrix Factorisation and Completion via Integer Programming

Authors: Reka A. Kovacs, Oktay Gunluk, Raphael A. Hauser

Abstract: Binary matrix factorisation is an essential tool for identifying discrete patterns in binary data. In this paper we consider the rank-k binary matrix factorisation problem (k-BMF) under Boolean arithmetic: we are given an n x m binary matrix X with possibly missing entries and need to find two binary matrices A and B of dimension n x k and k x m respectively, which minimise the squared Frobenius distance between X and the Boolean product of A and B. We present one compact and two exponential-size integer programs (IPs) for k-BMF and show that the compact IP has a weak LP relaxation, while the exponential-size IPs have a stronger equivalent LP relaxation. We introduce a new objective function, which differs from the traditional squared Frobenius objective in attributing a weight to zero entries of the input matrix that is proportional to the number of times the zero is erroneously covered in a rank-k factorisation. For one of the exponential-size IPs we describe a computational approach based on column generation. Experimental results on synthetic and real-world datasets suggest that our integer programming approach is competitive with available methods for k-BMF and provides accurate low-error factorisations.

Submitted 3 August, 2021; v1 submitted 25 June, 2021; originally announced June 2021.
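The k-BMF problem stated in the abstract above can be made concrete for toy sizes. The sketch below (NumPy and `itertools`; all function names hypothetical) computes the Boolean product, evaluates the traditional squared Frobenius objective with a mask for missing entries, and finds an optimum by exhaustive search. It is only a brute-force illustration of the problem, not the paper's integer programs or column-generation method, and it does not implement the new weighted objective.

```python
import itertools

import numpy as np


def boolean_product(A, B):
    """Boolean matrix product: (A o B)_ij = OR over l of (A_il AND B_lj)."""
    # The integer product counts how many l cover entry (i, j); any count > 0 means 1.
    return (A @ B > 0).astype(int)


def squared_frobenius_error(X, A, B, observed=None):
    """Squared Frobenius distance between X and A o B over the observed entries.

    For 0/1 matrices this equals the number of mismatched entries.
    """
    diff = (X - boolean_product(A, B)) ** 2
    if observed is not None:
        diff = diff * observed  # entries with observed == 0 are missing and ignored
    return int(diff.sum())


def brute_force_kbmf(X, k, observed=None):
    """Exhaustive k-BMF: try every binary A (n x k) and B (k x m); toy sizes only."""
    n, m = X.shape
    best = None
    for a_bits in itertools.product((0, 1), repeat=n * k):
        A = np.array(a_bits).reshape(n, k)
        for b_bits in itertools.product((0, 1), repeat=k * m):
            B = np.array(b_bits).reshape(k, m)
            err = squared_frobenius_error(X, A, B, observed)
            if best is None or err < best[0]:
                best = (err, A, B)
    return best  # (error, A, B)


X = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
A = np.array([[1, 0], [1, 1], [0, 1]])
B = np.array([[1, 1, 0], [0, 1, 1]])
print(A @ B)  # ordinary arithmetic: the middle row is [1 2 1]
print(boolean_product(A, B))  # Boolean arithmetic: equals X
print(brute_force_kbmf(X, 1)[0])  # 2
print(brute_force_kbmf(X, 2)[0])  # 0
```

The 3 x 3 example is reproduced exactly with k = 2 under Boolean arithmetic, while the best single rank-one factor leaves two mismatches. The search visits 2^(nk + km) candidate pairs, so it is only usable for toy instances.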

arXiv:2011.04457

Binary Matrix Factorisation via Column Generation

Authors: Reka A. Kovacs, Oktay Gunluk, Raphael A. Hauser

Abstract: Identifying discrete patterns in binary data is an important dimensionality reduction tool in machine learning and data mining. In this paper, we consider the problem of low-rank binary matrix factorisation (BMF) under Boolean arithmetic. Due to the hardness of this problem, most previous attempts rely on heuristic techniques. We formulate the problem as a mixed integer linear program and use column generation, a large-scale optimisation technique, to solve it without the need for heuristic pattern mining. Our approach focuses on accuracy and on the provision of optimality guarantees. Experimental results on real-world datasets demonstrate that our proposed method is effective at producing highly accurate factorisations and improves on the previously best known results for 15 out of 24 problem instances.

Submitted 3 August, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: Final version as published at AAAI 2021, including the appendix.

arXiv:1805.07459

PCA by Optimisation of Symmetric Functions has no Spurious Local Optima

Authors: Raphael A. Hauser, Armin Eftekhari

Abstract: Principal Component Analysis (PCA) finds the best linear representation of data, and is an indispensable tool in many learning and inference tasks. Classically, principal components of a dataset are interpreted as the directions that preserve most of its "energy", an interpretation that is theoretically underpinned by the celebrated Eckart-Young-Mirsky Theorem. This paper introduces many other ways of performing PCA, with various geometric interpretations, and proves that the corresponding family of non-convex programs has no spurious local optima and possesses only strict saddle points. These programs therefore loosely behave like convex problems and can be efficiently solved to global optimality, for example, with certain variants of stochastic gradient descent. Beyond providing new geometric interpretations and enhancing our theoretical understanding of PCA, our findings might pave the way for entirely new approaches to structured dimensionality reduction, such as sparse PCA and nonnegative matrix factorisation. More specifically, we study an unconstrained formulation of PCA using determinant optimisation that might provide an elegant alternative to the deflation scheme commonly used in sparse PCA.

Submitted 21 December, 2019; v1 submitted 18 May, 2018; originally announced May 2018.
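To make the "symmetric functions" in the abstract above concrete, one natural family (our assumption, not necessarily the paper's) is the elementary symmetric polynomials e_1, ..., e_r of the eigenvalues of U^T Sigma U, where U is an n x r matrix with orthonormal columns and Sigma is a sample covariance: e_1 is the trace (retained "energy") and e_r the determinant (squared retained "volume"). The NumPy sketch below, with hypothetical function names, checks on synthetic data that the top-r eigenvectors of Sigma score at least as high as 1,000 random orthonormal frames on every e_j, as Cauchy interlacing predicts. It probes global optimality of the principal subspace only; it says nothing about the absence of spurious local optima, which is the paper's result.

```python
import numpy as np


def elementary_symmetric(values):
    """Return [e_1, ..., e_r] of the given numbers."""
    # np.poly(v) returns the coefficients of prod_i (x - v_i),
    # i.e. [1, -e_1, +e_2, -e_3, ...].
    coeffs = np.poly(values)
    return np.array([(-1) ** j * coeffs[j] for j in range(1, len(values) + 1)])


def symmetric_scores(Sigma, U):
    """e_1, ..., e_r of the eigenvalues of U^T Sigma U (e_1 = trace, e_r = det)."""
    return elementary_symmetric(np.linalg.eigvalsh(U.T @ Sigma @ U))


rng = np.random.default_rng(0)
n, r = 6, 3
# Synthetic data with decreasing spread along the coordinate axes.
Y = rng.standard_normal((200, n)) * np.array([3.0, 2.5, 2.0, 1.0, 0.5, 0.2])
Sigma = np.cov(Y, rowvar=False)

evals, evecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
best = symmetric_scores(Sigma, evecs[:, -r:])  # frame of the top-r principal directions

gaps = []
for trial in range(1000):
    U = np.linalg.qr(rng.standard_normal((n, r)))[0]  # random orthonormal n x r frame
    gaps.append(best - symmetric_scores(Sigma, U))
print(np.min(gaps, axis=0) >= 0)  # every entry True: no random frame beats the top-r frame
```

The argument behind the check: the eigenvalues of U^T Sigma U are nonnegative and, by interlacing, bounded by the top-r eigenvalues of Sigma, and each e_j is nondecreasing in nonnegative arguments.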

arXiv:1803.04049

PCA by Determinant Optimization has no Spurious Local Optima

Authors: Raphael A. Hauser, Armin Eftekhari, Heinrich F. Matzinger

Abstract: Principal component analysis (PCA) finds the best linear representation of data and is an indispensable tool in many learning tasks. Classically, principal components of a dataset are interpreted as the directions that preserve most of its "energy", an interpretation that is theoretically underpinned by the celebrated Eckart-Young-Mirsky Theorem. There are yet other ways of interpreting PCA that are rarely exploited in practice, largely because it is not known how to reliably solve the corresponding non-convex optimisation programs. In this paper, we consider one such interpretation of principal components as the directions that preserve most of the "volume" of the dataset. Our main contribution is a theorem showing that the corresponding non-convex program has no spurious local optima. We confirm this empirically with a number of solvers.

Submitted 11 March, 2018; originally announced March 2018.
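The "volume" interpretation in the abstract above suggests maximising log det(U^T Sigma U) over n x r matrices U with orthonormal columns. The sketch below is our own illustrative formulation and solver, not the paper's program or the solvers it uses: plain Riemannian gradient ascent with a QR retraction, written with NumPy, with hypothetical function names and step size. From a random start it should recover the span of the top-r eigenvectors, which is what a no-spurious-local-optima result leads one to expect; a single run is an illustration, not a proof.

```python
import numpy as np


def logdet_objective(Sigma, U):
    """log det(U^T Sigma U): log of the squared volume retained by the frame U."""
    _, logdet = np.linalg.slogdet(U.T @ Sigma @ U)
    return logdet


def grassmann_ascent(Sigma, r, steps=3000, lr=0.02, seed=0):
    """Riemannian gradient ascent on log det(U^T Sigma U) over orthonormal n x r frames."""
    rng = np.random.default_rng(seed)
    n = Sigma.shape[0]
    U = np.linalg.qr(rng.standard_normal((n, r)))[0]  # random starting frame
    for _ in range(steps):
        M = U.T @ Sigma @ U
        euclid_grad = 2.0 * Sigma @ U @ np.linalg.inv(M)  # gradient of log det(U^T Sigma U), Sigma symmetric
        riem_grad = euclid_grad - U @ (U.T @ euclid_grad)  # project off span(U)
        U = np.linalg.qr(U + lr * riem_grad)[0]  # QR retraction back to orthonormal columns
    return U


rng = np.random.default_rng(1)
d = np.array([5.0, 4.0, 2.0, 1.0, 0.5])  # eigenvalues of the test covariance
Q = np.linalg.qr(rng.standard_normal((5, 5)))[0]  # random orthonormal eigenbasis
Sigma = Q @ np.diag(d) @ Q.T
Sigma = (Sigma + Sigma.T) / 2  # remove rounding asymmetry

U = grassmann_ascent(Sigma, r=2)
P_found = U @ U.T
P_top = Q[:, :2] @ Q[:, :2].T  # projector onto the top-2 eigenvectors
print(np.linalg.norm(P_found - P_top))  # close to 0
print(logdet_objective(Sigma, U), np.log(5.0 * 4.0))  # both about 2.9957
```

The gradient is projected off span(U) because the objective is unchanged when U is replaced by UQ for any orthogonal Q, so only the subspace matters.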

arXiv:math/0104020

Self-scaled barriers for irreducible symmetric cones

Authors: Raphael A Hauser, Yongdo Lim

Abstract: Self-scaled barrier functions are fundamental objects in the theory of interior-point methods for linear optimization over symmetric cones, of which linear and semidefinite programming are special cases. We classify all self-scaled barriers over irreducible symmetric cones and show that these functions are merely homothetic transformations of the universal barrier function. Together with a decomposition theorem for self-scaled barriers, this completes the algebraic classification theory of these functions. After introducing the reader to the concepts relevant to the problem and tracing the history of the subject, we start by deriving our result from first principles in the important special case of semidefinite programming. We then generalise these arguments to irreducible symmetric cones by invoking results from the theory of Euclidean Jordan algebras.

Submitted 2 April, 2001; originally announced April 2001.

Comments: 12 pages

Report number: DAMTP 2001/NA04

MSC Class: 90C25; 52A41; 90C60 (primary); 90C05; 90C20 (secondary)
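For the semidefinite special case mentioned in the self-scaled barriers abstract above, the standard barrier F(X) = -log det X can be checked numerically against the self-scaled property. We assume the Nesterov-Todd definition: for interior points w and x, F''(w)x lies in the interior of the dual cone and F_*(F''(w)x) = F(x) - 2F(w) - nu, with conjugate F_*(s) = max_x { -<s, x> - F(x) } and nu = n for n x n matrices. The NumPy sketch below (hypothetical helper names) verifies this identity on random positive definite matrices; it illustrates the property being classified and is not the paper's classification argument.

```python
import numpy as np


def barrier(X):
    """Log-det barrier F(X) = -log det X on the interior of the PSD cone."""
    sign, logdet = np.linalg.slogdet(X)
    if sign <= 0:
        raise ValueError("X must be positive definite")
    return -logdet


def conjugate_barrier(S):
    """F_*(S) = max_X { -<S, X> - F(X) }; the maximiser is X = S^{-1}, giving -log det S - n."""
    return barrier(S) - S.shape[0]


def hessian_action(W, X):
    """Hessian of the log-det barrier at W applied to X: F''(W)[X] = W^{-1} X W^{-1}."""
    W_inv = np.linalg.inv(W)
    return W_inv @ X @ W_inv


def random_pd(n, rng):
    """Random symmetric positive definite n x n matrix."""
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)


rng = np.random.default_rng(0)
n = 4
W, X = random_pd(n, rng), random_pd(n, rng)
H = hessian_action(W, X)
lhs = conjugate_barrier(H)
rhs = barrier(X) - 2.0 * barrier(W) - n  # nu = n for the log-det barrier
print(np.isclose(lhs, rhs))  # True
# The PSD cone is self-dual, so "interior of the dual cone" means positive definite.
print(np.all(np.linalg.eigvalsh((H + H.T) / 2) > 0))  # True
```

The identity follows by hand from log det(W^{-1} X W^{-1}) = log det X - 2 log det W; the numerical check simply confirms the sign conventions assumed above.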

Showing 1–6 of 6 results for author: Hauser, R A