Showing 1–18 of 18 results for author: Roosta, F

Searching in archive stat.
  1. arXiv:2505.12353  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Importance Sampling for Nonlinear Models

    Authors: Prakash Palanivelu Rajmohan, Fred Roosta

    Abstract: While norm-based and leverage-score-based methods have been extensively studied for identifying "important" data points in linear models, analogous tools for nonlinear models remain significantly underdeveloped. By introducing the concept of the adjoint operator of a nonlinear map, we address this gap and generalize norm-based and leverage-score-based importance sampling to nonlinear settings. We…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: This work is accepted at ICML 2025

  2. arXiv:2503.04424  [pdf, other]

    stat.ML cs.LG math.NA

    Determinant Estimation under Memory Constraints and Neural Scaling Laws

    Authors: Siavash Ameli, Chris van der Heide, Liam Hodgkinson, Fred Roosta, Michael W. Mahoney

    Abstract: Calculating or accurately estimating log-determinants of large positive semi-definite matrices is of fundamental importance in many machine learning tasks. While its cubic computational complexity can already be prohibitive, in modern applications, even storing the matrices themselves can pose a memory bottleneck. To address this, we derive a novel hierarchical algorithm based on block-wise comput…

    Submitted 6 March, 2025; originally announced March 2025.

  3. arXiv:2502.02870  [pdf, other]

    stat.ML cs.LG

    Uncertainty Quantification with the Empirical Neural Tangent Kernel

    Authors: Joseph Wilson, Chris van der Heide, Liam Hodgkinson, Fred Roosta

    Abstract: While neural networks have demonstrated impressive performance across various tasks, accurately quantifying uncertainty in their predictions is essential to ensure their trustworthiness and enable widespread adoption in critical systems. Several Bayesian uncertainty quantification (UQ) methods exist that are either cheap or reliable, but not both. We propose a post-hoc, sampling-based UQ method fo…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: 24 pages, 5 figures, 9 tables

  4. arXiv:2401.00122  [pdf, other]

    stat.ML cs.LG

    SALSA: Sequential Approximate Leverage-Score Algorithm with Application in Analyzing Big Time Series Data

    Authors: Ali Eshragh, Luke Yerbury, Asef Nazari, Fred Roosta, Michael W. Mahoney

    Abstract: We develop a new efficient sequential approximate leverage score algorithm, SALSA, using methods from randomized numerical linear algebra (RandNLA) for large matrices. We demonstrate that, with high probability, the accuracy of SALSA's approximations is within $(1 + O({\varepsilon}))$ of the true leverage scores. In addition, we show that the theoretical computational complexity and numerical accu…

    Submitted 29 December, 2023; originally announced January 2024.

    Comments: 42 pages, 7 figures

    MSC Class: 62M10

  5. arXiv:2311.07013  [pdf, ps, other]

    stat.ML cs.LG

    A PAC-Bayesian Perspective on the Interpolating Information Criterion

    Authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney

    Abstract: Deep learning is renowned for its theory-practice gap, whereby principled theory typically fails to provide much beneficial guidance for implementation in practice. This has been highlighted recently by the benign overfitting phenomenon: when neural networks become sufficiently large to interpolate the dataset perfectly, model performance appears to improve with increasing model size, in apparent…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 9 pages

  6. arXiv:2307.07785  [pdf, other]

    stat.ML cs.LG

    The Interpolating Information Criterion for Overparameterized Models

    Authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney

    Abstract: The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset. Classical information criteria typically consider the large-data limit, penalizing model size. However, these criteria are not appropriate in modern settings where overparameterized models tend to perform well. For any overparameterized mod…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: 23 pages, 2 figures

  7. arXiv:2210.07612  [pdf, other]

    stat.ML cs.LG

    Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes

    Authors: Liam Hodgkinson, Chris van der Heide, Fred Roosta, Michael W. Mahoney

    Abstract: Despite their importance for assessing reliability of predictions, uncertainty quantification (UQ) measures for machine learning models have only recently begun to be rigorously characterized. One prominent issue is the curse of dimensionality: it is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input d…

    Submitted 25 July, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: 33 pages, 21 figures

  8. arXiv:2106.08544  [pdf, other]

    cs.LG math.NA stat.ML

    Non-PSD Matrix Sketching with Applications to Regression and Optimization

    Authors: Zhili Feng, Fred Roosta, David P. Woodruff

    Abstract: A variety of dimensionality reduction techniques have been applied for computations involving large matrices. The underlying matrix is randomly compressed into a smaller one, while approximately retaining many of its original properties. As a result, much of the expensive computation can be performed on the small matrix. The sketching of positive semidefinite (PSD) matrices is well understood, but…

    Submitted 16 June, 2021; originally announced June 2021.

  9. arXiv:2002.09547  [pdf, other]

    stat.ML cs.LG

    Stochastic Normalizing Flows

    Authors: Liam Hodgkinson, Chris van der Heide, Fred Roosta, Michael W. Mahoney

    Abstract: We introduce stochastic normalizing flows, an extension of continuous normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs). Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs as random neural ordinary differential equ…

    Submitted 25 February, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: 17 pages, 4 figures

  10. arXiv:2002.08517  [pdf, other]

    cs.LG stat.ML

    Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks

    Authors: Russell Tsuchida, Tim Pearce, Chris van der Heide, Fred Roosta, Marcus Gallagher

    Abstract: Analysing and computing with Gaussian processes arising from infinitely wide neural networks has recently seen a resurgence in popularity. Despite this, many explicit covariance functions of networks with activation functions used in modern networks remain unknown. Furthermore, while the kernels of deep networks can be computed iteratively, theoretical understanding of deep kernels is lacking, par…

    Submitted 28 February, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: AAAI camera ready version. 18 pages, 9 figures, 2 tables. Corrected name particle capitalisation and formatting

  11. arXiv:2001.09266  [pdf, other]

    math.ST stat.ML

    The reproducing Stein kernel approach for post-hoc corrected sampling

    Authors: Liam Hodgkinson, Robert Salomone, Fred Roosta

    Abstract: Stein importance sampling is a widely applicable technique based on kernelized Stein discrepancy, which corrects the output of approximate sampling algorithms by reweighting the empirical distribution of the samples. A general analysis of this technique is conducted for the previously unconsidered setting where samples are obtained via the simulation of a Markov chain, and applies to an arbitrary…

    Submitted 13 September, 2021; v1 submitted 25 January, 2020; originally announced January 2020.

    Comments: 26 pages, 2 figures

    MSC Class: 65C05 (Primary) 60J22; 60B10 (Secondary)

  12. arXiv:1911.12927  [pdf, other]

    cs.LG stat.ML

    Richer priors for infinitely wide multi-layer perceptrons

    Authors: Russell Tsuchida, Fred Roosta, Marcus Gallagher

    Abstract: It is well-known that the distribution over functions induced through a zero-mean iid prior distribution over the parameters of a multi-layer perceptron (MLP) converges to a Gaussian process (GP), under mild conditions. We extend this result firstly to independent priors with general zero or non-zero means, and secondly to a family of partially exchangeable priors which generalise iid priors. We d…

    Submitted 28 November, 2019; originally announced November 2019.

    Comments: Pre-print

  13. arXiv:1911.12321  [pdf, other]

    stat.ME stat.ML

    LSAR: Efficient Leverage Score Sampling Algorithm for the Analysis of Big Time Series Data

    Authors: Ali Eshragh, Fred Roosta, Asef Nazari, Michael W. Mahoney

    Abstract: We apply methods from randomized numerical linear algebra (RandNLA) to develop improved algorithms for the analysis of large-scale time series data. We first develop a new fast algorithm to estimate the leverage scores of an autoregressive (AR) model in big data regimes. We show that the accuracy of approximations lies within $(1+\mathcal{O}(\varepsilon))$ of the true leverage scores with high probabili…

    Submitted 30 October, 2021; v1 submitted 27 November, 2019; originally announced November 2019.

    Comments: 38 pages, 8 figures

  14. arXiv:1910.00423  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Limit theorems for out-of-sample extensions of the adjacency and Laplacian spectral embeddings

    Authors: Keith Levin, Fred Roosta, Minh Tang, Michael W. Mahoney, Carey E. Priebe

    Abstract: Graph embeddings, a class of dimensionality reduction techniques designed for relational data, have proven useful in exploring and modeling network structure. Most dimensionality reduction methods allow out-of-sample extensions, by which an embedding can be applied to observations not present in the training set. Applied to graphs, the out-of-sample extension problem concerns how to compute the em…

    Submitted 29 September, 2019; originally announced October 2019.

    Comments: Portions of this work originally appeared in ICML2018 as "Out-of-sample extension of graph adjacency spectral embedding" (accompanying technical report available at arXiv:1802.06307). This work extends the results of that earlier paper to a second graph embedding technique called the Laplacian spectral embedding and presents additional experiments

  15. arXiv:1903.12322  [pdf, other]

    stat.ML cs.LG stat.CO

    Implicit Langevin Algorithms for Sampling From Log-concave Densities

    Authors: Liam Hodgkinson, Robert Salomone, Fred Roosta

    Abstract: For sampling from a log-concave density, we study implicit integrators resulting from $θ$-method discretization of the overdamped Langevin diffusion stochastic differential equation. Theoretical and algorithmic properties of the resulting sampling methods for $θ \in [0,1]$ and a range of step sizes are established. Our results generalize and extend prior works in several directions. In particular…

    Submitted 10 July, 2021; v1 submitted 28 March, 2019; originally announced March 2019.

  16. arXiv:1810.08351  [pdf, other]

    cs.LG stat.ML

    Exchangeability and Kernel Invariance in Trained MLPs

    Authors: Russell Tsuchida, Fred Roosta, Marcus Gallagher

    Abstract: In the analysis of machine learning models, it is often convenient to assume that the parameters are IID. This assumption is not satisfied when the parameters are updated through training processes such as SGD. A relaxation of the IID condition is a probabilistic symmetry known as exchangeability. We show the sense in which the weights in MLPs are exchangeable. This yields the result that in certa…

    Submitted 27 October, 2018; v1 submitted 19 October, 2018; originally announced October 2018.

    Comments: 26 pages, 16 Figures; Changed Fred (Farbod) Roosta to Fred Roosta in Metadata

  17. arXiv:1807.07132  [pdf, other]

    cs.LG stat.ML

    Newton-ADMM: A Distributed GPU-Accelerated Optimizer for Multiclass Classification Problems

    Authors: Chih-Hao Fang, Sudhir B Kylasa, Fred Roosta, Michael W. Mahoney, Ananth Grama

    Abstract: First-order optimization methods, such as stochastic gradient descent (SGD) and its variants, are widely used in machine learning applications due to their simplicity and low per-iteration costs. However, they often require larger numbers of iterations, with associated communication costs in distributed environments. In contrast, Newton-type methods, while having higher per-iteration costs, typica…

    Submitted 4 February, 2020; v1 submitted 18 July, 2018; originally announced July 2018.

  18. arXiv:1708.07164  [pdf, ps, other]

    math.OC cs.CC cs.LG stat.ML

    Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information

    Authors: Peng Xu, Fred Roosta, Michael W. Mahoney

    Abstract: We consider variants of trust-region and cubic regularization methods for non-convex optimization, in which the Hessian matrix is approximated. Under mild conditions on the inexact Hessian, and using approximate solution of the corresponding sub-problems, we provide iteration complexity to achieve $ε$-approximate second-order optimality, which has been shown to be tight. Our Hessian approximation cond…

    Submitted 14 May, 2019; v1 submitted 23 August, 2017; originally announced August 2017.

    Comments: 32 pages

    Journal ref: Mathematical Programming 2019