
Showing 1–44 of 44 results for author: Derezinski, M

Searching in archive cs.
  1. arXiv:2506.17556

    cs.DS cs.LG

    Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method

    Authors: Sachin Garg, Michał Dereziński

    Abstract: The Nyström method is a popular low-rank approximation technique for large matrices that arise in kernel methods and convex optimization. Yet, when the data exhibits heavy-tailed spectral decay, the effective dimension of the problem often becomes so large that even the Nyström method may be outside of our computational budget. To address this, we propose Block-Nyström, an algorithm that injects a…

    Submitted 20 June, 2025; originally announced June 2025.

  2. arXiv:2505.13723

    cs.LG math.OC stat.ML

    Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project

    Authors: Pratik Rathore, Zachary Frangella, Sachin Garg, Shaghayegh Fazliani, Michał Dereziński, Madeleine Udell

    Abstract: Gaussian processes (GPs) play an essential role in biostatistics, scientific machine learning, and Bayesian optimization for their ability to provide probabilistic predictions and model uncertainty. However, GP inference struggles to scale to large datasets (which are common in modern applications), since it requires the solution of a linear system whose size scales quadratically with the number o…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 28 pages, 6 figures, 2 tables

  3. arXiv:2501.11673

    math.NA cs.DS cs.LG math.OC stat.ML

    Randomized Kaczmarz Methods with Beyond-Krylov Convergence

    Authors: Michał Dereziński, Deanna Needell, Elizaveta Rebrova, Jiaming Yang

    Abstract: Randomized Kaczmarz methods form a family of linear system solvers which converge by repeatedly projecting their iterates onto randomly sampled equations. While effective in some contexts, such as highly over-determined least squares, Kaczmarz methods are traditionally deemed secondary to Krylov subspace methods, since this latter family of solvers can exploit outliers in the input's singular valu…

    Submitted 20 January, 2025; originally announced January 2025.

  4. arXiv:2411.08773

    cs.DS cs.LG math.NA math.PR stat.ML

    Optimal Oblivious Subspace Embeddings with Near-optimal Sparsity

    Authors: Shabarish Chenakkod, Michał Dereziński, Xiaoyu Dong

    Abstract: An oblivious subspace embedding is a random $m\times n$ matrix $Π$ such that, for any $d$-dimensional subspace, with high probability $Π$ preserves the norms of all vectors in that subspace within a $1\pmε$ factor. In this work, we give an oblivious subspace embedding with the optimal dimension $m=Θ(d/ε^2)$ that has a near-optimal sparsity of $\tilde O(1/ε)$ non-zero entries per column of $Π$. Thi…

    Submitted 28 April, 2025; v1 submitted 13 November, 2024; originally announced November 2024.

    Comments: ICALP 2025

  5. arXiv:2407.10070

    cs.LG math.OC stat.ML

    Have ASkotch: A Neat Solution for Large-scale Kernel Ridge Regression

    Authors: Pratik Rathore, Zachary Frangella, Jiaming Yang, Michał Dereziński, Madeleine Udell

    Abstract: Kernel ridge regression (KRR) is a fundamental computational tool, appearing in problems that range from computational chemistry to health analytics, with a particular interest due to its starring role in Gaussian process regression. However, full KRR solvers are challenging to scale to large datasets: both direct (i.e., Cholesky decomposition) and iterative methods (i.e., PCG) incur prohibitive c…

    Submitted 21 February, 2025; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: 64 pages (including appendices), 16 figures, 5 tables

    MSC Class: 65F10; 68W20; 90C06

  6. arXiv:2406.11151

    cs.LG math.NA stat.ML

    Recent and Upcoming Developments in Randomized Numerical Linear Algebra for Machine Learning

    Authors: Michał Dereziński, Michael W. Mahoney

    Abstract: Large matrices arise in many machine learning and data analysis applications, including as representations of datasets, graphs, model weights, and first- and second-order derivatives. Randomized Numerical Linear Algebra (RandNLA) is an area which uses randomness to develop improved algorithms for ubiquitous matrix problems. The area has reached a certain level of maturity; but recent hardware trend…

    Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2406.01478

    math.OC cs.LG stat.ML

    Stochastic Newton Proximal Extragradient Method

    Authors: Ruichen Jiang, Michał Dereziński, Aryan Mokhtari

    Abstract: Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only when the stochastic Hessian noise diminishes, increasing per-iteration costs over time. Recent work in [arXiv:2204.09266] addressed this with a Hessian averaging scheme that…

    Submitted 11 November, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to NeurIPS 2024; 35 pages, 3 figures

  8. arXiv:2405.05865

    cs.DS cs.LG math.NA math.OC

    Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning

    Authors: Michał Dereziński, Christopher Musco, Jiaming Yang

    Abstract: We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$. Our methods are based on constructing a low-rank Nyström approximation to $A$ using sparse random matrix sketching. This approximation is used to construct a preconditioner, which itself is inverted quickly using additional levels of random sketching and preconditioning. We prove that the co…

    Submitted 10 April, 2025; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: SODA 2025

  9. arXiv:2405.05818

    cs.DS cs.LG math.NA math.OC

    Fine-grained Analysis and Faster Algorithms for Iteratively Solving Linear Systems

    Authors: Michał Dereziński, Daniel LeJeune, Deanna Needell, Elizaveta Rebrova

    Abstract: Despite being a key bottleneck in many machine learning tasks, the cost of solving large linear systems has proven challenging to quantify due to problem-dependent quantities such as condition numbers. To tackle this, we consider a fine-grained notion of complexity for solving linear systems, which is motivated by applications where the data exhibits low-dimensional structure, including spiked cov…

    Submitted 17 June, 2025; v1 submitted 9 May, 2024; originally announced May 2024.

  10. arXiv:2405.05343

    cs.DS cs.LG math.NA

    Distributed Least Squares in Small Space via Sketching and Bias Reduction

    Authors: Sachin Garg, Kevin Tan, Michał Dereziński

    Abstract: Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least squares regression. We show that these limitations can be circumvented in the distributed setting by designing sketching methods that minimize the bias of the estimator, rather than its e…

    Submitted 8 May, 2024; originally announced May 2024.

  11. arXiv:2404.14758

    math.OC cs.LG stat.ML

    Second-order Information Promotes Mini-Batch Robustness in Variance-Reduced Gradients

    Authors: Sachin Garg, Albert S. Berahas, Michał Dereziński

    Abstract: We show that, for finite-sum minimization problems, incorporating partial second-order information of the objective function can dramatically improve the robustness to mini-batch size of variance-reduced stochastic gradient methods, making them more scalable while retaining their benefits over traditional Newton-type approaches. We demonstrate this phenomenon on a prototypical stochastic second-or…

    Submitted 23 April, 2024; originally announced April 2024.

    MSC Class: 65K05; 90C06; 90C30

  12. arXiv:2403.18142

    cs.LG

    HERTA: A High-Efficiency and Rigorous Training Algorithm for Unfolded Graph Neural Networks

    Authors: Yongyi Yang, Jiaming Yang, Wei Hu, Michał Dereziński

    Abstract: As a variant of Graph Neural Networks (GNNs), Unfolded GNNs offer enhanced interpretability and flexibility over traditional designs. Nevertheless, they still suffer from scalability challenges when it comes to the training cost. Although many methods have been proposed to address the scalability issues, they mostly focus on per-iteration efficiency, without worst-case convergence guarantees. More…

    Submitted 26 March, 2024; originally announced March 2024.

  13. arXiv:2312.08893

    cs.DS cs.LG math.NA math.OC

    Solving Dense Linear Systems Faster Than via Preconditioning

    Authors: Michał Dereziński, Jiaming Yang

    Abstract: We give a stochastic optimization algorithm that solves a dense $n\times n$ real-valued linear system $Ax=b$, returning $\tilde x$ such that $\|A\tilde x-b\|\leq ε\|b\|$ in time: $$\tilde O((n^2+nk^{ω-1})\log1/ε),$$ where $k$ is the number of singular values of $A$ larger than $O(1)$ times its smallest positive singular value, $ω< 2.372$ is the matrix multiplication exponent, and $\tilde O$ hides…

    Submitted 6 June, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: STOC 2024

  14. arXiv:2311.10680

    cs.DS cs.LG math.NA stat.ML

    Optimal Embedding Dimension for Sparse Subspace Embeddings

    Authors: Shabarish Chenakkod, Michał Dereziński, Xiaoyu Dong, Mark Rudelson

    Abstract: A random $m\times n$ matrix $S$ is an oblivious subspace embedding (OSE) with parameters $ε>0$, $δ\in(0,1/3)$ and $d\leq m\leq n$, if for any $d$-dimensional subspace $W\subseteq R^n$, $P\big(\,\forall_{x\in W}\ (1+ε)^{-1}\|x\|\leq\|Sx\|\leq (1+ε)\|x\|\,\big)\geq 1-δ.$ It is known that the embedding dimension of an OSE must satisfy $m\geq d$, and for any $θ> 0$, a Gaussian embedding matrix wit…

    Submitted 5 June, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: STOC 2024

  15. arXiv:2308.15720

    cs.LG cs.AI

    Surrogate-based Autotuning for Randomized Sketching Algorithms in Regression Problems

    Authors: Younghyun Cho, James W. Demmel, Michał Dereziński, Haoyun Li, Hengrui Luo, Michael W. Mahoney, Riley J. Murray

    Abstract: Algorithms from Randomized Numerical Linear Algebra (RandNLA) are known to be effective in handling high-dimensional computational problems, providing high-quality empirical performance as well as strong probabilistic guarantees. However, their practical application is complicated by the fact that the user needs to set various algorithm-specific tuning parameters which are different from those use…

    Submitted 10 January, 2025; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: Improved the presentation and clarity. Updated experimental results and scenarios. Accepted for publication in SIAM Journal on Matrix Analysis and Applications

    MSC Class: 68W20; 65F20; 65Y20

  16. arXiv:2302.11474

    math.NA cs.MS math.OC

    Randomized Numerical Linear Algebra : A Perspective on the Field With an Eye to Software

    Authors: Riley Murray, James Demmel, Michael W. Mahoney, N. Benjamin Erichson, Maksim Melnichenko, Osman Asif Malik, Laura Grigori, Piotr Luszczek, Michał Dereziński, Miles E. Lopes, Tianyu Liang, Hengrui Luo, Jack Dongarra

    Abstract: Randomized numerical linear algebra (RandNLA, for short) concerns the use of randomization as a resource to develop improved algorithms for large-scale linear algebra computations. The origins of contemporary RandNLA lie in theoretical computer science, where it blossomed from a simple idea: randomization provides an avenue for computing approximate solutions to linear algebra problems more ef…

    Submitted 12 April, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: v1: this is the first arXiv release of LAPACK Working Note 299. v2: complete rewrite of the subsection on trace estimation, among other changes. See frontmatter page ii (pdf page 5) for revision history

  17. arXiv:2206.10291

    cs.LG cs.DS math.ST stat.ML

    Algorithmic Gaussianization through Sketching: Converting Data into Sub-gaussian Random Designs

    Authors: Michał Dereziński

    Abstract: Algorithmic Gaussianization is a phenomenon that can arise when using randomized sketching or sampling methods to produce smaller representations of large datasets: For certain tasks, these sketched representations have been observed to exhibit many robust performance characteristics that are known to occur when a data sample comes from a sub-gaussian random design, which is a powerful statistical…

    Submitted 27 July, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

  18. arXiv:2206.02702

    math.OC cs.LG stat.ML

    Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches

    Authors: Michał Dereziński

    Abstract: Stochastic variance reduction has proven effective at accelerating first-order algorithms for solving convex finite-sum optimization tasks such as empirical risk minimization. Incorporating second-order information has proven helpful in further improving the performance of these first-order methods. Yet, comparatively little is known about the benefits of using variance reduction to accelerate pop…

    Submitted 29 April, 2025; v1 submitted 6 June, 2022; originally announced June 2022.

  19. arXiv:2204.09266

    math.OC cs.LG stat.ML

    Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence

    Authors: Sen Na, Michał Dereziński, Michael W. Mahoney

    Abstract: We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method. At each iteration, the algorithm is given oracle access to a stochastic estimate of the Hessian matrix. The oracle model includes popular algorithms such as Subsampled Newton and Newton Sketch. Despite using second-order information, these existing methods do not exhibit superlinear converge…

    Submitted 28 November, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

    Comments: 43 pages, 16 figures

  20. arXiv:2109.06442

    cs.DS math.PR

    Domain Sparsification of Discrete Distributions using Entropic Independence

    Authors: Nima Anari, Michał Dereziński, Thuy-Duong Vuong, Elizabeth Yang

    Abstract: We present a framework for speeding up the time it takes to sample from discrete distributions $μ$ defined over subsets of size $k$ of a ground set of $n$ elements, in the regime $k\ll n$. We show that having estimates of marginals $\mathbb{P}_{S\sim μ}[i\in S]$, the task of sampling from $μ$ can be reduced to sampling from distributions $ν$ supported on size $k$ subsets of a ground set of only…

    Submitted 14 September, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

  21. arXiv:2107.07480

    math.OC cs.DS cs.LG stat.ML

    Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update

    Authors: Michał Dereziński, Jonathan Lacotte, Mert Pilanci, Michael W. Mahoney

    Abstract: In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration. Randomized sketching has emerged as a powerful technique for constructing estimates of the Hessian which can be used to perform approximate Newton steps. This involves multiplication by a random sketching matrix, which introduces a trade-off between the computation…

    Submitted 15 July, 2021; originally announced July 2021.

  22. arXiv:2105.07320

    cs.DC stat.ML

    LocalNewton: Reducing Communication Bottleneck for Distributed Learning

    Authors: Vipul Gupta, Avishek Ghosh, Michal Derezinski, Rajiv Khanna, Kannan Ramchandran, Michael Mahoney

    Abstract: To address the communication bottleneck problem in distributed optimization within a master-worker framework, we propose LocalNewton, a distributed second-order algorithm with local averaging. In LocalNewton, the worker machines update their model in every iteration by finding a suitable second-order descent direction using only the data and model stored in their own local memory. We let the worke…

    Submitted 15 May, 2021; originally announced May 2021.

    Comments: To be published in Uncertainty in Artificial Intelligence (UAI) 2021

  23. arXiv:2102.02322

    cs.LG cs.DS

    Query Complexity of Least Absolute Deviation Regression via Robust Uniform Convergence

    Authors: Xue Chen, Michał Dereziński

    Abstract: Consider a regression problem where the learner is given a large collection of $d$-dimensional data points, but can only query a small subset of the real-valued labels. How many queries are needed to obtain a $1+ε$ relative error approximation of the optimum? While this problem has been extensively studied for least squares regression, little is known for other losses. An important example is leas…

    Submitted 28 June, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

  24. arXiv:2011.10695

    cs.DS cs.LG stat.ML

    Sparse sketches with small inversion bias

    Authors: Michał Dereziński, Zhenyu Liao, Edgar Dobriban, Michael W. Mahoney

    Abstract: For a tall $n\times d$ matrix $A$ and a random $m\times n$ sketching matrix $S$, the sketched estimate of the inverse covariance matrix $(A^\top A)^{-1}$ is typically biased: $E[(\tilde A^\top\tilde A)^{-1}]\ne(A^\top A)^{-1}$, where $\tilde A=SA$. This phenomenon, which we call inversion bias, arises, e.g., in statistics and distributed optimization, when averaging multiple independently construc…

    Submitted 9 July, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

  25. arXiv:2007.01327

    cs.LG math.OC stat.ML

    Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization

    Authors: Michał Dereziński, Burak Bartan, Mert Pilanci, Michael W. Mahoney

    Abstract: In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data. However, the local estimates on each machine are typically biased, relative to the full solution on all of the data, and this can limit the effectiveness of averaging. Here, we introduce a new technique for debiasing the local estimates, w…

    Submitted 2 July, 2020; originally announced July 2020.

  26. arXiv:2006.16947

    cs.LG cs.DS stat.ML

    Sampling from a $k$-DPP without looking at all items

    Authors: Daniele Calandriello, Michał Dereziński, Michal Valko

    Abstract: Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, stochastic optimization, active learning and more. Given a kernel function and a subset size $k$, our goal is to sample $k$ out of $n$ items with probability proportional to the determinant of the kernel matrix induced by…

    Submitted 30 June, 2020; originally announced June 2020.

  27. arXiv:2006.10653

    cs.LG stat.ML

    Precise expressions for random projections: Low-rank approximation and randomized Newton

    Authors: Michał Dereziński, Feynman Liang, Zhenyu Liao, Michael W. Mahoney

    Abstract: It is often desirable to reduce the dimensionality of a large dataset by projecting it onto a low-dimensional subspace. Matrix sketching has emerged as a powerful technique for performing such dimensionality reduction very efficiently. Even though there is an extensive literature on the worst-case performance of sketching, existing guarantees are typically very different from what is observed in p…

    Submitted 13 June, 2022; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: This version of the paper includes a correction to the assumptions in a technical result, Theorem 2. None of the other claims are affected by this change. The conference version of this paper does not include the correction, so we recommend citing this arXiv version when referencing Theorem 2

  28. arXiv:2005.03185

    cs.DS cs.LG

    Determinantal Point Processes in Randomized Numerical Linear Algebra

    Authors: Michał Dereziński, Michael W. Mahoney

    Abstract: Randomized Numerical Linear Algebra (RandNLA) uses randomness to develop improved algorithms for matrix problems that arise in scientific computing, data science, machine learning, etc. Determinantal Point Processes (DPPs), a seemingly unrelated topic in pure and applied mathematics, are a class of stochastic point processes with probability distribution characterized by sub-determinants of a kerne…

    Submitted 6 May, 2020; originally announced May 2020.

  29. arXiv:2004.09079

    cs.DS cs.DM

    Isotropy and Log-Concave Polynomials: Accelerated Sampling and High-Precision Counting of Matroid Bases

    Authors: Nima Anari, Michał Dereziński

    Abstract: We define a notion of isotropy for discrete set distributions. If $μ$ is a distribution over subsets $S$ of a ground set $[n]$, we say that $μ$ is in isotropic position if $P[e \in S]$ is the same for all $e\in [n]$. We design a new approximate sampling algorithm that leverages isotropy for the class of distributions $μ$ that have a log-concave generating polynomial; this class includes determinan…

    Submitted 20 April, 2020; originally announced April 2020.

  30. arXiv:2002.09073

    cs.LG stat.ML

    Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method

    Authors: Michał Dereziński, Rajiv Khanna, Michael W. Mahoney

    Abstract: The Column Subset Selection Problem (CSSP) and the Nyström method are among the leading tools for constructing small low-rank approximations of large datasets in machine learning and scientific computing. A fundamental question in this area is: how well can a data subset of size k compete with the best rank k approximation? We develop techniques which exploit spectral properties of the data matrix…

    Submitted 18 December, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: Minor typo corrections and clarifications; slight change in the title; moved part of the related work and background discussion to the appendix

  31. arXiv:1912.04533

    cs.LG math.ST stat.ML

    Exact expressions for double descent and implicit regularization via surrogate random design

    Authors: Michał Dereziński, Feynman Liang, Michael W. Mahoney

    Abstract: Double descent refers to the phase transition that is exhibited by the generalization error of unregularized learning models when varying the ratio between the number of parameters and the number of training samples. The recent success of highly over-parameterized machine learning models such as deep neural networks has motivated a theoretical analysis of the double descent phenomenon in classical…

    Submitted 18 June, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Minor typo corrections and clarifications; moved the proofs into the appendix

  32. arXiv:1910.11561

    math.NA cs.LG

    Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling

    Authors: Mojmír Mutný, Michał Dereziński, Andreas Krause

    Abstract: We analyze the convergence rate of the randomized Newton-like method introduced by Qu et al. (2016) for smooth and convex objectives, which uses random coordinate blocks of a Hessian-over-approximation matrix $\mathbf{M}$ instead of the true Hessian. The convergence analysis of the algorithm is challenging because of its complex dependence on the structure of $\mathbf{M}$. However, we show that when the coordi…

    Submitted 12 February, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Journal ref: AISTATS 2020

  33. arXiv:1907.03411

    stat.ML cs.LG

    Unbiased estimators for random design regression

    Authors: Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

    Abstract: In linear regression we wish to estimate the optimum linear least squares predictor for a distribution over $d$-dimensional input points and real-valued responses, based on a small sample. Under standard random design analysis, where the sample is drawn i.i.d. from the input distribution, the least squares solution for that sample can be viewed as the natural estimator of the optimum. Unfortunatel…

    Submitted 7 June, 2022; v1 submitted 8 July, 2019; originally announced July 2019.

  34. arXiv:1906.04133

    cs.LG stat.ML

    Bayesian experimental design using regularized determinantal point processes

    Authors: Michał Dereziński, Feynman Liang, Michael W. Mahoney

    Abstract: In experimental design, we are given $n$ vectors in $d$ dimensions, and our goal is to select $k\ll n$ of them to perform expensive measurements, e.g., to obtain labels/responses, for a linear regression task. Many statistical criteria have been proposed for choosing the optimal design, with popular choices including A- and D-optimality. If prior knowledge is given, typically in the form of a…

    Submitted 10 June, 2019; originally announced June 2019.

  35. arXiv:1905.13476

    cs.LG stat.ML

    Exact sampling of determinantal point processes with sublinear time preprocessing

    Authors: Michał Dereziński, Daniele Calandriello, Michal Valko

    Abstract: We study the complexity of sampling from a distribution over all index subsets of the set $\{1,...,n\}$ with the probability of a subset $S$ proportional to the determinant of the submatrix $\mathbf{L}_S$ of some $n\times n$ p.s.d. matrix $\mathbf{L}$, where $\mathbf{L}_S$ corresponds to the entries of $\mathbf{L}$ indexed by $S$. Known as a determinantal point process, this distribution is used i…

    Submitted 8 July, 2019; v1 submitted 31 May, 2019; originally announced May 2019.

  36. arXiv:1905.11546

    cs.LG stat.ML

    Distributed estimation of the inverse Hessian by determinantal averaging

    Authors: Michał Dereziński, Michael W. Mahoney

    Abstract: In distributed optimization and distributed numerical linear algebra, we often encounter an inversion bias: if we want to compute a quantity that depends on the inverse of a sum of distributed matrices, then the sum of the inverses does not equal the inverse of the sum. An example of this occurs in distributed Newton's method, where we wish to compute (or implicitly work with) the inverse Hessian…

    Submitted 27 May, 2019; originally announced May 2019.

  37. arXiv:1902.00995

    cs.LG stat.ML

    Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression

    Authors: Michał Dereziński, Kenneth L. Clarkson, Michael W. Mahoney, Manfred K. Warmuth

    Abstract: In experimental design, we are given a large collection of vectors, each with a hidden response value that we assume derives from an underlying linear model, and we wish to pick a small subset of the vectors such that querying the corresponding responses will lead to a good estimator of the model. A classical approach in statistics is to assume the responses are linear, plus zero-mean i.i.d. Gauss…

    Submitted 3 February, 2019; originally announced February 2019.

  38. arXiv:1811.03717

    cs.LG stat.ML

    Fast determinantal point processes via distortion-free intermediate sampling

    Authors: Michał Dereziński

    Abstract: Given a fixed $n\times d$ matrix $\mathbf{X}$, where $n\gg d$, we study the complexity of sampling from a distribution over all subsets of rows where the probability of a subset is proportional to the squared volume of the parallelepiped spanned by the rows (a.k.a. a determinantal point process). In this task, it is important to minimize the preprocessing cost of the procedure (performed once) as…

    Submitted 21 February, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

  39. arXiv:1810.02453

    cs.LG stat.ML

    Correcting the bias in least squares regression with volume-rescaled sampling

    Authors: Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

    Abstract: Consider linear regression where the examples are generated by an unknown distribution on $R^d\times R$. Without any assumptions on the noise, the linear least squares solution for any i.i.d. sample will typically be biased w.r.t. the least squares optimum over the entire distribution. However, we show that if an i.i.d. sample of any size k is augmented by a certain small additional sample, then t…

    Submitted 4 October, 2018; originally announced October 2018.

  40. arXiv:1806.01969

    cs.LG stat.ML

    Reverse iterative volume sampling for linear regression

    Authors: Michał Dereziński, Manfred K. Warmuth

    Abstract: We study the following basic machine learning task: Given a fixed set of $d$-dimensional input points for a linear regression problem, we wish to predict a hidden response value for each of the points. We can only afford to attain the responses for a small subset of the points that are then used to construct linear predictions for all points in the dataset. The performance of the predictions is ev…

    Submitted 5 June, 2018; originally announced June 2018.

  41. arXiv:1802.06749

    cs.LG

    Leveraged volume sampling for linear regression

    Authors: Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

    Abstract: Suppose an $n \times d$ design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested. The goal is to sample only a small number $k \ll n$ of the responses, and then produce a weight vector whose sum of squares loss over all points is at most $1+ε$ times the minimum. When $k$ is very small (e.g., $k=d$), jointly sampling diverse subset…

    Submitted 5 September, 2018; v1 submitted 19 February, 2018; originally announced February 2018.

  42. arXiv:1710.05110

    cs.LG

    Subsampling for Ridge Regression via Regularized Volume Sampling

    Authors: Michał Dereziński, Manfred K. Warmuth

    Abstract: Given $n$ vectors $\mathbf{x}_i\in \mathbb{R}^d$, we want to fit a linear regression model for noisy labels $y_i\in\mathbb{R}$. The ridge estimator is a classical solution to this problem. However, when labels are expensive, we are forced to select only a small subset of vectors $\mathbf{x}_i$ for which we obtain the labels $y_i$. We propose a new procedure for selecting the subset of vectors, suc…

    Submitted 23 February, 2018; v1 submitted 13 October, 2017; originally announced October 2017.

  43. arXiv:1705.06908

    cs.LG

    Unbiased estimates for linear regression via volume sampling

    Authors: Michał Dereziński, Manfred K. Warmuth

    Abstract: Given a full rank matrix $X$ with more columns than rows, consider the task of estimating the pseudo inverse $X^+$ based on the pseudo inverse of a sampled subset of columns (of size at least the number of rows). We show that this is possible if the subset of columns is chosen proportional to the squared volume spanned by the rows of the chosen submatrix (i.e., volume sampling). The resulting estima…

    Submitted 5 June, 2018; v1 submitted 19 May, 2017; originally announced May 2017.

  44. arXiv:1704.06731

    cs.LG

    Batch-Expansion Training: An Efficient Optimization Framework

    Authors: Michał Dereziński, Dhruv Mahajan, S. Sathiya Keerthi, S. V. N. Vishwanathan, Markus Weimer

    Abstract: We propose Batch-Expansion Training (BET), a framework for running a batch optimizer on a gradually expanding dataset. As opposed to stochastic approaches, batches do not need to be resampled i.i.d. at every iteration, thus making BET more resource efficient in a distributed setting, and when disk access is constrained. Moreover, BET can be easily paired with most batch optimizers, does not requir…

    Submitted 23 February, 2018; v1 submitted 21 April, 2017; originally announced April 2017.