Showing 1–17 of 17 results for author: Holland, M J

Searching in archive stat.
  1. arXiv:2402.09802

    stat.ML cs.LG

    Criterion Collapse and Loss Distribution Control

    Authors: Matthew J. Holland

    Abstract: In this work, we consider the notion of "criterion collapse," in which optimization of one metric implies optimality in another, with a particular focus on conditions for collapse into error probability minimizers under a wide variety of learning criteria, ranging from DRO and OCE risks (CVaR, tilted ERM) to non-monotonic criteria underlying recent ascent-descent algorithms explored in the literat…

    Submitted 21 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Revised version accepted to ICML 2024

  2. arXiv:2310.10006

    stat.ML cs.LG

    Soft ascent-descent as a stable and flexible alternative to flooding

    Authors: Matthew J. Holland, Kosuke Nakatani

    Abstract: As a heuristic for improving test accuracy in classification, the "flooding" method proposed by Ishida et al. (2020) sets a threshold for the average surrogate loss at training time; above the threshold, gradient descent is run as usual, but below the threshold, a switch to gradient ascent is made. While setting the threshold is non-trivial and is usually done with validation data, this simple tec…

    Submitted 21 October, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

    Comments: Revised version accepted to NeurIPS 2024

  3. arXiv:2301.11584

    stat.ML cs.LG

    Robust variance-regularized risk minimization with concomitant scaling

    Authors: Matthew J. Holland

    Abstract: Under losses which are potentially heavy-tailed, we consider the task of minimizing sums of the loss mean and standard deviation, without trying to accurately estimate the variance. By modifying a technique for variance-free robust mean estimation to fit our problem setting, we derive a simple learning procedure which can be easily combined with standard gradient-based solvers to be used in tradit…

    Submitted 8 February, 2024; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: Revised version accepted to AISTATS 2024

  4. arXiv:2203.14434

    stat.ML cs.LG

    Flexible risk design using bi-directional dispersion

    Authors: Matthew J. Holland

    Abstract: Many novel notions of "risk" (e.g., CVaR, tilted risk, DRO risk) have been proposed and studied, but these risks are all at least as sensitive as the mean to loss tails on the upside, and tend to ignore deviations on the downside. We study a complementary new risk class that penalizes loss deviations in a bi-directional manner, while having more flexibility in terms of tail sensitivity than is off…

    Submitted 16 February, 2023; v1 submitted 27 March, 2022; originally announced March 2022.

    Comments: Final revision, just minor typos corrected for camera-ready at AISTATS 2023

  5. A Survey of Learning Criteria Going Beyond the Usual Risk

    Authors: Matthew J. Holland, Kazuki Tanabe

    Abstract: Virtually all machine learning tasks are characterized using some form of loss function, and "good performance" is typically stated in terms of a sufficiently small average loss, taken over the random draw of test data. While optimizing for performance on average is intuitive, convenient to analyze in theory, and easy to implement in practice, such a choice brings about trade-offs. In this work, w…

    Submitted 29 November, 2023; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Final version published in JAIR

    Journal ref: Journal of Artificial Intelligence Research, 78:781-821, 2023

  6. Robust learning with anytime-guaranteed feedback

    Authors: Matthew J. Holland

    Abstract: Under data distributions which may be heavy-tailed, many stochastic gradient-based learning algorithms are driven by feedback queried at points with almost no performance guarantees on their own. Here we explore a modified "anytime online-to-batch" mechanism which for smooth objectives admits high-probability error bounds while requiring only lower-order moment bounds on the stochastic gradients.…

    Submitted 24 May, 2021; originally announced May 2021.

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 36(6):6918-6925, 2022

  7. arXiv:2105.04816

    stat.ML cs.LG

    Spectral risk-based learning using unbounded losses

    Authors: Matthew J. Holland, El Mehdi Haress

    Abstract: In this work, we consider the setting of learning problems under a wide class of spectral risk (or "L-risk") functions, where a Lipschitz-continuous spectral density is used to flexibly assign weight to extreme loss values. We obtain excess risk guarantees for a derivative-free learning procedure under unbounded heavy-tailed loss distributions, and propose a computationally efficient implementatio…

    Submitted 11 May, 2021; originally announced May 2021.

  8. arXiv:2012.07346

    stat.ML cs.LG

    Better scalability under potentially heavy-tailed feedback

    Authors: Matthew J. Holland

    Abstract: We study scalable alternatives to robust gradient descent (RGD) techniques that can be used when the losses and/or gradients can be heavy-tailed, though this will be unknown to the learner. The core technique is simple: instead of trying to robustly aggregate gradients at each step, which is costly and leads to sub-optimal dimension dependence in risk bounds, we instead focus computational effort…

    Submitted 14 December, 2020; originally announced December 2020.

    Comments: This work merges arXiv:2006.00784 and arXiv:2006.01364, providing additional empirical analysis using real-world benchmark datasets

  9. Learning with risks based on M-location

    Authors: Matthew J. Holland

    Abstract: In this work, we study a new class of risks defined in terms of the location and deviation of the loss distribution, generalizing far beyond classical mean-variance risk functions. The class is easily implemented as a wrapper around any smooth loss, it admits finite-sample stationarity guarantees for stochastic gradient methods, it is straightforward to interpret and adjust, with close links to M-…

    Submitted 25 April, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

    Comments: Substantial update to initial version; refined theory, improved exposition, added experimental analysis

    Journal ref: Machine Learning, 111:4679-4718, 2022

  10. arXiv:2007.04486

    stat.ML cs.LG

    Making learning more transparent using conformalized performance prediction

    Authors: Matthew J. Holland

    Abstract: In this work, we study some novel applications of conformal inference techniques to the problem of providing machine learning procedures with more transparent, accurate, and practical performance guarantees. We provide a natural extension of the traditional conformal prediction framework, done in such a way that we can make valid and well-calibrated predictive statements about the future performan…

    Submitted 8 July, 2020; originally announced July 2020.

  11. arXiv:2006.02001

    stat.ML cs.LG

    Learning with CVaR-based feedback under potentially heavy tails

    Authors: Matthew J. Holland, El Mehdi Haress

    Abstract: We study learning algorithms that seek to minimize the conditional value-at-risk (CVaR), when all the learner knows is that the losses incurred may be heavy-tailed. We begin by studying a general-purpose estimator of CVaR for potentially heavy-tailed random variables, which is easy to implement in practice, and requires nothing more than finite variance and a distribution function that does not ch…

    Submitted 2 June, 2020; originally announced June 2020.

  12. arXiv:2006.01364   

    stat.ML cs.LG

    Improved scalability under heavy tails, without strong convexity

    Authors: Matthew J. Holland

    Abstract: Real-world data is laden with outlying values. The challenge for machine learning is that the learner typically has no prior knowledge of whether the feedback it receives (losses, gradients, etc.) will be heavy-tailed or not. In this work, we study a simple algorithmic strategy that can be leveraged when both losses and gradients can be heavy-tailed. The core technique introduces a simple robust v…

    Submitted 14 December, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: This paper has been superseded by arXiv:2012.07346 (a merge and extension of this article and arXiv:2006.00784)

  13. arXiv:2006.00784   

    stat.ML cs.LG

    Better scalability under potentially heavy-tailed gradients

    Authors: Matthew J. Holland

    Abstract: We study a scalable alternative to robust gradient descent (RGD) techniques that can be used when the gradients can be heavy-tailed, though this will be unknown to the learner. The core technique is simple: instead of trying to robustly aggregate gradients at each step, which is costly and leads to sub-optimal dimension dependence in risk bounds, we choose a candidate which does not diverge too fa…

    Submitted 14 December, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: This paper has been superseded by arXiv:2012.07346 (a merge and extension of this article and arXiv:2006.01364)

  14. arXiv:1905.07900

    stat.ML cs.LG

    PAC-Bayes under potentially heavy tails

    Authors: Matthew J. Holland

    Abstract: We derive PAC-Bayesian learning guarantees for heavy-tailed losses, and obtain a novel optimal Gibbs posterior which enjoys finite-sample excess risk bounds at logarithmic confidence. Our core technique itself makes use of PAC-Bayesian inequalities in order to derive a robust risk estimator, which by design is easy to compute. In particular, only assuming that the first three moments of the loss d…

    Submitted 18 December, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

  15. arXiv:1810.06207

    stat.ML cs.LG

    Robust descent using smoothed multiplicative noise

    Authors: Matthew J. Holland

    Abstract: To improve the off-sample generalization of classical procedures minimizing the empirical risk under potentially heavy-tailed data, new robust learning algorithms have been proposed in recent years, with generalized median-of-means strategies being particularly salient. These procedures enjoy performance guarantees in the form of sharp risk bounds under weak moment assumptions on the underlying lo…

    Submitted 15 October, 2018; originally announced October 2018.

  16. arXiv:1810.04863

    stat.ML cs.LG

    Classification using margin pursuit

    Authors: Matthew J. Holland

    Abstract: In this work, we study a new approach to optimizing the margin distribution realized by binary classifiers. The classical approach to this problem is simply maximization of the expected margin, while more recent proposals consider simultaneous variance control and proxy objectives based on robust location estimates, in the vein of keeping the margin distribution sharply concentrated in a desirable…

    Submitted 11 October, 2018; originally announced October 2018.

  17. arXiv:1706.00182

    stat.ML

    Efficient learning with robust gradient descent

    Authors: Matthew J. Holland, Kazushi Ikeda

    Abstract: Minimizing the empirical risk is a popular training strategy, but for learning tasks where the data may be noisy or heavy-tailed, one may require many observations in order to generalize well. To achieve better performance under less stringent requirements, we introduce a procedure which constructs a robust approximation of the risk gradient for use in an iterative learning routine. Using high-pro…

    Submitted 14 October, 2018; v1 submitted 1 June, 2017; originally announced June 2017.