Showing 1–29 of 29 results for author: Nock, R

  1. arXiv:2407.02279  [pdf, other]

    cs.LG stat.ML

    How to Boost Any Loss Function

    Authors: Richard Nock, Yishay Mansour

    Abstract: Boosting is a highly successful ML-born optimization setting in which one is required to computationally efficiently learn arbitrarily good models based on the access to a weak learner oracle, providing classifiers performing at least slightly differently from random guessing. A key difference with gradient-based optimization is that boosting's original model does not require access to first orde…

    Submitted 14 November, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: NeurIPS'24

    ACM Class: I.2.6

  2. arXiv:2402.11039  [pdf, other]

    cs.LG stat.ML

    Robustness to Subpopulation Shift with Domain Label Noise via Regularized Annotation of Domains

    Authors: Nathan Stromberg, Rohan Ayyagari, Monica Welfert, Sanmi Koyejo, Richard Nock, Lalitha Sankar

    Abstract: Existing methods for last layer retraining that aim to optimize worst-group accuracy (WGA) rely heavily on well-annotated groups in the training data. We show, both in theory and practice, that annotation-based data augmentations using either downsampling or upweighting for WGA are susceptible to domain annotation noise, and in high-noise regimes approach the WGA of a model trained with vanilla em…

    Submitted 26 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Generalized Gaussian assumption

  3. arXiv:2311.13459  [pdf, other]

    cs.LG stat.ML

    The Tempered Hilbert Simplex Distance and Its Application To Non-linear Embeddings of TEMs

    Authors: Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

    Abstract: Tempered Exponential Measures (TEMs) are a parametric generalization of the exponential family of distributions maximizing the tempered entropy function among positive measures subject to a probability normalization of their power densities. Calculus on TEMs relies on a deformed algebra of arithmetic operators induced by the deformed logarithms used to define the tempered entropy. In this work, we…

    Submitted 22 November, 2023; originally announced November 2023.

  4. arXiv:2306.05487  [pdf, ps, other]

    cs.LG stat.ML

    Boosting with Tempered Exponential Measures

    Authors: Richard Nock, Ehsan Amid, Manfred K. Warmuth

    Abstract: One of the most popular ML algorithms, AdaBoost, can be derived from the dual of a relative entropy minimization problem subject to the fact that the positive weights on the examples sum to one. Essentially, harder examples receive higher probabilities. We generalize this setup to the recently introduced {\it tempered exponential measure}s (TEMs) where normalization is enforced on a specific power…

    Submitted 8 June, 2023; originally announced June 2023.

    ACM Class: I.2.6

  5. arXiv:2301.11695  [pdf, other]

    stat.ML cs.LG

    LegendreTron: Uprising Proper Multiclass Loss Learning

    Authors: Kevin Lam, Christian Walder, Spiridon Penev, Richard Nock

    Abstract: Loss functions serve as the foundation of supervised learning and are often chosen prior to model development. To avoid potentially ad hoc choices of losses, statistical decision theory describes a desirable property for losses known as \emph{properness}, which asserts that Bayes' rule is optimal. Recent works have sought to \emph{learn losses} and models jointly. Existing methods do this by fitti…

    Submitted 28 November, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: Accepted at the 40th International Conference on Machine Learning (ICML 2023)

  6. arXiv:2201.12947  [pdf, other]

    stat.ML cs.LG

    Fair Wrapping for Black-box Predictions

    Authors: Alexander Soen, Ibrahim Alabdulmohsin, Sanmi Koyejo, Yishay Mansour, Nyalleng Moorosi, Richard Nock, Ke Sun, Lexing Xie

    Abstract: We introduce a new family of techniques to post-process ("wrap") a black-box classifier in order to reduce its bias. Our technique builds on the recent analysis of improper loss functions whose optimization can correct any twist in prediction, unfairness being treated as a twist. In the post-processing, we learn a wrapper function which we define as an $α$-tree, which modifies the prediction. We p…

    Submitted 1 November, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: Published in Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

  7. arXiv:2012.00188  [pdf, other]

    stat.ML cs.LG

    Fair Densities via Boosting the Sufficient Statistics of Exponential Families

    Authors: Alexander Soen, Hisham Husain, Richard Nock

    Abstract: We introduce a boosting algorithm to pre-process data for fairness. Starting from an initial fair but inaccurate distribution, our approach shifts towards better data fitting while still ensuring a minimal fairness guarantee. To do so, it learns the sufficient statistics of an exponential family with boosting-compliant convergence. Importantly, we are able to theoretically prove that the learned d…

    Submitted 15 August, 2023; v1 submitted 30 November, 2020; originally announced December 2020.

    Comments: Published in Proceedings of the 40th International Conference on Machine Learning (ICML2023)

  8. arXiv:2006.04633  [pdf, other]

    cs.LG stat.ML

    All your loss are belong to Bayes

    Authors: Christian Walder, Richard Nock

    Abstract: Loss functions are a cornerstone of machine learning and the starting point of most algorithms. Statistics and Bayesian decision theory have contributed, via properness, to elicit over the past decades a wide set of admissible losses in supervised learning, to which most popular choices belong (logistic, square, Matsushita, etc.). Rather than making a potentially biased ad hoc choice of the loss,…

    Submitted 5 November, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

  9. arXiv:2002.04197  [pdf, other]

    cs.LG stat.ML

    Generalised Lipschitz Regularisation Equals Distributional Robustness

    Authors: Zac Cranko, Zhan Shi, Xinhua Zhang, Richard Nock, Simon Kornblith

    Abstract: The problem of adversarial examples has highlighted the need for a theory of regularisation that is general enough to apply to exotic function classes, such as universal approximators. In response, we give a very general equality result regarding the relationship between distributional robustness and regularisation, as defined with a transportation cost uncertainty set. The theory allows us to (ti…

    Submitted 10 February, 2020; originally announced February 2020.

  10. arXiv:2002.03555  [pdf, other]

    cs.LG stat.ML

    Supervised Learning: No Loss No Cry

    Authors: Richard Nock, Aditya Krishna Menon

    Abstract: Supervised learning requires the specification of a loss function to minimise. While the theory of admissible losses from both a computational and statistical perspective is well-developed, these offer a panoply of different choices. In practice, this choice is typically made in an \emph{ad hoc} manner. In hopes of making this procedure more principled, the problem of \emph{learning the loss funct…

    Submitted 10 February, 2020; originally announced February 2020.

    ACM Class: I.2.6

  11. arXiv:2001.09384  [pdf, other]

    cs.LG stat.ML

    Boosted and Differentially Private Ensembles of Decision Trees

    Authors: Richard Nock, Wilko Henecka

    Abstract: Boosted ensembles of decision tree (DT) classifiers are extremely popular in international competitions, yet to our knowledge nothing is formally known on how to make them \textit{also} differentially private (DP), up to the point that random forests currently reign supreme in the DP stage. Our paper starts with the proof that the privacy vs boosting picture for DT involves a notable and general tech…

    Submitted 3 February, 2020; v1 submitted 25 January, 2020; originally announced January 2020.

    ACM Class: I.2.6

  12. arXiv:1912.04977  [pdf, other]

    cs.LG cs.CR stat.ML

    Advances and Open Problems in Federated Learning

    Authors: Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, et al. (34 additional authors not shown)

    Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs re…

    Submitted 8 March, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Published in Foundations and Trends in Machine Learning Vol 4 Issue 1. See: https://www.nowpublishers.com/article/Details/MAL-083

  13. arXiv:1902.06881   

    cs.LG stat.ML

    Proper-Composite Loss Functions in Arbitrary Dimensions

    Authors: Zac Cranko, Robert C. Williamson, Richard Nock

    Abstract: The study of a machine learning problem is in many ways difficult to separate from the study of the loss function being used. One avenue of inquiry has been to look at these loss functions in terms of their properties as scoring rules via the proper-composite representation, in which predictions are mapped to probability distributions which are then scored via a scoring rule. However, recent re…

    Submitted 1 September, 2022; v1 submitted 18 February, 2019; originally announced February 2019.

    Comments: Oh there are just some simple mistakes in this

  14. arXiv:1902.00985  [pdf, ps, other]

    stat.ML cs.LG

    Adversarial Networks and Autoencoders: The Primal-Dual Relationship and Generalization Bounds

    Authors: Hisham Husain, Richard Nock, Robert C. Williamson

    Abstract: Since the introduction of Generative Adversarial Networks (GANs) and Variational Autoencoders (VAE), the literature on generative modelling has witnessed an overwhelming resurgence. The impressive, yet elusive empirical performance of GANs has led to the rise of many GAN-VAE hybrids, with the hopes of GAN level performance and additional benefits of VAE, such as an encoder for feature reduction,…

    Submitted 26 April, 2019; v1 submitted 3 February, 2019; originally announced February 2019.

  15. arXiv:1901.11311  [pdf, other]

    cs.LG stat.ML

    New Tricks for Estimating Gradients of Expectations

    Authors: Christian J. Walder, Paul Roussel, Richard Nock, Cheng Soon Ong, Masashi Sugiyama

    Abstract: We introduce a family of pairwise stochastic gradient estimators for gradients of expectations, which are related to the log-derivative trick, but involve pairwise interactions between samples. The simplest example of our new estimator, dubbed the fundamental trick estimator, is shown to arise from either a) introducing and approximating an integral representation based on the fundamental theorem…

    Submitted 19 April, 2022; v1 submitted 31 January, 2019; originally announced January 2019.

  16. The Bregman chord divergence

    Authors: Frank Nielsen, Richard Nock

    Abstract: Distances are fundamental primitives whose choice significantly impacts the performance of algorithms in machine learning and signal processing. However, selecting the most appropriate distance for a given task is an endeavor. Instead of testing one by one the entries of an ever-expanding dictionary of {\em ad hoc} distances, one rather prefers to consider parametric classes of distances that are…

    Submitted 22 October, 2018; originally announced October 2018.

    Comments: 10 pages

    Journal ref: GSI 2019: Geometric Science of Information pp 299-308

  17. arXiv:1809.01129  [pdf, ps, other]

    stat.ML cs.LG

    Lipschitz Networks and Distributional Robustness

    Authors: Zac Cranko, Simon Kornblith, Zhan Shi, Richard Nock

    Abstract: Robust risk minimisation has several advantages: it has been studied with regards to improving the generalisation properties of models and robustness to adversarial perturbation. We bound the distributionally robust risk for a model class rich enough to include deep neural networks by a regularised empirical risk involving the Lipschitz constant of the model. This allows us to interpret and quantif…

    Submitted 3 September, 2018; originally announced September 2018.

  18. arXiv:1809.00175  [pdf, other]

    stat.ML cs.LG

    Hyperparameter Learning for Conditional Kernel Mean Embeddings with Rademacher Complexity Bounds

    Authors: Kelvin Hsu, Richard Nock, Fabio Ramos

    Abstract: Conditional kernel mean embeddings are nonparametric models that encode conditional expectations in a reproducing kernel Hilbert space. While they provide a flexible and powerful framework for probabilistic inference, their performance is highly dependent on the choice of kernel and regularization hyperparameters. Nevertheless, current hyperparameter tuning methods predominantly rely on expensive…

    Submitted 7 November, 2018; v1 submitted 1 September, 2018; originally announced September 2018.

    Comments: Best Student Machine Learning Paper Award Winner at ECML-PKDD 2018 (European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases)

  19. arXiv:1806.04819  [pdf, other]

    stat.ML cs.LG

    Integral Privacy for Sampling

    Authors: Hisham Husain, Zac Cranko, Richard Nock

    Abstract: Differential privacy is a leading protection setting, focused by design on individual privacy. Many applications, in medical / pharmaceutical domains or social networks, rather posit privacy at a group level, a setting we call integral privacy. We aim for the strongest form of privacy: the group size is in particular not known in advance. We study a problem with related applications in domains cit…

    Submitted 2 July, 2019; v1 submitted 12 June, 2018; originally announced June 2018.

  20. arXiv:1806.02977  [pdf, other]

    cs.LG stat.ML

    Monge blunts Bayes: Hardness Results for Adversarial Training

    Authors: Zac Cranko, Aditya Krishna Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder

    Abstract: The last few years have seen a staggering number of empirical studies of the robustness of neural networks in a model of adversarial perturbations of their inputs. Most rely on an adversary which carries out local modifications within prescribed balls. None however has so far questioned the broader picture: how to frame a resource-bounded adversary so that it can be severely detrimental to learnin…

    Submitted 7 May, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

    ACM Class: I.2.6

  21. arXiv:1803.08178  [pdf, other]

    cs.LG cs.IT stat.ML

    Boosted Density Estimation Remastered

    Authors: Zac Cranko, Richard Nock

    Abstract: There has recently been a steady increase in the number of iterative approaches to density estimation. However, an accompanying burst of formal convergence guarantees has not followed; all results pay the price of heavy assumptions which are often unrealistic or hard to check. The Generative Adversarial Network (GAN) literature --- seemingly orthogonal to the aforementioned pursuit --- has had the si…

    Submitted 17 June, 2018; v1 submitted 21 March, 2018; originally announced March 2018.

    Comments: Contains lots of essential info

  22. arXiv:1707.04385  [pdf, other]

    cs.LG stat.ML

    f-GANs in an Information Geometric Nutshell

    Authors: Richard Nock, Zac Cranko, Aditya Krishna Menon, Lizhen Qu, Robert C. Williamson

    Abstract: Nowozin \textit{et al} showed last year how to extend the GAN \textit{principle} to all $f$-divergences. The approach is elegant but falls short of a full description of the supervised game, and says little about the key player, the generator: for example, what does the generator actually converge to if solving the GAN game means convergence in some space of parameters? How does that provide hints…

    Submitted 14 July, 2017; originally announced July 2017.

    ACM Class: I.2.6; I.5.1

  23. arXiv:1702.08530  [pdf, ps, other]

    cs.LG stat.ML

    Semi-parametric Network Structure Discovery Models

    Authors: Amir Dezfouli, Edwin V. Bonilla, Richard Nock

    Abstract: We propose a network structure discovery model for continuous observations that generalizes linear causal models by incorporating a Gaussian process (GP) prior on a network-independent component, and random sparsity and weight matrices as the network-dependent parameters. This approach provides flexible modeling of network-independent trends in the observations as well as uncertainty quantificatio…

    Submitted 27 February, 2017; originally announced February 2017.

    ACM Class: I.2.6; I.5.1

  24. arXiv:1609.03683  [pdf, other]

    stat.ML cs.LG

    Making Deep Neural Networks Robust to Label Noise: a Loss Correction Approach

    Authors: Giorgio Patrini, Alessandro Rozza, Aditya Menon, Richard Nock, Lizhen Qu

    Abstract: We present a theoretically grounded approach to train deep neural networks, including recurrent networks, subject to class-dependent label noise. We propose two procedures for loss correction that are agnostic to both application domain and network architecture. They simply amount to at most a matrix inversion and multiplication, provided that we know the probability of each class being corrupted…

    Submitted 22 March, 2017; v1 submitted 13 September, 2016; originally announced September 2016.

    Comments: Oral paper at CVPR 2017

  25. arXiv:1607.00360  [pdf, other]

    cs.LG stat.ML

    A scaled Bregman theorem with applications

    Authors: Richard Nock, Aditya Krishna Menon, Cheng Soon Ong

    Abstract: Bregman divergences play a central role in the design and analysis of a range of machine learning algorithms. This paper explores the use of Bregman divergences to establish reductions between such algorithms and their analyses. We present a new scaled isodistortion theorem involving Bregman divergences (scaled Bregman theorem for short) which shows that certain "Bregman distortions" (employing a…

    Submitted 1 July, 2016; originally announced July 2016.

  26. arXiv:1606.04160  [pdf, ps, other]

    cs.LG stat.ML

    The Crossover Process: Learnability and Data Protection from Inference Attacks

    Authors: Richard Nock, Giorgio Patrini, Finnian Lattimore, Tiberio Caetano

    Abstract: It is usual to consider data protection and learnability as conflicting objectives. This is not always the case: we show how to jointly control inference --- seen as the attack --- and learnability by a noise-free process that mixes training examples, the Crossover Process (cp). One key point is that the cp is typically able to alter joint distributions without touching on marginals, nor altering…

    Submitted 7 March, 2017; v1 submitted 13 June, 2016; originally announced June 2016.

    ACM Class: I.2.6; K.4.1

  27. arXiv:1602.02450  [pdf, ps, other]

    cs.LG stat.ML

    Loss factorization, weakly supervised learning and label noise robustness

    Authors: Giorgio Patrini, Frank Nielsen, Richard Nock, Marcello Carioni

    Abstract: We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the loss. This holds true even for non-smooth, non-convex losses and in any RKHS. The first term is a (kernel) mean operator --the focal quantity of this work-- which we characterize as the sufficient statistic…

    Submitted 9 February, 2016; v1 submitted 7 February, 2016; originally announced February 2016.

  28. arXiv:1406.6314  [pdf, other]

    cs.LG cs.CV cs.IR stat.ML

    Further heuristics for $k$-means: The merge-and-split heuristic and the $(k,l)$-means

    Authors: Frank Nielsen, Richard Nock

    Abstract: Finding the optimal $k$-means clustering is NP-hard in general and many heuristics have been designed for minimizing monotonically the $k$-means objective. We first show how to extend Lloyd's batched relocation heuristic and Hartigan's single-point relocation heuristic to take into account empty-cluster and single-point cluster events, respectively. Those events tend to increasingly occur when…

    Submitted 22 June, 2014; originally announced June 2014.

    Comments: 14 pages

  29. arXiv:1301.3891  [pdf]

    cs.LG stat.ML

    Combining Feature and Prototype Pruning by Uncertainty Minimization

    Authors: Marc Sebban, Richard Nock

    Abstract: We focus in this paper on dataset reduction techniques for use in k-nearest neighbor classification. In such a context, feature and prototype selections have always been independently treated by the standard storage reduction algorithms. While this certifying is theoretically justified by the fact that each subproblem is NP-hard, we assume in this paper that a joint storage reduction is in fact mo…

    Submitted 16 January, 2013; originally announced January 2013.

    Comments: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

    Report number: UAI-P-2000-PG-533-540