Showing 1–22 of 22 results for author: Hanneke, S

  1. arXiv:2408.16189  [pdf, other]

    stat.ML cs.AI cs.LG math.ST

    Adaptive Sample Aggregation In Transfer Learning

    Authors: Steve Hanneke, Samory Kpotufe

    Abstract: Transfer Learning aims to optimally aggregate samples from a target distribution with related samples from a so-called source distribution to improve target risk. Multiple procedures have been proposed over the last two decades to address this problem, each driven by one of a multitude of possible divergence measures between source and target distributions. A first question asked in this work is…

    Submitted 27 April, 2025; v1 submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2407.19777  [pdf, ps, other]

    cs.LG cs.DS math.ST stat.ML

    Revisiting Agnostic PAC Learning

    Authors: Steve Hanneke, Kasper Green Larsen, Nikita Zhivotovskiy

    Abstract: PAC learning, dating back to Valiant'84 and Vapnik and Chervonenkis'64,'74, is a classic model for studying supervised learning. In the agnostic setting, we have access to a hypothesis set $\mathcal{H}$ and a training set of labeled samples $(x_1,y_1),\dots,(x_n,y_n) \in \mathcal{X} \times \{-1,1\}$ drawn i.i.d. from an unknown distribution $\mathcal{D}$. The goal is to produce a classifier…

    Submitted 29 July, 2024; originally announced July 2024.

  3. arXiv:2407.07765  [pdf, other]

    cs.LG cs.CR cs.DS math.CO stat.ML

    Ramsey Theorems for Trees and a General 'Private Learning Implies Online Learning' Theorem

    Authors: Simone Fioravanti, Steve Hanneke, Shay Moran, Hilla Schefler, Iska Tsubari

    Abstract: This work continues to investigate the link between differentially private (DP) and online learning. Alon, Livni, Malliaris, and Moran (2019) showed that for binary concept classes, DP learnability of a given class implies that it has a finite Littlestone dimension (equivalently, that it is online learnable). Their proof relies on a model-theoretic result by Hodges (1997), which demonstrates that…

    Submitted 14 August, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  4. arXiv:2309.17016  [pdf, other]

    cs.LG math.ST stat.ML

    Efficient Agnostic Learning with Average Smoothness

    Authors: Steve Hanneke, Aryeh Kontorovich, Guy Kornowski

    Abstract: We study distribution-free nonparametric regression following a notion of average smoothness initiated by Ashlagi et al. (2021), which measures the "effective" smoothness of a function with respect to an arbitrary unknown underlying distribution. While the recent work of Hanneke et al. (2023) established tight uniform convergence bounds for average-smooth functions in the realizable case and provi…

    Submitted 13 February, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: ALT 2024 camera ready version. arXiv admin note: text overlap with arXiv:2302.06005

  5. arXiv:2302.07186  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Adversarial Rewards in Universal Learning for Contextual Bandits

    Authors: Moise Blanchard, Steve Hanneke, Patrick Jaillet

    Abstract: We study the fundamental limits of learning in contextual bandits, where a learner's rewards depend on their actions and a known context, which extends the canonical multi-armed bandit to the case where side-information is available. We are interested in universally consistent algorithms, which achieve sublinear regret compared to any measurable fixed policy, without any function class restriction…

    Submitted 12 June, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

  6. arXiv:2302.06005  [pdf, other]

    cs.LG math.ST stat.ML

    Near-optimal learning with average Hölder smoothness

    Authors: Steve Hanneke, Aryeh Kontorovich, Guy Kornowski

    Abstract: We generalize the notion of average Lipschitz smoothness proposed by Ashlagi et al. (COLT 2021) by extending it to Hölder smoothness. This measure of the "effective smoothness" of a function is sensitive to the underlying distribution and can be dramatically smaller than its classic "worst-case" Hölder constant. We consider both the realizable and the agnostic (noisy) regression settings, proving…

    Submitted 30 October, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023 camera ready version

  7. arXiv:2301.00241  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Contextual Bandits and Optimistically Universal Learning

    Authors: Moise Blanchard, Steve Hanneke, Patrick Jaillet

    Abstract: We consider the contextual bandit problem on general action and context spaces, where the learner's rewards depend on their selected actions and an observable context. This generalizes the standard multi-armed bandit to the case where side information is available, e.g., patients' records or customers' history, which allows for personalized treatment. We focus on consistency -- vanishing regret co…

    Submitted 31 December, 2022; originally announced January 2023.

  8. arXiv:2203.06046  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Universally Consistent Online Learning with Arbitrarily Dependent Responses

    Authors: Steve Hanneke

    Abstract: This work provides an online learning rule that is universally consistent under processes on (X,Y) pairs, under conditions only on the X process. As a special case, the conditions admit all processes on (X,Y) such that the process on X is stationary. This generalizes past results which required stationarity for the joint process on (X,Y), and additionally required this process to be ergodic. In pa…

    Submitted 11 March, 2022; originally announced March 2022.

  9. arXiv:2201.08903  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Universal Online Learning with Unbounded Losses: Memory Is All You Need

    Authors: Moise Blanchard, Romain Cosson, Steve Hanneke

    Abstract: We resolve an open problem of Hanneke on the subject of universally consistent online learning with non-i.i.d. processes and unbounded losses. The notion of an optimistically universal learning rule was defined by Hanneke in an effort to study learning theory under minimal assumptions. A given learning rule is said to be optimistically universal if it achieves a low long-run average loss whenever…

    Submitted 21 January, 2022; originally announced January 2022.

  10. arXiv:2107.09542  [pdf, ps, other]

    cs.LG cs.AI math.PR math.ST stat.ML

    Open Problem: Is There an Online Learning Algorithm That Learns Whenever Online Learning Is Possible?

    Authors: Steve Hanneke

    Abstract: This open problem asks whether there exists an online learning algorithm for binary classification that guarantees, for all target concepts, to make a sublinear number of mistakes, under only the assumption that the (possibly random) sequence of points X allows that such a learning algorithm can exist for that sequence. As a secondary problem, it also asks whether a specific concise condition comp…

    Submitted 20 July, 2021; originally announced July 2021.

  11. arXiv:2011.04483  [pdf, ps, other]

    cs.LG cs.DS math.ST stat.ML

    A Theory of Universal Learning

    Authors: Olivier Bousquet, Steve Hanneke, Shay Moran, Ramon van Handel, Amir Yehudayoff

    Abstract: How quickly can a given class of concepts be learned from examples? It is common to measure the performance of a supervised machine learning algorithm by plotting its "learning curve", that is, the decay of the error rate as a function of the number of training examples. However, the classical theoretical framework for understanding learnability, the PAC model of Vapnik-Chervonenkis and Valiant, d…

    Submitted 9 November, 2020; originally announced November 2020.

  12. arXiv:2006.15785  [pdf, other]

    cs.LG math.ST stat.ML

    A No-Free-Lunch Theorem for MultiTask Learning

    Authors: Steve Hanneke, Samory Kpotufe

    Abstract: Multitask learning and related areas such as multi-source domain adaptation address modern settings where datasets from $N$ related distributions $\{P_t\}$ are to be combined towards improving performance on any single such distribution ${\cal D}$. A perplexing fact remains in the evolving theory on the subject: while we would hope for performance bounds that account for the contribution from mult…

    Submitted 5 August, 2020; v1 submitted 28 June, 2020; originally announced June 2020.

  13. arXiv:1906.09855  [pdf, other]

    cs.LG math.ST stat.ML

    Universal Bayes consistency in metric spaces

    Authors: Steve Hanneke, Aryeh Kontorovich, Sivan Sabato, Roi Weiss

    Abstract: We extend a recently proposed 1-nearest-neighbor based multiclass learning algorithm and prove that our modification is universally strongly Bayes-consistent in all metric spaces admitting any such learner, making it an "optimistically universal" Bayes-consistent learner. This is the first learning algorithm known to enjoy this property; by comparison, the $k$-NN classifier and its variants are no…

    Submitted 6 January, 2021; v1 submitted 24 June, 2019; originally announced June 2019.

    Comments: To appear in Annals of Statistics

    Journal ref: Annals of Statistics 2021, Vol. 49, No. 4, 2129-2150, August 2021

  14. arXiv:1810.01864  [pdf, other]

    cs.LG cs.IT math.ST stat.ML

    Agnostic Sample Compression Schemes for Regression

    Authors: Idan Attias, Steve Hanneke, Aryeh Kontorovich, Menachem Sadigurschi

    Abstract: We obtain the first positive results for bounded sample compression in the agnostic regression setting with the $\ell_p$ loss, where $p\in [1,\infty]$. We construct a generic approximate sample compression scheme for real-valued function classes exhibiting exponential size in the fat-shattering dimension but independent of the sample size. Notably, for linear regression, an approximate compression…

    Submitted 3 February, 2024; v1 submitted 3 October, 2018; originally announced October 2018.

    Comments: New results in this version: (1) Approximate agnostic sample compression scheme for function classes with finite fat-shattering dimension and the $\ell_p$ loss (section 3); (2) Near-optimal approximate compression for linear functions and the $\ell_p$ loss (section 4.1). The results in sections 4.2 and 4.3 appear in the previous version.

  15. arXiv:1805.08140  [pdf, ps, other]

    cs.LG math.ST stat.ML

    A New Lower Bound for Agnostic Learning with Sample Compression Schemes

    Authors: Steve Hanneke, Aryeh Kontorovich

    Abstract: We establish a tight characterization of the worst-case rates for the excess risk of agnostic learning with sample compression schemes and for uniform convergence for agnostic sample compression schemes. In particular, we find that the optimal rates of convergence for size-$k$ agnostic sample compression schemes are of the form $\sqrt{\frac{k \log(n/k)}{n}}$, which contrasts with agnostic learning…

    Submitted 21 May, 2018; originally announced May 2018.

  16. arXiv:1706.01418  [pdf, ps, other]

    stat.ML cs.LG math.PR math.ST

    Learning Whenever Learning is Possible: Universal Learning under General Stochastic Processes

    Authors: Steve Hanneke

    Abstract: This work initiates a general study of learning and generalization without the i.i.d. assumption, starting from first principles. While the traditional approach to statistical learning theory typically relies on standard assumptions from probability theory (e.g., i.i.d. or stationary ergodic), in this work we are interested in developing a theory of learning based only on the most fundamental and…

    Submitted 20 October, 2020; v1 submitted 5 June, 2017; originally announced June 2017.

  17. arXiv:1606.00922  [pdf, ps, other]

    math.ST

    Localization of VC Classes: Beyond Local Rademacher Complexities

    Authors: Nikita Zhivotovskiy, Steve Hanneke

    Abstract: In this paper we introduce an alternative localization approach for binary classification that leads to a novel complexity measure: fixed points of the local empirical entropy. We show that this complexity measure gives a tight control over complexity in the upper bounds. Our results are accompanied by a novel minimax lower bound that involves the same quantity. In particular, we practically answe…

    Submitted 17 December, 2017; v1 submitted 2 June, 2016; originally announced June 2016.

    Comments: 28 pages, accepted version

  18. arXiv:1512.07146  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Refined Error Bounds for Several Learning Algorithms

    Authors: Steve Hanneke

    Abstract: This article studies the achievable guarantees on the error rates of certain learning algorithms, with particular focus on refining logarithmic factors. Many of the results are based on a general technique for obtaining bounds on the error rates of sample-consistent classifiers with monotonic error regions, in the realizable case. We prove bounds of this type expressed in terms of either the VC di…

    Submitted 10 September, 2016; v1 submitted 22 December, 2015; originally announced December 2015.

    Journal ref: Journal of Machine Learning Research, Vol. 17 (2016), No. 135, pp. 1-55

  19. arXiv:1410.0996  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Minimax Analysis of Active Learning

    Authors: Steve Hanneke, Liu Yang

    Abstract: This work establishes distribution-free upper and lower bounds on the minimax label complexity of active learning with general hypothesis classes, under various noise models. The results reveal a number of surprising facts. In particular, under the noise model of Tsybakov (2004), the minimax label complexity of active learning with a VC class is always asymptotically smaller than that of passive l…

    Submitted 3 October, 2014; originally announced October 2014.

  20. arXiv:1207.3772  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Surrogate Losses in Passive and Active Learning

    Authors: Steve Hanneke, Liu Yang

    Abstract: Active learning is a type of sequential design for supervised machine learning, in which the learning algorithm sequentially requests the labels of selected instances from a large pool of unlabeled data points. The objective is to produce a classifier of relatively low risk, as measured under the 0-1 loss, ideally using fewer label requests than the number of random labeled data points sufficient…

    Submitted 13 November, 2019; v1 submitted 16 July, 2012; originally announced July 2012.

    Journal ref: Electronic Journal of Statistics, Volume 13, Number 2 (2019), 4646-4708

  21. arXiv:1108.1766  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Activized Learning: Transforming Passive to Active with Improved Label Complexity

    Authors: Steve Hanneke

    Abstract: We study the theoretical advantages of active learning over passive learning. Specifically, we prove that, in noise-free classifier learning for VC classes, any passive learning algorithm can be transformed into an active learning algorithm with asymptotically strictly superior label complexity for all nontrivial target functions and distributions. We further provide a general characterization of…

    Submitted 8 August, 2011; originally announced August 2011.

  22. Rates of convergence in active learning

    Authors: Steve Hanneke

    Abstract: We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the general problem of model selection for active learning with a nested hierarchy of hypothesis classes and propose an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to b…

    Submitted 9 March, 2011; originally announced March 2011.

    Comments: Published at http://dx.doi.org/10.1214/10-AOS843 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS843

    Journal ref: Annals of Statistics 2011, Vol. 39, No. 1, 333-361