Data Selection for ERMs
Authors:
Steve Hanneke,
Shay Moran,
Alexander Shlimovich,
Amir Yehudayoff
Abstract:
Learning theory has traditionally followed a model-centric approach, focusing on designing optimal algorithms for a fixed natural learning task (e.g., linear classification or regression). In this paper, we adopt a complementary data-centric perspective, whereby we fix a natural learning rule and focus on optimizing the training data. Specifically, we study the following question: given a learning rule $\mathcal{A}$ and a data selection budget $n$, how well can $\mathcal{A}$ perform when trained on at most $n$ data points selected from a population of $N$ points? We investigate when it is possible to select $n \ll N$ points and achieve performance comparable to training on the entire population.
We address this question across a variety of empirical risk minimizers. Our results include optimal data-selection bounds for mean estimation, linear classification, and linear regression. Additionally, we establish two general results: a taxonomy of error rates in binary classification and in stochastic convex optimization. Finally, we propose several open questions and directions for future research.
Submitted 25 April, 2025; v1 submitted 20 April, 2025;
originally announced April 2025.
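As a toy illustration of the data-selection question in the abstract above (our own sketch, not a construction from the paper), take the learning rule $\mathcal{A}$ to be the ERM for squared loss on real numbers, i.e. the sample mean, and search by brute force over all subsets of at most $n$ points. The names `population_risk` and `best_selection`, and the population itself, are hypothetical and chosen only for this example; exhaustive search is feasible only for very small $N$.

```python
from itertools import combinations
from statistics import fmean


def population_risk(h, population):
    """Average squared loss of the predictor h over the whole population."""
    return fmean((x - h) ** 2 for x in population)


def best_selection(population, n):
    """Brute-force search over all subsets of size 1..n.

    The learning rule is the squared-loss ERM (the sample mean); we keep the
    subset whose trained output has the smallest population risk.
    """
    best_subset, best_risk = None, float("inf")
    for k in range(1, n + 1):
        for subset in combinations(population, k):
            risk = population_risk(fmean(subset), population)
            if risk < best_risk:
                best_subset, best_risk = subset, risk
    return best_subset, best_risk


# Hypothetical population of N = 7 points and a budget of n = 1.
population = [0, 1, 2, 3, 4, 5, 6]
full_risk = population_risk(fmean(population), population)
subset, risk = best_selection(population, n=1)
excess = risk - full_risk  # gap to training on the entire population
```

In this particular population a single well-chosen point (the one equal to the population mean) already matches training on all $N$ points, so the excess risk is zero. This is a best case picked for clarity and says nothing about the general rates established in the paper.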
Intuitive norms are Euclidean
Authors:
Shay Moran,
Alexander Shlimovich,
Amir Yehudayoff
Abstract:
We call a norm on $\mathbb{R}^n$ intuitive if, for all points $p_1,\ldots,p_m$ in $\mathbb{R}^n$, at least one geometric median of the points with respect to the norm lies in their convex hull. We characterize all intuitive norms.
Submitted 7 January, 2025; v1 submitted 5 January, 2025;
originally announced January 2025.
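To make the definition of an intuitive norm concrete, here is a small sketch (our own example, not taken from the paper) showing that the $\ell_1$ norm on $\mathbb{R}^3$ fails it. It relies on the standard fact that an $\ell_1$ geometric median can be computed coordinate by coordinate as a median, and that this minimizer is unique when the number of points is odd. The function names are hypothetical, and the hull test is specific to the three standard basis vectors used here.

```python
from statistics import median


def l1_distance(x, p):
    """l1 distance between two points given as tuples."""
    return sum(abs(a - b) for a, b in zip(x, p))


def total_cost(x, points):
    """Objective minimized by a geometric median: sum of distances to the points."""
    return sum(l1_distance(x, p) for p in points)


def l1_geometric_median(points):
    """Coordinate-wise median; the unique l1 minimizer for an odd number of points."""
    return tuple(median(coords) for coords in zip(*points))


def in_hull_of_basis_vectors(x, tol=1e-12):
    """Membership in conv{e1, e2, e3}: nonnegative coordinates that sum to 1."""
    return all(c >= -tol for c in x) and abs(sum(x) - 1) <= tol


points = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
med = l1_geometric_median(points)        # the origin
centroid = (1 / 3, 1 / 3, 1 / 3)         # a point inside the convex hull

cost_at_median = total_cost(med, points)
cost_at_centroid = total_cost(centroid, points)
median_in_hull = in_hull_of_basis_vectors(med)
```

Here the unique geometric median is the origin with total cost 3, while every point $x$ of the convex hull has total cost $\sum_i 2(1 - x_i) = 4$. The only geometric median therefore lies outside the hull, so $\ell_1$ on $\mathbb{R}^3$ is not intuitive in the sense defined above.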