-
On the probability of linear separability through intrinsic volumes
Authors:
Felix Kuchelmeister
Abstract:
A dataset with two labels is linearly separable if it can be split into its two classes with a hyperplane. This inflicts a curse on some statistical tools (such as logistic regression) but forms a blessing for others (e.g. support vector machines). Recently, the following question has regained interest: What is the probability that the data are linearly separable?
We provide a formula for the probability of linear separability for Gaussian features and labels depending only on one marginal of the features (as in generalized linear models). In this setting, we derive an upper bound that complements the recent result by Hayakawa, Lyons, and Oberhauser [2023], and a sharp upper bound for sign-flip noise.
To prove our results, we exploit that this probability can be expressed as a sum of the intrinsic volumes of a polyhedral cone of the form $\text{span}\{v\}\oplus[0,\infty)^n$, as shown in Candès and Sur [2020]. After providing the inequality description for this cone, and an algorithm to project onto it, we calculate its intrinsic volumes. In doing so, we encounter Youden's demon problem, for which we provide a formula following Kabluchko and Zaporozhets [2020]. The key insight of this work is the following: The number of correctly labeled observations in the data affects the structure of this polyhedral cone, allowing the translation of insights from geometry into statistics.
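As a rough illustration of the question posed in this abstract (not the paper's intrinsic-volume formula), the probability of linear separability can be approximated by simulation. The sketch below assumes a homogeneous hyperplane (through the origin), Gaussian features, and labels given by the sign of the first coordinate with sign-flip noise. The function names, the perceptron search, and the epoch budget are all illustrative choices, not taken from the paper. Because the perceptron is stopped after a fixed budget, the result is only a lower estimate of the true probability.

```python
import random

def find_separator(X, y, max_epochs=1000):
    """Perceptron search for w with y_i * <w, x_i> > 0 for all i.

    Returns w if one is found within max_epochs passes, else None.
    None means "no separator found within the budget"; it is not a
    proof that the data are non-separable.
    """
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin <= 0:
                # Standard perceptron update on a misclassified point.
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                mistakes += 1
        if mistakes == 0:
            return w
    return None

def sample_dataset(n, p, flip_prob, rng):
    """Gaussian features; label = sign of first coordinate, flipped w.p. flip_prob."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = []
    for xi in X:
        label = 1 if xi[0] > 0 else -1
        if rng.random() < flip_prob:
            label = -label
        y.append(label)
    return X, y

def estimate_separability(n, p, flip_prob, trials=200, seed=0):
    """Fraction of simulated datasets for which a separator was found."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        X, y = sample_dataset(n, p, flip_prob, rng)
        if find_separator(X, y) is not None:
            hits += 1
    return hits / trials
```

For example, `estimate_separability(n=20, p=5, flip_prob=0.1)` returns a number between 0 and 1. An exact feasibility check (for instance via linear programming) would remove the dependence on the epoch budget.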
Submitted 10 October, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
-
Finite sample rates for logistic regression with small noise or few samples
Authors:
Felix Kuchelmeister,
Sara van de Geer
Abstract:
The logistic regression estimator is known to inflate the magnitude of its coefficients if the sample size $n$ is small, the dimension $p$ is (moderately) large or the signal-to-noise ratio $1/σ$ is large (probabilities of observing a label are close to 0 or 1). With this in mind, we study the logistic regression estimator with $p\ll n/\log n$, assuming Gaussian covariates and labels generated by the Gaussian link function, with a mild optimization constraint on the estimator's length to ensure existence. We provide finite sample guarantees for its direction, which serves as a classifier, and its Euclidean norm, which is an estimator for the signal-to-noise ratio. We distinguish between two regimes. In the low-noise/small-sample regime ($σ\lesssim (p\log n)/n$), we show that the estimator's direction (and consequently the classification error) achieves the rate $(p\log n)/n$, which is, up to the log term, the rate one would obtain if the problem were noiseless. In this case, the norm of the estimator is at least of order $n/(p\log n)$. If instead $(p\log n)/n\lesssim σ\lesssim 1$, the estimator's direction achieves the rate $\sqrt{σp\log n/n}$, whereas its norm converges to the true norm at the rate $\sqrt{p\log n/(nσ^3)}$. As a corollary, the data are not linearly separable with high probability in this regime. In either regime, logistic regression provides a competitive classifier.
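The coefficient inflation mentioned in this abstract can be seen in a toy setting. The sketch below is an illustration only, not the estimator or constraint studied in the paper: it runs plain gradient descent on the unconstrained logistic loss for a small, linearly separable one-dimensional dataset. The data, step size, and function names are assumptions made for the example. On separable data the loss has no finite minimizer, so the magnitude of the coefficient keeps growing with the number of steps.

```python
import math

def logistic_loss_grad(w, xs, ys):
    """Gradient in w of sum_i log(1 + exp(-y_i * x_i * w)) for scalar features."""
    return -sum(y * x / (1.0 + math.exp(y * x * w)) for x, y in zip(xs, ys))

def gradient_descent_norms(xs, ys, steps, lr=0.1):
    """Run gradient descent from w = 0 and record |w| after every step."""
    w = 0.0
    norms = []
    for _ in range(steps):
        w -= lr * logistic_loss_grad(w, xs, ys)
        norms.append(abs(w))
    return norms

# Toy separable data (hypothetical): the label equals the sign of the feature.
xs = [1.0, 2.0, -1.0, -2.0]
ys = [1, 1, -1, -1]

norms = gradient_descent_norms(xs, ys, steps=2000)
print(norms[99], norms[999], norms[1999])
```

The three printed values increase, which is one way to see why some bound on the estimator's length is needed for the estimator to exist when the data happen to be separable.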
Submitted 29 February, 2024; v1 submitted 25 May, 2023;
originally announced May 2023.
-
AdaBoost and robust one-bit compressed sensing
Authors:
Geoffrey Chinot,
Felix Kuchelmeister,
Matthias Löffler,
Sara van de Geer
Abstract:
This paper studies binary classification in robust one-bit compressed sensing with adversarial errors. It is assumed that the model is overparameterized and that the parameter of interest is effectively sparse. AdaBoost is considered, and, through its relation to the max-$\ell_1$-margin-classifier, prediction error bounds are derived. The developed theory is general and allows for heavy-tailed feature distributions, requiring only a weak moment assumption and an anti-concentration condition. Improved convergence rates are shown when the features satisfy a small deviation lower bound. In particular, the results provide an explanation why interpolating adversarial noise can be harmless for classification problems. Simulations illustrate the presented theory.
Submitted 8 December, 2021; v1 submitted 5 May, 2021;
originally announced May 2021.