
Showing 1–18 of 18 results for author: Arlot, S

  1. arXiv:2405.12567  [pdf, other]

    math.ST stat.ML

    Marginal and training-conditional guarantees in one-shot federated conformal prediction

    Authors: Pierre Humbert, Batiste Le Bars, Aurélien Bellet, Sylvain Arlot

    Abstract: We study conformal prediction in the one-shot federated learning setting. The main goal is to compute marginally and training-conditionally valid prediction sets, at the server-level, in only one round of communication between the agents and the server. Using the quantile-of-quantiles family of estimators and split conformal prediction, we introduce a collection of computationally-efficient and di…

    Submitted 21 May, 2024; originally announced May 2024.

  2. arXiv:2302.06322  [pdf, other]

    stat.ML

    One-Shot Federated Conformal Prediction

    Authors: Pierre Humbert, Batiste Le Bars, Aurélien Bellet, Sylvain Arlot

    Abstract: In this paper, we introduce a conformal prediction method to construct prediction sets in a one-shot federated learning setting. More specifically, we define a quantile-of-quantiles estimator and prove that for any distribution, it is possible to output prediction sets with desired coverage in only one round of communication. To mitigate privacy issues, we also describe a locally differentially pri…

    Submitted 31 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  3. arXiv:2205.14613  [pdf, other]

    stat.ML cs.LG math.ST

    A Conditional Randomization Test for Sparse Logistic Regression in High-Dimension

    Authors: Binh T. Nguyen, Bertrand Thirion, Sylvain Arlot

    Abstract: Identifying the relevant variables for a classification model with correct confidence levels is a central but difficult task in high-dimension. Despite the core role of sparse logistic regression in statistics and machine learning, it still lacks a good solution for accurate inference in the regime where the number of features $p$ is as large as or larger than the number of samples $n$. Here, we t…

    Submitted 29 May, 2022; originally announced May 2022.

  4. arXiv:2011.11117  [pdf, other]

    stat.ML cs.LG

    Online Orthogonal Matching Pursuit

    Authors: El Mehdi Saad, Gilles Blanchard, Sylvain Arlot

    Abstract: Greedy algorithms for feature selection are widely used for recovering sparse high-dimensional vectors in linear models. In classical procedures, the main emphasis was put on the sample complexity, with little or no consideration of the computation resources required. We present a novel online algorithm: Online Orthogonal Matching Pursuit (OOMP) for online support recovery in the random design set…

    Submitted 10 February, 2021; v1 submitted 22 November, 2020; originally announced November 2020.

  5. arXiv:2002.09269  [pdf, other]

    math.ST stat.AP stat.ME stat.ML

    Aggregation of Multiple Knockoffs

    Authors: Tuan-Binh Nguyen, Jérôme-Alexis Chevalier, Bertrand Thirion, Sylvain Arlot

    Abstract: We develop an extension of the Knockoff Inference procedure, introduced by Barber and Candes (2015). This new method, called Aggregation of Multiple Knockoffs (AKO), addresses the instability inherent to the random nature of Knockoff-based inference. Specifically, AKO improves both the stability and power compared with the original Knockoff algorithm while still maintaining guarantees for False Di…

    Submitted 25 June, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: Accepted to ICML 2020 (Thirty-seventh International Conference on Machine Learning). This version includes both the main text of the conference paper and supplementary materials (as appendices). 35 pages, 7 figures

  6. arXiv:1909.13499  [pdf, ps, other]

    math.ST stat.ME stat.ML

    Rejoinder on: Minimal penalties and the slope heuristics: a survey

    Authors: Sylvain Arlot

    Abstract: This text is the rejoinder following the discussion of a survey paper about minimal penalties and the slope heuristics (Arlot, 2019. Minimal penalties and the slope heuristics: a survey. Journal de la SFDS). While commenting on the remarks made by the discussants, it provides two new results about the slope heuristics for model selection among a collection of projection estimators in least-squares…

    Submitted 30 September, 2019; originally announced September 2019.

    Journal ref: Journal de la Société Française de Statistique, Société Française de Statistique et Société Mathématique de France, Vol 106, No.3, 158-168. 2019

  7. arXiv:1909.04890  [pdf, ps, other]

    math.ST stat.ME stat.ML

    Aggregated Hold-Out

    Authors: Guillaume Maillard, Sylvain Arlot, Matthieu Lerasle

    Abstract: Aggregated hold-out (Agghoo) is a method which averages learning rules selected by hold-out (that is, cross-validation with a single split). We provide the first theoretical guarantees on Agghoo, ensuring that it can be used safely: Agghoo performs at worst like the hold-out when the risk is convex. The same holds true in classification with the 0-1 risk, with an additional constant factor. For th…

    Submitted 11 September, 2019; originally announced September 2019.

  8. arXiv:1901.07277  [pdf, other]

    math.ST stat.ME stat.ML

    Minimal penalties and the slope heuristics: a survey

    Authors: Sylvain Arlot

    Abstract: Birgé and Massart proposed in 2001 the slope heuristics as a way to choose optimally from data an unknown multiplicative constant in front of a penalty. It is built upon the notion of minimal penalty, and it has been generalized since to some "minimal-penalty algorithms". This paper reviews the theoretical results obtained for such algorithms, with a self-contained proof in the simplest framewor…

    Submitted 25 October, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

    Journal ref: Journal de la Société Française de Statistique, Société Française de Statistique et Société Mathématique de France, 2019, Minimal penalties and the slope heuristics: a survey, 160 (3), pp.1-106

  9. arXiv:1703.03167  [pdf, other]

    math.ST stat.ML

    Cross-validation

    Authors: Sylvain Arlot

    Abstract: This text is a survey on cross-validation. We define all classical cross-validation procedures, and we study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given family. For the risk estimation problem, we compute the bias (which can also be corrected) and the variance of cross-validation methods. For estimator selection…

    Submitted 9 March, 2017; originally announced March 2017.

    Comments: in French

  10. arXiv:1604.01515  [pdf, other]

    math.ST stat.ME stat.ML

    Comments on: "A Random Forest Guided Tour" by G. Biau and E. Scornet

    Authors: Sylvain Arlot, Robin Genuer

    Abstract: This paper is a comment on the survey paper by Biau and Scornet (2016) about random forests. We focus on the problem of quantifying the impact of each ingredient of random forests on their performance. We show that such a quantification is possible for a simple pure forest, leading to conclusions that could apply more generally. Then, we consider "hold-out" random forests, which are a good middle…

    Submitted 6 April, 2016; originally announced April 2016.

  11. arXiv:1407.3939  [pdf, other]

    math.ST cs.LG stat.ME

    Analysis of purely random forests bias

    Authors: Sylvain Arlot, Robin Genuer

    Abstract: Random forests are a very effective and commonly used statistical method, but their full theoretical analysis is still an open problem. As a first step, simplified models such as purely random forests have been introduced, in order to shed light on the good performance of random forests. In this paper, we study the approximation error (the bias) of some purely random forest models in a regression…

    Submitted 15 July, 2014; originally announced July 2014.

  12. arXiv:1303.1280  [pdf, other]

    cs.LG stat.ML

    Large-Margin Metric Learning for Partitioning Problems

    Authors: Rémi Lajugie, Sylvain Arlot, Francis Bach

    Abstract: In this paper, we consider unsupervised partitioning problems, such as clustering, image segmentation, video segmentation and other change-point detection problems. We focus on partitioning problems based explicitly or implicitly on the minimization of Euclidean distortions, which include mean-based change-point detection, K-means, spectral clustering and normalized cuts. Our main goal is to learn…

    Submitted 6 March, 2013; originally announced March 2013.

  13. arXiv:0909.1884  [pdf, ps, other]

    math.ST stat.ME stat.ML

    Data-driven calibration of linear estimators with minimal penalties

    Authors: Sylvain Arlot, Francis Bach

    Abstract: This paper tackles the problem of selecting among several linear estimators in non-parametric regression; this includes model selection for linear regression, the choice of a regularization parameter in kernel ridge regression, spline smoothing or locally weighted regression, and the choice of a kernel in multiple kernel learning. We propose a new algorithm which first estimates consistently the v…

    Submitted 13 September, 2011; v1 submitted 10 September, 2009; originally announced September 2009.

    Comments: Advances in Neural Information Processing Systems (NIPS 2009), Vancouver : Canada (2009)

  14. arXiv:0907.4728  [pdf, ps, other]

    math.ST stat.AP stat.ME stat.ML

    A survey of cross-validation procedures for model selection

    Authors: Sylvain Arlot, Alain Celisse

    Abstract: Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguis…

    Submitted 27 July, 2009; originally announced July 2009.

    MSC Class: 62G08; 62G05; 62G09

    Journal ref: Statistics Surveys 4 (2010) 40--79

  15. Segmentation of the mean of heteroscedastic data via cross-validation

    Authors: Sylvain Arlot, Alain Celisse

    Abstract: This paper tackles the problem of detecting abrupt changes in the mean of a heteroscedastic signal by model selection, without knowledge on the variations of the noise. A new family of change-point detection procedures is proposed, showing that cross-validation methods can be successful in the heteroscedastic framework, whereas most existing procedures are not robust to heteroscedasticity. The r…

    Submitted 8 April, 2009; v1 submitted 23 February, 2009; originally announced February 2009.

    Journal ref: Statistics and Computing (2009) electronic

  16. arXiv:0804.2937  [pdf, ps, other]

    math.ST stat.ML

    Margin-adaptive model selection in statistical learning

    Authors: Sylvain Arlot, Peter L. Bartlett

    Abstract: A classical condition for fast learning rates is the margin condition, first introduced by Mammen and Tsybakov. We tackle in this paper the problem of adaptivity to this condition in the context of model selection, in a general learning framework. Actually, we consider a weaker version of this condition that allows one to take into account that learning within a small model can be much easier than…

    Submitted 22 April, 2011; v1 submitted 18 April, 2008; originally announced April 2008.

    Comments: Published at http://dx.doi.org/10.3150/10-BEJ288 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

    Report number: IMS-BEJ-BEJ288

    Journal ref: Bernoulli 17, 2 (2011) 687-713

  17. arXiv:0802.0837  [pdf, ps, other]

    math.ST stat.ME

    Data-driven calibration of penalties for least-squares regression

    Authors: Sylvain Arlot, Pascal Massart

    Abstract: Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from the data. We propose a completely data-driven calibration algorithm for this parameter in the least-squares regression framework, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently in…

    Submitted 17 December, 2008; v1 submitted 6 February, 2008; originally announced February 2008.

    MSC Class: 62G05 (Primary) 62J05 (Secondary)

    Journal ref: Journal of Machine Learning Research 10 (2009) 245-279

  18. arXiv:0802.0566  [pdf, ps, other]

    math.ST stat.ML

    V-fold cross-validation improved: V-fold penalization

    Authors: Sylvain Arlot

    Abstract: We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call "V-fold penalization". Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for model selection, because it "overpenalizes" all the more that V is large. Hence, asymptotic opt…

    Submitted 7 February, 2008; v1 submitted 5 February, 2008; originally announced February 2008.

    Comments: 40 pages, plus a separate technical appendix

    MSC Class: 62G09 (Primary); 62G08; 62M20 (Secondary)