Showing 1–25 of 25 results for author: Roquain, E

  1. arXiv:2508.10336  [pdf, ps, other]

    math.ST stat.ML

    Online selective conformal inference: adaptive scores, convergence rate and optimality

    Authors: Pierre Humbert, Ulysse Gazin, Ruth Heller, Etienne Roquain

    Abstract: In a supervised online setting, quantifying uncertainty has been proposed in the seminal work of Gibbs and Candès (2021). For any given point-prediction algorithm, their method (ACI) produces a conformal prediction set with an average missed coverage getting close to a pre-specified level $α$ over a long time horizon. We introduce an extended version of this algorithm, called OnlineSCI, allowing t…

    Submitted 14 August, 2025; originally announced August 2025.

  2. Selecting informative conformal prediction sets with false coverage rate control

    Authors: Ulysse Gazin, Ruth Heller, Ariane Marandon, Etienne Roquain

    Abstract: In supervised learning, including regression and classification, conformal methods provide prediction sets for the outcome/label with finite sample coverage for any machine learning predictor. We consider here the case where such prediction sets come after a selection process. The selection process requires that the selected prediction sets be 'informative' in a well-defined sense. We consider bot…

    Submitted 25 November, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 50 pages, 15 figures, 1 table. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2025

  3. arXiv:2306.07819  [pdf, other]

    math.ST stat.ME

    False discovery proportion envelopes with m-consistency

    Authors: Iqraa Meah, Gilles Blanchard, Etienne Roquain

    Abstract: We provide new non-asymptotic false discovery proportion (FDP) confidence envelopes in several multiple testing settings relevant for modern high-dimensional data methods. We revisit the multiple testing scenarios considered in the recent work of Katsevich and Ramdas (2020): top-$k$, preordered (including knockoffs), online. Our emphasis is on obtaining FDP confidence bounds that both have non-asy…

    Submitted 17 September, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

  4. arXiv:2208.06685  [pdf, other]

    stat.ME math.ST stat.ML

    Adaptive novelty detection with false discovery rate guarantee

    Authors: Ariane Marandon, Lihua Lei, David Mary, Etienne Roquain

    Abstract: This paper studies the semi-supervised novelty detection problem where a set of "typical" measurements is available to the researcher. Motivated by recent advances in multiple testing and conformal inference, we propose AdaDetect, a flexible method that is able to wrap around any probabilistic classification algorithm and control the false discovery rate (FDR) on detected novelties in finite sampl…

    Submitted 25 October, 2023; v1 submitted 13 August, 2022; originally announced August 2022.

  5. arXiv:2203.02597  [pdf, other]

    math.ST stat.ML

    False membership rate control in mixture models

    Authors: Ariane Marandon, Tabea Rebafka, Etienne Roquain, Nataliya Sokolovska

    Abstract: The clustering task consists in partitioning elements of a sample into homogeneous groups. Most datasets contain individuals that are ambiguous and intrinsically difficult to attribute to one or another cluster. However, in practical applications, misclassifying individuals is potentially disastrous and should be avoided. To keep the misclassification rate small, one can decide to classify only a…

    Submitted 25 October, 2023; v1 submitted 4 March, 2022; originally announced March 2022.

  6. arXiv:2109.13601  [pdf, other]

    math.ST

    Sharp multiple testing boundary for sparse sequences

    Authors: Kweku Abraham, Ismael Castillo, Etienne Roquain

    Abstract: This work investigates multiple testing by considering minimax separation rates in the sparse sequence model, when the testing risk is measured as the sum FDR+FNR (False Discovery Rate plus False Negative Rate). First using the popular beta-min separation condition, with all nonzero signals separated from $0$ by at least some amount, we determine the sharp minimax testing risk asymptotically and t…

    Submitted 30 August, 2023; v1 submitted 28 September, 2021; originally announced September 2021.

    Comments: Revision extending the noise models permitted and allowing for "strong signal" settings. 33 pages (main body) or 86 (including supplement). 4 figures

    MSC Class: 62G10 (primary); 62C20 (secondary)

  7. arXiv:2106.13501  [pdf, other]

    math.ST stat.ME

    Semi-supervised multiple testing

    Authors: David Mary, Etienne Roquain

    Abstract: An important limitation of standard multiple testing procedures is that the null distribution should be known. Here, we consider a null distribution-free approach for multiple testing in the following semi-supervised setting: the user does not know the null distribution, but has at hand a sample drawn from this null distribution. In practical situations, this null training sample (NTS) can come…

    Submitted 24 November, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

  8. arXiv:2102.00929  [pdf, ps, other]

    math.ST

    Empirical Bayes cumulative $\ell$-value multiple testing procedure for sparse sequences

    Authors: Kweku Abraham, Ismael Castillo, Etienne Roquain

    Abstract: In the sparse sequence model, we consider a popular Bayesian multiple testing procedure and investigate for the first time its behaviour from the frequentist point of view. Given a spike-and-slab prior on the high-dimensional sparse unknown parameter, one can easily compute posterior probabilities of coming from the spike, which correspond to the well known local-fdr values, also called $\ell$-val…

    Submitted 28 March, 2022; v1 submitted 1 February, 2021; originally announced February 2021.

    MSC Class: 62G10 (Primary) 62C12 (Secondary)

  9. arXiv:1912.03109  [pdf, other]

    math.ST

    False discovery rate control with unknown null distribution: is it possible to mimic the oracle?

    Authors: Etienne Roquain, Nicolas Verzelen

    Abstract: Classical multiple testing theory prescribes the null distribution, which is often too stringent an assumption for today's large-scale experiments. This paper presents theoretical foundations to understand the limitations caused by ignoring the null distribution, and how it can be properly learned from the (same) data-set, when possible. We explore this issue in the case where the null distributio…

    Submitted 21 December, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

  10. arXiv:1910.11575  [pdf, other]

    math.ST stat.ME

    On agnostic post hoc approaches to false positive control

    Authors: Gilles Blanchard, Pierre Neuvial, Etienne Roquain

    Abstract: This document is a book chapter which gives a partial survey on post hoc approaches to false positive control.

    Submitted 25 October, 2019; originally announced October 2019.

  11. arXiv:1907.10176  [pdf, other]

    math.ST stat.ME

    Graph inference with clustering and false discovery rate control

    Authors: Tabea Rebafka, Etienne Roquain, Fanny Villers

    Abstract: In this paper, a noisy version of the stochastic block model (NSBM) is introduced and we investigate the following three statistical inferences in this model: estimation of the model parameters, clustering of the nodes and identification of the underlying graph. While the first two inferences are done by using a variational expectation-maximization (VEM) algorithm, the graph inference is done by c…

    Submitted 23 July, 2019; originally announced July 2019.

  12. arXiv:1809.08330  [pdf, other]

    math.ST stat.ME

    Estimating minimum effect with outlier selection

    Authors: Alexandra Carpentier, Sylvain Delattre, Etienne Roquain, Nicolas Verzelen

    Abstract: We introduce one-sided versions of Huber's contamination model, in which corrupted samples tend to take larger values than uncorrupted ones. Two intertwined problems are addressed: estimation of the mean of uncorrupted samples (minimum effect) and selection of corrupted samples (outliers). Regarding the minimum effect estimation, we derive the minimax risks and introduce adaptive estimators to the…

    Submitted 21 September, 2018; originally announced September 2018.

    Comments: 70 pages; 7 figures

  13. arXiv:1808.09748  [pdf, other]

    math.ST

    On spike and slab empirical Bayes multiple testing

    Authors: Ismael Castillo, Etienne Roquain

    Abstract: This paper explores a connection between empirical Bayes posterior distributions and false discovery rate (FDR) control. In the Gaussian sequence model, this work shows that empirical Bayes-calibrated spike and slab posterior distributions allow a correct FDR control under sparsity. Doing so, it offers a frequentist theoretical validation of empirical Bayes methods in the context of multiple testi…

    Submitted 15 June, 2019; v1 submitted 29 August, 2018; originally announced August 2018.

    Comments: 83 pages, 7 figures

  14. arXiv:1807.01470  [pdf, other]

    math.ST

    Post hoc false positive control for spatially structured hypotheses

    Authors: Guillermo Durand, Gilles Blanchard, Pierre Neuvial, Etienne Roquain

    Abstract: In a high-dimensional multiple testing framework, we present new confidence bounds on the false positives contained in subsets S of selected null hypotheses. The coverage probability holds simultaneously over all subsets S, which means that the obtained confidence bounds are post hoc. Therefore, S can be chosen arbitrarily, possibly by using the data set several times. We focus in this paper speci…

    Submitted 19 September, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

  15. arXiv:1706.08250  [pdf, other]

    math.ST

    Improving the Benjamini-Hochberg Procedure for Discrete Tests

    Authors: Sebastian Döhler, Guillermo Durand, Etienne Roquain

    Abstract: To find interesting items in genome-wide association studies or next generation sequencing data, a crucial point is to design powerful false discovery rate (FDR) controlling procedures that suitably combine discrete tests (typically binomial or Fisher tests). In particular, recent research has been striving for appropriate modifications of the classical Benjamini-Hochberg (BH) step-up procedure t…

    Submitted 15 September, 2017; v1 submitted 26 June, 2017; originally announced June 2017.

  16. Post hoc inference via joint family-wise error rate control

    Authors: Gilles Blanchard, Pierre Neuvial, Etienne Roquain

    Abstract: We introduce a general methodology for post hoc inference in a large-scale multiple testing framework. The approach is called "user-agnostic" in the sense that the statistical guarantee on the number of correct rejections holds for any set of candidate items selected by the user (after having seen the data). This task is investigated by defining a suitable criterion, named the joint-family-wise-er…

    Submitted 8 January, 2018; v1 submitted 7 March, 2017; originally announced March 2017.

  17. New procedures controlling the false discovery proportion via Romano-Wolf's heuristic

    Authors: Sylvain Delattre, Etienne Roquain

    Abstract: The false discovery proportion (FDP) is a convenient way to account for false positives when a large number $m$ of tests are performed simultaneously. Romano and Wolf [Ann. Statist. 35 (2007) 1378-1408] have proposed a general principle that builds FDP controlling procedures from $k$-family-wise error rate controlling procedures while incorporating dependencies in an appropriate manner; see Korn e…

    Submitted 5 June, 2015; v1 submitted 16 November, 2013; originally announced November 2013.

    Journal ref: Annals of Statistics, Institute of Mathematical Statistics (IMS), 2015, 43 (3), pp.1141-1177

  18. arXiv:1210.2489  [pdf, other]

    math.ST

    On empirical distribution function of high-dimensional Gaussian vector components with an application to multiple testing

    Authors: Sylvain Delattre, Etienne Roquain

    Abstract: This paper introduces a new framework to study the asymptotic behavior of the empirical distribution function (e.d.f.) of Gaussian vector components, whose correlation matrix $Γ^{(m)}$ is dimension-dependent. Hence, by contrast with the existing literature, the vector is not assumed to be stationary. Rather, we make a "vanishing second order" assumption ensuring that the covariance matrix…

    Submitted 4 May, 2013; v1 submitted 9 October, 2012; originally announced October 2012.

  19. arXiv:1007.1298  [pdf, ps, other]

    math.ST

    On the false discovery proportion convergence under Gaussian equi-correlation

    Authors: Sylvain Delattre, Etienne Roquain

    Abstract: We study the convergence of the false discovery proportion (FDP) of the Benjamini-Hochberg procedure in the Gaussian equi-correlated model, when the correlation $ρ_m$ converges to zero as the hypothesis number $m$ grows to infinity. By contrast with the standard convergence rate $m^{1/2}$ holding under independence, this study shows that the FDP converges to the false discovery rate (FDR) at rate…

    Submitted 8 July, 2010; originally announced July 2010.

  20. Exact calculations for false discovery proportion with application to least favorable configurations

    Authors: Etienne Roquain, Fanny Villers

    Abstract: In a context of multiple hypothesis testing, we provide several new exact calculations related to the false discovery proportion (FDP) of step-up and step-down procedures. For step-up procedures, we show that the number of erroneous rejections conditionally on the rejection number is simply a binomial variable, which leads to explicit computations of the c.d.f., the $s$-th moment and the mean of…

    Submitted 26 May, 2010; v1 submitted 15 February, 2010; originally announced February 2010.

    Journal ref: The Annals of Statistics 39, 1 (2011) 584-612

  21. Optimal weighting for false discovery rate control

    Authors: Etienne Roquain, Mark Van De Wiel

    Abstract: How to weigh the Benjamini-Hochberg procedure? In the context of multiple hypothesis testing, we propose a new step-wise procedure that controls the false discovery rate (FDR) and we prove it to be more powerful than any weighted Benjamini-Hochberg procedure. Both finite-sample and asymptotic results are presented. Moreover, we illustrate good performance of our procedure in simulations and a ge…

    Submitted 13 July, 2009; v1 submitted 25 July, 2008; originally announced July 2008.

    MSC Class: 62J15 ; 62G10

    Journal ref: Electronic Journal of Statistics 3 (2009) 678-711

  22. Two simple sufficient conditions for FDR control

    Authors: Gilles Blanchard, Etienne Roquain

    Abstract: We show that the control of the false discovery rate (FDR) for a multiple testing procedure is implied by two coupled simple sufficient conditions. The first one, which we call "self-consistency condition", concerns the algorithm itself, and the second, called "dependency control condition", is related to the dependency assumptions on the $p$-value family. Many standard multiple testing proce…

    Submitted 21 October, 2008; v1 submitted 11 February, 2008; originally announced February 2008.

    Comments: Published in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org), available at http://dx.doi.org/10.1214/08-EJS180

    MSC Class: 62J15; 62G10

    Journal ref: Electronic Journal of Statistics 2 (2008) 963-992

  23. Some nonasymptotic results on resampling in high dimension, I: Confidence regions, II: Multiple tests

    Authors: Sylvain Arlot, Gilles Blanchard, Etienne Roquain

    Abstract: We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is supposed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality of the vector can possibly be much larger than the number of observations and we focus on a nonasymptotic control of the confidence level, f…

    Submitted 11 January, 2010; v1 submitted 5 December, 2007; originally announced December 2007.

    Comments: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org), available at http://dx.doi.org/10.1214/08-AOS667 and http://dx.doi.org/10.1214/08-AOS668

    Report number: IMS-AOS-AOS667; IMS-AOS-AOS668

    MSC Class: 62G15 (Primary) 62G09 (Secondary); 62G10 (Primary) 62G09 (Secondary)

    Journal ref: The Annals of Statistics 38, 1 (2010) 51-99

  24. arXiv:0707.0536  [pdf, ps, other]

    math.ST

    Adaptive FDR control under independence and dependence

    Authors: Gilles Blanchard, Etienne Roquain

    Abstract: In the context of multiple hypotheses testing, the proportion $π_0$ of true null hypotheses in the pool of hypotheses to test often plays a crucial role, although it is generally unknown a priori. A testing procedure using an implicit or explicit estimate of this quantity in order to improve its efficiency is called adaptive. In this paper, we focus on the issue of False Discovery Rate (FDR) cont…

    Submitted 17 February, 2009; v1 submitted 4 July, 2007; originally announced July 2007.

    MSC Class: 62G10; 62H15

  25. Resampling-based confidence regions and multiple tests for a correlated random vector

    Authors: Sylvain Arlot, Gilles Blanchard, Etienne Roquain

    Abstract: We derive non-asymptotic confidence regions for the mean of a random vector whose coordinates have an unknown dependence structure. The random vector is supposed to be either Gaussian or to have a symmetric bounded distribution, and we observe $n$ i.i.d. copies of it. The confidence regions are built using a data-dependent threshold based on a weighted bootstrap procedure. We consider two approac…

    Submitted 22 January, 2007; originally announced January 2007.

    Comments: submitted to COLT

    MSC Class: 62G09 ; 62H15

    Journal ref: Learning Theory 20th Annual Conference on Learning Theory, COLT 2007, San Diego, CA, USA; June 13-15, 2007. Proceedings, Springer Berlin / Heidelberg (Ed.) (2007) 127-141