Showing 1–50 of 52 results for author: Barber, R F

  1. arXiv:2506.03599  [pdf, ps, other]

    stat.ME math.ST

    Mosaic inference on panel data

    Authors: Asher Spector, Rina Foygel Barber, Emmanuel Candès

    Abstract: Analysis of panel data via linear regression is widespread across disciplines. To perform statistical inference, such analyses typically assume that clusters of observations are jointly independent. For example, one might assume that observations in New York are independent of observations in New Jersey. Are such assumptions plausible? Might there be hidden dependencies between nearby clusters? Th…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 38 pages, 7 figures

  2. arXiv:2506.02257  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    Assumption-free stability for ranking problems

    Authors: Ruiting Liang, Jake A. Soloff, Rina Foygel Barber, Rebecca Willett

    Abstract: In this work, we consider ranking problems among a finite set of candidates: for instance, selecting the top-$k$ items among a larger list of candidates or obtaining the full ranking of all items in the set. These problems are often unstable, in the sense that estimating a ranking from noisy data can exhibit high sensitivity to small perturbations. Concretely, if we use data to provide a score for…

    Submitted 2 June, 2025; originally announced June 2025.

  3. arXiv:2502.19851  [pdf, other]

    stat.ME stat.ML

    Can a calibration metric be both testable and actionable?

    Authors: Raphael Rossellini, Jake A. Soloff, Rina Foygel Barber, Zhimei Ren, Rebecca Willett

    Abstract: Forecast probabilities often serve as critical inputs for binary decision making. In such settings, calibration -- ensuring forecasted probabilities match empirical frequencies -- is essential. Although the common notion of Expected Calibration Error (ECE) provides actionable insights for decision making, it is not testable: it cannot be empirically estimated in many prac…

    Submitted 27 February, 2025; originally announced February 2025.

  4. arXiv:2502.06765  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Are all models wrong? Fundamental limits in distribution-free empirical model falsification

    Authors: Manuel M. Müller, Yuetian Luo, Rina Foygel Barber

    Abstract: In statistics and machine learning, when we train a fitted model on available data, we typically want to ensure that we are searching within a model class that contains at least one accurate model -- that is, we would like to ensure an upper bound on the model class risk (the lowest possible risk that can be attained by any model in the class). However, it is also of interest to establish lower bo…

    Submitted 5 June, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: 39 pages, 1 figure

  5. arXiv:2501.06133  [pdf, other]

    stat.ME math.ST

    Testing conditional independence under isotonicity

    Authors: Rohan Hore, Jake A. Soloff, Rina Foygel Barber, Richard J. Samworth

    Abstract: We propose a test of the conditional independence of random variables $X$ and $Y$ given $Z$ under the additional assumption that $X$ is stochastically increasing in $Z$. The well-documented hardness of testing conditional independence means that some further restriction on the null hypothesis parameter space is required, but in contrast to existing approaches based on parametric models, smoothness…

    Submitted 10 January, 2025; originally announced January 2025.

    Comments: 69 pages, 5 figures

  6. arXiv:2411.11824  [pdf, ps, other]

    math.ST stat.ME stat.ML

    Theoretical Foundations of Conformal Prediction

    Authors: Anastasios N. Angelopoulos, Rina Foygel Barber, Stephen Bates

    Abstract: This book is about conformal prediction and related inferential techniques that build on permutation tests and exchangeability. These techniques are useful in a diverse array of tasks, including hypothesis testing and providing uncertainty quantification guarantees for machine learning systems. Much of the current interest in conformal prediction is due to its ability to integrate into complex mac…

    Submitted 3 June, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

    Comments: This material will be published by Cambridge University Press as Theoretical Foundations of Conformal Prediction by Anastasios N. Angelopoulos, Rina Foygel Barber, and Stephen Bates. This prepublication version is free to view/download for personal use only. Not for redistribution/resale/use in derivative works. Copyright Anastasios N. Angelopoulos, Rina Foygel Barber, and Stephen Bates, 2025

  7. arXiv:2408.07066  [pdf, ps, other]

    stat.ME

    Conformal prediction after data-dependent model selection

    Authors: Ruiting Liang, Wanrong Zhu, Rina Foygel Barber

    Abstract: Given a family of pretrained models and a hold-out set, how can we construct a valid conformal prediction set while selecting a model that minimizes the width of the set? If we use the same hold-out data set both to select a model (the model that yields the smallest conformal prediction sets) and then to construct a conformal prediction set based on that selected model, we suffer a loss of coverag…

    Submitted 3 July, 2025; v1 submitted 13 August, 2024; originally announced August 2024.

  8. arXiv:2407.06867  [pdf, other]

    stat.ME stat.ML

    Distributionally robust risk evaluation with an isotonic constraint

    Authors: Yu Gui, Rina Foygel Barber, Cong Ma

    Abstract: Statistical learning under distribution shift is challenging when neither prior knowledge nor fully accessible data from the target distribution is available. Distributionally robust learning (DRL) aims to control the worst-case statistical performance within an uncertainty set of candidate distributions, but how to properly specify the set remains challenging. To enable distributional robustness…

    Submitted 18 December, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  9. arXiv:2406.07449  [pdf, other]

    stat.ME stat.ML

    Boosted Conformal Prediction Intervals

    Authors: Ran Xie, Rina Foygel Barber, Emmanuel J. Candès

    Abstract: This paper introduces a boosted conformal procedure designed to tailor conformalized prediction intervals toward specific desired properties, such as enhanced conditional coverage or reduced interval length. We employ machine learning techniques, notably gradient boosting, to systematically improve upon a predefined conformity score function. This process is guided by carefully constructed loss fu…

    Submitted 9 November, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  10. arXiv:2405.15107  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Is Algorithmic Stability Testable? A Unified Framework under Computational Constraints

    Authors: Yuetian Luo, Rina Foygel Barber

    Abstract: Algorithmic stability is a central notion in learning theory that quantifies the sensitivity of an algorithm to small changes in the training data. If a learning algorithm satisfies certain stability properties, this leads to many important downstream implications, such as generalization, robustness, and reliable predictive inference. Verifying that stability holds for a particular algorithm is th…

    Submitted 30 March, 2025; v1 submitted 23 May, 2024; originally announced May 2024.

  11. arXiv:2405.14064  [pdf, other]

    stat.ML cs.LG math.ST

    Building a stable classifier with the inflated argmax

    Authors: Jake A. Soloff, Rina Foygel Barber, Rebecca Willett

    Abstract: We propose a new framework for algorithmic stability in the context of multiclass classification. In practice, classification algorithms often operate by first assigning a continuous score (for instance, an estimated probability) to each possible label, then taking the maximizer -- i.e., selecting the class that has the highest score. A drawback of this type of approach is that it is inherently un…

    Submitted 25 April, 2025; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024

  12. arXiv:2404.15017  [pdf, other]

    stat.ME

    The mosaic permutation test: an exact and nonparametric goodness-of-fit test for factor models

    Authors: Asher Spector, Rina Foygel Barber, Trevor Hastie, Ronald N. Kahn, Emmanuel Candès

    Abstract: Financial firms often rely on fundamental factor models to explain correlations among asset returns and manage risk. Yet after major events, e.g., COVID-19, analysts may reassess whether existing risk models continue to fit well: specifically, after accounting for a set of known factor exposures, are the residuals of the asset returns independent? With this motivation, we introduce the mosaic perm…

    Submitted 26 September, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: 42 pages, 13 figures

    MSC Class: 62H25 (Primary) 62G10; 62G09 (Secondary)

  13. arXiv:2402.07388  [pdf, ps, other]

    math.ST cs.LG stat.ML

    The Limits of Assumption-free Tests for Algorithm Performance

    Authors: Yuetian Luo, Rina Foygel Barber

    Abstract: Algorithm evaluation and comparison are fundamental questions in machine learning and statistics -- how well does an algorithm perform at a given modeling task, and which algorithm performs best? Many methods have been developed to assess algorithm performance, often based around cross-validation type strategies, retraining the algorithm of interest on different subsets of the data and assessing i…

    Submitted 22 March, 2025; v1 submitted 11 February, 2024; originally announced February 2024.

  14. arXiv:2402.01139  [pdf, other]

    stat.ML cs.LG stat.ME

    Online conformal prediction with decaying step sizes

    Authors: Anastasios N. Angelopoulos, Rina Foygel Barber, Stephen Bates

    Abstract: We introduce a method for online conformal prediction with decaying step sizes. Like previous methods, ours possesses a retrospective guarantee of coverage for arbitrary sequences. However, unlike previous methods, we can simultaneously estimate a population quantile when it exists. Our theory and experiments indicate substantially improved practical properties: in particular, when the distributio…

    Submitted 28 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  15. arXiv:2401.17452  [pdf, other]

    stat.ME

    Group-Weighted Conformal Prediction

    Authors: Aabesh Bhattacharyya, Rina Foygel Barber

    Abstract: Conformal prediction (CP) is a method for constructing a prediction interval around the output of a fitted model, whose validity does not rely on the model being correct--the CP interval offers a coverage guarantee that is distribution-free, but relies on the training data being drawn from the same distribution as the test data. A recent variant, weighted conformal prediction (WCP), reweights the…

    Submitted 15 April, 2025; v1 submitted 30 January, 2024; originally announced January 2024.

  16. arXiv:2310.07850  [pdf, other]

    stat.ME

    Conformal prediction with local weights: randomization enables local guarantees

    Authors: Rohan Hore, Rina Foygel Barber

    Abstract: In this work, we consider the problem of building distribution-free prediction intervals with finite-sample conditional coverage guarantees. Conformal prediction (CP) is an increasingly popular framework for building such intervals with distribution-free guarantees, but these guarantees only ensure marginal coverage: the probability of coverage is averaged over both the training and test data, mea…

    Submitted 25 October, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 45 pages, 13 figures

  17. arXiv:2309.08063  [pdf, other]

    stat.ME

    Approximate co-sufficient sampling with regularization

    Authors: Wanrong Zhu, Rina Foygel Barber

    Abstract: In this work, we consider the problem of goodness-of-fit (GoF) testing for parametric models -- for example, testing whether observed data follows a logistic regression model. This testing problem involves a composite null hypothesis, due to the unknown values of the model parameters. In some special cases, co-sufficient sampling (CSS) can remove the influence of these unknown parameters via condi…

    Submitted 24 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

  18. arXiv:2306.08693  [pdf, other]

    stat.ME stat.ML

    Integrating Uncertainty Awareness into Conformalized Quantile Regression

    Authors: Raphael Rossellini, Rina Foygel Barber, Rebecca Willett

    Abstract: Conformalized Quantile Regression (CQR) is a recently proposed method for constructing prediction intervals for a response $Y$ given covariates $X$, without making distributional assumptions. However, existing constructions of CQR can be ineffective for problems where the quantile regressors perform better in certain parts of the feature space than others. The reason is that the prediction interva…

    Submitted 12 March, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Journal ref: PMLR 238:1540-1548, 2024

  19. arXiv:2306.06342  [pdf, other]

    math.ST stat.ME

    Distribution-free inference with hierarchical data

    Authors: Yonghoon Lee, Rina Foygel Barber, Rebecca Willett

    Abstract: This paper studies distribution-free inference in settings where the data set has a hierarchical structure -- for example, groups of observations, or repeated measurements. In such settings, standard notions of exchangeability may not hold. To address this challenge, a hierarchical form of exchangeability is derived, facilitating extensions of distribution-free methods, including conformal predict…

    Submitted 2 March, 2024; v1 submitted 10 June, 2023; originally announced June 2023.

  20. arXiv:2305.10637  [pdf, other]

    stat.ME

    Conformalized matrix completion

    Authors: Yu Gui, Rina Foygel Barber, Cong Ma

    Abstract: Matrix completion aims to estimate missing entries in a data matrix, using the assumption of a low-complexity structure (e.g., low rank) so that imputation is possible. While many effective estimation algorithms exist in the literature, uncertainty quantification for this problem has proved to be challenging, and existing methods are extremely sensitive to model misspecification. In this work, we…

    Submitted 22 October, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: accepted to 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  21. arXiv:2303.02732  [pdf, other]

    stat.ME stat.CO stat.ML

    Iterative Approximate Cross-Validation

    Authors: Yuetian Luo, Zhimei Ren, Rina Foygel Barber

    Abstract: Cross-validation (CV) is one of the most popular tools for assessing and selecting predictive models. However, standard CV suffers from high computational cost when the number of folds is large. Recently, under the empirical risk minimization (ERM) framework, a line of works proposed efficient methods to approximate CV based on the solution of the ERM problem trained on the full dataset. However,…

    Submitted 27 May, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

  22. arXiv:2301.12999  [pdf, other]

    stat.ME

    Selective inference for clustering with unknown variance

    Authors: Youngjoo Yun, Rina Foygel Barber

    Abstract: In many modern statistical problems, the limited available data must be used both to develop the hypotheses to test, and to test these hypotheses -- that is, both for exploratory and confirmatory data analysis. Reusing the same dataset for both exploration and testing can lead to massive selection bias, leading to many false discoveries. Selective inference is a framework that allows for performing v…

    Submitted 21 July, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  23. arXiv:2301.12600  [pdf, other]

    stat.ML cs.LG math.ST

    Bagging Provides Assumption-free Stability

    Authors: Jake A. Soloff, Rina Foygel Barber, Rebecca Willett

    Abstract: Bagging is an important technique for stabilizing machine learning models. In this paper, we derive a finite-sample guarantee on the stability of bagging for any model. Our result places no assumptions on the distribution of the data, on the properties of the base algorithm, or on the dimensionality of the covariates. Our guarantee applies to many variants of bagging and is optimal up to a constan…

    Submitted 25 April, 2024; v1 submitted 29 January, 2023; originally announced January 2023.

  24. arXiv:2211.01227  [pdf, other]

    stat.ME

    Conformalized survival analysis with adaptive cutoffs

    Authors: Yu Gui, Rohan Hore, Zhimei Ren, Rina Foygel Barber

    Abstract: This paper introduces an assumption-lean method that constructs valid and efficient lower predictive bounds (LPBs) for survival times with censored data. We build on recent work by Candès et al. (2021), whose approach first subsets the data to discard any data points with early censoring times, and then uses a reweighting technique (namely, weighted conformal inference (Tibshirani et al., 2019)) t…

    Submitted 6 November, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted by Biometrika; 22 pages

  25. arXiv:2205.15461  [pdf, other]

    stat.ME

    Derandomized knockoffs: leveraging e-values for false discovery rate control

    Authors: Zhimei Ren, Rina Foygel Barber

    Abstract: Model-X knockoffs is a flexible wrapper method for high-dimensional regression algorithms, which provides guaranteed control of the false discovery rate (FDR). Due to the randomness inherent to the method, different runs of model-X knockoffs on the same dataset often result in different sets of selected variables, which is undesirable in practice. In this paper, we introduce a methodology for dera…

    Submitted 30 August, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: Accepted by Journal of the Royal Statistical Society: Series B (JRSSB); 35 pages

  26. arXiv:2204.13581  [pdf, ps, other]

    stat.ME

    Permutation tests using arbitrary permutation distributions

    Authors: Aaditya Ramdas, Rina Foygel Barber, Emmanuel J. Candes, Ryan J. Tibshirani

    Abstract: Permutation tests date back nearly a century to Fisher's randomized experiments, and remain an immensely popular statistical tool, used for testing hypotheses of independence between variables and other common inferential questions. Much of the existing literature has emphasized that, for the permutation p-value to be valid, one must first pick a subgroup $G$ of permutations (which could equal the…

    Submitted 2 December, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

  27. arXiv:2202.13415  [pdf, other]

    stat.ME

    Conformal prediction beyond exchangeability

    Authors: Rina Foygel Barber, Emmanuel J. Candes, Aaditya Ramdas, Ryan J. Tibshirani

    Abstract: Conformal prediction is a popular, modern technique for providing valid predictive inference for arbitrary machine learning models. Its validity relies on the assumptions of exchangeability of the data, and symmetry of the given model fitting algorithm as a function of the data. However, exchangeability is often violated when predictive models are deployed in practice. For example, if the data dis…

    Submitted 16 March, 2023; v1 submitted 27 February, 2022; originally announced February 2022.

  28. arXiv:2106.09136  [pdf, other]

    math.ST stat.ML

    Binary classification with corrupted labels

    Authors: Yonghoon Lee, Rina Foygel Barber

    Abstract: In a binary classification problem where the goal is to fit an accurate predictor, the presence of corrupted labels in the training data set may create an additional challenge. However, in settings where likelihood maximization is poorly behaved -- for example, if positive and negative labels are perfectly separable -- then a small fraction of corrupted labels can improve performance by ensuring robustn…

    Submitted 16 June, 2021; originally announced June 2021.

  29. arXiv:2105.07587  [pdf, other]

    math.ST stat.ME

    Convergence guarantee for the sparse monotone single index model

    Authors: Ran Dai, Hyebin Song, Rina Foygel Barber, Garvesh Raskutti

    Abstract: We consider a high-dimensional monotone single index model (hdSIM), which is a semiparametric extension of a high-dimensional generalized linear model (hdGLM), where the link function is unknown, but constrained with monotone and non-decreasing shape. We develop a scalable projection-based iterative approach, the "Sparse Orthogonal Descent Single-Index Model" (SOD-SIM), which alternates between spa…

    Submitted 16 May, 2021; originally announced May 2021.

    MSC Class: 62G08

  30. arXiv:2007.09851  [pdf, other]

    stat.ME

    Testing goodness-of-fit and conditional independence with approximate co-sufficient sampling

    Authors: Rina Foygel Barber, Lucas Janson

    Abstract: Goodness-of-fit (GoF) testing is ubiquitous in statistics, with direct ties to model selection, confidence interval construction, conditional independence testing, and multiple testing, just to name a few applications. While testing the GoF of a simple (point) null hypothesis provides an analyst great flexibility in the choice of test statistic while still ensuring validity, most GoF tests for com…

    Submitted 14 September, 2021; v1 submitted 19 July, 2020; originally announced July 2020.

  31. arXiv:2002.09025  [pdf, other]

    stat.ME

    Predictive Inference Is Free with the Jackknife+-after-Bootstrap

    Authors: Byol Kim, Chen Xu, Rina Foygel Barber

    Abstract: Ensemble learning is widely used in applications to make predictions in complex decision problems---for example, averaging models fitted to a sequence of samples bootstrapped from the available training data. While such methods offer more accurate, stable, and robust predictions and model estimates, much less is known about how to perform valid, assumption-lean inference on the output of these typ…

    Submitted 11 November, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: 31 pages, 12 figures, 4 tables. To appear in the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

  32. arXiv:1910.02348  [pdf, other]

    stat.ME

    Convex and Non-convex Approaches for Statistical Inference with Class-Conditional Noisy Labels

    Authors: Hyebin Song, Ran Dai, Garvesh Raskutti, Rina Foygel Barber

    Abstract: We study the problem of estimation and testing in logistic regression with class-conditional noise in the observed labels, which has an important implication in the Positive-Unlabeled (PU) learning setting. With the key observation that the label noise problem belongs to a special sub-class of generalized linear models (GLM), we discuss convex and non-convex approaches that address this problem. A…

    Submitted 12 August, 2020; v1 submitted 5 October, 2019; originally announced October 2019.

  33. arXiv:1908.05428  [pdf, other]

    stat.ME cs.CY stat.AP stat.ML

    With Malice Towards None: Assessing Uncertainty via Equalized Coverage

    Authors: Yaniv Romano, Rina Foygel Barber, Chiara Sabatti, Emmanuel J. Candès

    Abstract: An important factor to guarantee a fair use of data-driven recommendation systems is that we should be able to communicate their uncertainty to decision makers. This can be accomplished by constructing prediction intervals, which provide an intuitive measure of the limits of predictive performance. To support equitable treatment, we force the construction of such intervals to be unbiased in the se…

    Submitted 15 August, 2019; originally announced August 2019.

    Comments: 14 pages, 1 figure, 1 table

  34. arXiv:1905.02928  [pdf, other]

    stat.ME

    Predictive inference with the jackknife+

    Authors: Rina Foygel Barber, Emmanuel J. Candes, Aaditya Ramdas, Ryan J. Tibshirani

    Abstract: This paper introduces the jackknife+, which is a novel method for constructing predictive confidence intervals. Whereas the jackknife outputs an interval centered at the predicted response of a test point, with the width of the interval determined by the quantiles of leave-one-out residuals, the jackknife+ also uses the leave-one-out predictions at the test point to account for the variability in…

    Submitted 29 May, 2020; v1 submitted 8 May, 2019; originally announced May 2019.

  35. arXiv:1904.06019  [pdf, other]

    stat.ME

    Conformal Prediction Under Covariate Shift

    Authors: Ryan J. Tibshirani, Rina Foygel Barber, Emmanuel J. Candes, Aaditya Ramdas

    Abstract: We extend conformal prediction methodology beyond the case of exchangeable data. In particular, we show that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known---or, in practice, can be estimated accurate…

    Submitted 6 July, 2020; v1 submitted 11 April, 2019; originally announced April 2019.

    Comments: 17 pages, 4 figures

  36. arXiv:1812.11433  [pdf, ps, other]

    stat.ME math.ST

    On the Construction of Knockoffs in Case-Control Studies

    Authors: Rina Foygel Barber, Emmanuel Candes

    Abstract: Consider a case-control study in which we have a random sample, constructed in such a way that the proportion of cases in our sample is different from that in the general population---for instance, the sample is constructed to achieve a fixed ratio of cases to controls. Imagine that we wish to determine which of the potentially many covariates under study truly influence the response by applying t…

    Submitted 29 December, 2018; originally announced December 2018.

    Comments: 4 pages

  37. arXiv:1807.05405  [pdf, other]

    stat.ME math.ST

    The conditional permutation test for independence while controlling for confounders

    Authors: Thomas B. Berrett, Yi Wang, Rina Foygel Barber, Richard J. Samworth

    Abstract: We propose a general new method, the conditional permutation test, for testing the conditional independence of variables $X$ and $Y$ given a potentially high-dimensional random vector $Z$ that may contain confounding factors. The proposed test permutes entries of $X$ non-uniformly, so as to respect the existing dependence between $X$ and $Z$ and thus account for the presence of these confounders.…

    Submitted 7 May, 2019; v1 submitted 14 July, 2018; originally announced July 2018.

    Comments: 31 pages, 4 figures

  38. arXiv:1805.06439  [pdf, other]

    stat.ML cs.LG

    Prediction Rule Reshaping

    Authors: Matt Bonakdarpour, Sabyasachi Chatterjee, Rina Foygel Barber, John Lafferty

    Abstract: Two methods are proposed for high-dimensional shape-constrained regression and classification. These methods reshape pre-trained prediction rules to satisfy shape constraints like monotonicity and convexity. The first method can be applied to any pre-trained prediction rule, while the second method deals specifically with random forests. In both cases, efficient algorithms are developed for comput…

    Submitted 16 May, 2018; originally announced May 2018.

  39. arXiv:1804.08841  [pdf, other]

    stat.ME math.ST stat.CO stat.ML

    Between hard and soft thresholding: optimal iterative thresholding algorithms

    Authors: Haoyang Liu, Rina Foygel Barber

    Abstract: Iterative thresholding algorithms seek to optimize a differentiable objective function over a sparsity or rank constraint by alternating between gradient steps that reduce the objective, and thresholding steps that enforce the constraint. This work examines the choice of the thresholding operator, and asks whether it is possible to achieve stronger guarantees than what is possible with hard thresh…

    Submitted 30 July, 2019; v1 submitted 24 April, 2018; originally announced April 2018.

  40. arXiv:1801.03896  [pdf, ps, other]

    stat.ME

    Robust inference with knockoffs

    Authors: Rina Foygel Barber, Emmanuel J. Candès, Richard J. Samworth

    Abstract: We consider the variable selection problem, which seeks to identify important variables influencing a response $Y$ out of many candidate features $X_1, \ldots, X_p$. We wish to do so while offering finite-sample guarantees about the fraction of false positives - selected variables $X_j$ that in fact have no effect on $Y$ after the other features are known. When the number of features $p$ is large…

    Submitted 11 February, 2019; v1 submitted 11 January, 2018; originally announced January 2018.

  41. arXiv:1709.06233  [pdf, other]

    stat.ME

    Discretized conformal prediction for efficient distribution-free inference

    Authors: Wenyu Chen, Kelli-Jean Chun, Rina Foygel Barber

    Abstract: In regression problems where there is no known true underlying model, conformal prediction methods enable prediction intervals to be constructed without any assumptions on the distribution of the underlying data, except that the training and test data are assumed to be exchangeable. However, these methods bear a heavy computational cost -- and, to be carried out exactly, the regression algorithm woul…

    Submitted 27 January, 2023; v1 submitted 18 September, 2017; originally announced September 2017.

  42. arXiv:1709.04451  [pdf, other]

    math.OC stat.ML

    Alternating minimization and alternating descent over nonconvex sets

    Authors: Wooseok Ha, Rina Foygel Barber

    Abstract: We analyze the performance of alternating minimization for loss functions optimized over two variables, where each variable may be restricted to lie in some potentially nonconvex constraint set. This type of setting arises naturally in high-dimensional statistics and signal processing, where the variables often reflect different structures or components within the signals being considered. Our ana…

    Submitted 25 February, 2019; v1 submitted 13 September, 2017; originally announced September 2017.

  43. arXiv:1703.06222  [pdf]

    stat.ME math.ST stat.ML

    A unified treatment of multiple testing with prior knowledge using the p-filter

    Authors: Aaditya Ramdas, Rina Foygel Barber, Martin J. Wainwright, Michael I. Jordan

    Abstract: There is a significant literature on methods for incorporating knowledge into multiple testing procedures so as to improve their power and precision. Some common forms of prior knowledge include (a) beliefs about which hypotheses are null, modeled by non-uniform prior weights; (b) differing importances of hypotheses, modeled by differing penalties for false discoveries; (c) multiple arbitrary part…

    Submitted 6 August, 2019; v1 submitted 17 March, 2017; originally announced March 2017.

    Comments: 36 pages, 1 figure, accepted for publication at the Annals of Statistics

  44. arXiv:1607.08211  [pdf, other]

    stat.ME

    Selective Inference for Group-Sparse Linear Models

    Authors: Fan Yang, Rina Foygel Barber, Prateek Jain, John Lafferty

    Abstract: We develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables. Our main technical result gives the precise distribution of the magnitude of the projection of the data onto a given subspace, and enables us to develop inference procedures for a broad class of group-sparse selection m…

    Submitted 27 July, 2016; originally announced July 2016.

  45. arXiv:1606.07926  [pdf, other]

    stat.ME

    Multiple testing with the structure adaptive Benjamini-Hochberg algorithm

    Authors: Ang Li, Rina Foygel Barber

    Abstract: In multiple testing problems, where a large number of hypotheses are tested simultaneously, false discovery rate (FDR) control can be achieved with the well-known Benjamini-Hochberg procedure, which adapts to the amount of signal present in the data. Many modifications of this procedure have been proposed to improve power in scenarios where the hypotheses are organized into groups or into a hierar…

    Submitted 13 September, 2017; v1 submitted 25 June, 2016; originally announced June 2016.

  46. arXiv:1602.03589  [pdf, other]

    stat.ME

    The knockoff filter for FDR control in group-sparse and multitask regression

    Authors: Ran Dai, Rina Foygel Barber

    Abstract: We propose the group knockoff filter, a method for false discovery rate control in a linear regression setting where the features are grouped, and we would like to select a set of relevant groups which have a nonzero effect on the response. By considering the set of true and false discoveries at the group level, this method gains power relative to sparse regression methods. We also apply our metho…

    Submitted 10 February, 2016; originally announced February 2016.

  47. arXiv:1602.03574  [pdf, other]

    stat.ME math.ST

    A knockoff filter for high-dimensional selective inference

    Authors: Rina Foygel Barber, Emmanuel J. Candes

    Abstract: This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced…

    Submitted 3 May, 2018; v1 submitted 10 February, 2016; originally announced February 2016.

  48. arXiv:1512.03397  [pdf, other]

    stat.ME stat.ML

    The p-filter: multi-layer FDR control for grouped hypotheses

    Authors: Rina Foygel Barber, Aaditya Ramdas

    Abstract: In many practical applications of multiple hypothesis testing using the False Discovery Rate (FDR), the given hypotheses can be naturally partitioned into groups, and one may not only want to control the number of false discoveries (wrongly rejected null hypotheses), but also the number of falsely discovered groups of hypotheses (we say a group is falsely discovered if at least one hypothesis with…

    Submitted 28 October, 2016; v1 submitted 10 December, 2015; originally announced December 2015.

  49. arXiv:1505.07352  [pdf, other]

    stat.ME

    Accumulation tests for FDR control in ordered hypothesis testing

    Authors: Ang Li, Rina Foygel Barber

    Abstract: Multiple testing problems arising in modern scientific applications can involve simultaneously testing thousands or even millions of hypotheses, with relatively few true signals. In this paper, we consider the multiple testing problem where prior information is available (for instance, from an earlier study under different experimental conditions), that can allow us to test the hypotheses as a ran…

    Submitted 25 June, 2016; v1 submitted 27 May, 2015; originally announced May 2015.

  50. arXiv:1505.02097  [pdf, other]

    stat.ME

    EigenPrism: Inference for High-Dimensional Signal-to-Noise Ratios

    Authors: Lucas Janson, Rina Foygel Barber, Emmanuel Candès

    Abstract: Consider the following three important problems in statistical inference, namely, constructing confidence intervals for (1) the error of a high-dimensional ($p>n$) regression estimator, (2) the linear regression noise level, and (3) the genetic signal-to-noise ratio of a continuous-valued trait (related to the heritability). All three problems turn out to be closely related to the little-studied p…

    Submitted 28 June, 2016; v1 submitted 8 May, 2015; originally announced May 2015.