-
Sharp Results for Hypothesis Testing with Risk-Sensitive Agents
Authors:
Flora C. Shi,
Stephen Bates,
Martin J. Wainwright
Abstract:
Statistical protocols are often used for decision-making involving multiple parties, each with their own incentives, private information, and ability to influence the distributional properties of the data. We study a game-theoretic version of hypothesis testing in which a statistician, also known as a principal, interacts with strategic agents that can generate data. The statistician seeks to desi…
▽ More
Statistical protocols are often used for decision-making involving multiple parties, each with their own incentives, private information, and ability to influence the distributional properties of the data. We study a game-theoretic version of hypothesis testing in which a statistician, also known as a principal, interacts with strategic agents that can generate data. The statistician seeks to design a testing protocol with controlled error, while the data-generating agents, guided by their utility and prior information, choose whether or not to opt in based on expected utility maximization. This strategic behavior affects the data observed by the statistician and, consequently, the associated testing error. We analyze this problem for general concave and monotonic utility functions and prove an upper bound on the Bayes false discovery rate (FDR). Underlying this bound is a form of prior elicitation: we show how an agent's choice to opt in implies a certain upper bound on their prior null probability. Our FDR bound is unimprovable in a strong sense, achieving equality at a single point for an individual agent and at any countable number of points for a population of agents. We also demonstrate that our testing protocols exhibit a desirable maximin property when the principal's utility is considered. To illustrate the qualitative predictions of our theory, we examine the effects of risk aversion, reward stochasticity, and signal-to-noise ratio, as well as the implications for the Food and Drug Administration's testing protocols.
△ Less
Submitted 20 December, 2024;
originally announced December 2024.
-
Theoretical Foundations of Conformal Prediction
Authors:
Anastasios N. Angelopoulos,
Rina Foygel Barber,
Stephen Bates
Abstract:
This book is about conformal prediction and related inferential techniques that build on permutation tests and exchangeability. These techniques are useful in a diverse array of tasks, including hypothesis testing and providing uncertainty quantification guarantees for machine learning systems. Much of the current interest in conformal prediction is due to its ability to integrate into complex mac…
▽ More
This book is about conformal prediction and related inferential techniques that build on permutation tests and exchangeability. These techniques are useful in a diverse array of tasks, including hypothesis testing and providing uncertainty quantification guarantees for machine learning systems. Much of the current interest in conformal prediction is due to its ability to integrate into complex machine learning workflows, solving the problem of forming prediction sets without any assumptions on the form of the data generating distribution. Since contemporary machine learning algorithms have generally proven difficult to analyze directly, conformal prediction's main appeal is its ability to provide formal, finite-sample guarantees when paired with such methods.
The goal of this book is to teach the reader about the fundamental technical arguments that arise when researching conformal prediction and related questions in distribution-free inference. Many of these proof strategies, especially the more recent ones, are scattered among research papers, making it difficult for researchers to understand where to look, which results are important, and how exactly the proofs work. We hope to bridge this gap by curating what we believe to be some of the most important results in the literature and presenting their proofs in a unified language, with illustrations, and with an eye towards pedagogy.
△ Less
Submitted 3 June, 2025; v1 submitted 18 November, 2024;
originally announced November 2024.
-
Label Noise Robustness of Conformal Prediction
Authors:
Bat-Sheva Einbinder,
Shai Feldman,
Stephen Bates,
Anastasios N. Angelopoulos,
Asaf Gendler,
Yaniv Romano
Abstract:
We study the robustness of conformal prediction, a powerful tool for uncertainty quantification, to label noise. Our analysis tackles both regression and classification problems, characterizing when and how it is possible to construct uncertainty sets that correctly cover the unobserved noiseless ground truth labels. We further extend our theory and formulate the requirements for correctly control…
▽ More
We study the robustness of conformal prediction, a powerful tool for uncertainty quantification, to label noise. Our analysis tackles both regression and classification problems, characterizing when and how it is possible to construct uncertainty sets that correctly cover the unobserved noiseless ground truth labels. We further extend our theory and formulate the requirements for correctly controlling a general loss function, such as the false negative proportion, with noisy labels. Our theory and experiments suggest that conformal prediction and risk-controlling techniques with noisy labels attain conservative risk over the clean ground truth labels whenever the noise is dispersive and increases variability. In other adversarial cases, we can also correct for noise of bounded size in the conformal prediction algorithm in order to ensure achieving the correct risk of the ground truth labels without score or data regularity.
△ Less
Submitted 26 November, 2024; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Conformal Risk Control
Authors:
Anastasios N. Angelopoulos,
Stephen Bates,
Adam Fisch,
Lihua Lei,
Tal Schuster
Abstract:
We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $\mathcal{O}(1/n)$ factor. We also introduce extensions of the idea to distribution shift, quantile risk control, multiple and adversar…
▽ More
We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $\mathcal{O}(1/n)$ factor. We also introduce extensions of the idea to distribution shift, quantile risk control, multiple and adversarial risk control, and expectations of U-statistics. Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.
△ Less
Submitted 13 June, 2025; v1 submitted 4 August, 2022;
originally announced August 2022.
-
Principal-Agent Hypothesis Testing
Authors:
Stephen Bates,
Michael I. Jordan,
Michael Sklar,
Jake A. Soloff
Abstract:
Consider the relationship between a regulator (the principal) and an experimenter (the agent) such as a pharmaceutical company. The pharmaceutical company wishes to sell a drug for profit, whereas the regulator wishes to allow only efficacious drugs to be marketed. The efficacy of the drug is not known to the regulator, so the pharmaceutical company must run a costly trial to prove efficacy to the…
▽ More
Consider the relationship between a regulator (the principal) and an experimenter (the agent) such as a pharmaceutical company. The pharmaceutical company wishes to sell a drug for profit, whereas the regulator wishes to allow only efficacious drugs to be marketed. The efficacy of the drug is not known to the regulator, so the pharmaceutical company must run a costly trial to prove efficacy to the regulator. Critically, the statistical protocol used to establish efficacy affects the behavior of a strategic, self-interested agent; a lower standard of statistical evidence incentivizes the agent to run more trials that are less likely to be effective. The interaction between the statistical protocol and the incentives of the pharmaceutical company is crucial for understanding this system and designing protocols with high social utility. In this work, we discuss how the regulator can set up a protocol with payoffs based on statistical evidence. We show how to design protocols that are robust to an agent's strategic actions, and derive the optimal protocol in the presence of strategic entrants.
△ Less
Submitted 15 April, 2024; v1 submitted 13 May, 2022;
originally announced May 2022.
-
A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
Authors:
Anastasios N. Angelopoulos,
Stephen Bates
Abstract:
Black-box machine learning models are now routinely used in high-risk settings, like medical diagnostics, which demand uncertainty quantification to avoid consequential model failures. Conformal prediction is a user-friendly paradigm for creating statistically rigorous uncertainty sets/intervals for the predictions of such models. Critically, the sets are valid in a distribution-free sense: they p…
▽ More
Black-box machine learning models are now routinely used in high-risk settings, like medical diagnostics, which demand uncertainty quantification to avoid consequential model failures. Conformal prediction is a user-friendly paradigm for creating statistically rigorous uncertainty sets/intervals for the predictions of such models. Critically, the sets are valid in a distribution-free sense: they possess explicit, non-asymptotic guarantees even without distributional assumptions or model assumptions. One can use conformal prediction with any pre-trained model, such as a neural network, to produce sets that are guaranteed to contain the ground truth with a user-specified probability, such as 90%. It is easy-to-understand, easy-to-use, and general, applying naturally to problems arising in the fields of computer vision, natural language processing, deep reinforcement learning, and so on.
This hands-on introduction is aimed to provide the reader a working understanding of conformal prediction and related distribution-free uncertainty quantification techniques with one self-contained document. We lead the reader through practical theory for and examples of conformal prediction and describe its extensions to complex machine learning tasks involving structured outputs, distribution shift, time-series, outliers, models that abstain, and more. Throughout, there are many explanatory illustrations, examples, and code samples in Python. With each code sample comes a Jupyter notebook implementing the method on a real-data example; the notebooks can be accessed and easily run using our codebase.
△ Less
Submitted 7 December, 2022; v1 submitted 15 July, 2021;
originally announced July 2021.
-
Testing for Outliers with Conformal p-values
Authors:
Stephen Bates,
Emmanuel Candès,
Lihua Lei,
Yaniv Romano,
Matteo Sesia
Abstract:
This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually depende…
▽ More
This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
△ Less
Submitted 24 May, 2022; v1 submitted 16 April, 2021;
originally announced April 2021.
-
Cross-validation: what does it estimate and how well does it do it?
Authors:
Stephen Bates,
Trevor Hastie,
Robert Tibshirani
Abstract:
Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit to the training data. We prove that this is not the case for the linear model fit by ordinary least squares; rather it estimates the average prediction error o…
▽ More
Cross-validation is a widely-used technique to estimate prediction error, but its behavior is complex and not fully understood. Ideally, one would like to think that cross-validation estimates the prediction error for the model at hand, fit to the training data. We prove that this is not the case for the linear model fit by ordinary least squares; rather it estimates the average prediction error of models fit on other unseen training sets drawn from the same population. We further show that this phenomenon occurs for most popular estimates of prediction error, including data splitting, bootstrapping, and Mallow's Cp. Next, the standard confidence intervals for prediction error derived from cross-validation may have coverage far below the desired level. Because each data point is used for both training and testing, there are correlations among the measured accuracies for each fold, and so the usual estimate of variance is too small. We introduce a nested cross-validation scheme to estimate this variance more accurately, and we show empirically that this modification leads to intervals with approximately correct coverage in many examples where traditional cross-validation intervals fail.
△ Less
Submitted 18 July, 2022; v1 submitted 1 April, 2021;
originally announced April 2021.
-
Uncertainty Sets for Image Classifiers using Conformal Prediction
Authors:
Anastasios Angelopoulos,
Stephen Bates,
Jitendra Malik,
Michael I. Jordan
Abstract:
Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings. Existing uncertainty quantification techniques, such as Platt scaling, attempt to calibrate the network's probability estimates, but they do not have formal guarantees. We present an algorithm that modifies an…
▽ More
Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings. Existing uncertainty quantification techniques, such as Platt scaling, attempt to calibrate the network's probability estimates, but they do not have formal guarantees. We present an algorithm that modifies any classifier to output a predictive set containing the true label with a user-specified probability, such as 90%. The algorithm is simple and fast like Platt scaling, but provides a formal finite-sample coverage guarantee for every model and dataset. Our method modifies an existing conformal prediction algorithm to give more stable predictive sets by regularizing the small scores of unlikely classes after Platt scaling. In experiments on both Imagenet and Imagenet-V2 with ResNet-152 and other classifiers, our scheme outperforms existing approaches, achieving coverage with sets that are often factors of 5 to 10 smaller than a stand-alone Platt scaling baseline.
△ Less
Submitted 3 September, 2022; v1 submitted 29 September, 2020;
originally announced September 2020.
-
Nonlinear quotients
Authors:
Sean M. Bates,
William B. Johnson,
Joram Lindenstrauss,
D. Preiss,
Gideon Schechtman
Abstract:
New concepts related to approximating a Lipschitz function between Banach spaces by affine functions are introduced. Results which clarify when such approximations are possible are proved and in some cases a complete characterization of the spaces $X$, $Y$ for which any Lipschitz function from $X$ to $Y$ can be so approximated is obtained. This is applied to the study of Lipschitz and uniform qu…
▽ More
New concepts related to approximating a Lipschitz function between Banach spaces by affine functions are introduced. Results which clarify when such approximations are possible are proved and in some cases a complete characterization of the spaces $X$, $Y$ for which any Lipschitz function from $X$ to $Y$ can be so approximated is obtained. This is applied to the study of Lipschitz and uniform quotient mappings between Banach spaces. It is proved, in particular, that any Banach space which is a uniform quotient of $L_p$, $1<p<\infty$, is already isomorphic to a linear quotient of $L_p$.
△ Less
Submitted 15 April, 1998; v1 submitted 9 November, 1997;
originally announced November 1997.