-
Sharp empirical Bernstein bounds for the variance of bounded random variables
Authors:
Diego Martinez-Taboada,
Aaditya Ramdas
Abstract:
We develop novel empirical Bernstein inequalities for the variance of bounded random variables. Our inequalities hold under constant conditional variance and mean, without further assumptions like independence or identical distribution of the random variables, making them suitable for sequential decision making contexts. The results are instantiated for both the batch setting (where the sample size is fixed) and the sequential setting (where the sample size is a stopping time). Our bounds are asymptotically sharp: when the data are iid, our CI adapts optimally to both unknown mean $μ$ and unknown $\mathbb{V}[(X-μ)^2]$, meaning that the first order term of our CI exactly matches that of the oracle Bernstein inequality which knows those quantities. We compare our results to a widely used (non-sharp) concentration inequality for the variance based on self-bounding random variables, showing both the theoretical gains and improved empirical performance of our approach. We finally extend our methods to work in any separable Hilbert space.
Submitted 4 May, 2025;
originally announced May 2025.
-
Conformal changepoint localization
Authors:
Sanjit Dandapanthula,
Aaditya Ramdas
Abstract:
Changepoint localization is the problem of estimating the index at which a change occurred in the data generating distribution of an ordered list of data, or declaring that no change occurred. We present the broadly applicable CONCH (CONformal CHangepoint localization) algorithm, which uses a matrix of conformal p-values to produce a confidence interval for a (single) changepoint under the mild assumption that the pre-change and post-change distributions are each exchangeable. We exemplify the CONCH algorithm on a variety of synthetic and real-world datasets, including using black-box pre-trained classifiers to detect changes in sequences of images or text.
Submitted 1 May, 2025;
originally announced May 2025.
-
Conditional independence testing with a single realization of a multivariate nonstationary nonlinear time series
Authors:
Michael Wieck-Sosa,
Michel F. C. Haddad,
Aaditya Ramdas
Abstract:
Identifying relationships among stochastic processes is a key goal in disciplines that deal with complex temporal systems, such as economics. While the standard toolkit for multivariate time series analysis has many advantages, it can be difficult to capture nonlinear dynamics using linear vector autoregressive models. This difficulty has motivated the development of methods for variable selection, causal discovery, and graphical modeling for nonlinear time series, which routinely employ nonparametric tests for conditional independence. In this paper, we introduce the first framework for conditional independence testing that works with a single realization of a nonstationary nonlinear process. The key technical ingredients are time-varying nonlinear regression, time-varying covariance estimation, and a distribution-uniform strong Gaussian approximation.
Submitted 30 April, 2025;
originally announced April 2025.
-
On Stopping Times of Power-one Sequential Tests: Tight Lower and Upper Bounds
Authors:
Shubhada Agrawal,
Aaditya Ramdas
Abstract:
We prove two lower bounds for stopping times of sequential tests between general composite nulls and alternatives. The first lower bound is for the setting where the type-1 error level $α$ approaches zero, and equals $\log(1/α)$ divided by a certain infimum KL divergence, termed $\operatorname{KL_{inf}}$. The second lower bound applies to the setting where $α$ is fixed and $\operatorname{KL_{inf}}$ approaches 0 (meaning that the null and alternative sets are not separated) and equals $c \operatorname{KL_{inf}}^{-1} \log \log \operatorname{KL_{inf}}^{-1}$ for a universal constant $c > 0$. We also provide a sufficient condition for matching the upper bounds and show that this condition is met in several special cases. Given past work, these upper and lower bounds are unsurprising in their form; our main contribution is the generality in which they hold, for example, not requiring reference measures or compactness of the classes.
Submitted 28 April, 2025;
originally announced April 2025.
-
Bringing closure to FDR control: beating the e-Benjamini-Hochberg procedure
Authors:
Ziyu Xu,
Lasse Fischer,
Aaditya Ramdas
Abstract:
False discovery rate (FDR) has been a key metric for error control in multiple hypothesis testing, and many methods have been developed for FDR control across a diverse cross-section of settings and applications. We develop a closure principle for all FDR controlling procedures, i.e., we provide a characterization based on e-values for all admissible FDR controlling procedures. A general version of this closure principle can recover any multiple testing error metric and allows one to choose the error metric post-hoc. We leverage this idea to formulate the closed eBH procedure, a (usually strict) improvement over the eBH procedure for FDR control when provided with e-values. This also yields a closed BY procedure that dominates the Benjamini-Yekutieli (BY) procedure for FDR control with arbitrarily dependent p-values, thus proving that the latter is inadmissible. We demonstrate the practical performance of our new procedures in simulations.
Submitted 22 April, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
E-variables for hypotheses generated by constraints
Authors:
Martin Larsson,
Aaditya Ramdas,
Johannes Ruf
Abstract:
An e-variable for a family of distributions $\mathcal{P}$ is a nonnegative random variable whose expected value under every distribution in $\mathcal{P}$ is at most one. E-variables have recently been recognized as fundamental objects in hypothesis testing, and a rapidly growing body of work has attempted to derive admissible or optimal e-variables for various families $\mathcal{P}$. In this paper, we study classes $\mathcal{P}$ that are specified by constraints. Simple examples include bounds on the moments, but our general theory covers arbitrary sets of measurable constraints. Our main results characterize the set of all e-variables for such classes, as well as maximal ones. Three case studies illustrate the scope of our theory: finite constraint sets, one-sided sub-$ψ$ distributions, and distributions invariant under a group of symmetries. In particular, we generalize recent results of Clerico (2024a) by dropping all assumptions on the constraints.
Submitted 3 April, 2025;
originally announced April 2025.
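The definition of an e-variable in the abstract above can be made concrete with a minimal sketch for one simple moment constraint. It is an illustration only, not a construction taken from the paper: the class is assumed to be all distributions on $[0,1]$ with mean at most $m$, and the names `moment_e_variable`, `rejects`, `m`, and `lam` are hypothetical. For $0 \le λ \le 1/m$, the variable $1 + λ(X - m)$ is nonnegative and has expectation at most one under every such distribution, and Markov's inequality turns it into a level-$α$ test.

```python
def moment_e_variable(x, m, lam):
    """Return 1 + lam * (x - m), an e-variable for the (assumed) null class of
    distributions on [0, 1] whose mean is at most m.

    Nonnegativity: the smallest value, at x = 0, is 1 - lam * m >= 0.
    Expectation:   1 + lam * (E[X] - m) <= 1 whenever E[X] <= m.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if not 0.0 < m <= 1.0:
        raise ValueError("m must lie in (0, 1]")
    if not 0.0 <= lam <= 1.0 / m:
        raise ValueError("lam must lie in [0, 1/m]")
    return 1.0 + lam * (x - m)


def rejects(e_value, alpha):
    """Markov's inequality: P(E >= 1/alpha) <= alpha for any e-variable E."""
    return e_value >= 1.0 / alpha


# Hypothetical single observation x = 1.0 under the null "mean <= 0.02".
e = moment_e_variable(1.0, m=0.02, lam=40.0)
decision = rejects(e, alpha=0.05)
```

The choice of `lam` trades off how much is gained when `x` exceeds `m` against how much is lost otherwise; the sketch fixes it by hand rather than optimizing it.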
-
Locally minimax optimal and dimension-agnostic discrete argmin inference
Authors:
Ilmun Kim,
Aaditya Ramdas
Abstract:
This paper tackles a fundamental inference problem: given $n$ observations from a $d$-dimensional vector with unknown mean $\boldsymbol{μ}$, we must form a confidence set for the index (or indices) corresponding to the smallest component of $\boldsymbol{μ}$. By duality, we reduce this to testing, for each $r$ in $1,\ldots,d$, whether $μ_r$ is the smallest. Based on the sample splitting and self-normalization approach of Kim and Ramdas (2024), we propose "dimension-agnostic" tests that maintain validity regardless of how $d$ scales with $n$, and regardless of arbitrary ties in $\boldsymbol{μ}$. Notably, our validity holds under mild moment conditions, requiring little more than finiteness of a second moment, and permitting possibly strong dependence between coordinates. In addition, we establish the local minimax separation rate for this problem, which adapts to the cardinality of a confusion set, and show that the proposed tests attain this rate. Furthermore, we develop robust variants that continue to achieve the same minimax rate under heavy-tailed distributions with only finite second moments. Empirical results on simulated and real data illustrate the strong performance of our approach in terms of type I error control and power compared to existing methods.
Submitted 1 May, 2025; v1 submitted 27 March, 2025;
originally announced March 2025.
-
Anytime-valid FDR control with the stopped e-BH procedure
Authors:
Hongjian Wang,
Sanjit Dandapanthula,
Aaditya Ramdas
Abstract:
The recent e-Benjamini-Hochberg (e-BH) procedure for multiple hypothesis testing is known to control the false discovery rate (FDR) under arbitrary dependence between the input e-values. This paper points out an important subtlety when applying the e-BH procedure with e-processes, which are sequential generalizations of e-values (where the data are observed sequentially). Since adaptively stopped e-processes are e-values, the e-BH procedure can be repeatedly applied at every time step, and one can continuously monitor the e-processes and the rejection sets obtained. One would hope that the "stopped e-BH procedure" (se-BH) has an FDR guarantee for the rejection set obtained at any stopping time. However, while this is true if the data in different streams are independent, it is not true in full generality, because each stopped e-process is an e-value only for stopping times in its own local filtration, but the se-BH procedure employs a stopping time with respect to a global filtration. This can cause information to leak across time, allowing one stream to know its future by knowing past data of another stream. This paper formulates a simple causal condition under which local e-processes are also global e-processes and thus the se-BH procedure does indeed control the FDR. The condition excludes unobserved confounding from the past and is met under most reasonable scenarios including genomics.
Submitted 30 April, 2025; v1 submitted 12 February, 2025;
originally announced February 2025.
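For readers unfamiliar with the base e-BH procedure that the abstract above builds on, a minimal sketch follows. It implements the standard fixed-time rule (reject the hypotheses with the $k^*$ largest e-values, where $k^*$ is the largest $k$ such that the $k$-th largest e-value is at least $K/(kα)$), not the stopped se-BH variant or the causal condition studied in the paper. The function name `e_bh` and the input values are hypothetical.

```python
def e_bh(e_values, alpha):
    """Return the sorted indices of hypotheses rejected by e-BH at level alpha."""
    K = len(e_values)
    # Order hypothesis indices from largest to smallest e-value.
    order = sorted(range(K), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k in range(1, K + 1):
        # The k-th largest e-value must clear the threshold K / (k * alpha).
        if e_values[order[k - 1]] >= K / (k * alpha):
            k_star = k
    # Reject the k_star hypotheses with the largest e-values.
    return sorted(order[:k_star])


# Hypothetical e-values for five hypotheses.
rejected = e_bh([60.0, 1.2, 30.0, 0.5, 8.0], alpha=0.1)
```

In a sequential setting one could call such a function on the current values of the e-processes at each time step; whether the resulting rejection set retains an FDR guarantee at a data-dependent stopping time is exactly the subtlety the abstract describes.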
-
Nonasymptotic and distribution-uniform Komlós-Major-Tusnády approximation
Authors:
Ian Waudby-Smith,
Martin Larsson,
Aaditya Ramdas
Abstract:
We present nonasymptotic concentration inequalities for sums of independent and identically distributed random variables that yield asymptotic strong Gaussian approximations of Komlós, Major, and Tusnády (KMT) [1975,1976]. The constants appearing in our inequalities are either universal or explicit, and thus as corollaries, they imply distribution-uniform generalizations of the aforementioned KMT approximations. In particular, it is shown that uniform integrability of a random variable's $q^{\text{th}}$ moment is both necessary and sufficient for the KMT approximations to hold uniformly at the rate of $o(n^{1/q})$ for $q > 2$ and that having a uniformly lower bounded Sakhanenko parameter -- equivalently, a uniformly upper-bounded Bernstein parameter -- is both necessary and sufficient for the KMT approximations to hold uniformly at the rate of $O(\log n)$. Instantiating these uniform results for a single probability space yields the analogous results of KMT exactly.
Submitted 10 February, 2025;
originally announced February 2025.
-
Multiple testing in multi-stream sequential change detection
Authors:
Sanjit Dandapanthula,
Aaditya Ramdas
Abstract:
Multi-stream sequential change detection involves simultaneously monitoring many streams of data and trying to detect when their distributions change, if at all. Here, we theoretically study multiple testing issues that arise from detecting changes in many streams. We point out that any algorithm with finite average run length (ARL) must have a trivial worst-case false detection rate (FDR), family-wise error rate (FWER), per-family error rate (PFER), and global error rate (GER); thus, any attempt to control these Type I error metrics is fundamentally in conflict with the desire for a finite ARL (which is typically necessary in order to have a small detection delay). One of our contributions is to define a new class of metrics which can be controlled, called error over patience (EOP). We propose algorithms that combine the recent e-detector framework (which generalizes the Shiryaev-Roberts and CUSUM methods) with the recent e-Benjamini-Hochberg procedure and e-Bonferroni procedures. We prove that these algorithms control the EOP at any desired level under very general dependence structures on the data within and across the streams. In fact, we prove a more general error control that holds uniformly over all stopping times and provides a smooth trade-off between the conflicting metrics. Additionally, if finiteness of the ARL is forfeited, we show that our algorithms control the worst-case Type I error.
Submitted 3 February, 2025; v1 submitted 7 January, 2025;
originally announced January 2025.
-
Mean Estimation in Banach Spaces Under Infinite Variance and Martingale Dependence
Authors:
Justin Whitehouse,
Ben Chugg,
Diego Martinez-Taboada,
Aaditya Ramdas
Abstract:
We consider estimating the shared mean of a sequence of heavy-tailed random variables taking values in a Banach space. In particular, we revisit and extend a simple truncation-based mean estimator first proposed by Catoni and Giulini. While existing truncation-based approaches require a bound on the raw (non-central) second moment of observations, our results hold under a bound on either the central or non-central $p$th moment for some $p \in (1,2]$. Our analysis thus handles distributions with infinite variance. The main contributions of the paper follow from exploiting connections between truncation-based mean estimation and the concentration of martingales in smooth Banach spaces. We prove two types of time-uniform bounds on the distance between the estimator and unknown mean: line-crossing inequalities, which can be optimized for a fixed sample size $n$, and iterated logarithm inequalities, which match the tightness of line-crossing inequalities at all points in time up to a doubly logarithmic factor in $n$. Our results do not depend on the dimension of the Banach space, hold under martingale dependence, and all constants in the inequalities are known and small.
Submitted 24 March, 2025; v1 submitted 17 November, 2024;
originally announced November 2024.
-
Sharp Matrix Empirical Bernstein Inequalities
Authors:
Hongjian Wang,
Aaditya Ramdas
Abstract:
We present two sharp, closed-form empirical Bernstein inequalities for symmetric random matrices with bounded eigenvalues. By sharp, we mean that both inequalities adapt to the unknown variance in a tight manner: the deviation captured by the first-order $1/\sqrt{n}$ term asymptotically matches the matrix Bernstein inequality exactly, including constants, the latter requiring knowledge of the variance. Our first inequality holds for the sample mean of independent matrices, and our second inequality holds for a mean estimator under martingale dependence at stopping times.
Submitted 2 April, 2025; v1 submitted 14 November, 2024;
originally announced November 2024.
-
Hypothesis testing with e-values
Authors:
Aaditya Ramdas,
Ruodu Wang
Abstract:
This book is written to offer a humble, but unified, treatment of e-values in hypothesis testing. It is organized into three parts: Fundamental Concepts, Core Ideas, and Advanced Topics. The first part includes four chapters that introduce the basic concepts. The second part includes five chapters of core ideas such as universal inference, log-optimality, e-processes, operations on e-values, and e-values in multiple testing. The third part contains seven chapters of advanced topics. The book collates important results from a variety of modern papers on e-values and related concepts, and also contains many results not published elsewhere. It offers a coherent and comprehensive picture on a fast-growing research area, and is ready to use as the basis of a graduate course in statistics and related fields.
Submitted 4 May, 2025; v1 submitted 30 October, 2024;
originally announced October 2024.
-
Empirical Bernstein in smooth Banach spaces
Authors:
Diego Martinez-Taboada,
Aaditya Ramdas
Abstract:
Existing concentration bounds for bounded vector-valued random variables include extensions of the scalar Hoeffding and Bernstein inequalities. While the latter is typically tighter, it requires knowing a bound on the variance of the random variables. We derive a new vector-valued empirical Bernstein inequality, which makes use of an empirical estimator of the variance instead of the true variance. The bound holds in 2-smooth separable Banach spaces, which include finite dimensional Euclidean spaces and separable Hilbert spaces. The resulting confidence sets are instantiated for both the batch setting (where the sample size is fixed) and the sequential setting (where the sample size is a stopping time). The confidence set width asymptotically exactly matches that achieved by Bernstein in the leading term. The method and supermartingale proof technique combine several tools of Pinelis (1994) and Waudby-Smith and Ramdas (2024).
Submitted 9 September, 2024;
originally announced September 2024.
-
Anytime-Valid Inference for Double/Debiased Machine Learning of Causal Parameters
Authors:
Abhinandan Dalal,
Patrick Blöbaum,
Shiva Kasiviswanathan,
Aaditya Ramdas
Abstract:
Double (debiased) machine learning (DML) has seen widespread use in recent years for learning causal/structural parameters, in part due to its flexibility and adaptability to high-dimensional nuisance functions as well as its ability to avoid bias from regularization or overfitting. However, the classic double-debiased framework is only valid asymptotically for a predetermined sample size, thus lacking the flexibility of collecting more data if sharper inference is needed, or stopping data collection early if useful inferences can be made earlier than expected. This can be of particular concern in large-scale experimental studies with huge financial costs or human lives at stake, as well as in observational studies where the length of confidence intervals does not shrink to zero even with increasing sample size due to partial identifiability of a structural parameter. In this paper, we present time-uniform counterparts to the asymptotic DML results, enabling valid inference and confidence intervals for structural parameters to be constructed at any arbitrary (possibly data-dependent) stopping time. We provide conditions which are only slightly stronger than the standard DML conditions, but offer the stronger guarantee for anytime-valid inference. This facilitates the transformation of any existing DML method to provide anytime-valid guarantees with minimal modifications, making it highly adaptable and easy to use. We illustrate our procedure using two instances: a) local average treatment effect in online experiments with non-compliance, and b) partial identification of average treatment effect in observational studies with potential unmeasured confounding.
Submitted 10 September, 2024; v1 submitted 18 August, 2024;
originally announced August 2024.
-
Matrix Concentration: Order versus Anti-order
Authors:
Reihaneh Malekian,
Aaditya Ramdas
Abstract:
The matrix Markov inequality by Ahlswede was stated using the Loewner anti-order between positive definite matrices. Wang used this to derive several other Chebyshev and Chernoff-type inequalities (Hoeffding, Bernstein, empirical Bernstein) in the Loewner anti-order, including self-normalized matrix martingale inequalities. These imply upper tail bounds on the maximum eigenvalue, such as those developed by Tropp and Howard et al. The current paper develops analogs of all these inequalities in the Loewner order, rather than anti-order, by deriving a new matrix Markov inequality. These yield upper tail bounds on the minimum eigenvalue that are a factor of d tighter than the above bounds on the maximum eigenvalue.
Submitted 13 August, 2024; v1 submitted 12 August, 2024;
originally announced August 2024.
-
Testing by Betting while Borrowing and Bargaining
Authors:
Hongjian Wang,
Aaditya Ramdas
Abstract:
Testing by betting has been a cornerstone of the game-theoretic statistics literature. In this framework, a betting score (or more generally an e-process), as opposed to a traditional p-value, is used to quantify the evidence against a null hypothesis: the higher the betting score, the more money one has made betting against the null, and thus the larger the evidence that the null is false. A key ingredient assumed throughout past works is that one cannot bet more money than one currently has. In this paper, we ask what happens if the bettor is allowed to borrow money after going bankrupt, allowing further financial flexibility in this game of hypothesis testing. We propose various definitions of (adjusted) evidence relative to the wealth borrowed, indebted, and accumulated. We also ask what happens if the bettor can "bargain", in order to obtain odds better than specified by the null hypothesis. The adjustment of wealth in order to serve as evidence appeals to the characterization of arbitrage, interest rates, and numéraire-adjusted pricing in this setting.
Submitted 16 July, 2024;
originally announced July 2024.
-
Combining exchangeable p-values
Authors:
Matteo Gasparin,
Ruodu Wang,
Aaditya Ramdas
Abstract:
The problem of combining p-values is an old and fundamental one, and the classic assumption of independence is often violated or unverifiable in many applications. There are many well-known rules that can combine a set of arbitrarily dependent p-values (for the same hypothesis) into a single p-value. We show that essentially all these existing rules can be strictly improved when the p-values are exchangeable, or when external randomization is allowed (or both). For example, we derive randomized and/or exchangeable improvements of well known rules like ``twice the median'' and ``twice the average'', as well as geometric and harmonic means. Exchangeable p-values are often produced one at a time (for example, under repeated tests involving data splitting), and our rules can combine them sequentially as they are produced, stopping when the combined p-values stabilize. Our work also improves rules for combining arbitrarily dependent p-values, since the latter become exchangeable if they are presented to the analyst in a random order. The main technical advance is to show that all existing combination rules can be obtained by calibrating the p-values to e-values (using an $α$-dependent calibrator), averaging those e-values, converting to a level-$α$ test using Markov's inequality, and finally obtaining p-values by combining this family of tests; the improvements are delivered via recent randomized and exchangeable variants of Markov's inequality.
Submitted 20 March, 2025; v1 submitted 4 April, 2024;
originally announced April 2024.
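As a point of reference for the abstract above, the two classical baseline rules it names, ``twice the median'' and ``twice the average'', can be sketched in a few lines. This shows only the existing rules for arbitrarily dependent p-values, capped at 1; the randomized and exchangeable improvements derived in the paper are not reproduced here. The function names and the sample p-values are hypothetical.

```python
from statistics import mean, median


def twice_the_average(p_values):
    """Classical rule: twice the arithmetic mean of the p-values, capped at 1."""
    return min(1.0, 2.0 * mean(p_values))


def twice_the_median(p_values):
    """Classical rule: twice the median of the p-values, capped at 1.

    statistics.median averages the two middle values when the count is even,
    which is never smaller than the lower middle value.
    """
    return min(1.0, 2.0 * median(p_values))


# Hypothetical p-values for the same hypothesis, e.g. from repeated data splits.
p_values = [0.01, 0.04, 0.03, 0.20, 0.02]
combined_avg = twice_the_average(p_values)
combined_med = twice_the_median(p_values)
```

Both rules pay a factor of two for making no assumption about the dependence among the p-values, which is the slack the paper's exchangeable and randomized variants aim to reduce.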
-
The numeraire e-variable and reverse information projection
Authors:
Martin Larsson,
Aaditya Ramdas,
Johannes Ruf
Abstract:
We consider testing a composite null hypothesis $\mathcal{P}$ against a point alternative $\mathsf{Q}$ using e-variables, which are nonnegative random variables $X$ such that $\mathbb{E}_\mathsf{P}[X] \leq 1$ for every $\mathsf{P} \in \mathcal{P}$. This paper establishes a fundamental result: under no conditions whatsoever on $\mathcal{P}$ or $\mathsf{Q}$, there exists a special e-variable $X^*$ that we call the numeraire, which is strictly positive and satisfies $\mathbb{E}_\mathsf{Q}[X/X^*] \leq 1$ for every other e-variable $X$. In particular, $X^*$ is log-optimal in the sense that $\mathbb{E}_\mathsf{Q}[\log(X/X^*)] \leq 0$. Moreover, $X^*$ identifies a particular sub-probability measure $\mathsf{P}^*$ via the density $d \mathsf{P}^*/d \mathsf{Q} = 1/X^*$. As a result, $X^*$ can be seen as a generalized likelihood ratio of $\mathsf{Q}$ against $\mathcal{P}$. We show that $\mathsf{P}^*$ coincides with the reverse information projection (RIPr) when additional assumptions are made that are required for the latter to exist. Thus $\mathsf{P}^*$ is a natural definition of the RIPr in the absence of any assumptions on $\mathcal{P}$ or $\mathsf{Q}$. In addition to the abstract theory, we provide several tools for finding the numeraire and RIPr in concrete cases. We discuss several nonparametric examples where we can indeed identify the numeraire and RIPr, despite not having a reference measure. Our results have interpretations outside of testing in that they yield the optimal Kelly bet against $\mathcal{P}$ if we believe reality follows $\mathsf{Q}$. We end with a more general optimality theory that goes beyond the ubiquitous logarithmic utility. We focus on certain power utilities, leading to reverse Rényi projections in place of the RIPr, which also always exist.
Submitted 3 February, 2025; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Combining Evidence Across Filtrations
Authors:
Yo Joong Choe,
Aaditya Ramdas
Abstract:
In sequential anytime-valid inference, any admissible procedure must be based on e-processes: generalizations of test martingales that quantify the accumulated evidence against a composite null hypothesis at any stopping time. This paper proposes a method for combining e-processes constructed in different filtrations but for the same null. Although e-processes in the same filtration can be combined effortlessly (by averaging), e-processes in different filtrations cannot because their validity in a coarser filtration does not translate to a finer filtration. This issue arises in sequential tests of randomness and independence, as well as in the evaluation of sequential forecasters. We establish that a class of functions called adjusters can lift arbitrary e-processes across filtrations. The result yields a generally applicable "adjust-then-combine" procedure, which we demonstrate on the problem of testing randomness in real-world financial data. Furthermore, we prove a characterization theorem for adjusters that formalizes a sense in which using adjusters is necessary. There are two major implications. First, if we have a powerful e-process in a coarsened filtration, then we readily have a powerful e-process in the original filtration. Second, when we coarsen the filtration to construct an e-process, there is a logarithmic cost to recovering validity in the original filtration.
Submitted 15 February, 2025; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Distribution-uniform strong laws of large numbers
Authors:
Ian Waudby-Smith,
Martin Larsson,
Aaditya Ramdas
Abstract:
We revisit the question of whether the strong law of large numbers (SLLN) holds uniformly in a rich family of distributions, culminating in a distribution-uniform generalization of the Marcinkiewicz-Zygmund SLLN. These results can be viewed as extensions of Chung's distribution-uniform SLLN to random variables with uniformly integrable $q^\text{th}$ absolute central moments for $0 < q < 2$. Furthermore, we show that uniform integrability of the $q^\text{th}$ moment is both sufficient and necessary for the SLLN to hold uniformly at the Marcinkiewicz-Zygmund rate of $n^{1/q - 1}$. These proofs centrally rely on novel distribution-uniform analogues of some familiar almost sure convergence results including the Khintchine-Kolmogorov convergence theorem, Kolmogorov's three-series theorem, a stochastic generalization of Kronecker's lemma, and the Borel-Cantelli lemmas. We also consider the non-identically distributed case.
Submitted 21 October, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Positive Semidefinite Matrix Supermartingales
Authors:
Hongjian Wang,
Aaditya Ramdas
Abstract:
We explore the asymptotic convergence and nonasymptotic maximal inequalities of supermartingales and backward submartingales in the space of positive semidefinite matrices. These are natural matrix analogs of scalar nonnegative supermartingales and backward nonnegative submartingales, whose convergence and maximal inequalities are the theoretical foundations for a wide and ever-growing body of results in statistics, econometrics, and theoretical computer science.
Our results lead to new concentration inequalities for either martingale dependent or exchangeable random symmetric matrices under a variety of tail conditions, encompassing now-standard Chernoff bounds to self-normalized heavy-tailed settings. Further, these inequalities are usually expressed in the Loewner order, are sometimes valid simultaneously for all sample sizes or at an arbitrary data-dependent stopping time, and can often be tightened via an external randomization factor.
Submitted 28 January, 2025; v1 submitted 27 January, 2024;
originally announced January 2024.
-
Graph fission and cross-validation
Authors:
James Leiner,
Aaditya Ramdas
Abstract:
We introduce a technique called graph fission which takes in a graph which potentially contains only one observation per node (whose distribution lies in a known class) and produces two (or more) independent graphs with the same node/edge set in a way that splits the original graph's information amongst them in any desired proportion. Our proposal builds on data fission/thinning, a method that uses external randomization to create independent copies of an unstructured dataset. We extend this idea to the graph setting where there may be latent structure between observations. We demonstrate the utility of this framework via two applications: inference after structural trend estimation on graphs and a model selection procedure we term "graph cross-validation".
△ Less
Submitted 29 January, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
Time-Uniform Confidence Spheres for Means of Random Vectors
Authors:
Ben Chugg,
Hongjian Wang,
Aaditya Ramdas
Abstract:
We study sequential mean estimation in $\mathbb{R}^d$. In particular, we derive time-uniform confidence spheres -- confidence sphere sequences (CSSs) -- which contain the mean of random vectors with high probability simultaneously across all sample sizes. Our results include a dimension-free CSS for log-concave random vectors, a dimension-free CSS for sub-Gaussian random vectors, and CSSs for sub-$ψ$ random vectors (which includes sub-gamma, sub-Poisson, and sub-exponential distributions). Many of our results are optimal. For sub-Gaussian distributions we also provide a CSS which tracks a time-varying mean, generalizing Robbins' mixture approach to the multivariate setting. Finally, we provide several CSSs for heavy-tailed random vectors (two moments only). Our bounds hold under a martingale assumption on the mean and do not require that the observations be iid. Our work is based on PAC-Bayesian theory and inspired by an approach of Catoni and Giulini.
Submitted 14 May, 2025; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Distribution-uniform anytime-valid sequential inference
Authors:
Ian Waudby-Smith,
Edward H. Kennedy,
Aaditya Ramdas
Abstract:
Are asymptotic confidence sequences and anytime $p$-values uniformly valid for a nontrivial class of distributions $\mathcal{P}$? We give a positive answer to this question by deriving distribution-uniform anytime-valid inference procedures. Historically, anytime-valid methods -- including confidence sequences, anytime $p$-values, and sequential hypothesis tests that enable inference at stopping times -- have been justified nonasymptotically. Nevertheless, asymptotic procedures such as those based on the central limit theorem occupy an important part of the statistical toolbox due to their simplicity, universality, and weak assumptions. While recent work has derived asymptotic analogues of anytime-valid methods with the aforementioned benefits, these were not shown to be $\mathcal{P}$-uniform, meaning that their asymptotics are not uniformly valid in a class of distributions $\mathcal{P}$. Indeed, the anytime-valid inference literature currently has no central limit theory to draw from that is both uniform in $\mathcal{P}$ and in the sample size $n$. This paper fills that gap by deriving a novel $\mathcal{P}$-uniform strong Gaussian approximation theorem. We apply some of these results to obtain an anytime-valid test of conditional independence without the Model-X assumption, as well as a $\mathcal{P}$-uniform law of the iterated logarithm.
Submitted 18 April, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Time-Uniform Self-Normalized Concentration for Vector-Valued Processes
Authors:
Justin Whitehouse,
Zhiwei Steven Wu,
Aaditya Ramdas
Abstract:
Self-normalized processes arise naturally in many learning-related tasks. While self-normalized concentration has been extensively studied for scalar-valued processes, there are few results for multidimensional processes outside of the sub-Gaussian setting. In this work, we construct a general, self-normalized inequality for multivariate processes that satisfy a simple yet broad sub-$ψ$ tail condition, which generalizes assumptions based on cumulant generating functions. From this general inequality, we derive an upper law of the iterated logarithm for sub-$ψ$ vector-valued processes, which is tight up to small constants. We show how our inequality can be leveraged to derive a variety of novel, self-normalized concentration inequalities under both light and heavy-tailed observations. Further, we provide applications in prototypical statistical tasks, such as parameter estimation in online linear regression, autoregressive modeling, and bounded mean estimation via a new (multivariate) empirical Bernstein concentration inequality.
Submitted 30 April, 2025; v1 submitted 13 October, 2023;
originally announced October 2023.
-
Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance
Authors:
Hongjian Wang,
Aaditya Ramdas
Abstract:
In 1976, Lai constructed a nontrivial confidence sequence for the mean $μ$ of a Gaussian distribution with unknown variance $σ^2$. Curiously, he employed both an improper (right Haar) mixture over $σ$ and an improper (flat) mixture over $μ$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an "e-process" (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $σ$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious polynomial dependence on the error probability $α$ that we prove to be not only unavoidable, but (for universal inference) even better than the classical fixed-sample t-test. Numerical experiments are provided along the way to compare and contrast the various approaches, including some recent suboptimal ones.
Submitted 6 November, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
On the near-optimality of betting confidence sets for bounded means
Authors:
Shubhanshu Shekhar,
Aaditya Ramdas
Abstract:
Constructing nonasymptotic confidence intervals (CIs) for the mean of a univariate distribution from independent and identically distributed (i.i.d.) observations is a fundamental task in statistics. For bounded observations, a classical nonparametric approach proceeds by inverting standard concentration bounds, such as Hoeffding's or Bernstein's inequalities. Recently, an alternative betting-based approach for defining CIs and their time-uniform variants called confidence sequences (CSs), has been shown to be empirically superior to the classical methods. In this paper, we provide theoretical justification for this improved empirical performance of betting CIs and CSs.
Our main contributions are as follows: (i) We first compare CIs using the values of their first-order asymptotic widths (scaled by $\sqrt{n}$), and show that the betting CI of Waudby-Smith and Ramdas (2023) has a smaller limiting width than existing empirical Bernstein (EB)-CIs. (ii) Next, we establish two lower bounds that characterize the minimum width achievable by any method for constructing CIs/CSs in terms of certain inverse information projections. (iii) Finally, we show that the betting CI and CS match the fundamental limits, modulo an additive logarithmic term and a multiplicative constant. Overall, these results imply that the betting CI (and CS) admit stronger theoretical guarantees than the existing state-of-the-art EB-CI (and CS), both in the asymptotic and finite-sample regimes.
Submitted 24 November, 2023; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Reducing sequential change detection to sequential estimation
Authors:
Shubhanshu Shekhar,
Aaditya Ramdas
Abstract:
We consider the problem of sequential change detection, where the goal is to design a scheme for detecting any changes in a parameter or functional $θ$ of the data stream distribution that has small detection delay, but guarantees control on the frequency of false alarms in the absence of changes. In this paper, we describe a simple reduction from sequential change detection to sequential estimation using confidence sequences: we begin a new $(1-α)$-confidence sequence at each time step, and proclaim a change when the intersection of all active confidence sequences becomes empty. We prove that the average run length is at least $1/α$, resulting in a change detection scheme with minimal structural assumptions (thus allowing for possibly dependent observations, and nonparametric distribution classes), but strong guarantees. Our approach bears an interesting parallel with the reduction from change detection to sequential testing of Lorden (1971) and the e-detector of Shin et al. (2022).
△ Less
Submitted 24 November, 2023; v1 submitted 16 September, 2023;
originally announced September 2023.
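As a rough illustration of the reduction described in the abstract above (start a new confidence sequence at every time step and flag a change once the active intervals share no common point), here is a minimal Python sketch. It is not the authors' implementation: the Hoeffding-plus-union-bound confidence sequence for independent observations in [0, 1] with a common mean, and the names `hoeffding_radius` and `detect_change`, are assumptions chosen only to keep the example short.

```python
import math


def hoeffding_radius(n, alpha):
    """Half-width after n observations of a crude (1 - alpha) confidence sequence.

    Assumption for illustration: independent observations in [0, 1] with a
    common mean. Hoeffding's inequality is applied at each n with error budget
    alpha / (n * (n + 1)), which sums to alpha over all n >= 1.
    """
    return math.sqrt(math.log(2 * n * (n + 1) / alpha) / (2 * n))


def detect_change(xs, alpha=0.01):
    """Return the first time t (1-indexed) at which the intervals of all
    active confidence sequences have an empty intersection, or None."""
    sums = []  # sums[s] = running sum for the sequence started at time s + 1
    for t, x in enumerate(xs, start=1):
        sums.append(0.0)          # start a new confidence sequence at time t
        lo, hi = -math.inf, math.inf
        for s in range(len(sums)):
            sums[s] += x
            n = t - s             # observations seen by this sequence so far
            mean = sums[s] / n
            r = hoeffding_radius(n, alpha)
            lo = max(lo, mean - r)   # largest lower endpoint at time t
            hi = min(hi, mean + r)   # smallest upper endpoint at time t
        if lo > hi:               # the intervals no longer share a point
            return t
    return None


if __name__ == "__main__":
    stream = [0.1] * 200 + [0.9] * 200   # mean shifts after time 200
    print(detect_change(stream, alpha=0.01))
```

The sketch recomputes every active interval at each step, so it costs O(t) work at time t; a tighter confidence sequence could be substituted for `hoeffding_radius` without changing the detection loop.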
-
On the Sublinear Regret of GP-UCB
Authors:
Justin Whitehouse,
Zhiwei Steven Wu,
Aaditya Ramdas
Abstract:
In the kernelized bandit problem, a learner aims to sequentially compute the optimum of a function lying in a reproducing kernel Hilbert space given only noisy evaluations at sequentially chosen points. In particular, the learner aims to minimize regret, which is a measure of the suboptimality of the choices made. Arguably the most popular algorithm is the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm, which involves acting based on a simple linear estimator of the unknown function. Despite its popularity, existing analyses of GP-UCB give a suboptimal regret rate, which fails to be sublinear for many commonly used kernels such as the Matérn kernel. This has led to a longstanding open question: are existing regret analyses for GP-UCB tight, or can bounds be improved by using more sophisticated analytical techniques? In this work, we resolve this open question and show that GP-UCB enjoys nearly optimal regret. In particular, our results yield sublinear regret rates for the Matérn kernel, improving over the state-of-the-art analyses and partially resolving a COLT open problem posed by Vakili et al. Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel $k$. Applying this key idea together with a largely overlooked concentration result in separable Hilbert spaces (for which we provide an independent, simplified derivation), we are able to provide a tighter analysis of the GP-UCB algorithm.
△ Less
Submitted 14 August, 2023; v1 submitted 14 July, 2023;
originally announced July 2023.
-
On the existence of powerful p-values and e-values for composite hypotheses
Authors:
Zhenyuan Zhang,
Aaditya Ramdas,
Ruodu Wang
Abstract:
Given a composite null $ \mathcal P$ and composite alternative $ \mathcal Q$, when and how can we construct a p-value whose distribution is exactly uniform under the null, and stochastically smaller than uniform under the alternative? Similarly, when and how can we construct an e-value whose expectation exactly equals one under the null, but its expected logarithm under the alternative is positive? We answer these basic questions, and other related ones, when $ \mathcal P$ and $ \mathcal Q$ are convex polytopes (in the space of probability measures). We prove that such constructions are possible if and only if $ \mathcal Q$ does not intersect the span of $ \mathcal P$. If the p-value is allowed to be stochastically larger than uniform under $P\in \mathcal P$, and the e-value can have expectation at most one under $P\in \mathcal P$, then it is achievable whenever $ \mathcal P$ and $ \mathcal Q$ are disjoint. More generally, even when $ \mathcal P$ and $ \mathcal Q$ are not polytopes, we characterize the existence of a bounded nontrivial e-variable whose expectation exactly equals one under any $P \in \mathcal P$. The proofs utilize recently developed techniques in simultaneous optimal transport. A key role is played by coarsening the filtration: sometimes, no such p-value or e-value exists in the richest data filtration, but it does exist in some reduced filtration, and our work provides the first general characterization of this phenomenon. We also provide an iterative construction that explicitly constructs such processes, and under certain conditions it finds the one that grows fastest under a specific alternative $Q$. We discuss implications for the construction of composite nonnegative (super)martingales, and end with some conjectures and open problems.
Submitted 30 November, 2024; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Risk-limiting Financial Audits via Weighted Sampling without Replacement
Authors:
Shubhanshu Shekhar,
Ziyu Xu,
Zachary C. Lipton,
Pierre J. Liang,
Aaditya Ramdas
Abstract:
We introduce the notion of risk-limiting financial auditing (RLFA): given $N$ transactions, the goal is to estimate the total misstated monetary fraction ($m^*$) to a given accuracy $ε$, with confidence $1-δ$. We do this by constructing new confidence sequences (CSs) for the weighted average of $N$ unknown values, based on samples drawn without replacement according to a (randomized) weighted sampling scheme. Using the idea of importance weighting to construct test martingales, we first develop a framework to construct CSs for arbitrary sampling strategies. Next, we develop methods to improve the quality of CSs by incorporating side information about the unknown values associated with each item. We show that when the side information is sufficiently predictive, it can directly drive the sampling. Addressing the case where the accuracy is unknown a priori, we introduce a method that incorporates side information via control variates. Crucially, our construction is adaptive: if the side information is highly predictive of the unknown misstated amounts, then the benefits of incorporating it are significant; but if the side information is uncorrelated, our methods learn to ignore it. Our methods recover state-of-the-art bounds for the special case when the weights are equal, which has already found applications in election auditing. The harder weighted case solves our more challenging problem of AI-assisted financial auditing.
△ Less
Submitted 8 May, 2023;
originally announced May 2023.
-
Sequential Predictive Two-Sample and Independence Testing
Authors:
Aleksandr Podkopaev,
Aaditya Ramdas
Abstract:
We study the problems of sequential nonparametric two-sample and independence testing. Sequential tests process data online and allow using observed data to decide whether to stop and reject the null hypothesis or to collect more data, while maintaining type I error control. We build upon the principle of (nonparametric) testing by betting, where a gambler places bets on future observations and their wealth measures evidence against the null hypothesis. While recently developed kernel-based betting strategies often work well on simple distributions, selecting a suitable kernel for high-dimensional or structured data, such as images, is often nontrivial. To address this drawback, we design prediction-based betting strategies that rely on the following fact: if a sequentially updated predictor starts to consistently determine (a) which distribution an instance is drawn from, or (b) whether an instance is drawn from the joint distribution or the product of the marginal distributions (the latter produced by external randomization), it provides evidence against the two-sample or independence nulls respectively. We empirically demonstrate the superiority of our tests over kernel-based approaches under structured settings. Our tests can be applied beyond the case of independent and identically distributed data, remaining valid and powerful even when the data distribution drifts over time.
△ Less
Submitted 19 July, 2023; v1 submitted 28 April, 2023;
originally announced May 2023.
-
Online Platt Scaling with Calibeating
Authors:
Chirag Gupta,
Aaditya Ramdas
Abstract:
We present an online post-hoc calibration method, called Online Platt Scaling (OPS), which combines the Platt scaling technique with online logistic regression. We demonstrate that OPS smoothly adapts between i.i.d. and non-i.i.d. settings with distribution drift. Further, in scenarios where the best Platt scaling model is itself miscalibrated, we enhance OPS by incorporating a recently developed technique called calibeating to make it more robust. Theoretically, our resulting OPS+calibeating method is guaranteed to be calibrated for adversarial outcome sequences. Empirically, it is effective on a range of synthetic and real-world datasets, with and without distribution drifts, achieving superior performance without hyperparameter tuning. Finally, we extend all OPS ideas to the beta scaling method.
△ Less
Submitted 16 August, 2023; v1 submitted 28 April, 2023;
originally announced May 2023.
-
De Finetti's theorem and related results for infinite weighted exchangeable sequences
Authors:
Rina Foygel Barber,
Emmanuel J. Candes,
Aaditya Ramdas,
Ryan J. Tibshirani
Abstract:
De Finetti's theorem, also called the de Finetti-Hewitt-Savage theorem, is a foundational result in probability and statistics. Roughly, it says that an infinite sequence of exchangeable random variables can always be written as a mixture of independent and identically distributed (i.i.d.) sequences of random variables. In this paper, we consider a weighted generalization of exchangeability that allows for weight functions to modify the individual distributions of the random variables along the sequence, provided that -- modulo these weight functions -- there is still some common exchangeable base measure. We study conditions under which a de Finetti-type representation exists for weighted exchangeable sequences, as a mixture of distributions which satisfy a weighted form of the i.i.d. property. Our approach establishes a nested family of conditions that lead to weighted extensions of other well-known related results as well, in particular, extensions of the zero-one law and the law of large numbers.
Submitted 27 November, 2023; v1 submitted 8 April, 2023;
originally announced April 2023.
-
Randomized and Exchangeable Improvements of Markov's, Chebyshev's and Chernoff's Inequalities
Authors:
Aaditya Ramdas,
Tudor Manole
Abstract:
We present simple randomized and exchangeable improvements of Markov's inequality, as well as Chebyshev's inequality and Chernoff bounds. Our variants are never worse and typically strictly more powerful than the original inequalities. The proofs are short and elementary, and can easily yield similarly randomized or exchangeable versions of a host of other inequalities that employ Markov's inequality as an intermediate step. We point out some simple statistical applications involving tests that combine dependent e-values. In particular, we uniformly improve the power of universal inference, and obtain tighter betting-based nonparametric confidence intervals. Simulations reveal nontrivial gains in power (and no losses) in a variety of settings.
△ Less
Submitted 9 May, 2023; v1 submitted 5 April, 2023;
originally announced April 2023.
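To make the abstract above concrete: the uniformly randomized form of Markov's inequality says that for a nonnegative random variable X and an independent U ~ Uniform(0, 1), P(X ≥ aU) ≤ E[X]/a for every a > 0, which follows from P(U ≤ X/a | X) = min(X/a, 1) ≤ X/a. The Python sketch below checks this by simulation. The exponential distribution, the threshold, and the function name `rejection_rates` are assumptions made purely for illustration and do not come from the listing.

```python
import random


def rejection_rates(a, n_draws=200_000, seed=0):
    """Monte Carlo frequencies of {X >= a} and {X >= a * U}.

    Assumptions for illustration: X ~ Exponential(1), so E[X] = 1, and
    U ~ Uniform(0, 1) drawn independently of X. Both events are controlled
    at level E[X] / a = 1 / a, the first by Markov's inequality and the
    second by its randomized version.
    """
    rng = random.Random(seed)
    plain = 0
    randomized = 0
    for _ in range(n_draws):
        x = rng.expovariate(1.0)   # nonnegative draw with mean 1
        u = rng.random()           # independent uniform on [0, 1)
        plain += x >= a            # classical Markov event
        randomized += x >= a * u   # randomized Markov event (easier to hit)
    return plain / n_draws, randomized / n_draws


if __name__ == "__main__":
    a = 5.0
    plain, randomized = rejection_rates(a)
    print(f"Markov bound 1/a      : {1.0 / a:.4f}")
    print(f"freq of X >= a        : {plain:.4f}")
    print(f"freq of X >= a * U    : {randomized:.4f}")
```

Because U < 1, the randomized event contains the classical one, so it can only occur more often. In this exponential example its exact probability is (1 - e^{-5})/5, which sits just below the bound 1/a, whereas the classical event has probability e^{-5}.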
-
The extended Ville's inequality for nonintegrable nonnegative supermartingales
Authors:
Hongjian Wang,
Aaditya Ramdas
Abstract:
Following the initial work by Robbins, we rigorously present an extended theory of nonnegative supermartingales, requiring neither integrability nor finiteness. In particular, we derive a key maximal inequality foreshadowed by Robbins, which we call the extended Ville's inequality, that strengthens the classical Ville's inequality (for integrable nonnegative supermartingales), and also applies to our nonintegrable setting. We derive an extension of the method of mixtures, which applies to $σ$-finite mixtures of our extended nonnegative supermartingales. We present some implications of our theory for sequential statistics, such as the use of improper mixtures (priors) in deriving nonparametric confidence sequences and (extended) e-processes.
Submitted 8 October, 2024; v1 submitted 3 April, 2023;
originally announced April 2023.
-
A unified recipe for deriving (time-uniform) PAC-Bayes bounds
Authors:
Ben Chugg,
Hongjian Wang,
Aaditya Ramdas
Abstract:
We present a unified framework for deriving PAC-Bayesian generalization bounds. Unlike most previous literature on this topic, our bounds are anytime-valid (i.e., time-uniform), meaning that they hold at all stopping times, not only for a fixed sample size. Our approach combines four tools in the following order: (a) nonnegative supermartingales or reverse submartingales, (b) the method of mixtures, (c) the Donsker-Varadhan formula (or other convex duality principles), and (d) Ville's inequality. Our main result is a PAC-Bayes theorem which holds for a wide class of discrete stochastic processes. We show how this result implies time-uniform versions of well-known classical PAC-Bayes bounds, such as those of Seeger, McAllester, Maurer, and Catoni, in addition to many recent bounds. We also present several novel bounds. Our framework also enables us to relax traditional assumptions; in particular, we consider nonstationary loss functions and non-i.i.d. data. In sum, we unify the derivation of past bounds and ease the search for future bounds: one may simply check if our supermartingale or submartingale conditions are met and, if so, be guaranteed a (time-uniform) PAC-Bayes bound.
Submitted 3 January, 2024; v1 submitted 7 February, 2023;
originally announced February 2023.
-
Sequential change detection via backward confidence sequences
Authors:
Shubhanshu Shekhar,
Aaditya Ramdas
Abstract:
We present a simple reduction from sequential estimation to sequential changepoint detection (SCD). In short, suppose we are interested in detecting changepoints in some parameter or functional $θ$ of the underlying distribution. We demonstrate that if we can construct a confidence sequence (CS) for $θ$, then we can also successfully perform SCD for $θ$. This is accomplished by checking if two CSs -- one forwards and the other backwards -- ever fail to intersect. Since the literature on CSs has been rapidly evolving recently, the reduction provided in this paper immediately solves several old and new change detection problems. Further, our "backward CS", constructed by reversing time, is new and potentially of independent interest. We provide strong nonasymptotic guarantees on the frequency of false alarms and detection delay, and demonstrate numerical effectiveness on several problems.
Submitted 5 February, 2023;
originally announced February 2023.
-
Huber-Robust Confidence Sequences
Authors:
Hongjian Wang,
Aaditya Ramdas
Abstract:
Confidence sequences are confidence intervals that can be sequentially tracked, and are valid at arbitrary data-dependent stopping times. This paper presents confidence sequences for a univariate mean of an unknown distribution with a known upper bound on the $p$-th central moment ($p > 1$), but allowing for (at most) an $ε$ fraction of arbitrary distribution corruption, as in Huber's contamination model. We do this by designing new robust exponential supermartingales, and show that the resulting confidence sequences attain the optimal width achieved in the nonsequential setting. Perhaps surprisingly, the constant margin between our sequential result and the lower bound is smaller than even fixed-time robust confidence intervals based on the trimmed mean, for example. Since confidence sequences are a common tool used within A/B/n testing and bandits, these results open the door to sequential experimentation that is robust to outliers and adversarial corruptions.
Submitted 7 February, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
A Sequential Test for Log-Concavity
Authors:
Aditya Gangrade,
Alessandro Rinaldo,
Aaditya Ramdas
Abstract:
On observing a sequence of i.i.d. data with distribution $P$ on $\mathbb{R}^d$, we ask the question of how one can test the null hypothesis that $P$ has a log-concave density. This paper proves one interesting negative result and one positive result: the non-existence of test (super)martingales, and the consistency of universal inference. To elaborate, the set of log-concave distributions $\mathcal{L}$ is a nonparametric class, which contains the set $\mathcal G$ of all possible Gaussians with any mean and covariance. Developing further the recent geometric concept of fork-convexity, we first prove that there do not exist any nontrivial test martingales or test supermartingales for $\mathcal G$ (a process that is simultaneously a nonnegative supermartingale for every distribution in $\mathcal G$), and hence also for its superset $\mathcal{L}$. Due to this negative result, we turn our attention to constructing an e-process -- a process whose expectation at any stopping time is at most one, under any distribution in $\mathcal{L}$ -- which yields a level-$α$ test by simply thresholding at $1/α$. We take the approach of universal inference, which avoids intractable likelihood asymptotics by taking the ratio of a nonanticipating likelihood over alternatives against the maximum likelihood under the null. Despite its conservatism, we show that the resulting test is consistent (power one), and derive its power against Hellinger alternatives. To the best of our knowledge, there is no other e-process or sequential test for $\mathcal{L}$.
Submitted 9 January, 2023;
originally announced January 2023.
-
Multiple testing under negative dependence
Authors:
Ziyu Chi,
Aaditya Ramdas,
Ruodu Wang
Abstract:
The multiple testing literature has primarily dealt with three types of dependence assumptions between p-values: independence, positive regression dependence, and arbitrary dependence. In this paper, we provide what we believe are the first theoretical results under various notions of negative dependence (negative Gaussian dependence, negative regression dependence, negative association, negative orthant dependence and weak negative dependence). These include the Simes global null test and the Benjamini-Hochberg procedure, which are known experimentally to be anti-conservative under negative dependence. The anti-conservativeness of these procedures is bounded by factors smaller than that under arbitrary dependence (in particular, by factors independent of the number of hypotheses). We also provide new results about negatively dependent e-values, and provide several examples as to when negative dependence may arise. Our proofs are elementary and short, thus amenable to extensions.
Submitted 8 May, 2024; v1 submitted 19 December, 2022;
originally announced December 2022.
-
A Permutation-Free Kernel Independence Test
Authors:
Shubhanshu Shekhar,
Ilmun Kim,
Aaditya Ramdas
Abstract:
In nonparametric independence testing, we observe i.i.d. data $\{(X_i,Y_i)\}_{i=1}^n$, where $X \in \mathcal{X}, Y \in \mathcal{Y}$ lie in any general spaces, and we wish to test the null that $X$ is independent of $Y$. Modern test statistics such as the kernel Hilbert-Schmidt Independence Criterion (HSIC) and Distance Covariance (dCov) have intractable null distributions due to the degeneracy of the underlying U-statistics. Thus, in practice, one often resorts to using permutation testing, which provides a nonasymptotic guarantee at the expense of recalculating the quadratic-time statistics (say) a few hundred times. This paper provides a simple but nontrivial modification of HSIC and dCov (called xHSIC and xdCov, pronounced "cross" HSIC/dCov) so that they have a limiting Gaussian distribution under the null, and thus do not require permutations. This requires building on the newly developed theory of cross U-statistics by Kim and Ramdas (2020), and in particular developing several nontrivial extensions of the theory in Shekhar et al. (2022), which developed an analogous permutation-free kernel two-sample test. We show that our new tests, like the originals, are consistent against fixed alternatives, and minimax rate optimal against smooth local alternatives. Numerical simulations demonstrate that compared to the full dCov or HSIC, our variants have the same power up to a $\sqrt 2$ factor, giving practitioners a new option for large problems or data-analysis pipelines where computation, not sample size, could be the bottleneck.
Submitted 18 December, 2022;
originally announced December 2022.
-
Sequential Kernelized Independence Testing
Authors:
Aleksandr Podkopaev,
Patrick Blöbaum,
Shiva Prasad Kasiviswanathan,
Aaditya Ramdas
Abstract:
Independence testing is a classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. Classical batch tests are not tailored for streaming data: valid inference after data peeking requires correcting for multiple testing which results in low power. Following the principle of testing by betting, we design sequential kernelized independence tests that overcome such shortcomings. We exemplify our broad framework using bets inspired by kernelized dependence measures, e.g., the Hilbert-Schmidt independence criterion. Our test is also valid under non-i.i.d., time-varying settings. We demonstrate the power of our approaches on both simulated and real data.
Submitted 19 July, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
A Permutation-free Kernel Two-Sample Test
Authors:
Shubhanshu Shekhar,
Ilmun Kim,
Aaditya Ramdas
Abstract:
The kernel Maximum Mean Discrepancy (MMD) is a popular multivariate distance metric between distributions that has found utility in two-sample testing. The usual kernel-MMD test statistic is a degenerate U-statistic under the null, and thus it has an intractable limiting distribution. Hence, to design a level-$α$ test, one usually selects the rejection threshold as the $(1-α)$-quantile of the permutation distribution. The resulting nonparametric test has finite-sample validity but suffers from large computational cost, since every permutation takes quadratic time. We propose the cross-MMD, a new quadratic-time MMD test statistic based on sample-splitting and studentization. We prove that under mild assumptions, the cross-MMD has a limiting standard Gaussian distribution under the null. Importantly, we also show that the resulting test is consistent against any fixed alternative, and when using the Gaussian kernel, it has minimax rate-optimal power against local alternatives. For large sample sizes, our new cross-MMD provides a significant speedup over the MMD, for only a slight loss in power.
Submitted 4 February, 2023; v1 submitted 27 November, 2022;
originally announced November 2022.
-
Anytime-valid off-policy inference for contextual bandits
Authors:
Ian Waudby-Smith,
Lili Wu,
Aaditya Ramdas,
Nikos Karampatziakis,
Paul Mineiro
Abstract:
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in healthcare and the tech industry. They involve online learning algorithms that adaptively learn policies over time to map observed contexts $X_t$ to actions $A_t$ in an attempt to maximize stochastic rewards $R_t$. This adaptivity raises interesting but hard statistical inference questions, especially counterfactual ones: for example, it is often of interest to estimate the properties of a hypothetical policy that is different from the logging policy that was used to collect the data -- a problem known as "off-policy evaluation" (OPE). Using modern martingale techniques, we present a comprehensive framework for OPE inference that relaxes unnecessary conditions made in some past works, significantly improving on them both theoretically and empirically. Importantly, our methods can be employed while the original experiment is still running (that is, not necessarily post-hoc), when the logging policy may itself be changing (due to learning), and even if the context distributions are a highly dependent time-series (such as if they are drifting over time). More concretely, we derive confidence sequences for various functionals of interest in OPE. These include doubly robust ones for time-varying off-policy mean reward values, but also confidence bands for the entire cumulative distribution function of the off-policy reward distribution. All of our methods (a) are valid at arbitrary stopping times, (b) only make nonparametric assumptions, (c) do not require importance weights to be uniformly bounded and if they are, we do not need to know these bounds, and (d) adapt to the empirical variance of our estimators. In summary, our methods enable anytime-valid off-policy inference using adaptively collected contextual bandit data.
Submitted 15 August, 2024; v1 submitted 19 October, 2022;
originally announced October 2022.
-
Game-theoretic statistics and safe anytime-valid inference
Authors:
Aaditya Ramdas,
Peter Grünwald,
Vladimir Vovk,
Glenn Shafer
Abstract:
Safe anytime-valid inference (SAVI) provides measures of statistical evidence and certainty -- e-processes for testing and confidence sequences for estimation -- that remain valid at all stopping times, accommodating continuous monitoring and analysis of accumulating data and optional stopping or continuation for any reason. These measures crucially rely on test martingales, which are nonnegative martingales starting at one. Since a test martingale is the wealth process of a player in a betting game, SAVI centrally employs game-theoretic intuition, language and mathematics. We summarize the SAVI goals and philosophy, and report recent advances in testing composite hypotheses and estimating functionals in nonparametric settings.
Submitted 17 June, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
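The wealth-process view of a test martingale can be sketched with a toy betting game. The example below tests the null hypothesis that a stream of 0/1 outcomes has conditional mean 1/2: the bettor starts with wealth one and stakes a fixed fraction on the outcome being 1. Under the null the wealth is a nonnegative martingale, so by Ville's inequality it reaches 1/alpha with probability at most alpha. The fixed stake `bet`, the coin-flip setting, and the function name `betting_test` are assumptions chosen for illustration, not a method taken from the paper.

```python
def betting_test(stream, alpha=0.05, bet=0.5):
    """Sequential test of H0: each outcome in {0, 1} has conditional mean 1/2.

    The wealth starts at one and is multiplied by 1 + bet * (2x - 1) after
    each outcome x. With |bet| <= 1 the wealth stays nonnegative, and under
    H0 each multiplier has conditional mean one, so the wealth is a test
    martingale. Returns (stopping_time, wealth); stopping_time is None if
    the threshold 1/alpha is never reached.
    """
    if not -1.0 <= bet <= 1.0:
        raise ValueError("bet must lie in [-1, 1] to keep the wealth nonnegative")
    wealth = 1.0
    for t, x in enumerate(stream, start=1):
        wealth *= 1.0 + bet * (2 * x - 1)
        if wealth >= 1.0 / alpha:
            return t, wealth  # reject H0 at this (data-dependent) time
    return None, wealth


# A stream of all ones is strong evidence against a fair coin.
stopped_at, final_wealth = betting_test([1] * 20)
```

A fixed stake keeps the sketch short; a practical bettor would typically adapt the stake to past data, which preserves validity as long as each stake is chosen before the corresponding outcome is seen.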
-
E-values as unnormalized weights in multiple testing
Authors:
Nikolaos Ignatiadis,
Ruodu Wang,
Aaditya Ramdas
Abstract:
We study how to combine p-values and e-values, and design multiple testing procedures where both p-values and e-values are available for every hypothesis. Our results provide a new perspective on multiple testing with data-driven weights: while standard weighted multiple testing methods require the weights to deterministically add up to the number of hypotheses being tested, we show that this normalization is not required when the weights are e-values that are independent of the p-values. Such e-values can be obtained in the meta-analysis setting wherein a primary dataset is used to compute p-values, and an independent secondary dataset is used to compute e-values. Going beyond meta-analysis, we showcase settings wherein independent e-values and p-values can be constructed on a single dataset itself. Our procedures can result in a substantial increase in power, especially if the non-null hypotheses have e-values much larger than one.
Submitted 18 July, 2023; v1 submitted 26 April, 2022;
originally announced April 2022.
-
Post-selection inference for e-value based confidence intervals
Authors:
Ziyu Xu,
Ruodu Wang,
Aaditya Ramdas
Abstract:
Suppose that one can construct a valid $(1-δ)$-confidence interval (CI) for each of $K$ parameters of potential interest. If a data analyst uses an arbitrary data-dependent criterion to select some subset $S$ of parameters, then the aforementioned CIs for the selected parameters are no longer valid due to selection bias. We design a new method to adjust the intervals in order to control the false coverage rate (FCR). The main established method is the "BY procedure" by Benjamini and Yekutieli (JASA, 2005). The BY guarantees require certain restrictions on the selection criterion and on the dependence between the CIs. We propose a new simple method which, in contrast, is valid under any dependence structure between the original CIs, and any (unknown) selection criterion, but which only applies to a special, yet broad, class of CIs that we call e-CIs. To elaborate, our procedure simply reports $(1-δ|S|/K)$-CIs for the selected parameters, and we prove that it controls the FCR at $δ$ for confidence intervals that implicitly invert e-values; examples include those constructed via supermartingale methods, via universal inference, or via Chernoff-style bounds, among others. The e-BY procedure is admissible, and recovers the BY procedure as a special case via a particular calibrator. Our work also has implications for post-selection inference in sequential settings, since it applies at stopping times, to continuously-monitored confidence sequences, and under bandit sampling. We demonstrate the efficacy of our procedure using numerical simulations and real A/B testing data from Twitter.
Submitted 27 February, 2024; v1 submitted 23 March, 2022;
originally announced March 2022.
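The adjustment described in this abstract, reporting $(1-δ|S|/K)$-CIs for the selected parameters, amounts to a one-line computation of the corrected confidence level. The sketch below shows only that bookkeeping step; the function name `eby_levels` and its arguments are hypothetical, and constructing the underlying e-CIs at the returned level is left to whatever method produced them.

```python
def eby_levels(num_parameters, selected, delta):
    """Return the confidence level to report for each selected parameter.

    num_parameters: K, the total number of parameters with available CIs.
    selected:       indices of the parameters chosen by the analyst (the set S).
    delta:          target false coverage rate.
    Each selected parameter gets a (1 - delta * |S| / K)-CI.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    chosen = sorted(set(selected))
    if any(i < 0 or i >= num_parameters for i in chosen):
        raise ValueError("selected indices must lie in range(num_parameters)")
    if not chosen:
        return {}  # nothing selected, nothing to report
    adjusted_delta = delta * len(chosen) / num_parameters
    return {i: 1.0 - adjusted_delta for i in chosen}


# Selecting 2 of 10 parameters at delta = 0.1 calls for 98% intervals.
levels = eby_levels(num_parameters=10, selected=[2, 5], delta=0.1)
```

Note that the correction depends only on how many parameters were selected, not on how they were selected, which matches the abstract's claim of validity under any selection criterion for e-CIs.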
-
A composite generalization of Ville's martingale theorem
Authors:
Johannes Ruf,
Martin Larsson,
Wouter M. Koolen,
Aaditya Ramdas
Abstract:
We provide a composite version of Ville's theorem that an event has zero measure if and only if there exists a nonnegative martingale which explodes to infinity when that event occurs. This is a classic result connecting measure-theoretic probability to the sequence-by-sequence game-theoretic probability, recently developed by Shafer and Vovk. Our extension of Ville's result involves appropriate composite generalizations of nonnegative martingales and measure-zero events: these are respectively provided by "e-processes", and a new inverse capital outer measure. We then develop a novel line-crossing inequality for sums of random variables which are only required to have a finite first moment, which we use to prove a composite version of the strong law of large numbers (SLLN). This allows us to show that violation of the SLLN is an event of outer measure zero and that our e-process explodes to infinity on every such violating sequence, while this is provably not achievable with a nonnegative (super)martingale.
Submitted 3 May, 2023; v1 submitted 8 March, 2022;
originally announced March 2022.