-
Some Results on Generalized Familywise Error Rate Controlling Procedures under Dependence
Authors:
Monitirtha Dey,
Subir Kumar Bhandari
Abstract:
The topic of multiple hypothesis testing now has a potpourri of novel theories and ubiquitous applications in diverse scientific fields. However, the very breadth of the field often stands in the way of a general theory that accommodates every scenario. This tradeoff is best seen through the lens of dependence, a central piece behind the theoretical and applied developments of multiple testing. Although dependence is omnipresent across scientific avenues, its nature and extent vary substantially with the context and complexity of the particular scenario. Positive dependence is the norm when testing many treatments against a single control or in spatial statistics. By contrast, negative dependence arises naturally in tests based on split samples and in cyclical, ordered comparisons. In GWAS, the SNP markers are generally considered to be weakly dependent. Control of the generalized familywise error rate (k-FWER) has been one of the prominent frequentist approaches in simultaneous inference. However, the performance of k-FWER controlling procedures under different dependence structures has not yet been explored. This paper revisits the classical problem of testing normal means in several correlated frameworks. We establish upper bounds on the generalized familywise error rates under each dependence structure, which in turn give rise to improved testing procedures. To this end, we present improved probability inequalities, which are of independent theoretical interest.
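For context, the k-FWER is the probability of making at least k false rejections. A classical baseline is the single-step generalized Bonferroni rule (Lehmann and Romano), which rejects H_i whenever p_i ≤ kα/n and controls the k-FWER at level α under arbitrary dependence, assuming valid p-values. The sketch below illustrates only that baseline, not the improved procedures of this paper; the function name is hypothetical.

```python
from typing import List


def generalized_bonferroni(p_values: List[float], k: int, alpha: float) -> List[int]:
    """Indices rejected by the single-step rule p_i <= k * alpha / n.

    If V is the number of false rejections and null p-values are valid,
    E[V] <= n0 * k * alpha / n <= k * alpha, so by Markov's inequality
    P(V >= k) <= E[V] / k <= alpha, whatever the dependence.
    """
    n = len(p_values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    threshold = k * alpha / n
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Usage: with n = 10, k = 2 and alpha = 0.05 the per-test cutoff is 0.01.
pvals = [0.001, 0.009, 0.02, 0.3, 0.5, 0.04, 0.011, 0.7, 0.8, 0.9]
print(generalized_bonferroni(pvals, k=2, alpha=0.05))  # [0, 1]
```

Setting k = 1 recovers the ordinary Bonferroni procedure, which is why larger k buys more rejections at the same α.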
Submitted 24 April, 2025;
originally announced April 2025.
-
An Efficient Frequency-Based Approach for Maximal Square Detection in Binary Matrices
Authors:
Swastik Bhandari
Abstract:
This paper presents a novel frequency-based algorithm that solves the maximal square problem with improved practical speed while maintaining optimal asymptotic complexity. My approach tracks the columnar continuity of ones through an adaptive frequency vector and a dynamic thresholding mechanism that eliminates the nested minimum operations commonly found in standard dynamic programming solutions. Theoretical analysis confirms a time complexity of O(mn) and a space complexity of O(n) for an m × n matrix. Formal loop-invariant proofs verify correctness, while comprehensive benchmarking demonstrates speedups of 1.3-5x over standard methods across various matrix densities and sizes. Beyond its contribution to algorithm design, the method creates opportunities for faster spatial pattern recognition in fields such as urban planning, environmental science, and medical imaging.
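For reference, the standard dynamic programming baseline that the abstract compares against can be sketched as follows. This is not the author's frequency-based algorithm (whose details are not given here); it is the textbook DP with a single rolling row, so it also runs in O(mn) time and O(n) space. The function name is our own.

```python
from typing import List


def maximal_square_dp(matrix: List[List[int]]) -> int:
    """Side length of the largest all-ones square (standard DP baseline).

    dp[j] holds the side of the largest square whose bottom-right corner is
    at (current row, j - 1); dp[0] is a sentinel column of zeros.
    """
    if not matrix or not matrix[0]:
        return 0
    n = len(matrix[0])
    dp = [0] * (n + 1)
    best = 0
    for row in matrix:
        diag = 0  # dp[j - 1] from the previous row (upper-left neighbour)
        for j in range(1, n + 1):
            prev = dp[j]  # dp[j] from the previous row, saved before overwrite
            if row[j - 1] == 1:
                # The three-way minimum (up, left, upper-left) of the standard DP.
                dp[j] = 1 + min(dp[j], dp[j - 1], diag)
                best = max(best, dp[j])
            else:
                dp[j] = 0
            diag = prev
    return best


grid = [
    [1, 0, 1, 0, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0],
]
print(maximal_square_dp(grid))  # 2
```

The `min` over three neighbours in the inner loop is the nested minimum operation that the proposed method is described as eliminating.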
Submitted 29 March, 2025; v1 submitted 22 March, 2025;
originally announced March 2025.
-
To Study Properties of a Known Procedure in Adaptive Sequential Sampling Design
Authors:
Sampurna Kundu,
Jayant Jha,
Subir Kumar Bhandari
Abstract:
We consider the procedure proposed by Bhandari et al. (2009) for two-treatment clinical trials, whose objective is to minimize the number of patients who receive the less effective drug. Our focus is on an adaptive sequential procedure that is both simple and intuitive. Through a refined theoretical analysis, we establish that the number of applications of the less effective drug is a finite random variable, all of whose moments are finite. In contrast, Bhandari et al. (2009) observed that this number increases logarithmically with the total sample size. We attribute this discrepancy to differences in the choice of the initial sample size and in the method of analysis employed. We further extend the allocation rule to a multi-treatment setup and derive analogous finiteness results, reinforcing the generalizability of our findings. Extensive simulation studies and real-data analyses support the theoretical developments, showing that allocation stabilizes and patient exposure to inferior treatments is reduced as the total sample size grows. These results strengthen the long-term ethical case for the proposed adaptive allocation strategy.
Submitted 27 June, 2025; v1 submitted 23 December, 2024;
originally announced December 2024.
-
Improved Upper Bound for the Size of a Trifferent Code
Authors:
Siddharth Bhandari,
Abhishek Khetan
Abstract:
A subset $\mathcal{C}\subseteq\{0,1,2\}^n$ is said to be a $\textit{trifferent}$ code (of block length $n$) if for every three distinct codewords $x,y, z \in \mathcal{C}$, there is a coordinate $i\in \{1,2,\ldots,n\}$ where they all differ, that is, $\{x(i),y(i),z(i)\} = \{0,1,2\}$. Let $T(n)$ denote the size of the largest trifferent code of block length $n$. Understanding the asymptotic behavior of $T(n)$ is closely related to determining the zero-error capacity of the $(3/2)$-channel defined by Elias (1988), and is a long-standing open problem in the area. Elias had shown that $T(n)\leq 2\times (3/2)^n$, and prior to our work the best upper bound was $T(n)\leq 0.6937 \times (3/2)^n$, due to Kurz (2023). We improve this bound to $T(n)\leq c \times n^{-2/5}\times (3/2)^n$, where $c$ is an absolute constant.
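The trifferent definition can be checked directly for small codes. The following brute-force sketch is our own illustration (the function name is hypothetical): codewords are tuples over {0, 1, 2}, and every triple of distinct codewords is tested for a coordinate where all three symbols differ. It runs in O(|C|^3 n) time, so it is only useful for tiny examples and says nothing about the asymptotics of T(n).

```python
from itertools import combinations
from typing import Sequence, Tuple

Codeword = Tuple[int, ...]


def is_trifferent(code: Sequence[Codeword]) -> bool:
    """Return True if every three distinct codewords are separated somewhere.

    "Separated" means some coordinate i has {x[i], y[i], z[i]} == {0, 1, 2}.
    """
    words = list(dict.fromkeys(code))  # drop repeated codewords, keep order
    for x, y, z in combinations(words, 3):
        if not any({a, b, c} == {0, 1, 2} for a, b, c in zip(x, y, z)):
            return False
    return True


print(is_trifferent([(0, 0), (0, 1), (1, 2), (2, 2)]))          # True
print(is_trifferent([(0, 0), (0, 1), (1, 2), (2, 2), (1, 0)]))  # False
```

In the second call, the triple 00, 10, 12 has symbols {0, 1} in the first coordinate and {0, 2} in the second, so no coordinate separates it.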
Submitted 4 February, 2024;
originally announced February 2024.
-
Asymptotically Optimal Sequential Multiple Testing Procedures for Correlated Normal
Authors:
Monitirtha Dey,
Subir Kumar Bhandari
Abstract:
Simultaneous statistical inference has been a cornerstone of the statistical methodology literature because of its fundamental theory and paramount applications. The mainstream multiple testing literature has traditionally worked under two assumptions: the sample size is deterministic, and the test statistics corresponding to different tests are independent. However, in many modern scientific avenues, these assumptions are violated. Few studies explore the multiple testing problem in a sequential framework where the test statistics corresponding to the various streams are dependent. This work fills this gap in a unified way by considering the classical means-testing problem in an equicorrelated Gaussian sequential framework. We focus on sequential test procedures that control the type I and type II familywise error probabilities at pre-specified levels. We establish that our proposed test procedures asymptotically achieve the optimal expected sample sizes under every possible signal configuration, as the two error probabilities vanish at arbitrary rates. Towards this, we show that the ratio of the expected sample size of our proposed rule to that of the classical SPRT goes to one asymptotically, thus illustrating their connection. Generalizing this, we show that our proposed procedures, with appropriately adjusted critical values, are asymptotically optimal for controlling any multiple testing error metric lying between multiples of FWER in a certain sense. This class of metrics includes FDR/FNR, pFDR/pFNR, the per-comparison and per-family error rates, and the false positive rate.
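The benchmark mentioned above is the classical SPRT. A minimal single-stream sketch, assuming unit-variance normal observations, the simple hypotheses H0: μ = 0 versus H1: μ = θ, and Wald's approximate boundaries log((1 - β)/α) and log(β/(1 - α)), is shown below. It does not implement the paper's equicorrelated multistream procedures; the function name and parameters are ours.

```python
import math
import random
from typing import Iterable, Tuple


def sprt_normal(stream: Iterable[float], theta: float,
                alpha: float, beta: float) -> Tuple[str, int]:
    """Wald's SPRT for H0: mu = 0 vs H1: mu = theta with N(mu, 1) data.

    Returns (decision, number of observations used). alpha and beta are the
    target type I and type II error probabilities.
    """
    upper = math.log((1 - beta) / alpha)  # crossing it rejects H0
    lower = math.log(beta / (1 - alpha))  # crossing it accepts H0
    llr, n = 0.0, 0
    for x in stream:
        n += 1
        llr += theta * x - theta ** 2 / 2  # log f1(x) / f0(x) for unit variance
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    return "undecided", n


rng = random.Random(0)
signal = (rng.gauss(1.0, 1.0) for _ in range(10_000))
print(sprt_normal(signal, theta=1.0, alpha=0.01, beta=0.01))
```

The log-likelihood ratio is a random walk with positive drift under H1 and negative drift under H0, which is why the expected sample size grows only logarithmically in 1/α and 1/β.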
Submitted 20 March, 2025; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Optimal test statistic under normality assumption
Authors:
Nabaneet Das,
Subir K. Bhandari
Abstract:
The idea of an optimal test statistic in the context of simultaneous hypothesis testing was introduced by Sun and Tony Cai (2009): the conditional probability of a hypothesis being null given the data. Because no simplified expression of this statistic is available, the optimal test cannot be implemented in more general dependence setups. This note simplifies the expression of the optimal test statistic of Sun and Tony Cai (2009) under the multivariate normal model. We consider the model of Xie et al. (2011), in which the test statistics are generated from a multivariate normal distribution conditional on the unobserved states of the hypotheses, and the states are i.i.d. Bernoulli random variables. While the equivalence of the LFDR and the optimal test statistic was established under the very stringent conditions of Xie et al. (2016), the expression obtained in this paper is valid for any covariance matrix and any fixed $0<p<1$. The optimal procedure is implemented using this expression, and its performance is compared with the Benjamini-Hochberg method and the marginal procedure.
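In the special case of independent statistics, the conditional probability that H_i is null given the data reduces to the local false discovery rate, lfdr(x_i) = (1 - p)φ(x_i) / [(1 - p)φ(x_i) + p φ(x_i - μ)]. The sketch below computes this quantity under assumptions of our own choosing: independence, non-null probability p, and a point alternative N(μ, 1). Under the correlated model studied in the note, the statistic depends on the whole data vector and this formula does not apply as is.

```python
from statistics import NormalDist
from typing import List


def lfdr_independent(x: List[float], p: float, mu: float) -> List[float]:
    """P(H_i null | x_i) in a two-group model with independent statistics.

    Assumed (illustrative) model: H_i is non-null with probability p;
    X_i ~ N(0, 1) under the null and X_i ~ N(mu, 1) otherwise.
    """
    null, alt = NormalDist(0.0, 1.0), NormalDist(mu, 1.0)
    out = []
    for xi in x:
        f0 = (1 - p) * null.pdf(xi)  # null part of the mixture density
        f1 = p * alt.pdf(xi)         # non-null part of the mixture density
        out.append(f0 / (f0 + f1))
    return out


print(lfdr_independent([0.0, 2.0, 4.0], p=0.1, mu=3.0))
```

Small values flag likely non-nulls; a decision rule then rejects the hypotheses with the smallest values of this statistic.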
Submitted 18 June, 2023;
originally announced June 2023.
-
Correction Factor of FWER for Normal Distribution in Nearly Independent Setup
Authors:
Nabaneet Das,
Subir K. Bhandari
Abstract:
In this paper, we study the behaviour of the familywise error rate (FWER) of Bonferroni's procedure in a nearly independent setup for the normal distribution. In the search for a suitable correlation penalty, we note that, in contrast to the study of Efron (2007), the root mean square (RMS) of the correlations is not appropriate in this setup. We provide a suitable correction factor for deviation from independence and approximate the FWER in this nearly independent setup.
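As a reference point for the nearly independent setup: under exact independence with all n nulls true, Bonferroni's procedure at per-test level α/n has FWER = 1 - (1 - α/n)^n. Because (1 - α/n)^n increases to e^{-α}, this FWER decreases from α at n = 1 toward 1 - e^{-α}, slightly below α. A short numerical check, assuming continuous test statistics so each test has size exactly α/n:

```python
import math


def bonferroni_fwer_independent(alpha: float, n: int) -> float:
    """Exact FWER of Bonferroni when all n nulls are true and independent."""
    return 1 - (1 - alpha / n) ** n


alpha = 0.05
for n in (1, 10, 100, 10_000):
    print(n, round(bonferroni_fwer_independent(alpha, n), 6))
print("limit", round(1 - math.exp(-alpha), 6))
```

Any correction factor for near independence can be read as a perturbation of this baseline value.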
Submitted 18 June, 2023;
originally announced June 2023.
-
Large-scale adaptive multiple testing for sequential data controlling false discovery and nondiscovery rates
Authors:
Rahul Roy,
Shyamal K. De,
Subir Kumar Bhandari
Abstract:
In modern scientific experiments, we frequently encounter high-dimensional data, and in some experiments such data arrive sequentially rather than all at once. We develop multiple testing procedures with simultaneous control of false discovery and nondiscovery rates when $m$-variate data vectors $\mathbf{X}_1, \mathbf{X}_2, \dots$ are observed sequentially or in groups and each coordinate of these vectors gives rise to a hypothesis test. Existing multiple testing methods for sequential data use fixed stopping boundaries that do not depend on the sample size and hence are quite conservative when the number of hypotheses $m$ is large. We propose sequential tests based on adaptive stopping boundaries that shrink the continue-sampling region as the sample size increases. Under minimal assumptions on the data sequence, we first develop a test based on an oracle test statistic such that both the false discovery rate (FDR) and the false nondiscovery rate (FNR) are nearly equal to prefixed levels, with strong control. Under a two-group mixture model assumption, we propose a data-driven stopping and decision rule based on the local false discovery rate statistic that mimics the oracle rule and guarantees simultaneous control of FDR and FNR asymptotically as $m$ tends to infinity. Both the oracle and the data-driven stopping times are shown to be finite (i.e., proper) with probability 1 for all finite $m$ and to converge to a finite constant as $m$ grows to infinity. Further, we compare the data-driven test with the existing gap rule proposed in He and Bartroff (2021) and show that the ratio of the expected sample sizes of our method and the gap rule tends to zero as $m$ goes to infinity. Extensive analysis of simulated datasets as well as some real datasets illustrates the superiority of the proposed tests over some existing methods.
Submitted 8 June, 2023;
originally announced June 2023.
-
Asymptotic bayes optimality under sparsity for equicorrelated multivariate normal test statistics
Authors:
Rahul Roy,
Subir Kumar Bhandari
Abstract:
Here we address dependence among the test statistics in connection with asymptotically Bayes optimal tests in the presence of sparse alternatives. Extending the setup of Bogdan et al. (2011), we assume that, conditional on the mean vector $\boldsymbol{\mu}$, the test statistics jointly follow an equicorrelated multivariate normal distribution with common correlation $\rho$. The rest of the setup is identical to Bogdan et al. (2011), with a slight modification in the asymptotic framework. We exploit a well-known result on equicorrelated multivariate normal variables with equal marginal variances to decompose the test statistics into independent random variables. We then identify a set of independent yet unobservable Gaussian random variables sufficient for the multiple testing problem and, following Bogdan et al. (2011), derive necessary and sufficient conditions for single-cutoff tests based on these dummy variables to be asymptotically Bayes optimal under sparsity (ABOS). Further, we replace the dummy variables with the deviations of the statistics from their arithmetic mean, which are easily computed from the observations thanks to the earlier decomposition. Additional assumptions are then derived under which the necessary and sufficient conditions for single-cutoff tests to be ABOS using the independent dummy variables play the same role for the replacement variables as well (up to a deviation of order $o(1)$). Next, under the same additional assumptions, necessary and sufficient conditions for single-cutoff tests to control the Bayesian FDR are derived; as a consequence, under various sparsity assumptions, we prove that the classical Bonferroni and Benjamini-Hochberg multiple testing methods are ABOS if the same conditions are satisfied.
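One standard form of the decomposition, assuming ρ ≥ 0, writes X_i = W_i + √ρ Z with W_i = μ_i + √(1 - ρ) ε_i, where Z, ε_1, ..., ε_n are i.i.d. N(0, 1). The W_i are independent but unobservable because Z is unknown; the deviations X_i - X̄ = W_i - W̄ no longer involve Z, so they are computable from the data. The simulation sketch below (function names and parameters are our own) checks that the shared term cancels.

```python
import math
import random
from typing import List, Tuple


def equicorrelated_sample(mu: List[float], rho: float,
                          rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw X with unit variances and Corr(X_i, X_j) = rho, assuming rho >= 0.

    Returns (x, w), where w_i = mu_i + sqrt(1 - rho) * eps_i are the
    independent but unobservable components and x_i = w_i + sqrt(rho) * Z.
    """
    z = rng.gauss(0.0, 1.0)  # component shared by all coordinates
    w = [m + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0) for m in mu]
    x = [wi + math.sqrt(rho) * z for wi in w]
    return x, w


def deviations(v: List[float]) -> List[float]:
    """Deviations from the arithmetic mean, v_i - mean(v)."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


rng = random.Random(1)
x, w = equicorrelated_sample([0.0] * 8 + [3.0] * 2, rho=0.5, rng=rng)
# The shared term sqrt(rho) * Z cancels: deviations of x match those of w.
print(max(abs(a - b) for a, b in zip(deviations(x), deviations(w))))
```

The printed difference is at the level of floating-point rounding. The price of removing Z is that the deviations are no longer independent (they sum to zero), which is why extra assumptions are needed when they replace the W_i.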
Submitted 26 August, 2022; v1 submitted 25 August, 2022;
originally announced August 2022.
-
FWER Goes to Zero for Correlated Normal
Authors:
Monitirtha Dey,
Subir Kumar Bhandari
Abstract:
Familywise error rate (FWER) has been a cornerstone of simultaneous inference for decades, and the classical Bonferroni method has been one of the most prominent frequentist approaches for controlling FWER. The present article studies the limiting behavior of Bonferroni FWER in a multiple testing problem as the number of hypotheses grows to infinity. We establish that in the equicorrelated normal setup with positive equicorrelation, Bonferroni FWER tends to zero asymptotically. We extend this result to generalized familywise error rates and to arbitrarily correlated setups.
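The phenomenon can be explored with a rough Monte Carlo sketch. We assume two-sided Bonferroni tests at level α on n equicorrelated N(0, 1) statistics with all nulls true, generated as X_i = √ρ Z + √(1 - ρ) ε_i. This is an illustration only, not the paper's proof; estimates are noisy and the decay in n is slow. The function name and defaults are ours.

```python
import math
import random
from statistics import NormalDist


def bonferroni_fwer_mc(n: int, rho: float, alpha: float = 0.05,
                       reps: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of two-sided Bonferroni FWER, all n nulls true.

    X_i = sqrt(rho) * Z + sqrt(1 - rho) * eps_i gives Corr(X_i, X_j) = rho.
    A replicate counts as a familywise error if any |X_i| exceeds the cutoff.
    """
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha / (2 * n))
    a, b = math.sqrt(rho), math.sqrt(1 - rho)
    hits = 0
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)  # shared component for this replicate
        if any(abs(a * z + b * rng.gauss(0.0, 1.0)) > cutoff for _ in range(n)):
            hits += 1
    return hits / reps


for n in (10, 100, 1000):
    print(n, bonferroni_fwer_mc(n, rho=0.0), bonferroni_fwer_mc(n, rho=0.5))
```

With ρ = 0 the estimates hover near α, whereas with ρ > 0 they sit well below α, consistent with the claim that positive equicorrelation drives the Bonferroni FWER down.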
Submitted 6 December, 2021; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Fokas diagonalization of piecewise constant coefficient linear differential operators on finite intervals and networks
Authors:
Sultan Aitzhan,
Sambhav Bhandari,
David Andrew Smith
Abstract:
We describe a new form of diagonalization for linear two-point constant-coefficient differential operators with arbitrary linear boundary conditions. Although the diagonalization is in a weaker sense than that usually employed to solve initial boundary value problems (IBVP), we show that it suffices to solve IBVP whose spatial parts are described by such operators. We argue that the method described may be viewed as a reimplementation of the Fokas transform method for linear evolution equations on the finite interval. The results are extended to multipoint and interface operators, including operators defined on networks of finite intervals, in which the coefficients of the differential operator may vary between subintervals and arbitrary interface and boundary conditions may be imposed; differential operators with piecewise constant coefficients are thus included. Both homogeneous and inhomogeneous problems are solved.
Submitted 21 October, 2021; v1 submitted 10 December, 2020;
originally announced December 2020.
-
Asymptotically optimal test for dependent multiple testing set up
Authors:
Rahul Roy,
Subir Kumar Bhandari
Abstract:
In this paper, we explore the behaviour of dependent test statistics for testing multiple hypotheses. For simplicity, we consider a mixture normal model with an equicorrelated setup. With a simple linear transformation, the test statistics are decomposed into independent components, which, when conditioned appropriately, yield independent variables. These are used to construct conditional tests, which are shown to be asymptotically optimal, with power as large as that obtained using the Neyman-Pearson lemma. Extensive simulations support the claim.
Submitted 7 January, 2020;
originally announced January 2020.
-
Bound on FWER for correlated normal distribution
Authors:
Nabaneet Das,
Subir K. Bhandari
Abstract:
In this paper, our main focus is to obtain an asymptotic bound on the familywise error rate (FWER) of Bonferroni-type procedures in the simultaneous hypothesis testing problem when the observations corresponding to individual hypotheses are correlated. In particular, we consider the sequence of null hypotheses $H_{0i}: X_i \sim N(0,1)$, $i = 1, 2, \ldots, n$, with an equicorrelated structure for $(X_1, \ldots, X_n)$. A distribution-free bound on the FWER under the equicorrelated setup can be found in Tong (2014). However, the upper bound provided in Tong (2014) does not remain bounded as the number of hypotheses $n$ grows, and as a result the FWER is highly overestimated for specific distributions (e.g., the normal). In the equicorrelated normal setup, we show that the FWER is asymptotically a convex function of the correlation $\rho$, and hence an upper bound on the FWER of the Bonferroni-$\alpha$ procedure is $\alpha(1-\rho)$. This implies that Bonferroni's method actually controls the FWER at a much smaller level than the desired level of significance in the positively correlated case, which necessitates a correlation correction.
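The step from convexity to the bound $\alpha(1-\rho)$ can be spelled out as follows; this is our reconstruction from the abstract, and the paper's own argument may differ in how the limit is taken. Let $F(\rho)$ denote the limiting ($n \to \infty$) FWER of the Bonferroni-$\alpha$ procedure, assumed convex on $[0,1]$. At $\rho = 0$ the statistics are independent, so the FWER is $1-(1-\alpha/n)^n$. At $\rho = 1$ all $X_i$ coincide, so the FWER is that of a single test at level $\alpha/n$. A convex function lies below its chord:

```latex
\begin{aligned}
F(0) &= \lim_{n\to\infty}\Bigl[1-\bigl(1-\tfrac{\alpha}{n}\bigr)^{n}\Bigr] = 1-e^{-\alpha} \le \alpha,\\
F(1) &= \lim_{n\to\infty}\tfrac{\alpha}{n} = 0,\\
F(\rho) &\le (1-\rho)\,F(0) + \rho\,F(1) \le \alpha(1-\rho), \qquad 0 \le \rho \le 1.
\end{aligned}
```

The inequality $1-e^{-\alpha} \le \alpha$ follows from $e^{-\alpha} \ge 1-\alpha$.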
Submitted 18 August, 2020; v1 submitted 6 August, 2019;
originally announced August 2019.
-
Characterization of Extreme Copulas
Authors:
Partha Pratim Ghosh,
Subir Kumar Bhandari
Abstract:
In this paper, our aim is to characterize the set of extreme points of the set of all n-dimensional copulas (n > 1). We show that a copula must induce a measure that is singular with respect to Lebesgue measure in order to be an extreme point of the set of n-dimensional copulas. We also derive some sufficient conditions for a copula to be an extreme copula. We present a construction of a small subset of n-dimensional extreme copulas such that any n-dimensional copula is a limit point of that subset with respect to weak convergence. The applications of such a theory are widespread, finding use in many facets of current mathematical research, such as distribution theory, survival analysis, reliability theory and optimization. To illustrate the point further, examples of how such extremal representations can help in optimization are also included.
Submitted 7 September, 2017;
originally announced September 2017.
-
Some Permutationally Symmetric Multiple Hypotheses Testing Rules Under Dependent Set up
Authors:
Anupam Kundu,
Subir Kumar Bhandari
Abstract:
In this paper, our interest is in the problem of simultaneous hypothesis testing when the test statistics corresponding to the individual hypotheses are possibly correlated. Specifically, we consider the case where the test statistics jointly have a multivariate normal distribution (with equal correlation between each pair) with an unknown mean vector, and our goal is to decide which components of the mean vector are zero and which are non-zero. This problem was taken up earlier in Bogdan et al. (2011) for the case where the test statistics are independent normals. Asymptotic optimality in a Bayesian decision-theoretic sense was studied in this context, the optimal procedures were characterized, and the optimality of some well-known procedures was thereby established. The case under dependence was left as a challenging open problem. We study the problem both theoretically and through extensive simulations and give some permutation-invariant rules. Whereas the asymptotic derivations in Bogdan et al. (2011) were done in the context of sparsity of the non-zero means, our result does not require the sparsity assumption and holds under a more general setup.
Submitted 10 January, 2019; v1 submitted 13 April, 2016;
originally announced April 2016.