-
Learning Beyond Euclid: Curvature-Adaptive Generalization for Neural Networks on Manifolds
Authors:
Krisanu Sarkar
Abstract:
In this work, we develop new generalization bounds for neural networks trained on data supported on Riemannian manifolds. Existing generalization theories often rely on complexity measures derived from Euclidean geometry, which fail to account for the intrinsic structure of non-Euclidean spaces. Our analysis introduces a geometric refinement: we derive covering number bounds that explicitly incorporate manifold-specific properties such as sectional curvature, volume growth, and injectivity radius. These geometric corrections lead to sharper Rademacher complexity bounds for classes of Lipschitz neural networks defined on compact manifolds. The resulting generalization guarantees recover standard Euclidean results when curvature is zero but improve substantially in settings where the data lies on curved, low-dimensional manifolds embedded in high-dimensional ambient spaces. We illustrate the tightness of our bounds in negatively curved spaces, where the exponential volume growth leads to provably higher complexity, and in positively curved spaces, where the curvature acts as a regularizing factor. This framework provides a principled understanding of how intrinsic geometry affects learning capacity, offering both theoretical insight and practical implications for deep learning on structured data domains.
Submitted 1 July, 2025;
originally announced July 2025.
-
Hindsight-Guided Momentum (HGM) Optimizer: An Approach to Adaptive Learning Rate
Authors:
Krisanu Sarkar
Abstract:
We introduce Hindsight-Guided Momentum (HGM), a first-order optimization algorithm that adaptively scales learning rates based on the directional consistency of recent updates. Traditional adaptive methods, such as Adam or RMSprop, adapt learning dynamics using only the magnitude of gradients, often overlooking important geometric cues. Geometric cues refer to directional information, such as the alignment between current gradients and past updates, which reflects the local curvature and consistency of the optimization path. HGM addresses this by incorporating a hindsight mechanism that evaluates the cosine similarity between the current gradient and accumulated momentum. This allows it to distinguish between coherent and conflicting gradient directions, increasing the learning rate when updates align and reducing it in regions of oscillation or noise. The result is a more responsive optimizer that accelerates convergence in smooth regions of the loss surface while maintaining stability in sharper or more erratic areas. Despite this added adaptability, the method preserves the computational and memory efficiency of existing optimizers. By responding more intelligently to the structure of the optimization landscape, HGM provides a simple yet effective improvement over existing approaches, particularly in non-convex settings such as deep neural network training.
Submitted 22 June, 2025;
originally announced June 2025.
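The hindsight mechanism described in this abstract can be sketched roughly as follows. The abstract does not state the exact update rule, so the scaling formula `base_lr * (1 + gain * cos)` and the names `hindsight_step`, `gain`, and `beta` are assumptions made purely for illustration; this is not the paper's algorithm.

```python
import math


def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two vectors; 0.0 if either is (near) zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return 0.0
    return dot / (norm_a * norm_b)


def hindsight_step(params, grad, momentum, base_lr=0.1, beta=0.9, gain=0.5):
    """One illustrative update step (hypothetical rule, not from the paper).

    The learning rate grows when the gradient agrees with the accumulated
    momentum (cos > 0) and shrinks when it conflicts with it (cos < 0).
    """
    cos = cosine_similarity(grad, momentum)
    lr = base_lr * (1.0 + gain * cos)  # assumed scaling; stays positive for gain < 1
    new_momentum = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grad)]
    new_params = [p - lr * m for p, m in zip(params, new_momentum)]
    return new_params, new_momentum, lr
```

With `momentum = [1.0, 0.0]`, an aligned gradient such as `[2.0, 0.0]` raises the step size above `base_lr`, while an opposing gradient such as `[-2.0, 0.0]` lowers it.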
-
On Controlling the False Discovery Rate in Multiple Testing of the Means of Correlated Normals Against Two-Sided Alternatives
Authors:
Sanat K. Sarkar
Abstract:
This paper revisits the following open question in simultaneous testing of multivariate normal means against two-sided alternatives: Can the method of Benjamini and Hochberg (BH, 1995) control the false discovery rate (FDR) without imposing any dependence structure on the correlations? The answer to this question is generally believed to be yes, and is conjectured so in the literature since results of numerical studies investigating the question and reported in numerous papers strongly support it. No theoretical justification of this answer has yet been put forward in the literature, as far as we know. In this paper, we offer a partial proof of this conjecture. More specifically, we consider the following two settings - (i) the covariance matrix is known and (ii) the covariance matrix is an unknown scalar multiple of a known matrix - and prove that in each of these settings a BH-type stepup method based on some weighted versions of the original z- or t-test statistics controls the FDR.
Submitted 11 April, 2023;
originally announced April 2023.
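For readers unfamiliar with the method this abstract refers to, here is a minimal sketch of the original Benjamini-Hochberg step-up procedure applied to a list of p-values. It is the standard 1995 procedure, not the weighted z- or t-statistic variant studied in the paper; the function name `benjamini_hochberg` is chosen here for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up procedure.

    Step-up rule: find the largest rank k with p_(k) <= k * q / m, then
    reject the hypotheses with the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank  # keep scanning: a later rank may still qualify
    return sorted(order[:k])
```

Because the rule looks for the largest qualifying rank, `benjamini_hochberg([0.04, 0.045, 0.049])` rejects all three hypotheses even though the smallest p-value alone exceeds its own threshold `q / 3`.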
-
Adapting BH to One- and Two-Way Classified Structures of Hypotheses
Authors:
Shinjini Nandi,
Sanat K. Sarkar
Abstract:
Multiple testing literature contains ample research on controlling false discoveries for hypotheses classified according to one criterion, which we refer to as one-way classified hypotheses. Although simultaneous classification of hypotheses according to two different criteria, resulting in two-way classified hypotheses, does often occur in scientific studies, no such research has taken place yet, as far as we know, under this structure. This article produces procedures, both in their oracle and data-adaptive forms, for controlling the overall false discovery rate (FDR) across all hypotheses, effectively capturing the underlying one- or two-way classification structure. They have been obtained by using results associated with the weighted Benjamini-Hochberg (BH) procedure in their more general forms, providing guidance on how to adapt the original BH procedure to the underlying one- or two-way classification structure through an appropriate choice of the weights. The FDR is maintained non-asymptotically by our proposed procedures in their oracle forms under positive regression dependence on a subset of null $p$-values (PRDS) and in their data-adaptive forms under independence of the $p$-values. Possible control of FDR for our data-adaptive procedures in certain scenarios involving dependent $p$-values has been investigated through simulations. The fact that our suggested procedures can be superior to contemporary practices has been demonstrated through their applications in simulated scenarios and to real-life data sets. While the procedures proposed here for two-way classified hypotheses are new, the data-adaptive procedure obtained for one-way classified hypotheses is an alternative to, and often more powerful than, those proposed in Hu et al. (2010).
Submitted 8 March, 2019; v1 submitted 16 December, 2018;
originally announced December 2018.
-
The Control of the False Discovery Rate in Fixed Sequence Multiple Testing
Authors:
Gavin Lynch,
Wenge Guo,
Sanat K. Sarkar,
Helmut Finner
Abstract:
Controlling the false discovery rate (FDR) is a powerful approach to multiple testing. In many applications, the tested hypotheses have an inherent hierarchical structure. In this paper, we focus on the fixed sequence structure where the testing order of the hypotheses has been strictly specified in advance. We are motivated to study such a structure, since it is the most basic of hierarchical structures, yet it is often seen in real applications such as statistical process control and streaming data analysis. We first consider a conventional fixed sequence method that stops testing once an acceptance occurs, and develop such a method controlling the FDR under both arbitrary and negative dependencies. The method under arbitrary dependency is shown to be unimprovable without losing control of the FDR and, unlike existing FDR methods, it cannot be improved even by restricting to the usual positive regression dependence on subset (PRDS) condition. To account for any potential mistakes in the ordering of the tests, we extend the conventional fixed sequence method to one that allows more than one acceptance, up to a given number. Simulation studies show that the proposed procedures can be powerful alternatives to existing FDR controlling procedures. The proposed procedures are illustrated through a real data set from a microarray experiment.
Submitted 9 November, 2016;
originally announced November 2016.
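The stopping rule described in this abstract can be illustrated with a short sketch. The abstract does not give the critical values that achieve FDR control, so `critical_values` is a caller-supplied placeholder and the function name `fixed_sequence_test` is hypothetical; only the control flow (test in the pre-specified order, stop at the allowed number of acceptances) is taken from the text.

```python
def fixed_sequence_test(p_values, critical_values, max_acceptances=1):
    """Test hypotheses in their given order and return the rejected indices.

    max_acceptances=1 gives the conventional rule (stop at the first
    acceptance); larger values tolerate some mistakes in the ordering.
    The critical values are NOT the paper's constants; supply your own.
    """
    rejected = []
    acceptances = 0
    for i, (p, c) in enumerate(zip(p_values, critical_values)):
        if p <= c:
            rejected.append(i)
        else:
            acceptances += 1
            if acceptances >= max_acceptances:
                break  # stop testing once the allowed acceptances are used up
    return rejected
```

For example, with p-values `[0.01, 0.2, 0.001]` and a constant placeholder threshold of 0.05, the conventional rule rejects only the first hypothesis, while allowing two acceptances also reaches and rejects the third.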
-
Two-stage algorithms for covering array construction
Authors:
Kaushik Sarkar,
Charles J. Colbourn
Abstract:
Modern software systems often consist of many different components, each with a number of options. Although unit tests may reveal faulty options for individual components, functionally correct components may interact in unforeseen ways to cause a fault. Covering arrays are used to test for interactions among components systematically. A two-stage framework, providing a number of concrete algorithms, is developed for the efficient construction of covering arrays. In the first stage, a time and memory efficient randomized algorithm covers most of the interactions. In the second stage, a more sophisticated search covers the remainder in relatively few tests. In this way, the storage limitations of the sophisticated search algorithms are avoided; hence the range of the number of components for which the algorithm can be applied is extended, without increasing the number of tests. Many of the framework instantiations can be tuned to optimize a memory-quality trade-off, so that fewer tests can be achieved using more memory. The algorithms developed outperform the currently best known methods when the number of components ranges from 20 to 60, the number of options for each ranges from 3 to 6, and $t$-way interactions are covered for $t\in \{5,6\}$. In some cases a reduction in the number of tests by more than $50\%$ is achieved.
Submitted 21 June, 2016;
originally announced June 2016.
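The two-stage idea in this abstract can be sketched as below. This is a toy version under stated assumptions: stage one draws uniformly random rows, and stage two naively appends one row per interaction still uncovered, whereas the paper uses a more sophisticated search in the second stage. The names `uncovered_interactions`, `two_stage_covering_array`, and `n_random` are chosen here for illustration.

```python
import itertools
import random


def uncovered_interactions(rows, k, v, t):
    """List every (columns, values) t-way interaction that no row covers."""
    missing = []
    for cols in itertools.combinations(range(k), t):
        seen = {tuple(row[c] for c in cols) for row in rows}
        for values in itertools.product(range(1, v + 1), repeat=t):
            if values not in seen:
                missing.append((cols, values))
    return missing


def two_stage_covering_array(k, v, t, n_random, seed=0):
    """Build a covering array on k columns with symbols 1..v and strength t."""
    rng = random.Random(seed)
    # Stage 1: cheap random rows, expected to cover most interactions.
    rows = [[rng.randint(1, v) for _ in range(k)] for _ in range(n_random)]
    # Stage 2 (naive stand-in): one extra row per remaining interaction.
    for cols, values in uncovered_interactions(rows, k, v, t):
        row = [rng.randint(1, v) for _ in range(k)]
        for c, val in zip(cols, values):
            row[c] = val
        rows.append(row)
    return rows
```

The trade-off the abstract describes shows up in `n_random`: more random rows in stage one leave fewer interactions for the more expensive second stage.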
-
Partial Covering Arrays: Algorithms and Asymptotics
Authors:
Kaushik Sarkar,
Charles J. Colbourn,
Annalisa De Bonis,
Ugo Vaccaro
Abstract:
A covering array $\mathsf{CA}(N;t,k,v)$ is an $N\times k$ array with entries in $\{1, 2, \ldots , v\}$, for which every $N\times t$ subarray contains each $t$-tuple of $\{1, 2, \ldots , v\}^t$ among its rows. Covering arrays find application in interaction testing, including software and hardware testing, advanced materials development, and biological systems. A central question is to determine or bound $\mathsf{CAN}(t,k,v)$, the minimum number $N$ of rows of a $\mathsf{CA}(N;t,k,v)$. The well known bound $\mathsf{CAN}(t,k,v)=O((t-1)v^t\log k)$ is not too far from being asymptotically optimal. Sensible relaxations of the covering requirement arise when (1) the set $\{1, 2, \ldots , v\}^t$ need only be contained among the rows of at least $(1-ε)\binom{k}{t}$ of the $N\times t$ subarrays and (2) the rows of every $N\times t$ subarray need only contain a (large) subset of $\{1, 2, \ldots , v\}^t$. In this paper, using probabilistic methods, significant improvements on the covering array upper bound are established for both relaxations, and for the conjunction of the two. In each case, a randomized algorithm constructs such arrays in expected polynomial time.
Submitted 6 May, 2016;
originally announced May 2016.
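The covering requirement and its first relaxation can be made concrete with a small checker. The sketch below computes the fraction of the $\binom{k}{t}$ column sets whose rows contain all $v^t$ tuples: a value of 1 means the array is a covering array, and a value of at least $1-ε$ satisfies relaxation (1). The function name `coverage_fraction` is chosen here, and the code assumes every entry already lies in $\{1, \ldots, v\}$.

```python
import itertools


def coverage_fraction(rows, t, v):
    """Fraction of t-column subsets in which every t-tuple over 1..v appears.

    Assumes all entries are in {1, ..., v}, so a column subset is fully
    covered exactly when it shows v**t distinct tuples among its rows.
    """
    k = len(rows[0])
    full = v ** t
    col_sets = list(itertools.combinations(range(k), t))
    covered = sum(
        1
        for cols in col_sets
        if len({tuple(row[c] for c in cols) for row in rows}) == full
    )
    return covered / len(col_sets)
```

For instance, the four rows `[1,1,1]`, `[1,2,2]`, `[2,1,2]`, `[2,2,1]` form a $\mathsf{CA}(4;2,3,2)$ and give a fraction of 1, while dropping any one row leaves every pair of columns missing a tuple.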
-
Upper bounds on the size of covering arrays
Authors:
Kaushik Sarkar,
Charles J. Colbourn
Abstract:
Covering arrays find important application in software and hardware interaction testing. For practical applications it is useful to determine or bound the minimum number of rows, CAN$(t,k,v)$, in a covering array for given values of the parameters $t,k$ and $v$. Asymptotic upper bounds for CAN$(t,k,v)$ have earlier been established using the Stein-Lovász-Johnson strategy and the Lovász local lemma…
▽ More
Covering arrays find important application in software and hardware interaction testing. For practical applications it is useful to determine or bound the minimum number of rows, CAN$(t,k,v)$, in a covering array for given values of the parameters $t,k$ and $v$. Asymptotic upper bounds for CAN$(t,k,v)$ have earlier been established using the Stein-Lovász-Johnson strategy and the Lovász local lemma. A series of improvements on these bounds is developed in this paper. First an estimate for the discrete Stein-Lovász-Johnson bound is derived. Then using alteration, the Stein-Lovász-Johnson bound is improved upon, leading to a two-stage construction algorithm. Bounds from the Lovász local lemma are improved upon in a different manner, by examining group actions on the set of symbols. Two asymptotic upper bounds on CAN$(t,k,v)$ are established that are tighter than the known bounds. A two-stage bound is derived that employs the Lovász local lemma and the conditional Lovász local lemma distribution.
△ Less
Submitted 24 March, 2016;
originally announced March 2016.
-
P-positions in Modular Extensions to Nim
Authors:
Tanya Khovanova,
Karan Sarkar
Abstract:
In this paper, we consider a modular extension to the game of Nim, which we call $m$-Modular Nim, and explore its optimal strategy. In $m$-Modular Nim, a player can either make a standard Nim move or remove a multiple of $m$ tokens in total. We develop a winning strategy for all $m$ with $2$ heaps and for odd $m$ with any number of heaps.
Submitted 27 August, 2015;
originally announced August 2015.
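For small heaps, the P-positions of the game in this abstract can be explored by brute force. The sketch below reads the rule as follows, which is an assumption about the abstract's wording: a move either removes any positive number of tokens from a single heap, or removes tokens from any heaps so that the total removed is a positive multiple of $m$. The function name `is_p_position` is chosen here, and the search is exponential, so it is only suitable for tiny positions.

```python
from functools import lru_cache
from itertools import product


def is_p_position(heaps, m):
    """True if the player to move loses under optimal play (brute force)."""

    @lru_cache(maxsize=None)
    def losing(state):
        for removal in product(*(range(h + 1) for h in state)):
            total = sum(removal)
            if total == 0:
                continue  # a move must remove at least one token
            heaps_touched = sum(1 for r in removal if r > 0)
            # Legal if it is a standard Nim move or removes a multiple of m.
            if heaps_touched == 1 or total % m == 0:
                nxt = tuple(sorted(h - r for h, r in zip(state, removal)))
                if losing(nxt):
                    return False  # we can move to a losing position
        return True  # every move (if any) hands the opponent a win

    return losing(tuple(sorted(heaps)))
```

Under this reading, the position with two heaps of one token each is a P-position for $m=3$ but not for $m=2$, since with $m=2$ both tokens can be removed in a single move.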
-
Further results on controlling the false discovery proportion
Authors:
Wenge Guo,
Li He,
Sanat K. Sarkar
Abstract:
The probability of false discovery proportion (FDP) exceeding $γ\in[0,1)$, defined as $γ$-FDP, has received much attention as a measure of false discoveries in multiple testing. Although this measure has received acceptance due to its relevance under dependency, not much progress has been made yet advancing its theory under such dependency in a nonasymptotic setting, which motivates our research in this article. We provide a larger class of procedures containing the stepup analog of, and hence more powerful than, the stepdown procedure in Lehmann and Romano [Ann. Statist. 33 (2005) 1138-1154] controlling the $γ$-FDP under a positive dependence condition similar to that assumed in that paper. We offer better alternatives to the stepdown and stepup procedures in Romano and Shaikh [IMS Lecture Notes Monogr. Ser. 49 (2006a) 33-50, Ann. Statist. 34 (2006b) 1850-1873] using pairwise joint distributions of the null $p$-values. We generalize the notion of $γ$-FDP making it appropriate in situations where one is willing to tolerate a few false rejections or, due to high dependency, some false rejections are inevitable, and provide methods that control this generalized $γ$-FDP in two different scenarios: (i) only the marginal $p$-values are available and (ii) the marginal $p$-values as well as the common pairwise joint distributions of the null $p$-values are available, and assuming both positive dependence and arbitrary dependence conditions on the $p$-values in each scenario. Our theoretical findings are supported by numerical studies.
Submitted 2 June, 2014;
originally announced June 2014.
-
On a generalized false discovery rate
Authors:
Sanat K. Sarkar,
Wenge Guo
Abstract:
The concept of $k$-FWER has received much attention lately as an appropriate error rate for multiple testing when one seeks to control at least $k$ false rejections, for some fixed $k\ge 1$. A less conservative notion, the $k$-FDR, has been introduced very recently by Sarkar [Ann. Statist. 34 (2006) 394--415], generalizing the false discovery rate of Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289--300]. In this article, we bring newer insight to the $k$-FDR considering a mixture model involving independent $p$-values before motivating the developments of some new procedures that control it. We prove the $k$-FDR control of the proposed methods under a slightly weaker condition than in the mixture model. We provide numerical evidence of the proposed methods' superior power performance over some $k$-FWER and $k$-FDR methods. Finally, we apply our methods to a real data set.
Submitted 17 June, 2009;
originally announced June 2009.
-
An adaptive step-down procedure with proven FDR control under independence
Authors:
Yulia Gavrilov,
Yoav Benjamini,
Sanat K. Sarkar
Abstract:
In this work we study an adaptive step-down procedure for testing $m$ hypotheses. It stems from the repeated use of the false discovery rate controlling linear step-up procedure (sometimes called BH), and makes use of the critical constants $iq/[m+1-i(1-q)]$, $i=1,...,m$. Motivated by its success as a model selection procedure, as well as by its asymptotic optimality, we are interested in its false discovery rate (FDR) controlling properties for a finite number of hypotheses. We prove this step-down procedure controls the FDR at level $q$ for independent test statistics. We then numerically compare it with two other procedures with proven FDR control under independence, both in terms of power under independence and FDR control under positive dependence.
Submitted 31 March, 2009;
originally announced March 2009.
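The critical constants quoted in this abstract plug directly into a generic step-down rule, sketched below. The constants $iq/[m+1-i(1-q)]$ come from the abstract; the step-down convention used (reject the smallest p-values in order until one exceeds its constant, then stop) is the usual one, and the function name `adaptive_step_down` is chosen here for illustration.

```python
def adaptive_step_down(p_values, q=0.05):
    """Return sorted indices rejected by a step-down rule with the constants
    c_i = i*q / (m + 1 - i*(1 - q)), i = 1..m, from the abstract."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = []
    for i, idx in enumerate(order, start=1):
        critical = i * q / (m + 1 - i * (1 - q))
        if p_values[idx] > critical:
            break  # step-down: the first failure ends the procedure
        rejected.append(idx)
    return sorted(rejected)
```

With $m=3$ and $q=0.05$ the constants are roughly 0.0164, 0.0476, and 0.1304. The p-values `[0.02, 0.03, 0.04]` therefore yield no rejections, because the smallest one already exceeds the first constant, even though the second falls below the second constant.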
-
On the Simes inequality and its generalization
Authors:
Sanat K. Sarkar
Abstract:
The Simes inequality has received considerable attention recently because of its close connection to some important multiple hypothesis testing procedures. We revisit in this article an old result on this inequality to clarify and strengthen it and a recently proposed generalization of it to offer an alternative simpler proof.
Submitted 15 May, 2008;
originally announced May 2008.
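As background for this abstract, the Simes test of the intersection null hypothesis is easy to state in code: reject if any ordered p-value $p_{(i)}$ is at most $iα/n$. The sketch below implements only that standard rule, not the strengthened result or the generalization discussed in the paper; the function name `simes_test` is chosen here.

```python
def simes_test(p_values, alpha=0.05):
    """Return True if the Simes test rejects the intersection null.

    Rejects when p_(i) <= i * alpha / n for at least one rank i, where
    p_(1) <= ... <= p_(n) are the ordered p-values.
    """
    n = len(p_values)
    ordered = sorted(p_values)
    return any(p <= i * alpha / n for i, p in enumerate(ordered, start=1))
```

For example, `simes_test([0.03, 0.04])` rejects at level 0.05 because the larger p-value is below 0.05, whereas a Bonferroni test would not, since neither p-value is below 0.025.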
-
Stepup procedures controlling generalized FWER and generalized FDR
Authors:
Sanat K. Sarkar
Abstract:
In many applications of multiple hypothesis testing where more than one false rejection can be tolerated, procedures controlling error rates measuring at least $k$ false rejections, instead of at least one, for some fixed $k\ge 1$ can potentially increase the ability of a procedure to detect false null hypotheses. The $k$-FWER, a generalized version of the usual familywise error rate (FWER), is such an error rate that has recently been introduced in the literature and procedures controlling it have been proposed. A further generalization of a result on the $k$-FWER is provided in this article. In addition, an alternative and less conservative notion of error rate, the $k$-FDR, is introduced in the same spirit as the $k$-FWER by generalizing the usual false discovery rate (FDR). A $k$-FWER procedure is constructed given any set of increasing constants by utilizing the $k$th order joint null distributions of the $p$-values without assuming any specific form of dependence among all the $p$-values. Procedures controlling the $k$-FDR are also developed by using the $k$th order joint null distributions of the $p$-values, first assuming that the sets of null and nonnull $p$-values are mutually independent or they are jointly positively dependent in the sense of being multivariate totally positive of order two (MTP$_2$) and then discarding that assumption about the overall dependence among the $p$-values.
Submitted 20 March, 2008;
originally announced March 2008.
-
Generalizing Simes' test and Hochberg's stepup procedure
Authors:
Sanat K. Sarkar
Abstract:
In a multiple testing problem where one is willing to tolerate a few false rejections, a procedure controlling the familywise error rate (FWER) can potentially be improved in terms of its ability to detect false null hypotheses by generalizing it to control the $k$-FWER, the probability of falsely rejecting at least $k$ null hypotheses, for some fixed $k>1$. Simes' test for testing the intersection null hypothesis is generalized to control the $k$-FWER weakly, that is, under the intersection null hypothesis, and Hochberg's stepup procedure for simultaneous testing of the individual null hypotheses is generalized to control the $k$-FWER strongly, that is, under any configuration of the true and false null hypotheses. The proposed generalizations are developed utilizing joint null distributions of the $k$-dimensional subsets of the $p$-values, assumed to be identical. The generalized Simes' test is proved to control the $k$-FWER weakly under the multivariate totally positive of order two (MTP$_2$) condition [J. Multivariate Analysis 10 (1980) 467--498] of the joint null distribution of the $p$-values by generalizing the original Simes' inequality. It is more powerful to detect $k$ or more false null hypotheses than the original Simes' test when the $p$-values are independent. A stepdown procedure strongly controlling the $k$-FWER, a version of generalized Holm's procedure that is different from and more powerful than [Ann. Statist. 33 (2005) 1138--1154] with independent $p$-values, is derived before proposing the generalized Hochberg's procedure. The strong control of the $k$-FWER for the generalized Hochberg's procedure is established in situations where the generalized Simes' test is known to control its $k$-FWER weakly.
Submitted 13 March, 2008;
originally announced March 2008.
-
False discovery and false nondiscovery rates in single-step multiple testing procedures
Authors:
Sanat K. Sarkar
Abstract:
Results on the false discovery rate (FDR) and the false nondiscovery rate (FNR) are developed for single-step multiple testing procedures. In addition to verifying desirable properties of FDR and FNR as measures of error rates, these results extend previously known results, providing further insights, particularly under dependence, into the notions of FDR and FNR and related measures. First, considering fixed configurations of true and false null hypotheses, inequalities are obtained to explain how an FDR- or FNR-controlling single-step procedure, such as a Bonferroni or Šidák procedure, can potentially be improved. Two families of procedures are then constructed, one that modifies the FDR-controlling and the other that modifies the FNR-controlling Šidák procedure. These are proved to control FDR or FNR under independence less conservatively than the corresponding families that modify the FDR- or FNR-controlling Bonferroni procedure. Results of numerical investigations of the performance of the modified Šidák FDR procedure over its competitors are presented. Second, considering a mixture model where different configurations of true and false null hypotheses are assumed to have certain probabilities, results are also derived that extend some of Storey's work to the dependence case.
Submitted 23 May, 2006;
originally announced May 2006.