
Nonparametric methods controlling the median of the false discovery proportion

Abstract: When testing many hypotheses, often we do not have strong expectations about the directions of the effects. In some situations, however, the alternative hypotheses are that the parameters lie in a certain direction or interval, and it is in fact expected that most hypotheses are false. This is often the case when researchers perform multiple noninferiority or equivalence tests, e.g. when testing food safety with metabolite data. The goal is then to use data to corroborate the expectation that most hypotheses are false. We propose a nonparametric multiple testing approach that is powerful in such situations. If the user's expectations are wrong, our approach will still be valid but have low power. Of course, all multiple testing methods become more powerful when appropriate one-sided instead of two-sided tests are used, but in that setting our approach has superior power. The methods in this paper control the median of the false discovery proportion (FDP), which is the fraction of false discoveries among the rejected hypotheses. This approach is comparable to false discovery rate control, where one ensures that the mean rather than the median of the FDP is small. Our procedures make use of a symmetry property of the test statistics, do not require independence and are valid for finite samples.
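To make the distinction between the mean and the median of the FDP concrete, the sketch below computes the FDP, V / max(R, 1), for repeated simulated experiments and reports both summaries. It is a minimal illustration under stated assumptions, not the paper's procedure: the setting (most hypotheses false, one-sided z-tests) and the fixed p-value threshold used as the rejection rule are hypothetical choices, and the function names are ours.

```python
import random
from statistics import NormalDist, mean, median


def fdp(rejected, is_null):
    """False discovery proportion V / max(R, 1) for a single experiment."""
    v = sum(1 for i in rejected if is_null[i])  # false discoveries among rejections
    return v / max(len(rejected), 1)


def simulate_fdp(m=100, n_null=10, effect=2.5, threshold=0.05, runs=400, seed=1):
    """Hypothetical setting in which most hypotheses are false: only n_null of m
    are true nulls. Rejection rule (an assumption, not the paper's method):
    reject every one-sided p-value at or below a fixed threshold."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    is_null = [j < n_null for j in range(m)]
    fdps = []
    for _ in range(runs):
        # z-statistics: mean 0 under the null, mean `effect` under the alternative
        z = [rng.gauss(0.0 if is_null[j] else effect, 1.0) for j in range(m)]
        p = [1.0 - std_normal.cdf(zj) for zj in z]  # one-sided p-values
        rejected = [j for j in range(m) if p[j] <= threshold]
        fdps.append(fdp(rejected, is_null))
    # FDR control bounds the first summary; median-FDP control bounds the second
    return mean(fdps), median(fdps)
```

Calling `mean_fdp, median_fdp = simulate_fdp()` returns both summaries of the same simulated FDP distribution; because that distribution is typically skewed, the two numbers generally differ.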

Submitted 29 January, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

MSC Class: 62G10

arXiv:2410.02306

Choosing alpha post hoc: the danger of multiple standard significance thresholds

Authors: Jesse Hemerik, Nick W Koning

Abstract: A fundamental assumption of classical hypothesis testing is that the significance threshold $\alpha$ is chosen independently from the data. The validity of confidence intervals likewise relies on choosing $\alpha$ beforehand. We point out that the independence of $\alpha$ is guaranteed in practice because, in most fields, there exists one standard $\alpha$ that everyone uses -- so that $\alpha$ is automatically independent of everything. However, there have been recent calls to decrease $\alpha$ from $0.05$ to $0.005$. We note that this may lead to multiple accepted standard thresholds within one scientific field. For example, different journals may require different significance thresholds. As a consequence, some researchers may be tempted to conveniently choose their $\alpha$ based on their p-value. We use examples to illustrate that this severely invalidates hypothesis tests, and mention some potential solutions.

Submitted 10 March, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

Comments: Accepted for publication in Statistical Science

MSC Class: 62A01

arXiv:2401.17993

Robust Inference for Generalized Linear Mixed Models: An Approach Based on Score Sign Flipping

Authors: Angela Andreella, Jelle Goeman, Jesse Hemerik, Livio Finos

Abstract: Despite the versatility of generalized linear mixed models in handling complex experimental designs, they often suffer from misspecification and convergence problems. This makes inference on the values of coefficients problematic. To address these challenges, we propose a robust extension of the score-based statistical test using sign-flipping transformations. Our approach efficiently handles within-variance structure and heteroscedasticity, ensuring accurate regression coefficient testing. The approach is illustrated by analyzing the reduction of health issues over time for newly adopted children. The model is characterized by a binomial response with unbalanced frequencies and several categorical and continuous predictors. The proposed approach efficiently deals with critical problems related to longitudinal nonlinear models, surpassing common statistical approaches such as generalized estimating equations and generalized linear mixed models.

Submitted 27 March, 2025; v1 submitted 31 January, 2024; originally announced January 2024.

Comments: The paper contains errors that we are thoroughly analyzing for a revised version, though this process requires time

arXiv:2306.07720

doi 10.1080/00031305.2024.2319182

On the term "randomization test"

Authors: Jesse Hemerik

Abstract: There exists no consensus on the meaning of the term "randomization test". Contradictory uses of the term are leading to confusion, misunderstandings and indeed invalid data analyses. As we point out, a main source of the confusion is that the term was not explicitly defined when it was first used in the 1930s. Later authors made clear proposals to reach a consensus regarding the term. This resulted in some level of agreement around the 1970s. However, in the last few decades, the term has often been used in ways that contradict these proposals. This paper provides an overview of the history of the term per se, for the first time tracing it back to 1937. This will hopefully lead to more agreement on terminology and less confusion on the related fundamental concepts.

Submitted 13 June, 2023; originally announced June 2023.

MSC Class: 62G10

Journal ref: The American Statistician, 2024

arXiv:2209.13918

Inference in generalized linear models with robustness to misspecified variances

Authors: Riccardo De Santis, Jelle J. Goeman, Jesse Hemerik, Samuel Davenport, Livio Finos

Abstract: Generalized linear models usually assume a common dispersion parameter, an assumption that is seldom true in practice. Consequently, standard parametric methods may suffer appreciable loss of type I error control. As an alternative, we present a semi-parametric group-invariance method based on sign flipping of score contributions. Our method requires only the correct specification of the mean model, but is robust against any misspecification of the variance. We present tests for single as well as multiple regression coefficients. The test is asymptotically valid but shows excellent performance in small samples. We illustrate the method using RNA sequencing count data, for which it is difficult to model the overdispersion correctly. The method is available in the R library flipscores.

Submitted 13 September, 2024; v1 submitted 28 September, 2022; originally announced September 2022.

arXiv:2208.11570

Flexible control of the median of the false discovery proportion

Authors: Jesse Hemerik, Aldo Solari, Jelle J Goeman

Abstract: We introduce a multiple testing procedure that controls the median of the proportion of false discoveries (FDP) in a flexible way. The procedure only requires a vector of p-values as input and is comparable to the Benjamini-Hochberg method, which controls the mean of the FDP. Our method allows freely choosing one or several values of alpha after seeing the data -- unlike Benjamini-Hochberg, which can be very liberal when alpha is chosen post hoc. We prove these claims and illustrate them with simulations. Our procedure is inspired by a popular estimator of the total number of true hypotheses. We adapt this estimator to provide simultaneously median unbiased estimators of the FDP, valid for finite samples. This simultaneity allows for the claimed flexibility. Our approach does not assume independence. The time complexity of our method is linear in the number of hypotheses, after sorting the p-values.
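For reference, the Benjamini-Hochberg step-up method that the abstract uses as a point of comparison can be sketched in a few lines. This is the standard textbook BH procedure, not the median-FDP method of the paper; like the paper's method, it takes only a vector of p-values and, after sorting, needs a single linear pass. The function name is ours.

```python
def benjamini_hochberg(pvalues, alpha):
    """Standard BH step-up procedure; returns indices of rejected hypotheses."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # O(m log m) sort
    k = 0
    # Linear pass: find the largest rank r with p_(r) <= r * alpha / m
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    # Step-up: reject the k smallest p-values, even if some earlier ranks failed
    return sorted(order[:k])
```

For example, with p-values `[0.03, 0.04]` and `alpha = 0.05`, the smallest p-value fails its own bound (0.03 > 0.025), yet both hypotheses are rejected because the second passes (0.04 <= 0.05); this is the step-up behaviour. Here alpha is fixed in advance, which is exactly the restriction the abstract contrasts with post hoc choice.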

Submitted 13 March, 2024; v1 submitted 24 August, 2022; originally announced August 2022.

MSC Class: 62F03

arXiv:2202.00967

More Efficient Exact Group-Invariance Testing: using a Representative Subgroup

Authors: Nick W. Koning, Jesse Hemerik

Abstract: Non-parametric tests based on permutation, rotation or sign-flipping are examples of group-invariance tests. These tests test invariance of the null distribution under a set of transformations that has a group structure, in the algebraic sense. Such groups are often huge, which makes it computationally infeasible to test using the entire group. Hence, it is standard practice to test using a randomly sampled set of transformations from the group. This random sample still needs to be substantial to obtain good power and replicability. We improve upon this standard practice by using a well-designed subgroup of transformations instead of a random sample. The resulting subgroup-invariance test is still exact, as invariance under a group implies invariance under its subgroups. We illustrate this in a generalized location model and obtain more powerful tests based on the same number of transformations. In particular, we show that a subgroup-invariance test is consistent for lower signal-to-noise ratios than a test based on a random sample. For the special case of a normal location model and a particular design of the subgroup, we show that the power improvement is equivalent to the power difference between a Monte Carlo $Z$-test and a Monte Carlo $t$-test.
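The "standard practice" the abstract improves on can be sketched as a Monte Carlo sign-flipping test: sample random sign vectors from the group of all 2^n sign flips and compare the flipped statistics with the observed one. This is a minimal sketch assuming a one-sided test of symmetry about zero with the sum as test statistic; it shows the random-sample baseline, not the subgroup construction proposed in the paper.

```python
import random


def sign_flip_pvalue(x, n_flips=999, seed=0):
    """Monte Carlo sign-flipping test of H0: x is symmetric about 0,
    one-sided against a positive location shift."""
    rng = random.Random(seed)
    t_obs = sum(x)
    count = 1  # the identity transformation is always included, which keeps the test exact
    for _ in range(n_flips):
        # Apply a uniformly random element of the sign-flipping group
        t = sum(xi if rng.random() < 0.5 else -xi for xi in x)
        if t >= t_obs:
            count += 1
    return count / (n_flips + 1)
```

Counting the identity means the p-value can never fall below 1 / (n_flips + 1). With n_flips = 999, the smallest attainable value is 0.001, however strong the signal.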

Submitted 22 November, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

MSC Class: 62G10; 62G09

arXiv:2012.00368

Permutation-based true discovery proportions for functional Magnetic Resonance Imaging cluster analysis

Authors: Angela Andreella, Jesse Hemerik, Wouter Weeda, Livio Finos, Jelle Goeman

Abstract: We propose a permutation-based method for testing a large collection of hypotheses simultaneously. Our method provides lower bounds for the number of true discoveries in any selected subset of hypotheses. These bounds are simultaneously valid with high confidence. The methodology is particularly useful in functional Magnetic Resonance Imaging cluster analysis, where it provides a confidence statement on the percentage of truly activated voxels within clusters of voxels, avoiding the well-known spatial specificity paradox. We offer a user-friendly tool to estimate the percentage of true discoveries for each cluster while controlling the family-wise error rate for multiple testing and taking into account that the cluster was chosen in a data-driven way. The method adapts to the spatial correlation structure that characterizes functional Magnetic Resonance Imaging data, gaining power over parametric approaches.

Submitted 26 January, 2023; v1 submitted 1 December, 2020; originally announced December 2020.

arXiv:2007.02844

On optimal two-stage testing of multiple mediators

Authors: Vera Djordjilović, Jesse Hemerik, Magne Thoresen

Abstract: Mediation analysis in high-dimensional settings often involves identifying potential mediators among a large number of measured variables. For this purpose, a two-step familywise error rate procedure called ScreenMin has been recently proposed (Djordjilović et al. 2019). In ScreenMin, variables are first screened and only those that pass the screening are tested. The proposed threshold for selection has been shown to guarantee asymptotic familywise error rate. In this work, we investigate the impact of the selection threshold on the finite sample familywise error rate. We derive a power maximizing selection threshold and show that it is well approximated by an adaptive threshold of Wang et al. (2016). We illustrate the investigated procedures on a case-control study examining the effect of fish intake on the risk of colorectal adenoma.
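The screen-then-test structure described above can be sketched as follows. This sketch assumes the ScreenMin formulation as we understand it: each candidate mediator j has a p-value for the exposure-mediator link and one for the mediator-outcome link; a mediator passes screening if the smaller of the two is at most a threshold c, and each selected mediator is then tested with Bonferroni on the larger p-value over the selected set. The threshold c is left as a plain user parameter; the power-maximizing choice derived in the paper is not reproduced, and the function name is hypothetical.

```python
def screen_min(p_exposure, p_outcome, alpha, c):
    """Two-stage screen-then-test sketch (assumed ScreenMin-style rule).

    p_exposure[j]: p-value for exposure -> mediator j
    p_outcome[j]:  p-value for mediator j -> outcome
    c:             screening threshold (user-chosen here, an assumption)
    """
    pairs = list(zip(p_exposure, p_outcome))
    # Stage 1: screening on the minimum of the two p-values
    selected = [j for j, (p1, p2) in enumerate(pairs) if min(p1, p2) <= c]
    if not selected:
        return []
    # Stage 2: Bonferroni over the selected set only, applied to the maximum p-value
    level = alpha / len(selected)
    return [j for j in selected if max(pairs[j]) <= level]
```

The Bonferroni correction divides by the number of screened-in mediators rather than by the total number of candidates, which is where the power gain of screening comes from.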

Submitted 6 July, 2020; originally announced July 2020.

Comments: 20 pages, 5 figures

arXiv:2001.01466

Permutation testing in high-dimensional linear models: an empirical investigation

Authors: Jesse Hemerik, Magne Thoresen, Livio Finos

Abstract: Permutation testing in linear models, where the number of nuisance coefficients is smaller than the sample size, is a well-studied topic. The common approach of such tests is to permute residuals after regressing on the nuisance covariates. Permutation-based tests are valuable in particular because they can be highly robust to violations of the standard linear model, such as non-normality and heteroscedasticity. Moreover, in some cases they can be combined with existing, powerful permutation-based multiple testing methods. Here, we propose permutation tests for models where the number of nuisance coefficients exceeds the sample size. The performance of the novel tests is investigated with simulations. In a wide range of simulation scenarios our proposed permutation methods provided appropriate type I error rate control, unlike some competing tests, while having good power.
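The "common approach" mentioned in the abstract, permuting residuals after regressing on the nuisance covariates, is usually called the Freedman-Lane procedure. Below is a minimal sketch for the low-dimensional case with an intercept and a single nuisance covariate z. The test statistic (the absolute inner product of the residualized covariate of interest with the residualized response, which is monotone in the absolute OLS coefficient of x) is our illustrative choice. This is not the high-dimensional method proposed in the paper.

```python
import random


def residualize(v, z):
    """Residuals of v after OLS regression on an intercept and covariate z."""
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    b = sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v)) / szz
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]


def freedman_lane_pvalue(y, x, z, n_perm=999, seed=0):
    """Freedman-Lane permutation test of H0: no effect of x on y, adjusting for 1 + z."""
    rng = random.Random(seed)
    e = residualize(y, z)                          # residuals of the null model y ~ 1 + z
    fitted = [yi - ei for yi, ei in zip(y, e)]     # fitted values of the null model
    rx = residualize(x, z)                         # x with its nuisance part removed

    def stat(y_star):
        # |<rx, residualized y*>|, proportional to |beta_x| by Frisch-Waugh-Lovell
        ry = residualize(y_star, z)
        return abs(sum(a * b for a, b in zip(rx, ry)))

    t_obs = stat(y)
    count = 1  # the identity permutation
    for _ in range(n_perm):
        perm = e[:]
        rng.shuffle(perm)                          # permute null-model residuals
        y_star = [f + r for f, r in zip(fitted, perm)]
        if stat(y_star) >= t_obs:
            count += 1
    return count / (n_perm + 1)
```

When the number of nuisance coefficients exceeds the sample size, the null-model residuals above cannot be computed by ordinary least squares. That situation is the one the paper addresses.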

Submitted 8 October, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

Comments: Accepted for publication in Journal of Statistical Computation and Simulation

MSC Class: 62G09

arXiv:1912.02633

doi 10.1111/insr.12431

Another look at the Lady Tasting Tea and differences between permutation tests and randomization tests

Authors: Jesse Hemerik, Jelle J. Goeman

Abstract: The statistical literature is known to be inconsistent in the use of the terms "permutation test" and "randomization test". Several authors successfully argue that these terms should be used to refer to two distinct classes of tests and that there are major conceptual differences between these classes. The present paper explains an important difference in mathematical reasoning between these classes: a permutation test fundamentally requires that the set of permutations has a group structure, in the algebraic sense; the reasoning behind a randomization test is not based on such a group structure and it is possible to use an experimental design that does not correspond to a group. In particular, we can use a randomization scheme where the number of possible treatment patterns is larger than in standard experimental designs. This leads to exact p-values of improved resolution, providing increased power for very small significance levels, at the cost of decreased power for larger significance levels. We discuss applications in randomized trials and elsewhere. Further, we explain that Fisher's famous Lady Tasting Tea experiment, which is commonly referred to as the first permutation test, is in fact a randomization test. This distinction is important to avoid confusion and invalid tests.
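The resolution argument can be checked with exact arithmetic. In Fisher's Lady Tasting Tea design there are 8 cups, 4 with milk poured first, and the lady knows this. Under the null hypothesis that she cannot discriminate, the p-value comes from the random assignment of cups, so there are only C(8, 4) = 70 equally likely patterns. The second function assumes a hypothetical alternative design in which each cup's order is randomized independently with probability 1/2, giving 2^8 = 256 patterns. It is an illustration of "more treatment patterns", not necessarily the design studied in the paper.

```python
from fractions import Fraction
from math import comb


def tea_pvalue(n_correct, n_cups=8, n_milk_first=4):
    """Exact p-value P(at least n_correct of her milk-first picks are right) under H0,
    for Fisher's balanced design (she knows n_milk_first cups are milk-first)."""
    total = comb(n_cups, n_milk_first)  # number of equally likely assignments
    favourable = sum(
        comb(n_milk_first, k) * comb(n_cups - n_milk_first, n_milk_first - k)
        for k in range(n_correct, n_milk_first + 1)
    )
    return Fraction(favourable, total)


def independent_design_min_pvalue(n_cups=8):
    """Smallest attainable p-value if every cup were randomized independently
    with probability 1/2 (hypothetical alternative design)."""
    return Fraction(1, 2 ** n_cups)
```

Under the balanced design, a perfect score gives p = 1/70 (about 0.014) and three correct gives p = 17/70 (about 0.24). Under the independent design, a perfect score would give p = 1/256 (about 0.004). This illustrates the finer p-value resolution described in the abstract.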

Submitted 6 October, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

Comments: International Statistical Review. Early view version (2020)

MSC Class: 62G10

arXiv:1911.00862

Optimal two-stage testing of multiple mediators

Authors: Vera Djordjilović, Jesse Hemerik, Magne Thoresen

Abstract: Mediation analysis in high-dimensional settings often involves identifying potential mediators among a large number of measured variables. For this purpose, a two-step familywise error rate (FWER) procedure called ScreenMin has been recently proposed (Djordjilović et al. 2019). In ScreenMin, variables are first screened and only those that pass the screening are tested. The proposed threshold for selection has been shown to guarantee asymptotic FWER. In this work, we investigate the impact of the selection threshold on the finite sample FWER. We derive a power-maximizing selection threshold and show that it is well approximated by an adaptive threshold of Wang et al. (2016). We study the performance of the proposed procedures in a simulation study, and apply them to a case-control study examining the effect of fish intake on the risk of colorectal adenoma.

Submitted 3 November, 2019; originally announced November 2019.

arXiv:1909.03796

doi 10.1111/rssb.12369

Robust testing in generalized linear models by sign-flipping score contributions

Authors: Jesse Hemerik, Jelle J Goeman, Livio Finos

Abstract: Generalized linear models are often misspecified due to overdispersion, heteroscedasticity and ignored nuisance variables. Existing quasi-likelihood methods for testing in misspecified models often do not provide satisfactory type-I error rate control. We provide a novel semi-parametric test, based on sign-flipping individual score contributions. The tested parameter is allowed to be multi-dimensional and even high-dimensional. Our test is often robust against the mentioned forms of misspecification and provides better type-I error control than its competitors. When nuisance parameters are estimated, our basic test becomes conservative. We show how to take nuisance estimation into account to obtain an asymptotically exact test. Our proposed test is asymptotically equivalent to its parametric counterpart.
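A minimal sketch of the basic idea, sign-flipping individual score contributions, for the simplest case we could think of: a Poisson log-linear model log E[y] = b0 + b1 * x, testing H0: b1 = 0 with the intercept as the only nuisance parameter, estimated under H0. For the canonical log link the score contribution of observation i is x_i * (y_i - mu_i). This is the basic test, which the abstract notes becomes conservative when nuisance parameters are estimated; the paper's correction for nuisance estimation and the flipscores R package are not reproduced here, and the function name is ours.

```python
import random


def score_flip_test(y, x, n_flips=999, seed=0):
    """Basic two-sided sign-flipping score test of H0: b1 = 0 in the Poisson
    model log E[y] = b0 + b1 * x (illustrative sketch)."""
    rng = random.Random(seed)
    mu0 = sum(y) / len(y)  # MLE of E[y] under H0 (intercept-only model)
    # Individual score contributions for b1 at b1 = 0 (canonical log link)
    contrib = [xi * (yi - mu0) for xi, yi in zip(x, y)]
    t_obs = abs(sum(contrib))
    count = 1  # identity: no contributions flipped
    for _ in range(n_flips):
        t = abs(sum(c if rng.random() < 0.5 else -c for c in contrib))
        if t >= t_obs:
            count += 1
    return count / (n_flips + 1)
```

Only the mean model enters the score contributions, so no variance or dispersion model needs to be specified. This is the intuition behind the robustness to overdispersion claimed above.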

Submitted 8 May, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

Comments: To appear in Journal of the Royal Statistical Society: Series B (Methodology). Early view version (2020)

MSC Class: 62G10

arXiv:1901.04885

doi 10.1214/20-AOS1999

Only Closed Testing Procedures are Admissible for Controlling False Discovery Proportions

Authors: Jelle Goeman, Jesse Hemerik, Aldo Solari

Abstract: We consider the class of all multiple testing methods controlling tail probabilities of the false discovery proportion, either for one random set or simultaneously for many such sets. This class encompasses methods controlling familywise error rate, generalized familywise error rate, false discovery exceedance, joint error rate, simultaneous control of all false discovery proportions, and others, as well as seemingly unrelated methods such as gene set testing in genomics and cluster inference methods in neuroimaging. We show that all such methods are either equivalent to a closed testing method, or are uniformly improved by one. Moreover, we show that a closed testing method is admissible as a method controlling tail probabilities of false discovery proportions if and only if all its local tests are admissible. This implies that, when designing such methods, it is sufficient to restrict attention to closed testing methods only. We demonstrate the practical usefulness of this design principle by constructing a uniform improvement of a recently proposed method.
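To make "closed testing method" concrete, here is a brute-force sketch that enumerates every intersection hypothesis for a small number m of hypotheses. It assumes Bonferroni local tests, which is our choice for illustration; the paper allows any local tests. It returns the usual lower confidence bound on the number of true discoveries in a selected set S: |S| minus the size of the largest subset of S that closed testing does not reject. An intersection I is rejected only if every superset of I is locally rejected. The runtime is exponential in m, so this is for illustration only.

```python
from itertools import combinations


def closed_testing_true_discoveries(pvalues, alpha, selected):
    """Lower bound on the number of true discoveries in `selected`, obtained by
    brute-force closed testing with Bonferroni local tests (small m only)."""
    m = len(pvalues)

    def local_reject(subset):
        # Bonferroni local test of the intersection hypothesis over `subset`
        return min(pvalues[i] for i in subset) <= alpha / len(subset)

    subsets = [frozenset(c) for k in range(1, m + 1) for c in combinations(range(m), k)]
    not_locally_rejected = [J for J in subsets if not local_reject(J)]
    s = frozenset(selected)
    largest_unrejected = 0
    for subset in subsets:
        # Not rejected by closed testing iff some superset is not locally rejected
        if subset <= s and any(subset <= J for J in not_locally_rejected):
            largest_unrejected = max(largest_unrejected, len(subset))
    return len(s) - largest_unrejected
```

With p-values `[0.001, 0.01, 0.04, 0.5]` and `alpha = 0.05`, the bound for the full set is 2. This matches the two rejections of Holm's method, the known shortcut for closed testing with Bonferroni local tests.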

Submitted 29 April, 2022; v1 submitted 15 January, 2019; originally announced January 2019.

MSC Class: 62F03

arXiv:1808.05528

doi 10.1093/biomet/asz021

Permutation-based simultaneous confidence bounds for the false discovery proportion

Authors: Jesse Hemerik, Aldo Solari, Jelle J. Goeman

Abstract: When multiple hypotheses are tested, interest is often in ensuring that the proportion of false discoveries (FDP) is small with high confidence. In this paper, confidence upper bounds for the FDP are constructed, which are simultaneous over all rejection cut-offs. In particular this allows the user to select a set of hypotheses post hoc such that the FDP lies below some constant with high confidence. Our method uses permutations to account for the dependence structure in the data. So far only Meinshausen provided an exact, permutation-based and computationally feasible method for simultaneous FDP bounds. We provide an exact method, which uniformly improves this procedure. Further, we provide a generalization of this method. It lets the user select the shape of the simultaneous confidence bounds. This gives the user more freedom in determining the power properties of the method. Interestingly, several existing permutation methods, such as Significance Analysis of Microarrays (SAM) and Westfall and Young's maxT method, are obtained as special cases.
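One of the special cases named above, Westfall and Young's single-step maxT method, can be sketched briefly. The sketch assumes one-sample data (rows are subjects, columns are hypotheses) and uses sign flips of whole rows as the transformation group, so the dependence between columns is preserved. The adjusted p-value of hypothesis j is the fraction of transformations, identity included, whose maximum statistic over all hypotheses reaches the observed statistic of j. It does not implement the paper's simultaneous FDP bounds.

```python
import random


def maxT_adjusted_pvalues(data, n_flips=999, seed=0):
    """Single-step maxT adjusted p-values using random sign flips of whole rows.

    data: list of rows (subjects), each row a list of m values (hypotheses).
    Statistic for hypothesis j: |column sum| after applying the signs."""
    rng = random.Random(seed)
    m = len(data[0])

    def stats(signs):
        return [abs(sum(s * row[j] for s, row in zip(signs, data))) for j in range(m)]

    t_obs = stats([1] * len(data))
    exceed = [1] * m  # the identity transformation counts for every hypothesis
    for _ in range(n_flips):
        signs = [rng.choice((-1, 1)) for _ in data]  # one sign per subject
        max_t = max(stats(signs))  # maximum over all hypotheses
        for j in range(m):
            if max_t >= t_obs[j]:
                exceed[j] += 1
    return [c / (n_flips + 1) for c in exceed]
```

Comparing each observed statistic with the permutation distribution of the maximum is what yields familywise error control without assuming independence between the hypotheses.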

Submitted 16 August, 2018; originally announced August 2018.

MSC Class: 62G09; 62H15

Journal ref: Biometrika, 106(3):635-649, 2019

Showing 1–15 of 15 results for author: Hemerik, J