-
A Powerful Bootstrap Test of Independence in High Dimensions
Authors:
Mauricio Olivares,
Tomasz Olma,
Daniel Wilhelm
Abstract:
This paper proposes a nonparametric test of pairwise independence of one random variable from a large pool of other random variables. The test statistic is the maximum of several Chatterjee's rank correlations and critical values are computed via a block multiplier bootstrap. The test is shown to asymptotically control size uniformly over a large class of data-generating processes, even when the n…
▽ More
This paper proposes a nonparametric test of pairwise independence of one random variable from a large pool of other random variables. The test statistic is the maximum of several Chatterjee's rank correlations and critical values are computed via a block multiplier bootstrap. The test is shown to asymptotically control size uniformly over a large class of data-generating processes, even when the number of variables is much larger than sample size. The test is consistent against any fixed alternative. It can be combined with a stepwise procedure for selecting those variables from the pool that violate independence, while controlling the family-wise error rate. All formal results leave the dependence among variables in the pool completely unrestricted. In simulations, we find that our test is very powerful, outperforming existing tests in most scenarios considered, particularly in high dimensions and/or when the variables in the pool are dependent.
△ Less
Submitted 17 April, 2025; v1 submitted 27 March, 2025;
originally announced March 2025.
-
Optimal Data Collection for Randomized Control Trials
Authors:
Pedro Carneiro,
Sokbae Lee,
Daniel Wilhelm
Abstract:
In a randomized control trial, the precision of an average treatment effect estimator can be improved either by collecting data on additional individuals, or by collecting additional covariates that predict the outcome variable. We propose the use of pre-experimental data such as a census, or a household survey, to inform the choice of both the sample size and the covariates to be collected. Our p…
▽ More
In a randomized control trial, the precision of an average treatment effect estimator can be improved either by collecting data on additional individuals, or by collecting additional covariates that predict the outcome variable. We propose the use of pre-experimental data such as a census, or a household survey, to inform the choice of both the sample size and the covariates to be collected. Our procedure seeks to minimize the resulting average treatment effect estimator's mean squared error, subject to the researcher's budget constraint. We rely on a modification of an orthogonal greedy algorithm that is conceptually simple and easy to implement in the presence of a large number of potential covariates, and does not require any tuning parameters. In two empirical applications, we show that our procedure can lead to substantial gains of up to 58%, measured either in terms of reductions in data collection costs or in terms of improvements in the precision of the treatment effect estimator.
△ Less
Submitted 22 August, 2016; v1 submitted 11 March, 2016;
originally announced March 2016.
-
Nonparametric instrumental variable estimation under monotonicity
Authors:
Denis Chetverikov,
Daniel Wilhelm
Abstract:
The ill-posedness of the inverse problem of recovering a regression function in a nonparametric instrumental variable model leads to estimators that may suffer from a very slow, logarithmic rate of convergence. In this paper, we show that restricting the problem to models with monotone regression functions and monotone instruments significantly weakens the ill-posedness of the problem. In stark co…
▽ More
The ill-posedness of the inverse problem of recovering a regression function in a nonparametric instrumental variable model leads to estimators that may suffer from a very slow, logarithmic rate of convergence. In this paper, we show that restricting the problem to models with monotone regression functions and monotone instruments significantly weakens the ill-posedness of the problem. In stark contrast to the existing literature, the presence of a monotone instrument implies boundedness of our measure of ill-posedness when restricted to the space of monotone functions. Based on this result we derive a novel non-asymptotic error bound for the constrained estimator that imposes monotonicity of the regression function. For a given sample size, the bound is independent of the degree of ill-posedness as long as the regression function is not too steep. As an implication, the bound allows us to show that the constrained estimator converges at a fast, polynomial rate, independently of the degree of ill-posedness, in a large, but slowly shrinking neighborhood of constant functions. Our simulation study demonstrates significant finite-sample performance gains from imposing monotonicity even when the regression function is rather far from being a constant. We apply the constrained estimator to the problem of estimating gasoline demand functions from U.S. data.
△ Less
Submitted 19 July, 2015;
originally announced July 2015.