-
Confidence intervals centred on bootstrap smoothed estimators: an impossibility result
Authors:
Paul Kabaila,
Christeen Wijethunga
Abstract:
Recently, Kabaila and Wijethunga assessed the performance of a confidence interval centred on a bootstrap smoothed estimator, with width proportional to an estimator of Efron's delta method approximation to the standard deviation of this estimator. They used a testbed situation consisting of two nested linear regression models, with error variance assumed known, and model selection using a preliminary hypothesis test. This assessment was in terms of coverage and scaled expected length, where the scaling is with respect to the expected length of the usual confidence interval with the same minimum coverage probability. They found that this confidence interval has scaled expected length that (a) has a maximum value that may be much greater than 1 and (b) is greater than a number slightly less than 1 when the simpler model is correct. We therefore ask the following question. For a confidence interval, centred on the bootstrap smoothed estimator, does there exist a formula for its data-based width such that, in this testbed situation, it has the desired minimum coverage and scaled expected length that (a) has a maximum value that is not too much larger than 1 and (b) is substantially less than 1 when the simpler model is correct? Using a recent decision-theoretic performance bound due to Kabaila and Kong, it is shown that the answer to this question is `no' for a wide range of scenarios.
Submitted 21 October, 2019;
originally announced October 2019.
-
Admissibility of the usual confidence set for the mean of a univariate or bivariate normal population: The unknown-variance case
Authors:
Hannes Leeb,
Paul Kabaila
Abstract:
In the Gaussian linear regression model (with unknown mean and variance), we show that the standard confidence set for one or two regression coefficients is admissible in the sense of Joshi (1969). This solves a long-standing open problem in mathematical statistics, and this has important implications on the performance of modern inference procedures post-model-selection or post-shrinkage, particularly in situations where the number of parameters is larger than the sample size. As a technical contribution of independent interest, we introduce a new class of conjugate priors for the Gaussian location-scale model.
Submitted 21 September, 2018; v1 submitted 20 September, 2018;
originally announced September 2018.
-
Two sources of poor coverage of confidence intervals after model selection
Authors:
Paul Kabaila,
Rheanna Mainzer
Abstract:
We compare the following two sources of poor coverage of post-model-selection confidence intervals: the preliminary data-based model selection sometimes chooses the wrong model and the data used to choose the model is re-used for the construction of the confidence interval.
Submitted 6 November, 2017;
originally announced November 2017.
-
Conditional assessment of the impact of a Hausman pretest on confidence intervals
Authors:
Paul Kabaila,
Rheanna Mainzer,
Davide Farchione
Abstract:
We assess the impact of a Hausman pretest, applied to panel data, on a confidence interval for the slope, conditional on the observed values of the time-varying covariate. This assessment has the advantages that it (a) relates to the values of this covariate at hand, (b) is valid irrespective of how this covariate is generated, (c) uses finite sample results and (d) results in an assessment that is determined by the values of this covariate and only 2 unknown parameters. Our conditional analysis shows that the confidence interval constructed after a Hausman pretest should not be used.
Submitted 26 November, 2015;
originally announced November 2015.
-
The impact of a Hausman pretest, applied to panel data, on the coverage probability of confidence intervals
Authors:
Paul Kabaila,
Rheanna Mainzer,
Davide Farchione
Abstract:
In the analysis of panel data that includes a time-varying covariate, a Hausman pretest is commonly used to decide whether subsequent inference is made using the random effects model or the fixed effects model. We consider the effect of this pretest on the coverage probability of a confidence interval for the slope parameter. We prove three new finite sample theorems that make it easy to assess, for a wide variety of circumstances, the effect of the Hausman pretest on the minimum coverage probability of this confidence interval. Our results show that for the small levels of significance of the Hausman pretest commonly used in applications, the minimum coverage probability of the confidence interval for the slope parameter can be far below nominal.
Submitted 28 January, 2015; v1 submitted 17 December, 2014;
originally announced December 2014.
-
A comparison of Bayesian and frequentist interval estimators in regression that utilize uncertain prior information
Authors:
Paul Kabaila,
Gayan Dharmarathne
Abstract:
Consider a linear regression model with regression parameter beta and normally distributed errors. Suppose that the parameter of interest is theta = a^T beta where a is a specified vector. Define the parameter tau = c^T beta - t where c and t are specified and a and c are linearly independent. Also suppose that we have uncertain prior information that tau = 0. Kabaila and Giri, 2009, JSPI, describe a new frequentist 1-alpha confidence interval for theta that utilizes this uncertain prior information. We compare this confidence interval with Bayesian 1-alpha equi-tailed and shortest credible intervals for theta that result from a prior density for tau that is a mixture of a rectangular "slab" and a Dirac delta function "spike", combined with noninformative prior densities for the other parameters of the model. We show that these frequentist and Bayesian interval estimators depend on the data in very different ways. We also consider some close variants of this prior distribution that lead to Bayesian and frequentist interval estimators with greater similarity. Nonetheless, as we show, substantial differences between these interval estimators remain.
Submitted 14 January, 2014;
originally announced January 2014.
-
A new recentered confidence sphere for the multivariate normal mean
Authors:
Waruni Abeysekera,
Paul Kabaila
Abstract:
We describe a new recentered confidence sphere for the mean, theta, of a multivariate normal distribution. This sphere is centred on the positive-part James-Stein estimator, with radius that is a piecewise cubic Hermite interpolating polynomial function of the norm of the data vector. This radius function is determined by numerically minimizing the scaled expected volume, at theta = 0, of this confidence sphere, subject to the coverage constraint. We use the computationally-convenient formula, derived by Casella and Hwang [3], for the coverage probability of a recentered confidence sphere. Casella and Hwang, op. cit., describe a recentered confidence sphere that is also centred on the positive-part James-Stein estimator, but with radius function determined by empirical Bayes considerations. Our new recentered confidence sphere compares favourably with this confidence sphere, in terms of both the minimum coverage probability and the scaled expected volume at theta = 0.
Submitted 9 December, 2014; v1 submitted 11 June, 2013;
originally announced June 2013.
-
On confidence intervals in regression that utilize uncertain prior information about a vector parameter
Authors:
Paul Kabaila,
Dilshani Tissera
Abstract:
Consider a linear regression model with n-dimensional response vector, p-dimensional regression parameter beta and independent normally distributed errors. Suppose that the parameter of interest is theta = a^T beta where a is a specified vector. Define the s-dimensional parameter vector tau = C^T beta - t where C and t are specified. Also suppose that we have uncertain prior information that tau = 0. Part of our evaluation of a frequentist confidence interval for theta is the ratio (expected length of this confidence interval)/(expected length of standard 1-alpha confidence interval), the scaled expected length of this interval. We say that a 1-alpha confidence interval for theta utilizes this uncertain prior information if (a) the scaled expected length of this interval is significantly less than 1 when tau = 0, (b) the maximum value of the scaled expected length is not too large and (c) this confidence interval reverts to the standard 1-alpha confidence interval when the data happen to strongly contradict the prior information. Let hat{Theta} = a^T hat{beta} and hat{tau} = C^T hat{beta} - t, where hat{beta} is the least squares estimator of beta. We consider the particular case that E((hat{tau}-tau)(hat{Theta}-theta)) = 0, so that hat{Theta} and hat{tau} are independent. We present a new 1-alpha confidence interval for theta that utilizes the uncertain prior information that tau = 0. The following problem is used to illustrate the application of this new confidence interval. Consider a 2^3 factorial experiment with 1 replicate. Suppose that the parameter of interest theta is a specified linear combination of the main effects. Assume that the three-factor interaction is zero. Also suppose that we have uncertain prior information that all of the two-factor interactions are zero. Our aim is to find a frequentist 0.95 confidence interval for theta that utilizes this uncertain prior information.
Submitted 24 April, 2013; v1 submitted 27 March, 2013;
originally announced March 2013.
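As a hedged restatement of the ratio described in this abstract, the scaled expected length can be written as below. The labels SEL, J (the new interval) and I (the standard 1-alpha interval), and the explicit dependence on (beta, sigma^2), are notation assumed here for illustration and are not taken from the paper.

```latex
\[
\mathrm{SEL}(\beta, \sigma^2)
  \;=\;
  \frac{E_{\beta,\sigma^2}\bigl[\text{length of } J\bigr]}
       {E_{\beta,\sigma^2}\bigl[\text{length of } I\bigr]}
\]
```

In this notation, criterion (a) asks that SEL be well below 1 when tau = 0, and criterion (b) asks that the maximum of SEL over the parameter space be not too large.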
-
On randomized confidence intervals for the binomial probability
Authors:
Paul Kabaila
Abstract:
Suppose that X_1,X_2,...,X_n are independent and identically Bernoulli(theta) distributed. Also suppose that our aim is to find an exact confidence interval for theta that is the intersection of a 1-α/2 upper confidence interval and a 1-α/2 lower confidence interval. The Clopper-Pearson interval is the standard such confidence interval for theta, which is widely used in practice. We consider the randomized confidence interval of Stevens, 1950 and present some extensions, including pseudorandomized confidence intervals. We also consider the "data-randomized" confidence interval of Korn, 1987 and point out some additional attractive features of this interval. We also contribute to the discussion about the practical use of such confidence intervals.
Submitted 26 February, 2013;
originally announced February 2013.
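For readers unfamiliar with the Clopper-Pearson interval mentioned in this abstract, the following is a minimal sketch of the standard (non-randomized) interval, computed as the intersection of a 1-alpha/2 lower and a 1-alpha/2 upper confidence limit. The function names `binom_cdf`, `_bisect` and `clopper_pearson` are hypothetical helpers introduced here; the randomized and data-randomized intervals studied in the paper are not implemented.

```python
from math import comb


def binom_cdf(k, n, theta):
    """P(X <= k) for X ~ Binomial(n, theta)."""
    return sum(comb(n, j) * theta**j * (1 - theta) ** (n - j) for j in range(k + 1))


def _bisect(f, lo, hi, tol=1e-12):
    """Root of a monotone function f on [lo, hi] whose endpoint values differ in sign."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Clopper-Pearson interval for theta given x successes in n Bernoulli trials."""
    # Lower limit: the theta at which P(X >= x) equals alpha/2 (0 when x == 0).
    lower = 0.0 if x == 0 else _bisect(
        lambda t: (1 - binom_cdf(x - 1, n, t)) - alpha / 2, 0.0, 1.0
    )
    # Upper limit: the theta at which P(X <= x) equals alpha/2 (1 when x == n).
    upper = 1.0 if x == n else _bisect(
        lambda t: binom_cdf(x, n, t) - alpha / 2, 0.0, 1.0
    )
    return lower, upper
```

Bisection on the binomial tail probabilities is used here only to keep the sketch free of third-party dependencies; the same limits can equivalently be expressed through beta distribution quantiles.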
-
Letter to the Editor: Some comments on: On construction of the smallest one-sided confidence interval for the difference of two proportions
Authors:
Chris Lloyd,
Paul Kabaila
Abstract:
Letter to the Editor: Some comments on "On construction of the smallest one-sided confidence interval for the difference of two proportions" by Weizhen Wang [arXiv:1002.4945].
Submitted 15 November, 2012;
originally announced November 2012.
-
Note on a paradox in decision-theoretic interval estimation
Authors:
Paul Kabaila
Abstract:
Confidence intervals are assessed according to two criteria, namely expected length and coverage probability. In an attempt to apply the decision-theoretic method to finding a good confidence interval, a loss function that is a linear combination of the interval length and the indicator function that the interval includes the parameter of interest has been proposed. We consider the particular case that the parameter of interest is the normal mean, when the variance is unknown. Casella, Hwang and Robert, Statistica Sinica, 1993, have shown that this loss function, combined with the standard noninformative prior, leads to a generalized Bayes rule that is a confidence interval for this parameter which has "paradoxical behaviour". We show that a simple modification of this loss function, combined with the same prior, leads to a generalized Bayes rule that is the usual confidence interval i.e. the "paradoxical behaviour" is removed.
Submitted 22 May, 2012;
originally announced May 2012.
-
Confidence intervals in regression centred on the SCAD estimator
Authors:
Davide Farchione,
Paul Kabaila
Abstract:
Consider a linear regression model. Fan and Li (2001) describe the smoothly clipped absolute deviation (SCAD) point estimator of the regression parameter vector. To gain insight into the properties of this estimator, they consider an orthonormal design matrix and focus on the estimation of a specified component of this vector. They show that the SCAD point estimator has three attractive properties. We answer the question: To what extent can an interval estimator, centred on the SCAD estimator, have similar attractive properties?
Submitted 30 May, 2012; v1 submitted 5 March, 2012;
originally announced March 2012.
-
The performance of a two-stage analysis of ABAB/BABA crossover trials
Authors:
Paul Kabaila,
Matthew Vicendese
Abstract:
Freeman has considered the following two-stage procedure for finding a confidence interval for the treatment difference theta, using data from an AB/BA crossover trial. In the first stage, a preliminary test of the null hypothesis that the differential carryover is zero, is carried out. If this hypothesis is accepted then the confidence interval for theta is constructed assuming that the differential carryover is zero. If, on the other hand, this hypothesis is rejected then this confidence interval is constructed using only data from the first period. Freeman has shown that this confidence interval has minimum coverage probability far below nominal. He therefore concludes that this confidence interval should not be used. In the present paper, we analyse the performance of a similar two-stage procedure for an ABAB/BABA crossover trial. This trial differs in very significant ways from an AB/BA crossover trial, including the fact that for an ABAB/BABA crossover trial there is an unbiased estimator of the differential carryover that is unaffected by between-subject variation. Despite these great differences, we arrive at the same conclusion as Freeman. Namely, that the confidence interval resulting from the two-stage procedure should not be used.
Submitted 26 September, 2011; v1 submitted 28 January, 2011;
originally announced January 2011.
-
The coverage probability of confidence intervals in regression after a preliminary F test
Authors:
Paul Kabaila,
Davide Farchione
Abstract:
Consider a linear regression model with regression parameter beta=(beta_1,..., beta_p) and independent normal errors. Suppose the parameter of interest is theta = a^T beta, where a is specified. Define the s-dimensional parameter vector tau = C^T beta - t, where C and t are specified. Suppose that we carry out a preliminary F test of the null hypothesis H_0: tau = 0 against the alternative hypothesis H_1: tau not equal to 0. It is common statistical practice to then construct a confidence interval for theta with nominal coverage 1-alpha, using the same data, based on the assumption that the selected model had been given to us a priori (as the true model). We call this the naive 1-alpha confidence interval for theta. This assumption is false and it may lead to this confidence interval having minimum coverage probability far below 1-alpha, making it completely inadequate. Our aim is to compute this minimum coverage probability. It is straightforward to find an expression for the coverage probability of this confidence interval that is a multiple integral of dimension s+1. However, we derive a new elegant and computationally-convenient formula for this coverage probability. For s=2 this formula is a sum of a triple and a double integral and for all s>2 this formula is a sum of a quadruple and a double integral. This makes it easy to compute the minimum coverage probability of the naive confidence interval, irrespective of how large s is. A very important practical application of this formula is to the analysis of covariance. In this context, tau can be defined so that H_0 expresses the hypothesis of "parallelism". Applied statisticians commonly recommend carrying out a preliminary F test of this hypothesis. We illustrate the application of our formula with a real-life analysis of covariance data set and a preliminary F test for "parallelism". We show that the naive 0.95 confidence interval has minimum coverage probability 0.0846, showing that it is completely inadequate.
Submitted 11 March, 2010;
originally announced March 2010.
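The loss of coverage described in this abstract can be illustrated with a much simpler Monte Carlo sketch. This is not the paper's formula: it assumes a known error variance, a scalar tau (s = 1), a two-sided z pretest in place of the F test, and estimators standardized to unit variance with correlation `rho`. The function name `naive_ci_coverage` and the parameters `rho` and `gamma` (the standardized true value of tau) are hypothetical names introduced for this illustration.

```python
import random
from statistics import NormalDist


def naive_ci_coverage(rho, gamma, alpha=0.05, pretest_level=0.05, n_sim=20000, seed=0):
    """Monte Carlo estimate of the coverage of the naive CI after a z pretest.

    Assumed setup (illustrative only): theta_hat ~ N(theta, 1), tau_hat ~ N(gamma, 1),
    with correlation rho between them and known unit variances.
    """
    rng = random.Random(seed)
    z_ci = NormalDist().inv_cdf(1 - alpha / 2)
    z_test = NormalDist().inv_cdf(1 - pretest_level / 2)
    theta = 0.0  # true value of the parameter of interest
    covered = 0
    for _ in range(n_sim):
        e_tau = rng.gauss(0.0, 1.0)
        e_ind = rng.gauss(0.0, 1.0)
        tau_hat = gamma + e_tau
        theta_hat = theta + rho * e_tau + (1 - rho**2) ** 0.5 * e_ind
        if abs(tau_hat) <= z_test:
            # Pretest accepts tau = 0: use the restricted estimator and its smaller sd.
            centre = theta_hat - rho * tau_hat
            half_width = z_ci * (1 - rho**2) ** 0.5
        else:
            # Pretest rejects: use the full-model estimator.
            centre = theta_hat
            half_width = z_ci
        covered += abs(centre - theta) <= half_width
    return covered / n_sim
```

Calling, say, `naive_ci_coverage(rho=0.9, gamma=2.0)` and comparing it with `naive_ci_coverage(rho=0.0, gamma=2.0)` shows how strongly the coverage can depend on the correlation and on the true value of tau under these assumptions.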
-
Admissibility of the usual confidence interval in linear regression
Authors:
Paul Kabaila,
Khageswor Giri,
Hannes Leeb
Abstract:
Consider a linear regression model with independent and identically normally distributed random errors. Suppose that the parameter of interest is a specified linear combination of the regression parameters. We prove that the usual confidence interval for this parameter is admissible within a broad class of confidence intervals.
Submitted 17 January, 2010;
originally announced January 2010.
-
The Asymptotic Efficiency of Improved Prediction Intervals
Authors:
Paul Kabaila,
Khreshna Syuhada
Abstract:
Barndorff-Nielsen and Cox (1994, p.319) modify an estimative prediction limit to obtain an improved prediction limit with better coverage properties. Kabaila and Syuhada (2008) present a simulation-based approximation to this improved prediction limit, which avoids the extensive algebraic manipulations required for this modification. We present a modification of an estimative prediction interval, analogous to the Barndorff-Nielsen and Cox modification, to obtain an improved prediction interval with better coverage properties. We also present an analogue, for the prediction interval context, of this simulation-based approximation. The parameter estimator on which the estimative and improved prediction limits and intervals are based is assumed to have the same asymptotic distribution as the (conditional) maximum likelihood estimator. The improved prediction limit and interval depend on the asymptotic conditional bias of this estimator. This bias can be very sensitive to very small changes in the estimator. It may require considerable effort to find this bias. We show, however, that the improved prediction limit and interval have asymptotic efficiencies that are functionally independent of this bias. Thus, improved prediction limits and intervals obtained using the Barndorff-Nielsen and Cox type of methodology can conveniently be based on the (conditional) maximum likelihood estimator, whose asymptotic conditional bias is given by the formula of Vidoni (2004, p.144). Also, improved prediction limits and intervals obtained using Kabaila and Syuhada type approximations have asymptotic efficiencies that are independent of the estimator on which these intervals are based.
Submitted 13 January, 2009;
originally announced January 2009.
-
Simultaneous confidence intervals for the population cell means, for two-by-two factorial data, that utilize uncertain prior information
Authors:
Paul Kabaila,
Khageswor Giri
Abstract:
Consider a two-by-two factorial experiment with more than 1 replicate. Suppose that we have uncertain prior information that the two-factor interaction is zero. We describe new simultaneous frequentist confidence intervals for the 4 population cell means, with simultaneous confidence coefficient 1-alpha, that utilize this prior information in the following sense. These simultaneous confidence intervals define a cube with expected volume that (a) is relatively small when the two-factor interaction is zero and (b) has maximum value that is not too large. Also, these intervals coincide with the standard simultaneous confidence intervals obtained by Tukey's method, with simultaneous confidence coefficient 1-alpha, when the data strongly contradict the prior information that the two-factor interaction is zero. We illustrate the application of these new simultaneous confidence intervals to a real data set.
Submitted 30 April, 2012; v1 submitted 9 December, 2008;
originally announced December 2008.
-
Large-Sample Confidence Intervals for the Treatment Difference in a Two-Period Crossover Trial, Utilizing Prior Information
Authors:
Paul Kabaila,
Khageswor Giri
Abstract:
Consider a two-treatment, two-period crossover trial, with responses that are continuous random variables. We find a large-sample frequentist 1-alpha confidence interval for the treatment difference that utilizes the uncertain prior information that there is no differential carryover effect.
Submitted 15 October, 2008; v1 submitted 10 July, 2008;
originally announced July 2008.
-
Upper bounds on the minimum coverage probability of confidence intervals in regression after variable selection
Authors:
Paul Kabaila,
Khageswor Giri
Abstract:
We consider a linear regression model, with the parameter of interest a specified linear combination of the regression parameter vector. We suppose that, as a first step, a data-based model selection (e.g. by preliminary hypothesis tests or minimizing AIC) is used to select a model. It is common statistical practice to then construct a confidence interval for the parameter of interest based on the assumption that the selected model had been given to us a priori. This assumption is false and it can lead to a confidence interval with poor coverage properties. We provide an easily-computed finite sample upper bound (calculated by repeated numerical evaluation of a double integral) to the minimum coverage probability of this confidence interval. This bound applies for model selection by any of the following methods: minimum AIC, minimum BIC, maximum adjusted R-squared, minimum Mallows' Cp and t-tests. The importance of this upper bound is that it delineates general categories of design matrices and model selection procedures for which this confidence interval has poor coverage properties. This upper bound is shown to be a finite sample analogue of an earlier large sample upper bound due to Kabaila and Leeb.
Submitted 6 November, 2007;
originally announced November 2007.
-
Confidence intervals for the normal mean utilizing prior information
Authors:
David Farchione,
Paul Kabaila
Abstract:
Consider X_1,X_2,...,X_n that are independent and identically N(mu,sigma^2) distributed. Suppose that we have uncertain prior information that mu = 0. We answer the question: to what extent can a frequentist 1-alpha confidence interval for mu utilize this prior information?
Submitted 12 November, 2007; v1 submitted 27 September, 2007;
originally announced September 2007.