-
rquest: An R package for hypothesis tests and confidence intervals for quantiles and summary measures based on quantiles
Authors:
Luke A. Prendergast,
Shenal Dedduwakumara,
Robert G. Staudte
Abstract:
Sample quantiles, such as the median, are often better suited than the sample mean for summarising location characteristics of a data set. Similarly, linear combinations of sample quantiles and ratios of such linear combinations, e.g. the interquartile range and quantile-based skewness measures, are often used to quantify characteristics such as spread and skew. Although quantile estimates are often reported, they are rarely accompanied by confidence intervals or standard errors. The rquest package provides a simple way to conduct hypothesis tests and derive confidence intervals for quantiles, linear combinations of quantiles, ratios of dependent linear combinations (e.g., Bowley's measure of skewness), and differences and ratios of all of the above for comparisons between independent samples. Many commonly used quantile-based measures are included, and users can also easily define their own. Quantile-based measures of inequality are also considered. The methods are based on recent research showing that reliable distribution-free confidence intervals can be obtained, even for moderate sample sizes. Several examples are provided herein.
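The measures named above are linear combinations of sample quantiles, or ratios of two such combinations. As a minimal Python sketch (not the rquest interface, which is an R package; `sample_quantile`, `quantile_measure`, `IQR` and `BOWLEY` are hypothetical names), a measure can be specified by the probabilities and weights of its numerator and, optionally, its denominator. Quantiles use linear interpolation between order statistics, one common sample-quantile definition; the interval estimation that rquest provides is not reproduced here.

```python
def sample_quantile(data, p):
    """Sample quantile by linear interpolation between order statistics."""
    x = sorted(data)
    h = (len(x) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(x) - 1)
    return x[lo] + (h - lo) * (x[hi] - x[lo])


def quantile_measure(data, numerator, denominator=None):
    """Evaluate sum(w * Q(p)) over `numerator`, optionally divided by the same
    kind of combination over `denominator`. Both map probability p -> weight w."""
    num = sum(w * sample_quantile(data, p) for p, w in numerator.items())
    if denominator is None:
        return num
    den = sum(w * sample_quantile(data, p) for p, w in denominator.items())
    return num / den


# Hypothetical specifications of two familiar measures.
IQR = ({0.75: 1.0, 0.25: -1.0}, None)
BOWLEY = ({0.75: 1.0, 0.25: 1.0, 0.5: -2.0}, {0.75: 1.0, 0.25: -1.0})
```

For the sample `[1, 2, 3, 4, 5, 7, 9, 12, 20]` the quartiles are 3, 5 and 9, so the IQR is 6 and Bowley's skewness is (9 + 3 - 2·5)/(9 - 3) = 1/3. A new measure, such as a ratio of deciles, only needs new weight dictionaries.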
Submitted 14 October, 2024;
originally announced October 2024.
-
Slice Weighted Average Regression
Authors:
Marina Masioti,
Joshua Davies,
Amanda Shaker,
Luke A. Prendergast
Abstract:
It has previously been shown that ordinary least squares can be used to estimate the coefficients of the single-index model under only mild conditions. However, the estimator is non-robust, leading to poor estimates for some models. In this paper we propose a new sliced least-squares estimator that utilizes ideas from Sliced Inverse Regression. Slices with problematic observations that contribute to high variability in the estimator can easily be down-weighted to robustify the procedure. The estimator is simple to implement and can result in vast improvements for some models when compared to the usual least-squares approach. While the estimator was initially conceived with the single-index model in mind, we also show that multiple directions can be obtained, which provides another notable advantage of using slicing with least squares. Several simulation studies and a real data example are included, as well as comparisons with other recent methods.
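The slicing idea can be sketched as follows; this is one plausible reading of the abstract, not the authors' estimator. Observations are sliced on the ordered response, an OLS slope vector is fitted within each slice, and the slice slopes are combined in a weighted average. Default weights are proportional to slice size, and a smaller (or zero) weight down-weights a problematic slice. The sketch assumes a monotone link so that all slice slopes point the same way; with a non-monotone link, slopes from different slices could cancel. The function name is hypothetical.

```python
import numpy as np


def sliced_ls_direction(X, y, n_slices=5, weights=None):
    """Weighted average of within-slice OLS slope vectors, normalised to unit length.

    Observations are sliced on the ordered response. `weights` (one per slice)
    defaults to slice sizes; a weight of 0 drops a slice entirely.
    """
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    if weights is None:
        weights = [len(s) for s in slices]
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    beta = np.zeros(X.shape[1])
    for w, idx in zip(weights, slices):
        if w == 0:
            continue
        design = np.column_stack([np.ones(len(idx)), X[idx]])
        coef, *_ = np.linalg.lstsq(design, y[idx], rcond=None)
        beta += w * coef[1:]  # drop the within-slice intercept
    return beta / np.linalg.norm(beta)
```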
Submitted 10 September, 2022;
originally announced September 2022.
-
A note on switching eigenvalues under small perturbations
Authors:
Marina Masioti,
Connie S-N Li-Wai-Suen,
Luke A. Prendergast,
Amanda Shaker
Abstract:
The sensitivity of eigenvectors and eigenvalues of symmetric matrix estimates to the removal of a single observation has been well documented in the literature. However, a complicating factor can arise: the ranking of the eigenvalues may change when an observation is removed, and with it the perceived importance of the corresponding eigenvectors. We refer to this problem as "switching of eigenvalues". Since the eigenvalues computed after removing an observation do not by themselves reveal that switching has happened, how can we know that it has occurred? In this paper, we show that approximations to the eigenvalues can be used to help determine when switching may have occurred. We then discuss possible actions researchers can take based on this knowledge, for example making better choices when deciding how many principal components should be retained, and adjusting approximate influence diagnostics that perform poorly when switching has occurred. Our results are easily applied to any eigenvalue problem involving symmetric matrix estimators. We highlight our approach with application to a real data example.
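One way to act on this idea is to approximate each eigenvalue after deleting observation i while keeping it attached to its original eigenvector, and then check whether the approximations are still in the original order. The sketch below assumes the standard first-order (empirical influence function) approximation for eigenvalues of a covariance matrix with 1/n normalization; the paper's approximations may differ, and `flag_switching` is a hypothetical name.

```python
import numpy as np


def flag_switching(X):
    """Flag observations whose removal is predicted to reorder the eigenvalues.

    Uses lambda_j(-i) ~= lambda_j - [(v_j'(x_i - xbar))^2 - lambda_j] / (n - 1),
    the first-order approximation from the influence function of an eigenvalue
    of the covariance matrix S = sum (x - xbar)(x - xbar)' / n.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    lam, V = np.linalg.eigh(S)        # ascending order
    lam, V = lam[::-1], V[:, ::-1]    # reorder to descending
    scores = Xc @ V                   # projections onto each eigenvector
    approx = lam - (scores ** 2 - lam) / (n - 1)   # one row per deleted point
    # Switching is suggested when the approximations, still keyed to the
    # original eigenvectors, are no longer in descending order.
    return [bool(np.any(np.diff(row) > 0)) for row in approx]


X = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.9], [0.0, -1.9]]
print(flag_switching(X))
```

In this toy example, deleting (2, 0) leaves exact covariance eigenvalues of about 2.41 (y-axis direction) and 0.89 (x-axis direction), so the direction that ranked second in the full data now ranks first, which is what the first two flags indicate.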
Submitted 17 February, 2022;
originally announced February 2022.
-
Extending the coefficient of variation for measuring heterogeneity following a meta-regression
Authors:
Maxwell Cairns,
Luke A. Prendergast
Abstract:
Meta-regression is often used to form hypotheses about what is associated with heterogeneity in a meta-analysis and to estimate the extent to which effects can vary between cohorts and other distinguishing factors. However, the study-level variables (called moderators) that are available and used in a meta-regression analysis will rarely explain all of the heterogeneity. Measuring and trying to understand residual heterogeneity therefore remains important in a meta-regression, although it is not clear how some heterogeneity measures should be used in this context. The coefficient of variation and its variants are useful measures of relative heterogeneity. We consider these measures in the context of meta-regression, which allows researchers to investigate heterogeneity at different levels of the moderator and also average relative heterogeneity overall. We also provide confidence intervals (CIs) for the measures, and our simulation studies show that these intervals have good coverage properties. We suggest that these measures and their intervals can provide useful insights into moderators that may contribute to the presence of heterogeneity in a meta-analysis and can lead to a better understanding of estimated mean effects.
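A minimal sketch with a single moderator, assuming the relative heterogeneity at moderator value x is CV(x) = τ/|μ(x)|, where τ² is the residual between-study variance and μ(x) is the fitted mean effect. The residual τ² uses the standard method-of-moments estimator for meta-regression; the paper's exact estimators and its confidence intervals are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def mm_tau2_metareg(y, v, X):
    """Method-of-moments residual tau^2 for a meta-regression.

    y: effect estimates, v: within-study variances,
    X: design matrix with an intercept column.
    """
    y, v, X = np.asarray(y, float), np.asarray(v, float), np.asarray(X, float)
    k, p = X.shape
    W = np.diag(1.0 / v)
    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    beta_fe = XtWX_inv @ X.T @ W @ y          # fixed-effect fit
    resid = y - X @ beta_fe
    Q_E = resid @ W @ resid                   # residual heterogeneity statistic
    P = W - W @ X @ XtWX_inv @ X.T @ W
    return max(0.0, (Q_E - (k - p)) / np.trace(P))


def cv_at(x_values, y, v, X):
    """CV(x) = tau / |mu(x)|, with mu(x) from the random-effects fit (one moderator)."""
    tau2 = mm_tau2_metareg(y, v, X)
    X = np.asarray(X, float)
    W = np.diag(1.0 / (np.asarray(v, float) + tau2))
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ np.asarray(y, float))
    mu = beta[0] + beta[1] * np.asarray(x_values, float)
    return np.sqrt(tau2) / np.abs(mu)
```

With effects `[1, 2.5, 2.5, 4]`, moderator values `[0, 1, 2, 3]` and within-study variances of 0.1, the residual τ² is 0.125 and the fitted line is 1.15 + 0.9x, so relative heterogeneity is about 0.31 at x = 0 and about 0.09 at x = 3.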
Submitted 17 November, 2021;
originally announced November 2021.
-
On choosing optimal response transformations for dimension reduction
Authors:
Marina Masioti,
Luke A. Prendergast,
Amanda Shaker
Abstract:
It has previously been shown that response transformations can be very effective in improving dimension reduction outcomes for a continuous response. The choice of transformation can make a large difference to the visualization of the response versus the dimension-reduced regressors. In this article, we provide an automated approach for choosing the parameters of transformation functions to seek optimal results. A criterion based on an influence measure between dimension reduction spaces is used to choose the optimal parameter value of the transformation. Since influence measures can be time-consuming to compute for large data sets, two computationally efficient criteria are also provided. Given that a different transformation may be suitable for each direction required to form the subspace, we also employ an iterative approach to choosing optimal parameter values. Several simulation studies and a real data example highlight the effectiveness of the proposed methods.
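The search over transformation parameters can be sketched as a grid search. All specifics here are assumptions for illustration: the Box-Cox family for a positive response, OLS as the dimension reduction method, and, as the influence-type criterion, the average over observations of 1 - cos² between the full-data and leave-one-out OLS directions (a distance between one-dimensional subspaces). The paper's criterion, its two efficient alternatives and its iterative multi-direction scheme are not reproduced, and all function names are hypothetical.

```python
import numpy as np


def ols_direction(X, y):
    """Unit-length OLS slope vector (intercept handled by centering)."""
    Xc = X - X.mean(axis=0)
    b, *_ = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)
    return b / np.linalg.norm(b)


def box_cox(y, lam):
    """Box-Cox transformation of a positive response (log when lam == 0)."""
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def jackknife_sensitivity(X, y):
    """Average over observations of 1 - cos^2 between the full-data and the
    leave-one-out OLS directions (a hypothetical influence-type criterion)."""
    b = ols_direction(X, y)
    n = len(y)
    keep = np.ones(n, dtype=bool)
    total = 0.0
    for i in range(n):
        keep[i] = False
        bi = ols_direction(X[keep], y[keep])
        keep[i] = True
        total += max(0.0, 1.0 - float(b @ bi) ** 2)
    return total / n


def choose_lambda(X, y, grid=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Return the grid value with the smallest criterion, plus all scores."""
    scores = {lam: jackknife_sensitivity(X, box_cox(y, lam)) for lam in grid}
    return min(scores, key=scores.get), scores
```

Smaller scores mean the estimated direction is less sensitive to individual observations under that transformation.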
Submitted 30 June, 2021;
originally announced June 2021.
-
Insights and inference for the proportion below the relative poverty line
Authors:
Dilanka S. Dedduwakumara,
Luke A. Prendergast,
Robert G. Staudte
Abstract:
We examine a commonly used relative poverty measure called the headcount ratio ($H_p$), defined as the proportion of incomes falling below the relative poverty line, which is in turn defined as a fraction $p$ of the median income. We study this measure for theoretical income populations and examine its potential for detecting actual changes following transfers of income from the wealthy to those whose incomes fall below the relative poverty line. In the process we derive large-sample confidence intervals for $H_p$ and evaluate their performance. Finally, we illustrate the estimators on real income data sets.
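The sample headcount ratio follows directly from the definition. In this sketch `headcount_ratio` is a hypothetical name, incomes strictly below the line are counted, and the default p = 0.6 is only an example value; the paper's confidence intervals are not reproduced.

```python
from statistics import median


def headcount_ratio(incomes, p=0.6):
    """Proportion of incomes strictly below the relative poverty line p * median."""
    line = p * median(incomes)
    return sum(x < line for x in incomes) / len(incomes)


before = [10, 20, 30, 40, 50, 60, 100]
after = [10, 25, 30, 40, 50, 60, 95]  # 5 transferred from the top income to 20
print(headcount_ratio(before), headcount_ratio(after))
```

Here the median is 40 and the line is 24 in both cases. Transferring 5 from the highest income to the income of 20 lifts it above the line without moving the median, so the headcount ratio falls from 2/7 to 1/7.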
Submitted 25 February, 2020; v1 submitted 21 August, 2019;
originally announced August 2019.
-
Interval estimators for inequality measures using grouped data
Authors:
Dilanka S. Dedduwakumara,
Luke A. Prendergast
Abstract:
Income inequality measures are often used as an indication of economic health. How to obtain reliable confidence intervals for these measures based on sampled data has been studied extensively in recent years. To preserve confidentiality, income data are often made available in summary form only (e.g., histograms or frequencies between quintiles). In this paper, we show that good coverage can be achieved for bootstrap and Wald-type intervals for quantile-based measures when only grouped (binned) data are available. These coverages are typically superior to those that we have been able to achieve for intervals for popular measures such as the Gini index in this grouped data setting. To facilitate the bootstrapping, we approximate the underlying density using the Generalized Lambda Distribution and also a linear interpolation approximation method; the latter is possible when group means are available. We also apply our methods to real data sets.
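A minimal sketch of bootstrapping from binned data, using techniques swapped in for illustration: the density is taken as uniform within each bin (the paper instead fits a GLD, or uses an interpolation that also exploits group means), and bootstrap samples are formed by resampling the bin counts multinomially. The ratio Q(0.8)/Q(0.2) stands in for a generic quantile-based measure; bin edges, counts and function names are hypothetical, and the lowest edge must be positive.

```python
import random


def grouped_quantile(edges, counts, p):
    """Quantile from binned counts, assuming values are uniform within each bin."""
    target = p * sum(counts)
    cum = 0
    for j, c in enumerate(counts):
        if c > 0 and cum + c >= target:
            return edges[j] + (target - cum) / c * (edges[j + 1] - edges[j])
        cum += c
    return edges[-1]


def bootstrap_ratio_ci(edges, counts, p_low=0.2, p_high=0.8, B=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for Q(p_high)/Q(p_low) from binned counts."""
    rng = random.Random(seed)
    n, k = sum(counts), len(counts)
    stats = []
    for _ in range(B):
        boot = [0] * k
        for j in rng.choices(range(k), weights=counts, k=n):
            boot[j] += 1  # multinomial resample of the bin counts
        stats.append(grouped_quantile(edges, boot, p_high)
                     / grouped_quantile(edges, boot, p_low))
    stats.sort()
    alpha = 1 - level
    lower = stats[int(alpha / 2 * B)]
    upper = stats[int((1 - alpha / 2) * B) - 1]
    est = grouped_quantile(edges, counts, p_high) / grouped_quantile(edges, counts, p_low)
    return est, (lower, upper)
```

For bins [10, 20), [20, 30), [30, 40), [40, 60), [60, 100] with counts 20, 40, 30, 20, 10, the interpolated quantiles are Q(0.2) = 21 and Q(0.8) = 46, so the point estimate is 46/21 ≈ 2.19.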
Submitted 18 July, 2019; v1 submitted 17 July, 2019;
originally announced July 2019.
-
An efficient estimator of the parameters of the Generalized Lambda Distribution
Authors:
Dilanka S. Dedduwakumara,
Luke A. Prendergast,
Robert G. Staudte
Abstract:
Estimation of the four generalized lambda distribution parameters is not straightforward, and the best-performing available estimators are computationally expensive. In this paper, we introduce a simple two-step estimator of the parameters that is comparatively very quick to compute and performs well when compared with other methods. This computational efficiency makes it feasible to use bootstrapping to obtain interval estimators for the parameters. Simulations are used to assess the performance of the new estimators, and applications to several data sets are included.
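Every GLD estimator works with the quantile function. The abstract does not state which parameterization is used, so the sketch below assumes the FKML form $Q(u) = \lambda_1 + \frac{1}{\lambda_2}\left[\frac{u^{\lambda_3}-1}{\lambda_3} - \frac{(1-u)^{\lambda_4}-1}{\lambda_4}\right]$, with the log limit when $\lambda_3$ or $\lambda_4$ is zero. Because the distribution is defined through $Q$, sampling is a single inverse-transform step, which keeps a parametric bootstrap from a fitted GLD cheap. The two-step estimator itself is not reproduced, and the function names are hypothetical.

```python
import math
import random


def gld_quantile(u, l1, l2, l3, l4):
    """FKML-parameterised generalized lambda quantile function Q(u), 0 < u < 1."""
    def term(z, lam):
        # (z**lam - 1) / lam, with its limit log(z) as lam -> 0
        return math.log(z) if lam == 0 else (z ** lam - 1) / lam
    return l1 + (term(u, l3) - term(1 - u, l4)) / l2


def gld_sample(n, l1, l2, l3, l4, seed=None):
    """Draw n values by inverse-transform sampling."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # random() can return 0.0, where log(0) would fail
            out.append(gld_quantile(u, l1, l2, l3, l4))
    return out
```

With λ = (0, 1, 1, 1) the quantile function reduces to 2u - 1, the uniform distribution on (-1, 1); with λ3 = λ4 = 0 it is the logistic quantile function.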
Submitted 25 February, 2020; v1 submitted 15 July, 2019;
originally announced July 2019.
-
Decomposing the Quantile Ratio Index with applications to Australian income and wealth data
Authors:
Luke A. Prendergast,
Robert G. Staudte
Abstract:
The quantile ratio index introduced by Prendergast and Staudte (2017) is a simple and effective measure of relative inequality for income data that is resistant to outliers. It measures the average relative distance of a randomly chosen income from its symmetric quantile. Another useful property of this index is investigated here: given a partition of the income distribution into a union of sets of symmetric quantiles, one can find the conditional inequality for each set as measured by the quantile ratio index and readily combine these in a weighted average to obtain the index for the entire population. Applying this to data for several years makes it possible to track how these contributions to inequality vary over time, as illustrated here for Australian Bureau of Statistics income and wealth data.
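Assuming the index is the area $\int_0^1 \{1 - R(p)\}\,dp$ with $R(p) = Q(p/2)/Q(1-p/2)$ (the inequality curve of the 2016 entry "A Simple and Effective Inequality Measure"), the decomposition can be sketched numerically. Cutting $[0, 1]$ at $0 = a_0 < a_1 < \dots < a_K = 1$ splits incomes into the sets with quantile levels in $[a_{k-1}/2, a_k/2] \cup [1 - a_k/2, 1 - a_{k-1}/2]$; each set has probability $a_k - a_{k-1}$, and under this reading its conditional index is the average of $1 - R(p)$ over $[a_{k-1}, a_k]$, so the weighted average recovers the overall index. Function names are hypothetical and integrals use the midpoint rule.

```python
def inequality_integrand(qfun, p):
    """1 - R(p), where R(p) = Q(p/2) / Q(1 - p/2)."""
    return 1.0 - qfun(p / 2) / qfun(1 - p / 2)


def qri_on_interval(qfun, a, b, m=2000):
    """Average of 1 - R(p) over p in [a, b] (midpoint rule with m points)."""
    h = (b - a) / m
    return sum(inequality_integrand(qfun, a + (i + 0.5) * h) for i in range(m)) / m


def decompose_qri(qfun, cuts, m=2000):
    """(weight, conditional index) pairs for the partition of [0, 1] at `cuts`."""
    pts = [0.0] + sorted(cuts) + [1.0]
    return [(b - a, qri_on_interval(qfun, a, b, m)) for a, b in zip(pts, pts[1:])]


parts = decompose_qri(lambda u: u, [0.2, 0.5])  # uniform quantile function
print(sum(w * idx for w, idx in parts))
```

For the uniform quantile function $Q(u) = u$ the weighted total is $2 - 2\ln 2 \approx 0.6137$, whatever cut points are used.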
Submitted 29 December, 2017;
originally announced December 2017.
-
Confidence Intervals for Quantiles from Histograms and Other Grouped Data
Authors:
Dilanka S. Dedduwakumara,
Luke A. Prendergast
Abstract:
Interval estimation of quantiles has been treated extensively in the literature. However, to the best of our knowledge, interval estimation has not been considered for data that are available only in grouped format. Motivated by this, we introduce several methods to obtain confidence intervals for quantiles when only grouped data are available. Our preferred method for interval estimation is to approximate the underlying density using the Generalized Lambda Distribution (GLD) to estimate both the quantiles and the variances of the quantile estimators. We compare the GLD method with some other methods that we also introduce, which are based on a frequency approximation approach and a linear interpolation approximation of the density. Our methods are strongly supported by simulations showing that excellent coverage can be achieved for a wide range of distributions. These include highly skewed distributions such as the log-normal, Dagum and Singh-Maddala distributions. We also apply our methods to real data and show that inference can be carried out on published outcomes that have been summarized only by a histogram. Our methods are therefore useful for a broad range of applications. We have also created a web application that can be used to conveniently calculate the estimators.
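The linear-interpolation idea can be sketched as follows, as an illustration under stated assumptions rather than the paper's preferred GLD method: the density is taken as constant within each bin, the quantile is read off the resulting piecewise-linear CDF, and the standard large-sample variance $p(1-p)/\{n f(Q(p))^2\}$ of a sample quantile gives a Wald interval with the histogram height in place of $f$. The function name, bins and counts are hypothetical.

```python
import math
from statistics import NormalDist


def grouped_quantile_ci(edges, counts, p, level=0.95):
    """Quantile and Wald interval from binned data, 0 < p < 1.

    Assumes a constant density within each bin, so the CDF is piecewise linear.
    """
    n = sum(counts)
    target = p * n
    cum = 0
    for j, c in enumerate(counts):
        if c > 0 and cum + c >= target:
            width = edges[j + 1] - edges[j]
            q = edges[j] + (target - cum) / c * width
            density = c / (n * width)  # histogram height at q
            break
        cum += c
    else:
        raise ValueError("p must lie in (0, 1)")
    se = math.sqrt(p * (1 - p) / n) / density
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return q, (q - z * se, q + z * se)
```

For bins [10, 20), [20, 30), [30, 40), [40, 60), [60, 100] with counts 20, 40, 30, 20, 10, the estimated median is 30 and the 95% interval is roughly 30 ± 2.7.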
Submitted 6 December, 2017;
originally announced December 2017.
-
A Simple and Effective Inequality Measure
Authors:
Luke A. Prendergast,
Robert G. Staudte
Abstract:
Ratios of quantiles are often computed for income distributions as rough measures of inequality, and inference for such ratios has recently become available. The special case in which the quantiles are chosen symmetrically, that is, when the p/2 quantile is divided by the (1-p/2) quantile, is of special interest because the graph of such ratios, plotted as a function of p over the unit interval, yields an informative inequality curve. The area above the curve and below the horizontal line at one is an easily interpretable coefficient of inequality. The advantages of these concepts over the traditional Lorenz curve and Gini coefficient are numerous: they are defined for all positive income distributions, they can be robustly estimated, and distribution-free confidence intervals for the inequality coefficient are easily found. Moreover, the inequality curves satisfy a median-based transference principle and are convex for many commonly assumed income distributions.
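The inequality curve and coefficient translate directly into code. This sketch plugs in linear-interpolation sample quantiles and approximates the area with the midpoint rule; it is an illustration under those assumptions, not necessarily the authors' estimator, and it omits the distribution-free confidence intervals. Incomes must be positive, and the function names are hypothetical.

```python
def sample_quantile(xs, u):
    """Linear-interpolation sample quantile of sorted data xs at level u."""
    h = (len(xs) - 1) * u
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def inequality_curve(qfun, p):
    """R(p) = Q(p/2) / Q(1 - p/2)."""
    return qfun(p / 2) / qfun(1 - p / 2)


def inequality_coefficient(qfun, m=10000):
    """Area between the horizontal line at one and R(p), by the midpoint rule."""
    return sum(1.0 - inequality_curve(qfun, (i + 0.5) / m) for i in range(m)) / m


def sample_coefficient(incomes, m=10000):
    """Plug-in coefficient from a sample of positive incomes."""
    xs = sorted(incomes)
    return inequality_coefficient(lambda u: sample_quantile(xs, u), m)
```

For the uniform distribution on (0, 1), R(p) = p/(2 - p) and the coefficient is 2 - 2 ln 2 ≈ 0.614; equal incomes give a coefficient of 0.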
Submitted 10 March, 2016;
originally announced March 2016.
-
When large n is not enough---Distribution-free Interval Estimators for Ratios of Quantiles
Authors:
Luke A. Prendergast,
Robert G. Staudte
Abstract:
Ratios of sample percentiles or of quantiles based on a single sample are often published for skewed income data to illustrate aspects of income inequality, but distribution-free confidence intervals for such ratios are, to our knowledge, not available in the literature. Here we derive and compare two large-sample methods for obtaining such intervals. Both require good distribution-free estimates of the quantile density at the quantiles of interest, and such estimates have recently become available. Simulation studies for various sample sizes are carried out for Pareto, lognormal and exponential distributions, as well as fitted generalized lambda distributions, to determine the coverage probabilities and widths of the intervals. Robustness of the estimators to contamination or to a positive proportion of zero incomes is examined via influence functions. The motivating example is Australian household income data, where ratios of quantiles measure inequality, but these results apply equally to data from other countries.
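A Wald-type interval of this kind can be sketched with the delta method. The assumptions are labelled: the quantile density $q(u) = dQ/du$ is estimated by a simple difference quotient with a hypothetical bandwidth h = 0.05 rather than the distribution-free estimates the abstract refers to, the interval is built on the log scale, and it uses the standard asymptotic covariance $p_1(1-p_2)\,q(p_1)\,q(p_2)/n$ of two sample quantiles with $p_1 < p_2$. Neither of the paper's two methods is reproduced exactly, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_quantile(xs, u):
    """Linear-interpolation sample quantile of sorted data xs at level u."""
    h = (len(xs) - 1) * u
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def quantile_density(xs, u, h=0.05):
    """Difference-quotient estimate of q(u) = dQ/du (hypothetical bandwidth h)."""
    a, b = max(u - h, 0.0), min(u + h, 1.0)
    return (sample_quantile(xs, b) - sample_quantile(xs, a)) / (b - a)


def ratio_ci(data, p1, p2, level=0.95, h=0.05):
    """Wald interval for Q(p1)/Q(p2), 0 < p1 < p2 < 1, built on the log scale."""
    if not 0 < p1 < p2 < 1:
        raise ValueError("need 0 < p1 < p2 < 1")
    xs = sorted(data)
    n = len(xs)
    Q1, Q2 = sample_quantile(xs, p1), sample_quantile(xs, p2)
    a = quantile_density(xs, p1, h) / Q1  # d log Q / du at p1
    b = quantile_density(xs, p2, h) / Q2  # d log Q / du at p2
    # Delta method with Cov(Q1_hat, Q2_hat) ~ p1 (1 - p2) q(p1) q(p2) / n.
    var_log = (a * a * p1 * (1 - p1) + b * b * p2 * (1 - p2)
               - 2 * a * b * p1 * (1 - p2)) / n
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(var_log)
    est = Q1 / Q2
    return est, (est * math.exp(-half), est * math.exp(half))
```

For income data, `ratio_ci(incomes, 0.1, 0.9)` gives an interval for the P10/P90 ratio; taking reciprocals of the endpoints gives one for P90/P10.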
Submitted 13 September, 2015; v1 submitted 25 August, 2015;
originally announced August 2015.
-
Exploiting the Quantile Optimality Ratio to Obtain Better Confidence Intervals for Quantiles
Authors:
Luke A. Prendergast,
Robert G. Staudte
Abstract:
A standard approach to confidence intervals for quantiles requires good estimates of the quantile density. The optimal bandwidth for kernel estimation of the quantile density depends on an underlying location-scale family only through the quantile optimality ratio (QOR), which is the starting point for our results. While the QOR is not distribution-free, it turns out that what is optimal for one family often works quite well for families having similar shape. This allows one to rely on a single representative QOR if one has a rough idea of the distributional shape. Another option that we explore assumes the data can be modeled by the highly flexible generalized lambda distribution (GLD), already studied by others, and we show that using the QOR for the estimated GLD can lead to highly competitive intervals. Confidence intervals for the difference between quantiles from independent populations are also considered, with an application to heart rate data.
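For the two-sample comparison, a Wald interval for a difference of quantiles can be sketched using the approximation $\operatorname{Var}\{\hat Q(p)\} \approx p(1-p)\,q(p)^2/n$ in each independent sample. The bandwidth h of the difference-quotient quantile density estimate is fixed here for simplicity; choosing it well is exactly where the QOR enters in the paper, and that step is not reproduced. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_quantile(xs, u):
    """Linear-interpolation sample quantile of sorted data xs at level u."""
    h = (len(xs) - 1) * u
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def quantile_density(xs, u, h=0.05):
    """Difference-quotient estimate of q(u) = dQ/du (fixed, hypothetical bandwidth)."""
    a, b = max(u - h, 0.0), min(u + h, 1.0)
    return (sample_quantile(xs, b) - sample_quantile(xs, a)) / (b - a)


def diff_quantile_ci(x, y, p=0.5, level=0.95, h=0.05):
    """Wald interval for Q_x(p) - Q_y(p) from two independent samples."""
    xs, ys = sorted(x), sorted(y)
    d = sample_quantile(xs, p) - sample_quantile(ys, p)
    # Var(Q_hat(p)) ~ p (1 - p) q(p)^2 / n in each sample; independence adds them.
    var = p * (1 - p) * (quantile_density(xs, p, h) ** 2 / len(xs)
                         + quantile_density(ys, p, h) ** 2 / len(ys))
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(var)
    return d, (d - half, d + half)
```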
Submitted 2 July, 2015; v1 submitted 15 May, 2015;
originally announced May 2015.
-
Meta-analysis of ratios of sample variances
Authors:
Luke A. Prendergast,
Robert G. Staudte
Abstract:
When conducting a meta-analysis of standardized mean differences (SMDs), it is common to assume equal variances in the two arms of each study. This leads to Cohen's $d$ estimates, for which interpretation is simple. However, this simplicity should not be used as a justification for the assumption of equal variances in situations where evidence may suggest that it is incorrect. Until now, researchers have either used an $F$-test for each individual study to justify the equality of variances or perhaps even conveniently ignored such tools altogether. In this paper we propose using a meta-analysis of $F$-test statistics to estimate the ratio of variances prior to the combination of SMDs. This procedure allows the inclusion of some studies that might otherwise be omitted because an individual fixed-level test indicated unequal variances, which can sometimes occur even when the assumption of equal variances holds. The estimated ratio of variances, as well as associated confidence intervals, can be used as guidance as to whether the assumption of equal variances is violated. The estimators considered include variance stabilizing transformations (VSTs) of the $F$-test statistics as well as maximum likelihood estimators (MLEs). The VST approaches enable the use of QQ-plots to visually inspect for violations of equal variances, while the MLE approach easily allows for the introduction of a random effect. When there is evidence of unequal variances, this work provides a means to formally justify the use of less common methods such as the log ratio of means when studies are measured on different scales.
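A fixed-effect version of the pooling can be sketched using $\log F$ as the variance stabilizing transformation: under normality the log of a sample variance has large-sample variance about $2/(n-1)$, so $\log F$ has variance about $2/(n_1-1) + 2/(n_2-1)$. Whether this matches the paper's VSTs is an assumption; the MLE and random-effect versions are not reproduced, and the function name and study tuples are hypothetical.

```python
import math
from statistics import NormalDist


def pooled_variance_ratio(studies, level=0.95):
    """Fixed-effect pooled ratio of variances from (s1_sq, n1, s2_sq, n2) tuples.

    Each study contributes log F = log(s1_sq / s2_sq), weighted by the inverse of
    its approximate variance 2/(n1 - 1) + 2/(n2 - 1) under normality.
    """
    num = den = 0.0
    for s1_sq, n1, s2_sq, n2 in studies:
        log_f = math.log(s1_sq / s2_sq)
        w = 1.0 / (2.0 / (n1 - 1) + 2.0 / (n2 - 1))
        num += w * log_f
        den += w
    est = num / den
    half = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(den)
    return math.exp(est), (math.exp(est - half), math.exp(est + half))
```

An interval that excludes 1 would suggest unequal variances; the standardized values of $\log F$ could also be plotted against normal quantiles for the kind of visual check the abstract mentions.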
Submitted 27 June, 2015; v1 submitted 10 December, 2014;
originally announced December 2014.
-
Simple response and predictor transformations to adjust for symmetric dependency in dimension reduction for visualization
Authors:
Luke A. Prendergast,
Alexandra L. Garnham
Abstract:
In the regression setting, dimension reduction allows complicated regression structures to be detected via visualization in a low-dimensional framework. However, some popular dimension reduction methodologies fail to achieve this aim when faced with a problem often referred to as symmetric dependency. In this paper we show how vastly superior results can be achieved by carrying out response and predictor transformations for methods such as least squares and Sliced Inverse Regression. These transformations are simple to implement and utilize estimates from other dimension reduction methods that are not affected by the symmetric dependency problem. We highlight the effectiveness of our approach via simulation and an example. Furthermore, we show that ordinary least squares can effectively detect multiple dimension reduction directions. Methods robust to extreme response values are also considered.
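The symmetric dependency problem itself is easy to reproduce; the sketch below does not implement the paper's transformations. With $y = x_1^2 + \varepsilon$ and symmetric predictors, the OLS coefficients are close to zero, while the leading eigenvector of the y-weighted covariance matrix used by y-based principal Hessian directions recovers $x_1$, illustrating a method that does not suffer from the problem. Predictors are whitened first; function names and simulation settings are hypothetical.

```python
import numpy as np


def whiten(X):
    """Center X and rescale it to identity sample covariance via Cholesky."""
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(Xc, rowvar=False))
    return np.linalg.solve(L, Xc.T).T


def ols_coefficients(Z, y):
    """OLS slopes of the centered response on whitened predictors."""
    b, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    return b


def phd_y_direction(Z, y):
    """Eigenvector with the largest |eigenvalue| of mean((y - ybar) z z')."""
    M = (Z * (y - y.mean())[:, None]).T @ Z / len(y)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argmax(np.abs(vals))]


rng = np.random.default_rng(2)
X = rng.standard_normal((5000, 3))
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(5000)  # symmetric in x1
Z = whiten(X)
print(np.round(ols_coefficients(Z, y), 2))  # expected near 0: OLS misses x1
print(np.round(phd_y_direction(Z, y), 2))   # expected to be dominated by x1
```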
Submitted 24 March, 2014;
originally announced March 2014.
-
Sensitivity of principal Hessian direction analysis
Authors:
Luke A. Prendergast,
Jodie A. Smith
Abstract:
We provide sensitivity comparisons for two competing versions of the dimension reduction method principal Hessian directions (pHd). These comparisons consider the effects of small perturbations on the estimation of the dimension reduction subspace via the influence function. We show that the two versions of pHd can behave completely differently in the presence of certain types of observations. Our results also provide evidence that outliers in the traditional sense may or may not be highly influential in practice. Since influential observations may lurk within otherwise typical data, we consider the influence function in the empirical setting for the efficient detection of influential observations in practice.
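A sketch assuming the two versions are y-based and residual-based pHd (the abstract does not name them). As a swapped-in, purely empirical alternative to the influence-function approach, it measures the effect of each observation by deleting it and computing 1 - cos² between the full-data and leave-one-out leading directions. Function names and the simulated model are hypothetical.

```python
import numpy as np


def phd_direction(X, y, version="y"):
    """Leading pHd direction on the original predictor scale (unit length).

    version="y": y-based pHd, using centered responses.
    version="r": residual-based pHd, using residuals from an OLS fit.
    The direction is the eigenvector with the largest |eigenvalue|.
    """
    if version not in ("y", "r"):
        raise ValueError("version must be 'y' or 'r'")
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(Xc, rowvar=False))
    Z = np.linalg.solve(L, Xc.T).T  # whitened predictors
    r = y - y.mean()
    if version == "r":
        b, *_ = np.linalg.lstsq(Z, r, rcond=None)
        r = r - Z @ b
    M = (Z * r[:, None]).T @ Z / len(y)
    vals, vecs = np.linalg.eigh(M)
    v = np.linalg.solve(L.T, vecs[:, np.argmax(np.abs(vals))])  # back to X scale
    return v / np.linalg.norm(v)


def deletion_influence(X, y, version="y"):
    """Per-observation 1 - cos^2 between full-data and leave-one-out directions."""
    full = phd_direction(X, y, version)
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        vi = phd_direction(X[keep], y[keep], version)
        out[i] = max(0.0, 1.0 - float(full @ vi) ** 2)
    return out
```

Comparing `deletion_influence(X, y, "y")` with `deletion_influence(X, y, "r")` on the same data shows, observation by observation, where the two versions disagree about what is influential.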
Submitted 11 June, 2007;
originally announced June 2007.