-
Almost goodness-of-fit tests
Authors:
Amparo Baíllo,
Javier Cárcamo
Abstract:
We introduce the almost goodness-of-fit test, a procedure to decide if a (parametric) model provides a good representation of the probability distribution generating the observed sample. We consider the approximate model determined by an M-estimator of the parameters as the best representative of the unknown distribution within the parametric class. The objective is the approximate validation of a distribution or an entire parametric family up to a pre-specified threshold value, the margin of error. The methodology also quantifies the percentage improvement of the proposed model compared to a non-informative (constant) one. The test statistic is the $\mathrm{L}^p$-distance between the empirical distribution function and that of the estimated (parametric) model. The choice of $p$ modulates the influence of the tails of the distribution on the validation of the model. By deriving the asymptotic distribution of the test statistic, as well as proving the consistency of its bootstrap approximation, we present an easy-to-implement and flexible method. The performance of the proposal is illustrated with a simulation study and the analysis of a real dataset.
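The test statistic described above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes a normal model fitted by maximum likelihood (one particular M-estimator) and approximates the $\mathrm{L}^p$ integral with a midpoint rule on a finite window. The names `lp_distance` and `fit_normal`, the default window and the grid size `m` are hypothetical choices; the margin of error and the bootstrap calibration are not reproduced.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def lp_distance(sample, model_cdf, p=2.0, lo=None, hi=None, m=20000):
    """Approximate (int |F_n(t) - F(t)|^p dt)^(1/p) with a midpoint rule on [lo, hi]."""
    xs = sorted(sample)
    n = len(xs)
    spread = (xs[-1] - xs[0]) or 1.0
    # Hypothetical default window: the sample range padded by one range on each side.
    lo = xs[0] - spread if lo is None else lo
    hi = xs[-1] + spread if hi is None else hi
    h = (hi - lo) / m
    total = 0.0
    for j in range(m):
        t = lo + (j + 0.5) * h
        ecdf = bisect_right(xs, t) / n  # empirical distribution function at t
        total += abs(ecdf - model_cdf(t)) ** p
    return (total * h) ** (1.0 / p)


def fit_normal(sample):
    """Maximum likelihood fit of a normal model (an assumed choice of M-estimator)."""
    return NormalDist(fmean(sample), pstdev(sample))
```

Larger values of `p` emphasize the largest discrepancies between the two distribution functions, while `p = 1` weighs all regions, including the tails, proportionally to the area between the curves.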
Submitted 28 October, 2024;
originally announced October 2024.
-
Tests for almost stochastic dominance
Authors:
Amparo Baíllo,
Javier Cárcamo,
Carlos Mora-Corral
Abstract:
We introduce a 2-dimensional stochastic dominance (2DSD) index to characterize both strict and almost stochastic dominance. Based on this index, we derive an estimator for the minimum violation ratio (MVR), also known as the critical parameter, of the almost stochastic ordering condition between two variables. We determine the asymptotic properties of the empirical 2DSD index and MVR for the most frequently used stochastic orders. We also provide conditions under which the bootstrap estimators of these quantities are strongly consistent. As an application, we develop consistent bootstrap testing procedures for almost stochastic dominance. The performance of the tests is checked via simulations and the analysis of real data.
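As a rough illustration of the quantity being estimated, the sketch below computes the classical first-order violation ratio from two samples: the share of the area between the empirical distribution functions where $F_X > F_Y$. Under the usual first-order definition, the smallest admissible violation level equals this ratio, but the paper's 2DSD index, its MVR estimator for other orders and the bootstrap tests are not reproduced here. The function names are hypothetical.

```python
import math
from bisect import bisect_right


def ecdf(sorted_xs, t):
    """Empirical distribution function of a sorted sample, evaluated at t."""
    return bisect_right(sorted_xs, t) / len(sorted_xs)


def fsd_violation_ratio(x, y):
    """Area where F_X > F_Y divided by the total area between F_X and F_Y.

    0 means X first-order dominates Y in the sample; small values suggest
    almost dominance. Returns nan when the two empirical CDFs coincide.
    """
    xs, ys = sorted(x), sorted(y)
    pts = sorted(set(xs) | set(ys))
    violation = total = 0.0
    # Both empirical CDFs are constant on [a, b) between consecutive pooled points,
    # and they agree (both 0 or both 1) outside the pooled sample range.
    for a, b in zip(pts, pts[1:]):
        diff = ecdf(xs, a) - ecdf(ys, a)
        violation += max(diff, 0.0) * (b - a)
        total += abs(diff) * (b - a)
    return violation / total if total > 0 else math.nan
```

Because the empirical distribution functions are step functions, the integrals are computed exactly over the pooled sample points rather than on an arbitrary grid.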
Submitted 22 March, 2024;
originally announced March 2024.
-
Extremal points of Lorenz curves and applications to inequality analysis
Authors:
Amparo Baíllo,
Javier Cárcamo,
Carlos Mora-Corral
Abstract:
We find the set of extremal points of Lorenz curves with fixed Gini index and compute the maximal $L^1$-distance between Lorenz curves with given values of their Gini coefficients. As an application, we introduce a bidimensional index that simultaneously measures relative inequality and dissimilarity between two populations. This proposal employs the Gini indices of the variables and an $L^1$-distance between their Lorenz curves. The index takes values in a right-angled triangle, two of whose sides characterize perfect relative inequality (expressed by the Lorenz ordering between the underlying distributions), while the hypotenuse represents maximal distance between the two distributions. As a consequence, we construct a chart to visualize either the evolution of (relative) inequality and distance between two income distributions over time, or the comparison of the income distribution of a specific population at a fixed time point with that over a range of years. We prove the mathematical results behind these claims and provide a full description of the asymptotic properties of the plug-in estimator of this index. Finally, we apply the proposed bidimensional index to several real EU-SILC income datasets to illustrate its performance in practice.
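The two plug-in ingredients of the index, the empirical Gini coefficients and the $L^1$-distance between empirical Lorenz curves, can be sketched as follows. This assumes nonnegative incomes with a positive total and uses the piecewise-linear empirical Lorenz curve through $(i/n, S_i/S_n)$, with $G = 1 - 2\int_0^1 L(p)\,dp$. The paper's normalization of the pair into the triangle is not reproduced, and all function names and the grid size `m` are hypothetical.

```python
def lorenz_curve(sample):
    """Piecewise-linear empirical Lorenz curve through (i/n, S_i/S_n)."""
    xs = sorted(sample)
    n = len(xs)
    cum = [0.0]
    for v in xs:
        cum.append(cum[-1] + v)

    def L(p):
        pos = p * n
        i = min(int(pos), n - 1)
        frac = pos - i
        return (cum[i] + frac * (cum[i + 1] - cum[i])) / cum[-1]

    return L


def _trapezoid(f, m):
    """Trapezoidal rule for the integral of f over [0, 1] with m intervals."""
    vals = [f(j / m) for j in range(m + 1)]
    return (sum(vals) - 0.5 * (vals[0] + vals[-1])) / m


def gini(sample, m=1000):
    """Gini index as 1 - 2 * (area under the empirical Lorenz curve)."""
    return 1.0 - 2.0 * _trapezoid(lorenz_curve(sample), m)


def lorenz_l1_distance(x, y, m=1000):
    """L1-distance between the empirical Lorenz curves of two samples."""
    Lx, Ly = lorenz_curve(x), lorenz_curve(y)
    return _trapezoid(lambda p: abs(Lx(p) - Ly(p)), m)
```

When `m` is a multiple of both sample sizes, every kink of the empirical curves lies on the grid, so the trapezoidal rule is exact for the Gini index and for the distance whenever the two curves do not cross between grid points.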
Submitted 4 March, 2021;
originally announced March 2021.
-
Homogeneity tests for Michaelis-Menten curves with application to fluorescence resonance energy transfer data
Authors:
Amparo Baíllo,
Laura Martínez-Muñoz,
Mario Mellado
Abstract:
Resonance energy transfer methods are in wide use for evaluating protein-protein interactions and protein conformational changes in living cells. Fluorescence resonance energy transfer (FRET) measures energy transfer as a function of the acceptor:donor ratio, generating FRET saturation curves. Modeling these curves by Michaelis-Menten kinetics allows characterization by two parameters, which serve to evaluate apparent affinity between two proteins and to compare this affinity in different experimental conditions. To reduce the effect of sampling variability, several statistical samples of the saturation curve are generated in the same biological conditions. Here we study three procedures to determine whether statistical samples in a collection are homogeneous, in the sense that they are extracted from the same regression model. From the hypothesis testing viewpoint, we consider an F test and a procedure based on bootstrap resampling. The third method approaches the problem from the model selection viewpoint, using the Akaike information criterion (AIC). Although we only consider the Michaelis-Menten model, all three procedures would also apply to any other nonlinear regression model. We compared the performance of the homogeneity testing methods in a Monte Carlo study and through analysis in living cells of FRET saturation curves for dimeric complexes of CXCR4, a seven-transmembrane receptor of the G protein-coupled receptor family. We show that the F test, the bootstrap procedure and the model selection method generally lead to similar conclusions, although AIC gave the best results when sample sizes were small, whereas the F test and the bootstrap method were more appropriate for large samples. In practice, the three methods are easy to apply together and give consistent results, facilitating conclusions on sample homogeneity.
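The F-test and AIC comparisons can be sketched as "one common curve for all samples" versus "one curve per sample". This is a minimal illustration under stated assumptions, not the procedure used in the paper: it assumes the model $y = B_{\max} x / (K + x)$ with $x > 0$, Gaussian errors with a common variance, and fits by profile least squares (for fixed $K$ the optimal $B_{\max}$ has a closed form, and $K$ is found by golden-section search on $\log K$ over a hypothetical range). The bootstrap procedure is not shown, and `fit_mm` and `homogeneity_tests` are hypothetical names.

```python
import math


def fit_mm(x, y, k_lo=1e-3, k_hi=1e3, iters=80):
    """Least-squares fit of y = bmax * x / (k + x); returns (bmax, k, rss)."""
    def profile(k):
        z = [xi / (k + xi) for xi in x]
        bmax = sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * zi for zi in z)
        rss = sum((yi - bmax * zi) ** 2 for zi, yi in zip(z, y))
        return bmax, rss

    # Golden-section search for the k minimizing the profile RSS (on log scale).
    g = (math.sqrt(5) - 1) / 2
    a, b = math.log(k_lo), math.log(k_hi)
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if profile(math.exp(c))[1] < profile(math.exp(d))[1]:
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    k = math.exp((a + b) / 2)
    bmax, rss = profile(k)
    return bmax, k, rss


def homogeneity_tests(samples):
    """F statistic (with its degrees of freedom) and AICs: pooled vs separate fits.

    samples: list of (x, y) pairs of lists.
    """
    n_params, m = 2, len(samples)
    rss_sep = sum(fit_mm(x, y)[2] for x, y in samples)
    xs = [xi for x, _ in samples for xi in x]
    ys = [yi for _, y in samples for yi in y]
    rss_pool = fit_mm(xs, ys)[2]
    n = len(xs)
    df1, df2 = n_params * (m - 1), n - n_params * m
    f_stat = ((rss_pool - rss_sep) / df1) / (rss_sep / df2)
    # Gaussian AIC up to an additive constant; +1 parameter for the error variance.
    aic_pool = n * math.log(rss_pool / n) + 2 * (n_params + 1)
    aic_sep = n * math.log(rss_sep / n) + 2 * (n_params * m + 1)
    return f_stat, df1, df2, aic_pool, aic_sep
```

Large values of the F statistic, compared with an $F(\text{df1}, \text{df2})$ quantile, or a smaller AIC for the separate fits both point against homogeneity.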
Submitted 4 May, 2011;
originally announced May 2011.
-
Supervised classification for a family of Gaussian functional models
Authors:
Amparo Baíllo,
Juan Antonio Cuesta-Albertos,
Antonio Cuevas
Abstract:
In the framework of supervised classification (discrimination) for functional data, it is shown that the optimal classification rule can be explicitly obtained for a class of Gaussian processes with "triangular" covariance functions. This explicit knowledge has two practical consequences. First, the consistency of the well-known nearest neighbors classifier (which is not guaranteed in general for functional data) is established for the indicated class of processes. Second, and more importantly, parametric and nonparametric plug-in classifiers can be obtained by estimating the unknown elements in the optimal rule. The performance of these new plug-in classifiers is checked, with positive results, through a simulation study and a real data example.
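A plug-in rule of this kind can be sketched for what we assume is the simplest member of the class: Brownian motion, with covariance $\min(s,t)$, plus a class-dependent mean function. On a grid starting at $t_0 = 0$ with $X(0) = 0$, the increments of $X = m + B$ are independent $N(\Delta m_j, \Delta t_j)$, so the log-likelihood ratio of the discretized path is a sum over increments, and the plug-in version replaces the unknown means by pointwise sample means. This is an illustration under those assumptions, not the rule derived in the paper for general triangular covariances; all names are hypothetical.

```python
import math


def bm_log_lr(t, x, m1, m0):
    """Log-likelihood ratio (class 1 vs class 0) of a path x observed on grid t.

    Assumes Brownian motion plus mean m_c under class c, t increasing with
    t[0] = 0 and all paths equal to 0 at t[0].
    """
    llr = 0.0
    for j in range(1, len(t)):
        dt = t[j] - t[j - 1]
        dx = x[j] - x[j - 1]
        d1 = m1[j] - m1[j - 1]
        d0 = m0[j] - m0[j - 1]
        # log N(dx; d1, dt) - log N(dx; d0, dt)
        llr += ((dx - d0) ** 2 - (dx - d1) ** 2) / (2.0 * dt)
    return llr


def mean_curve(curves):
    """Pointwise sample mean of curves observed on a common grid."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]


def plug_in_classify(t, x, train0, train1, prior1=0.5):
    """Plug-in Bayes rule: estimate both mean functions from training curves."""
    m0, m1 = mean_curve(train0), mean_curve(train1)
    threshold = math.log((1 - prior1) / prior1)
    return 1 if bm_log_lr(t, x, m1, m0) > threshold else 0
```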
Submitted 28 April, 2010;
originally announced April 2010.
-
Tests for zero-inflation and overdispersion
Authors:
A. Baillo,
J. Carcamo,
J. R. Berrendero
Abstract:
We propose a new methodology to detect zero-inflation and overdispersion based on the comparison of the expected sample extremes among convexly ordered distributions. The method is very flexible and includes tests for the proportion of structural zeros in zero-inflated models, tests to distinguish between two ordered parametric families and a new general test to detect overdispersion. The performance of the proposed tests is evaluated via simulation studies. For the well-known fetal lamb data, we conclude that the zero-inflated Poisson model should be rejected in favor of more dispersed models, but we cannot reject the negative binomial model.
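The underlying idea can be illustrated with expected sample maxima: the maximum is a convex function, so a convexly larger (more dispersed) distribution with the same mean has a larger expected maximum of $k$ independent draws. The sketch below contrasts an unbiased estimate of $E[\max(X_1,\dots,X_k)]$ with its value under a Poisson model fitted by the sample mean. It is a hypothetical statistic for illustration only; the paper's test statistics and their calibration are not reproduced, and all names are assumptions.

```python
import math


def expected_max_u(sample, k):
    """Unbiased estimate of E[max(X_1, ..., X_k)] from the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    # X_(i) is the maximum of exactly C(i-1, k-1) of the C(n, k) size-k subsets.
    total = sum(math.comb(i - 1, k - 1) * xs[i - 1] for i in range(k, n + 1))
    return total / math.comb(n, k)


def poisson_expected_max(lam, k, tol=1e-12):
    """E[max of k iid Poisson(lam)] = sum over t >= 0 of (1 - F(t)^k)."""
    pmf = math.exp(-lam)
    if pmf == 0.0:
        raise ValueError("lam too large for this simple recursion")
    cdf, total, t = pmf, 0.0, 0
    while True:
        tail = 1.0 - cdf ** k
        if t > lam and tail < tol:
            return total
        total += tail
        t += 1
        pmf *= lam / t
        cdf += pmf


def overdispersion_stat(sample, k=2):
    """Empirical minus fitted-Poisson expected maximum; positive suggests overdispersion."""
    lam = sum(sample) / len(sample)
    return expected_max_u(sample, k) - poisson_expected_max(lam, k)
```

With `k = 1` both terms reduce to the mean, so the statistic is essentially zero; larger `k` makes the comparison more sensitive to the upper tail.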
Submitted 24 September, 2008;
originally announced September 2008.
-
Supervised functional classification: A theoretical remark and some comparisons
Authors:
Amparo Baillo,
Antonio Cuevas
Abstract:
The problem of supervised classification (or discrimination) with functional data is considered, with special interest in the popular k-nearest neighbors (k-NN) classifier. First, relying on a recent result by Cerou and Guyader (2006), we prove the consistency of the k-NN classifier for functional data whose distribution belongs to a broad family of Gaussian processes with triangular covariance functions. Second, on the practical side, we check the behavior of the k-NN method against a few other functional classifiers. This is carried out through a small simulation study and the analysis of several real functional data sets. While no global "uniform" winner emerges from such comparisons, the overall performance of the k-NN method, together with its sound intuitive motivation and relative simplicity, suggests that it could serve as a reasonable benchmark for the classification problem with functional data.
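A functional k-NN classifier can be sketched in a few lines, assuming all curves are observed on a common grid and compared through a trapezoidal approximation of the $L^2$ distance. The choice of metric, the grid and the names `l2_distance` and `knn_classify` are assumptions for illustration, not necessarily the setup used in the comparisons above.

```python
from collections import Counter


def l2_distance(t, f, g):
    """Trapezoidal approximation of the L2 distance between curves sampled on grid t."""
    sq = [(a - b) ** 2 for a, b in zip(f, g)]
    integral = sum((t[j] - t[j - 1]) * (sq[j] + sq[j - 1]) / 2 for j in range(1, len(t)))
    return integral ** 0.5


def knn_classify(t, x, curves, labels, k=3):
    """Majority vote among the k training curves closest to x in L2 distance."""
    ranked = sorted(range(len(curves)), key=lambda i: l2_distance(t, x, curves[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

With two classes, an odd `k` avoids ties in the vote; in practice `k` is usually chosen by cross-validation.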
Submitted 17 June, 2008;
originally announced June 2008.