-
arXiv:1612.07408
Statistical Distances and Their Role in Robustness
Abstract: Statistical distances, divergences, and similar quantities have a large history and play a fundamental role in statistics, machine learning and associated scientific disciplines. However, within the statistical literature, this extensive role has too often been played out behind the scenes, with other aspects of the statistical problems being viewed as more central, more interesting, or more impor…
Submitted 21 December, 2016; originally announced December 2016.
Comments: 23 pages
Journal ref: New Advances in Statistics and Data Science 2017, 3-26
-
Composite likelihood inference in a discrete latent variable model for two-way "clustering-by-segmentation" problems
Abstract: We consider a discrete latent variable model for two-way data arrays, which allows one to simultaneously produce clusters along one of the data dimensions (e.g. exchangeable observational units or features) and contiguous groups, or segments, along the other (e.g. consecutively ordered times or locations). The model relies on a hidden Markov structure but, given its complexity, cannot be estimated…
Submitted 27 June, 2015; originally announced June 2015.
-
arXiv:0909.0608
Building and using semiparametric tolerance regions for parametric multinomial models
Abstract: We introduce a semiparametric "tubular neighborhood" of a parametric model in the multinomial setting. It consists of all multinomial distributions lying in a distance-based neighborhood of the parametric model of interest. Fitting such a tubular model allows one to use a parametric model while treating it as an approximation to the true distribution. In this paper, the Kullback-Leibler dista…
Submitted 3 September, 2009; originally announced September 2009.
Comments: Published at http://dx.doi.org/10.1214/08-AOS603 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Report number: IMS-AOS-AOS603
MSC Class: 62G35 (Primary) 62G15 (Secondary)
Journal ref: Annals of Statistics 2009, Vol. 37, No. 6A, 3644-3659
-
arXiv:0804.0991
Quadratic distances on probabilities: A unified foundation
Abstract: This work builds a unified framework for the study of quadratic form distance measures as they are used in assessing the goodness of fit of models. Many important procedures have this structure, but the theory for these methods is dispersed and incomplete. Central to the statistical analysis of these distances is the spectral decomposition of the kernel that generates the distance. We show how t…
Submitted 7 April, 2008; originally announced April 2008.
Comments: Published at http://dx.doi.org/10.1214/009053607000000956 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Report number: IMS-AOS-AOS0330
MSC Class: 62A01; 62E20 (Primary) 62H10 (Secondary)
Journal ref: Annals of Statistics 2008, Vol. 36, No. 2, 983-1006
-
arXiv:0708.2153
Estimating the number of classes
Abstract: Estimating the unknown number of classes in a population has numerous important applications. In a Poisson mixture model, the problem is reduced to estimating the odds that a class is undetected in a sample. The discontinuity of the odds prevents the existence of locally unbiased and informative estimators and restricts confidence intervals to be one-sided. Confidence intervals for the number of…
Submitted 16 August, 2007; originally announced August 2007.
Comments: Published at http://dx.doi.org/10.1214/009053606000001280 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Report number: IMS-AOS-AOS0180
MSC Class: 62G15; 62G15 (Primary); 62G05 (Secondary)
Journal ref: Annals of Statistics 2007, Vol. 35, No. 2, 917-930
-
arXiv:math/0611544
Model selection in High-Dimensions: A Quadratic-risk based approach
Abstract: In this article we propose a general class of risk measures which can be used for data based evaluation of parametric models. The loss function is defined as generalized quadratic distance between the true density and the proposed model. These distances are characterized by a simple quadratic form structure that is adaptable through the choice of a nonnegative definite kernel and a bandwidth par…
Submitted 1 October, 2007; v1 submitted 17 November, 2006; originally announced November 2006.
Comments: Updated with reviewer suggestions
-
arXiv:math/0606077
Model Building for Semiparametric Mixtures
Abstract: An important and yet difficult problem in fitting multivariate mixture models is determining the mixture complexity. We develop theory and a unified framework for finding the nonparametric maximum likelihood estimator of a multivariate mixing distribution and consequently estimating the mixture complexity. Multivariate mixtures provide a flexible approach to fitting high-dimensional data while o…
Submitted 3 June, 2006; originally announced June 2006.
Comments: 48 pages
MSC Class: 62G05 (Primary); 62G20; 62E20
-
arXiv:math/0602238
The topography of multivariate normal mixtures
Abstract: Multivariate normal mixtures provide a flexible method of fitting high-dimensional data. It is shown that their topography, in the sense of their key features as a density, can be analyzed rigorously in lower dimensions by use of a ridgeline manifold that contains all critical points, as well as the ridges of the density. A plot of the elevations on the ridgeline shows the key features of the mi…
Submitted 11 February, 2006; originally announced February 2006.
Comments: Published at http://dx.doi.org/10.1214/009053605000000417 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Report number: IMS-AOS-AOS0069
MSC Class: 62E10; 62H05 (Primary) 62H30 (Secondary)
Journal ref: Annals of Statistics 2005, Vol. 33, No. 5, 2042-2065