-
Inference in Perturbation Models, Finite Mixtures and Scan Statistics: The Volume-of-Tube Formula
Authors:
Ramani S. Pilla,
Catherine Loader
Abstract:
This research creates a general class of "perturbation models" which are described by an underlying "null" model that accounts for most of the structure in data and a perturbation that accounts for possible small localized departures. The perturbation models encompass finite mixture models and spatial scan processes. In this article, (1) we propose a new test statistic to detect the presence of perturbation, including the case where the null model contains a set of nuisance parameters, and show that it is equivalent to the likelihood ratio test; (2) we establish that the asymptotic distribution of the test statistic is equivalent to the supremum of a Gaussian random field over a high-dimensional manifold (e.g., a curve or a surface) with boundaries and singularities; (3) we derive a technique for approximating the quantiles of the test statistic using the Hotelling-Weyl-Naiman "volume-of-tube formula"; and (4) we solve the long-pending problem of testing for the order of a mixture model; in particular, we derive the asymptotic null distribution for a general family of mixture models, including multivariate mixtures. The inferential theory developed in this article is applicable to a class of non-regular statistical problems involving loss of identifiability or parameters that lie on the boundary of the parameter space.
Submitted 3 June, 2006; v1 submitted 20 November, 2005;
originally announced November 2005.
-
The Volume-of-Tubes Formula: Computational Methods and Statistical Applications
Authors:
Catherine Loader
Abstract:
The volume-of-tube formula was first introduced by Hotelling (1939) to assess the significance of terms in nonlinear regression models. Since this pioneering paper, there has been significant work on extending the tube formula to more general settings, including multidimensional problems, and many new applications in statistical inference, including confidence bands in regression and smoothing models; applications to functional data analysis; testing in mixture models; and spatial scan analysis.
Implementation of the tube formula requires numerical evaluation of certain problem-specific geometric constants that appear in Hotelling's formula and its extensions. The purpose of this note is to describe a software library, libtube, that performs the calculations. A variety of illustrative examples are given.
Source code for the libtube library and examples can be downloaded from http://www.herine.net/stat/libtube/.
Submitted 20 November, 2005;
originally announced November 2005.
-
A New Technique for Finding Needles in Haystacks: A Geometric Approach to Distinguishing Between a New Source and Random Fluctuations
Authors:
Ramani S. Pilla,
Catherine Loader,
Cyrus Taylor
Abstract:
We propose a new test statistic based on a score process for determining the statistical significance of a putative signal that may be a small perturbation to a noisy experimental background. We derive the reference distribution for this score test statistic; it has an elegant geometrical interpretation as well as broad applicability. We illustrate the technique in the context of a model problem from high-energy particle physics. Monte Carlo experimental results confirm that the score test results in a significantly improved rate of signal detection.
Submitted 29 May, 2005;
originally announced May 2005.
-
On large-sample estimation and testing via quadratic inference functions for correlated data
Authors:
Ramani S. Pilla,
Catherine Loader
Abstract:
Hansen (1982) proposed a class of "generalized method of moments" (GMMs) for estimating a vector of regression parameters from a set of score functions. Hansen established that, under certain regularity conditions, the estimator based on the GMMs is consistent, asymptotically normal and asymptotically efficient. In the generalized estimating equation framework, extending the principle of the GMMs to implicitly estimate the underlying correlation structure leads to a "quadratic inference function" (QIF) for the analysis of correlated data. The main objectives of this research are to (1) formulate an appropriate estimated covariance matrix for the set of extended score functions defining the inference functions; (2) develop a unified large-sample theoretical framework for the QIF; (3) derive a generalization of the QIF test statistic for a general linear hypothesis problem involving correlated data while establishing the asymptotic distribution of the test statistic under the null and local alternative hypotheses; (4) propose an iteratively reweighted generalized least squares algorithm for inference in the QIF framework; and (5) investigate the effect of basis matrices, defining the set of extended score functions, on the size and power of the QIF test through Monte Carlo simulated experiments.
Submitted 7 January, 2006; v1 submitted 17 May, 2005;
originally announced May 2005.