Search | arXiv e-print repository

Grouping predictors via network-wide metrics

Authors: Brandon Woosuk Park, Anand N. Vidyashankar, Tucker S. McElroy

Abstract: When multitudes of features can plausibly be associated with a response, both privacy considerations and model parsimony suggest grouping them to increase the predictive power of a regression model. Specifically, the identification of groups of predictors significantly associated with the response variable eases further downstream analysis and decision-making. This paper proposes a new data analysis methodology that utilizes the high-dimensional predictor space to construct an implicit network with weighted edges to identify significant associations between the response and the predictors. Using a population model for groups of predictors defined via network-wide metrics, a new supervised grouping algorithm is proposed to determine the correct group, with probability tending to one as the sample size diverges to infinity. For this reason, we establish several theoretical properties of the estimates of network-wide metrics. A novel model-assisted bootstrap procedure that substantially decreases computational complexity is developed, facilitating the assessment of uncertainty in the estimates of network-wide metrics. The proposed methods account for several challenges that arise in the high-dimensional data setting, including (i) a large number of predictors, (ii) uncertainty regarding the true statistical model, and (iii) model selection variability. The performance of the proposed methods is demonstrated through numerical experiments, data from sports analytics, and breast cancer data.

Submitted 4 May, 2024; originally announced May 2024.

MSC Class: 62G05 62J05 62L12 60F05

arXiv:2212.13686 [pdf, other]

Statistical inference for high-dimensional spectral density matrix

Authors: Jinyuan Chang, Qing Jiang, Tucker S. McElroy, Xiaofeng Shao

Abstract: The spectral density matrix is a fundamental object of interest in time series analysis, and it encodes both contemporary and dynamic linear relationships between component processes of the multivariate system. In this paper we develop novel inference procedures for the spectral density matrix in the high-dimensional setting. Specifically, we introduce a new global testing procedure to test the nullity of the cross-spectral density for a given set of frequencies and across pairs of component indices. For the first time, both Gaussian approximation and parametric bootstrap methodologies are employed to conduct inference for a high-dimensional parameter formulated in the frequency domain, and new technical tools are developed to provide asymptotic guarantees of the size accuracy and power for global testing. We further propose a multiple testing procedure for simultaneously testing the nullity of the cross-spectral density at a given set of frequencies. The method is shown to control the false discovery rate. Both numerical simulations and a real data illustration demonstrate the usefulness of the proposed testing methods.

Submitted 2 February, 2025; v1 submitted 27 December, 2022; originally announced December 2022.

arXiv:1406.4584 [pdf, other]

Estimation of Causal Invertible VARMA Models

Authors: Anindya Roy, Tucker S. McElroy, Peter Linton

Abstract: We present a re-parameterization of vector autoregressive moving average (VARMA) models that allows estimation of parameters under the constraints of causality and invertibility. The parameter constraints associated with a causal invertible VARMA model are highly complex. Currently there are no procedures that can maintain the constraints in the estimated VARMA process, except in the special case of a vector autoregression (VAR), where some moment-based causal estimators are available. Even in the VAR case, the available likelihood-based estimators are not causal. The maximum likelihood estimator based on the full likelihood, which does not condition on the initial observations, by definition satisfies the causal invertible constraints, but optimization of the likelihood under the complex constraints is an intractable problem. The commonly used Bayesian procedure for VAR often has posterior mass outside the causal set because the priors are not constrained to the causal set of parameters. We provide an exact mathematical solution to this problem. An $m$-variate VARMA$(p, q)$ process contains $(p+ q) m^2 + \binom{m+1}{2}$ parameters, which must be constrained to a subset of Euclidean space in order to guarantee causality and invertibility. This space is implicitly described in this paper, through the device of parameterizing the entire space of block Toeplitz matrices in terms of positive definite matrices and orthogonal matrices. The parameterization is connected to Schur stability of polynomials and the associated Stein transformation, which are often used in the dynamical systems literature. As an important by-product of our investigation, we generalize a classical result in dynamical systems to provide a characterization of Schur stable matrix polynomials.

Submitted 17 June, 2014; originally announced June 2014.

MSC Class: 62M10; 62F30; 62H12; 91B84 ACM Class: G.3

arXiv:1112.1977 [pdf, ps, other]

doi 10.1214/13-AOS1180

Asymptotic theory of cepstral random fields

Authors: Tucker S. McElroy, Scott H. Holan

Abstract: Random fields play a central role in the analysis of spatially correlated data and, as a result, have a significant impact on a broad array of scientific applications. This paper studies the cepstral random field model, providing recursive formulas that connect the spatial cepstral coefficients to an equivalent moving-average random field, which facilitates easy computation of the autocovariance matrix. We also provide a comprehensive treatment of the asymptotic theory for two-dimensional random field models: we establish asymptotic results for Bayesian, maximum likelihood and quasi-maximum likelihood estimation of random field parameters and regression parameters. The theoretical results are presented generally and are of independent interest, pertaining to a wide class of random field models. The results for the cepstral model facilitate model-building: because the cepstral coefficients are unconstrained in practice, numerical optimization is greatly simplified, and we are always guaranteed a positive definite covariance matrix. We show that inference for individual coefficients is possible, and one can refine models in a disciplined manner. Our results are illustrated through simulation and the analysis of straw yield data in an agricultural field experiment.

Submitted 16 January, 2014; v1 submitted 8 December, 2011; originally announced December 2011.

Comments: Published at http://dx.doi.org/10.1214/13-AOS1180 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1180

Journal ref: Annals of Statistics 2014, Vol. 42, No. 1, 64-86

Showing 1–4 of 4 results for author: McElroy, T S