-
Scaling limits for sample autocovariance operators of Hilbert space-valued linear processes
Authors:
Marie-Christine Düker,
Pavlos Zoubouloglou
Abstract:
This article considers linear processes with values in a separable Hilbert space exhibiting long-range dependence. The scaling limits for the sample autocovariance operators at different time lags are investigated in the topology of their respective Hilbert spaces. Distinguishing two different regimes of long-range dependence, the limiting object is either a Hilbert space-valued Gaussian or a Hilbert space-valued non-Gaussian random variable. The latter can be represented as a unitary transformation of double Wiener-Itô integrals with sample paths in a function space. This work is the first to show weak convergence to such double stochastic integrals with sample paths in infinite dimensions. The result generalizes the well-known convergence to a Hermite process in finite dimensions, introducing a new domain of attraction for probability measures in Hilbert spaces. The key technical contributions include the introduction of double Wiener-Itô integrals with values in a function space and with dependent integrators, as well as establishing sufficient conditions for their existence as limits of sample autocovariance operators.
Submitted 20 June, 2025;
originally announced June 2025.
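For readers unfamiliar with the object studied in the entry above, a sample autocovariance operator can be sketched for curves observed on a finite grid. This is only an illustrative discretization: the function name `sample_autocovariance`, the centering by the sample mean, the 1/n normalization, and the tensor convention (row index for the later time point) are assumptions, not taken from the paper.

```python
import numpy as np

def sample_autocovariance(X, lag):
    """Lag-h sample autocovariance of curves discretized on a grid.

    X has shape (n, p): n time points, each curve evaluated at p grid points.
    Returns a (p, p) matrix equal to (1/n) * sum_k Xc[k + lag] Xc[k]^T,
    where Xc is X centered by its sample mean (an assumed convention).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < n")
    Xc = X - X.mean(axis=0)
    # (p, n - lag) @ (n - lag, p): sums outer products of lagged pairs
    return Xc[lag:].T @ Xc[: n - lag] / n

# Toy data: independent Gaussian "curves" on a 5-point grid
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
gamma0 = sample_autocovariance(X, 0)
gamma1 = sample_autocovariance(X, 1)
```

At lag 0 the result is symmetric and positive semidefinite; at nonzero lags it is generally not symmetric.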
-
Kernel Estimation for Nonlinear Dynamics
Authors:
Marie-Christine Düker,
Adam Waterbury
Abstract:
Many scientific problems involve data exhibiting both temporal and cross-sectional dependencies. While linear dependencies have been extensively studied, the theoretical analysis of regression estimators under nonlinear dependencies remains scarce. This work studies a kernel-based estimation procedure for nonlinear dynamics within the reproducing kernel Hilbert space framework, focusing on nonlinear vector autoregressive models. We derive nonasymptotic probabilistic bounds on the deviation between a regularized kernel estimator and the nonlinear regression function. A key technical contribution is a concentration bound for quadratic forms of stochastic matrices in the presence of dependent data, which is of independent interest. Additionally, we characterize conditions on multivariate kernels that guarantee optimal convergence rates.
Submitted 25 February, 2025;
originally announced February 2025.
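The regularized kernel estimator mentioned in the entry above can be illustrated with plain kernel ridge regression on lagged data. This is a minimal sketch under stated assumptions, not the authors' procedure: the lag order of one, the Gaussian kernel, the bandwidth, the penalty `lam`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def fit_kernel_var1(X, lam=1e-2, bandwidth=1.0):
    """Kernel ridge fit of X_t on X_{t-1}; returns a one-step predictor."""
    inputs, targets = X[:-1], X[1:]
    n = inputs.shape[0]
    K = gaussian_kernel(inputs, inputs, bandwidth)
    # Closed-form ridge solution: (K + n * lam * I) coef = targets
    coef = np.linalg.solve(K + n * lam * np.eye(n), targets)

    def predict(x_new):
        x_new = np.atleast_2d(x_new)
        return gaussian_kernel(x_new, inputs, bandwidth) @ coef

    return predict

# Toy nonlinear VAR(1): X_t = tanh(A X_{t-1}) + noise (assumed model)
rng = np.random.default_rng(1)
A = np.array([[0.5, -0.3], [0.2, 0.4]])
X = np.zeros((300, 2))
for t in range(1, 300):
    X[t] = np.tanh(A @ X[t - 1]) + 0.1 * rng.standard_normal(2)

predict = fit_kernel_var1(X)
one_step = predict(X[-1])  # one-step-ahead forecast, shape (1, 2)
```

The penalty and bandwidth are fixed here for brevity; in practice they would be tuned, for example by cross-validation.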
-
Prior distributions for structured semi-orthogonal matrices
Authors:
Michael Jauch,
Marie-Christine Düker,
Peter Hoff
Abstract:
Statistical models for multivariate data often include a semi-orthogonal matrix parameter. In many applications, there is reason to expect that the semi-orthogonal matrix parameter satisfies a structural assumption such as sparsity or smoothness. From a Bayesian perspective, these structural assumptions should be incorporated into an analysis through the prior distribution. In this work, we introduce a general approach to constructing prior distributions for structured semi-orthogonal matrices that leads to tractable posterior inference via parameter-expanded Markov chain Monte Carlo. We draw upon recent results from random matrix theory to establish a theoretical basis for the proposed approach. We then introduce specific prior distributions for incorporating sparsity or smoothness and illustrate their use through applications to biological and oceanographic data.
Submitted 17 January, 2025;
originally announced January 2025.
-
Vector AutoRegressive Moving Average Models: A Review
Authors:
Marie-Christine Düker,
David S. Matteson,
Ruey S. Tsay,
Ines Wilms
Abstract:
Vector AutoRegressive Moving Average (VARMA) models form a powerful and general model class for analyzing dynamics among multiple time series. While VARMA models encompass the Vector AutoRegressive (VAR) models, their popularity in empirical applications is dominated by the latter. Can this phenomenon be explained fully by the simplicity of VAR models? Perhaps many users of VAR models have not fully appreciated what VARMA models can provide. The goal of this review is to provide a comprehensive resource for researchers and practitioners seeking insights into the advantages and capabilities of VARMA models. We start by reviewing the identification challenges inherent to VARMA models, covering both classical and modern identification schemes, and we continue along the same lines regarding estimation, specification and diagnosis of VARMA models. We then highlight the practical utility of VARMA models in terms of Granger Causality analysis, forecasting and structural analysis, as well as recent advances and extensions of VARMA models to further facilitate their adoption in practice. Finally, we discuss some interesting future research directions where VARMA models can fulfill their potential in applications as compared to their subclass of VAR models.
Submitted 28 June, 2024;
originally announced June 2024.
-
Breuer-Major Theorems for Hilbert Space-Valued Random Variables
Authors:
Marie-Christine Düker,
Pavlos Zoubouloglou
Abstract:
Let $\{X_k\}_{k \in \mathbb{Z}}$ be a stationary Gaussian process with values in a separable Hilbert space $\mathcal{H}_1$, and let $G:\mathcal{H}_1 \to \mathcal{H}_2$ be an operator acting on $X_k$. Under suitable conditions on the operator $G$ and the temporal and cross-sectional correlations of $\{X_k\}_{k \in \mathbb{Z}}$, we derive a central limit theorem (CLT) for the normalized partial sums of $\{G[X_k]\}_{k \in \mathbb{Z}}$. To prove a CLT for the Hilbert space-valued process $\{G[X_k]\}_{k \in \mathbb{Z}}$, we employ techniques from the recently developed infinite dimensional Malliavin-Stein framework. In addition, we provide quantitative and continuous time versions of the derived CLT. In a series of examples, we recover and strengthen limit theorems for a wide array of statistics relevant in functional data analysis, and present a novel limit theorem in the framework of neural operators as an application of our result.
Submitted 19 May, 2024;
originally announced May 2024.
-
Testing for common structures in high-dimensional factor models
Authors:
Marie-Christine Düker,
Vladas Pipiras
Abstract:
This work proposes a novel procedure to test for common structures across two high-dimensional factor models. The introduced test makes it possible to uncover whether two factor models are driven by the same loading matrix up to some linear transformation. The test can be used to discover inter-individual relationships between two data sets. In addition, it can be applied to test for structural changes over time in the loading matrix of an individual factor model. The test aims to reduce the set of possible alternatives in a classical change-point setting. The theoretical results establish the asymptotic behavior of the introduced test statistic. The theory is supported by a simulation study showing promising results in empirical test size and power. A data application investigates changes in the loadings when modeling the celebrated US macroeconomic data set of Stock and Watson.
Submitted 28 March, 2024;
originally announced March 2024.
-
Latent Gaussian dynamic factor modeling and forecasting for multivariate count time series
Authors:
Younghoon Kim,
Marie-Christine Düker,
Zachary F. Fisher,
Vladas Pipiras
Abstract:
This work considers estimation and forecasting in a multivariate, possibly high-dimensional count time series model constructed from a transformation of a latent Gaussian dynamic factor series. The estimation of the latent model parameters is based on second-order properties of the count and underlying Gaussian time series, yielding estimators of the underlying covariance matrices for which standard principal component analysis applies. Theoretical consistency results are established for the proposed estimation, building on certain concentration results for the models of the type considered. They also involve the memory of the latent Gaussian process, quantified through a spectral gap, shown to be suitably bounded as the model dimension increases, which is of independent interest. In addition, novel cross-validation schemes are suggested for model selection. The forecasting is carried out through a particle-based sequential Monte Carlo, leveraging Kalman filtering techniques. A simulation study and an application are also considered.
Submitted 3 April, 2025; v1 submitted 19 July, 2023;
originally announced July 2023.
-
High-dimensional latent Gaussian count time series: Concentration results for autocovariances and applications
Authors:
Marie-Christine Düker,
Robert Lund,
Vladas Pipiras
Abstract:
This work considers stationary vector count time series models defined via deterministic functions of a latent stationary vector Gaussian series. The construction is very general and ensures a pre-specified marginal distribution for the counts in each dimension, depending on unknown parameters that can be marginally estimated. The vector Gaussian series injects flexibility into the model's temporal and cross-dimensional dependencies, perhaps through a parametric model akin to a vector autoregression. We show that the latent Gaussian model can be estimated by relating the covariances of the counts and the latent Gaussian series. In a possibly high-dimensional setting, concentration bounds are established for the differences between the estimated and true latent Gaussian autocovariances, in terms of those for the observed count series and the estimated marginal parameters. The results are applied to the case where the latent Gaussian series is a vector autoregression, and its parameters are estimated sparsely through a LASSO-type procedure.
Submitted 29 October, 2023; v1 submitted 1 January, 2023;
originally announced January 2023.
-
Higher-order approximation for uncertainty quantification in time series analysis
Authors:
Annika Betken,
Marie-Christine Düker
Abstract:
For time series with high temporal correlation, the empirical process converges rather slowly to its limiting distribution. Many statistics in change-point analysis, goodness-of-fit testing and uncertainty quantification admit a representation as functionals of the empirical process and therefore inherit its slow convergence. As a result, inference based on the asymptotic distribution of those quantities is significantly affected by relatively small sample sizes. We assess the quality of higher-order approximations of the empirical process by deriving the asymptotic distribution of the corresponding error terms. Based on the limiting distribution of the higher-order terms, we propose a novel approach to calculate confidence intervals for statistical quantities such as the median. In a simulation study, we compare coverage rates and lengths of these confidence intervals with those based on the asymptotic distribution of the empirical process and highlight some benefits of higher-order approximations of the empirical process.
Submitted 22 May, 2025; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Local Whittle estimation of high-dimensional long-run variance and precision matrices
Authors:
Changryong Baek,
Marie-Christine Düker,
Vladas Pipiras
Abstract:
This work develops non-asymptotic theory for estimation of the long-run variance matrix and its inverse, the so-called precision matrix, for high-dimensional time series under general assumptions on the dependence structure including long-range dependence. The estimation involves shrinkage techniques which are thresholding and penalizing versions of the classical multivariate local Whittle estimator. The results ensure consistent estimation in a double asymptotic regime where the number of component time series is allowed to grow with the sample size as long as the true model parameters are sparse. The key technical result is a concentration inequality of the local Whittle estimator for the long-run variance matrix around the true model parameters. In particular, it handles simultaneously the estimation of the memory parameters which enter the underlying model. Novel algorithms for the considered procedures are proposed, and a simulation study and a data application are also provided.
Submitted 28 December, 2022; v1 submitted 27 May, 2021;
originally announced May 2021.
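As background for the entry above, the classical univariate local Whittle estimator of a memory parameter d can be sketched in a few lines. This is a deliberately simplified stand-in: the paper concerns the multivariate, high-dimensional long-run variance matrix with thresholding and penalization, none of which is reproduced here. The function name, the grid search, and the grid bounds are assumptions.

```python
import numpy as np

def local_whittle_d(x, m, grid=None):
    """Univariate local Whittle estimate of the memory parameter d.

    Minimizes, over a grid of candidate d values,
        log( mean_j( freq_j^(2d) * I(freq_j) ) ) - 2d * mean_j( log freq_j ),
    using the first m Fourier frequencies and the periodogram I.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 1 <= m <= (n - 1) // 2:
        raise ValueError("m must satisfy 1 <= m <= (n - 1) // 2")
    if grid is None:
        grid = np.linspace(-0.49, 0.99, 149)  # assumed search range
    freqs = 2.0 * np.pi * np.arange(1, m + 1) / n
    dft = np.fft.fft(x - x.mean())[1 : m + 1]
    periodogram = np.abs(dft) ** 2 / (2.0 * np.pi * n)
    mean_log_freq = np.mean(np.log(freqs))
    objective = [
        np.log(np.mean(freqs ** (2.0 * d) * periodogram)) - 2.0 * d * mean_log_freq
        for d in grid
    ]
    return float(grid[int(np.argmin(objective))])

# White noise has no long memory, so the estimate should be close to 0
rng = np.random.default_rng(2)
d_hat = local_whittle_d(rng.standard_normal(4096), m=200)
```

The number of frequencies `m` controls a bias-variance trade-off; the value 200 is an arbitrary choice for this toy example.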
-
Testing Simultaneous Diagonalizability
Authors:
Yuchen Xu,
Marie-Christine Düker,
David S. Matteson
Abstract:
This paper proposes novel methods to test for simultaneous diagonalization of possibly asymmetric matrices. Motivated by various applications, a two-sample test as well as a generalization for multiple matrices are proposed. A partial version of the test is also studied to check whether a partial set of eigenvectors is shared across samples. Additionally, a novel algorithm for the considered testing methods is introduced. Simulation studies demonstrate favorable performance for all designs. Finally, the theoretical results are utilized to decouple vector autoregression models into multiple univariate time series, and to test for the same stationary distribution in recurrent Markov chains. These applications are demonstrated using macroeconomic indices of 8 countries and streamflow data, respectively.
Submitted 19 January, 2021;
originally announced January 2021.
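The property tested in the entry above has a simple deterministic counterpart: two diagonalizable matrices share a full set of eigenvectors exactly when they commute. The sketch below checks this numerically via the Frobenius norm of the commutator. It is not the statistical test proposed in the paper, which works with estimated matrices; the function name and the toy matrices are assumptions.

```python
import numpy as np

def commutator_norm(A, B):
    """Frobenius norm of AB - BA; zero when A and B commute."""
    return float(np.linalg.norm(A @ B - B @ A, ord="fro"))

rng = np.random.default_rng(3)

# A and B share the (non-orthogonal) eigenvector matrix V by construction
V = np.eye(4) + 0.2 * rng.standard_normal((4, 4))
V_inv = np.linalg.inv(V)
A = V @ np.diag([1.0, 2.0, 3.0, 4.0]) @ V_inv
B = V @ np.diag([0.5, -1.0, 2.5, 0.0]) @ V_inv

# C is an unrelated random matrix with its own eigenvectors
C = rng.standard_normal((4, 4))

shared = commutator_norm(A, B)     # numerically zero
unrelated = commutator_norm(A, C)  # clearly nonzero
```

With noisy estimates the commutator is never exactly zero, which is why a formal test with a null distribution is needed in practice.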
-
Clustering Future Scenarios Based on Predicted Range Maps
Authors:
Matthew Davidow,
Cory Merow,
Judy Che-Castaldo,
Toryn Schafer,
Marie-Christine Duker,
Derek Corcoran,
David Matteson
Abstract:
Predictions of biodiversity trajectories under climate change are crucial in order to act effectively in maintaining the diversity of species. In many ecological applications, future predictions are made under various global warming scenarios as described by a range of different climate models. The outputs of these various predictions call for a reliable interpretation. We propose an interpretable and flexible two-step methodology to measure the similarity between predicted species range maps and cluster the future scenario predictions utilizing a spectral clustering technique. We find that clustering based on ecological impact (predicted species range maps) is mainly driven by the amount of warming. We contrast this with clustering based only on predicted climate features, which is driven mainly by climate models. The differences between these clusterings illustrate that it is crucial to incorporate ecological information to understand the relevant differences between climate models. The findings of this work can be used to better synthesize forecasts of biodiversity loss under the wide spectrum of results that emerge when considering potential future biodiversity loss.
Submitted 17 July, 2022; v1 submitted 18 January, 2021;
originally announced January 2021.
-
Limit theorems for multivariate long-range dependent processes
Authors:
Marie-Christine Düker
Abstract:
This article considers multivariate linear processes whose components are either short- or long-range dependent. The functional central limit theorems for the sample mean and the sample autocovariances for these processes are investigated, paying special attention to the mixed cases of short- and long-range dependent series. The resulting limit processes can involve multivariate Brownian motion marginals, operator fractional Brownian motions and matrix-valued versions of the so-called Rosenblatt process.
Submitted 11 February, 2020; v1 submitted 27 April, 2017;
originally announced April 2017.
-
Limit theorems for Hilbert space-valued linear processes under long range dependence
Authors:
Marie-Christine Düker
Abstract:
Let $(X_{k})_{k \in \mathbb Z }$ be a linear process with values in a separable Hilbert space $\mathbb{H}$ given by $X_{k} =\sum_{j=0}^{\infty} (j+1)^{-N}\varepsilon_{k-j}$ for each $k \in \mathbb Z$, where $N:\mathbb{H} \to \mathbb{H}$ is a bounded, linear normal operator and $(\varepsilon_{k})_{ k \in \mathbb Z }$ is a sequence of independent, identically distributed $\mathbb{H}$-valued random variables with $E\varepsilon_{0}=0$ and $E\| \varepsilon_{0} \|^2<\infty$. We investigate the central and the functional central limit theorem for $(X_{k})_{k \in \mathbb Z }$ when the series of operator norms $\sum_{j=0}^{\infty} \|(j+1)^{-N}\|_{op}$ diverges. Furthermore, we show that the limit process in the case of the functional central limit theorem generates an operator self-similar process.
Submitted 3 January, 2017;
originally announced January 2017.
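The linear process defined in the entry above can be simulated in a finite-dimensional toy setting. The sketch assumes that $N$ is a diagonal matrix with real entries $d_i \in (1/2, 1)$, so that $(j+1)^{-N}$ is diagonal with entries $(j+1)^{-d_i}$, the coefficients are square-summable, and the operator norms $(j+1)^{-\min_i d_i}$ are not summable. The truncation of the infinite sum, the Gaussian innovations, and the function name are further assumptions made for illustration.

```python
import numpy as np

def simulate_linear_process(d, n, truncation=2000, seed=0):
    """Simulate X_k = sum_{j=0}^{J-1} (j+1)^{-N} eps_{k-j} with N = diag(d).

    d: diagonal entries of N (assumed in (1/2, 1)); J = truncation.
    Returns an array of shape (n, len(d)).
    """
    d = np.asarray(d, dtype=float)
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n + truncation - 1, d.size))
    j = np.arange(truncation, dtype=float)
    # weights[j, i] = (j + 1)^(-d_i), the i-th diagonal entry of (j+1)^{-N}
    weights = (j[:, None] + 1.0) ** (-d[None, :])
    X = np.empty((n, d.size))
    for k in range(n):
        # Reverse so that row 0 is eps_k, row 1 is eps_{k-1}, and so on
        window = eps[k : k + truncation][::-1]
        X[k] = np.sum(weights * window, axis=0)
    return X

d = np.array([0.6, 0.75, 0.9])
X = simulate_linear_process(d, n=100)

# Partial sums of the operator norms (j+1)^(-min d) keep growing with J
op_norm_partial_sums = np.cumsum(np.arange(1.0, 2001.0) ** (-d.min()))
```

Truncating the sum only approximates the long-range dependence; slowly decaying weights mean that a large truncation level is needed for the approximation to be reasonable.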