-
Weight-calibrated estimation for factor models of high-dimensional time series
Authors:
Xinghao Qiao,
Zihan Wang,
Qiwei Yao,
Bo Zhang
Abstract:
The factor modeling for high-dimensional time series is powerful in discovering latent common components for dimension reduction and information extraction. Most available estimation methods can be divided into two categories: the covariance-based under asymptotically-identifiable assumption and the autocovariance-based with white idiosyncratic noise. This paper follows the autocovariance-based framework and develops a novel weight-calibrated method to improve the estimation performance. It adopts a linear projection to tackle high-dimensionality, and employs a reduced-rank autoregression formulation. The asymptotic theory of the proposed method is established, relaxing the assumption on white noise. Additionally, we make the first attempt in the literature by providing a systematic theoretical comparison among the covariance-based, the standard autocovariance-based, and our proposed weight-calibrated autocovariance-based methods in the presence of factors with different strengths. Extensive simulations are conducted to showcase the superior finite-sample performance of our proposed method, as well as to validate the newly established theory. The superiority of our proposal is further illustrated through the analysis of one financial and one macroeconomic data sets.
Submitted 4 May, 2025; v1 submitted 2 May, 2025;
originally announced May 2025.
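For orientation, the standard autocovariance-based estimator that the abstract above takes as its starting point can be sketched as follows. This is a minimal illustration, not the paper's weight-calibrated method: the function names, the simulated AR(1) factor model, and the choice of `k0 = 2` lags are all assumptions made for the example.

```python
import numpy as np

def autocov_loading_space(y, r, k0=2):
    """Estimate an orthonormal basis of the factor loading space.

    y  : (n, p) array of observations
    r  : number of factors (assumed known here)
    k0 : number of lags pooled (illustrative choice)
    """
    n, p = y.shape
    yc = y - y.mean(axis=0)
    M = np.zeros((p, p))
    for k in range(1, k0 + 1):
        # Lag-k sample autocovariance; white noise contributes nothing in expectation.
        sigma_k = yc[k:].T @ yc[:-k] / n
        M += sigma_k @ sigma_k.T
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, ::-1][:, :r]

def simulate(n=1000, p=20, r=2, phi=0.8, seed=0):
    """Hypothetical factor model: AR(1) factors plus white idiosyncratic noise."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((p, r))
    f = np.zeros((n, r))
    for t in range(1, n):
        f[t] = phi * f[t - 1] + rng.standard_normal(r)
    y = f @ A.T + rng.standard_normal((n, p))
    return y, A

def projection_distance(A, B):
    """Frobenius distance between the projections onto col(A) and col(B)."""
    QA, _ = np.linalg.qr(A)
    QB, _ = np.linalg.qr(B)
    return np.linalg.norm(QA @ QA.T - QB @ QB.T)

y, A = simulate()
A_hat = autocov_loading_space(y, r=2)
dist = projection_distance(A, A_hat)
print(f"distance between true and estimated loading spaces: {dist:.3f}")
```

Only the column space of the loading matrix is identifiable, which is why the sketch compares projection matrices rather than the loadings themselves.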
-
On Robust Empirical Likelihood for Nonparametric Regression with Application to Regression Discontinuity Designs
Authors:
Qin Fang,
Shaojun Guo,
Yang Hong,
Xinghao Qiao
Abstract:
Empirical likelihood serves as a powerful tool for constructing confidence intervals in nonparametric regression and regression discontinuity designs (RDD). The original empirical likelihood framework can be naturally extended to these settings using local linear smoothers, with Wilks' theorem holding only when an undersmoothed bandwidth is selected. However, the generalization of bias-corrected versions of empirical likelihood under more realistic conditions is non-trivial and has remained an open challenge in the literature. This paper provides a satisfactory solution by proposing a novel approach, referred to as robust empirical likelihood, designed for nonparametric regression and RDD. The core idea is to construct robust weights which simultaneously achieve bias correction and account for the additional variability introduced by the estimated bias, thereby enabling valid confidence interval construction without extra estimation steps involved. We demonstrate that the Wilks' phenomenon still holds under weaker conditions in nonparametric regression, sharp and fuzzy RDD settings. Extensive simulation studies confirm the effectiveness of our proposed approach, showing superior performance over existing methods in terms of coverage probabilities and interval lengths. Moreover, the proposed procedure exhibits robustness to bandwidth selection, making it a flexible and reliable tool for empirical analyses. The practical usefulness is further illustrated through applications to two real datasets.
Submitted 2 April, 2025;
originally announced April 2025.
-
Backward Stochastic Differential Equations-guided Generative Model for Structural-to-functional Neuroimage Translator
Authors:
Zengjing Chen,
Lu Wang,
Yongkang Lin,
Jie Peng,
Zhiping Liu,
Jie Luo,
Bao Wang,
Yingchao Liu,
Nazim Haouchine,
Xu Qiao
Abstract:
A Method for structural-to-functional neuroimage translator
Submitted 23 February, 2025;
originally announced March 2025.
-
Conformal Prediction Under Generalized Covariate Shift with Posterior Drift
Authors:
Baozhen Wang,
Xingye Qiao
Abstract:
In many real applications of statistical learning, collecting sufficiently many training data is often expensive, time-consuming, or even unrealistic. In this case, a transfer learning approach, which aims to leverage knowledge from a related source domain to improve the learning performance in the target domain, is more beneficial. There have been many transfer learning methods developed under various distributional assumptions. In this article, we study a particular type of classification problem, called conformal prediction, under a new distributional assumption for transfer learning. Classifiers under the conformal prediction framework predict a set of plausible labels instead of one single label for each data instance, affording a more cautious and safer decision. We consider a generalization of the \textit{covariate shift with posterior drift} setting for transfer learning. Under this setting, we propose a weighted conformal classifier that leverages both the source and target samples, with a coverage guarantee in the target domain. Theoretical studies demonstrate favorable asymptotic properties. Numerical studies further illustrate the usefulness of the proposed method.
Submitted 24 February, 2025;
originally announced February 2025.
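The set-valued prediction idea in the abstract above can be illustrated with plain (unweighted) split conformal classification. This is a minimal sketch under the assumption that calibration and test data are exchangeable; it does not implement the paper's weighted classifier for covariate shift with posterior drift. All function names and the synthetic Dirichlet data are hypothetical.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold from held-out class probabilities."""
    n = len(cal_labels)
    # Nonconformity score: one minus the probability given to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: every label is kept
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, threshold):
    """Return, for each test point, the labels whose score is within the threshold."""
    return [np.flatnonzero(1.0 - row <= threshold) for row in test_probs]

def sample_probs_and_labels(n, n_classes, rng):
    """Synthetic stand-in for a fitted classifier: labels are drawn from the probabilities."""
    probs = rng.dirichlet(np.ones(n_classes), size=n)
    u = rng.random(n)
    labels = (u[:, None] > np.cumsum(probs, axis=1)).sum(axis=1)
    return probs, np.minimum(labels, n_classes - 1)

rng = np.random.default_rng(0)
cal_probs, cal_labels = sample_probs_and_labels(2000, 3, rng)
test_probs, test_labels = sample_probs_and_labels(2000, 3, rng)

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
sets = prediction_sets(test_probs, threshold)
coverage = np.mean([y in s for y, s in zip(test_labels, sets)])
print(f"empirical coverage: {coverage:.3f}")
```

Under exchangeability the sets cover the true label with probability at least 1 - alpha on average; handling a shifted target domain would require reweighting the calibration scores, which this sketch omits.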
-
Large-scale Multiple Testing of Cross-covariance Functions with Applications to Functional Network Models
Authors:
Qin Fang,
Qing Jiang,
Xinghao Qiao
Abstract:
The estimation of functional networks through functional covariance and graphical models has recently attracted increasing attention in settings with high-dimensional functional data, where the number of functional variables p is comparable to, and may be larger than, the number of subjects. However, the existing methods all depend on regularization techniques, which make it unclear how the involved tuning parameters are related to the number of false edges. In this paper, we first reframe the functional covariance model estimation as a tuning-free problem of simultaneously testing p(p-1)/2 hypotheses for cross-covariance functions, and introduce a novel multiple testing procedure. We then explore the multiple testing procedure under a general error-contamination framework and establish that our procedure can control false discoveries asymptotically. Additionally, we demonstrate that our proposed methods for two concrete examples, the functional covariance model for discretely observed functional data and, importantly, the more challenging functional graphical model, can be seamlessly integrated into the general error-contamination framework and, under verifiable conditions, achieve theoretical guarantees on effective false discovery control. Finally, we showcase the superiority of our proposals through extensive simulations and brain connectivity analysis of two neuroimaging datasets.
Submitted 4 September, 2024; v1 submitted 28 July, 2024;
originally announced July 2024.
-
Functional knockoffs selection with applications to functional data analysis in high dimensions
Authors:
Xinghao Qiao,
Mingya Long,
Qizhai Li
Abstract:
The knockoffs is a recently proposed powerful framework that effectively controls the false discovery rate (FDR) for variable selection. However, none of the existing knockoff solutions are directly suited to handle multivariate or high-dimensional functional data, which has become increasingly prevalent in various scientific applications. In this paper, we propose a novel functional model-X knockoffs selection framework tailored to sparse high-dimensional functional models, and show that our proposal can achieve the effective FDR control for any sample size. Furthermore, we illustrate the proposed functional model-X knockoffs selection procedure along with the associated theoretical guarantees for both FDR control and asymptotic power using examples of commonly adopted functional linear additive regression models and the functional graphical model. In the construction of functional knockoffs, we integrate essential components including the correlation operator matrix, the Karhunen-Loève expansion, and semidefinite programming, and develop executable algorithms. We demonstrate the superiority of our proposed methods over the competitors through both extensive simulations and the analysis of two brain imaging datasets.
Submitted 27 June, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
-
On the modelling and prediction of high-dimensional functional time series
Authors:
Jinyuan Chang,
Qin Fang,
Xinghao Qiao,
Qiwei Yao
Abstract:
We propose a two-step procedure to model and predict high-dimensional functional time series, where the number of function-valued time series $p$ is large in relation to the length of time series $n$. Our first step performs an eigenanalysis of a positive definite matrix, which leads to a one-to-one linear transformation for the original high-dimensional functional time series, and the transformed curve series can be segmented into several groups such that any two subseries from any two different groups are uncorrelated both contemporaneously and serially. Consequently in our second step those groups are handled separately without the information loss on the overall linear dynamic structure. The second step is devoted to establishing a finite-dimensional dynamical structure for all the transformed functional time series within each group. Furthermore the finite-dimensional structure is represented by that of a vector time series. Modelling and forecasting for the original high-dimensional functional time series are realized via those for the vector time series in all the groups. We investigate the theoretical properties of our proposed methods, and illustrate the finite-sample performance through both extensive simulation and two real datasets.
Submitted 2 June, 2024;
originally announced June 2024.
-
From sparse to dense functional data in high dimensions: Revisiting phase transitions from a non-asymptotic perspective
Authors:
Shaojun Guo,
Dong Li,
Xinghao Qiao,
Yizhu Wang
Abstract:
Nonparametric estimation of the mean and covariance functions is ubiquitous in functional data analysis and local linear smoothing techniques are most frequently used. Zhang and Wang (2016) explored different types of asymptotic properties of the estimation, which reveal interesting phase transition phenomena based on the relative order of the average sampling frequency per subject $T$ to the number of subjects $n$, partitioning the data into three categories: "sparse", "semi-dense", and "ultra-dense". In an increasingly available high-dimensional scenario, where the number of functional variables $p$ is large in relation to $n$, we revisit this open problem from a non-asymptotic perspective by deriving comprehensive concentration inequalities for the local linear smoothers. Besides being of interest by themselves, our non-asymptotic results lead to elementwise maximum rates of $L_2$ convergence and uniform convergence serving as a fundamentally important tool for further convergence analysis when $p$ grows exponentially with $n$ and possibly $T$. With the presence of extra $\log p$ terms to account for the high-dimensional effect, we then investigate the scaled phase transitions and the corresponding elementwise maximum rates from sparse to semi-dense to ultra-dense functional data in high dimensions. We also discuss a couple of applications of our theoretical results. Finally, numerical studies are carried out to confirm the established theoretical properties.
Submitted 25 January, 2025; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Adaptive Functional Thresholding for Sparse Covariance Function Estimation in High Dimensions
Authors:
Qin Fang,
Shaojun Guo,
Xinghao Qiao
Abstract:
Covariance function estimation is a fundamental task in multivariate functional data analysis and arises in many applications. In this paper, we consider estimating sparse covariance functions for high-dimensional functional data, where the number of random functions p is comparable to, or even larger than the sample size n. Aided by the Hilbert--Schmidt norm of functions, we introduce a new class of functional thresholding operators that combine functional versions of thresholding and shrinkage, and propose the adaptive functional thresholding estimator by incorporating the variance effects of individual entries of the sample covariance function into functional thresholding. To handle the practical scenario where curves are partially observed with errors, we also develop a nonparametric smoothing approach to obtain the smoothed adaptive functional thresholding estimator and its binned implementation to accelerate the computation. We investigate the theoretical properties of our proposals when p grows exponentially with n under both fully and partially observed functional scenarios. Finally, we demonstrate that the proposed adaptive functional thresholding estimators significantly outperform the competitors through extensive simulations and the functional connectivity analysis of two neuroimaging datasets.
Submitted 14 July, 2022;
originally announced July 2022.
-
An autocovariance-based learning framework for high-dimensional functional time series
Authors:
Jinyuan Chang,
Cheng Chen,
Xinghao Qiao,
Qiwei Yao
Abstract:
Many scientific and economic applications involve the statistical learning of high-dimensional functional time series, where the number of functional variables is comparable to, or even greater than, the number of serially dependent functional observations. In this paper, we model observed functional time series, which are subject to errors in the sense that each functional datum arises as the sum of two uncorrelated components, one dynamic and one white noise. Motivated from the fact that the autocovariance function of observed functional time series automatically filters out the noise term, we propose a three-step procedure by first performing autocovariance-based dimension reduction, then formulating a novel autocovariance-based block regularized minimum distance estimation framework to produce block sparse estimates, and based on which obtaining the final functional sparse estimates. We investigate theoretical properties of the proposed estimators, and illustrate the proposed estimation procedure via three sparse high-dimensional functional time series models. We demonstrate via both simulated and real datasets that our proposed estimators significantly outperform the competitors.
Submitted 23 August, 2022; v1 submitted 28 August, 2020;
originally announced August 2020.
-
Finite Sample Theory for High-Dimensional Functional/Scalar Time Series with Applications
Authors:
Qin Fang,
Shaojun Guo,
Xinghao Qiao
Abstract:
Statistical analysis of high-dimensional functional time series arises in various applications. Under this scenario, in addition to the intrinsic infinite-dimensionality of functional data, the number of functional variables can grow with the number of serially dependent observations. In this paper, we focus on the theoretical analysis of relevant estimated cross-(auto)covariance terms between two multivariate functional time series or a mixture of multivariate functional and scalar time series beyond the Gaussianity assumption. We introduce a new perspective on dependence by proposing a functional cross-spectral stability measure to characterize the effect of dependence on these estimated cross terms, which are essential in the estimates for additive functional linear regressions. With the proposed functional cross-spectral stability measure, we develop useful concentration inequalities for estimated cross-(auto)covariance matrix functions to accommodate more general sub-Gaussian functional linear processes and, furthermore, establish finite sample theory for relevant estimated terms under a commonly adopted functional principal component analysis framework. Using our derived non-asymptotic results, we investigate the convergence properties of the regularized estimates for two additive functional linear regression applications under sparsity assumptions, namely functional linear lagged regression and partially functional linear regression, in the context of high-dimensional functional/scalar time series.
Submitted 31 July, 2021; v1 submitted 16 April, 2020;
originally announced April 2020.
-
On Consistency and Sparsity for High-Dimensional Functional Time Series with Application to Autoregressions
Authors:
Shaojun Guo,
Xinghao Qiao
Abstract:
Modelling a large collection of functional time series arises in a broad spectrum of real applications. Under such a scenario, not only can the number of functional variables diverge with, or even exceed, the number of temporally dependent functional observations, but each function itself is an infinite-dimensional object, posing a challenging task. In this paper, we propose a three-step procedure to estimate high-dimensional functional time series models. To provide theoretical guarantees for the three-step procedure, we focus on multivariate stationary processes and propose a novel functional stability measure based on their spectral properties. Such a stability measure facilitates the development of some useful concentration bounds on sample (auto)covariance functions, which serve as a fundamental tool for further convergence analysis in high-dimensional settings. As functional principal component analysis (FPCA) is one of the key dimension reduction techniques in the first step, we also investigate the non-asymptotic properties of the relevant estimated terms under an FPCA framework. To illustrate with an important application, we consider vector functional autoregressive models and develop a regularization approach to estimate autoregressive coefficient functions under the sparsity constraint. Using our derived non-asymptotic results, we investigate convergence properties of the regularized estimate under high-dimensional scaling. Finally, the finite-sample performance of the proposed method is examined through both simulations and a public financial dataset.
Submitted 30 August, 2021; v1 submitted 23 March, 2020;
originally announced March 2020.
-
Rates of Convergence for Large-scale Nearest Neighbor Classification
Authors:
Xingye Qiao,
Jiexin Duan,
Guang Cheng
Abstract:
Nearest neighbor is a popular class of classification methods with many desirable properties. For a large data set which cannot be loaded into the memory of a single machine due to computation, communication, privacy, or ownership limitations, we consider the divide and conquer scheme: the entire data set is divided into small subsamples, on which nearest neighbor predictions are made, and then a final decision is reached by aggregating the predictions on subsamples by majority voting. We name this method the big Nearest Neighbor (bigNN) classifier, and provide its rates of convergence under minimal assumptions, in terms of both the excess risk and the classification instability, which are proven to be the same rates as the oracle nearest neighbor classifier and cannot be improved. To significantly reduce the prediction time that is required for achieving the optimal rate, we also consider the pre-training acceleration technique applied to the bigNN method, with proven convergence rate. We find that in the distributed setting, the optimal choice of the neighbor $k$ should scale with both the total sample size and the number of partitions, and there is a theoretical upper limit for the latter. Numerical studies have verified the theoretical findings.
Submitted 30 October, 2019; v1 submitted 3 September, 2019;
originally announced September 2019.
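The divide-and-conquer scheme described in the abstract above (split the data, run nearest neighbor on each subsample, aggregate by majority vote) can be sketched as below. This is a minimal illustration for binary labels, not the authors' implementation: the function names, the random partition, the two-Gaussian toy data, and the choices `n_parts=5` and `k=3` are assumptions, and the pre-training acceleration mentioned in the abstract is not included.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k):
    """Plain k-nearest-neighbor vote for binary labels in {0, 1}."""
    # Squared Euclidean distances, shape (n_test, n_train).
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)

def big_nn_predict(train_x, train_y, test_x, n_parts, k, seed=0):
    """Divide the training set, predict on each part, then take a majority vote."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(train_x)), n_parts)
    local = np.stack([
        knn_predict(train_x[idx], train_y[idx], test_x, k) for idx in parts
    ])
    # Odd n_parts and odd k avoid ties in both voting stages.
    return (local.mean(axis=0) > 0.5).astype(int)

def make_data(n, rng):
    """Toy data: two unit-variance Gaussian classes centred at (-1.5, -1.5) and (1.5, 1.5)."""
    y = rng.integers(0, 2, size=n)
    x = rng.standard_normal((n, 2)) + 3.0 * y[:, None] - 1.5
    return x, y

rng = np.random.default_rng(1)
train_x, train_y = make_data(600, rng)
test_x, test_y = make_data(200, rng)

pred = big_nn_predict(train_x, train_y, test_x, n_parts=5, k=3)
accuracy = np.mean(pred == test_y)
print(f"bigNN-style accuracy on toy data: {accuracy:.3f}")
```

In a genuinely distributed setting each subsample would live on a separate machine and only the local predictions would be communicated; the loop here merely simulates that on one machine.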
-
A General Theory for Large-Scale Curve Time Series via Functional Stability Measure
Authors:
Shaojun Guo,
Xinghao Qiao
Abstract:
Modelling a large bundle of curves arises in a broad spectrum of real applications. However, existing literature relies primarily on the critical assumption of independent curve observations. In this paper, we provide a general theory for large-scale Gaussian curve time series, where the temporal and cross-sectional dependence across multiple curve observations exist and the number of functional variables, $p,$ may be large relative to the number of observations, $n.$ We propose a novel functional stability measure for multivariate stationary processes based on their spectral properties and use it to establish some useful concentration bounds on the sample covariance matrix function. These concentration bounds serve as a fundamental tool for further theoretical analysis, in particular, for deriving nonasymptotic upper bounds on the errors of the regularized estimates in high dimensional settings. As {\it functional principal component analysis} (FPCA) is one of the key techniques to handle functional data, we also investigate the concentration properties of the relevant estimated terms under an FPCA framework. To illustrate with an important application, we consider {\it vector functional autoregressive models} and develop a regularization approach to estimate autoregressive coefficient functions under the sparsity constraint. Using our derived nonasymptotic results, we investigate the theoretical properties of the regularized estimate in a "large $p,$ small $n$" regime. The finite sample performance of the proposed method is examined through simulation studies.
Submitted 20 December, 2018; v1 submitted 18 December, 2018;
originally announced December 2018.
-
Distributed Nearest Neighbor Classification
Authors:
Jiexin Duan,
Xingye Qiao,
Guang Cheng
Abstract:
Nearest neighbor is a popular nonparametric method for classification and regression with many appealing properties. In the big data era, the sheer volume and spatial/temporal disparity of big data may prohibit centrally processing and storing the data. This has imposed a considerable hurdle for nearest neighbor predictions since the entire training data must be memorized. One effective way to overcome this issue is the distributed learning framework. Through majority voting, the distributed nearest neighbor classifier achieves the same rate of convergence as its oracle version in terms of both the regret and instability, up to a multiplicative constant that depends solely on the data dimension. The multiplicative difference can be eliminated by replacing majority voting with the weighted voting scheme. In addition, we provide sharp theoretical upper bounds on the number of subsamples in order for the distributed nearest neighbor classifier to reach the optimal convergence rate. It is interesting to note that the weighted voting scheme allows a larger number of subsamples than the majority voting one. Our findings are supported by numerical studies using both simulated and real data sets.
Submitted 12 December, 2018;
originally announced December 2018.
-
Homogeneity Pursuit in Single Index Models based Panel Data Analysis
Authors:
Heng Lian,
Xinghao Qiao,
Wenyang Zhang
Abstract:
Panel data analysis is an important topic in statistics and econometrics. Traditionally, in panel data analysis, all individuals are assumed to share the same unknown parameters, e.g. the same coefficients of covariates when linear models are used, and the differences between the individuals are accounted for by cluster effects. This kind of modelling only makes sense if our main interest is in the global trend, because it cannot tell us anything about the individual attributes, which are sometimes very important. In this paper, we propose a modelling approach based on single index models embedded with homogeneity for panel data analysis, which builds the individual attributes into the model and is parsimonious at the same time. We develop a data-driven approach to identify the structure of homogeneity, and estimate the unknown parameters and functions based on the identified structure. Asymptotic properties of the resulting estimators are established. Intensive simulation studies conducted in this paper also show that the resulting estimators work very well when the sample size is finite. Finally, the proposed modelling is applied to a public financial dataset and a UK climate dataset, and the results reveal some interesting findings.
Submitted 7 June, 2017; v1 submitted 2 June, 2017;
originally announced June 2017.
-
On Reject and Refine Options in Multicategory Classification
Authors:
Chong Zhang,
Wenbo Wang,
Xingye Qiao
Abstract:
In many real applications of statistical learning, a decision made from misclassification can be too costly to afford; in this case, a reject option, which defers the decision until further investigation is conducted, is often preferred. In recent years, there has been much development for binary classification with a reject option. Yet, little progress has been made for the multicategory case. In this article, we propose margin-based multicategory classification methods with a reject option. In addition, and more importantly, we introduce a new and unique refine option for the multicategory problem, where the class of an observation is predicted to be from a set of class labels, whose cardinality is not necessarily one. The main advantage of both options lies in their capacity of identifying error-prone observations. Moreover, the refine option can provide more constructive information for classification by effectively ruling out implausible classes. Efficient implementations have been developed for the proposed methods. On the theoretical side, we offer a novel statistical learning theory and show a fast convergence rate of the excess $\ell$-risk of our methods with emphasis on diverging dimensionality and number of classes. The results can be further improved under a low noise assumption. A set of comprehensive simulation and real data studies has shown the usefulness of the new learning tools compared to regular multicategory classifiers. Detailed proofs of theorems and extended numerical results are included in the supplemental materials available online.
Submitted 9 January, 2017;
originally announced January 2017.