-
On Robust Empirical Likelihood for Nonparametric Regression with Application to Regression Discontinuity Designs
Authors:
Qin Fang,
Shaojun Guo,
Yang Hong,
Xinghao Qiao
Abstract:
Empirical likelihood serves as a powerful tool for constructing confidence intervals in nonparametric regression and regression discontinuity designs (RDD). The original empirical likelihood framework can be naturally extended to these settings using local linear smoothers, with Wilks' theorem holding only when an undersmoothed bandwidth is selected. However, the generalization of bias-corrected v…
▽ More
Empirical likelihood serves as a powerful tool for constructing confidence intervals in nonparametric regression and regression discontinuity designs (RDD). The original empirical likelihood framework can be naturally extended to these settings using local linear smoothers, with Wilks' theorem holding only when an undersmoothed bandwidth is selected. However, the generalization of bias-corrected versions of empirical likelihood under more realistic conditions is non-trivial and has remained an open challenge in the literature. This paper provides a satisfactory solution by proposing a novel approach, referred to as robust empirical likelihood, designed for nonparametric regression and RDD. The core idea is to construct robust weights which simultaneously achieve bias correction and account for the additional variability introduced by the estimated bias, thereby enabling valid confidence interval construction without extra estimation steps involved. We demonstrate that the Wilks' phenomenon still holds under weaker conditions in nonparametric regression, sharp and fuzzy RDD settings. Extensive simulation studies confirm the effectiveness of our proposed approach, showing superior performance over existing methods in terms of coverage probabilities and interval lengths. Moreover, the proposed procedure exhibits robustness to bandwidth selection, making it a flexible and reliable tool for empirical analyses. The practical usefulness is further illustrated through applications to two real datasets.
△ Less
Submitted 2 April, 2025;
originally announced April 2025.
-
Two-Dimensional Deep ReLU CNN Approximation for Korobov Functions: A Constructive Approach
Authors:
Qin Fang,
Lei Shi,
Min Xu,
Ding-Xuan Zhou
Abstract:
This paper investigates approximation capabilities of two-dimensional (2D) deep convolutional neural networks (CNNs), with Korobov functions serving as a benchmark. We focus on 2D CNNs, comprising multi-channel convolutional layers with zero-padding and ReLU activations, followed by a fully connected layer. We propose a fully constructive approach for building 2D CNNs to approximate Korobov functi…
▽ More
This paper investigates approximation capabilities of two-dimensional (2D) deep convolutional neural networks (CNNs), with Korobov functions serving as a benchmark. We focus on 2D CNNs, comprising multi-channel convolutional layers with zero-padding and ReLU activations, followed by a fully connected layer. We propose a fully constructive approach for building 2D CNNs to approximate Korobov functions and provide rigorous analysis of the complexity of the constructed networks. Our results demonstrate that 2D CNNs achieve near-optimal approximation rates under the continuous weight selection model, significantly alleviating the curse of dimensionality. This work provides a solid theoretical foundation for 2D CNNs and illustrates their potential for broader applications in function approximation.
△ Less
Submitted 10 March, 2025;
originally announced March 2025.
-
Large-scale Multiple Testing of Cross-covariance Functions with Applications to Functional Network Models
Authors:
Qin Fang,
Qing Jiang,
Xinghao Qiao
Abstract:
The estimation of functional networks through functional covariance and graphical models have recently attracted increasing attention in settings with high dimensional functional data, where the number of functional variables p is comparable to, and maybe larger than, the number of subjects. However, the existing methods all depend on regularization techniques, which make it unclear how the involv…
▽ More
The estimation of functional networks through functional covariance and graphical models have recently attracted increasing attention in settings with high dimensional functional data, where the number of functional variables p is comparable to, and maybe larger than, the number of subjects. However, the existing methods all depend on regularization techniques, which make it unclear how the involved tuning parameters are related to the number of false edges. In this paper, we first reframe the functional covariance model estimation as a tuning-free problem of simultaneously testing p(p-1)/2 hypotheses for cross-covariance functions, and introduce a novel multiple testing procedure. We then explore the multiple testing procedure under a general error-contamination framework and establish that our procedure can control false discoveries asymptotically. Additionally, we demonstrate that our proposed methods for two concrete examples: the functional covariance model for discretely observed functional data and, importantly, the more challenging functional graphical model, can be seamlessly integrated into the general error-contamination framework, and, with verifiable conditions, achieve theoretical guarantees on effective false discovery control. Finally, we showcase the superiority of our proposals through extensive simulations and brain connectivity analysis of two neuroimaging datasets.
△ Less
Submitted 4 September, 2024; v1 submitted 28 July, 2024;
originally announced July 2024.
-
On the modelling and prediction of high-dimensional functional time series
Authors:
Jinyuan Chang,
Qin Fang,
Xinghao Qiao,
Qiwei Yao
Abstract:
We propose a two-step procedure to model and predict high-dimensional functional time series, where the number of function-valued time series $p$ is large in relation to the length of time series $n$. Our first step performs an eigenanalysis of a positive definite matrix, which leads to a one-to-one linear transformation for the original high-dimensional functional time series, and the transformed…
▽ More
We propose a two-step procedure to model and predict high-dimensional functional time series, where the number of function-valued time series $p$ is large in relation to the length of time series $n$. Our first step performs an eigenanalysis of a positive definite matrix, which leads to a one-to-one linear transformation for the original high-dimensional functional time series, and the transformed curve series can be segmented into several groups such that any two subseries from any two different groups are uncorrelated both contemporaneously and serially. Consequently in our second step those groups are handled separately without the information loss on the overall linear dynamic structure. The second step is devoted to establishing a finite-dimensional dynamical structure for all the transformed functional time series within each group. Furthermore the finite-dimensional structure is represented by that of a vector time series. Modelling and forecasting for the original high-dimensional functional time series are realized via those for the vector time series in all the groups. We investigate the theoretical properties of our proposed methods, and illustrate the finite-sample performance through both extensive simulation and two real datasets.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Autoregressive Networks with Dependent Edges
Authors:
Jinyuan Chang,
Qin Fang,
Eric D. Kolaczyk,
Peter W. MacDonald,
Qiwei Yao
Abstract:
We propose an autoregressive framework for modelling dynamic networks with dependent edges. It encompasses the models which accommodate, for example, transitivity, density-dependent and other stylized features often observed in real network data. By assuming the edges of network at each time are independent conditionally on their lagged values, the models, which exhibit a close connection with tem…
▽ More
We propose an autoregressive framework for modelling dynamic networks with dependent edges. It encompasses the models which accommodate, for example, transitivity, density-dependent and other stylized features often observed in real network data. By assuming the edges of network at each time are independent conditionally on their lagged values, the models, which exhibit a close connection with temporal ERGMs, facilitate both simulation and the maximum likelihood estimation in the straightforward manner. Due to the possible large number of parameters in the models, the initial MLEs may suffer from slow convergence rates. An improved estimator for each component parameter is proposed based on an iteration based on the projection which mitigates the impact of the other parameters (Chang et al., 2021, 2023). Based on a martingale difference structure, the asymptotic distribution of the improved estimator is derived without the stationarity assumption. The limiting distribution is not normal in general, and it reduces to normal when the underlying process satisfies some mixing conditions. Illustration with a transitivity model was carried out in both simulation and a real network data set.
△ Less
Submitted 24 April, 2024;
originally announced April 2024.
-
Dynamic Reconfiguration of Brain Functional Network in Stroke
Authors:
Kaichao Wu,
Beth Jelfs,
Katrina Neville,
Wenzhen He,
Qiang Fang
Abstract:
The brain continually reorganizes its functional network to adapt to post-stroke functional impairments. Previous studies using static modularity analysis have presented global-level behavior patterns of this network reorganization. However, it is far from understood how the brain reconfigures its functional network dynamically following a stroke. This study collected resting-state functional MRI…
▽ More
The brain continually reorganizes its functional network to adapt to post-stroke functional impairments. Previous studies using static modularity analysis have presented global-level behavior patterns of this network reorganization. However, it is far from understood how the brain reconfigures its functional network dynamically following a stroke. This study collected resting-state functional MRI data from 15 stroke patients, with mild (n = 6) and severe (n = 9) two subgroups based on their clinical symptoms. Additionally, 15 age-matched healthy subjects were considered as controls. By applying a multilayer network method, a dynamic modular structure was recognized based on a time-resolved function network. Then dynamic network measurements (recruitment, integration, and flexibility) were calculated to characterize the dynamic reconfiguration of post-stroke brain functional networks, hence, to reveal the neural functional rebuilding process. It was found from this investigation that severe patients tended to have reduced recruitment and increased between-network integration, while mild patients exhibited low network flexibility and less network integration. It is also noted that this severity-dependent alteration in network interaction was not able to be revealed by previous studies using static methods. Clinically, the obtained knowledge of the diverse patterns of dynamic adjustment in brain functional networks observed from the brain signal could help understand the underlying mechanism of the motor, speech, and cognitive functional impairments caused by stroke attacks. The proposed method not only could be used to evaluate patients' current brain status but also has the potential to provide insights into prognosis analysis and prediction.
△ Less
Submitted 22 March, 2024; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Adaptive Functional Thresholding for Sparse Covariance Function Estimation in High Dimensions
Authors:
Qin Fang,
Shaojun Guo,
Xinghao Qiao
Abstract:
Covariance function estimation is a fundamental task in multivariate functional data analysis and arises in many applications. In this paper, we consider estimating sparse covariance functions for high-dimensional functional data, where the number of random functions p is comparable to, or even larger than the sample size n. Aided by the Hilbert--Schmidt norm of functions, we introduce a new class…
▽ More
Covariance function estimation is a fundamental task in multivariate functional data analysis and arises in many applications. In this paper, we consider estimating sparse covariance functions for high-dimensional functional data, where the number of random functions p is comparable to, or even larger than the sample size n. Aided by the Hilbert--Schmidt norm of functions, we introduce a new class of functional thresholding operators that combine functional versions of thresholding and shrinkage, and propose the adaptive functional thresholding estimator by incorporating the variance effects of individual entries of the sample covariance function into functional thresholding. To handle the practical scenario where curves are partially observed with errors, we also develop a nonparametric smoothing approach to obtain the smoothed adaptive functional thresholding estimator and its binned implementation to accelerate the computation. We investigate the theoretical properties of our proposals when p grows exponentially with n under both fully and partially observed functional scenarios. Finally, we demonstrate that the proposed adaptive functional thresholding estimators significantly outperform the competitors through extensive simulations and the functional connectivity analysis of two neuroimaging datasets.
△ Less
Submitted 14 July, 2022;
originally announced July 2022.
-
Stacked Autoencoder Based Multi-Omics Data Integration for Cancer Survival Prediction
Authors:
Xing Wu,
Qiulian Fang
Abstract:
Cancer survival prediction is important for developing personalized treatments and inducing disease-causing mechanisms. Multi-omics data integration is attracting widespread interest in cancer research for providing information for understanding cancer progression at multiple genetic levels. Many works, however, are limited because of the high dimensionality and heterogeneity of multi-omics data.…
▽ More
Cancer survival prediction is important for developing personalized treatments and inducing disease-causing mechanisms. Multi-omics data integration is attracting widespread interest in cancer research for providing information for understanding cancer progression at multiple genetic levels. Many works, however, are limited because of the high dimensionality and heterogeneity of multi-omics data. In this paper, we propose a novel method to integrate multi-omics data for cancer survival prediction, called Stacked AutoEncoder-based Survival Prediction Neural Network (SAEsurv-net). In the cancer survival prediction for TCGA cases, SAEsurv-net addresses the curse of dimensionality with a two-stage dimensionality reduction strategy and handles multi-omics heterogeneity with a stacked autoencoder model. The two-stage dimensionality reduction strategy achieves a balance between computation complexity and information exploiting. The stacked autoencoder model removes most heterogeneities such as data's type and size in the first group of autoencoders, and integrates multiple omics data in the second autoencoder. The experiments show that SAEsurv-net outperforms models based on a single type of data as well as other state-of-the-art methods.
△ Less
Submitted 8 July, 2022;
originally announced July 2022.
-
Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions
Authors:
Qixiang Fang,
Dong Nguyen,
Daniel L Oberski
Abstract:
Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to supposedly meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are valid representations of constructs relevant for social sci…
▽ More
Text embedding models from Natural Language Processing can map text data (e.g. words, sentences, documents) to supposedly meaningful numerical representations (a.k.a. text embeddings). While such models are increasingly applied in social science research, one important issue is often not addressed: the extent to which these embeddings are valid representations of constructs relevant for social science research. We therefore propose the use of the classic construct validity framework to evaluate the validity of text embeddings. We show how this framework can be adapted to the opaque and high-dimensional nature of text embeddings, with application to survey questions. We include several popular text embedding methods (e.g. fastText, GloVe, BERT, Sentence-BERT, Universal Sentence Encoder) in our construct validity analyses. We find evidence of convergent and discriminant validity in some cases. We also show that embeddings can be used to predict respondent's answers to completely new survey questions. Furthermore, BERT-based embedding techniques and the Universal Sentence Encoder provide more valid representations of survey questions than do others. Our results thus highlight the necessity to examine the construct validity of text embeddings before deploying them in social science research.
△ Less
Submitted 18 February, 2022;
originally announced February 2022.
-
Efficient Graph Deep Learning in TensorFlow with tf_geometric
Authors:
Jun Hu,
Shengsheng Qian,
Quan Fang,
Youze Wang,
Quan Zhao,
Huaiwen Zhang,
Changsheng Xu
Abstract:
We introduce tf_geometric, an efficient and friendly library for graph deep learning, which is compatible with both TensorFlow 1.x and 2.x. tf_geometric provides kernel libraries for building Graph Neural Networks (GNNs) as well as implementations of popular GNNs. The kernel libraries consist of infrastructures for building efficient GNNs, including graph data structures, graph map-reduce framewor…
▽ More
We introduce tf_geometric, an efficient and friendly library for graph deep learning, which is compatible with both TensorFlow 1.x and 2.x. tf_geometric provides kernel libraries for building Graph Neural Networks (GNNs) as well as implementations of popular GNNs. The kernel libraries consist of infrastructures for building efficient GNNs, including graph data structures, graph map-reduce framework, graph mini-batch strategy, etc. These infrastructures enable tf_geometric to support single-graph computation, multi-graph computation, graph mini-batch, distributed training, etc.; therefore, tf_geometric can be used for a variety of graph deep learning tasks, such as transductive node classification, inductive node classification, link prediction, and graph classification. Based on the kernel libraries, tf_geometric implements a variety of popular GNN models for different tasks. To facilitate the implementation of GNNs, tf_geometric also provides some other libraries for dataset management, graph sampling, etc. Different from existing popular GNN libraries, tf_geometric provides not only Object-Oriented Programming (OOP) APIs, but also Functional APIs, which enable tf_geometric to handle advanced graph deep learning tasks such as graph meta-learning. The APIs of tf_geometric are friendly, and they are suitable for both beginners and experts. In this paper, we first present an overview of tf_geometric's framework. Then, we conduct experiments on some benchmark datasets and report the performance of several popular GNN models implemented by tf_geometric.
△ Less
Submitted 27 January, 2021;
originally announced January 2021.
-
The Role of Time, Weather and Google Trends in Understanding and Predicting Web Survey Response
Authors:
Qixiang Fang,
Joep Burger,
Ralph Meijers,
Kees van Berkel
Abstract:
In the literature about web survey methodology, significant efforts have been made to understand the role of time-invariant factors (e.g. gender, education and marital status) in (non-)response mechanisms. Time-invariant factors alone, however, cannot account for most variations in (non-)responses, especially fluctuations of response rates over time. This observation inspires us to investigate the…
▽ More
In the literature about web survey methodology, significant efforts have been made to understand the role of time-invariant factors (e.g. gender, education and marital status) in (non-)response mechanisms. Time-invariant factors alone, however, cannot account for most variations in (non-)responses, especially fluctuations of response rates over time. This observation inspires us to investigate the counterpart of time-invariant factors, namely time-varying factors and the potential role they play in web survey (non-)response. Specifically, we study the effects of time, weather and societal trends (derived from Google Trends data) on the daily (non-)response patterns of the 2016 and 2017 Dutch Health Surveys. Using discrete-time survival analysis, we find, among others, that weekends, holidays, pleasant weather, disease outbreaks and terrorism salience are associated with fewer responses. Furthermore, we show that using these variables alone achieves satisfactory prediction accuracy of both daily and cumulative response rates when the trained model is applied to future unseen data. This approach has the further benefit of requiring only non-personal contextual information and thus involving no privacy issues. We discuss the implications of the study for survey research and data collection.
△ Less
Submitted 21 January, 2022; v1 submitted 30 October, 2020;
originally announced November 2020.
-
Deep Fundamental Matrix Estimation without Correspondences
Authors:
Omid Poursaeed,
Guandao Yang,
Aditya Prakash,
Qiuren Fang,
Hanqing Jiang,
Bharath Hariharan,
Serge Belongie
Abstract:
Estimating fundamental matrices is a classic problem in computer vision. Traditional methods rely heavily on the correctness of estimated key-point correspondences, which can be noisy and unreliable. As a result, it is difficult for these methods to handle image pairs with large occlusion or significantly different camera poses. In this paper, we propose novel neural network architectures to estim…
▽ More
Estimating fundamental matrices is a classic problem in computer vision. Traditional methods rely heavily on the correctness of estimated key-point correspondences, which can be noisy and unreliable. As a result, it is difficult for these methods to handle image pairs with large occlusion or significantly different camera poses. In this paper, we propose novel neural network architectures to estimate fundamental matrices in an end-to-end manner without relying on point correspondences. New modules and layers are introduced in order to preserve mathematical properties of the fundamental matrix as a homogeneous rank-2 matrix with seven degrees of freedom. We analyze performance of the proposed models using various metrics on the KITTI dataset, and show that they achieve competitive performance with traditional methods without the need for extracting correspondences.
△ Less
Submitted 2 October, 2018;
originally announced October 2018.
-
Bayesian Benchmark Dose Analysis
Authors:
Qijun Fang,
Walter W. Piegorsch,
Katherine Y. Barnes
Abstract:
An important objective in environmental risk assessment is estimation of minimum exposure levels, called Benchmark Doses (BMDs) that induce a pre-specified Benchmark Response (BMR) in a target population. Established inferential approaches for BMD analysis typically involve one-sided, frequentist confidence limits, leading in practice to what are called Benchmark Dose Lower Limits (BMDLs). Appeal…
▽ More
An important objective in environmental risk assessment is estimation of minimum exposure levels, called Benchmark Doses (BMDs) that induce a pre-specified Benchmark Response (BMR) in a target population. Established inferential approaches for BMD analysis typically involve one-sided, frequentist confidence limits, leading in practice to what are called Benchmark Dose Lower Limits (BMDLs). Appeal to Bayesian modeling and credible limits for building BMDLs is far less developed, however. Indeed, for the few existing forms of Bayesian BMDs, informative prior information is seldom incorporated. We develop reparameterized quantal-response models that explicitly describe the BMD as a target parameter. Our goal is to obtain an improved estimation and calculation archetype for the BMD and for the BMDL, by employing quantifiable prior belief to represent parameter uncertainty in the statistical model. Implementation is facilitated via a Monte Carlo-based adaptive Metropolis (AM) algorithm to approximate the posterior distribution. An example from environmental carcinogenicity testing illustrates the calculations.
△ Less
Submitted 10 June, 2014;
originally announced June 2014.
-
Bayesian Model-Averaged Benchmark Dose Analysis Via Reparameterized Quantal-Response Models
Authors:
Qijun Fang,
Walter W. Piegorsch,
Susan J. Simmons,
Xiaosong Li,
Cuixian Chen,
Yishi Wang
Abstract:
An important objective in biomedical risk assessment is estimation of minimum exposure levels that induce a pre-specified adverse response in a target population. The exposure/dose points in such settings are known as Benchmark Doses (BMDs). Recently, parametric Bayesian estimation for finding BMDs has become popular. A large variety of candidate dose-response models is available for applying thes…
▽ More
An important objective in biomedical risk assessment is estimation of minimum exposure levels that induce a pre-specified adverse response in a target population. The exposure/dose points in such settings are known as Benchmark Doses (BMDs). Recently, parametric Bayesian estimation for finding BMDs has become popular. A large variety of candidate dose-response models is available for applying these methods, however, leading to questions of model adequacy and uncertainty. Here we enhance the Bayesian estimation technique for BMD analysis by applying Bayesian model averaging to produce point estimates and (lower) credible bounds. We include reparameterizations of traditional dose-response models that allow for more-focused use of elicited prior information when building the Bayesian hierarchy. Performance of the method is evaluated via a short simulation study. An example from carcinogenicity testing illustrates the calculations.
△ Less
Submitted 17 February, 2014;
originally announced February 2014.