-
Adaptive Block-Based Change-Point Detection for Sparse Spatially Clustered Data with Applications in Remote Sensing Imaging
Authors:
Alan Moore,
Lynna Chu,
Zhengyuan Zhu
Abstract:
We present a non-parametric change-point detection approach to detect potentially sparse changes in a time series of high-dimensional observations or non-Euclidean data objects. We target a change in distribution that occurs in a small, unknown subset of dimensions, where these dimensions may be correlated. Our work is motivated by a remote sensing application, where changes occur in small, spatially clustered regions over time. An adaptive block-based change-point detection framework is proposed that accounts for spatial dependencies across dimensions and leverages these dependencies to boost detection power and improve estimation accuracy. Through simulation studies, we demonstrate that our approach has superior performance in detecting sparse changes in datasets with spatial or local group structures. An application of the proposed method to detect activity, such as new construction, in remote sensing imagery of the Natanz Nuclear facility in Iran is presented to demonstrate the method's efficacy.
Submitted 27 May, 2025;
originally announced May 2025.
-
Controlling the false discovery rate in high-dimensional linear models using model-X knockoffs and $p$-values
Authors:
Jinyuan Chang,
Chenlong Li,
Cheng Yong Tang,
Zhengtian Zhu
Abstract:
In this paper, we propose novel multiple testing methods for controlling the false discovery rate (FDR) in the context of high-dimensional linear models. Our development innovatively integrates model-X knockoff techniques with debiased penalized regression estimators. The proposed approach addresses two fundamental challenges in high-dimensional statistical inference: (i) constructing valid test statistics and corresponding $p$-values in solving problems with a diverging number of model parameters, and (ii) ensuring FDR control under complex and unknown dependence structures among test statistics. A central contribution of our methodology lies in the rigorous construction and theoretical analysis of two paired sets of test statistics. Based on these test statistics, our methodology adopts two $p$-value-based multiple testing algorithms. The first applies the conventional Benjamini-Hochberg procedure, justified by the asymptotic mutual independence and normality of one set of the test statistics. The second leverages the paired structure of both sets of test statistics to improve detection power while maintaining rigorous FDR control. We provide comprehensive theoretical analysis, establishing the validity of the debiasing framework and ensuring that the proposed methods achieve proper FDR control. Extensive simulation studies demonstrate that our procedures outperform existing approaches - particularly those relying on empirical evaluations of false discovery proportions - in terms of both power and empirical control of the FDR. Notably, our methodology yields substantial improvements in settings characterized by weaker signals, smaller sample sizes, and lower pre-specified FDR levels.
Submitted 21 May, 2025;
originally announced May 2025.
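The abstract of this entry applies the conventional Benjamini-Hochberg procedure to one set of $p$-values. As a reading aid, here is a minimal sketch of the textbook step-up rule only. It is not the authors' implementation, and it does not construct their debiased test statistics or the paired procedure. The function name `benjamini_hochberg` and the example $p$-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    Textbook rule: sort the m p-values, find the largest rank k such that
    p_(k) <= k * q / m, and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank  # keep the largest rank that passes (step-up behaviour)
    return sorted(order[:k])


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, q=0.05))
```

Because the rule is step-up, a hypothesis can be rejected even when its own $p$-value exceeds its rank-specific cutoff, provided a larger $p$-value further down the sorted list passes its cutoff.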
-
The Estimation of Continual Causal Effect for Dataset Shifting Streams
Authors:
Baining Chen,
Yiming Zhang,
Yuqiao Han,
Ruyue Zhang,
Ruihuan Du,
Zhishuo Zhou,
Zhengdan Zhu,
Xun Liu,
Jiecheng Guo
Abstract:
Causal effect estimation has been widely used in marketing optimization. The framework of an uplift model followed by a constrained optimization algorithm is popular in practice. To enhance performance in the online environment, the framework needs to be improved to address the complexities caused by temporal dataset shift. This paper focuses on capturing the dataset shift from user behavior and domain distribution changing over time. We propose an Incremental Causal Effect with Proxy Knowledge Distillation (ICE-PKD) framework to tackle this challenge. The ICE-PKD framework includes two components: (i) a multi-treatment uplift network that eliminates confounding bias using counterfactual regression; (ii) an incremental training strategy that adapts to the temporal dataset shift by updating with the latest data and protects generalization via replay-based knowledge distillation. We also revisit the uplift modeling metrics and introduce a novel metric for more precise online evaluation in multiple treatment scenarios. Extensive experiments on both simulated and online datasets show that the proposed framework achieves better performance. The ICE-PKD framework has been deployed in the marketing system of Huaxiaozhu, a ride-hailing platform in China.
Submitted 29 April, 2025;
originally announced April 2025.
-
Tensor dynamic conditional correlation model: A new way to pursuit "Holy Grail of investing"
Authors:
Cheng Yu,
Zhoufan Zhu,
Ke Zhu
Abstract:
Style investing creates asset classes (or the so-called "styles") with low correlations, aligning well with the principle of "Holy Grail of investing" in terms of portfolio selection. The returns of styles naturally form a tensor-valued time series, which requires new tools for studying the dynamics of the conditional correlation matrix to facilitate the aforementioned principle. Towards this goal, we introduce a new tensor dynamic conditional correlation (TDCC) model, which is based on two novel treatments: trace-normalization and dimension-normalization. These two normalizations adapt to the tensor nature of the data, and they are necessary except when the tensor data reduce to vector data. Moreover, we provide an easy-to-implement estimation procedure for the TDCC model, and examine its finite sample performance by simulations. Finally, we assess the usefulness of the TDCC model in international portfolio selection across ten global markets and in large portfolio selection for 1800 stocks from the Chinese stock market.
Submitted 19 February, 2025;
originally announced February 2025.
-
False Discovery Rate Control via Frequentist-assisted Horseshoe
Authors:
Qiaoyu Liang,
Zihan Zhu,
Ziang Fu,
Michael Evans
Abstract:
The horseshoe prior, a widely used handy alternative to the spike-and-slab prior, has proven to be an exceptional default global-local shrinkage prior in Bayesian inference and machine learning. However, designing tests with frequentist false discovery rate (FDR) control using the horseshoe prior or the general class of global-local shrinkage priors remains an open problem. In this paper, we propose a frequentist-assisted horseshoe procedure that not only resolves this long-standing FDR control issue for the high dimensional normal means testing problem but also exhibits satisfactory finite-sample FDR control under any desired nominal level for both large-scale multiple independent and correlated tests. We carry out the frequentist-assisted horseshoe procedure in an easy and intuitive way by using the minimax estimator of the global parameter of the horseshoe prior while maintaining the remaining full Bayes vanilla horseshoe structure. The results of both intensive simulations under different sparsity levels, and real-world data demonstrate that the frequentist-assisted horseshoe procedure consistently achieves robust finite-sample FDR control. Existing frequentist or Bayesian FDR control procedures can lose finite-sample FDR control in a variety of common sparse cases. Based on the intimate relationship between the minimax estimation and the level of FDR control discovered in this work, we point out potential generalizations to achieve FDR control for both more complicated models and the general global-local shrinkage prior family.
Submitted 17 February, 2025; v1 submitted 8 February, 2025;
originally announced February 2025.
-
EM-based Fast Uncertainty Quantification for Bayesian Multi-setup Operational Modal Analysis
Authors:
Wei Zhu,
Binbin Li,
Zuo Zhu
Abstract:
The current Bayesian FFT algorithm relies on direct differentiation to obtain the posterior covariance matrix (PCM), which is time-consuming, memory-intensive, and hard to code, especially for the multi-setup operational modal analysis (OMA). Aiming at accelerating the uncertainty quantification in multi-setup OMA, an expectation-maximization (EM)-based algorithm is proposed by reformulating the Hessian matrix of the negative log-likelihood function (NLLF) as a sum of simplified components corresponding to the complete-data NLLF. Matrix calculus is employed to derive these components in a compact manner, resulting in expressions similar to those in the single-setup case. This similarity allows for the reuse of existing Bayesian single-setup OMA codes, simplifying implementation. The singularity caused by mode shape norm constraints is addressed through null space projection, eliminating potential numerical errors from the conventional pseudoinverse operation. A sparse assembly strategy is further adopted, avoiding unnecessary calculations and storage of predominant zero elements in the Hessian matrix. The proposed method is then validated through a comprehensive parametric study and applied to a multi-setup OMA of a high-rise building. Results demonstrate that the proposed method efficiently calculates the PCM within seconds, even for cases with hundreds of parameters. This represents an efficiency improvement of at least one order of magnitude over the state-of-the-art method. Such performance paves the way for a real-time modal identification of large-scale structures, including those with closely-spaced modes.
Submitted 1 December, 2024;
originally announced December 2024.
-
Transfer Learning for High-dimensional Quantile Regression with Distribution Shift
Authors:
Ruiqi Bai,
Yijiao Zhang,
Hanbo Yang,
Zhongyi Zhu
Abstract:
Information from related source studies can often enhance the findings of a target study. However, the distribution shift between target and source studies can severely impact the efficiency of knowledge transfer. In the high-dimensional regression setting, existing transfer approaches mainly focus on the parameter shift. In this paper, we focus on the high-dimensional quantile regression with knowledge transfer under three types of distribution shift: parameter shift, covariate shift, and residual shift. We propose a novel transferable set and a new transfer framework to address the above three discrepancies. Non-asymptotic estimation error bounds and source detection consistency are established to validate the availability and superiority of our method in the presence of distribution shift. Additionally, an orthogonal debiased approach is proposed for statistical inference with knowledge transfer, leading to sharper asymptotic results. Extensive simulation results as well as real data applications further demonstrate the effectiveness of our proposed procedure.
Submitted 29 November, 2024;
originally announced November 2024.
-
Revisiting and Benchmarking Graph Autoencoders: A Contrastive Learning Perspective
Authors:
Jintang Li,
Ruofan Wu,
Yuchang Zhu,
Huizhe Zhang,
Xinzhou Jin,
Guibin Zhang,
Zulun Zhu,
Zibin Zheng,
Liang Chen
Abstract:
Graph autoencoders (GAEs) are self-supervised learning models that can learn meaningful representations of graph-structured data by reconstructing the input graph from a low-dimensional latent space. Over the past few years, GAEs have gained significant attention in academia and industry. In particular, the recent advent of GAEs with masked autoencoding schemes marks a significant advancement in graph self-supervised learning research. While numerous GAEs have been proposed, the underlying mechanisms of GAEs are not well understood, and a comprehensive benchmark for GAEs is still lacking. In this work, we bridge the gap between GAEs and contrastive learning by establishing conceptual and methodological connections. We revisit the GAEs studied in previous works and demonstrate how contrastive learning principles can be applied to GAEs. Motivated by these insights, we introduce lrGAE (left-right GAE), a general and powerful GAE framework that leverages contrastive learning principles to learn meaningful representations. Our proposed lrGAE not only facilitates a deeper understanding of GAEs but also sets a new benchmark for GAEs across diverse graph-based learning tasks. The source code for lrGAE, including the baselines and all the code for reproducing the results, is publicly available at https://github.com/EdisonLeeeee/lrGAE.
Submitted 14 October, 2024;
originally announced October 2024.
-
Evolving Multi-Scale Normalization for Time Series Forecasting under Distribution Shifts
Authors:
Dalin Qin,
Yehui Li,
Weiqi Chen,
Zhaoyang Zhu,
Qingsong Wen,
Liang Sun,
Pierre Pinson,
Yi Wang
Abstract:
Complex distribution shifts are the main obstacle to achieving accurate long-term time series forecasting. Several efforts have been conducted to capture the distribution characteristics and propose adaptive normalization techniques to alleviate the influence of distribution shifts. However, these methods neglect the intricate distribution dynamics observed from various scales and the evolving functions of distribution dynamics and normalized mapping relationships. To this end, we propose a novel model-agnostic Evolving Multi-Scale Normalization (EvoMSN) framework to tackle the distribution shift problem. Flexible normalization and denormalization are proposed based on the multi-scale statistics prediction module and adaptive ensembling. An evolving optimization strategy is designed to update the forecasting model and statistics prediction module collaboratively to track the shifting distributions. We evaluate the effectiveness of EvoMSN in improving the performance of five mainstream forecasting methods on benchmark datasets and also show its superiority compared to existing advanced normalization and online learning approaches. The code is publicly available at https://github.com/qindalin/EvoMSN.
Submitted 29 September, 2024;
originally announced September 2024.
-
Enhancement of price trend trading strategies via image-induced importance weights
Authors:
Zhoufan Zhu,
Ke Zhu
Abstract:
We open up the "black-box" to identify the predictive general price patterns in price chart images via deep learning image analysis techniques. Our identified price patterns lead to the construction of image-induced importance (triple-I) weights, which are used to form a weighted moving average of the existing price trend trading signals according to their level of importance in predicting price movements. From an extensive empirical analysis on the Chinese stock market, we show that the triple-I weighting scheme can significantly enhance the price trend trading signals for proposing portfolios, with a thoughtful robustness study in terms of network specifications, image structures, and stock sizes. Moreover, we demonstrate that the triple-I weighting scheme is able to propose long-term portfolios via time-scale transfer learning, enhance news-based trading strategies through non-technical transfer learning, and increase the overall strength of numerous trading rules for portfolio selection.
Submitted 15 August, 2024;
originally announced August 2024.
-
Individualized Dynamic Mediation Analysis Using Latent Factor Models
Authors:
Yijiao Zhang,
Yubai Yuan,
Yuexia Zhang,
Zhongyi Zhu,
Annie Qu
Abstract:
Mediation analysis plays a crucial role in causal inference as it can investigate the pathways through which treatment influences outcome. Most existing mediation analysis assumes that mediation effects are static and homogeneous within populations. However, mediation effects usually change over time and exhibit significant heterogeneity in many real-world applications. Additionally, the presence of unobserved confounding variables imposes a significant challenge to inferring both causal effect and mediation effect. To address these issues, we propose an individualized dynamic mediation analysis method. Our approach can identify the significant mediators of the population level while capturing the time-varying and heterogeneous mediation effects via latent factor modeling on coefficients of structural equation models. Another advantage of our method is that we can infer individualized mediation effects in the presence of unmeasured time-varying confounders. We provide estimation consistency for our proposed causal estimand and selection consistency for significant mediators. Extensive simulation studies and an application to a DNA methylation study demonstrate the effectiveness and advantages of our method.
Submitted 27 May, 2024;
originally announced May 2024.
-
On the Identification of Temporally Causal Representation with Instantaneous Dependence
Authors:
Zijian Li,
Yifan Shen,
Kaitao Zheng,
Ruichu Cai,
Xiangchen Song,
Mingming Gong,
Zhengmao Zhu,
Guangyi Chen,
Kun Zhang
Abstract:
Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grouping of the observations, which are in general difficult to obtain in real-world scenarios. To fill this gap, we propose an IDentification framework for instantaneOus Latent dynamics (IDOL) by imposing a sparse influence constraint that the latent causal processes have sparse time-delayed and instantaneous relations. Specifically, we establish identifiability results of the latent causal process based on sufficient variability and the sparse influence constraint by employing contextual information of time series data. Based on these theories, we incorporate a temporally variational inference architecture to estimate the latent variables and a gradient-based sparsity regularization to identify the latent causal process. Experimental results on simulation datasets illustrate that our method can identify the latent causal process. Furthermore, evaluations on multiple human motion forecasting benchmarks with instantaneous dependencies indicate the effectiveness of our method in real-world settings.
Submitted 7 June, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Functional Post-Clustering Selective Inference with Applications to EHR Data Analysis
Authors:
Zihan Zhu,
Xin Gai,
Anru R. Zhang
Abstract:
In electronic health records (EHR) analysis, clustering patients according to patterns in their data is crucial for uncovering new subtypes of diseases. Existing medical literature often relies on classical hypothesis testing methods to test for differences in means between these clusters. Due to selection bias induced by clustering algorithms, the implementation of these classical methods on post-clustering data often leads to an inflated type-I error. In this paper, we introduce a new statistical approach that adjusts for this bias when analyzing data collected over time. Our method extends classical selective inference methods for cross-sectional data to longitudinal data. We provide theoretical guarantees for our approach with upper bounds on the selective type-I and type-II errors. We apply the method to simulated data and real-world Acute Kidney Injury (AKI) EHR datasets, thereby illustrating the advantages of our approach.
Submitted 5 May, 2024;
originally announced May 2024.
-
Distributed Iterative Hard Thresholding for Variable Selection in Tobit Models
Authors:
Changxin Yang,
Zhongyi Zhu,
Heng Lian
Abstract:
While extensive research has been conducted on high-dimensional data and on regression with left-censored responses, simultaneously addressing these complexities remains challenging, with only a few proposed methods available. In this paper, we utilize the Iterative Hard Thresholding (IHT) algorithm on the Tobit model in such a setting. Theoretical analysis demonstrates that our estimator converges with a near-optimal minimax rate. Additionally, we extend the method to a distributed setting, requiring only a few rounds of communication while retaining the estimation rate of the centralized version. Simulation results show that the IHT algorithm for the Tobit model achieves superior accuracy in predictions and subset selection, with the distributed estimator closely matching that of the centralized estimator. When applied to high-dimensional left-censored HIV viral load data, our method also exhibits similar superiority.
Submitted 3 May, 2024;
originally announced May 2024.
-
Triadic-OCD: Asynchronous Online Change Detection with Provable Robustness, Optimality, and Convergence
Authors:
Yancheng Huang,
Kai Yang,
Zelin Zhu,
Leian Chen
Abstract:
The primary goal of online change detection (OCD) is to promptly identify changes in the data stream. OCD problems find a wide variety of applications in diverse areas, e.g., security detection in smart grids and intrusion detection in communication networks. Prior research usually assumes precise knowledge of the system parameters. Nevertheless, this presumption often proves unattainable in practical scenarios due to factors such as estimation errors and system updates. This paper makes a first attempt to develop a triadic-OCD framework with certifiable robustness, provable optimality, and guaranteed convergence. In addition, the proposed triadic-OCD algorithm can be realized in a fully asynchronous distributed manner, easing the necessity of transmitting the data to a single server. This asynchronous mechanism could also mitigate the straggler issue faced by traditional synchronous algorithms. Moreover, the non-asymptotic convergence property of Triadic-OCD is theoretically analyzed, and its iteration complexity to achieve an $ε$-optimal point is derived. Extensive experiments have been conducted to elucidate the effectiveness of the proposed method.
Submitted 4 June, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
Spatial Heterogeneous Additive Partial Linear Model: A Joint Approach of Bivariate Spline and Forest Lasso
Authors:
Xin Zhang,
Shan Yu,
Zhengyuan Zhu,
Xin Wang
Abstract:
Identifying spatially heterogeneous patterns has attracted a surge of research interest in recent years, due to its important applications in various scientific and engineering fields. In practice, the spatially heterogeneous components are often mixed with components which are spatially smooth, making the task of identifying the heterogeneous regions more challenging. In this paper, we develop an efficient clustering approach to identify the model heterogeneity of the spatial additive partial linear model. Specifically, we aim to detect the spatially contiguous clusters based on the regression coefficients while introducing a spatially varying intercept to deal with the smooth spatial effect. On the one hand, to approximate the spatially varying intercept, we use the method of bivariate spline over triangulation, which can effectively handle the data from a complex domain. On the other hand, a novel fusion penalty termed the forest lasso is proposed to reveal the spatial clustering pattern. Our proposed fusion penalty has advantages in both estimation and computational efficiency when dealing with large spatial data. Theoretical properties of our estimator are established, and simulation results show that our approach can achieve more accurate estimation with a limited computation cost compared with the existing approaches. To illustrate its practical use, we apply our approach to analyze the spatial pattern of the relationship between land surface temperature measured by satellites and air temperature measured by ground stations in the United States.
Submitted 3 May, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
DeepLINK-T: deep learning inference for time series data using knockoffs and LSTM
Authors:
Wenxuan Zuo,
Zifan Zhu,
Yuxuan Du,
Yi-Chun Yeh,
Jed A. Fuhrman,
Jinchi Lv,
Yingying Fan,
Fengzhu Sun
Abstract:
High-dimensional longitudinal time series data is prevalent across various real-world applications. Many such applications can be modeled as regression problems with high-dimensional time series covariates. Deep learning has been a popular and powerful tool for fitting these regression models. Yet, the development of interpretable and reproducible deep-learning models is challenging and remains underexplored. This study introduces a novel method, Deep Learning Inference using Knockoffs for Time series data (DeepLINK-T), focusing on the selection of significant time series variables in regression while controlling the false discovery rate (FDR) at a predetermined level. DeepLINK-T combines deep learning with knockoff inference to control FDR in feature selection for time series models, accommodating a wide variety of feature distributions. It addresses dependencies across time and features by leveraging a time-varying latent factor structure in time series covariates. Three key ingredients for DeepLINK-T are 1) a Long Short-Term Memory (LSTM) autoencoder for generating time series knockoff variables, 2) an LSTM prediction network using both original and knockoff variables, and 3) the application of the knockoffs framework for variable selection with FDR control. Extensive simulation studies have been conducted to evaluate DeepLINK-T's performance, showing its capability to control FDR effectively while demonstrating superior feature selection power for high-dimensional longitudinal time series data compared to its non-time series counterpart. DeepLINK-T is further applied to three metagenomic data sets, validating its practical utility and effectiveness, and underscoring its potential in real-world applications.
Submitted 5 April, 2024;
originally announced April 2024.
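The third ingredient in the abstract above, variable selection with FDR control through the knockoffs framework, can be illustrated with the standard knockoff+ selection rule. This is a minimal sketch, not DeepLINK-T's implementation: it assumes a list `w` of knockoff statistics has already been computed (one per feature, where large positive values favor the original variable over its knockoff), and the function names `knockoff_plus_threshold` and `select_features` are hypothetical.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns infinity when no candidate satisfies the bound."""
    candidates = sorted({abs(v) for v in w if v != 0})
    for t in candidates:
        negatives = sum(1 for v in w if v <= -t)
        positives = sum(1 for v in w if v >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def select_features(w, q):
    """Indices of the features whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, v in enumerate(w) if v >= t]


if __name__ == "__main__":
    # Illustrative statistics only; these are not results from the paper.
    w = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.3, 6]
    print(knockoff_plus_threshold(w, q=0.2))
    print(select_features(w, q=0.2))
```

The ratio inside the loop is the usual estimate of the false discovery proportion at threshold `t`; the extra 1 in the numerator is what distinguishes knockoff+ from the plain knockoff rule.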
-
Covariate-Elaborated Robust Partial Information Transfer with Conditional Spike-and-Slab Prior
Authors:
Ruqian Zhang,
Yijiao Zhang,
Annie Qu,
Zhongyi Zhu,
Juan Shen
Abstract:
The popularity of transfer learning stems from the fact that it can borrow information from useful auxiliary datasets. Existing statistical transfer learning methods usually adopt a global similarity measure between the source data and the target data, which may lead to inefficiency when only partial information is shared. In this paper, we propose a novel Bayesian transfer learning method named "CONCERT" to allow robust partial information transfer for high-dimensional data analysis. A conditional spike-and-slab prior is introduced in the joint distribution of target and source parameters for information transfer. By incorporating covariate-specific priors, we can characterize partial similarities and integrate source information collaboratively to improve the performance on the target. In contrast to existing work, CONCERT is a one-step procedure, which achieves variable selection and information transfer simultaneously. We establish variable selection consistency, as well as estimation and prediction error bounds for CONCERT. Our theory demonstrates the covariate-specific benefit of transfer learning. To ensure that our algorithm is scalable, we adopt the variational Bayes framework to facilitate implementation. Extensive experiments and two real data applications showcase the validity and advantage of CONCERT over existing cutting-edge transfer learning methods.
Submitted 21 August, 2024; v1 submitted 30 March, 2024;
originally announced April 2024.
-
Seller-Side Experiments under Interference Induced by Feedback Loops in Two-Sided Platforms
Authors:
Zhihua Zhu,
Zheng Cai,
Liang Zheng,
Nian Si
Abstract:
Two-sided platforms are central to modern commerce and content sharing and often utilize A/B testing for developing new features. While user-side experiments are common, seller-side experiments become crucial for specific interventions and metrics. This paper investigates the effects of interference caused by feedback loops on seller-side experiments in two-sided platforms, with a particular focus on the counterfactual interleaving design, proposed in \citet{ha2020counterfactual,nandy2021b}. These feedback loops, often generated by pacing algorithms, cause outcomes from earlier sessions to influence subsequent ones. This paper contributes by creating a mathematical framework to analyze this interference, theoretically estimating its impact, and conducting empirical evaluations of the counterfactual interleaving design in real-world scenarios. Our research shows that feedback loops can result in misleading conclusions about the treatment effects.
Submitted 9 February, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery
Authors:
Zhen Qin,
Michael B. Wakin,
Zhihui Zhu
Abstract:
In this paper, we provide the first convergence guarantee for the factorization approach. Specifically, to avoid the scaling ambiguity and to facilitate theoretical analysis, we optimize over the so-called left-orthogonal TT format which enforces orthonormality among most of the factors. To ensure the orthonormal structure, we utilize the Riemannian gradient descent (RGD) for optimizing those factors over the Stiefel manifold. We first delve into the TT factorization problem and establish the local linear convergence of RGD. Notably, the rate of convergence only experiences a linear decline as the tensor order increases. We then study the sensing problem that aims to recover a TT format tensor from linear measurements. Assuming the sensing operator satisfies the restricted isometry property (RIP), we show that with a proper initialization, which could be obtained through spectral initialization, RGD also converges to the ground-truth tensor at a linear rate. Furthermore, we expand our analysis to encompass scenarios involving Gaussian noise in the measurements. We prove that RGD can reliably recover the ground truth at a linear rate, with the recovery error exhibiting only polynomial growth in relation to the tensor order. We conduct various experiments to validate our theoretical findings.
Submitted 21 December, 2024; v1 submitted 4 January, 2024;
originally announced January 2024.
-
Global Rank Sum Test: An Efficient Rank-Based Nonparametric Test for Large Scale Online Experiment
Authors:
Zheng Cai,
Bo Hu,
Zhihua Zhu
Abstract:
Online experiments are widely used for improving online services. In online experiments, Student's t-test is the most widely used hypothesis testing technique. In practice, however, the normality assumption on which the t-test depends may fail, resulting in untrustworthy results. In this paper, we first discuss when the t-test fails and then introduce the rank-sum test. Next, to address the difficulties of implementing the rank-sum test on large online experiment platforms, we propose a global-rank-sum test as an improvement on the traditional one. Finally, we demonstrate that the global-rank-sum test is not only more accurate and more powerful than the t-test, but also more time-efficient than the traditional rank-sum test, which makes it practical for large online experiment platforms.
Submitted 22 December, 2023;
originally announced December 2023.
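For readers unfamiliar with the traditional rank-sum test that the abstract above takes as its baseline, the following is a minimal sketch of the classical two-sample Wilcoxon rank-sum test with a normal approximation. It is not the paper's global-rank-sum method. The function name `rank_sum_test` is hypothetical, ties receive midranks, and the variance omits the tie correction, so the p-value is only approximate for small or heavily tied samples.

```python
import math


def rank_sum_test(x, y):
    """Classical two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (w, z, p): w is the sum of the ranks of x in the pooled
    sample, z the standardized statistic, p the two-sided p-value.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12  # no tie correction (assumption)
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w, z, p


if __name__ == "__main__":
    # Toy data for illustration only.
    print(rank_sum_test([1.2, 0.8, 1.5, 0.9], [2.1, 1.9, 2.4, 1.7]))
```

The pooled sort is what makes the classical test costly at platform scale, since it is repeated for every experiment and metric; the abstract's time-efficiency claim concerns that step.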
-
Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks
Authors:
Jiayuan Ye,
Zhenyu Zhu,
Fanghui Liu,
Reza Shokri,
Volkan Cevher
Abstract:
We analytically investigate how over-parameterization of models in randomized machine learning algorithms impacts the information leakage about their training data. Specifically, we prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets, and explore its dependence on the initialization, width, and depth of fully connected neural networks. We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training. Notably, for the special setting of linearized networks, our analysis indicates that the squared gradient norm (and therefore the escalation of privacy loss) is tied directly to the per-layer variance of the initialization distribution. Using this analysis, we demonstrate that the privacy bound improves with increasing depth under certain initializations (LeCun and Xavier), while it degrades with increasing depth under other initializations (He and NTK). Our work reveals a complex interplay between privacy and depth that depends on the chosen initialization distribution. We further prove excess empirical risk bounds under a fixed KL privacy budget, and show that the interplay between the privacy-utility trade-off and depth is similarly affected by the initialization.
Submitted 31 October, 2023;
originally announced October 2023.
-
Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling
Authors:
Zhenyu Zhu,
Francesco Locatello,
Volkan Cevher
Abstract:
This paper provides statistical sample complexity bounds for score-matching and its applications in causal discovery. We demonstrate that accurate estimation of the score function is achievable by training a standard deep ReLU neural network using stochastic gradient descent. We establish bounds on the error rate of recovering causal relationships using the score-matching-based causal discovery method of Rolland et al. [2022], assuming a sufficiently good estimation of the score function. Finally, we analyze the upper bound of score-matching estimation within the score-based generative modeling, which has been applied for causal discovery but is also of independent interest within the domain of generative models.
Submitted 27 October, 2023;
originally announced October 2023.
-
Offline Reinforcement Learning for Optimizing Production Bidding Policies
Authors:
Dmytro Korenkevych,
Frank Cheng,
Artsiom Balakir,
Alex Nikulkov,
Lingnan Gao,
Zhihao Cen,
Zuobing Xu,
Zheqing Zhu
Abstract:
The online advertising market, with its thousands of auctions run per second, presents a daunting challenge for advertisers who wish to optimize their spend under a budget constraint. Thus, advertising platforms typically provide automated agents to their customers, which act on their behalf to bid for impression opportunities in real time at scale. Because these proxy agents are owned by the platform but use advertiser funds to operate, there is a strong practical need to balance reliability and explainability of the agent with optimizing power. We propose a generalizable approach to optimizing bidding policies in production environments by learning from real data using offline reinforcement learning. This approach can be used to optimize any differentiable base policy (practically, a heuristic policy based on principles which the advertiser can easily understand), and only requires data generated by the base policy itself. We use a hybrid agent architecture that combines arbitrary base policies with deep neural networks, where only the optimized base policy parameters are eventually deployed, and the neural network part is discarded after training. We demonstrate that such an architecture achieves statistically significant performance gains in both simulated and at-scale production bidding environments. Our approach does not incur additional infrastructure, safety, or explainability costs, as it directly optimizes parameters of existing production routines without replacing them with black box-style models like neural networks.
Submitted 13 October, 2023;
originally announced October 2023.
-
Diffusion on the Probability Simplex
Authors:
Griffin Floto,
Thorsteinn Jonsson,
Mihai Nica,
Scott Sanner,
Eric Zhengyu Zhu
Abstract:
Diffusion models learn to reverse the progressive noising of a data distribution to create a generative model. However, the desired continuous nature of the noising process can be at odds with discrete data. To deal with this tension between continuous and discrete objects, we propose a method of performing diffusion on the probability simplex. Using the probability simplex naturally creates an interpretation where points correspond to categorical probability distributions. Our method uses the softmax function applied to an Ornstein-Uhlenbeck process, a well-known stochastic differential equation. We find that our methodology also naturally extends to include diffusion on the unit cube, which has applications for bounded image generation.
Submitted 11 September, 2023; v1 submitted 5 September, 2023;
originally announced September 2023.
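The construction named in the abstract above, a softmax applied to an Ornstein-Uhlenbeck process, can be sketched in a few lines. This is an illustrative simulation only, not the authors' training or sampling procedure: the parameters `theta`, `sigma`, and `dt` and the function names are assumptions, and each coordinate follows an independent OU process dX = -theta X dt + sigma dW, advanced with its exact Gaussian transition.

```python
import math
import random


def softmax(v):
    """Map a real vector to a point on the probability simplex."""
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def ou_step(x, theta, sigma, dt, rng):
    """Exact one-step transition of dX = -theta X dt + sigma dW."""
    decay = math.exp(-theta * dt)
    std = sigma * math.sqrt((1 - decay ** 2) / (2 * theta))
    return [decay * a + std * rng.gauss(0, 1) for a in x]


def simplex_path(x0, steps, theta=1.0, sigma=1.0, dt=0.1, seed=0):
    """Noise x0 with an OU process and push every state through softmax."""
    rng = random.Random(seed)
    x = list(x0)
    path = [softmax(x)]
    for _ in range(steps):
        x = ou_step(x, theta, sigma, dt, rng)
        path.append(softmax(x))
    return path


if __name__ == "__main__":
    # Start near the first vertex of the 3-category simplex.
    for point in simplex_path([4.0, 0.0, 0.0], steps=5):
        print([round(p, 3) for p in point])
```

Every point on the path has positive entries that sum to one, so it can be read as a categorical distribution, which is the interpretation the abstract describes.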
-
A Data Fusion Method for Quantile Treatment Effects
Authors:
Yijiao Zhang,
Zhongyi Zhu
Abstract:
With the increasing availability of datasets, developing data fusion methods to leverage the strengths of different datasets to draw causal effects is of great practical importance to many scientific fields. In this paper, we consider estimating the quantile treatment effects using small validation data with fully-observed confounders and large auxiliary data with unmeasured confounders. We propose a Fused Quantile Treatment effects Estimator (FQTE) by integrating the information from two datasets based on doubly robust estimating functions. We allow for the misspecification of the models on the dataset with unmeasured confounders. Under mild conditions, we show that the proposed FQTE is asymptotically normal and more efficient than the initial QTE estimator using the validation data solely. By establishing the asymptotic linear forms of related estimators, convenient methods for covariance estimation are provided. Simulation studies demonstrate the empirical validity and improved efficiency of our fused estimators. We illustrate the proposed method with an application.
Submitted 15 July, 2023;
originally announced July 2023.
-
MonoFlow: Rethinking Divergence GANs via the Perspective of Wasserstein Gradient Flows
Authors:
Mingxuan Yi,
Zhanxing Zhu,
Song Liu
Abstract:
The conventional understanding of adversarial training in generative adversarial networks (GANs) is that the discriminator is trained to estimate a divergence, and the generator learns to minimize this divergence. We argue that despite the fact that many variants of GANs were developed following this paradigm, the current theoretical understanding of GANs and their practical algorithms are inconsistent. In this paper, we leverage Wasserstein gradient flows which characterize the evolution of particles in the sample space, to gain theoretical insights and algorithmic inspiration of GANs. We introduce a unified generative modeling framework - MonoFlow: the particle evolution is rescaled via a monotonically increasing mapping of the log density ratio. Under our framework, adversarial training can be viewed as a procedure first obtaining MonoFlow's vector field via training the discriminator and the generator learns to draw the particle flow defined by the corresponding vector field. We also reveal the fundamental difference between variational divergence minimization and adversarial training. This analysis helps us to identify what types of generator loss functions can lead to the successful training of GANs and suggest that GANs may have more loss designs beyond the literature (e.g., non-saturated loss), as long as they realize MonoFlow. Consistent empirical studies are included to validate the effectiveness of our framework.
Submitted 8 August, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Big portfolio selection by graph-based conditional moments method
Authors:
Zhoufan Zhu,
Ningning Zhang,
Ke Zhu
Abstract:
Big portfolio selection is important but challenging for both researchers and practitioners. In this paper, we propose a new graph-based conditional moments (GRACE) method to do portfolio selection based on thousands of stocks or more. The GRACE method first learns the conditional quantiles and mean of stock returns via a factor-augmented temporal graph convolutional network, which guides the learning procedure through a factor-hypergraph built from the set of stock-to-stock relations from domain knowledge as well as the set of factor-to-stock relations from asset pricing knowledge. Next, the GRACE method learns the conditional variance, skewness, and kurtosis of stock returns from the learned conditional quantiles by using the quantiled conditional moment (QCM) method. The QCM method is a supervised learning procedure for these conditional higher-order moments, so it largely overcomes the computational difficulty of the classical high-dimensional GARCH-type methods. Moreover, the QCM method tolerates some mis-specification in modeling the conditional quantiles, due to its regression-based nature. Finally, the GRACE method uses the learned conditional mean, variance, skewness, and kurtosis to construct several performance measures, which serve as criteria for sorting the stocks to carry out the portfolio selection in the well-known 10-decile framework. An application to NASDAQ and NYSE stock markets shows that the GRACE method performs much better than its competitors, particularly when the performance measures are composed of conditional variance, skewness, and kurtosis.
Submitted 27 January, 2023;
originally announced January 2023.
-
Understanding Best Subset Selection: A Tale of Two C(omplex)ities
Authors:
Saptarshi Roy,
Ambuj Tewari,
Ziwei Zhu
Abstract:
We consider the problem of best subset selection (BSS) under high-dimensional sparse linear regression model. Recently, Guo et al. (2020) showed that the model selection performance of BSS depends on a certain identifiability margin, a measure that captures the model discriminative power of BSS under a general correlation structure that is robust to the design dependence, unlike its computational surrogates such as LASSO, SCAD, MCP, etc. Expanding on this, we further broaden the theoretical understanding of best subset selection in this paper and show that the complexities of the residualized signals, the portion of the signals orthogonal to the true active features, and spurious projections, describing the projection operators associated with the irrelevant features, also play fundamental roles in characterizing the margin condition for model consistency of BSS. In particular, we establish both necessary and sufficient margin conditions depending only on the identifiability margin and the two complexity measures. We also partially extend our sufficiency result to the case of high-dimensional sparse generalized linear models (GLMs).
Submitted 11 April, 2025; v1 submitted 15 January, 2023;
originally announced January 2023.
-
Fitting mixed logit random regret minimization models using maximum simulated likelihood
Authors:
Ziyue Zhu,
Álvaro A. Gutiérrez-Vargas,
Martina Vandebroek
Abstract:
This article describes the mixrandregret command, which extends the randregret command introduced in Gutiérrez-Vargas et al. (2021, The Stata Journal 21: 626-658) by incorporating random coefficients for Random Regret Minimization models. The newly developed command mixrandregret allows the inclusion of random coefficients in the regret function of the classical RRM model introduced in Chorus (2010, European Journal of Transport and Infrastructure Research 10: 181-196). The command allows the user to specify a combination of fixed and random coefficients. In addition, the user can specify normal and log-normal distributions for the random coefficients using the command's options. The models are fitted by simulated maximum likelihood, with numerical integration used to approximate the choice probabilities.
Submitted 3 January, 2023;
originally announced January 2023.
-
Understanding and Improving Transfer Learning of Deep Models via Neural Collapse
Authors:
Xiao Li,
Sheng Liu,
Jinxin Zhou,
Xinyu Lu,
Carlos Fernandez-Granda,
Zhihui Zhu,
Qing Qu
Abstract:
With the ever-increasing complexity of large-scale pre-trained models coupled with a shortage of labeled data for downstream training, transfer learning has become the primary approach in many fields, including natural language processing, computer vision, and multi-modal learning. Despite recent progress, the fine-tuning process for large-scale pre-trained models in vision still mostly relies on trial and error. This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems. NC is an intriguing while prevalent phenomenon that has been recently discovered in terms of the final-layer features and linear classifiers of trained neural networks. Specifically, during the terminal phase of training, NC implies that the variability of the features within each class diminishes to zero, while the means of features between classes are maximally and equally distanced. In this work, we examine the NC attributes of pre-trained models on both downstream and source data for transfer learning, and we find strong correlation between feature collapse and downstream performance. In particular, we discovered a systematic pattern that emerges when linear probing pre-trained models on downstream training data: the more feature collapse of pre-trained models on downstream training data, the higher the transfer accuracy. Additionally, we also studied the relationship between NC and transfer accuracy on the source data. Moreover, these findings allow us to develop a principled, parameter-efficient fine-tuning method that employs skip-connection to induce the last-layer feature collapse on downstream data. Our proposed fine-tuning methods deliver good performances while reducing fine-tuning parameters by at least 90% and mitigating overfitting in situations especially when the downstream data is scarce.
Submitted 18 July, 2024; v1 submitted 23 December, 2022;
originally announced December 2022.
-
Transfer Learning for High-dimensional Quantile Regression via Convolution Smoothing
Authors:
Yijiao Zhang,
Zhongyi Zhu
Abstract:
This paper studies the high-dimensional quantile regression problem under the transfer learning framework, where possibly related source datasets are available to make improvements on the estimation or prediction based solely on the target data. In the oracle case with known transferable sources, a smoothed two-step transfer learning algorithm based on convolution smoothing is proposed and the L1/L2 estimation error bounds of the corresponding estimator are also established. To avoid including non-informative sources, we propose to select the transferable sources adaptively and establish its selection consistency under regular conditions. Monte Carlo simulations as well as an empirical analysis of gene expression data demonstrate the effectiveness of the proposed procedure.
Submitted 1 May, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.
-
Are All Losses Created Equal: A Neural Collapse Perspective
Authors:
Jinxin Zhou,
Chong You,
Xiao Li,
Kangning Liu,
Sheng Liu,
Qing Qu,
Zhihui Zhu
Abstract:
While cross entropy (CE) is the most commonly used loss to train deep neural networks for classification tasks, many alternative losses have been developed to obtain better empirical performance. Among them, which one is the best to use is still a mystery, because there seem to be multiple factors affecting the answer, such as properties of the dataset, the choice of network architecture, and so on. This paper studies the choice of loss function by examining the last-layer features of deep networks, drawing inspiration from a recent line of work showing that the global optimal solution of CE and mean-square-error (MSE) losses exhibits a Neural Collapse phenomenon. That is, for sufficiently large networks trained until convergence, (i) all features of the same class collapse to the corresponding class mean and (ii) the means associated with different classes are in a configuration where their pairwise distances are all equal and maximized. We extend such results and show through global solution and landscape analyses that a broad family of loss functions including commonly used label smoothing (LS) and focal loss (FL) exhibits Neural Collapse. Hence, all relevant losses (i.e., CE, LS, FL, MSE) produce equivalent features on training data. Based on the unconstrained feature model assumption, we provide either the global landscape analysis for LS loss or the local landscape analysis for FL loss and show that the (only!) global minimizers are neural collapse solutions, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions either in the global scope for LS loss or in the local scope for FL loss near the optimal solution. The experiments further show that Neural Collapse features obtained from all relevant losses lead to largely identical performance on test data as well, provided that the network is sufficiently large and trained until convergence.
Submitted 8 October, 2022; v1 submitted 3 October, 2022;
originally announced October 2022.
-
A Validation Approach to Over-parameterized Matrix and Image Recovery
Authors:
Lijun Ding,
Zhen Qin,
Liwei Jiang,
Jinxin Zhou,
Zhihui Zhu
Abstract:
This paper studies the problem of recovering a low-rank matrix from several noisy random linear measurements. We consider the setting where the rank of the ground-truth matrix is unknown a priori and use an objective function built from a rank-overspecified factored representation of the matrix variable, where the global optimal solutions overfit and do not correspond to the underlying ground truth. We then solve the associated nonconvex problem using gradient descent with small random initialization. We show that as long as the measurement operators satisfy the restricted isometry property (RIP) with its rank parameter scaling with the rank of the ground-truth matrix rather than scaling with the overspecified matrix rank, gradient descent iterations are on a particular trajectory towards the ground-truth matrix and achieve nearly information-theoretically optimal recovery when it is stopped appropriately. We then propose an efficient stopping strategy based on the common hold-out method and show that it detects a nearly optimal estimator provably. Moreover, experiments show that the proposed validation approach can also be efficiently used for image restoration with deep image prior, which over-parameterizes an image with a deep network.
Submitted 4 October, 2024; v1 submitted 21 September, 2022;
originally announced September 2022.
-
Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold
Authors:
Can Yaras,
Peng Wang,
Zhihui Zhu,
Laura Balzano,
Qing Qu
Abstract:
When training overparameterized deep networks for classification tasks, it has been widely observed that the learned features exhibit a so-called "neural collapse" phenomenon. More specifically, for the output features of the penultimate layer, for each class the within-class features converge to their means, and the means of different classes exhibit a certain tight frame structure, which is also aligned with the last layer's classifier. As feature normalization in the last layer becomes a common practice in modern representation learning, in this work we theoretically justify the neural collapse phenomenon for normalized features. Based on an unconstrained feature model, we simplify the empirical loss function in a multi-class classification task into a nonconvex optimization problem over the Riemannian manifold by constraining all features and classifiers over the sphere. In this context, we analyze the nonconvex landscape of the Riemannian optimization problem over the product of spheres, showing a benign global landscape in the sense that the only global minimizers are the neural collapse solutions while all other critical points are strict saddles with negative curvature. Experimental results on practical deep networks corroborate our theory and demonstrate that better representations can be learned faster via feature normalization.
Submitted 7 March, 2023; v1 submitted 19 September, 2022;
originally announced September 2022.
-
ReBoot: Distributed statistical learning via refitting bootstrap samples
Authors:
Yumeng Wang,
Ziwei Zhu,
Xuming He
Abstract:
In this paper, we propose a one-shot distributed learning algorithm via refitting bootstrap samples, which we refer to as ReBoot. ReBoot refits a new model to mini-batches of bootstrap samples that are continuously drawn from each of the locally fitted models. It requires only one round of communication of model parameters without much memory. Theoretically, we analyze the statistical error rate of ReBoot for generalized linear models (GLM) and noisy phase retrieval, which represent convex and non-convex problems, respectively. In both cases, ReBoot provably achieves the full-sample statistical rate. In particular, we show that the systematic bias of ReBoot, the error that is independent of the number of subsamples (i.e., the number of sites), is $O(n ^ {-2})$ in GLM, where $n$ is the subsample size (the sample size of each local site). This rate is sharper than that of model parameter averaging and its variants, implying the higher tolerance of ReBoot with respect to data splits to maintain the full-sample rate. Our simulation study demonstrates the statistical advantage of ReBoot over competing methods. Finally, we propose FedReBoot, an iterative version of ReBoot, to aggregate convolutional neural networks for image classification. FedReBoot exhibits substantial superiority over Federated Averaging (FedAvg) within early rounds of communication.
Submitted 7 May, 2024; v1 submitted 19 July, 2022;
originally announced July 2022.
-
A statistical reconstruction algorithm for positronium lifetime imaging using time-of-flight positron emission tomography
Authors:
Hsin-Hsiung Huang,
Zheyuan Zhu,
Slun Booppasiri,
Zhuo Chen,
Shuo Pang,
Chien-Min Kao
Abstract:
Positron emission tomography (PET) is an important modality for diagnosing diseases such as cancer and Alzheimer's disease, capable of revealing the uptake of radiolabeled molecules that target specific pathological markers of the diseases. Recently, positronium lifetime imaging (PLI) that adds to traditional PET the ability to explore properties of the tissue microenvironment beyond tracer uptake has been demonstrated with time-of-flight (TOF) PET and the use of non-pure positron emitters. However, achieving accurate reconstruction of lifetime images from data acquired by systems having a finite TOF resolution still presents a challenge. This paper focuses on the two-dimensional PLI, introducing a maximum likelihood estimation (MLE) method that employs an exponentially modified Gaussian (EMG) probability distribution that describes the positronium lifetime data produced by TOF PET. We evaluate the performance of our EMG-based MLE method against approaches using exponential likelihood functions and penalized surrogate methods. Results from computer-simulated data reveal that the proposed EMG-MLE method can yield quantitatively accurate lifetime images. We also demonstrate that the proposed MLE formulation can be extended to handle PLI data containing multiple positron populations.
Submitted 15 January, 2025; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Offline Reinforcement Learning with Causal Structured World Models
Authors:
Zheng-Mao Zhu,
Xiong-Hui Chen,
Hong-Long Tian,
Kun Zhang,
Yang Yu
Abstract:
Model-based methods have recently shown promise for offline reinforcement learning (RL), aiming to learn good policies from historical data without interacting with the environment. Previous model-based offline RL methods learn fully connected nets as world-models that map the states and actions to the next-step states. However, it is sensible that a world-model should adhere to the underlying causal effect such that it will support learning an effective policy generalizing well in unseen states. In this paper, we first provide theoretical results showing that causal world-models can outperform plain world-models for offline RL by incorporating the causal structure into the generalization error bound. We then propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structure (FOCUS), to illustrate the feasibility of learning and leveraging causal structure in offline RL. Experimental results on two benchmarks show that FOCUS reconstructs the underlying causal structure accurately and robustly. Consequently, it performs better than the plain model-based offline RL algorithms and other causal model-based RL algorithms.
Submitted 3 June, 2022;
originally announced June 2022.
-
Modeling Ride-Sourcing Matching and Pickup Processes based on Additive Gaussian Process Models
Authors:
Zheng Zhu,
Meng Xu,
Yining Di,
Xiqun Chen,
Jingru Yu
Abstract:
Matching and pickup processes are core features of ride-sourcing services. Previous studies have adopted numerous analytical models to depict the two processes and obtain operational insights, while the goodness of fit between models and data was neglected. To simultaneously consider the fit between models and data and analytically tractable formulations, we propose a data-driven approach based on the additive Gaussian Process Model (AGPM) for ride-sourcing market modeling. The framework is tested based on real-world data collected in Hangzhou, China. We fit analytical models, machine learning models, and AGPMs, in which the number of matches or pickups is used as the output and spatial, temporal, demand, and supply covariates are utilized as inputs. The results demonstrate the advantages of AGPMs in recovering the two processes in terms of estimation accuracy. Furthermore, we illustrate the modeling power of AGPM by utilizing the trained model to design and estimate idle vehicle relocation strategies.
Submitted 29 April, 2022;
originally announced April 2022.
-
A-Optimal Split Questionnaire Designs for Multivariate Continuous Variables
Authors:
Dae-Gyu Jang,
Zhengyuan Zhu,
Cindy Yu
Abstract:
A split questionnaire design (SQD), an alternative to full questionnaires, can reduce the response burden and improve survey quality. One can design a split questionnaire to reduce the information loss from missing data induced by the split questionnaire. This study develops a methodology for finding optimal SQD (OSQD) for multivariate continuous variables, applying a probabilistic design and optimality criterion approach. Our method employs previous survey data to compute the Fisher information matrix and A-optimality criterion to find OSQD for the current survey study. We derive theoretical findings on the relationship between the correlation structure and OSQD and the robustness of local OSQD. We conduct simulation studies to compare local and two global OSQDs (mini-max OSQD and Bayes OSQD) to baselines. We also apply our method to the 2016 Pet Demographic Survey (PDS) data. In both simulation studies and the real data application, local and global OSQDs outperform the baselines.
Submitted 9 March, 2022;
originally announced March 2022.
-
On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features
Authors:
Jinxin Zhou,
Xiao Li,
Tianyu Ding,
Chong You,
Qing Qu,
Zhihui Zhu
Abstract:
When training deep neural networks for classification tasks, an intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features, where (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero. This phenomenon is called Neural Collapse (NC), which seems to take place regardless of the choice of loss functions. In this work, we justify NC under the mean squared error (MSE) loss, where recent empirical evidence shows that it performs comparably to or even better than the de facto cross-entropy loss. Under a simplified unconstrained feature model, we provide the first global landscape analysis for vanilla nonconvex MSE loss and show that the (only!) global minimizers are neural collapse solutions, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions. Furthermore, we justify the usage of rescaled MSE loss by probing the optimization landscape around the NC solutions, showing that the landscape can be improved by tuning the rescaling hyperparameters. Finally, our theoretical findings are experimentally verified on practical network architectures.
Submitted 12 March, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
Robust Training under Label Noise by Over-parameterization
Authors:
Sheng Liu,
Zhihui Zhu,
Qing Qu,
Chong You
Abstract:
Recently, over-parameterized deep networks, with increasingly more network parameters than training samples, have dominated the performance of modern machine learning. However, when the training data is corrupted, it is well known that over-parameterized networks tend to overfit and do not generalize. In this work, we propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted. The main idea is very simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data. Specifically, we model the label noise via another sparse over-parameterization term, and exploit implicit algorithmic regularizations to recover and separate the underlying corruptions. Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets. Furthermore, our experimental results are corroborated by theory on simplified linear models, showing that exact separation between sparse noise and low-rank data can be achieved under incoherent conditions. The work opens many interesting directions for improving over-parameterized models by using sparse over-parameterization and implicit regularization.
Submitted 2 August, 2022; v1 submitted 28 February, 2022;
originally announced February 2022.
-
Supervised Homogeneity Fusion: a Combinatorial Approach
Authors:
Wen Wang,
Shihao Wu,
Ziwei Zhu,
Ling Zhou,
Peter X. -K. Song
Abstract:
Fusing regression coefficients into homogeneous groups can unveil those coefficients that share a common value within each group. Such groupwise homogeneity reduces the intrinsic dimension of the parameter space and unleashes sharper statistical accuracy. We propose and investigate a new combinatorial grouping approach called $L_0$-Fusion that is amenable to mixed integer optimization (MIO). On the statistical aspect, we identify a fundamental quantity called grouping sensitivity that underpins the difficulty of recovering the true groups. We show that $L_0$-Fusion achieves grouping consistency under the weakest possible requirement of the grouping sensitivity: if this requirement is violated, then the minimax risk of group misspecification will fail to converge to zero. Moreover, we show that in the high-dimensional regime, one can apply $L_0$-Fusion coupled with a sure screening set of features without any essential loss of statistical efficiency, while reducing the computational cost substantially. On the algorithmic aspect, we provide a MIO formulation for $L_0$-Fusion along with a warm start strategy. Simulation and real data analysis demonstrate that $L_0$-Fusion exhibits superiority over its competitors in terms of grouping accuracy.
Submitted 4 January, 2022;
originally announced January 2022.
-
Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations
Authors:
Jiaheng Wei,
Zhaowei Zhu,
Hao Cheng,
Tongliang Liu,
Gang Niu,
Yang Liu
Abstract:
Existing research on learning with noisy labels mainly focuses on synthetic label noise. Synthetic noise, though it has clean structures that greatly enable statistical analyses, often fails to model real-world noise patterns. The recent literature has observed several efforts to offer real-world noisy datasets, yet the existing efforts suffer from two caveats: (1) The lack of ground-truth verification makes it hard to theoretically study the property and treatment of real-world label noise; (2) These efforts are often of large scales, which may result in unfair comparisons of robust methods within reasonable and accessible computation power. To better understand real-world label noise, it is crucial to build controllable and moderate-sized real-world noisy datasets with both ground-truth and noisy labels. This work presents two new benchmark datasets, CIFAR-10N and CIFAR-100N, equipping the training datasets of CIFAR-10 and CIFAR-100 with human-annotated real-world noisy labels we collected from Amazon Mechanical Turk. We quantitatively and qualitatively show that real-world noisy labels follow an instance-dependent pattern rather than the classically assumed and adopted ones (e.g., class-dependent label noise). We then initiate an effort to benchmark a subset of the existing solutions using CIFAR-10N and CIFAR-100N. We further proceed to study the memorization of correct and wrong predictions, which further illustrates the difference between human noise and class-dependent synthetic noise. We show that real-world noise patterns indeed impose new and outstanding challenges as compared to synthetic label noise. These observations require us to rethink the treatment of noisy labels, and we hope the availability of these two datasets would facilitate the development and evaluation of future learning with noisy label solutions. Datasets and leaderboards are available at http://noisylabels.com.
Submitted 27 March, 2022; v1 submitted 22 October, 2021;
originally announced October 2021.
-
The Rich Get Richer: Disparate Impact of Semi-Supervised Learning
Authors:
Zhaowei Zhu,
Tianyi Luo,
Yang Liu
Abstract:
Semi-supervised learning (SSL) has demonstrated its potential to improve the model accuracy for a variety of learning tasks when the high-quality supervised data is severely limited. Although it is often established that the average accuracy for the entire population of data is improved, it is unclear how SSL fares with different sub-populations. Understanding the above question has substantial fairness implications when different sub-populations are defined by the demographic groups that we aim to treat fairly. In this paper, we reveal the disparate impacts of deploying SSL: the sub-population that has a higher baseline accuracy without using SSL (the "rich" one) tends to benefit more from SSL, while the sub-population that suffers from a low baseline accuracy (the "poor" one) might even observe a performance drop after adding the SSL module. We theoretically and empirically establish the above observation for a broad family of SSL algorithms, which either explicitly or implicitly use an auxiliary "pseudo-label". Experiments on a set of image and text classification tasks confirm our claims. We introduce a new metric, Benefit Ratio, and promote the evaluation of the fairness of SSL (Equalized Benefit Ratio). We further discuss how the disparate impact can be mitigated. We hope our paper will raise awareness of the potential pitfalls of using SSL and encourage a multifaceted evaluation of future SSL algorithms.
Submitted 31 August, 2023; v1 submitted 12 October, 2021;
originally announced October 2021.
-
Empirical likelihood inference for longitudinal data with covariate measurement errors: An application to the LEAN study
Authors:
Yuexia Zhang,
Guoyou Qin,
Zhongyi Zhu,
Jiajia Zhang
Abstract:
Measurement errors usually arise during the longitudinal data collection process. Ignoring the effects of measurement errors will lead to invalid estimates. The Lifestyle Education for Activity and Nutrition (LEAN) study was designed to assess the effectiveness of intervention for enhancing weight loss over nine months. The covariates systolic blood pressure (SBP) and diastolic blood pressure (DBP) were measured at baseline, month 4, and month 9. At each assessment time, there were two replicate measurements for SBP and DBP. The replicate measurement errors of SBP follow different distributions, as does DBP. To account for the distributional difference of replicate measurement errors, a new method for analyzing longitudinal data with replicate covariate measurement errors is developed based on the empirical likelihood method. The asymptotic properties of the proposed estimator are established under some regularity conditions. The confidence region for the parameters of interest can be constructed based on the chi-squared approximation without estimating the covariance matrix. Additionally, the proposed empirical likelihood estimator is asymptotically more efficient than the estimator of Lin et al. (2018). Extensive simulations demonstrate that the proposed method can eliminate the effects of measurement errors in the covariate and has a high estimation efficiency. The proposed method indicates the significant effect of the intervention on BMI in the LEAN study.
Submitted 2 July, 2022; v1 submitted 5 October, 2021;
originally announced October 2021.
-
Volatility prediction comparison via robust volatility proxies: An empirical deviation perspective
Authors:
Weichen Wang,
Ran An,
Ziwei Zhu
Abstract:
Volatility forecasting is crucial to risk management and portfolio construction. One particular challenge of assessing volatility forecasts is how to construct a robust proxy for the unknown true volatility. In this work, we show that the empirical loss comparison between two volatility predictors hinges on the deviation of the volatility proxy from the true volatility. We then establish non-asymptotic deviation bounds for three robust volatility proxies, two of which are based on clipped data, and the third of which is based on exponentially weighted Huber loss minimization. In particular, in order for the Huber approach to adapt to non-stationary financial returns, we propose to solve a tuning-free weighted Huber loss minimization problem to jointly estimate the volatility and the optimal robustification parameter at each time point. We then inflate this robustification parameter and use it to update the volatility proxy to achieve optimal balance between the bias and variance of the global empirical loss. We also extend this Huber method to construct volatility predictors. Finally, we exploit the proposed robust volatility proxy to compare different volatility predictors on the Bitcoin market data. It turns out that when the sample size is limited, applying the robust volatility proxy gives more consistent and stable evaluation of volatility forecasts.
Submitted 4 October, 2021;
originally announced October 2021.
-
Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery
Authors:
Lijun Ding,
Liwei Jiang,
Yudong Chen,
Qing Qu,
Zhihui Zhu
Abstract:
We study the robust recovery of a low-rank matrix from sparsely and grossly corrupted Gaussian measurements, with no prior knowledge on the intrinsic rank. We consider the robust matrix factorization approach. We employ a robust $\ell_1$ loss function and deal with the challenge of the unknown rank by using an overspecified factored representation of the matrix variable. We then solve the associated nonconvex nonsmooth problem using a subgradient method with diminishing stepsizes. We show that under a regularity condition on the sensing matrices and corruption, which we call restricted direction preserving property (RDPP), even with rank overspecified, the subgradient method converges to the exact low-rank solution at a sublinear rate. Moreover, our result is more general in the sense that it automatically speeds up to a linear rate once the factor rank matches the unknown rank. On the other hand, we show that the RDPP condition holds under generic settings, such as Gaussian measurements under independent or adversarial sparse corruptions, where the result could be of independent interest. Both the exact recovery and the convergence rate of the proposed subgradient method are numerically verified in the overspecified regime. Moreover, our experiment further shows that our particular design of diminishing stepsize effectively prevents overfitting for robust recovery under overparameterized models, such as robust matrix sensing and learning robust deep image prior. This regularization effect is worth further investigation.
Submitted 26 October, 2021; v1 submitted 23 September, 2021;
originally announced September 2021.
-
Weak signal identification and inference in penalized likelihood models for categorical responses
Authors:
Yuexia Zhang,
Peibei Shi,
Zhongyi Zhu,
Linbo Wang,
Annie Qu
Abstract:
Penalized likelihood models are widely used to simultaneously select variables and estimate model parameters. However, the existence of weak signals can lead to inaccurate variable selection, biased parameter estimation, and invalid inference. Thus, identifying weak signals accurately and making valid inferences are crucial in penalized likelihood models. We develop a unified approach to identify weak signals and make inferences in penalized likelihood models, including the special case when the responses are categorical. To identify weak signals, we use the estimated selection probability of each covariate as a measure of the signal strength and formulate a signal identification criterion. To construct confidence intervals, we propose a two-step inference procedure. Extensive simulation studies show that the proposed procedure outperforms several existing methods. We illustrate the proposed method by applying it to the Practice Fusion diabetes data set.
Submitted 11 December, 2022; v1 submitted 17 August, 2021;
originally announced August 2021.
-
On sure early selection of the best subset
Authors:
Ziwei Zhu,
Shihao Wu
Abstract:
The early solution path, which tracks the first few variables that enter the model of a selection procedure, is of profound importance to scientific discoveries. In practice, it is often statistically hopeless to identify all the important features with no false discovery, let alone the intimidating expense of experiments to test their significance. Such realistic limitations call for statistical guarantees for the early discoveries of a model selector. In this paper, we focus on the early solution path of best subset selection (BSS), where the sparsity constraint is set to be lower than the true sparsity. Under a sparse high-dimensional linear model, we establish the sufficient and (near) necessary condition for BSS to achieve sure early selection, or equivalently, zero false discovery throughout its early path. Essentially, this condition boils down to a lower bound of the minimum projected signal margin that characterizes the gap of the captured signal strength between sure selection models and those with spurious discoveries. Defined through projection operators, this margin is independent of the restricted eigenvalues of the design, suggesting the robustness of BSS against collinearity. Moreover, our model selection guarantee tolerates reasonable optimization error and thus applies to near best subsets. Finally, to overcome the computational hurdle of BSS under high dimension, we propose the "screen then select" (STS) strategy to reduce dimension for BSS. Our numerical experiments show that the resulting early path exhibits a much lower false discovery rate (FDR) than LASSO, MCP and SCAD, especially in the presence of highly correlated design. We also investigate the early paths of the iterative hard thresholding algorithms, which are greedy computational surrogates for BSS, and which yield FDR comparable to that of our STS procedure.
Submitted 17 November, 2022; v1 submitted 14 July, 2021;
originally announced July 2021.