-
Truncated Cauchy Combination Test: a Robust and Powerful P-value Combination Method with Arbitrary Correlations
Authors:
Bo Chen,
Wei Xu,
Xin Gao
Abstract:
Cauchy combination test has been widely used for combining correlated p-values, but it may fail to work under certain scenarios. We propose a truncated Cauchy combination test (TCCT) which focus on combining p-values with arbitrary correlations, and demonstrate that our proposed test solves the limitations of Cauchy combination test and always has higher power. We prove that the tail probability o…
▽ More
Cauchy combination test has been widely used for combining correlated p-values, but it may fail to work under certain scenarios. We propose a truncated Cauchy combination test (TCCT) which focus on combining p-values with arbitrary correlations, and demonstrate that our proposed test solves the limitations of Cauchy combination test and always has higher power. We prove that the tail probability of our test statistic is asymptotically Cauchy distributed, so it is computationally effective to achieve the combined p-value using our proposed TCCT. We show by simulation that our proposed test has accurate type I error rates, and maintain high power when Cauchy combination test fails to work. We finally perform application studies to illustrate the usefulness of our proposed test on GWAS and microbiome sequencing data.
△ Less
Submitted 14 June, 2025;
originally announced June 2025.
-
Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling
Authors:
Yichuan Cao,
Yibo Miao,
Xiao-Shan Gao,
Yinpeng Dong
Abstract:
Text-to-image (T2I) models raise ethical and safety concerns due to their potential to generate inappropriate or harmful images. Evaluating these models' security through red-teaming is vital, yet white-box approaches are limited by their need for internal access, complicating their use with closed-source models. Moreover, existing black-box methods often assume knowledge about the model's specifi…
▽ More
Text-to-image (T2I) models raise ethical and safety concerns due to their potential to generate inappropriate or harmful images. Evaluating these models' security through red-teaming is vital, yet white-box approaches are limited by their need for internal access, complicating their use with closed-source models. Moreover, existing black-box methods often assume knowledge about the model's specific defense mechanisms, limiting their utility in real-world commercial API scenarios. A significant challenge is how to evade unknown and diverse defense mechanisms. To overcome this difficulty, we propose a novel Rule-based Preference modeling Guided Red-Teaming (RPG-RT), which iteratively employs LLM to modify prompts to query and leverages feedback from T2I systems for fine-tuning the LLM. RPG-RT treats the feedback from each iteration as a prior, enabling the LLM to dynamically adapt to unknown defense mechanisms. Given that the feedback is often labeled and coarse-grained, making it difficult to utilize directly, we further propose rule-based preference modeling, which employs a set of rules to evaluate desired or undesired feedback, facilitating finer-grained control over the LLM's dynamic adaptation process. Extensive experiments on nineteen T2I systems with varied safety mechanisms, three online commercial API services, and T2V models verify the superiority and practicality of our approach.
△ Less
Submitted 27 May, 2025;
originally announced May 2025.
-
Pure Component Property Estimation Framework Using Explainable Machine Learning Methods
Authors:
Jianfeng Jiao,
Xi Gao,
Jie Li
Abstract:
Accurate prediction of pure component physiochemical properties is crucial for process integration, multiscale modeling, and optimization. In this work, an enhanced framework for pure component property prediction by using explainable machine learning methods is proposed. In this framework, the molecular representation method based on the connectivity matrix effectively considers atomic bonding re…
▽ More
Accurate prediction of pure component physiochemical properties is crucial for process integration, multiscale modeling, and optimization. In this work, an enhanced framework for pure component property prediction by using explainable machine learning methods is proposed. In this framework, the molecular representation method based on the connectivity matrix effectively considers atomic bonding relationships to automatically generate features. The supervised machine learning model random forest is applied for feature ranking and pooling. The adjusted R2 is introduced to penalize the inclusion of additional features, providing an assessment of the true contribution of features. The prediction results for normal boiling point (Tb), liquid molar volume, critical temperature (Tc) and critical pressure (Pc) obtained using Artificial Neural Network and Gaussian Process Regression models confirm the accuracy of the molecular representation method. Comparison with GC based models shows that the root-mean-square error on the test set can be reduced by up to 83.8%. To enhance the interpretability of the model, a feature analysis method based on Shapley values is employed to determine the contribution of each feature to the property predictions. The results indicate that using the feature pooling method reduces the number of features from 13316 to 100 without compromising model accuracy. The feature analysis results for Tb, Tc, and Pc confirms that different molecular properties are influenced by different structural features, aligning with mechanistic interpretations. In conclusion, the proposed framework is demonstrated to be feasible and provides a solid foundation for mixture component reconstruction and process integration modelling.
△ Less
Submitted 14 May, 2025;
originally announced May 2025.
-
High-Dimensional Covariate-Dependent Gaussian Graphical Models
Authors:
Jiacheng Wang,
Xin Gao
Abstract:
Motivated by dynamic biologic network analysis, we propose a covariate-dependent Gaussian graphical model (cdexGGM) for capturing network structure that varies with covariates through a novel parameterization. Utilizing a likelihood framework, our methodology jointly estimates all dynamic edge and vertex parameters. We further develop statistical inference procedures to test the dynamic nature of…
▽ More
Motivated by dynamic biologic network analysis, we propose a covariate-dependent Gaussian graphical model (cdexGGM) for capturing network structure that varies with covariates through a novel parameterization. Utilizing a likelihood framework, our methodology jointly estimates all dynamic edge and vertex parameters. We further develop statistical inference procedures to test the dynamic nature of the underlying network. Concerning large-scale networks, we perform composite likelihood estimation with an $\ell_1$ penalty to discover sparse dynamic network structures. We establish the estimation error bound in $\ell_2$ norm and validate the sign consistency in the high-dimensional context. We apply our method to an influenza vaccine data set to model the dynamic gene network that evolves with time. We also investigate a Down syndrome data set to model the dynamic protein network which varies under a factorial experimental design. These applications demonstrate the applicability and effectiveness of the proposed model. The supplemental materials for this article are available online.
△ Less
Submitted 24 February, 2025;
originally announced February 2025.
-
Enhancing Masked Time-Series Modeling via Dropping Patches
Authors:
Tianyu Qiu,
Yi Xie,
Yun Xiong,
Hao Niu,
Xiaofeng Gao
Abstract:
This paper explores how to enhance existing masked time-series modeling by randomly dropping sub-sequence level patches of time series. On this basis, a simple yet effective method named DropPatch is proposed, which has two remarkable advantages: 1) It improves the pre-training efficiency by a square-level advantage; 2) It provides additional advantages for modeling in scenarios such as in-domain,…
▽ More
This paper explores how to enhance existing masked time-series modeling by randomly dropping sub-sequence level patches of time series. On this basis, a simple yet effective method named DropPatch is proposed, which has two remarkable advantages: 1) It improves the pre-training efficiency by a square-level advantage; 2) It provides additional advantages for modeling in scenarios such as in-domain, cross-domain, few-shot learning and cold start. This paper conducts comprehensive experiments to verify the effectiveness of the method and analyze its internal mechanism. Empirically, DropPatch strengthens the attention mechanism, reduces information redundancy and serves as an efficient means of data augmentation. Theoretically, it is proved that DropPatch slows down the rate at which the Transformer representations collapse into the rank-1 linear subspace by randomly dropping patches, thus optimizing the quality of the learned representations
△ Less
Submitted 19 December, 2024;
originally announced December 2024.
-
Preference Discerning with LLM-Enhanced Generative Retrieval
Authors:
Fabian Paischer,
Liu Yang,
Linfeng Liu,
Shuai Shao,
Kaveh Hassani,
Jiacheng Li,
Ricky Chen,
Zhang Gabriel Li,
Xialo Gao,
Wei Shao,
Xue Feng,
Nima Noorshams,
Sem Park,
Bo Long,
Hamid Eghbalzadeh
Abstract:
Sequential recommendation systems aim to provide personalized recommendations for users based on their interaction history. To achieve this, they often incorporate auxiliary information, such as textual descriptions of items and auxiliary tasks, like predicting user preferences and intent. Despite numerous efforts to enhance these models, they still suffer from limited personalization. To address…
▽ More
Sequential recommendation systems aim to provide personalized recommendations for users based on their interaction history. To achieve this, they often incorporate auxiliary information, such as textual descriptions of items and auxiliary tasks, like predicting user preferences and intent. Despite numerous efforts to enhance these models, they still suffer from limited personalization. To address this issue, we propose a new paradigm, which we term preference discerning. In preference dscerning, we explicitly condition a generative sequential recommendation system on user preferences within its context. To this end, we generate user preferences using Large Language Models (LLMs) based on user reviews and item-specific data. To evaluate preference discerning capabilities of sequential recommendation systems, we introduce a novel benchmark that provides a holistic evaluation across various scenarios, including preference steering and sentiment following. We assess current state-of-the-art methods using our benchmark and show that they struggle to accurately discern user preferences. Therefore, we propose a new method named Mender ($\textbf{M}$ultimodal Prefer$\textbf{en}$ce $\textbf{d}$iscern$\textbf{er}$), which improves upon existing methods and achieves state-of-the-art performance on our benchmark. Our results show that Mender can be effectively guided by human preferences even though they have not been observed during training, paving the way toward more personalized sequential recommendation systems. We will open-source the code and benchmarks upon publication.
△ Less
Submitted 11 December, 2024;
originally announced December 2024.
-
An Efficient and Generalizable Symbolic Regression Method for Time Series Analysis
Authors:
Yi Xie,
Tianyu Qiu,
Yun Xiong,
Xiuqi Huang,
Xiaofeng Gao,
Chao Chen
Abstract:
Time series analysis and prediction methods currently excel in quantitative analysis, offering accurate future predictions and diverse statistical indicators, but generally falling short in elucidating the underlying evolution patterns of time series. To gain a more comprehensive understanding and provide insightful explanations, we utilize symbolic regression techniques to derive explicit express…
▽ More
Time series analysis and prediction methods currently excel in quantitative analysis, offering accurate future predictions and diverse statistical indicators, but generally falling short in elucidating the underlying evolution patterns of time series. To gain a more comprehensive understanding and provide insightful explanations, we utilize symbolic regression techniques to derive explicit expressions for the non-linear dynamics in the evolution of time series variables. However, these techniques face challenges in computational efficiency and generalizability across diverse real-world time series data. To overcome these challenges, we propose \textbf{N}eural-\textbf{E}nhanced \textbf{Mo}nte-Carlo \textbf{T}ree \textbf{S}earch (NEMoTS) for time series. NEMoTS leverages the exploration-exploitation balance of Monte-Carlo Tree Search (MCTS), significantly reducing the search space in symbolic regression and improving expression quality. Furthermore, by integrating neural networks with MCTS, NEMoTS not only capitalizes on their superior fitting capabilities to concentrate on more pertinent operations post-search space reduction, but also replaces the complex and time-consuming simulation process, thereby substantially improving computational efficiency and generalizability in time series analysis. NEMoTS offers an efficient and comprehensive approach to time series analysis. Experiments with three real-world datasets demonstrate NEMoTS's significant superiority in performance, efficiency, reliability, and interpretability, making it well-suited for large-scale real-world time series data.
△ Less
Submitted 5 September, 2024;
originally announced September 2024.
-
Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior
Authors:
Shuyu Cheng,
Yibo Miao,
Yinpeng Dong,
Xiao Yang,
Xiao-Shan Gao,
Jun Zhu
Abstract:
This paper studies the challenging black-box adversarial attack that aims to generate adversarial examples against a black-box model by only using output feedback of the model to input queries. Some previous methods improve the query efficiency by incorporating the gradient of a surrogate white-box model into query-based attacks due to the adversarial transferability. However, the localized gradie…
▽ More
This paper studies the challenging black-box adversarial attack that aims to generate adversarial examples against a black-box model by only using output feedback of the model to input queries. Some previous methods improve the query efficiency by incorporating the gradient of a surrogate white-box model into query-based attacks due to the adversarial transferability. However, the localized gradient is not informative enough, making these methods still query-intensive. In this paper, we propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks. As the surrogate model contains rich prior information of the black-box one, P-BO models the attack objective with a Gaussian process whose mean function is initialized as the surrogate model's loss. Our theoretical analysis on the regret bound indicates that the performance of P-BO may be affected by a bad prior. Therefore, we further propose an adaptive integration strategy to automatically adjust a coefficient on the function prior by minimizing the regret bound. Extensive experiments on image classifiers and large vision-language models demonstrate the superiority of the proposed algorithm in reducing queries and improving attack success rates compared with the state-of-the-art black-box attacks. Code is available at https://github.com/yibo-miao/PBO-Attack.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Efficient Availability Attacks against Supervised and Contrastive Learning Simultaneously
Authors:
Yihan Wang,
Yifan Zhu,
Xiao-Shan Gao
Abstract:
Availability attacks can prevent the unauthorized use of private data and commercial datasets by generating imperceptible noise and making unlearnable examples before release. Ideally, the obtained unlearnability prevents algorithms from training usable models. When supervised learning (SL) algorithms have failed, a malicious data collector possibly resorts to contrastive learning (CL) algorithms…
▽ More
Availability attacks can prevent the unauthorized use of private data and commercial datasets by generating imperceptible noise and making unlearnable examples before release. Ideally, the obtained unlearnability prevents algorithms from training usable models. When supervised learning (SL) algorithms have failed, a malicious data collector possibly resorts to contrastive learning (CL) algorithms to bypass the protection. Through evaluation, we have found that most of the existing methods are unable to achieve both supervised and contrastive unlearnability, which poses risks to data protection. Different from recent methods based on contrastive error minimization, we employ contrastive-like data augmentations in supervised error minimization or maximization frameworks to obtain attacks effective for both SL and CL. Our proposed AUE and AAP attacks achieve state-of-the-art worst-case unlearnability across SL and CL algorithms with less computation consumption, showcasing prospects in real-world applications.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
Convergence Analysis for General Probability Flow ODEs of Diffusion Models in Wasserstein Distances
Authors:
Xuefeng Gao,
Lingjiong Zhu
Abstract:
Score-based generative modeling with probability flow ordinary differential equations (ODEs) has achieved remarkable success in a variety of applications. While various fast ODE-based samplers have been proposed in the literature and employed in practice, the theoretical understandings about convergence properties of the probability flow ODE are still quite limited. In this paper, we provide the f…
▽ More
Score-based generative modeling with probability flow ordinary differential equations (ODEs) has achieved remarkable success in a variety of applications. While various fast ODE-based samplers have been proposed in the literature and employed in practice, the theoretical understandings about convergence properties of the probability flow ODE are still quite limited. In this paper, we provide the first non-asymptotic convergence analysis for a general class of probability flow ODE samplers in 2-Wasserstein distance, assuming accurate score estimates and smooth log-concave data distributions. We then consider various examples and establish results on the iteration complexity of the corresponding ODE-based samplers. Our proof technique relies on spelling out explicitly the contraction rate for the continuous-time ODE and analyzing the discretization and score-matching errors using synchronous coupling; the challenge in our analysis mainly arises from the inherent non-autonomy of the probability flow ODE and the specific exponential integrator that we study.
△ Less
Submitted 15 February, 2025; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Game-Theoretic Unlearnable Example Generator
Authors:
Shuang Liu,
Yihan Wang,
Xiao-Shan Gao
Abstract:
Unlearnable example attacks are data poisoning attacks aiming to degrade the clean test accuracy of deep learning by adding imperceptible perturbations to the training samples, which can be formulated as a bi-level optimization problem. However, directly solving this optimization problem is intractable for deep neural networks. In this paper, we investigate unlearnable example attacks from a game-…
▽ More
Unlearnable example attacks are data poisoning attacks aiming to degrade the clean test accuracy of deep learning by adding imperceptible perturbations to the training samples, which can be formulated as a bi-level optimization problem. However, directly solving this optimization problem is intractable for deep neural networks. In this paper, we investigate unlearnable example attacks from a game-theoretic perspective, by formulating the attack as a nonzero sum Stackelberg game. First, the existence of game equilibria is proved under the normal setting and the adversarial training setting. It is shown that the game equilibrium gives the most powerful poison attack in that the victim has the lowest test accuracy among all networks within the same hypothesis space, when certain loss functions are used. Second, we propose a novel attack method, called the Game Unlearnable Example (GUE), which has three main gradients. (1) The poisons are obtained by directly solving the equilibrium of the Stackelberg game with a first-order algorithm. (2) We employ an autoencoder-like generative network model as the poison attacker. (3) A novel payoff function is introduced to evaluate the performance of the poison. Comprehensive experiments demonstrate that GUE can effectively poison the model in various scenarios. Furthermore, the GUE still works by using a relatively small percentage of the training data to train the generator, and the poison generator can generalize to unseen data well. Our implementation code can be found at https://github.com/hong-xian/gue.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
Wasserstein Convergence Guarantees for a General Class of Score-Based Generative Models
Authors:
Xuefeng Gao,
Hoang M. Nguyen,
Lingjiong Zhu
Abstract:
Score-based generative models (SGMs) is a recent class of deep generative models with state-of-the-art performance in many applications. In this paper, we establish convergence guarantees for a general class of SGMs in 2-Wasserstein distance, assuming accurate score estimates and smooth log-concave data distribution. We specialize our result to several concrete SGMs with specific choices of forwar…
▽ More
Score-based generative models (SGMs) is a recent class of deep generative models with state-of-the-art performance in many applications. In this paper, we establish convergence guarantees for a general class of SGMs in 2-Wasserstein distance, assuming accurate score estimates and smooth log-concave data distribution. We specialize our result to several concrete SGMs with specific choices of forward processes modelled by stochastic differential equations, and obtain an upper bound on the iteration complexity for each model, which demonstrates the impacts of different choices of the forward processes. We also provide a lower bound when the data distribution is Gaussian. Numerically, we experiment SGMs with different forward processes, some of which are newly proposed in this paper, for unconditional image generation on CIFAR-10. We find that the experimental results are in good agreement with our theoretical predictions on the iteration complexity, and the models with our newly proposed forward processes can outperform existing models.
△ Less
Submitted 15 February, 2025; v1 submitted 18 November, 2023;
originally announced November 2023.
-
Structured Neural Networks for Density Estimation and Causal Inference
Authors:
Asic Q. Chen,
Ruian Shi,
Xiang Gao,
Ricardo Baptista,
Rahul G. Krishnan
Abstract:
Injecting structure into neural networks enables learning functions that satisfy invariances with respect to subsets of inputs. For instance, when learning generative models using neural networks, it is advantageous to encode the conditional independence structure of observed variables, often in the form of Bayesian networks. We propose the Structured Neural Network (StrNN), which injects structur…
▽ More
Injecting structure into neural networks enables learning functions that satisfy invariances with respect to subsets of inputs. For instance, when learning generative models using neural networks, it is advantageous to encode the conditional independence structure of observed variables, often in the form of Bayesian networks. We propose the Structured Neural Network (StrNN), which injects structure through masking pathways in a neural network. The masks are designed via a novel relationship we explore between neural network architectures and binary matrix factorization, to ensure that the desired independencies are respected. We devise and study practical algorithms for this otherwise NP-hard design problem based on novel objectives that control the model architecture. We demonstrate the utility of StrNN in three applications: (1) binary and Gaussian density estimation with StrNN, (2) real-valued density estimation with Structured Autoregressive Flows (StrAFs) and Structured Continuous Normalizing Flows (StrCNF), and (3) interventional and counterfactual analysis with StrAFs for causal inference. Our work opens up new avenues for learning neural networks that enable data-efficient generative modeling and the use of normalizing flows for causal effect estimation.
△ Less
Submitted 3 November, 2023;
originally announced November 2023.
-
A Note on Location Parameter Estimation using the Weighted Hodges-Lehmann Estimator
Authors:
Xuehong Gao,
Zhijin Chen,
Bosung Kim,
Chanseok Park
Abstract:
Robust design is one of the main tools employed by engineers for the facilitation of the design of high-quality processes. However, most real-world processes invariably contend with external uncontrollable factors, often denoted as outliers or contaminated data, which exert a substantial distorting effect upon the computed sample mean. In pursuit of mitigating the inherent bias entailed by outlier…
▽ More
Robust design is one of the main tools employed by engineers for the facilitation of the design of high-quality processes. However, most real-world processes invariably contend with external uncontrollable factors, often denoted as outliers or contaminated data, which exert a substantial distorting effect upon the computed sample mean. In pursuit of mitigating the inherent bias entailed by outliers within the dataset, the concept of weight adjustment emerges as a prudent recourse, to make the sample more representative of the statistical population. In this sense, the intricate challenge lies in the judicious application of these diverse weights toward the estimation of an alternative to the robust location estimator. Different from the previous studies, this study proposes two categories of new weighted Hodges-Lehmann (WHL) estimators that incorporate weight factors in the location parameter estimation. To evaluate their robust performances in estimating the location parameter, this study constructs a set of comprehensive simulations to compare various location estimators including mean, weighted mean, weighted median, Hodges-Lehmann estimator, and the proposed WHL estimators. The findings unequivocally manifest that the proposed WHL estimators clearly outperform the traditional methods in terms of their breakdown points, biases, and relative efficiencies.
△ Less
Submitted 11 September, 2023;
originally announced September 2023.
-
Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer's Disease Progression via Counterfactual Inference
Authors:
Haixing Dai,
Mengxuan Hu,
Qing Li,
Lu Zhang,
Lin Zhao,
Dajiang Zhu,
Ibai Diez,
Jorge Sepulcre,
Fan Zhang,
Xingyu Gao,
Manhua Liu,
Quanzheng Li,
Sheng Li,
Tianming Liu,
Xiang Li
Abstract:
Alzheimer's disease (AD) is a neurodegenerative disorder that is beginning with amyloidosis, followed by neuronal loss and deterioration in structure, function, and cognition. The accumulation of amyloid-beta in the brain, measured through 18F-florbetapir (AV45) positron emission tomography (PET) imaging, has been widely used for early diagnosis of AD. However, the relationship between amyloid-bet…
▽ More
Alzheimer's disease (AD) is a neurodegenerative disorder that is beginning with amyloidosis, followed by neuronal loss and deterioration in structure, function, and cognition. The accumulation of amyloid-beta in the brain, measured through 18F-florbetapir (AV45) positron emission tomography (PET) imaging, has been widely used for early diagnosis of AD. However, the relationship between amyloid-beta accumulation and AD pathophysiology remains unclear, and causal inference approaches are needed to uncover how amyloid-beta levels can impact AD development. In this paper, we propose a graph varying coefficient neural network (GVCNet) for estimating the individual treatment effect with continuous treatment levels using a graph convolutional neural network. We highlight the potential of causal inference approaches, including GVCNet, for measuring the regional causal connections between amyloid-beta accumulation and AD pathophysiology, which may serve as a robust tool for early diagnosis and tailored care.
△ Less
Submitted 3 July, 2023;
originally announced July 2023.
-
Uncertainty Quantification via Spatial-Temporal Tweedie Model for Zero-inflated and Long-tail Travel Demand Prediction
Authors:
Xinke Jiang,
Dingyi Zhuang,
Xianghui Zhang,
Hao Chen,
Jiayuan Luo,
Xiaowei Gao
Abstract:
Understanding Origin-Destination (O-D) travel demand is crucial for transportation management. However, traditional spatial-temporal deep learning models grapple with addressing the sparse and long-tail characteristics in high-resolution O-D matrices and quantifying prediction uncertainty. This dilemma arises from the numerous zeros and over-dispersed demand patterns within these matrices, which c…
▽ More
Understanding Origin-Destination (O-D) travel demand is crucial for transportation management. However, traditional spatial-temporal deep learning models grapple with addressing the sparse and long-tail characteristics in high-resolution O-D matrices and quantifying prediction uncertainty. This dilemma arises from the numerous zeros and over-dispersed demand patterns within these matrices, which challenge the Gaussian assumption inherent to deterministic deep learning models. To address these challenges, we propose a novel approach: the Spatial-Temporal Tweedie Graph Neural Network (STTD). The STTD introduces the Tweedie distribution as a compelling alternative to the traditional 'zero-inflated' model and leverages spatial and temporal embeddings to parameterize travel demand distributions. Our evaluations using real-world datasets highlight STTD's superiority in providing accurate predictions and precise confidence intervals, particularly in high-resolution scenarios.
△ Less
Submitted 30 January, 2024; v1 submitted 16 June, 2023;
originally announced June 2023.
-
Functional Causal Inference with Time-to-Event Data
Authors:
Xiyuan Gao,
Jiayi Wang,
Guanyu Hu,
Jianguo Sun
Abstract:
Functional data is a powerful tool for capturing and analyzing complex patterns and relationships in a variety of fields, allowing for more precise modeling, visualization, and decision-making. For example, in healthcare, functional data such as medical images can help doctors make more accurate diagnoses and develop more effective treatment plans. However, understanding the causal relationships b…
▽ More
Functional data is a powerful tool for capturing and analyzing complex patterns and relationships in a variety of fields, allowing for more precise modeling, visualization, and decision-making. For example, in healthcare, functional data such as medical images can help doctors make more accurate diagnoses and develop more effective treatment plans. However, understanding the causal relationships between functional predictors and time-to-event outcomes remains a challenge. To address this, we propose a functional causal framework including a functional accelerated failure time (FAFT) model and three causal approaches. The regression adjustment approach is based on conditional FAFT with subsequent confounding marginalization, while the functional-inverse-probability-weighting approach is based on marginal FAFT with well-defined functional propensity scores. The double robust approach combines the strengths of both methods and achieves a balance condition through the weighted residuals between imputed observations and regression adjustment outcomes. Our approach can accurately estimate causality, predict outcomes, and is robust to different censoring rates. We demonstrate the power of our framework with simulations and real-world data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. Our findings provide more precise subregions of the hippocampus that align with medical research, highlighting the power of this work for improving healthcare outcomes.
△ Less
Submitted 24 April, 2023;
originally announced April 2023.
-
Isometric 3D Adversarial Examples in the Physical World
Authors:
Yibo Miao,
Yinpeng Dong,
Jun Zhu,
Xiao-Shan Gao
Abstract:
3D deep learning models are shown to be as vulnerable to adversarial examples as 2D models. However, existing attack methods are still far from stealthy and suffer from severe performance degradation in the physical world. Although 3D data is highly structured, it is difficult to bound the perturbations with simple metrics in the Euclidean space. In this paper, we propose a novel $ε$-isometric (…
▽ More
3D deep learning models are shown to be as vulnerable to adversarial examples as 2D models. However, existing attack methods are still far from stealthy and suffer from severe performance degradation in the physical world. Although 3D data is highly structured, it is difficult to bound the perturbations with simple metrics in the Euclidean space. In this paper, we propose a novel $ε$-isometric ($ε$-ISO) attack to generate natural and robust 3D adversarial examples in the physical world by considering the geometric properties of 3D objects and the invariance to physical transformations. For naturalness, we constrain the adversarial example to be $ε$-isometric to the original one by adopting the Gaussian curvature as a surrogate metric guaranteed by a theoretical analysis. For invariance to physical transformations, we propose a maxima over transformation (MaxOT) method that actively searches for the most harmful transformations rather than random ones to make the generated adversarial example more robust in the physical world. Experiments on typical point cloud recognition models validate that our approach can significantly improve the attack success rate and naturalness of the generated 3D adversarial examples than the state-of-the-art attack methods.
△ Less
Submitted 27 October, 2022;
originally announced October 2022.
-
Many-body localized hidden generative models
Authors:
Weishun Zhong,
Xun Gao,
Susanne F. Yelin,
Khadijeh Najafi
Abstract:
Born machines are quantum-inspired generative models that leverage the probabilistic nature of quantum states. Here, we present a new architecture called many-body localized (MBL) hidden Born machine that utilizes both MBL dynamics and hidden units as learning resources. We show that the hidden units act as an effective thermal bath that enhances the trainability of the system, while the MBL dynam…
▽ More
Born machines are quantum-inspired generative models that leverage the probabilistic nature of quantum states. Here, we present a new architecture called many-body localized (MBL) hidden Born machine that utilizes both MBL dynamics and hidden units as learning resources. We show that the hidden units act as an effective thermal bath that enhances the trainability of the system, while the MBL dynamics stabilize the training trajectories. We numerically demonstrate that the MBL hidden Born machine is capable of learning a variety of tasks, including a toy version of MNIST handwritten digits, quantum data obtained from quantum many-body states, and non-local parity data. Our architecture and algorithm provide novel strategies of utilizing quantum many-body systems as learning resources, and reveal a powerful connection between disorder, interaction, and learning in quantum many-body systems.
△ Less
Submitted 28 December, 2023; v1 submitted 5 July, 2022;
originally announced July 2022.
-
Logarithmic regret bounds for continuous-time average-reward Markov decision processes
Authors:
Xuefeng Gao,
Xun Yu Zhou
Abstract:
We consider reinforcement learning for continuous-time Markov decision processes (MDPs) in the infinite-horizon, average-reward setting. In contrast to discrete-time MDPs, a continuous-time process moves to a state and stays there for a random holding time after an action is taken. With unknown transition probabilities and rates of exponential holding times, we derive instance-dependent regret low…
▽ More
We consider reinforcement learning for continuous-time Markov decision processes (MDPs) in the infinite-horizon, average-reward setting. In contrast to discrete-time MDPs, a continuous-time process moves to a state and stays there for a random holding time after an action is taken. With unknown transition probabilities and rates of exponential holding times, we derive instance-dependent regret lower bounds that are logarithmic in the time horizon. Moreover, we design a learning algorithm and establish a finite-time regret bound that achieves the logarithmic growth rate. Our analysis builds upon upper confidence reinforcement learning, a delicate estimation of the mean holding times, and stochastic comparison of point processes.
△ Less
Submitted 2 July, 2024; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Robust and Information-theoretically Safe Bias Classifier against Adversarial Attacks
Authors:
Lijia Yu,
Xiao-Shan Gao
Abstract:
In this paper, the bias classifier is introduced, that is, the bias part of a DNN with Relu as the activation function is used as a classifier. The work is motivated by the fact that the bias part is a piecewise constant function with zero gradient and hence cannot be directly attacked by gradient-based methods to generate adversaries, such as FGSM. The existence of the bias classifier is proved a…
▽ More
In this paper, the bias classifier is introduced, that is, the bias part of a DNN with Relu as the activation function is used as a classifier. The work is motivated by the fact that the bias part is a piecewise constant function with zero gradient and hence cannot be directly attacked by gradient-based methods to generate adversaries, such as FGSM. The existence of the bias classifier is proved and an effective training method for the bias classifier is given. It is proved that by adding a proper random first-degree part to the bias classifier, an information-theoretically safe classifier against the original-model gradient attack is obtained in the sense that the attack will generate a totally random attacking direction. This seems to be the first time that the concept of information-theoretically safe classifier is proposed. Several attack methods for the bias classifier are proposed and numerical experiments are used to show that the bias classifier is more robust than DNNs with similar size against these attacks in most cases.
△ Less
Submitted 14 February, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
A Robust Classification-autoencoder to Defend Outliers and Adversaries
Authors:
Lijia Yu,
Xiao-Shan Gao
Abstract:
In this paper, a robust classification-autoencoder (CAE) is proposed, which has strong ability to recognize outliers and defend adversaries. The main idea is to change the autoencoder from an unsupervised learning model into a classifier, where the encoder is used to compress samples with different labels into disjoint compression spaces and the decoder is used to recover samples from their compre…
▽ More
In this paper, a robust classification-autoencoder (CAE) is proposed, which has strong ability to recognize outliers and defend adversaries. The main idea is to change the autoencoder from an unsupervised learning model into a classifier, where the encoder is used to compress samples with different labels into disjoint compression spaces and the decoder is used to recover samples from their compression spaces. The encoder is used both as a compressed feature learner and as a classifier, and the decoder is used to decide whether the classification given by the encoder is correct by comparing the input sample with the output. Since adversary samples are seemingly inevitable for the current DNN framework, the list classifier to defend adversaries is introduced based on CAE, which outputs several labels and the corresponding samples recovered by the CAE. Extensive experimental results are used to show that the CAE achieves state of the art to recognize outliers by finding almost all outliers; the list classifier gives near lossless classification in the sense that the output list contains the correct label for almost all adversaries and the size of the output list is reasonably small.
△ Less
Submitted 7 June, 2022; v1 submitted 30 June, 2021;
originally announced June 2021.
-
Message Passing in Graph Convolution Networks via Adaptive Filter Banks
Authors:
Xing Gao,
Wenrui Dai,
Chenglin Li,
Junni Zou,
Hongkai Xiong,
Pascal Frossard
Abstract:
Graph convolution networks, like message passing graph convolution networks (MPGCNs), have been a powerful tool in representation learning of networked data. However, when data is heterogeneous, most architectures are limited as they employ a single strategy to handle multi-channel graph signals and they typically focus on low-frequency information. In this paper, we present a novel graph convolut…
▽ More
Graph convolution networks, like message passing graph convolution networks (MPGCNs), have been a powerful tool in representation learning of networked data. However, when data is heterogeneous, most architectures are limited as they employ a single strategy to handle multi-channel graph signals and they typically focus on low-frequency information. In this paper, we present a novel graph convolution operator, termed BankGCN, which keeps benefits of message passing models, but extends their capabilities beyond `low-pass' features. It decomposes multi-channel signals on graphs into subspaces and handles particular information in each subspace with an adapted filter. The filters of all subspaces have different frequency responses and together form a filter bank. Furthermore, each filter in the spectral domain corresponds to a message passing scheme, and diverse schemes are implemented via the filter bank. Importantly, the filter bank and the signal decomposition are jointly learned to adapt to the spectral characteristics of data and to target applications. Furthermore, this is implemented almost without extra parameters in comparison with most existing MPGCNs. Experimental results show that the proposed convolution operator permits to achieve excellent performance in graph classification on a collection of benchmark graph datasets.
△ Less
Submitted 18 June, 2021;
originally announced June 2021.
-
Estimation and Selection Properties of the LAD Fused Lasso Signal Approximator
Authors:
Xiaoli Gao
Abstract:
The fused lasso is an important method for signal processing when the hidden signals are sparse and blocky. It is often used in combination with the squared loss function. However, the squared loss is not suitable for heavy tail error distributions nor is robust against outliers which arise often in practice. The least absolute deviations (LAD) loss provides a robust alternative to the squared los…
▽ More
The fused lasso is an important method for signal processing when the hidden signals are sparse and blocky. It is often used in combination with the squared loss function. However, the squared loss is not suitable for heavy tail error distributions nor is robust against outliers which arise often in practice. The least absolute deviations (LAD) loss provides a robust alternative to the squared loss. In this paper, we study the asymptotic properties of the fused lasso estimator with the LAD loss for signal approximation. We refer to this estimator as the LAD fused lasso signal approximator, or LAD-FLSA. We investigate the estimation consistency properties of the LAD-FLSA and provide sufficient conditions under which the LAD-FLSA is sign consistent. We also construct an unbiased estimator for the degrees of freedom of the LAD-FLSA for any given tuning parameters. Both simulation studies and real data analysis are conducted to illustrate the performance of the LAD-FLSA and the effect of the unbiased estimator of the degrees of freedom.
△ Less
Submitted 30 April, 2021;
originally announced May 2021.
-
Joint Linear Trend Recovery Using L1 Regularization
Authors:
Xiaoli Gao,
Ejaz Ahmed
Abstract:
This paper studies the recovery of a joint piece-wise linear trend from a time series using L1 regularization approach, called L1 trend filtering (Kim, Koh and Boyd, 2009). We provide some sufficient conditions under which a L1 trend filter can be well-behaved in terms of mean estimation and change point detection. The result is two-fold: for the mean estimation, an almost optimal consistent rate…
▽ More
This paper studies the recovery of a joint piece-wise linear trend from a time series using L1 regularization approach, called L1 trend filtering (Kim, Koh and Boyd, 2009). We provide some sufficient conditions under which a L1 trend filter can be well-behaved in terms of mean estimation and change point detection. The result is two-fold: for the mean estimation, an almost optimal consistent rate is obtained; for the change point detection, the slope change in direction can be recovered in a high probability. In addition, we show that the weak irrepresentable condition, a necessary condition for LASSO model to be sign consistent (Zhao and Yu, 2006), is not necessary for the consistent change point detection. The performance of the L1 trend filter is evaluated by some finite sample simulations studies.
△ Less
Submitted 30 April, 2021;
originally announced April 2021.
-
Enhancing Generative Models via Quantum Correlations
Authors:
Xun Gao,
Eric R. Anschuetz,
Sheng-Tao Wang,
J. Ignacio Cirac,
Mikhail D. Lukin
Abstract:
Generative modeling using samples drawn from the probability distribution constitutes a powerful approach for unsupervised machine learning. Quantum mechanical systems can produce probability distributions that exhibit quantum correlations which are difficult to capture using classical models. We show theoretically that such quantum correlations provide a powerful resource for generative modeling.…
▽ More
Generative modeling using samples drawn from the probability distribution constitutes a powerful approach for unsupervised machine learning. Quantum mechanical systems can produce probability distributions that exhibit quantum correlations which are difficult to capture using classical models. We show theoretically that such quantum correlations provide a powerful resource for generative modeling. In particular, we provide an unconditional proof of separation in expressive power between a class of widely-used generative models, known as Bayesian networks, and its minimal quantum extension. We show that this expressivity advantage is associated with quantum nonlocality and quantum contextuality. Furthermore, we numerically test this separation on standard machine learning data sets and show that it holds for practical problems. The possibility of quantum advantage demonstrated in this work not only sheds light on the design of useful quantum machine learning protocols but also provides inspiration to draw on ideas from quantum foundations to improve purely classical algorithms.
△ Less
Submitted 20 January, 2021;
originally announced January 2021.
-
Addressing patient heterogeneity in disease predictive model development
Authors:
Xu Gao,
Weining Shen,
Jing Ning,
Ziding Feng,
Jianhua Hu
Abstract:
This paper addresses patient heterogeneity associated with prediction problems in biomedical applications. We propose a systematic hypothesis testing approach to determine the existence of patient subgroup structure and the number of subgroups in patient population if subgroups exist. A mixture of generalized linear models is considered to model the relationship between the disease outcome and pat…
▽ More
This paper addresses patient heterogeneity associated with prediction problems in biomedical applications. We propose a systematic hypothesis testing approach to determine the existence of patient subgroup structure and the number of subgroups in patient population if subgroups exist. A mixture of generalized linear models is considered to model the relationship between the disease outcome and patient characteristics and clinical factors, including targeted biomarker profiles. We construct a test statistic based on expectation maximization (EM) algorithm and derive its asymptotic distribution under the null hypothesis. An important computational advantage of the test is that the involved parameter estimates under the complex alternative hypothesis can be obtained through a small number of EM iterations, rather than optimizing the objective function. We demonstrate the finite sample performance of the proposed test in terms of type-I error rate and power, using extensive simulation studies. The applicability of the proposed method is illustrated through an application to a multi-center prostate cancer study.
△ Less
Submitted 7 January, 2021;
originally announced January 2021.
-
Improve the Robustness and Accuracy of Deep Neural Network with $L_{2,\infty}$ Normalization
Authors:
Lijia Yu,
Xiao-Shan Gao
Abstract:
In this paper, the robustness and accuracy of the deep neural network (DNN) was enhanced by introducing the $L_{2,\infty}$ normalization of the weight matrices of the DNN with Relu as the activation function. It is proved that the $L_{2,\infty}$ normalization leads to large dihedral angles between two adjacent faces of the polyhedron graph of the DNN function and hence smoother DNN functions, whic…
▽ More
In this paper, the robustness and accuracy of the deep neural network (DNN) was enhanced by introducing the $L_{2,\infty}$ normalization of the weight matrices of the DNN with Relu as the activation function. It is proved that the $L_{2,\infty}$ normalization leads to large dihedral angles between two adjacent faces of the polyhedron graph of the DNN function and hence smoother DNN functions, which reduces over-fitting. A measure is proposed for the robustness of a classification DNN, which is the average radius of the maximal robust spheres with the sample data as centers. A lower bound for the robustness measure is given in terms of the $L_{2,\infty}$ norm. Finally, an upper bound for the Rademacher complexity of DNN with $L_{2,\infty}$ normalization is given. An algorithm is given to train a DNN with the $L_{2,\infty}$ normalization and experimental results are used to show that the $L_{2,\infty}$ normalization is effective to improve the robustness and accuracy.
△ Less
Submitted 10 October, 2020;
originally announced October 2020.
-
A Spherical Hidden Markov Model for Semantics-Rich Human Mobility Modeling
Authors:
Wanzheng Zhu,
Chao Zhang,
Shuochao Yao,
Xiaobin Gao,
Jiawei Han
Abstract:
We study the problem of modeling human mobility from semantic trace data, wherein each GPS record in a trace is associated with a text message that describes the user's activity. Existing methods fall short in unveiling human movement regularities, because they either do not model the text data at all or suffer from text sparsity severely. We propose SHMM, a multi-modal spherical hidden Markov mod…
▽ More
We study the problem of modeling human mobility from semantic trace data, wherein each GPS record in a trace is associated with a text message that describes the user's activity. Existing methods fall short in unveiling human movement regularities, because they either do not model the text data at all or suffer from text sparsity severely. We propose SHMM, a multi-modal spherical hidden Markov model for semantics-rich human mobility modeling. Under the hidden Markov assumption, SHMM models the generation process of a given trace by jointly considering the observed location, time, and text at each step of the trace. The distinguishing characteristic of SHMM is the text modeling part. We use fixed-size vector representations to encode the semantics of the text messages, and model the generation of the l2-normalized text embeddings on a unit sphere with the von Mises-Fisher (vMF) distribution. Compared with other alternatives like multi-variate Gaussian, our choice of the vMF distribution not only incurs much fewer parameters, but also better leverages the discriminative power of text embeddings in a directional metric space. The parameter inference for the vMF distribution is non-trivial since it involves functional inversion of ratios of Bessel functions. We theoretically prove that: 1) the classical Expectation-Maximization algorithm can work with vMF distributions; and 2) while closed-form solutions are hard to be obtained for the M-step, Newton's method is guaranteed to converge to the optimal solution with quadratic convergence rate. We have performed extensive experiments on both synthetic and real-life data. The results on synthetic data verify our theoretical analysis; while the results on real-life data demonstrate that SHMM learns meaningful semantics-rich mobility models, outperforms state-of-the-art mobility models for next location prediction, and incurs lower training cost.
△ Less
Submitted 5 October, 2020;
originally announced October 2020.
-
Decomposition of the Total Effect for Two Mediators: A Natural Counterfactual Interaction Effect Framework
Authors:
Xin Gao,
Li Li,
Li Luo
Abstract:
Mediation analysis has been used in many disciplines to explain the mechanism or process that underlies an observed relationship between an exposure variable and an outcome variable via the inclusion of mediators. Decompositions of the total causal effect of an exposure variable into effects characterizing mediation pathways and interactions have gained an increasing amount of interest in the last…
▽ More
Mediation analysis has been used in many disciplines to explain the mechanism or process that underlies an observed relationship between an exposure variable and an outcome variable via the inclusion of mediators. Decompositions of the total causal effect of an exposure variable into effects characterizing mediation pathways and interactions have gained an increasing amount of interest in the last decade. In this work, we develop decompositions for scenarios where the two mediators are causally sequential or non-sequential. Current developments in this area have primarily focused on either decompositions without interaction components or with interactions but assuming no causally sequential order between the mediators. We propose a new concept called natural counterfactual interaction effect that captures the two-way and three-way interactions for both scenarios that extend the two-way mediated interactions in literature. We develop a unified approach for decomposing the total effect into the effects that are due to mediation only, interaction only, both mediation and interaction, neither mediation nor interaction within the counterfactual framework. Finally, we illustrate the proposed decomposition method using a real data analysis where the two mediators are causally sequential.
△ Less
Submitted 30 July, 2020;
originally announced July 2020.
-
Decentralized Stochastic Gradient Langevin Dynamics and Hamiltonian Monte Carlo
Authors:
Mert Gürbüzbalaban,
Xuefeng Gao,
Yuanhan Hu,
Lingjiong Zhu
Abstract:
Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov Chain Monte Carlo (MCMC) algorithms for Bayesian inference that can scale to large datasets, allowing to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters. However, these a…
▽ More
Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov Chain Monte Carlo (MCMC) algorithms for Bayesian inference that can scale to large datasets, allowing to sample from the posterior distribution of the parameters of a statistical model given the input data and the prior distribution over the model parameters. However, these algorithms do not apply to the decentralized learning setting, when a network of agents are working collaboratively to learn the parameters of a statistical model without sharing their individual data due to privacy reasons or communication constraints. We study two algorithms: Decentralized SGLD (DE-SGLD) and Decentralized SGHMC (DE-SGHMC) which are adaptations of SGLD and SGHMC methods that allow scaleable Bayesian inference in the decentralized setting for large datasets. We show that when the posterior distribution is strongly log-concave and smooth, the iterates of these algorithms converge linearly to a neighborhood of the target distribution in the 2-Wasserstein distance if their parameters are selected appropriately. We illustrate the efficiency of our algorithms on decentralized Bayesian linear regression and Bayesian logistic regression problems.
△ Less
Submitted 26 August, 2021; v1 submitted 1 July, 2020;
originally announced July 2020.
-
Graph Pooling with Node Proximity for Hierarchical Representation Learning
Authors:
Xing Gao,
Wenrui Dai,
Chenglin Li,
Hongkai Xiong,
Pascal Frossard
Abstract:
Graph neural networks have attracted wide attentions to enable representation learning of graph data in recent works. In complement to graph convolution operators, graph pooling is crucial for extracting hierarchical representation of graph data. However, most recent graph pooling methods still fail to efficiently exploit the geometry of graph data. In this paper, we propose a novel graph pooling…
▽ More
Graph neural networks have attracted wide attentions to enable representation learning of graph data in recent works. In complement to graph convolution operators, graph pooling is crucial for extracting hierarchical representation of graph data. However, most recent graph pooling methods still fail to efficiently exploit the geometry of graph data. In this paper, we propose a novel graph pooling strategy that leverages node proximity to improve the hierarchical representation learning of graph data with their multi-hop topology. Node proximity is obtained by harmonizing the kernel representation of topology information and node features. Implicit structure-aware kernel representation of topology information allows efficient graph pooling without explicit eigendecomposition of the graph Laplacian. Similarities of node signals are adaptively evaluated with the combination of the affine transformation and kernel trick using the Gaussian RBF function. Experimental results demonstrate that the proposed graph pooling strategy is able to achieve state-of-the-art performance on a collection of public graph classification benchmark datasets.
△ Less
Submitted 19 June, 2020;
originally announced June 2020.
-
OpEvo: An Evolutionary Method for Tensor Operator Optimization
Authors:
Xiaotian Gao,
Cui Wei,
Lintao Zhang,
Mao Yang
Abstract:
Training and inference efficiency of deep neural networks highly rely on the performance of tensor operators on hardware platforms. Manually optimizing tensor operators has limitations in terms of supporting new operators or hardware platforms. Therefore, automatically optimizing device code configurations of tensor operators is getting increasingly attractive. However, current methods for tensor…
▽ More
Training and inference efficiency of deep neural networks highly rely on the performance of tensor operators on hardware platforms. Manually optimizing tensor operators has limitations in terms of supporting new operators or hardware platforms. Therefore, automatically optimizing device code configurations of tensor operators is getting increasingly attractive. However, current methods for tensor operator optimization usually suffer from poor sample-efficiency due to the combinatorial search space. In this work, we propose a novel evolutionary method, OpEvo, which efficiently explores the search spaces of tensor operators by introducing a topology-aware mutation operation based on q-random walk to leverage the topological structures over the search spaces. Our comprehensive experiment results show that compared with state-of-the-art (SOTA) methods OpEvo can find the best configuration with the lowest variance and least efforts in the number of trials and wall-clock time. All code of this work is available online.
△ Less
Submitted 21 December, 2020; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Learning to Stop While Learning to Predict
Authors:
Xinshi Chen,
Hanjun Dai,
Yu Li,
Xin Gao,
Le Song
Abstract:
There is a recent surge of interest in designing deep architectures based on the update steps in traditional algorithms, or learning neural networks to improve and replace traditional algorithms. While traditional algorithms have certain stopping criteria for outputting results at different iterations, many algorithm-inspired deep models are restricted to a ``fixed-depth'' for all inputs. Similar…
▽ More
There is a recent surge of interest in designing deep architectures based on the update steps in traditional algorithms, or learning neural networks to improve and replace traditional algorithms. While traditional algorithms have certain stopping criteria for outputting results at different iterations, many algorithm-inspired deep models are restricted to a ``fixed-depth'' for all inputs. Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances, either to avoid ``over-thinking'', or because we want to compute less for operations converged already. In this paper, we tackle this varying depth problem using a steerable architecture, where a feed-forward deep model and a variational stopping policy are learned together to sequentially determine the optimal number of layers for each input instance. Training such architecture is very challenging. We provide a variational Bayes perspective and design a novel and effective training procedure which decomposes the task into an oracle model learning stage and an imitation stage. Experimentally, we show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks, including learning sparse recovery, few-shot meta learning, and computer vision tasks.
△ Less
Submitted 9 June, 2020;
originally announced June 2020.
-
A Framework for Behavioral Biometric Authentication using Deep Metric Learning on Mobile Devices
Authors:
Cong Wang,
Yanru Xiao,
Xing Gao,
Li Li,
Jun Wang
Abstract:
Mobile authentication using behavioral biometrics has been an active area of research. Existing research relies on building machine learning classifiers to recognize an individual's unique patterns. However, these classifiers are not powerful enough to learn the discriminative features. When implemented on the mobile devices, they face new challenges from the behavioral dynamics, data privacy and…
▽ More
Mobile authentication using behavioral biometrics has been an active area of research. Existing research relies on building machine learning classifiers to recognize an individual's unique patterns. However, these classifiers are not powerful enough to learn the discriminative features. When implemented on the mobile devices, they face new challenges from the behavioral dynamics, data privacy and side-channel leaks. To address these challenges, we present a new framework to incorporate training on battery-powered mobile devices, so private data never leaves the device and training can be flexibly scheduled to adapt the behavioral patterns at runtime. We re-formulate the classification problem into deep metric learning to improve the discriminative power and design an effective countermeasure to thwart side-channel leaks by embedding a noise signature in the sensing signals without sacrificing too much usability. The experiments demonstrate authentication accuracy over 95% on three public datasets, a sheer 15% gain from multi-class classification with less data and robustness against brute-force and side-channel attacks with 99% and 90% success, respectively. We show the feasibility of training with mobile CPUs, where training 100 epochs takes less than 10 mins and can be boosted 3-5 times with feature transfer. Finally, we profile memory, energy and computational overhead. Our results indicate that training consumes lower energy than watching videos and slightly higher energy than playing games.
△ Less
Submitted 17 August, 2020; v1 submitted 26 May, 2020;
originally announced May 2020.
-
Decomposition of Total Effect with the Notion of Natural Counterfactual Interaction Effect
Authors:
Xin Gao,
Li Li,
Li Luo
Abstract:
Mediation analysis serves as a crucial tool to obtain causal inference based on directed acyclic graphs, which has been widely employed in the areas of biomedical science, social science, epidemiology and psychology. Decomposition of total effect provides a deep insight to fully understand the casual contribution from each path and interaction term. Since the four-way decomposition method was prop…
▽ More
Mediation analysis serves as a crucial tool to obtain causal inference based on directed acyclic graphs, which has been widely employed in the areas of biomedical science, social science, epidemiology and psychology. Decomposition of total effect provides a deep insight to fully understand the casual contribution from each path and interaction term. Since the four-way decomposition method was proposed to identify the mediated interaction effect in counterfactual framework, the idea had been extended to a more sophisticated scenario with non-sequential multiple mediators. However, the method exhibits limitations as the causal structure contains direct causal edges between mediators, such as inappropriate modeling of dependence and non-identifiability. We develop the notion of natural counterfactual interaction effect and find that the decomposition of total effect can be consistently realized with our proposed notion. Furthermore, natural counterfactual interaction effect overcomes the drawbacks and possesses a clear and significant interpretation, which may largely improve the capacity of researchers to analyze highly complex causal structures.
△ Less
Submitted 13 April, 2020;
originally announced April 2020.
-
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
Authors:
Chunyuan Li,
Xiang Gao,
Yuan Li,
Baolin Peng,
Xiujun Li,
Yizhe Zhang,
Jianfeng Gao
Abstract:
When trained effectively, the Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language. In this paper, we propose the first large-scale language VAE model, Optimus. A universal latent embedding space for sentences is first pre-trained on large text corpus, and then fine-tuned for various language generation and un…
▽ More
When trained effectively, the Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language. In this paper, we propose the first large-scale language VAE model, Optimus. A universal latent embedding space for sentences is first pre-trained on large text corpus, and then fine-tuned for various language generation and understanding tasks. Compared with GPT-2, Optimus enables guided language generation from an abstract level using the latent vectors. Compared with BERT, Optimus can generalize better on low-resource language understanding tasks due to the smooth latent space structure. Extensive experimental results on a wide range of language tasks demonstrate the effectiveness of Optimus. It achieves new state-of-the-art on VAE language modeling benchmarks. We hope that our first pre-trained big VAE language model itself and results can help the NLP community renew the interests of deep generative models in the era of large-scale pre-training, and make these principled methods more practical.
△ Less
Submitted 11 October, 2020; v1 submitted 5 April, 2020;
originally announced April 2020.
-
Non-Convex Optimization via Non-Reversible Stochastic Gradient Langevin Dynamics
Authors:
Yuanhan Hu,
Xiaoyu Wang,
Xuefeng Gao,
Mert Gurbuzbalaban,
Lingjiong Zhu
Abstract:
Stochastic Gradient Langevin Dynamics (SGLD) is a powerful algorithm for optimizing a non-convex objective, where a controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates towards a global minimum. SGLD is based on the overdamped Langevin diffusion which is reversible in time. By adding an anti-symmetric matrix to the drift term of the overdamped La…
▽ More
Stochastic Gradient Langevin Dynamics (SGLD) is a powerful algorithm for optimizing a non-convex objective, where a controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates towards a global minimum. SGLD is based on the overdamped Langevin diffusion which is reversible in time. By adding an anti-symmetric matrix to the drift term of the overdamped Langevin diffusion, one gets a non-reversible diffusion that converges to the same stationary distribution with a faster convergence rate. In this paper, we study the non reversible Stochastic Gradient Langevin Dynamics (NSGLD) which is based on discretization of the non-reversible Langevin diffusion. We provide finite-time performance bounds for the global convergence of NSGLD for solving stochastic non-convex optimization problems. Our results lead to non-asymptotic guarantees for both population and empirical risk minimization problems. Numerical experiments for Bayesian independent component analysis and neural network models show that NSGLD can outperform SGLD with proper choices of the anti-symmetric matrix.
△ Less
Submitted 2 June, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
-
Bayesian model selection approach for colored graphical Gaussian models
Authors:
Qiong Li,
Xin Gao,
Helene Massam
Abstract:
We consider a class of colored graphical Gaussian models obtained by placing symmetry constraints on the precision matrix in a Bayesian framework. The prior distribution on the precision matrix is the colored $G$-Wishart prior which is the Diaconis-Ylvisaker conjugate prior. In this paper, we develop a computationally efficient model search algorithm which combines linear regression with a double…
▽ More
We consider a class of colored graphical Gaussian models obtained by placing symmetry constraints on the precision matrix in a Bayesian framework. The prior distribution on the precision matrix is the colored $G$-Wishart prior which is the Diaconis-Ylvisaker conjugate prior. In this paper, we develop a computationally efficient model search algorithm which combines linear regression with a double reversible jump Markov chain Monte Carlo (MCMC) method. The latter is to estimate the Bayes factors expressed as the ratio of posterior probabilities of two competing models. We also establish the asymptotic consistency property of the model selection procedure based on the Bayes factors. Our procedure avoids an exhaustive search which is computationally impossible. Our method is illustrated with simulations and a real-world application with a protein signalling data set.
△ Less
Submitted 1 April, 2020;
originally announced April 2020.
-
Probabilistic Dual Network Architecture Search on Graphs
Authors:
Yiren Zhao,
Duo Wang,
Xitong Gao,
Robert Mullins,
Pietro Lio,
Mateja Jamnik
Abstract:
We present the first differentiable Network Architecture Search (NAS) for Graph Neural Networks (GNNs). GNNs show promising performance on a wide range of tasks, but require a large amount of architecture engineering. First, graphs are inherently a non-Euclidean and sophisticated data structure, leading to poor adaptivity of GNN architectures across different datasets. Second, a typical graph bloc…
▽ More
We present the first differentiable Network Architecture Search (NAS) for Graph Neural Networks (GNNs). GNNs show promising performance on a wide range of tasks, but require a large amount of architecture engineering. First, graphs are inherently a non-Euclidean and sophisticated data structure, leading to poor adaptivity of GNN architectures across different datasets. Second, a typical graph block contains numerous different components, such as aggregation and attention, generating a large combinatorial search space. To counter these problems, we propose a Probabilistic Dual Network Architecture Search (PDNAS) framework for GNNs. PDNAS not only optimises the operations within a single graph block (micro-architecture), but also considers how these blocks should be connected to each other (macro-architecture). The dual architecture (micro- and marco-architectures) optimisation allows PDNAS to find deeper GNNs on diverse datasets with better performance compared to other graph NAS methods. Moreover, we use a fully gradient-based search approach to update architectural parameters, making it the first differentiable graph NAS method. PDNAS outperforms existing hand-designed GNNs and NAS results, for example, on the PPI dataset, PDNAS beats its best competitors by 1.67 and 0.17 in F1 scores.
△ Less
Submitted 21 March, 2020;
originally announced March 2020.
-
RNA Secondary Structure Prediction By Learning Unrolled Algorithms
Authors:
Xinshi Chen,
Yu Li,
Ramzan Umarov,
Xin Gao,
Le Song
Abstract:
In this paper, we propose an end-to-end deep learning model, called E2Efold, for RNA secondary structure prediction which can effectively take into account the inherent constraints in the problem. The key idea of E2Efold is to directly predict the RNA base-pairing matrix, and use an unrolled algorithm for constrained programming as the template for deep architectures to enforce constraints. With c…
▽ More
In this paper, we propose an end-to-end deep learning model, called E2Efold, for RNA secondary structure prediction which can effectively take into account the inherent constraints in the problem. The key idea of E2Efold is to directly predict the RNA base-pairing matrix, and use an unrolled algorithm for constrained programming as the template for deep architectures to enforce constraints. With comprehensive experiments on benchmark datasets, we demonstrate the superior performance of E2Efold: it predicts significantly better structures compared to previous SOTA (especially for pseudoknotted structures), while being as efficient as the fastest algorithms in terms of inference time.
△ Less
Submitted 13 February, 2020;
originally announced February 2020.
-
Regime Switching Bandits
Authors:
Xiang Zhou,
Yi Xiong,
Ningyuan Chen,
Xuefeng Gao
Abstract:
We study a multi-armed bandit problem where the rewards exhibit regime switching. Specifically, the distributions of the random rewards generated from all arms are modulated by a common underlying state modeled as a finite-state Markov chain. The agent does not observe the underlying state and has to learn the transition matrix and the reward distributions. We propose a learning algorithm for this…
▽ More
We study a multi-armed bandit problem where the rewards exhibit regime switching. Specifically, the distributions of the random rewards generated from all arms are modulated by a common underlying state modeled as a finite-state Markov chain. The agent does not observe the underlying state and has to learn the transition matrix and the reward distributions. We propose a learning algorithm for this problem, building on spectral method-of-moments estimations for hidden Markov models, belief error control in partially observable Markov decision processes and upper-confidence-bound methods for online learning. We also establish an upper bound $O(T^{2/3}\sqrt{\log T})$ for the proposed learning algorithm where $T$ is the learning horizon. Finally, we conduct proof-of-concept experiments to illustrate the performance of the learning algorithm.
△ Less
Submitted 1 February, 2021; v1 submitted 25 January, 2020;
originally announced January 2020.
-
Intermittent Pulling with Local Compensation for Communication-Efficient Federated Learning
Authors:
Haozhao Wang,
Zhihao Qu,
Song Guo,
Xin Gao,
Ruixuan Li,
Baoliu Ye
Abstract:
Federated Learning is a powerful machine learning paradigm to cooperatively train a global model with highly distributed data. A major bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) algorithm for large-scale Federated Learning is the communication overhead on pushing local gradients and pulling global model. In this paper, to reduce the communication complexity of F…
▽ More
Federated Learning is a powerful machine learning paradigm to cooperatively train a global model with highly distributed data. A major bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) algorithm for large-scale Federated Learning is the communication overhead on pushing local gradients and pulling global model. In this paper, to reduce the communication complexity of Federated Learning, a novel approach named Pulling Reduction with Local Compensation (PRLC) is proposed. Specifically, each training node intermittently pulls the global model from the server in SGD iterations, resulting in that it is sometimes unsynchronized with the server. In such a case, it will use its local update to compensate the gap between the local model and the global model. Our rigorous theoretical analysis of PRLC achieves two important findings. First, we prove that the convergence rate of PRLC preserves the same order as the classical synchronous SGD for both strongly-convex and non-convex cases with good scalability due to the linear speedup with respect to the number of training nodes. Second, we show that PRLC admits lower pulling frequency than the existing pulling reduction method without local compensation. We also conduct extensive experiments on various machine learning models to validate our theoretical results. Experimental results show that our approach achieves a significant pulling reduction over the state-of-the-art methods, e.g., PRLC requiring only half of the pulling operations of LAG.
△ Less
Submitted 22 January, 2020;
originally announced January 2020.
-
Learning and Optimization with Bayesian Hybrid Models
Authors:
Elvis A. Eugene,
Xian Gao,
Alexander W. Dowling
Abstract:
Bayesian hybrid models fuse physics-based insights with machine learning constructs to correct for systematic bias. In this paper, we compare Bayesian hybrid models against physics-based glass-box and Gaussian process black-box surrogate models. We consider ballistic firing as an illustrative case study for a Bayesian decision-making workflow. First, Bayesian calibration is performed to estimate m…
▽ More
Bayesian hybrid models fuse physics-based insights with machine learning constructs to correct for systematic bias. In this paper, we compare Bayesian hybrid models against physics-based glass-box and Gaussian process black-box surrogate models. We consider ballistic firing as an illustrative case study for a Bayesian decision-making workflow. First, Bayesian calibration is performed to estimate model parameters. We then use the posterior distribution from Bayesian analysis to compute optimal firing conditions to hit a target via a single-stage stochastic program. The case study demonstrates the ability of Bayesian hybrid models to overcome systematic bias from missing physics with less data than the pure machine learning approach. Ultimately, we argue Bayesian hybrid models are an emerging paradigm for data-informed decision-making under parametric and epistemic uncertainty.
△ Less
Submitted 12 December, 2019;
originally announced December 2019.
-
ConCare: Personalized Clinical Feature Embedding via Capturing the Healthcare Context
Authors:
Liantao Ma,
Chaohe Zhang,
Yasha Wang,
Wenjie Ruan,
Jiantao Wang,
Wen Tang,
Xinyu Ma,
Xin Gao,
Junyi Gao
Abstract:
Predicting the patient's clinical outcome from the historical electronic medical records (EMR) is a fundamental research problem in medical informatics. Most deep learning-based solutions for EMR analysis concentrate on learning the clinical visit embedding and exploring the relations between visits. Although those works have shown superior performances in healthcare prediction, they fail to explo…
▽ More
Predicting the patient's clinical outcome from the historical electronic medical records (EMR) is a fundamental research problem in medical informatics. Most deep learning-based solutions for EMR analysis concentrate on learning the clinical visit embedding and exploring the relations between visits. Although those works have shown superior performances in healthcare prediction, they fail to explore the personal characteristics during the clinical visits thoroughly. Moreover, existing works usually assume that the more recent record weights more in the prediction, but this assumption is not suitable for all conditions. In this paper, we propose ConCare to handle the irregular EMR data and extract feature interrelationship to perform individualized healthcare prediction. Our solution can embed the feature sequences separately by modeling the time-aware distribution. ConCare further improves the multi-head self-attention via the cross-head decorrelation, so that the inter-dependencies among dynamic features and static baseline information can be effectively captured to form the personal health context. Experimental results on two real-world EMR datasets demonstrate the effectiveness of ConCare. The medical findings extracted by ConCare are also empirically confirmed by human experts and medical literature.
△ Less
Submitted 27 November, 2019;
originally announced November 2019.
-
AdaCare: Explainable Clinical Health Status Representation Learning via Scale-Adaptive Feature Extraction and Recalibration
Authors:
Liantao Ma,
Junyi Gao,
Yasha Wang,
Chaohe Zhang,
Jiangtao Wang,
Wenjie Ruan,
Wen Tang,
Xin Gao,
Xinyu Ma
Abstract:
Deep learning-based health status representation learning and clinical prediction have raised much research interest in recent years. Existing models have shown superior performance, but there are still several major issues that have not been fully taken into consideration. First, the historical variation pattern of the biomarker in diverse time scales plays a vital role in indicating the health s…
▽ More
Deep learning-based health status representation learning and clinical prediction have raised much research interest in recent years. Existing models have shown superior performance, but there are still several major issues that have not been fully taken into consideration. First, the historical variation pattern of the biomarker in diverse time scales plays a vital role in indicating the health status, but it has not been explicitly extracted by existing works. Second, key factors that strongly indicate the health risk are different among patients. It is still challenging to adaptively make use of the features for patients in diverse conditions. Third, using prediction models as the black box will limit the reliability in clinical practice. However, none of the existing works can provide satisfying interpretability and meanwhile achieve high prediction performance. In this work, we develop a general health status representation learning model, named AdaCare. It can capture the long and short-term variations of biomarkers as clinical features to depict the health status in multiple time scales. It also models the correlation between clinical features to enhance the ones which strongly indicate the health status and thus can maintain a state-of-the-art performance in terms of prediction accuracy while providing qualitative interpretability. We conduct a health risk prediction experiment on two real-world datasets. Experiment results indicate that AdaCare outperforms state-of-the-art approaches and provides effective interpretability, which is verifiable by clinical experts.
△ Less
Submitted 27 November, 2019;
originally announced November 2019.
-
A High-dimensional M-estimator Framework for Bi-level Variable Selection
Authors:
Bin Luo,
Xiaoli Gao
Abstract:
In high-dimensional data analysis, bi-level sparsity is often assumed when covariates function group-wisely and sparsity can appear either at the group level or within certain groups. In such cases, an ideal model should be able to encourage the bi-level variable selection consistently. Bi-level variable selection has become even more challenging when data have heavy-tailed distribution or outlier…
▽ More
In high-dimensional data analysis, bi-level sparsity is often assumed when covariates function group-wisely and sparsity can appear either at the group level or within certain groups. In such cases, an ideal model should be able to encourage the bi-level variable selection consistently. Bi-level variable selection has become even more challenging when data have heavy-tailed distribution or outliers exist in random errors and covariates. In this paper, we study a framework of high-dimensional M-estimation for bi-level variable selection. This framework encourages bi-level sparsity through a computationally efficient two-stage procedure. In theory, we provide sufficient conditions under which our two-stage penalized M-estimator possesses simultaneous local estimation consistency and the bi-level variable selection consistency if certain nonconvex penalty functions are used at the group level. Both our simulation studies and real data analysis demonstrate satisfactory finite sample performance of the proposed estimators under different irregular settings.
△ Less
Submitted 10 September, 2021; v1 submitted 26 November, 2019;
originally announced November 2019.
-
Learning Latent Process from High-Dimensional Event Sequences via Efficient Sampling
Authors:
Qitian Wu,
Zixuan Zhang,
Xiaofeng Gao,
Junchi Yan,
Guihai Chen
Abstract:
We target modeling latent dynamics in high-dimension marked event sequences without any prior knowledge about marker relations. Such problem has been rarely studied by previous works which would have fundamental difficulty to handle the arisen challenges: 1) the high-dimensional markers and unknown relation network among them pose intractable obstacles for modeling the latent dynamic process; 2) o…
▽ More
We target modeling latent dynamics in high-dimension marked event sequences without any prior knowledge about marker relations. Such problem has been rarely studied by previous works which would have fundamental difficulty to handle the arisen challenges: 1) the high-dimensional markers and unknown relation network among them pose intractable obstacles for modeling the latent dynamic process; 2) one observed event sequence may concurrently contain several different chains of interdependent events; 3) it is hard to well define the distance between two high-dimension event sequences. To these ends, in this paper, we propose a seminal adversarial imitation learning framework for high-dimension event sequence generation which could be decomposed into: 1) a latent structural intensity model that estimates the adjacent nodes without explicit networks and learns to capture the temporal dynamics in the latent space of markers over observed sequence; 2) an efficient random walk based generation model that aims at imitating the generation process of high-dimension event sequences from a bottom-up view; 3) a discriminator specified as a seq2seq network optimizing the rewards to help the generator output event sequences as real as possible. Experimental results on both synthetic and real-world datasets demonstrate that the proposed method could effectively detect the hidden network among markers and make decent prediction for future marked events, even when the number of markers scales to million level.
△ Less
Submitted 28 October, 2019;
originally announced October 2019.
-
Online Bagging for Anytime Transfer Learning
Authors:
Guokun Chi,
Min Jiang,
Xing Gao,
Weizhen Hu,
Shihui Guo,
Kay Chen Tan
Abstract:
Transfer learning techniques have been widely used in the reality that it is difficult to obtain sufficient labeled data in the target domain, but a large amount of auxiliary data can be obtained in the relevant source domain. But most of the existing methods are based on offline data. In practical applications, it is often necessary to face online learning problems in which the data samples are a…
▽ More
Transfer learning techniques have been widely used in the reality that it is difficult to obtain sufficient labeled data in the target domain, but a large amount of auxiliary data can be obtained in the relevant source domain. But most of the existing methods are based on offline data. In practical applications, it is often necessary to face online learning problems in which the data samples are achieved sequentially. In this paper, We are committed to applying the ensemble approach to solving the problem of online transfer learning so that it can be used in anytime setting. More specifically, we propose a novel online transfer learning framework, which applies the idea of online bagging methods to anytime transfer learning problems, and constructs strong classifiers through online iterations of the usefulness of multiple weak classifiers. Further, our algorithm also provides two extension schemes to reduce the impact of negative transfer. Experiments on three real data sets show that the effectiveness of our proposed algorithms.
△ Less
Submitted 20 October, 2019;
originally announced October 2019.
-
Differentiable Combinatorial Losses through Generalized Gradients of Linear Programs
Authors:
Xi Gao,
Han Zhang,
Aliakbar Panahi,
Tom Arodz
Abstract:
When samples have internal structure, we often see a mismatch between the objective optimized during training and the model's goal during inference. For example, in sequence-to-sequence modeling we are interested in high-quality translated sentences, but training typically uses maximum likelihood at the word level. The natural training-time loss would involve a combinatorial problem -- dynamic pro…
▽ More
When samples have internal structure, we often see a mismatch between the objective optimized during training and the model's goal during inference. For example, in sequence-to-sequence modeling we are interested in high-quality translated sentences, but training typically uses maximum likelihood at the word level. The natural training-time loss would involve a combinatorial problem -- dynamic programming-based global sequence alignment -- but solutions to combinatorial problems are not differentiable with respect to their input parameters, so surrogate, differentiable losses are used instead. Here, we show how to perform gradient descent over combinatorial optimization algorithms that involve continuous parameters, for example edge weights, and can be efficiently expressed as linear programs. We demonstrate usefulness of gradient descent over combinatorial optimization in sequence-to-sequence modeling using differentiable encoder-decoder architecture with softmax or Gumbel-softmax, and in image classification in a weakly supervised setting where instead of the correct class for each photo, only groups of photos labeled with correct but unordered set of classes are available during training.
△ Less
Submitted 2 October, 2020; v1 submitted 17 October, 2019;
originally announced October 2019.