-
Causal Decomposition Analysis with Synergistic Interventions: A Triply-Robust Machine Learning Approach to Addressing Multiple Dimensions of Social Disparities
Authors:
Soojin Park,
Su Yeon Kim,
Xinyao Zheng,
Chioun Lee
Abstract:
Educational disparities are rooted in and perpetuate social inequalities across multiple dimensions such as race, socioeconomic status, and geography. To reduce disparities, most intervention strategies focus on a single domain and frequently evaluate their effectiveness by using causal decomposition analysis. However, a growing body of research suggests that single-domain interventions may be insufficient for individuals marginalized on multiple fronts. While interventions across multiple domains are increasingly proposed, there is limited guidance on appropriate methods for evaluating their effectiveness. To address this gap, we develop an extended causal decomposition analysis that simultaneously targets multiple causally ordered intervening factors, allowing for the assessment of their synergistic effects. These scenarios often involve challenges related to model misspecification due to complex interactions among group categories, intervening factors, and their confounders with the outcome. To mitigate these challenges, we introduce a triply robust estimator that leverages machine learning techniques to address potential model misspecification. We apply our method to a cohort of students from the High School Longitudinal Study, focusing on math achievement disparities between Black, Hispanic, and White high schoolers. Specifically, we examine how two sequential interventions - equalizing the proportion of students who attend high-performing schools and equalizing enrollment in Algebra I by 9th grade across racial groups - may reduce these disparities.
Submitted 23 June, 2025;
originally announced June 2025.
-
HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity
Authors:
Xuejun Sun,
Yiran Song,
Xiaochen Zhou,
Ruilie Cai,
Yu Zhang,
Xinyi Li,
Rui Peng,
Jialiu Xie,
Yuanyuan Yan,
Muyao Tang,
Prem Lakshmanane,
Baiming Zou,
James S. Hagood,
Raymond J. Pickles,
Didong Li,
Fei Zou,
Xiaojing Zheng
Abstract:
Respiratory viral infections pose a global health burden, yet the cellular immune responses driving protection or pathology remain unclear. Natural infection cohorts often lack pre-exposure baseline data and structured temporal sampling. In contrast, inoculation and vaccination trials generate insightful longitudinal transcriptomic data. However, the scattering of these datasets across platforms, along with inconsistent metadata and preprocessing procedures, hinders AI-driven discovery. To address these challenges, we developed the Human Respiratory Viral Immunization LongitudinAl Gene Expression (HR-VILAGE-3K3M) repository: an AI-ready, rigorously curated dataset that integrates 14,136 RNA-seq profiles from 3,178 subjects across 66 studies encompassing over 2.56 million cells. Spanning vaccination, inoculation, and mixed exposures, the dataset includes microarray, bulk RNA-seq, and single-cell RNA-seq from whole blood, PBMCs, and nasal swabs, sourced from GEO, ImmPort, and ArrayExpress. We harmonized subject-level metadata, standardized outcome measures, applied unified preprocessing pipelines with rigorous quality control, and aligned all data to official gene symbols. To demonstrate the utility of HR-VILAGE-3K3M, we performed predictive modeling of vaccine responders and evaluated batch-effect correction methods. Beyond these initial demonstrations, it supports diverse systems immunology applications and benchmarking of feature selection and transfer learning algorithms. Its scale and heterogeneity also make it ideal for pretraining foundation models of the human immune response and for advancing multimodal learning frameworks. As the largest longitudinal transcriptomic resource for human respiratory viral immunization, it provides an accessible platform for reproducible AI-driven research, accelerating systems immunology and vaccine development against emerging viral threats.
Submitted 19 May, 2025;
originally announced May 2025.
-
BLAST: Bayesian online change-point detection with structured image data
Authors:
Xiaojun Zheng,
Simon Mak
Abstract:
The prompt online detection of abrupt changes in image data is essential for timely decision-making in broad applications, from video surveillance to manufacturing quality control. Existing methods, however, face three key challenges. First, the high-dimensional nature of image data introduces computational bottlenecks for efficient real-time monitoring. Second, changes often involve structural image features, e.g., edges, blurs and/or shapes, and ignoring such structure can lead to delayed change detection. Third, existing methods are largely non-Bayesian and thus do not provide a quantification of monitoring uncertainty for confident detection. We address this via a novel Bayesian onLine Structure-Aware change deTection (BLAST) method. BLAST first leverages a deep Gaussian Markov random field prior to elicit desirable image structure from offline reference data. With this prior elicited, BLAST employs a new Bayesian online change-point procedure for image monitoring via its so-called posterior run length distribution. This posterior run length distribution can be computed in an online fashion using $\mathcal{O}(p^2)$ work at each time-step, where $p$ is the number of image pixels; this facilitates scalable Bayesian online monitoring of large images. We demonstrate the effectiveness of BLAST over existing methods in a suite of numerical experiments and in two applications, the first on street scene monitoring and the second on real-time process monitoring for metal additive manufacturing.
Submitted 13 April, 2025;
originally announced April 2025.
-
PEAKS: Selecting Key Training Examples Incrementally via Prediction Error Anchored by Kernel Similarity
Authors:
Mustafa Burak Gurbuz,
Xingyu Zheng,
Constantine Dovrolis
Abstract:
As deep learning continues to be driven by ever-larger datasets, understanding which examples are most important for generalization has become a critical question. While progress in data selection continues, emerging applications require studying this problem in dynamic contexts. To bridge this gap, we pose the Incremental Data Selection (IDS) problem, where examples arrive as a continuous stream, and need to be selected without access to the full data source. In this setting, the learner must incrementally build a training dataset of predefined size while simultaneously learning the underlying task. We find that in IDS, the impact of a new sample on the model state depends fundamentally on both its geometric relationship in the feature space and its prediction error. Leveraging this insight, we propose PEAKS (Prediction Error Anchored by Kernel Similarity), an efficient data selection method tailored for IDS. Our comprehensive evaluations demonstrate that PEAKS consistently outperforms existing selection strategies. Furthermore, PEAKS yields increasingly better performance returns than random selection as training data size grows on real-world datasets. The code is available at https://github.com/BurakGurbuz97/PEAKS.
Submitted 30 June, 2025; v1 submitted 7 April, 2025;
originally announced April 2025.
-
Causal and Local Correlations Based Network for Multivariate Time Series Classification
Authors:
Mingsen Du,
Yanxuan Wei,
Xiangwei Zheng,
Cun Ji
Abstract:
Recently, time series classification has attracted the attention of a large number of researchers, and hundreds of methods have been proposed. However, these methods often ignore the spatial correlations among dimensions and the local correlations among features. To address this issue, the causal and local correlations based network (CaLoNet) is proposed in this study for multivariate time series classification. First, pairwise spatial correlations between dimensions are modeled using causality modeling to obtain the graph structure. Then, a relationship extraction network is used to fuse local correlations to obtain long-term dependency features. Finally, the graph structure and long-term dependency features are integrated into the graph neural network. Experiments on the UEA datasets show that CaLoNet can obtain competitive performance compared with state-of-the-art methods.
Submitted 26 November, 2024;
originally announced November 2024.
-
WassFFed: Wasserstein Fair Federated Learning
Authors:
Zhongxuan Han,
Li Zhang,
Chaochao Chen,
Xiaolin Zheng,
Fei Zheng,
Yuyuan Li,
Jianwei Yin
Abstract:
Federated Learning (FL) employs a training approach to address scenarios where users' data cannot be shared across clients. Achieving fairness in FL is imperative since training data in FL is inherently geographically distributed among diverse user groups. Existing research on fairness predominantly assumes access to the entire training data, making direct transfer to FL challenging. However, the limited existing research on fairness in FL does not effectively address two key challenges: (CH1) current methods fail to deal with the inconsistency between fair optimization results obtained with surrogate functions and fair classification results; (CH2) directly aggregating local fair models does not always yield a globally fair model because data are not independent and identically distributed (non-IID) across clients. To address these challenges, we propose a Wasserstein Fair Federated Learning framework, namely WassFFed. To tackle CH1, we ensure that the outputs of local models, rather than the loss calculated with surrogate functions or classification results with a threshold, remain independent of various user groups. To resolve CH2, we employ a Wasserstein barycenter calculation of all local models' outputs for each user group, bringing local model outputs closer to the global output distribution to ensure consistency between the global model and local models. We conduct extensive experiments on three real-world datasets, demonstrating that WassFFed outperforms existing approaches in striking a balance between accuracy and fairness.
Submitted 11 November, 2024;
originally announced November 2024.
-
Denoising VAE as an Explainable Feature Reduction and Diagnostic Pipeline for Autism Based on Resting state fMRI
Authors:
Xinyuan Zheng,
Orren Ravid,
Robert A. J. Barry,
Yoojean Kim,
Qian Wang,
Young-geun Kim,
Xi Zhu,
Xiaofu He
Abstract:
Autism spectrum disorders (ASDs) are developmental conditions characterized by restricted interests and difficulties in communication. The complexity of ASD has resulted in a deficiency of objective diagnostic biomarkers. Deep learning methods have gained recognition for addressing these challenges in neuroimaging analysis, but finding and interpreting such diagnostic biomarkers remains computationally challenging. Here, we propose a feature reduction pipeline using resting-state fMRI data. We used the Craddock atlas and the Power atlas to extract functional connectivity data from rs-fMRI, resulting in over 30 thousand features. By using a denoising variational autoencoder (DVAE), our proposed pipeline further compresses the connectivity features into 5 latent Gaussian distributions, providing a low-dimensional representation of the data to promote computational efficiency and interpretability. To test the method, we employed the extracted latent representations to classify ASD using traditional classifiers such as SVM on a large multi-site dataset. The 95% confidence interval for the prediction accuracy of SVM is [0.63, 0.76] after site harmonization using the extracted latent distributions. Without using DVAE for dimensionality reduction, the prediction accuracy is 0.70, which falls within the interval. The DVAE successfully encoded the diagnostic information from rs-fMRI data without sacrificing prediction performance. The runtime for training the DVAE and obtaining classification results from its extracted latent features was 7 times shorter than that of training classifiers directly on the raw data. Our findings suggest that the Power atlas provides more effective brain connectivity insights for diagnosing ASD than the Craddock atlas. Additionally, we visualized the latent representations to gain insights into the brain networks contributing to the differences between ASD and neurotypical brains.
Submitted 27 March, 2025; v1 submitted 30 September, 2024;
originally announced October 2024.
-
Predicting path-dependent processes by deep learning
Authors:
Xudong Zheng,
Yuecai Han
Abstract:
In this paper, we investigate a deep learning method for predicting path-dependent processes based on discretely observed historical information. This method is implemented by considering the prediction as a nonparametric regression and obtaining the regression function through simulated samples and deep neural networks. When applying this method to fractional Brownian motion and the solutions of some stochastic differential equations driven by it, we theoretically proved that the $L_2$ errors converge to 0, and we further discussed the scope of the method. With the frequency of discrete observations tending to infinity, the predictions based on discrete observations converge to the predictions based on continuous observations, which implies that we can make approximations by the method. We apply the method to the fractional Brownian motion and the fractional Ornstein-Uhlenbeck process as examples. Comparing the results with the theoretical optimal predictions and taking the mean square error as a measure, the numerical simulations demonstrate that the method can generate accurate results. We also analyze the impact of factors such as prediction period, Hurst index, etc. on the accuracy.
Submitted 19 August, 2024;
originally announced August 2024.
-
Mixture Modeling for Temporal Point Processes with Memory
Authors:
Xiaotian Zheng,
Athanasios Kottas,
Bruno Sansó
Abstract:
We propose a constructive approach to building temporal point processes that incorporate dependence on their history. The dependence is modeled through the conditional density of the duration, i.e., the interval between successive event times, using a mixture of first-order conditional densities for each one of a specific number of lagged durations. Such a formulation for the conditional duration density accommodates high-order dynamics, and it thus enables flexible modeling for point processes with memory. The implied conditional intensity function admits a representation as a local mixture of first-order hazard functions. By specifying appropriate families of distributions for the first-order conditional densities, with different shapes for the associated hazard functions, we can obtain either self-exciting or self-regulating point processes. From the perspective of duration processes, we develop a method to specify a stationary marginal density. The resulting model, interpreted as a dependent renewal process, introduces high-order Markov dependence among identically distributed durations. Furthermore, we provide extensions to cluster point processes. These can describe duration clustering behaviors attributed to different factors, thus expanding the scope of the modeling framework to a wider range of applications. Regarding implementation, we develop a Bayesian approach to inference, model checking, and prediction. We investigate point process model properties analytically, and illustrate the methodology with both synthetic and real data examples.
Submitted 4 July, 2024;
originally announced July 2024.
-
A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems
Authors:
Xin Ma,
Puchen Zhu,
Xiao Li,
Xiaoyin Zheng,
Jianshu Zhou,
Xuchen Wang,
Kwok Wai Samuel Au
Abstract:
Depth position strongly affects lens distortion, especially in close-range photography, which limits the measurement accuracy of existing stereo vision systems. Moreover, traditional depth-dependent distortion models and their calibration methods have remained complicated. In this work, we propose a depth-dependent distortion model based on a minimal set of parameters (MDM), which considers the radial and decentering distortions of the lens to improve the accuracy of stereo vision systems and simplify their calibration process. In addition, we present an easy and flexible calibration method for the MDM of stereo vision systems with a commonly used planar pattern, which requires cameras to observe the planar pattern in different orientations. The proposed technique is easy to use and flexible compared with classical calibration techniques for depth-dependent distortion models, in which the lens must be perpendicular to the planar pattern. The experimental validation of the MDM and its calibration method showed that the MDM improved the calibration accuracy by 56.55% and 74.15% compared with Li's distortion model and the traditional Brown's distortion model, respectively. In addition, an iteration-based reconstruction method is proposed to iteratively estimate the depth information in the MDM during three-dimensional reconstruction. The results showed that the accuracy of the iteration-based reconstruction method was improved by 9.08% compared with that of the non-iteration reconstruction method.
Submitted 1 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
TS-CausalNN: Learning Temporal Causal Relations from Non-linear Non-stationary Time Series Data
Authors:
Omar Faruque,
Sahara Ali,
Xue Zheng,
Jianwu Wang
Abstract:
The growing availability and importance of time series data across various domains, including environmental science, epidemiology, and economics, has led to an increasing need for time-series causal discovery methods that can identify the intricate relationships in the non-stationary, non-linear, and often noisy real world data. However, the majority of current time series causal discovery methods assume stationarity and linear relations in data, making them infeasible for the task. Further, the recent deep learning-based methods rely on the traditional causal structure learning approaches making them computationally expensive. In this paper, we propose a Time-Series Causal Neural Network (TS-CausalNN) - a deep learning technique to discover contemporaneous and lagged causal relations simultaneously. Our proposed architecture comprises (i) convolutional blocks comprising parallel custom causal layers, (ii) acyclicity constraint, and (iii) optimization techniques using the augmented Lagrangian approach. In addition to the simple parallel design, an advantage of the proposed model is that it naturally handles the non-stationarity and non-linearity of the data. Through experiments on multiple synthetic and real world datasets, we demonstrate the empirical proficiency of our proposed approach as compared to several state-of-the-art methods. The inferred graphs for the real world dataset are in good agreement with the domain understanding.
Submitted 1 April, 2024;
originally announced April 2024.
-
$e^{\text{RPCA}}$: Robust Principal Component Analysis for Exponential Family Distributions
Authors:
Xiaojun Zheng,
Simon Mak,
Liyan Xie,
Yao Xie
Abstract:
Robust Principal Component Analysis (RPCA) is a widely used method for recovering low-rank structure from data matrices corrupted by significant and sparse outliers. These corruptions may arise from occlusions, malicious tampering, or other causes for anomalies, and the joint identification of such corruptions with low-rank background is critical for process monitoring and diagnosis. However, existing RPCA methods and their extensions largely do not account for the underlying probabilistic distribution for the data matrices, which in many applications are known and can be highly non-Gaussian. We thus propose a new method called Robust Principal Component Analysis for Exponential Family distributions ($e^{\text{RPCA}}$), which can perform the desired decomposition into low-rank and sparse matrices when such a distribution falls within the exponential family. We present a novel alternating direction method of multiplier optimization algorithm for efficient $e^{\text{RPCA}}$ decomposition. The effectiveness of $e^{\text{RPCA}}$ is then demonstrated in two applications: the first for steel sheet defect detection, and the second for crime activity monitoring in the Atlanta metropolitan area.
Submitted 30 October, 2023;
originally announced October 2023.
-
Kernel Cox partially linear regression: building predictive models for cancer patients' survival
Authors:
Yaohua Rong,
Sihai Dave Zhao,
Xia Zheng,
Yi Li
Abstract:
Wide heterogeneity exists in cancer patients' survival, ranging from a few months to several decades. To accurately predict clinical outcomes, it is vital to build an accurate predictive model that relates patients' molecular profiles with patients' survival. With complex relationships between survival and high-dimensional molecular predictors, it is challenging to conduct non-parametric modeling and remove irrelevant predictors simultaneously. In this paper, we build a kernel Cox proportional hazards semi-parametric model and propose a novel regularized garrotized kernel machine (RegGKM) method to fit the model. We use the kernel machine method to describe the complex relationship between survival and predictors, while automatically removing irrelevant parametric and non-parametric predictors through a LASSO penalty. An efficient high-dimensional algorithm is developed for the proposed method. Comparison with other competing methods in simulation shows that the proposed method always has better predictive accuracy. We apply this method to analyze a multiple myeloma dataset and predict patients' death burden based on their gene expressions. Our results can help classify patients into groups with different death risks, facilitating treatment for better clinical outcomes.
Submitted 11 October, 2023;
originally announced October 2023.
-
A New Causal Decomposition Paradigm towards Health Equity
Authors:
Xinwei Sun,
Xiangyu Zheng,
Jim Weinstein
Abstract:
Causal decomposition has provided a powerful tool to analyze health disparity problems, by assessing the proportion of disparity caused by each mediator. However, most of these methods lack policy implications, as they fail to account for all sources of disparities caused by the mediator. In addition, their estimation pre-specifies a covariate set (a.k.a. the admissible set) for the strong ignorability condition to hold, which can be problematic as some variables in this set may induce new spurious features. To resolve these issues, under the framework of the structural causal model, we propose to decompose the total effect into adjusted and unadjusted effects, with the former being able to include all types of disparity by adjusting each mediator's distribution from the disadvantaged group to the advantaged ones. Furthermore, equipped with a maximal ancestral graph and context variables, we can automatically identify the admissible set, followed by an efficient algorithm for estimation. Theoretical correctness and the efficacy of our method are demonstrated on a synthetic dataset and a spine disease dataset.
Submitted 20 February, 2023; v1 submitted 24 July, 2022;
originally announced July 2022.
-
PERCEPT: a new online change-point detection method using topological data analysis
Authors:
Xiaojun Zheng,
Simon Mak,
Liyan Xie,
Yao Xie
Abstract:
Topological data analysis (TDA) provides a set of data analysis tools for extracting embedded topological structures from complex high-dimensional datasets. In recent years, TDA has been a rapidly growing field which has found success in a wide range of applications, including signal processing, neuroscience and network analysis. In these applications, the online detection of changes is of crucial importance, but this can be highly challenging since such changes often occur in a low-dimensional embedding within high-dimensional data streams. We thus propose a new method, called PERsistence diagram-based ChangE-PoinT detection (PERCEPT), which leverages the learned topological structure from TDA to sequentially detect changes. PERCEPT follows two key steps: it first learns the embedded topology as a point cloud via persistence diagrams, then applies a non-parametric monitoring approach for detecting changes in the resulting point cloud distributions. This yields a non-parametric, topology-aware framework which can efficiently detect online changes from high-dimensional data streams. We investigate the effectiveness of PERCEPT over existing methods in a suite of numerical experiments where the data streams have an embedded topological structure. We then demonstrate the usefulness of PERCEPT in two applications in solar flare monitoring and human gesture detection.
Submitted 8 March, 2022;
originally announced March 2022.
-
Bayesian Geostatistical Modeling for Discrete-Valued Processes
Authors:
Xiaotian Zheng,
Athanasios Kottas,
Bruno Sansó
Abstract:
We introduce a flexible and scalable class of Bayesian geostatistical models for discrete data, based on the class of nearest neighbor mixture transition distribution processes (NNMP), referred to as discrete NNMP. The proposed class characterizes spatial variability by a weighted combination of first-order conditional probability mass functions (pmfs) for each one of a given number of neighbors. The approach supports flexible modeling for multivariate dependence through specification of general bivariate discrete distributions that define the conditional pmfs. Moreover, the discrete NNMP allows for construction of models given a pre-specified family of marginal distributions that can vary in space, facilitating covariate inclusion. In particular, we develop a modeling and inferential framework for copula-based NNMPs that can attain flexible dependence structures, motivating the use of bivariate copula families for spatial processes. Compared to the traditional class of spatial generalized linear mixed models, where spatial dependence is introduced through a transformation of response means, our process-based modeling approach provides both computational and inferential advantages. We illustrate the benefits with synthetic data examples and an analysis of North American Breeding Bird Survey data.
Submitted 2 March, 2022; v1 submitted 2 November, 2021;
originally announced November 2021.
-
Nearest-Neighbor Mixture Models for Non-Gaussian Spatial Processes
Authors:
Xiaotian Zheng,
Athanasios Kottas,
Bruno Sansó
Abstract:
We develop a class of nearest-neighbor mixture models that provide direct, computationally efficient, probabilistic modeling for non-Gaussian geospatial data. The class is defined over a directed acyclic graph, which implies conditional independence in representing a multivariate distribution through factorization into a product of univariate conditionals, and is extended to a full spatial process. We model each conditional as a mixture of spatially varying transition kernels, with locally adaptive weights, for each one of a given number of nearest neighbors. The modeling framework emphasizes the description of non-Gaussian dependence at the data level, in contrast with approaches that introduce a spatial process for transformed data, or for functionals of the data probability distribution. Thus, it facilitates efficient, full simulation-based inference. We study model construction and properties analytically through specification of bivariate distributions that define the local transition kernels, providing a general strategy for modeling general types of non-Gaussian data. Regarding computation, the framework lays out a new approach to handling spatial data sets, leveraging a mixture model structure to avoid computational issues that arise from large matrix operations. We illustrate the methodology using synthetic data examples and an analysis of Mediterranean Sea surface temperature observations.
Submitted 27 June, 2022; v1 submitted 16 July, 2021;
originally announced July 2021.
-
Which Invariance Should We Transfer? A Causal Minimax Learning Approach
Authors:
Mingzhou Liu,
Xiangyu Zheng,
Xinwei Sun,
Fang Fang,
Yizhou Wang
Abstract:
A major barrier to deploying current machine learning models lies in their unreliability under dataset shifts. To resolve this problem, most existing studies attempt to transfer stable information to unseen environments. In particular, methods based on independent causal mechanisms propose to remove mutable causal mechanisms via the do-operator. Compared to previous methods, the obtained stable predictors are more effective in identifying stable information. However, a key question remains: which subset of this whole stable information should the model transfer in order to achieve optimal generalization ability? To answer this question, we present a comprehensive minimax analysis from a causal perspective. Specifically, we first provide a graphical condition for the whole stable set to be optimal. When this condition fails, we surprisingly find with an example that this whole stable set, although it can fully exploit stable information, is not the optimal one to transfer. To identify the optimal subset in this case, we propose to estimate the worst-case risk with a novel optimization scheme over the intervention functions on mutable causal mechanisms. We then propose an efficient algorithm to search for the subset with minimal worst-case risk, based on a newly defined equivalence relation between stable subsets. Compared to the exponential cost of exhaustively searching over all subsets, our search strategy enjoys polynomial complexity. The effectiveness and efficiency of our methods are demonstrated on synthetic data and the diagnosis of Alzheimer's disease.
Submitted 30 May, 2023; v1 submitted 5 July, 2021;
originally announced July 2021.
-
Online High-Dimensional Change-Point Detection using Topological Data Analysis
Authors:
Xiaojun Zheng,
Simon Mak,
Yao Xie
Abstract:
Topological Data Analysis (TDA) is a rapidly growing field, which studies methods for learning underlying topological structures present in complex data representations. TDA methods have found recent success in extracting useful geometric structures for a wide range of applications, including protein classification, neuroscience, and time-series analysis. However, in many such applications, one is also interested in sequentially detecting changes in this topological structure. We propose a new method called Persistence Diagram based Change-Point (PD-CP), which tackles this problem by integrating the widely-used persistence diagrams in TDA with recent developments in nonparametric change-point detection. The key novelty in PD-CP is that it leverages the distribution of points on persistence diagrams for online detection of topological changes. We demonstrate the effectiveness of PD-CP in an application to solar flare monitoring.
Submitted 7 March, 2021; v1 submitted 26 February, 2021;
originally announced March 2021.
-
Latent Causal Invariant Model
Authors:
Xinwei Sun,
Botong Wu,
Xiangyu Zheng,
Chang Liu,
Wei Chen,
Tao Qin,
Tie-yan Liu
Abstract:
Current supervised learning can learn spurious correlations during the data-fitting process, raising issues regarding interpretability, out-of-distribution (OOD) generalization, and robustness. To avoid spurious correlations, we propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction. Specifically, we introduce latent variables that are separated into (a) output-causative factors and (b) others that are spuriously correlated with the output via confounders, to model the underlying causal factors. We further assume the generating mechanisms from latent space to observed data to be causally invariant. We establish the identifiability of such invariance, particularly the disentanglement of output-causative factors from others, as a theoretical guarantee for precise inference and for avoiding spurious correlation. We propose a Variational-Bayesian-based method for estimation and optimize over the latent space for prediction. The utility of our approach is verified by improved interpretability, prediction power on various OOD scenarios (including healthcare), and robustness on security.
Submitted 27 April, 2021; v1 submitted 4 November, 2020;
originally announced November 2020.
-
On Construction and Estimation of Stationary Mixture Transition Distribution Models
Authors:
Xiaotian Zheng,
Athanasios Kottas,
Bruno Sansó
Abstract:
Mixture transition distribution time series models build high-order dependence through a weighted combination of first-order transition densities for each one of a specified number of lags. We present a framework to construct stationary transition mixture distribution models that extend beyond linear, Gaussian dynamics. We study conditions for first-order strict stationarity which allow for different constructions with either continuous or discrete families for the first-order transition densities given a pre-specified family for the marginal density, and with general forms for the resulting conditional expectations. Inference and prediction are developed under the Bayesian framework with particular emphasis on flexible, structured priors for the mixture weights. Model properties are investigated both analytically and through synthetic data examples. Finally, Poisson and Lomax examples are illustrated through real data applications.
Submitted 16 June, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
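The "weighted combination of first-order transition densities" in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' construction: it assumes a Gaussian kernel N(rho * x, 1 - rho^2) for every lag, which keeps a standard normal marginal, and the function name `simulate_gaussian_mtd`, the weights, and `rho` are hypothetical choices made for illustration.

```python
import random


def simulate_gaussian_mtd(weights, rho, n, seed=0):
    """Simulate a Gaussian mixture transition distribution (MTD) series.

    At each step one lag l is drawn with probability weights[l - 1], and the
    next value is drawn from the first-order kernel N(rho * x[t - l], 1 - rho^2).
    The kernel and all parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_lags = len(weights)
    # Start from independent standard normal draws so each marginal is N(0, 1).
    series = [rng.gauss(0.0, 1.0) for _ in range(n_lags)]
    sd = (1.0 - rho * rho) ** 0.5
    for t in range(n_lags, n):
        # Pick which lag drives this transition (the mixture component).
        lag = rng.choices(range(1, n_lags + 1), weights=weights)[0]
        series.append(rng.gauss(rho * series[t - lag], sd))
    return series


if __name__ == "__main__":
    xs = simulate_gaussian_mtd(weights=[0.6, 0.3, 0.1], rho=0.7, n=5000)
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    print(round(mean, 3), round(var, 3))
```

Because every kernel leaves N(0, 1) invariant, the sample mean and variance of a long simulated series should sit near 0 and 1; other marginal families would need different kernels.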
-
Disentangled Neural Architecture Search
Authors:
Xinyue Zheng,
Peng Wang,
Qigang Wang,
Zhongchao Shi
Abstract:
Neural architecture search has recently shown great potential in various areas. However, existing methods rely heavily on a black-box controller to search architectures, which suffers from a serious lack of interpretability. In this paper, we propose disentangled neural architecture search (DNAS), which disentangles the hidden representation of the controller into semantically meaningful concepts, making the neural architecture search process interpretable. Based on a systematic study, we discover the correlation between network architecture and its performance, and propose a dense-sampling strategy to conduct a targeted search in promising regions that may generate well-performing architectures. We show that: 1) DNAS successfully disentangles the architecture representations, including operation selection, skip connections, and number of layers. 2) Benefiting from interpretability, DNAS can flexibly find excellent architectures under different FLOPS restrictions. 3) Dense-sampling leads to neural architecture search with higher efficiency and better performance. On the NASBench-101 dataset, DNAS achieves state-of-the-art performance of 94.21% using less than 1/13 of the computational cost of baseline methods. On the ImageNet dataset, DNAS discovers competitive architectures that achieve 22.7% test error. Our method provides a new perspective on understanding neural architecture search.
Submitted 23 September, 2020;
originally announced September 2020.
-
Tree Inference: Response Time in a Binary Multinomial Processing Tree, Representation and Uniqueness of Parameters
Authors:
Richard Schweickert,
Xiaofang Zheng
Abstract:
A Multinomial Processing Tree (MPT) is a directed tree with a probability associated with each arc. Here we consider an additional parameter associated with each arc, a measure such as the time required to select the arc. MPTs are often used as models of tasks. Each vertex represents a process and an arc descending from a vertex represents selection of an outcome of the process. A source vertex represents processing that begins when a stimulus is presented and a terminal vertex represents making a response. Responses are partitioned into classes. An experimental factor selectively influences a vertex if changing the level of the factor changes parameter values on arcs descending from that vertex and on no others. Earlier work shows that if each of two experimental factors selectively influences a different vertex in an arbitrary MPT it is equivalent for the factors to one of two relatively simple MPTs. Which of the two applies depends on whether the two selectively influenced vertices are ordered by the factors or not. A special case, the Standard Binary Tree for Ordered Processes, arises if the vertices are so ordered and the factor selectively influencing the first vertex changes parameter values on only two arcs descending from that vertex. Here we derive necessary and sufficient conditions for the probability and measure associated with a particular response class to be accounted for by this special case. Parameter values are not unique and we give admissible transformations for transforming one set of parameter values to another. When an experiment with two factors is conducted, the number of observations and parameters to be estimated depend on the number of levels of each factor; we provide degrees of freedom.
Submitted 4 August, 2020;
originally announced August 2020.
-
MathNet: Haar-Like Wavelet Multiresolution-Analysis for Graph Representation and Learning
Authors:
Xuebin Zheng,
Bingxin Zhou,
Ming Li,
Yu Guang Wang,
Junbin Gao
Abstract:
Graph Neural Networks (GNNs) have recently attracted great attention and achieved significant progress in graph-level applications. In this paper, we propose a framework for graph neural networks with multiresolution Haar-like wavelets, or MathNet, with interrelated convolution and pooling strategies. The underlying method takes graphs with different structures as input and assembles consistent graph representations for readout layers, which then accomplish label prediction. To achieve this, the multiresolution graph representations are first constructed and fed into graph convolutional layers for processing. The hierarchical graph pooling layers are then used to downsample graph resolution while simultaneously removing redundancy within graph signals. The whole workflow can be formulated as a multi-level graph analysis, which not only helps embed the intrinsic topological information of each graph into the GNN, but also supports fast computation of forward and adjoint graph transforms. We show by extensive experiments that the proposed framework obtains notable accuracy gains on graph classification and regression tasks with performance stability. The proposed MathNet outperforms various existing GNN models, especially on big data sets.
Submitted 24 January, 2021; v1 submitted 22 July, 2020;
originally announced July 2020.
-
Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification
Authors:
Chaochao Chen,
Jun Zhou,
Longfei Zheng,
Huiwen Wu,
Lingjuan Lyu,
Jia Wu,
Bingzhe Wu,
Ziqi Liu,
Li Wang,
Xiaolin Zheng
Abstract:
Recently, Graph Neural Networks (GNNs) have achieved remarkable progress in various real-world tasks on graph data, which consist of node features and the adjacency information between different nodes. High-performance GNN models always depend on both rich features and complete edge information in the graph. However, such information could be isolated by different data holders in practice, which is the so-called data isolation problem. To solve this problem, in this paper, we propose VFGNN, a federated GNN learning paradigm for the privacy-preserving node classification task in the vertically partitioned data setting, which can be generalized to existing GNN models. Specifically, we split the computation graph into two parts. We leave the computations related to private data (i.e., features, edges, and labels) on the data holders, and delegate the rest of the computations to a semi-honest server. We also propose to apply differential privacy to prevent potential information leakage from the server. We conduct experiments on three benchmarks and the results demonstrate the effectiveness of VFGNN.
Submitted 24 April, 2022; v1 submitted 24 May, 2020;
originally announced May 2020.
-
Learned Multi-layer Residual Sparsifying Transform Model for Low-dose CT Reconstruction
Authors:
Xikai Yang,
Xuehang Zheng,
Yong Long,
Saiprasad Ravishankar
Abstract:
Signal models based on sparse representation have received considerable attention in recent years. Compared to synthesis dictionary learning, sparsifying transform learning involves highly efficient sparse coding and operator update steps. In this work, we propose a Multi-layer Residual Sparsifying Transform (MRST) learning model wherein the transform domain residuals are jointly sparsified over layers. In particular, the transforms for the deeper layers exploit the more intricate properties of the residual maps. We investigate the application of the learned MRST model for low-dose CT reconstruction using Penalized Weighted Least Squares (PWLS) optimization. Experimental results on Mayo Clinic data show that the MRST model outperforms conventional methods such as FBP and PWLS methods based on edge-preserving (EP) regularizer and single-layer transform (ST) model, especially for maintaining some subtle details.
Submitted 7 May, 2020;
originally announced May 2020.
-
Practical Privacy Preserving POI Recommendation
Authors:
Chaochao Chen,
Jun Zhou,
Bingzhe Wu,
Wenjin Fang,
Li Wang,
Yuan Qi,
Xiaolin Zheng
Abstract:
Point-of-Interest (POI) recommendation has been extensively studied and successfully applied in industry recently. However, most existing approaches build centralized models on the basis of collecting users' data. Both private data and models are held by the recommender, which causes serious privacy concerns. In this paper, we propose a novel Privacy preserving POI Recommendation (PriRec) framework. First, to protect data privacy, users' private data (features and actions) are kept on their own side, e.g., on a cellphone or pad. Meanwhile, the public data that need to be accessed by all users are kept by the recommender to reduce the storage costs of users' devices. These public data include: (1) static data related only to the status of a POI, such as POI categories, and (2) dynamic data that depend on user-POI actions, such as visit counts. The dynamic data could be sensitive, and we develop local differential privacy techniques to release such data to the public with privacy guarantees. Second, PriRec follows the representation of the Factorization Machine (FM), which consists of a linear model and a feature interaction model. To protect model privacy, the linear models are saved on the users' side, and we propose a secure decentralized gradient descent protocol for users to learn them collaboratively. The feature interaction model is kept by the recommender since there is no privacy risk, and we adopt a secure aggregation strategy from the federated learning paradigm to learn it. To this end, PriRec keeps users' private raw data and models in users' own hands, and protects user privacy to a large extent. We apply PriRec to real-world datasets, and comprehensive experiments demonstrate that, compared with FM, PriRec achieves comparable or even better recommendation accuracy.
Submitted 27 April, 2020; v1 submitted 5 March, 2020;
originally announced March 2020.
-
Semi-supervised Learning Meets Factorization: Learning to Recommend with Chain Graph Model
Authors:
Chaochao Chen,
Kevin C. Chang,
Qibing Li,
Xiaolin Zheng
Abstract:
Recently, latent factor models (LFMs) have been drawing much attention in recommender systems due to their good performance and scalability. However, existing LFMs predict missing values in a user-item rating matrix based only on the known ones, and thus the sparsity of the rating matrix always limits their performance. Meanwhile, semi-supervised learning (SSL) provides an effective way to alleviate the label (i.e., rating) sparsity problem by performing label propagation, which is mainly based on the smoothness insight on affinity graphs. However, graph-based SSL suffers from serious scalability and graph unreliability problems when applied directly to recommendation. In this paper, we propose a novel probabilistic chain graph model (CGM) to marry SSL with LFM. The proposed CGM is a combination of a Bayesian network and a Markov random field. The Bayesian network is used to model the rating generation and regression procedures, and the Markov random field is used to model the confidence-aware smoothness constraint between the generated ratings. Experimental results show that our proposed CGM significantly outperforms the state-of-the-art approaches in terms of four evaluation metrics, with a larger performance margin when data sparsity increases.
Submitted 5 March, 2020;
originally announced March 2020.
-
On the Trend-corrected Variant of Adaptive Stochastic Optimization Methods
Authors:
Bingxin Zhou,
Xuebin Zheng,
Junbin Gao
Abstract:
Adam-type optimizers, as a class of adaptive moment estimation methods with the exponential moving average scheme, have been successfully used in many applications of deep learning. Such methods are appealing due to their ability to handle large-scale sparse datasets with high computational efficiency. In this paper, we present a new framework for Adam-type methods that uses trend information when updating the parameters with the adaptive step size and gradients. The additional terms in the algorithm promise efficient movement on the complex cost surface, and thus the loss would converge more rapidly. We show empirically the importance of adding the trend component, where our framework consistently outperforms the conventional Adam and AMSGrad methods on classical models with several real-world datasets.
Submitted 15 December, 2020; v1 submitted 16 January, 2020;
originally announced January 2020.
-
Adaptive Portfolio by Solving Multi-armed Bandit via Thompson Sampling
Authors:
Mengying Zhu,
Xiaolin Zheng,
Yan Wang,
Yuyuan Li,
Qianqiao Liang
Abstract:
As the cornerstone of modern portfolio theory, Markowitz's mean-variance optimization is considered a major model adopted in portfolio management. However, due to the difficulty of estimating its parameters, it cannot be applied to all periods. In some cases, naive strategies such as Equally-weighted and Value-weighted portfolios can even achieve better performance. Under these circumstances, we can use multiple classic strategies as multiple strategic arms in a multi-armed bandit to naturally establish a connection with the portfolio selection problem. This can also help to maximize the rewards in the bandit algorithm through the trade-off between exploration and exploitation. In this paper, we present a portfolio bandit strategy through Thompson sampling which aims to make online portfolio choices by effectively exploiting the performances among multiple arms. Also, by constructing multiple strategic arms, we can obtain the optimal investment portfolio to adapt to different investment periods. Moreover, we devise a novel reward function based on users' different investment risk preferences, which can be adaptive to various investment styles. Our experimental results demonstrate that our proposed portfolio strategy has marked superiority across representative real-world market datasets in terms of extensive evaluation criteria.
Submitted 14 November, 2019; v1 submitted 13 November, 2019;
originally announced November 2019.
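To make the bandit framing in the abstract above concrete, here is a minimal Beta-Bernoulli Thompson sampling sketch in which each arm stands for one candidate portfolio strategy. It is not the authors' algorithm or reward function: the binary reward (1 if the chosen strategy "wins" a period), the win probabilities, and the names `thompson_select` and `run_bandit` are assumptions made purely for illustration.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw one sample from each arm's Beta posterior; return the best arm's index."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def run_bandit(win_probs, n_rounds, seed=0):
    """Run Thompson sampling over arms with hypothetical win probabilities.

    Returns how many times each arm (strategy) was selected.
    """
    rng = random.Random(seed)
    n_arms = len(win_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        arm = thompson_select(successes, failures, rng)
        pulls[arm] += 1
        # Hypothetical binary reward: 1 if the chosen strategy "wins" this period.
        if rng.random() < win_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run_bandit([0.2, 0.5, 0.8], n_rounds=2000))
```

Early rounds spread selections across arms (exploration); as the posteriors sharpen, selections concentrate on the arm with the highest win rate (exploitation).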
-
Learning Sparse Nonparametric DAGs
Authors:
Xun Zheng,
Chen Dan,
Bryon Aragam,
Pradeep Ravikumar,
Eric P. Xing
Abstract:
We develop a framework for learning sparse nonparametric directed acyclic graphs (DAGs) from data. Our approach is based on a recent algebraic characterization of DAGs that led to a fully continuous program for score-based learning of DAG models parametrized by a linear structural equation model (SEM). We extend this algebraic characterization to nonparametric SEM by leveraging nonparametric sparsity based on partial derivatives, resulting in a continuous optimization problem that can be applied to a variety of nonparametric and semiparametric models including GLMs, additive noise models, and index models as special cases. Unlike existing approaches that require specific modeling choices, loss functions, or algorithms, we present a completely general framework that can be applied to general nonlinear models (e.g. without additive noise), general differentiable loss functions, and generic black-box optimization routines. The code is available at https://github.com/xunzheng/notears.
Submitted 23 March, 2020; v1 submitted 28 September, 2019;
originally announced September 2019.
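The abstract above refers to an algebraic characterization of DAGs for the linear case without stating it. As we understand it from the earlier NOTEARS work, a weighted adjacency matrix W on d nodes is acyclic exactly when h(W) = tr(exp(W ∘ W)) − d = 0, where ∘ is the elementwise product. The sketch below evaluates that function in plain Python with a truncated power series; the helper names `matmul` and `acyclicity` and the truncation length are our own assumptions, and this is not the code from the linked repository.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(w, terms=30):
    """Evaluate h(W) = tr(exp(W o W)) - d with a truncated power series.

    `terms` is an illustrative truncation length, adequate for small matrices.
    """
    d = len(w)
    sq = [[x * x for x in row] for row in w]  # elementwise square W o W
    # `power` holds (W o W)^k / k!, starting from the identity at k = 0.
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace_sum = float(d)  # trace of the identity term
    for k in range(1, terms):
        power = [[x / k for x in row] for row in matmul(power, sq)]
        trace_sum += sum(power[i][i] for i in range(d))
    return trace_sum - d


if __name__ == "__main__":
    dag = [[0.0, 1.5, 0.0], [0.0, 0.0, -2.0], [0.0, 0.0, 0.0]]  # 0 -> 1 -> 2
    cyclic = [[0.0, 1.0], [1.0, 0.0]]  # 0 <-> 1
    print(acyclicity(dag), acyclicity(cyclic))
```

The chain graph gives exactly zero because every power of its squared adjacency matrix has a zero diagonal, while the two-node cycle gives a strictly positive value, which is what lets the constraint be handled by continuous optimization.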
-
BCD-Net for Low-dose CT Reconstruction: Acceleration, Convergence, and Generalization
Authors:
Il Yong Chun,
Xuehang Zheng,
Yong Long,
Jeffrey A. Fessler
Abstract:
Obtaining accurate and reliable images from low-dose computed tomography (CT) is challenging. Regression convolutional neural network (CNN) models that are learned from training data are increasingly gaining attention in low-dose CT reconstruction. This paper modifies the architecture of an iterative regression CNN, BCD-Net, for fast, stable, and accurate low-dose CT reconstruction, and presents the convergence property of the modified BCD-Net. Numerical results with phantom data show that applying faster numerical solvers to model-based image reconstruction (MBIR) modules of BCD-Net leads to faster and more accurate BCD-Net; BCD-Net significantly improves the reconstruction accuracy, compared to the state-of-the-art MBIR method using learned transforms; BCD-Net achieves better image quality, compared to a state-of-the-art iterative NN architecture, ADMM-Net. Numerical results with clinical data show that BCD-Net generalizes significantly better than a state-of-the-art deep (non-iterative) regression NN, FBPConvNet, that lacks MBIR modules.
Submitted 4 August, 2019;
originally announced August 2019.
-
Two-layer Residual Sparsifying Transform Learning for Image Reconstruction
Authors:
Xuehang Zheng,
Saiprasad Ravishankar,
Yong Long,
Marc Louis Klasky,
Brendt Wohlberg
Abstract:
Signal models based on sparsity, low-rank and other properties have been exploited for image reconstruction from limited and corrupted data in medical imaging and other computational imaging applications. In particular, sparsifying transform models have shown promise in various applications, and offer numerous advantages such as efficiencies in sparse coding and learning. This work investigates pre-learning a two-layer extension of the transform model for image reconstruction, wherein the transform domain or filtering residuals of the image are further sparsified in the second layer. The proposed block coordinate descent optimization algorithms involve highly efficient updates. Preliminary numerical experiments demonstrate the usefulness of a two-layer model over the previous related schemes for CT image reconstruction from low-dose measurements.
Submitted 7 January, 2020; v1 submitted 1 June, 2019;
originally announced June 2019.
-
swTVM: Towards Optimized Tensor Code Generation for Deep Learning on Sunway Many-Core Processor
Authors:
Mingzhen Li,
Changxi Liu,
Jianjin Liao,
Xuegui Zheng,
Hailong Yang,
Rujun Sun,
Jun Xu,
Lin Gan,
Guangwen Yang,
Zhongzhi Luan,
Depei Qian
Abstract:
The flourishing of deep learning frameworks and hardware platforms demands an efficient compiler that can shield the diversity in both software and hardware in order to provide application portability. Among the existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor presents itself as a competitive candidate thanks to its attractive computational power in both scientific computing and deep learning workloads. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures requiring cross-compilation, such as Sunway. In addition, we leverage architecture features during compilation, such as core groups for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, in order to generate efficient code for deep learning workloads on Sunway. The experimental results show that the code generated by swTVM achieves 1.79x on average compared to the state-of-the-art deep learning framework on Sunway, across six representative benchmarks. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and the Sunway processor, particularly with productivity and efficiency in mind. We believe this work will encourage more people to embrace the power of deep learning and the Sunway many-core processor.
Submitted 11 July, 2022; v1 submitted 15 April, 2019;
originally announced April 2019.
-
Gradient Regularized Budgeted Boosting
Authors:
Zhixiang Eddie Xu,
Matt J. Kusner,
Kilian Q. Weinberger,
Alice X. Zheng
Abstract:
As machine learning transitions increasingly towards real-world applications, controlling the test-time cost of algorithms becomes more and more crucial. Recent work, such as the Greedy Miser and Speedboost, incorporates test-time budget constraints into the training procedure and learns classifiers that provably stay within budget (in expectation). However, so far, these algorithms are limited to the supervised learning scenario where sufficient amounts of labeled data are available. In this paper we investigate the common scenario where labeled data is scarce but unlabeled data is available in abundance. We propose an algorithm that leverages the unlabeled data (through Laplace smoothing) and learns classifiers with budget constraints. Our model, based on gradient boosted regression trees (GBRT), is, to our knowledge, the first algorithm for semi-supervised budgeted learning.
Submitted 26 January, 2019; v1 submitted 13 January, 2019;
originally announced January 2019.
-
Gradient Boosted Feature Selection
Authors:
Zhixiang Eddie Xu,
Gao Huang,
Kilian Q. Weinberger,
Alice X. Zheng
Abstract:
A feature selection algorithm should ideally satisfy four conditions: reliably extract relevant features; be able to identify non-linear feature interactions; scale linearly with the number of features and dimensions; allow the incorporation of known sparsity structure. In this work we propose a novel feature selection algorithm, Gradient Boosted Feature Selection (GBFS), which satisfies all four of these requirements. The algorithm is flexible, scalable, and surprisingly straight-forward to implement as it is based on a modification of Gradient Boosted Trees. We evaluate GBFS on several real world data sets and show that it matches or out-performs other state of the art feature selection algorithms. Yet it scales to larger data set sizes and naturally allows for domain-specific side information.
Submitted 13 January, 2019;
originally announced January 2019.
-
Quantile Treatment Effects and Bootstrap Inference under Covariate-Adaptive Randomization
Authors:
Yichong Zhang,
Xin Zheng
Abstract:
In this paper, we study the estimation and inference of the quantile treatment effect under covariate-adaptive randomization. We propose two estimation methods: (1) the simple quantile regression and (2) the inverse propensity score weighted quantile regression. For the two estimators, we derive their asymptotic distributions uniformly over a compact set of quantile indexes, and show that, when the treatment assignment rule does not achieve strong balance, the inverse propensity score weighted estimator has a smaller asymptotic variance than the simple quantile regression estimator. For the inference of method (1), we show that the Wald test using a weighted bootstrap standard error under-rejects. But for method (2), its asymptotic size equals the nominal level. We also show that, for both methods, the asymptotic size of the Wald test using a covariate-adaptive bootstrap standard error equals the nominal level. We illustrate the finite sample performance of the new estimation and inference methods using both simulated and real datasets.
Submitted 24 February, 2020; v1 submitted 27 December, 2018;
originally announced December 2018.
-
Our Practice Of Using Machine Learning To Recognize Species By Voice
Authors:
Siddhardha Balemarthy,
Atul Sajjanhar,
James Xi Zheng
Abstract:
As technology advances, audio recognition in machine learning is improving as well. Research in audio recognition has traditionally focused on speech. Living creatures (especially small ones) are part of the whole ecosystem, and monitoring and maintaining them are important tasks. Species such as animals and birds tend to change their activities and habitats due to adverse effects on the environment or due to other natural or man-made calamities. For species in remote, deserted areas, we will have no idea about their existence unless we can continuously monitor them. Continuous monitoring takes a lot of hard work and labor. Without continuous monitoring, there might be instances where endangered species encounter dangerous situations. The best way to monitor those species is through audio recognition. Classifying sound can be a difficult task even for humans. Powerful audio signals and their processing techniques make it possible to detect the audio of various species. There are many ways in which audio recognition can be done. We can train machines either on pre-recorded audio files or by recording audio live and detecting species in it. The audio of species can be detected by removing all the background noise and echoes. The smallest sound is considered a syllable. Extracting various syllables is the process we focus on, which is known as audio recognition in terms of Machine Learning (ML).
Submitted 22 October, 2018;
originally announced October 2018.
-
Efficient Tensor Decomposition with Boolean Factors
Authors:
Sung-En Chang,
Xun Zheng,
Ian E. H. Yen,
Pradeep Ravikumar,
Rose Yu
Abstract:
Tensor decomposition has been extensively used as a tool for exploratory analysis. Motivated by neuroscience applications, we study tensor decomposition with Boolean factors. The resulting optimization problem is challenging due to the non-convex objective and the combinatorial constraints. We propose Binary Matching Pursuit (BMP), a novel generalization of the matching pursuit strategy to decompose the tensor efficiently. BMP iteratively searches for atoms in a greedy fashion. The greedy atom search step is solved efficiently via a MAXCUT-like boolean quadratic program. We prove that BMP is guaranteed to converge sublinearly to the optimal solution and recover the factors under mild identifiability conditions. Experiments demonstrate the superior performance of our method over baselines on synthetic and real datasets. We also showcase the application of BMP in quantifying neural interactions underlying high-resolution spatiotemporal ECoG recordings.
Submitted 11 November, 2020; v1 submitted 10 October, 2018;
originally announced October 2018.
-
Beyond Winning and Losing: Modeling Human Motivations and Behaviors Using Inverse Reinforcement Learning
Authors:
Baoxiang Wang,
Tongfang Sun,
Xianjun Sam Zheng
Abstract:
In recent years, reinforcement learning (RL) methods have been applied to model gameplay with great success, achieving super-human performance in various environments, such as Atari, Go, and Poker. However, those studies mostly focus on winning the game and have largely ignored the rich and complex human motivations, which are essential for understanding different players' diverse behaviors. In this paper, we present a novel method called Multi-Motivation Behavior Modeling (MMBM) that takes multifaceted human motivations into consideration and models the underlying value structure of the players using inverse RL. Our approach does not require access to the dynamics of the system, making it feasible to model complex interactive environments such as massively multiplayer online games. MMBM is tested on the World of Warcraft Avatar History dataset, which recorded over 70,000 users' gameplay spanning a three-year period. Our model reveals significant differences in value structures among different player groups. Using the results of motivation modeling, we also predict and explain their diverse gameplay behaviors and provide a quantitative assessment of how the redesign of the game environment impacts players' behaviors.
Submitted 5 July, 2018; v1 submitted 1 July, 2018;
originally announced July 2018.
-
DAGs with NO TEARS: Continuous Optimization for Structure Learning
Authors:
Xun Zheng,
Bryon Aragam,
Pradeep Ravikumar,
Eric P. Xing
Abstract:
Estimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. Existing approaches rely on various local heuristics for enforcing the acyclicity constraint. In this paper, we introduce a fundamentally different strategy: We formulate the structure learning problem as a purely continuous optimization problem over real matrices that avoids this combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting problem can be efficiently solved by standard numerical algorithms, which also makes implementation effortless. The proposed method outperforms existing ones, without imposing any structural assumptions on the graph such as bounded treewidth or in-degree. Code implementing the proposed algorithm is open-source and publicly available at https://github.com/xunzheng/notears.
Submitted 2 November, 2018; v1 submitted 4 March, 2018;
originally announced March 2018.
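The "smooth and exact" acyclicity characterization mentioned in the abstract above is not spelled out in this listing. In the published NOTEARS paper it takes the form h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the weighted adjacency matrix W describes a DAG. The following is a minimal sketch of that function only, not the authors' implementation: the function names and the truncated power series used for the matrix exponential (`terms=40`) are assumptions made for illustration.

```python
import math


def matmul(A, B):
    """Multiply two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(W, terms=40):
    """Approximate h(W) = tr(exp(W * W)) - d, where * is elementwise.

    The matrix exponential is evaluated with a truncated power series
    (an assumption of this sketch, adequate for small, modestly scaled W).
    """
    d = len(W)
    A = [[w * w for w in row] for row in W]  # elementwise square, so A >= 0
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # A^0
    total = 0.0
    for k in range(terms):
        # Add tr(A^k) / k!; tr(A^k) sums the weights of closed walks of length k.
        total += sum(power[i][i] for i in range(d)) / math.factorial(k)
        power = matmul(power, A)
    return total - d


# Hypothetical example graphs: a chain 0 -> 1 -> 2 and a 3-cycle.
dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -2.0],
       [0.0, 0.0, 0.0]]
cycle = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]

h_dag = acyclicity(dag)      # zero: no closed walks in a DAG
h_cycle = acyclicity(cycle)  # strictly positive: the cycle contributes
```

Because every entry of W ∘ W is nonnegative, each term tr(A^k) is nonnegative, so h(W) can only vanish when the graph has no closed walks at all. That is what lets the constraint h(W) = 0 replace the combinatorial acyclicity check inside a continuous optimizer.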
-
Beyond Keywords and Relevance: A Personalized Ad Retrieval Framework in E-Commerce Sponsored Search
Authors:
Su Yan,
Wei Lin,
Tianshu Wu,
Daorui Xiao,
Xu Zheng,
Bo Wu,
Kaipeng Liu
Abstract:
On most sponsored search platforms, advertisers bid on keywords for their advertisements (ads). Given a search request, the ad retrieval module rewrites the query into bidding keywords, and uses these keywords as keys to select the Top N ads through inverted indexes. As a result, an ad related to the query will not be retrieved if the advertiser has not bid on the corresponding keywords. Moreover, most ad retrieval approaches regard rewriting and ad-selecting as two separate tasks, and focus on boosting relevance between search queries and ads. Recently, more and more personalized information has been introduced in e-commerce sponsored search, such as user profiles and long-time and real-time clicks. Personalized information enables ad retrieval to employ more elements (e.g. real-time clicks) as search signals and retrieval keys; however, it also makes it more difficult to compare ads retrieved through different signals. To address these problems, we propose a novel ad retrieval framework beyond keywords and relevance in e-commerce sponsored search. First, we employ historical ad click data to initialize a hierarchical network representing signals, keys and ads, in which personalized information is introduced. Then we train a model on top of the hierarchical network by learning the weights of edges. Finally, we select the best edges according to the model, boosting RPM/CTR. Experimental results on our e-commerce platform demonstrate that our ad retrieval framework achieves good performance.
Submitted 23 April, 2018; v1 submitted 28 December, 2017;
originally announced December 2017.
-
Neural Collaborative Autoencoder
Authors:
Qibing Li,
Xiaolin Zheng,
Xinyue Wu
Abstract:
In recent years, deep neural networks have yielded state-of-the-art performance on several tasks. Although some recent works have focused on combining deep learning with recommendation, we highlight three issues with existing models. First, these models cannot work on both explicit and implicit feedback, since the network structures are specially designed for one particular case. Second, due to the difficulty of training deep neural networks, existing explicit models do not fully exploit the expressive potential of deep learning. Third, neural network models overfit more easily than shallow models in the implicit setting. To tackle these issues, we present a generic recommender framework called Neural Collaborative Autoencoder (NCAE) to perform collaborative filtering, which works well for both explicit feedback and implicit feedback. NCAE can effectively capture the subtle hidden relationships between interactions via a non-linear matrix factorization process. To optimize the deep architecture of NCAE, we develop a three-stage pre-training mechanism that combines supervised and unsupervised feature learning. Moreover, to prevent overfitting in the implicit setting, we propose an error reweighting module and a sparsity-aware data-augmentation strategy. Extensive experiments on three real-world datasets demonstrate that NCAE can significantly advance the state-of-the-art.
Submitted 19 December, 2018; v1 submitted 25 December, 2017;
originally announced December 2017.
-
State Space LSTM Models with Particle MCMC Inference
Authors:
Xun Zheng,
Manzil Zaheer,
Amr Ahmed,
Yuan Wang,
Eric P Xing,
Alexander J Smola
Abstract:
Long Short-Term Memory (LSTM) is one of the most powerful sequence models. Despite its strong performance, however, it lacks the interpretability of state space models. In this paper, we present a way to combine the best of both worlds by introducing State Space LSTM (SSL) models that generalize the earlier work \cite{zaheer2017latent} of combining topic models with LSTM. However, unlike \cite{zaheer2017latent}, we do not make any factorization assumptions in our inference algorithm. We present an efficient sampler based on the sequential Monte Carlo (SMC) method that draws from the joint posterior directly. Experimental results confirm the superiority and stability of this SMC inference algorithm on a variety of domains.
Submitted 29 November, 2017;
originally announced November 2017.
-
Sparse-View X-Ray CT Reconstruction Using $\ell_1$ Prior with Learned Transform
Authors:
Xuehang Zheng,
Il Yong Chun,
Zhipeng Li,
Yong Long,
Jeffrey A. Fessler
Abstract:
A major challenge in X-ray computed tomography (CT) is reducing radiation dose while maintaining high quality of reconstructed images. To reduce the radiation dose, one can reduce the number of projection views (sparse-view CT); however, it becomes difficult to achieve high-quality image reconstruction as the number of projection views decreases. Researchers have applied the concept of learning sparse representations from (high-quality) CT image datasets to sparse-view CT reconstruction. We propose a new statistical CT reconstruction model that combines penalized weighted-least squares (PWLS) and $\ell_1$ prior with learned sparsifying transform (PWLS-ST-$\ell_1$), and a corresponding efficient algorithm based on the Alternating Direction Method of Multipliers (ADMM). To moderate the difficulty of tuning ADMM parameters, we propose a new ADMM parameter selection scheme based on approximated condition numbers. We interpret the proposed model by analyzing the minimum mean square error of its ($\ell_2$-norm relaxed) image update estimator. Our results with the extended cardiac-torso (XCAT) phantom data and clinical chest data show that, for sparse-view 2D fan-beam CT and 3D axial cone-beam CT, PWLS-ST-$\ell_1$ improves the quality of reconstructed images compared to the CT reconstruction methods using an edge-preserving regularizer and $\ell_2$ prior with learned ST. These results also show that, for sparse-view 2D fan-beam CT, PWLS-ST-$\ell_1$ achieves comparable or better image quality and requires much shorter runtime than PWLS-DL using a learned overcomplete dictionary. Our results with clinical chest data show that methods using the unsupervised learned prior generalize better than a state-of-the-art deep "denoising" neural network that does not use a physical imaging model.
Submitted 15 September, 2019; v1 submitted 2 November, 2017;
originally announced November 2017.
-
Low Dose CT Image Reconstruction With Learned Sparsifying Transform
Authors:
Xuehang Zheng,
Zening Lu,
Saiprasad Ravishankar,
Yong Long,
Jeffrey A. Fessler
Abstract:
A major challenge in computed tomography (CT) is to reduce X-ray dose to a low or even ultra-low level while maintaining the high quality of reconstructed images. We propose a new method for CT reconstruction that combines penalized weighted-least squares reconstruction (PWLS) with regularization based on a sparsifying transform (PWLS-ST) learned from a dataset of numerous CT images. We adopt an alternating algorithm to optimize the PWLS-ST cost function that alternates between a CT image update step and a sparse coding step. We adopt a relaxed linearized augmented Lagrangian method with ordered-subsets (relaxed OS-LALM) to accelerate the CT image update step by reducing the number of forward and backward projections. Numerical experiments on the XCAT phantom show that for low dose levels, the proposed PWLS-ST method dramatically improves the quality of reconstructed images compared to PWLS reconstruction with a nonadaptive edge-preserving regularizer (PWLS-EP).
Submitted 10 July, 2017;
originally announced July 2017.
-
PWLS-ULTRA: An Efficient Clustering and Learning-Based Approach for Low-Dose 3D CT Image Reconstruction
Authors:
Xuehang Zheng,
Saiprasad Ravishankar,
Yong Long,
Jeffrey A. Fessler
Abstract:
The development of computed tomography (CT) image reconstruction methods that significantly reduce patient radiation exposure while maintaining high image quality is an important area of research in low-dose CT (LDCT) imaging. We propose a new penalized weighted least squares (PWLS) reconstruction method that exploits regularization based on an efficient Union of Learned TRAnsforms (PWLS-ULTRA). The union of square transforms is pre-learned from numerous image patches extracted from a dataset of CT images or volumes. The proposed PWLS-based cost function is optimized by alternating between a CT image reconstruction step, and a sparse coding and clustering step. The CT image reconstruction step is accelerated by a relaxed linearized augmented Lagrangian method with ordered-subsets that reduces the number of forward and back projections. Simulations with 2-D and 3-D axial CT scans of the extended cardiac-torso phantom and 3D helical chest and abdomen scans show that for both normal-dose and low-dose levels, the proposed method significantly improves the quality of reconstructed images compared to PWLS reconstruction with a nonadaptive edge-preserving regularizer (PWLS-EP). PWLS with regularization based on a union of learned transforms leads to better image reconstructions than using a single learned square transform. We also incorporate patch-based weights in PWLS-ULTRA that enhance image quality and help improve image resolution uniformity. The proposed approach achieves comparable or better image quality compared to learned overcomplete synthesis dictionaries, but importantly, is much faster (computationally more efficient).
Submitted 1 June, 2018; v1 submitted 27 March, 2017;
originally announced March 2017.
-
LightLDA: Big Topic Models on Modest Compute Clusters
Authors:
Jinhui Yuan,
Fei Gao,
Qirong Ho,
Wei Dai,
Jinliang Wei,
Xun Zheng,
Eric P. Xing,
Tie-Yan Liu,
Wei-Ying Ma
Abstract:
When building large-scale machine learning (ML) programs, such as big topic models or deep neural nets, one usually assumes such tasks can only be attempted with industrial-sized clusters with thousands of nodes, which are out of reach for most practitioners or academic researchers. We consider this challenge in the context of topic modeling on web-scale corpora, and show that with a modest cluster of as few as 8 machines, we can train a topic model with 1 million topics and a 1-million-word vocabulary (for a total of 1 trillion parameters), on a document collection with 200 billion tokens -- a scale not yet reported even with thousands of machines. Our major contributions include: 1) a new, highly efficient O(1) Metropolis-Hastings sampling algorithm, whose running cost is (surprisingly) agnostic of model size, and empirically converges nearly an order of magnitude faster than current state-of-the-art Gibbs samplers; 2) a structure-aware model-parallel scheme, which leverages dependencies within the topic model, yielding a sampling strategy that is frugal on machine memory and network communication; 3) a differential data-structure for model storage, which uses separate data structures for high- and low-frequency words to allow extremely large models to fit in memory, while maintaining high inference speed; and 4) a bounded asynchronous data-parallel scheme, which allows efficient distributed processing of massive data via a parameter server. Our distribution strategy is an instance of the model-and-data-parallel programming model underlying the Petuum framework for general distributed ML, and was implemented on top of the Petuum open-source system. We provide experimental evidence showing how this development puts massive models within reach on a small cluster while still enjoying proportional time cost reductions with increasing cluster size, in comparison with alternative options.
Submitted 4 December, 2014;
originally announced December 2014.
-
Model-Parallel Inference for Big Topic Models
Authors:
Xun Zheng,
Jin Kyu Kim,
Qirong Ho,
Eric P. Xing
Abstract:
In real-world industrial applications of topic modeling, the ability to capture a gigantic conceptual space by learning an ultra-high dimensional topical representation, i.e., the so-called "big model", is becoming the next desideratum after the enthusiasm for "big data", especially for fine-grained downstream tasks such as online advertising, where good performance is usually achieved by regression-based predictors built on millions if not billions of input features. The conventional data-parallel approach for training gigantic topic models turns out to be rather inefficient in utilizing the power of parallelism, due to its heavy dependency on a centralized image of the "model". Big model size also poses a storage challenge, since the available model size is bounded by the smallest RAM among the nodes. To address these issues, we explore another type of parallelism, namely model-parallelism, which enables training of disjoint blocks of a big topic model in parallel. By integrating data-parallelism with model-parallelism, we show that dependencies between distributed elements can be handled seamlessly, achieving not only faster convergence but also an ability to tackle significantly bigger model sizes. We describe an architecture for model-parallel inference of LDA, and present a variant of the collapsed Gibbs sampling algorithm tailored for it. Experimental results demonstrate the ability of this system to handle topic modeling with an unprecedented 200 billion model variables on only a low-end cluster with very limited computational resources and bandwidth.
Submitted 9 November, 2014;
originally announced November 2014.
-
Primitives for Dynamic Big Model Parallelism
Authors:
Seunghak Lee,
Jin Kyu Kim,
Xun Zheng,
Qirong Ho,
Garth A. Gibson,
Eric P. Xing
Abstract:
When training large machine learning models with many variables or parameters, a single machine is often inadequate since the model may be too large to fit in memory, while training can take a long time even with stochastic updates. A natural recourse is to turn to distributed cluster computing, in order to harness additional memory and processors. However, naive, unstructured parallelization of ML algorithms can make inefficient use of distributed memory, while failing to obtain proportional convergence speedups - or can even result in divergence. We develop a framework of primitives for dynamic model-parallelism, STRADS, in order to explore partitioning and update scheduling of model variables in distributed ML algorithms - thus improving their memory efficiency while presenting new opportunities to speed up convergence without compromising inference correctness. We demonstrate the efficacy of model-parallel algorithms implemented in STRADS versus popular implementations for Topic Modeling, Matrix Factorization and Lasso.
Submitted 17 June, 2014;
originally announced June 2014.