-
On the Practice of Deep Hierarchical Ensemble Network for Ad Conversion Rate Prediction
Authors:
Jinfeng Zhuang,
Yinrui Li,
Runze Su,
Ke Xu,
Zhixuan Shao,
Kungang Li,
Ling Leng,
Han Sun,
Meng Qi,
Yixiong Meng,
Yang Tang,
Zhifang Liu,
Qifei Shen,
Aayush Mudgal,
Caleb Lu,
Jie Liu,
Hongda Shen
Abstract:
The predictions of click through rate (CTR) and conversion rate (CVR) play a crucial role in the success of ad-recommendation systems. A Deep Hierarchical Ensemble Network (DHEN) has been proposed to integrate multiple feature crossing modules and has achieved great success in CTR prediction. However, its performance for CVR prediction is unclear in the conversion ads setting, where an ad bids for the probability of a user's off-site actions on a third party website or app, including purchase, add to cart, sign up, etc. A few challenges in DHEN: 1) What feature-crossing modules (MLP, DCN, Transformer, to name a few) should be included in DHEN? 2) How deep and wide should DHEN be to achieve the best trade-off between efficiency and efficacy? 3) What hyper-parameters to choose in each feature-crossing module? Orthogonal to the model architecture, the input personalization features also significantly impact model performance with a high degree of freedom. In this paper, we attack this problem and present our contributions biased to the applied data science side, including:
First, we propose a multitask learning framework with DHEN as the single backbone model architecture to predict all CVR tasks, with a detailed study on how to make DHEN work effectively in practice; Second, we build both on-site real-time user behavior sequences and off-site conversion event sequences for CVR prediction purposes, and conduct an ablation study on their importance; Last but not least, we propose a self-supervised auxiliary loss to predict future actions in the input sequence, to help resolve the label sparseness issue in CVR prediction.
Our method achieves state-of-the-art performance compared to previous single feature crossing modules with pre-trained user personalization features.
Submitted 23 April, 2025; v1 submitted 10 April, 2025;
originally announced April 2025.
-
See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers
Authors:
Jiaxin Zhuang,
Leon Yan,
Zhenwei Zhang,
Ruiqi Wang,
Jiawei Zhang,
Yuantao Gu
Abstract:
Time series anomaly detection (TSAD) is becoming increasingly vital due to the rapid growth of time series data across various sectors. Anomalies in web service data, for example, can signal critical incidents such as system failures or server malfunctions, necessitating timely detection and response. However, most existing TSAD methodologies rely heavily on manual feature engineering or require extensive labeled training data, while also offering limited interpretability. To address these challenges, we introduce a pioneering framework called the Time Series Anomaly Multimodal Analyzer (TAMA), which leverages the power of Large Multimodal Models (LMMs) to enhance both the detection and interpretation of anomalies in time series data. By converting time series into visual formats that LMMs can efficiently process, TAMA leverages few-shot in-context learning capabilities to reduce dependence on extensive labeled datasets. Our methodology is validated through rigorous experimentation on multiple real-world datasets, where TAMA consistently outperforms state-of-the-art methods in TSAD tasks. Additionally, TAMA provides rich, natural language-based semantic analysis, offering deeper insights into the nature of detected anomalies. Furthermore, we contribute one of the first open-source datasets that includes anomaly detection labels, anomaly type labels, and contextual description, facilitating broader exploration and advancement within this critical field. Ultimately, TAMA not only excels in anomaly detection but also provides a comprehensive approach for understanding the underlying causes of anomalies, pushing TSAD forward through innovative methodologies and insights.
Submitted 4 November, 2024;
originally announced November 2024.
-
Debiasing Machine Unlearning with Counterfactual Examples
Authors:
Ziheng Chen,
Jia Wang,
Jun Zhuang,
Abbavaram Gowtham Reddy,
Fabrizio Silvestri,
Jin Huang,
Kaushiki Nag,
Kun Kuang,
Xin Ning,
Gabriele Tolomei
Abstract:
The right to be forgotten (RTBF) seeks to safeguard individuals from the enduring effects of their historical actions by implementing machine-learning techniques. These techniques facilitate the deletion of previously acquired knowledge without requiring extensive model retraining. However, they often overlook a critical issue: unlearning processes bias. This bias emerges from two main sources: (1) data-level bias, characterized by uneven data removal, and (2) algorithm-level bias, which leads to the contamination of the remaining dataset, thereby degrading model accuracy. In this work, we analyze the causal factors behind the unlearning process and mitigate biases at both data and algorithmic levels. Typically, we introduce an intervention-based approach, where knowledge to forget is erased with a debiased dataset. Besides, we guide the forgetting procedure by leveraging counterfactual examples, as they maintain semantic data consistency without hurting performance on the remaining dataset. Experimental results demonstrate that our method outperforms existing machine unlearning baselines on evaluation metrics.
Submitted 24 April, 2024;
originally announced April 2024.
-
Exact simulation of extrinsic stress-release processes
Authors:
Young Lee,
Patrick J. Laub,
Thomas Taimre,
Hongbiao Zhao,
Jiancang Zhuang
Abstract:
We present a new and straightforward algorithm that simulates exact sample paths for a generalized stress-release process. The computation of the exact law of the joint interarrival times is detailed and used to derive this algorithm. Furthermore, the martingale generator of the process is derived and induces theoretical moments which generalize some results of Borovkov & Vere-Jones (2000) and are used to demonstrate the validity of our simulation algorithm.
Submitted 28 June, 2021;
originally announced June 2021.
-
Demographic-Guided Attention in Recurrent Neural Networks for Modeling Neuropathophysiological Heterogeneity
Authors:
Nicha C. Dvornek,
Xiaoxiao Li,
Juntang Zhuang,
Pamela Ventola,
James S. Duncan
Abstract:
Heterogeneous presentation of a neurological disorder suggests potential differences in the underlying pathophysiological changes that occur in the brain. We propose to model heterogeneous patterns of functional network differences using a demographic-guided attention (DGA) mechanism for recurrent neural network models for prediction from functional magnetic resonance imaging (fMRI) time-series data. The context computed from the DGA head is used to help focus on the appropriate functional networks based on individual demographic information. We demonstrate improved classification on 3 subsets of the ABIDE I dataset used in published studies that have previously produced state-of-the-art results, evaluating performance under a leave-one-site-out cross-validation framework for better generalizability to new data. Finally, we provide examples of interpreting functional network differences based on individual demographic variables.
Submitted 15 April, 2021;
originally announced April 2021.
-
AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
Authors:
Juntang Zhuang,
Tommy Tang,
Yifan Ding,
Sekhar Tatikonda,
Nicha Dvornek,
Xenophon Papademetris,
James S. Duncan
Abstract:
Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) and accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse compared to SGD; for complex settings such as generative adversarial networks (GANs), adaptive methods are typically the default because of their stability. We propose AdaBelief to simultaneously achieve three goals: fast convergence as in adaptive methods, good generalization as in SGD, and training stability. The intuition for AdaBelief is to adapt the stepsize according to the "belief" in the current gradient direction. Viewing the exponential moving average (EMA) of the noisy gradient as the prediction of the gradient at the next time step, if the observed gradient greatly deviates from the prediction, we distrust the current observation and take a small step; if the observed gradient is close to the prediction, we trust it and take a large step. We validate AdaBelief in extensive experiments, showing that it outperforms other methods with fast convergence and high accuracy on image classification and language modeling. Specifically, on ImageNet, AdaBelief achieves comparable accuracy to SGD. Furthermore, in the training of a GAN on Cifar10, AdaBelief demonstrates high stability and improves the quality of generated samples compared to a well-tuned Adam optimizer. Code is available at https://github.com/juntang-zhuang/Adabelief-Optimizer
Submitted 20 December, 2020; v1 submitted 14 October, 2020;
originally announced October 2020.
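The "belief" intuition in the AdaBelief abstract above can be illustrated with a small sketch. This is not the authors' released code: it is a scalar, pure-Python rendering of the update as it is commonly stated, and the bias correction, the placement of `eps`, the default hyper-parameters, and the helper name `step_size` are assumptions made for illustration.

```python
import math


def adabelief_step(theta, grad, m, s, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaBelief-style update for a single scalar parameter.

    m is the EMA of gradients (the "prediction" of the next gradient).
    s is the EMA of the squared deviation between the observed gradient
    and that prediction. t is the 1-based step count for bias correction.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    s = beta2 * s + (1.0 - beta2) * (grad - m) ** 2 + eps
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected prediction
    s_hat = s / (1.0 - beta2 ** t)  # bias-corrected deviation
    theta = theta - lr * m_hat / (math.sqrt(s_hat) + eps)
    return theta, m, s


def step_size(grad, m=1.0, s=1e-4, t=100, lr=0.01):
    """Magnitude of the parameter change for one update from a fixed state.

    The starting state (m, s, t) is hypothetical and only serves the comparison.
    """
    new_theta, _, _ = adabelief_step(0.0, grad, m, s, t, lr=lr)
    return abs(new_theta)
```

With the hypothetical state above, a gradient that agrees with the running prediction (`step_size(1.0)`) yields a larger step than one that contradicts it (`step_size(-1.0)`), which is the behavior the abstract describes.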
-
Pooling Regularized Graph Neural Network for fMRI Biomarker Analysis
Authors:
Xiaoxiao Li,
Yuan Zhou,
Nicha C. Dvornek,
Muhan Zhang,
Juntang Zhuang,
Pamela Ventola,
James S Duncan
Abstract:
Understanding how certain brain regions relate to a specific neurological disorder has been an important area of neuroimaging research. A promising approach to identify the salient regions is using Graph Neural Networks (GNNs), which can be used to analyze graph structured data, e.g. brain networks constructed by functional magnetic resonance imaging (fMRI). We propose an interpretable GNN framework with a novel salient region selection mechanism to determine neurological brain biomarkers associated with disorders. Specifically, we design novel regularized pooling layers that highlight salient regions of interests (ROIs) so that we can infer which ROIs are important to identify a certain disease based on the node pooling scores calculated by the pooling layers. Our proposed framework, Pooling Regularized-GNN (PR-GNN), encourages reasonable ROI-selection and provides flexibility to preserve either individual- or group-level patterns. We apply the PR-GNN framework on a Biopoint Autism Spectral Disorder (ASD) fMRI dataset. We investigate different choices of the hyperparameters and show that PR-GNN outperforms baseline methods in terms of classification accuracy. The salient ROI detection results show high correspondence with the previous neuroimaging-derived biomarkers for ASD.
Submitted 29 July, 2020;
originally announced July 2020.
-
Adaptive Checkpoint Adjoint Method for Gradient Estimation in Neural ODE
Authors:
Juntang Zhuang,
Nicha Dvornek,
Xiaoxiao Li,
Sekhar Tatikonda,
Xenophon Papademetris,
James Duncan
Abstract:
Neural ordinary differential equations (NODEs) have recently attracted increasing attention; however, their empirical performance on benchmark tasks (e.g. image classification) is significantly inferior to discrete-layer models. We demonstrate that an explanation for their poorer performance is the inaccuracy of existing gradient estimation methods: the adjoint method has numerical errors in reverse-mode integration; the naive method directly back-propagates through ODE solvers, but suffers from a redundantly deep computation graph when searching for the optimal stepsize. We propose the Adaptive Checkpoint Adjoint (ACA) method: in automatic differentiation, ACA applies a trajectory checkpoint strategy which records the forward-mode trajectory as the reverse-mode trajectory to guarantee accuracy; ACA deletes redundant components for shallow computation graphs; and ACA supports adaptive solvers. On image classification tasks, compared with the adjoint and naive method, ACA achieves half the error rate in half the training time; NODE trained with ACA outperforms ResNet in both accuracy and test-retest reliability. On time-series modeling, ACA outperforms competing methods. Finally, in an example of the three-body problem, we show NODE with ACA can incorporate physical knowledge to achieve better accuracy. We provide the PyTorch implementation of ACA: https://github.com/juntang-zhuang/torch-ACA
Submitted 3 June, 2020;
originally announced June 2020.
-
Statistical Simulator for the Engine Knock
Authors:
Xun Shen,
Tinghui Ouyang,
Jiancang Zhuang,
Chanyut Khajorntraidet
Abstract:
This paper proposes a statistical simulator for the engine knock based on the Mixture Density Network (MDN) and the accept-reject method. The proposed simulator can generate the random knock intensity signal corresponding to the input signal. The generated knock intensity has a probability distribution consistent with that of the real engine. Firstly, the statistical analysis is conducted with the experimental data. From the analysis results, some important assumptions on the statistical properties of the knock intensity are made. Regarding the knock intensity as a random variable on the discrete-time index, it is independent and identically distributed if the input of the engine is identical. The probability distribution of the knock intensity under identical input can be approximated by the Gaussian Mixture Model (GMM). The parameter of the GMM is a function of the input. Based on these assumptions, two sub-problems for establishing the statistical simulator are formulated: one is to approximate the function from input to the parameters of the knock intensity distribution with an absolutely continuous function; the other is to design a random number generator that outputs random data consistent with the given distribution. The MDN is applied to approximate the probability density of the knock intensity and the accept-reject algorithm is used for the random number generator design. The proposed method is evaluated in experimental data-based validation.
Submitted 15 February, 2020;
originally announced February 2020.
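The second sub-problem named in the abstract above, drawing random numbers consistent with a given Gaussian mixture, can be sketched with a textbook accept-reject sampler. This is an illustration only, not the paper's simulator: the mixture parameters are fixed and hypothetical rather than produced by an MDN from an engine input, the proposal is a uniform distribution on an assumed interval `[lo, hi]`, and samples are therefore drawn from the mixture truncated to that interval.

```python
import math
import random


def gmm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        for w, mu, sd in zip(weights, means, stds)
    )


def sample_gmm_accept_reject(n, weights, means, stds, lo, hi, seed=0):
    """Draw n samples from the mixture restricted to [lo, hi] by accept-reject.

    Proposal: Uniform(lo, hi). The envelope height is the sum of the component
    peak densities, which upper-bounds the mixture density everywhere.
    """
    rng = random.Random(seed)
    peak_bound = sum(w / (sd * math.sqrt(2.0 * math.pi)) for w, sd in zip(weights, stds))
    samples = []
    while len(samples) < n:
        x = rng.uniform(lo, hi)           # candidate from the proposal
        u = rng.uniform(0.0, peak_bound)  # height under the envelope
        if u <= gmm_pdf(x, weights, means, stds):
            samples.append(x)             # accept: point lies under the target density
    return samples


# Hypothetical mixture parameters; in the paper these would depend on the input.
WEIGHTS, MEANS, STDS = [0.5, 0.5], [-2.0, 2.0], [0.5, 0.5]
draws = sample_gmm_accept_reject(5000, WEIGHTS, MEANS, STDS, lo=-5.0, hi=5.0)
```

A uniform proposal keeps the sketch short but wastes candidates when the mixture is concentrated; a proposal closer to the target shape would raise the acceptance rate.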
-
Approximate Uncertain Program
Authors:
Xun Shen,
Jiancang Zhuang,
Xingguo Zhang
Abstract:
A chance constrained program, where one seeks to minimize an objective over decisions that satisfy randomly disturbed constraints with a given probability, is computationally intractable. This paper proposes an approximate approach to address the chance constrained program. Firstly, a single-layer neural network is used to approximate the function from the decision domain to the violation probability domain. The algorithm for updating parameters in the single-layer neural network adopts the sequential extreme learning machine. Based on the neural violation probability approximate model, a randomized algorithm is then proposed to approach the optimizer in the probabilistic feasible domain of decision. In the randomized algorithm, samples are first extracted uniformly from the decision domain. Then, violation probabilities of all samples are calculated according to the neural violation probability approximate model. The ones with violation probability higher than the required level are discarded. The minimizer among the remaining feasible decision samples is used to update the sampling policy. The policy converges to the optimal feasible decision. Numerical simulations are implemented to validate the proposed method for non-convex problems in comparison with the scenario approach and the parallel randomized algorithm. The results show that the proposed method has improved performance.
Submitted 11 November, 2019; v1 submitted 5 November, 2019;
originally announced November 2019.
-
Parallel Randomized Algorithm for Chance Constrained Program
Authors:
Xun Shen,
Jiancang Zhuang,
Xingguo Zhang
Abstract:
A chance constrained program is computationally intractable due to the existence of chance constraints, which are randomly disturbed and should be satisfied with a given probability. This paper proposes a two-layer randomized algorithm to address the chance constrained program. Randomized optimization is applied to search for the optimizer that satisfies the chance constraints in a parallel algorithm framework. Firstly, multiple decision samples are extracted uniformly in the decision domain without considering the chance constraints. Then, in the second sampling layer, violation probabilities of all the extracted decision samples are checked by extracting the disturbance samples and calculating the corresponding violation probabilities. The decision samples with violation probabilities higher than the required level are discarded. The minimizer of the cost function among the remaining feasible decision samples is used to update the optimizer iteratively. Numerical simulations are implemented to validate the proposed method for non-convex problems in comparison with the scenario approach. The proposed method exhibits better robustness in finding a probabilistic feasible optimizer.
Submitted 6 November, 2019; v1 submitted 31 October, 2019;
originally announced November 2019.
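The two sampling layers described in the abstract above can be sketched as follows. This is a simplified, sequential illustration rather than the paper's parallel algorithm: the function names, sample counts, the number of rounds, and the one-dimensional toy problem are all assumptions, and the "update" step is reduced to keeping the best feasible sample seen so far.

```python
import random


def violation_probability(x, constraint, sample_disturbance, n_disturb, rng):
    """Monte Carlo estimate of P(constraint(x, w) > 0) over disturbances w."""
    violated = 0
    for _ in range(n_disturb):
        if constraint(x, sample_disturbance(rng)) > 0.0:
            violated += 1
    return violated / n_disturb


def randomized_chance_constrained_min(cost, constraint, sample_disturbance, bounds,
                                      max_violation, n_decisions=300, n_disturb=1000,
                                      rounds=2, seed=0):
    """Return the lowest-cost sampled decision whose estimated violation
    probability does not exceed max_violation, or None if none is feasible."""
    rng = random.Random(seed)
    lo, hi = bounds
    best_x, best_cost = None, float("inf")
    for _ in range(rounds):
        # Layer 1: draw decisions uniformly, ignoring the chance constraint.
        candidates = [rng.uniform(lo, hi) for _ in range(n_decisions)]
        for x in candidates:
            # Layer 2: estimate the violation probability from disturbance samples.
            p_violate = violation_probability(x, constraint, sample_disturbance,
                                              n_disturb, rng)
            if p_violate > max_violation:
                continue  # discard infeasible decision samples
            c = cost(x)
            if c < best_cost:
                best_x, best_cost = x, c  # keep the best feasible sample so far
    return best_x


# Hypothetical toy problem: minimize x on [0, 2] subject to P(w > x) <= 0.1,
# with w ~ Uniform(0, 1). The exact optimum is x = 0.9.
best = randomized_chance_constrained_min(
    cost=lambda x: x,
    constraint=lambda x, w: w - x,
    sample_disturbance=lambda rng: rng.random(),
    bounds=(0.0, 2.0),
    max_violation=0.1,
)
```

Because feasibility is judged from a finite disturbance sample, the returned decision only approximates the true optimum; more disturbance samples tighten the feasibility check at a proportional cost.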
-
Jointly Discriminative and Generative Recurrent Neural Networks for Learning from fMRI
Authors:
Nicha C. Dvornek,
Xiaoxiao Li,
Juntang Zhuang,
James S. Duncan
Abstract:
Recurrent neural networks (RNNs) were designed for dealing with time-series data and have recently been used for creating predictive models from functional magnetic resonance imaging (fMRI) data. However, gathering large fMRI datasets for learning is a difficult task. Furthermore, network interpretability is unclear. To address these issues, we utilize multitask learning and design a novel RNN-based model that learns to discriminate between classes while simultaneously learning to generate the fMRI time-series data. Employing the long short-term memory (LSTM) structure, we develop a discriminative model based on the hidden state and a generative model based on the cell state. The addition of the generative model constrains the network to learn functional communities represented by the LSTM nodes that are both consistent with the data generation as well as useful for the classification task. We apply our approach to the classification of subjects with autism vs. healthy controls using several datasets from the Autism Brain Imaging Data Exchange. Experiments show that our jointly discriminative and generative model improves classification learning while also producing robust and meaningful functional communities for better model understanding.
Submitted 15 October, 2019;
originally announced October 2019.
-
Decision Explanation and Feature Importance for Invertible Networks
Authors:
Juntang Zhuang,
Nicha C. Dvornek,
Xiaoxiao Li,
Junlin Yang,
James S. Duncan
Abstract:
Deep neural networks are vulnerable to adversarial attacks and hard to interpret because of their black-box nature. The recently proposed invertible network is able to accurately reconstruct the inputs to a layer from its outputs, thus has the potential to unravel the black-box model. An invertible network classifier can be viewed as a two-stage model: (1) invertible transformation from input space to the feature space; (2) a linear classifier in the feature space. We can determine the decision boundary of a linear classifier in the feature space; since the transform is invertible, we can invert the decision boundary from the feature space to the input space. Furthermore, we propose to determine the projection of a data point onto the decision boundary, and define explanation as the difference between data and its projection. Finally, we propose to locally approximate a neural network with its first-order Taylor expansion, and define feature importance using a local linear model. We provide the implementation of our method: https://github.com/juntang-zhuang/explain_invertible
Submitted 14 October, 2019; v1 submitted 29 September, 2019;
originally announced October 2019.
-
Graph Embedding Using Infomax for ASD Classification and Brain Functional Difference Detection
Authors:
Xiaoxiao Li,
Nicha C. Dvornek,
Juntang Zhuang,
Pamela Ventola,
James Duncan
Abstract:
Significant progress has been made using fMRI to characterize the brain changes that occur in ASD, a complex neuro-developmental disorder. However, due to the high dimensionality and low signal-to-noise ratio of fMRI, embedding informative and robust brain regional fMRI representations for both graph-level classification and region-level functional difference detection tasks between ASD and healthy control (HC) groups is difficult. Here, we model the whole brain fMRI as a graph, which preserves geometrical and temporal information and use a Graph Neural Network (GNN) to learn from the graph-structured fMRI data. We investigate the potential of including mutual information (MI) loss (Infomax), which is an unsupervised term encouraging large MI of each nodal representation and its corresponding graph-level summarized representation to learn a better graph embedding. Specifically, this work developed a pipeline including a GNN encoder, a classifier and a discriminator, which forces the encoded nodal representations to both benefit classification and reveal the common nodal patterns in a graph. We simultaneously optimize graph-level classification loss and Infomax. We demonstrated that Infomax graph embedding improves classification performance as a regularization term. Furthermore, we found separable nodal representations of ASD and HC groups in prefrontal cortex, cingulate cortex, visual regions, and other social, emotional and execution related brain regions. In contrast with GNN with classification loss only, the proposed pipeline can facilitate training more robust ASD classification models. Moreover, the separable nodal representations can detect the functional differences between the two groups and contribute to revealing new ASD biomarkers.
Submitted 13 August, 2019; v1 submitted 9 August, 2019;
originally announced August 2019.
-
Blending-target Domain Adaptation by Adversarial Meta-Adaptation Networks
Authors:
Ziliang Chen,
Jingyu Zhuang,
Xiaodan Liang,
Liang Lin
Abstract:
(Unsupervised) Domain Adaptation (DA) seeks to classify target instances when solely provided with source labeled and target unlabeled examples for training. Learning domain-invariant features helps to achieve this goal, whereas it underpins unlabeled samples drawn from a single or multiple explicit target domains (Multi-target DA). In this paper, we consider a more realistic transfer scenario: our target domain is comprised of multiple sub-targets implicitly blended with each other, so that learners could not identify which sub-target each unlabeled sample belongs to. This Blending-target Domain Adaptation (BTDA) scenario commonly appears in practice and threatens the validity of most existing DA algorithms, due to the presence of domain gaps and categorical misalignments among these hidden sub-targets.
To reap the transfer performance gains in this new scenario, we propose Adversarial Meta-Adaptation Network (AMEAN). AMEAN entails two adversarial transfer learning processes. The first is a conventional adversarial transfer to bridge our source and mixed target domains. To circumvent the intra-target category misalignment, the second process presents as "learning to adapt": It deploys an unsupervised meta-learner receiving target data and their ongoing feature-learning feedbacks, to discover target clusters as our "meta-sub-target" domains. These meta-sub-targets auto-design our meta-sub-target DA loss, which empirically eliminates the implicit category mismatching in our mixed target. We evaluate AMEAN and a variety of DA algorithms in three benchmarks under the BTDA setup. Empirical results show that BTDA is a quite challenging transfer setup for most existing DA algorithms, yet AMEAN significantly outperforms these state-of-the-art baselines and effectively restrains the negative transfer effects in BTDA.
Submitted 7 July, 2019;
originally announced July 2019.
-
Graph Neural Network for Interpreting Task-fMRI Biomarkers
Authors:
Xiaoxiao Li,
Nicha C. Dvornek,
Yuan Zhou,
Juntang Zhuang,
Pamela Ventola,
James S. Duncan
Abstract:
Finding the biomarkers associated with ASD is helpful for understanding the underlying roots of the disorder and can lead to earlier diagnosis and more targeted treatment. A promising approach to identify biomarkers is using Graph Neural Networks (GNNs), which can be used to analyze graph structured data, i.e. brain networks constructed by fMRI. One way to interpret important features is through looking at how the classification probability changes if the features are occluded or replaced. The major limitation of this approach is that replacing values may change the distribution of the data and lead to serious errors. Therefore, we develop a 2-stage pipeline to eliminate the need to replace features for reliable biomarker interpretation. Specifically, we propose an inductive GNN to embed the graphs containing different properties of task-fMRI for identifying ASD and then discover the brain regions/sub-graphs used as evidence for the GNN classifier. We first show GNN can achieve high accuracy in identifying ASD. Next, we calculate the feature importance scores using GNN and compare the interpretation ability with Random Forest. Finally, we run with different atlases and parameters, proving the robustness of the proposed method. The detected biomarkers reveal their association with social behaviors. We also show the potential of discovering new informative biomarkers. Our pipeline can be generalized to other graph feature importance interpretation problems.
Submitted 11 July, 2019; v1 submitted 2 July, 2019;
originally announced July 2019.
-
Machine Vision Guided 3D Medical Image Compression for Efficient Transmission and Accurate Segmentation in the Clouds
Authors:
Zihao Liu,
Xiaowei Xu,
Tao Liu,
Qi Liu,
Yanzhi Wang,
Yiyu Shi,
Wujie Wen,
Meiping Huang,
Haiyun Yuan,
Jian Zhuang
Abstract:
Cloud based medical image analysis has become popular recently due to the high computation complexities of various deep neural network (DNN) based frameworks and the increasingly large volume of medical images that need to be processed. It has been demonstrated that for medical images the transmission from local to clouds is much more expensive than the computation in the clouds itself. Towards this, 3D image compression techniques have been widely applied to reduce the data traffic. However, most of the existing image compression techniques are developed around human vision, i.e., they are designed to minimize distortions that can be perceived by human eyes. In this paper we will use deep learning based medical image segmentation as a vehicle and demonstrate that interestingly, machine and human view the compression quality differently. Medical images compressed with good quality w.r.t. human vision may result in inferior segmentation accuracy. We then design a machine vision oriented 3D image compression framework tailored for segmentation using DNNs. Our method automatically extracts and retains image features that are most important to the segmentation. Comprehensive experiments on widely adopted segmentation frameworks with HVSMR 2016 challenge dataset show that our method can achieve significantly higher segmentation accuracy at the same compression rate, or much better compression rate under the same segmentation accuracy, when compared with the existing JPEG 2000 method. To the best of the authors' knowledge, this is the first machine vision guided medical image compression framework for segmentation in the clouds.
Submitted 9 April, 2019;
originally announced April 2019.
-
Efficient Interpretation of Deep Learning Models Using Graph Structure and Cooperative Game Theory: Application to ASD Biomarker Discovery
Authors:
Xiaoxiao Li,
Nicha C. Dvornek,
Yuan Zhou,
Juntang Zhuang,
Pamela Ventola,
James S. Duncan
Abstract:
Discovering imaging biomarkers for autism spectrum disorder (ASD) is critical to help explain ASD and predict or monitor treatment outcomes. Toward this end, deep learning classifiers have recently been used for identifying ASD from functional magnetic resonance imaging (fMRI) with higher accuracy than traditional learning strategies. However, a key challenge with deep learning models is understanding just what image features the network is using, which can in turn be used to define the biomarkers. Current methods extract biomarkers, i.e., important features, by looking at how the prediction changes if "ignoring" one feature at a time. In this work, we go beyond looking at only individual features by using Shapley value explanation (SVE) from cooperative game theory. Cooperative game theory is advantageous here because it directly considers the interaction between features and can be applied to any machine learning method, making it a novel, more accurate way of determining instance-wise biomarker importance from deep learning models. A barrier to using SVE is its computational complexity: $2^N$ given $N$ features. We explicitly reduce the complexity of SVE computation by two approaches based on the underlying graph structure of the input data: 1) only consider the centralized coalition of each feature; 2) a hierarchical pipeline which first clusters features into small communities, then applies SVE in each community. Monte Carlo approximation can be used for large permutation sets. We first validate our methods on the MNIST dataset and compare to human perception. Next, to ensure plausibility of our biomarker results, we train a Random Forest (RF) to classify ASD/control subjects from fMRI and compare SVE results to standard RF-based feature importance. Finally, we show initial results on ranked fMRI biomarkers using SVE on a deep learning classifier for the ASD/control dataset.
Submitted 13 March, 2019; v1 submitted 14 December, 2018;
originally announced December 2018.
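The abstract above mentions Monte Carlo approximation of Shapley values over permutation sets. As a rough illustration only, the generic permutation-sampling estimator can be sketched as follows. This is not the authors' graph-based method: the function name `monte_carlo_shapley`, the toy value function `toy_value`, and its weights are hypothetical and chosen purely so the exact answer is easy to check by hand.

```python
import random
from typing import Callable, FrozenSet, List


def monte_carlo_shapley(
    value_fn: Callable[[FrozenSet[int]], float],
    n_features: int,
    n_samples: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled feature orderings (permutations)."""
    rng = random.Random(seed)
    phi = [0.0] * n_features
    order = list(range(n_features))
    for _ in range(n_samples):
        rng.shuffle(order)
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for i in order:
            # Marginal contribution of feature i given the features before it.
            coalition.add(i)
            cur = value_fn(frozenset(coalition))
            phi[i] += cur - prev
            prev = cur
    return [p / n_samples for p in phi]


# Hypothetical toy game: additive weights plus a bonus when
# features 0 and 1 appear together (an interaction term).
weights = [1.0, 2.0, 3.0]


def toy_value(coalition: FrozenSet[int]) -> float:
    total = sum(weights[i] for i in coalition)
    if 0 in coalition and 1 in coalition:
        total += 2.0
    return total


estimates = monte_carlo_shapley(toy_value, n_features=3, n_samples=2000)
```

In this toy game the interaction bonus is split evenly between features 0 and 1, so the exact Shapley values are 2, 3, and 3. Sampling permutations replaces the exhaustive sum over all coalitions, which is what makes the estimator usable when $N$ is large.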
-
Prediction of severity and treatment outcome for ASD from fMRI
Authors:
Juntang Zhuang,
Nicha C. Dvornek,
Xiaoxiao Li,
Pamela Ventola,
James S. Duncan
Abstract:
Autism spectrum disorder (ASD) is a complex neurodevelopmental syndrome. Early diagnosis and precise treatment are essential for ASD patients. Although researchers have built many analytical models, there has been limited progress in accurate predictive models for early diagnosis. In this project, we aim to build an accurate model to predict treatment outcome and ASD severity from early stage functional magnetic resonance imaging (fMRI) scans. The difficulty in building large databases of patients who have received specific treatments and the high dimensionality of medical image analysis problems are challenges in this work. We propose a generic and accurate two-level approach for high-dimensional regression problems in medical image analysis. First, we perform region-level feature selection using a predefined brain parcellation. Based on the assumption that voxels within one region in the brain have similar values, for each region we use the bootstrapped mean of voxels within it as a feature. In this way, the dimension of data is reduced from number of voxels to number of regions. Then we detect predictive regions by various feature selection methods. Second, we extract voxels within selected regions, and perform voxel-level feature selection. To use this model in both linear and non-linear cases with limited training examples, we apply two-level elastic net regression and random forest (RF) models respectively. To validate accuracy and robustness of this approach, we perform experiments on both task-fMRI and resting state fMRI datasets. Furthermore, we visualize the influence of each region, and show that the results match well with other findings.
Submitted 28 October, 2018;
originally announced October 2018.
-
Prediction of treatment outcome for autism from structure of the brain based on sure independence screening
Authors:
Juntang Zhuang,
Nicha C. Dvornek,
Qingyu Zhao,
Xiaoxiao Li,
Pamela Ventola,
James S. Duncan
Abstract:
Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder, and behavioral treatment interventions have shown promise for young children with ASD. However, there is limited progress in understanding the effect of each type of treatment. In this project, we aim to detect structural changes in the brain after treatment and select structural features associated with treatment outcomes. The difficulty in building large databases of patients who have received specific treatments and the high dimensionality of medical image analysis problems are the challenges in this work. To select predictive features and build accurate models, we use the sure independence screening (SIS) method. SIS is a theoretically and empirically validated method for ultra-high dimensional general linear models, and it achieves both predictive accuracy and correct feature selection by iterative feature selection. Compared with step-wise feature selection methods, SIS removes multiple features in each iteration and is computationally efficient. Compared with other linear models such as elastic-net regression, support vector regression (SVR) and partial least squares regression (PLSR), SIS achieves higher accuracy. We validated the superior performance of SIS in various experiments: First, we extract brain structural features from FreeSurfer, including cortical thickness, surface area, mean curvature and cortical volume. Next, we predict different measures of treatment outcomes based on structural features. We show that SIS achieves the highest correlation between prediction and measurements in all tasks. Furthermore, we report regions selected by SIS as biomarkers for ASD.
Submitted 25 February, 2019; v1 submitted 17 October, 2018;
originally announced October 2018.
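To make the screening idea in the abstract above concrete, here is a minimal sketch of a single SIS-style pass: rank features by the absolute marginal correlation with the response and keep the top few. It assumes a plain Pearson correlation as the marginal utility and does not reproduce the iterative variant or the downstream model fitting the abstract refers to. The helper names `pearson` and `sis_screen` and the toy data are hypothetical.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def sis_screen(columns: List[List[float]], y: List[float], keep: int) -> List[int]:
    """Return the indices of the `keep` features with the largest
    absolute marginal correlation with y, in ascending index order."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Hypothetical toy data: features 0 and 2 are perfectly (anti-)correlated with y.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
columns = [
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],  # exactly 2 * y
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],    # alternating, weakly related
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],    # exactly 7 - y
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],    # partially related
]
selected = sis_screen(columns, y, keep=2)
```

Because each feature is scored independently, one pass costs a single correlation per feature, which is why screening of this kind scales to settings where the number of features far exceeds the number of subjects.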
-
Active Learning with Expert Advice
Authors:
Peilin Zhao,
Steven Hoi,
Jinfeng Zhuang
Abstract:
Conventional methods for learning with expert advice assume that the learner always receives the outcome (e.g., class labels) of every incoming training instance at the end of each trial. In real applications, acquiring the outcome from an oracle can be costly or time consuming. In this paper, we address a new problem of active learning with expert advice, where the outcome of an instance is disclosed only when it is requested by the online learner. Our goal is to learn an accurate prediction model while asking the oracle as few questions as possible. To address this challenge, we propose a framework of active forecasters for online active learning with expert advice, which extends two regular forecasters, i.e., the Exponentially Weighted Average Forecaster and the Greedy Forecaster, to the task of active learning with expert advice. We prove that the proposed algorithms satisfy Hannan consistency under some proper assumptions, and validate the efficacy of our technique by an extensive set of experiments.
Submitted 26 September, 2013;
originally announced September 2013.
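For readers unfamiliar with the Exponentially Weighted Average Forecaster named in the abstract above, the standard full-information version can be sketched as below. This is the regular forecaster that the paper says it extends, not the active variant it proposes: every outcome is revealed here, and no query strategy is modelled. The function name `ewa_forecast`, the squared loss, the learning rate `eta`, and the toy experts are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def ewa_forecast(
    expert_preds: List[List[float]],
    outcomes: List[float],
    eta: float = 2.0,
) -> Tuple[List[float], List[float]]:
    """Run the exponentially weighted average forecaster.

    expert_preds[t][k] is expert k's prediction in round t.
    Returns the forecaster's predictions and the final normalized weights.
    """
    n_experts = len(expert_preds[0])
    weights = [1.0] * n_experts
    forecasts = []
    for preds, y in zip(expert_preds, outcomes):
        # Predict with the weighted average of the experts' advice.
        total = sum(weights)
        forecasts.append(sum(w * f for w, f in zip(weights, preds)) / total)
        # After the outcome is revealed, shrink each weight by exp(-eta * squared loss).
        weights = [w * math.exp(-eta * (f - y) ** 2) for w, f in zip(weights, preds)]
    total = sum(weights)
    return forecasts, [w / total for w in weights]


# Hypothetical toy run: expert 0 is always right, expert 1 always says 0.5.
outcomes = [1.0, 0.0, 1.0, 1.0, 0.0]
expert_preds = [[y, 0.5] for y in outcomes]
forecasts, final_weights = ewa_forecast(expert_preds, outcomes)
```

In the toy run the weight mass moves toward the expert with zero loss, so later forecasts track that expert more closely. An active version would additionally decide, in each round, whether to request the outcome before updating the weights.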