-
Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design
Authors:
Masatoshi Uehara,
Xingyu Su,
Yulai Zhao,
Xiner Li,
Aviv Regev,
Shuiwang Ji,
Sergey Levine,
Tommaso Biancalani
Abstract:
To fully leverage the capabilities of diffusion models, we are often interested in optimizing downstream reward functions during inference. While numerous algorithms for reward-guided generation have been recently proposed due to their significance, current approaches predominantly focus on single-shot generation, transitioning from fully noised to denoised states. We propose a novel framework for…
▽ More
To fully leverage the capabilities of diffusion models, we are often interested in optimizing downstream reward functions during inference. While numerous algorithms for reward-guided generation have been recently proposed due to their significance, current approaches predominantly focus on single-shot generation, transitioning from fully noised to denoised states. We propose a novel framework for inference-time reward optimization with diffusion models inspired by evolutionary algorithms. Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising. This sequential refinement allows for the gradual correction of errors introduced during reward optimization. Besides, we provide a theoretical guarantee for our framework. Finally, we demonstrate its superior empirical performance in protein and cell-type-specific regulatory DNA design. The code is available at \href{https://github.com/masa-ue/ProDifEvo-Refinement}{https://github.com/masa-ue/ProDifEvo-Refinement}.
△ Less
Submitted 20 February, 2025;
originally announced February 2025.
-
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding
Authors:
Xiner Li,
Yulai Zhao,
Chenyu Wang,
Gabriele Scalia,
Gokcen Eraslan,
Surag Nair,
Tommaso Biancalani,
Shuiwang Ji,
Aviv Regev,
Sergey Levine,
Masatoshi Uehara
Abstract:
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely generating designs that are natural, we often aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Existing methods for achieving this goal often require ``differentiable'' proxy models (\textit{e.g.}, class…
▽ More
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely generating designs that are natural, we often aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Existing methods for achieving this goal often require ``differentiable'' proxy models (\textit{e.g.}, classifier guidance or DPS) or involve computationally expensive fine-tuning of diffusion models (\textit{e.g.}, classifier-free guidance, RL-based fine-tuning). In our work, we propose a new method to address these challenges. Our algorithm is an iterative sampling method that integrates soft value functions, which looks ahead to how intermediate noisy states lead to high rewards in the future, into the standard inference procedure of pre-trained diffusion models. Notably, our approach avoids fine-tuning generative models and eliminates the need to construct differentiable models. This enables us to (1) directly utilize non-differentiable features/reward feedback, commonly used in many scientific domains, and (2) apply our method to recent discrete diffusion models in a principled way. Finally, we demonstrate the effectiveness of our algorithm across several domains, including image generation, molecule generation, and DNA/RNA sequence generation. The code is available at \href{https://github.com/masa-ue/SVDD}{https://github.com/masa-ue/SVDD}.
△ Less
Submitted 24 October, 2024; v1 submitted 15 August, 2024;
originally announced August 2024.
-
Graph-Enabled Fast MCMC Sampling with an Unknown High-Dimensional Prior Distribution
Authors:
Chenyang Zhong,
Shouxuan Ji,
Tian Zheng
Abstract:
Posterior sampling is a task of central importance in Bayesian inference. For many applications in Bayesian meta-analysis and Bayesian transfer learning, the prior distribution is unknown and needs to be estimated from samples. In practice, the prior distribution can be high-dimensional, adding to the difficulty of efficient posterior inference. In this paper, we propose a novel Markov chain Monte…
▽ More
Posterior sampling is a task of central importance in Bayesian inference. For many applications in Bayesian meta-analysis and Bayesian transfer learning, the prior distribution is unknown and needs to be estimated from samples. In practice, the prior distribution can be high-dimensional, adding to the difficulty of efficient posterior inference. In this paper, we propose a novel Markov chain Monte Carlo algorithm, which we term graph-enabled MCMC, for posterior sampling with unknown and potentially high-dimensional prior distributions. The algorithm is based on constructing a geometric graph from prior samples and subsequently uses the graph structure to guide the transition of the Markov chain. Through extensive theoretical and numerical studies, we demonstrate that our graph-enabled MCMC algorithm provides reliable approximation to the posterior distribution and is highly computationally efficient.
△ Less
Submitted 4 August, 2024;
originally announced August 2024.
-
Accounting for Temporal Variability in Functional Magnetic Resonance Imaging Improves Prediction of Intelligence
Authors:
Yang Li,
Xin Ma,
Raj Sunderraman,
Shihao Ji,
Suprateek Kundu
Abstract:
Neuroimaging-based prediction methods for intelligence and cognitive abilities have seen a rapid development in literature. Among different neuroimaging modalities, prediction based on functional connectivity (FC) has shown great promise. Most literature has focused on prediction using static FC, but there are limited investigations on the merits of such analysis compared to prediction based on dy…
▽ More
Neuroimaging-based prediction methods for intelligence and cognitive abilities have seen a rapid development in literature. Among different neuroimaging modalities, prediction based on functional connectivity (FC) has shown great promise. Most literature has focused on prediction using static FC, but there are limited investigations on the merits of such analysis compared to prediction based on dynamic FC or region level functional magnetic resonance imaging (fMRI) times series that encode temporal variability. To account for the temporal dynamics in fMRI data, we propose a deep neural network involving bi-directional long short-term memory (bi-LSTM) approach that also incorporates feature selection mechanism. The proposed pipeline is implemented via an efficient GPU computation framework and applied to predict intelligence scores based on region level fMRI time series as well as dynamic FC. We compare the prediction performance for different intelligence measures based on static FC, dynamic FC, and region level time series acquired from the Adolescent Brain Cognitive Development (ABCD) study involving close to 7000 individuals. Our detailed analysis illustrates that static FC consistently has inferior prediction performance compared to region level time series or dynamic FC for unimodal rest and task fMRI experiments, and in almost all cases using a combination of task and rest features. In addition, the proposed bi-LSTM pipeline based on region level time series identifies several shared and differential important brain regions across task and rest fMRI experiments that drive intelligence prediction. A test-retest analysis of the selected features shows strong reliability across cross-validation folds. Given the large sample size from ABCD study, our results provide strong evidence that superior prediction of intelligence can be achieved by accounting for temporal variations in fMRI.
△ Less
Submitted 14 December, 2022; v1 submitted 11 November, 2022;
originally announced November 2022.
-
A Flexible Quasi-Copula Distribution for Statistical Modeling
Authors:
Sarah S. Ji,
Benjamin B. Chu,
Hua Zhou,
Kenneth Lange
Abstract:
Copulas, generalized estimating equations, and generalized linear mixed models promote the analysis of grouped data where non-normal responses are correlated. Unfortunately, parameter estimation remains challenging in these three frameworks. Based on prior work of Tonda, we derive a new class of probability density functions that allow explicit calculation of moments, marginal and conditional dist…
▽ More
Copulas, generalized estimating equations, and generalized linear mixed models promote the analysis of grouped data where non-normal responses are correlated. Unfortunately, parameter estimation remains challenging in these three frameworks. Based on prior work of Tonda, we derive a new class of probability density functions that allow explicit calculation of moments, marginal and conditional distributions, and the score and observed information needed in maximum likelihood estimation. We also illustrate how the new distribution flexibly models longitudinal data following a non-Gaussian distribution. Finally, we conduct a tri-variate genome-wide association analysis on dichotomized systolic and diastolic blood pressure and body mass index data from the UK-Biobank, showcasing the modeling prowess and computational scalability of the new distribution.
△ Less
Submitted 14 October, 2024; v1 submitted 6 May, 2022;
originally announced May 2022.
-
A Unified Plug-and-Play Framework for Effective Data Denoising and Robust Abstention
Authors:
Krishanu Sarker,
Xiulong Yang,
Yang Li,
Saeid Belkasim,
Shihao Ji
Abstract:
The success of Deep Neural Networks (DNNs) highly depends on data quality. Moreover, predictive uncertainty makes high performing DNNs risky for real-world deployment. In this paper, we aim to address these two issues by proposing a unified filtering framework leveraging underlying data density, that can effectively denoise training data as well as avoid predicting uncertain test data points. Our…
▽ More
The success of Deep Neural Networks (DNNs) highly depends on data quality. Moreover, predictive uncertainty makes high performing DNNs risky for real-world deployment. In this paper, we aim to address these two issues by proposing a unified filtering framework leveraging underlying data density, that can effectively denoise training data as well as avoid predicting uncertain test data points. Our proposed framework leverages underlying data distribution to differentiate between noise and clean data samples without requiring any modification to existing DNN architectures or loss functions. Extensive experiments on multiple image classification datasets and multiple CNN architectures demonstrate that our simple yet effective framework can outperform the state-of-the-art techniques in denoising training data and abstaining uncertain test data.
△ Less
Submitted 25 September, 2020;
originally announced September 2020.
-
Adversarial Privacy Preserving Graph Embedding against Inference Attack
Authors:
Kaiyang Li,
Guangchun Luo,
Yang Ye,
Wei Li,
Shihao Ji,
Zhipeng Cai
Abstract:
Recently, the surge in popularity of Internet of Things (IoT), mobile devices, social media, etc. has opened up a large source for graph data. Graph embedding has been proved extremely useful to learn low-dimensional feature representations from graph structured data. These feature representations can be used for a variety of prediction tasks from node classification to link prediction. However, e…
▽ More
Recently, the surge in popularity of Internet of Things (IoT), mobile devices, social media, etc. has opened up a large source for graph data. Graph embedding has been proved extremely useful to learn low-dimensional feature representations from graph structured data. These feature representations can be used for a variety of prediction tasks from node classification to link prediction. However, existing graph embedding methods do not consider users' privacy to prevent inference attacks. That is, adversaries can infer users' sensitive information by analyzing node representations learned from graph embedding algorithms. In this paper, we propose Adversarial Privacy Graph Embedding (APGE), a graph adversarial training framework that integrates the disentangling and purging mechanisms to remove users' private information from learned node representations. The proposed method preserves the structural information and utility attributes of a graph while concealing users' private attributes from inference attacks. Extensive experiments on real-world graph datasets demonstrate the superior performance of APGE compared to the state-of-the-arts. Our source code can be found at https://github.com/uJ62JHD/Privacy-Preserving-Social-Network-Embedding.
△ Less
Submitted 29 August, 2020;
originally announced August 2020.
-
Deep Learning of High-Order Interactions for Protein Interface Prediction
Authors:
Yi Liu,
Hao Yuan,
Lei Cai,
Shuiwang Ji
Abstract:
Protein interactions are important in a broad range of biological processes. Traditionally, computational methods have been developed to automatically predict protein interface from hand-crafted features. Recent approaches employ deep neural networks and predict the interaction of each amino acid pair independently. However, these methods do not incorporate the important sequential information fro…
▽ More
Protein interactions are important in a broad range of biological processes. Traditionally, computational methods have been developed to automatically predict protein interface from hand-crafted features. Recent approaches employ deep neural networks and predict the interaction of each amino acid pair independently. However, these methods do not incorporate the important sequential information from amino acid chains and the high-order pairwise interactions. Intuitively, the prediction of an amino acid pair should depend on both their features and the information of other amino acid pairs. In this work, we propose to formulate the protein interface prediction as a 2D dense prediction problem. In addition, we propose a novel deep model to incorporate the sequential information and high-order pairwise interactions to perform interface predictions. We represent proteins as graphs and employ graph neural networks to learn node features. Then we propose the sequential modeling method to incorporate the sequential information and reorder the feature matrix. Next, we incorporate high-order pairwise interactions to generate a 3D tensor containing different pairwise interactions. Finally, we employ convolutional neural networks to perform 2D dense predictions. Experimental results on multiple benchmarks demonstrate that our proposed method can consistently improve the protein interface prediction performance.
△ Less
Submitted 18 July, 2020;
originally announced July 2020.
-
Towards Deeper Graph Neural Networks
Authors:
Meng Liu,
Hongyang Gao,
Shuiwang Ji
Abstract:
Graph neural networks have shown significant success in the field of graph representation learning. Graph convolutions perform neighborhood aggregation and represent one of the most important graph operations. Nevertheless, one layer of these neighborhood aggregation methods only consider immediate neighbors, and the performance decreases when going deeper to enable larger receptive fields. Severa…
▽ More
Graph neural networks have shown significant success in the field of graph representation learning. Graph convolutions perform neighborhood aggregation and represent one of the most important graph operations. Nevertheless, one layer of these neighborhood aggregation methods only consider immediate neighbors, and the performance decreases when going deeper to enable larger receptive fields. Several recent studies attribute this performance deterioration to the over-smoothing issue, which states that repeated propagation makes node representations of different classes indistinguishable. In this work, we study this observation systematically and develop new insights towards deeper graph neural networks. First, we provide a systematical analysis on this issue and argue that the key factor compromising the performance significantly is the entanglement of representation transformation and propagation in current graph convolution operations. After decoupling these two operations, deeper graph neural networks can be used to learn graph node representations from larger receptive fields. We further provide a theoretical analysis of the above observation when building very deep models, which can serve as a rigorous and gentle description of the over-smoothing issue. Based on our theoretical and empirical analysis, we propose Deep Adaptive Graph Neural Network (DAGNN) to adaptively incorporate information from large receptive fields. A set of experiments on citation, co-authorship, and co-purchase datasets have confirmed our analysis and insights and demonstrated the superiority of our proposed methods.
△ Less
Submitted 17 July, 2020;
originally announced July 2020.
-
Multilevel Graph Matching Networks for Deep Graph Similarity Learning
Authors:
Xiang Ling,
Lingfei Wu,
Saizhuo Wang,
Tengfei Ma,
Fangli Xu,
Alex X. Liu,
Chunming Wu,
Shouling Ji
Abstract:
While the celebrated graph neural networks yield effective representations for individual nodes of a graph, there has been relatively less success in extending to the task of graph similarity learning. Recent work on graph similarity learning has considered either global-level graph-graph interactions or low-level node-node interactions, however ignoring the rich cross-level interactions (e.g., be…
▽ More
While the celebrated graph neural networks yield effective representations for individual nodes of a graph, there has been relatively less success in extending to the task of graph similarity learning. Recent work on graph similarity learning has considered either global-level graph-graph interactions or low-level node-node interactions, however ignoring the rich cross-level interactions (e.g., between each node of one graph and the other whole graph). In this paper, we propose a multi-level graph matching network (MGMN) framework for computing the graph similarity between any pair of graph-structured objects in an end-to-end fashion. In particular, the proposed MGMN consists of a node-graph matching network for effectively learning cross-level interactions between each node of one graph and the other whole graph, and a siamese graph neural network to learn global-level interactions between two input graphs. Furthermore, to compensate for the lack of standard benchmark datasets, we have created and collected a set of datasets for both the graph-graph classification and graph-graph regression tasks with different sizes in order to evaluate the effectiveness and robustness of our models. Comprehensive experiments demonstrate that MGMN consistently outperforms state-of-the-art baseline models on both the graph-graph classification and graph-graph regression tasks. Compared with previous work, MGMN also exhibits stronger robustness as the sizes of the two input graphs increase.
△ Less
Submitted 7 August, 2021; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Graph Backdoor
Authors:
Zhaohan Xi,
Ren Pang,
Shouling Ji,
Ting Wang
Abstract:
One intriguing property of deep neural networks (DNNs) is their inherent vulnerability to backdoor attacks -- a trojan model responds to trigger-embedded inputs in a highly predictable manner while functioning normally otherwise. Despite the plethora of prior work on DNNs for continuous data (e.g., images), the vulnerability of graph neural networks (GNNs) for discrete-structured data (e.g., graph…
▽ More
One intriguing property of deep neural networks (DNNs) is their inherent vulnerability to backdoor attacks -- a trojan model responds to trigger-embedded inputs in a highly predictable manner while functioning normally otherwise. Despite the plethora of prior work on DNNs for continuous data (e.g., images), the vulnerability of graph neural networks (GNNs) for discrete-structured data (e.g., graphs) is largely unexplored, which is highly concerning given their increasing use in security-sensitive domains. To bridge this gap, we present GTA, the first backdoor attack on GNNs. Compared with prior work, GTA departs in significant ways: graph-oriented -- it defines triggers as specific subgraphs, including both topological structures and descriptive features, entailing a large design spectrum for the adversary; input-tailored -- it dynamically adapts triggers to individual graphs, thereby optimizing both attack effectiveness and evasiveness; downstream model-agnostic -- it can be readily launched without knowledge regarding downstream models or fine-tuning strategies; and attack-extensible -- it can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks, constituting severe threats for a range of security-critical applications. Through extensive evaluation using benchmark datasets and state-of-the-art models, we demonstrate the effectiveness of GTA. We further provide analytical justification for its effectiveness and discuss potential countermeasures, pointing to several promising research directions.
△ Less
Submitted 9 August, 2021; v1 submitted 21 June, 2020;
originally announced June 2020.
-
AdvMind: Inferring Adversary Intent of Black-Box Attacks
Authors:
Ren Pang,
Xinyang Zhang,
Shouling Ji,
Xiapu Luo,
Ting Wang
Abstract:
Deep neural networks (DNNs) are inherently susceptible to adversarial attacks even under black-box settings, in which the adversary only has query access to the target models. In practice, while it may be possible to effectively detect such attacks (e.g., observing massive similar but non-identical queries), it is often challenging to exactly infer the adversary intent (e.g., the target class of t…
▽ More
Deep neural networks (DNNs) are inherently susceptible to adversarial attacks even under black-box settings, in which the adversary only has query access to the target models. In practice, while it may be possible to effectively detect such attacks (e.g., observing massive similar but non-identical queries), it is often challenging to exactly infer the adversary intent (e.g., the target class of the adversarial example the adversary attempts to craft) especially during early stages of the attacks, which is crucial for performing effective deterrence and remediation of the threats in many scenarios.
In this paper, we present AdvMind, a new class of estimation models that infer the adversary intent of black-box adversarial attacks in a robust and prompt manner. Specifically, to achieve robust detection, AdvMind accounts for the adversary adaptiveness such that her attempt to conceal the target will significantly increase the attack cost (e.g., in terms of the number of queries); to achieve prompt detection, AdvMind proactively synthesizes plausible query results to solicit subsequent queries from the adversary that maximally expose her intent. Through extensive empirical evaluation on benchmark datasets and state-of-the-art black-box attacks, we demonstrate that on average AdvMind detects the adversary intent with over 75% accuracy after observing less than 3 query batches and meanwhile increases the cost of adaptive attacks by over 60%. We further discuss the possible synergy between AdvMind and other defense methods against black-box adversarial attacks, pointing to several promising research directions.
△ Less
Submitted 16 June, 2020;
originally announced June 2020.
-
XGNN: Towards Model-Level Explanations of Graph Neural Networks
Authors:
Hao Yuan,
Jiliang Tang,
Xia Hu,
Shuiwang Ji
Abstract:
Graphs neural networks (GNNs) learn node features by aggregating and combining neighbor information, which have achieved promising performance on many graph tasks. However, GNNs are mostly treated as black-boxes and lack human intelligible explanations. Thus, they cannot be fully trusted and used in certain application domains if GNN models cannot be explained. In this work, we propose a novel app…
▽ More
Graphs neural networks (GNNs) learn node features by aggregating and combining neighbor information, which have achieved promising performance on many graph tasks. However, GNNs are mostly treated as black-boxes and lack human intelligible explanations. Thus, they cannot be fully trusted and used in certain application domains if GNN models cannot be explained. In this work, we propose a novel approach, known as XGNN, to interpret GNNs at the model-level. Our approach can provide high-level insights and generic understanding of how GNNs work. In particular, we propose to explain GNNs by training a graph generator so that the generated graph patterns maximize a certain prediction of the model.We formulate the graph generation as a reinforcement learning task, where for each step, the graph generator predicts how to add an edge into the current graph. The graph generator is trained via a policy gradient method based on information from the trained GNNs. In addition, we incorporate several graph rules to encourage the generated graphs to be valid. Experimental results on both synthetic and real-world datasets show that our proposed methods help understand and verify the trained GNNs. Furthermore, our experimental results indicate that the generated graphs can provide guidance on how to improve the trained GNNs.
△ Less
Submitted 3 June, 2020;
originally announced June 2020.
-
Non-Local Graph Neural Networks
Authors:
Meng Liu,
Zhengyang Wang,
Shuiwang Ji
Abstract:
Modern graph neural networks (GNNs) learn node embeddings through multilayer local aggregation and achieve great success in applications on assortative graphs. However, tasks on disassortative graphs usually require non-local aggregation. In addition, we find that local aggregation is even harmful for some disassortative graphs. In this work, we propose a simple yet effective non-local aggregation…
▽ More
Modern graph neural networks (GNNs) learn node embeddings through multilayer local aggregation and achieve great success in applications on assortative graphs. However, tasks on disassortative graphs usually require non-local aggregation. In addition, we find that local aggregation is even harmful for some disassortative graphs. In this work, we propose a simple yet effective non-local aggregation framework with an efficient attention-guided sorting for GNNs. Based on it, we develop various non-local GNNs. We perform thorough experiments to analyze disassortative graph datasets and evaluate our non-local GNNs. Experimental results demonstrate that our non-local GNNs significantly outperform previous state-of-the-art methods on seven benchmark datasets of disassortative graphs, in terms of both model performance and efficiency.
△ Less
Submitted 10 December, 2021; v1 submitted 29 May, 2020;
originally announced May 2020.
-
Adversarial Attacks and Defenses on Graphs: A Review, A Tool and Empirical Studies
Authors:
Wei Jin,
Yaxin Li,
Han Xu,
Yiqi Wang,
Shuiwang Ji,
Charu Aggarwal,
Jiliang Tang
Abstract:
Deep neural networks (DNNs) have achieved significant performance in various tasks. However, recent studies have shown that DNNs can be easily fooled by small perturbation on the input, called adversarial attacks. As the extensions of DNNs to graphs, Graph Neural Networks (GNNs) have been demonstrated to inherit this vulnerability. Adversary can mislead GNNs to give wrong predictions by modifying…
▽ More
Deep neural networks (DNNs) have achieved significant performance in various tasks. However, recent studies have shown that DNNs can be easily fooled by small perturbation on the input, called adversarial attacks. As the extensions of DNNs to graphs, Graph Neural Networks (GNNs) have been demonstrated to inherit this vulnerability. Adversary can mislead GNNs to give wrong predictions by modifying the graph structure such as manipulating a few edges. This vulnerability has arisen tremendous concerns for adapting GNNs in safety-critical applications and has attracted increasing research attention in recent years. Thus, it is necessary and timely to provide a comprehensive overview of existing graph adversarial attacks and the countermeasures. In this survey, we categorize existing attacks and defenses, and review the corresponding state-of-the-art methods. Furthermore, we have developed a repository with representative algorithms (https://github.com/DSE-MSU/DeepRobust/tree/master/deeprobust/graph). The repository enables us to conduct empirical studies to deepen our understandings on attacks and defenses on graphs.
△ Less
Submitted 12 December, 2020; v1 submitted 1 March, 2020;
originally announced March 2020.
-
Deep Reinforcement Learning Designed Shinnar-Le Roux RF Pulse using Root-Flipping: DeepRF_SLR
Authors:
Dongmyung Shin,
Sooyeon Ji,
Doohee Lee,
Jieun Lee,
Se-Hong Oh,
Jongho Lee
Abstract:
A novel approach of applying deep reinforcement learning to an RF pulse design is introduced. This method, which is referred to as DeepRF_SLR, is designed to minimize the peak amplitude or, equivalently, minimize the pulse duration of a multiband refocusing pulse generated by the Shinar Le-Roux (SLR) algorithm. In the method, the root pattern of SLR polynomial, which determines the RF pulse shape,…
▽ More
A novel approach of applying deep reinforcement learning to an RF pulse design is introduced. This method, which is referred to as DeepRF_SLR, is designed to minimize the peak amplitude or, equivalently, minimize the pulse duration of a multiband refocusing pulse generated by the Shinar Le-Roux (SLR) algorithm. In the method, the root pattern of SLR polynomial, which determines the RF pulse shape, is optimized by iterative applications of deep reinforcement learning and greedy tree search. When tested for the designs of the multiband factors of three and seven RFs, DeepRF_SLR demonstrated improved performance compared to conventional methods, generating shorter duration RF pulses in shorter computational time. In the experiments, the RF pulse from DeepRF_SLR produced a slice profile similar to the minimum-phase SLR RF pulse and the profiles matched to that of the computer simulation. Our approach suggests a new way of designing an RF by applying a machine learning algorithm, demonstrating a machine-designed MRI sequence.
△ Less
Submitted 1 September, 2020; v1 submitted 18 December, 2019;
originally announced December 2019.
-
Learning with Multiplicative Perturbations
Authors:
Xiulong Yang,
Shihao Ji
Abstract:
Adversarial Training (AT) and Virtual Adversarial Training (VAT) are the regularization techniques that train Deep Neural Networks (DNNs) with adversarial examples generated by adding small but worst-case perturbations to input examples. In this paper, we propose xAT and xVAT, new adversarial training algorithms, that generate \textbf{multiplicative} perturbations to input examples for robust trai…
▽ More
Adversarial Training (AT) and Virtual Adversarial Training (VAT) are the regularization techniques that train Deep Neural Networks (DNNs) with adversarial examples generated by adding small but worst-case perturbations to input examples. In this paper, we propose xAT and xVAT, new adversarial training algorithms, that generate \textbf{multiplicative} perturbations to input examples for robust training of DNNs. Such perturbations are much more perceptible and interpretable than their \textbf{additive} counterparts exploited by AT and VAT. Furthermore, the multiplicative perturbations can be generated transductively or inductively while the standard AT and VAT only support a transductive implementation. We conduct a series of experiments that analyze the behavior of the multiplicative perturbations and demonstrate that xAT and xVAT match or outperform state-of-the-art classification accuracies across multiple established benchmarks while being about 30\% faster than their additive counterparts. Furthermore, the resulting DNNs also demonstrate distinct weight distributions.
△ Less
Submitted 22 June, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.
-
Sparse Graph Attention Networks
Authors:
Yang Ye,
Shihao Ji
Abstract:
Graph Neural Networks (GNNs) have proved to be an effective representation learning framework for graph-structured data, and have achieved state-of-the-art performance on many practical predictive tasks, such as node classification, link prediction and graph classification. Among the variants of GNNs, Graph Attention Networks (GATs) learn to assign dense attention coefficients over all neighbors o…
▽ More
Graph Neural Networks (GNNs) have proved to be an effective representation learning framework for graph-structured data, and have achieved state-of-the-art performance on many practical predictive tasks, such as node classification, link prediction and graph classification. Among the variants of GNNs, Graph Attention Networks (GATs) learn to assign dense attention coefficients over all neighbors of a node for feature aggregation, and improve the performance of many graph learning tasks. However, real-world graphs are often very large and noisy, and GATs are prone to overfitting if not regularized properly. Even worse, the local aggregation mechanism of GATs may fail on disassortative graphs, where nodes within local neighborhood provide more noise than useful information for feature aggregation. In this paper, we propose Sparse Graph Attention Networks (SGATs) that learn sparse attention coefficients under an $L_0$-norm regularization, and the learned sparse attentions are then used for all GNN layers, resulting in an edge-sparsified graph. By doing so, we can identify noisy/task-irrelevant edges, and thus perform feature aggregation on most informative neighbors. Extensive experiments on synthetic and real-world graph learning benchmarks demonstrate the superior performance of SGATs. In particular, SGATs can remove about 50\%-80\% edges from large assortative graphs, while retaining similar classification accuracies. On disassortative graphs, SGATs prune majority of noisy edges and outperform GATs in classification accuracies by significant margins. Furthermore, the removed edges can be interpreted intuitively and quantitatively. To the best of our knowledge, this is the first graph learning algorithm that shows significant redundancies in graphs and edge-sparsified graphs can achieve similar or sometimes higher predictive performances than original graphs.
△ Less
Submitted 10 April, 2021; v1 submitted 1 December, 2019;
originally announced December 2019.
-
Efficient Global String Kernel with Random Features: Beyond Counting Substructures
Authors:
Lingfei Wu,
Ian En-Hsu Yen,
Siyu Huo,
Liang Zhao,
Kun Xu,
Liang Ma,
Shouling Ji,
Charu Aggarwal
Abstract:
Analysis of large-scale sequential data has been one of the most crucial tasks in areas such as bioinformatics, text, and audio mining. Existing string kernels, however, either (i) rely on local features of short substructures in the string, which hardly capture long discriminative patterns, (ii) sum over too many substructures, such as all possible subsequences, which leads to diagonal dominance…
▽ More
Analysis of large-scale sequential data has been one of the most crucial tasks in areas such as bioinformatics, text, and audio mining. Existing string kernels, however, either (i) rely on local features of short substructures in the string, which hardly capture long discriminative patterns, (ii) sum over too many substructures, such as all possible subsequences, which leads to diagonal dominance of the kernel matrix, or (iii) rely on non-positive-definite similarity measures derived from the edit distance. Furthermore, while there have been works addressing the computational challenge with respect to the length of string, most of them still experience quadratic complexity in terms of the number of training samples when used in a kernel-based classifier. In this paper, we present a new class of global string kernels that aims to (i) discover global properties hidden in the strings through global alignments, (ii) maintain positive-definiteness of the kernel, without introducing a diagonal dominant kernel matrix, and (iii) have a training cost linear with respect to not only the length of the string but also the number of training string samples. To this end, the proposed kernels are explicitly defined through a series of different random feature maps, each corresponding to a distribution of random strings. We show that kernels defined this way are always positive-definite, and exhibit computational benefits as they always produce \emph{Random String Embeddings (RSE)} that can be directly used in any linear classification models. Our extensive experiments on nine benchmark datasets corroborate that RSE achieves better or comparable accuracy in comparison to state-of-the-art baselines, especially with the strings of longer lengths. In addition, we empirically show that RSE scales linearly with the increase of the number and the length of string.
△ Less
Submitted 25 November, 2019;
originally announced November 2019.
-
A Tale of Evil Twins: Adversarial Inputs versus Poisoned Models
Authors:
Ren Pang,
Hua Shen,
Xinyang Zhang,
Shouling Ji,
Yevgeniy Vorobeychik,
Xiapu Luo,
Alex Liu,
Ting Wang
Abstract:
Despite their tremendous success in a range of domains, deep learning systems are inherently susceptible to two types of manipulations: adversarial inputs -- maliciously crafted samples that deceive target deep neural network (DNN) models, and poisoned models -- adversely forged DNNs that misbehave on pre-defined inputs. While prior work has intensively studied the two attack vectors in parallel,…
▽ More
Despite their tremendous success in a range of domains, deep learning systems are inherently susceptible to two types of manipulations: adversarial inputs -- maliciously crafted samples that deceive target deep neural network (DNN) models, and poisoned models -- adversely forged DNNs that misbehave on pre-defined inputs. While prior work has intensively studied the two attack vectors in parallel, there is still a lack of understanding about their fundamental connections: what are the dynamic interactions between the two attack vectors? what are the implications of such interactions for optimizing existing attacks? what are the potential countermeasures against the enhanced attacks? Answering these key questions is crucial for assessing and mitigating the holistic vulnerabilities of DNNs deployed in realistic settings.
Here we take a solid step towards this goal by conducting the first systematic study of the two attack vectors within a unified framework. Specifically, (i) we develop a new attack model that jointly optimizes adversarial inputs and poisoned models; (ii) with both analytical and empirical evidence, we reveal that there exist intriguing "mutual reinforcement" effects between the two attack vectors -- leveraging one vector significantly amplifies the effectiveness of the other; (iii) we demonstrate that such effects enable a large design spectrum for the adversary to enhance the existing attacks that exploit both vectors (e.g., backdoor attacks), such as maximizing the attack evasiveness with respect to various detection methods; (iv) finally, we discuss potential countermeasures against such optimized attacks and their technical challenges, pointing to several promising research directions.
△ Less
Submitted 20 November, 2020; v1 submitted 4 November, 2019;
originally announced November 2019.
-
Neural Image Compression and Explanation
Authors:
Xiang Li,
Shihao Ji
Abstract:
Explaining the prediction of deep neural networks (DNNs) and semantic image compression are two active research areas of deep learning with a numerous of applications in decision-critical systems, such as surveillance cameras, drones and self-driving cars, where interpretable decision is critical and storage/network bandwidth is limited. In this paper, we propose a novel end-to-end Neural Image Co…
▽ More
Explaining the prediction of deep neural networks (DNNs) and semantic image compression are two active research areas of deep learning with a numerous of applications in decision-critical systems, such as surveillance cameras, drones and self-driving cars, where interpretable decision is critical and storage/network bandwidth is limited. In this paper, we propose a novel end-to-end Neural Image Compression and Explanation (NICE) framework that learns to (1) explain the predictions of convolutional neural networks (CNNs), and (2) subsequently compress the input images for efficient storage or transmission. Specifically, NICE generates a sparse mask over an input image by attaching a stochastic binary gate to each pixel of the image, whose parameters are learned through the interaction with the CNN classifier to be explained. The generated mask is able to capture the saliency of each pixel measured by its influence to the final prediction of CNN; it can also be used to produce a mixed-resolution image, where important pixels maintain their original high resolution and insignificant background pixels are subsampled to a low resolution. The produced images achieve a high compression rate (e.g., about 0.6x of original image file size), while retaining a similar classification accuracy. Extensive experiments across multiple image classification benchmarks demonstrate the superior performance of NICE compared to the state-of-the-art methods in terms of explanation quality and semantic image compression rate. Our code is available at: https://github.com/lxuniverse/NICE.
△ Less
Submitted 7 December, 2020; v1 submitted 9 August, 2019;
originally announced August 2019.
-
Neural Plasticity Networks
Authors:
Yang Li,
Shihao Ji
Abstract:
Neural plasticity is an important functionality of human brain, in which number of neurons and synapses can shrink or expand in response to stimuli throughout the span of life. We model this dynamic learning process as an $L_0$-norm regularized binary optimization problem, in which each unit of a neural network (e.g., weight, neuron or channel, etc.) is attached with a stochastic binary gate, whos…
▽ More
Neural plasticity is an important functionality of human brain, in which number of neurons and synapses can shrink or expand in response to stimuli throughout the span of life. We model this dynamic learning process as an $L_0$-norm regularized binary optimization problem, in which each unit of a neural network (e.g., weight, neuron or channel, etc.) is attached with a stochastic binary gate, whose parameters determine the level of activity of a unit in the network. At the beginning, only a small portion of binary gates (therefore the corresponding neurons) are activated, while the remaining neurons are in a hibernation mode. As the learning proceeds, some neurons might be activated or deactivated if doing so can be justified by the cost-benefit tradeoff measured by the $L_0$-norm regularized objective. As the training gets mature, the probability of transition between activation and deactivation will diminish until a final hardening stage. We demonstrate that all of these learning dynamics can be modulated by a single parameter $k$ seamlessly. Our neural plasticity network (NPN) can prune or expand a network depending on the initial capacity of network provided by the user; it also unifies dropout (when $k=0$), traditional training of DNNs (when $k=\infty$) and interpolates between these two. To the best of our knowledge, this is the first learning framework that unifies network sparsification and network expansion in an end-to-end training pipeline. Extensive experiments on synthetic dataset and multiple image classification benchmarks demonstrate the superior performance of NPN. We show that both network sparsification and network expansion can yield compact models of similar architectures, while retaining competitive accuracies of the original networks.
△ Less
Submitted 1 May, 2021; v1 submitted 13 August, 2019;
originally announced August 2019.
-
Graph Representation Learning via Hard and Channel-Wise Attention Networks
Authors:
Hongyang Gao,
Shuiwang Ji
Abstract:
Attention operators have been widely applied in various fields, including computer vision, natural language processing, and network embedding learning. Attention operators on graph data enables learnable weights when aggregating information from neighboring nodes. However, graph attention operators (GAOs) consume excessive computational resources, preventing their applications on large graphs. In…
▽ More
Attention operators have been widely applied in various fields, including computer vision, natural language processing, and network embedding learning. Attention operators on graph data enables learnable weights when aggregating information from neighboring nodes. However, graph attention operators (GAOs) consume excessive computational resources, preventing their applications on large graphs. In addition, GAOs belong to the family of soft attention, instead of hard attention, which has been shown to yield better performance. In this work, we propose novel hard graph attention operator (hGAO) and channel-wise graph attention operator (cGAO). hGAO uses the hard attention mechanism by attending to only important nodes. Compared to GAO, hGAO improves performance and saves computational cost by only attending to important nodes. To further reduce the requirements on computational resources, we propose the cGAO that performs attention operations along channels. cGAO avoids the dependency on the adjacency matrix, leading to dramatic reductions in computational resource requirements. Experimental results demonstrate that our proposed deep models with the new operators achieve consistently better performance. Comparison results also indicates that hGAO achieves significantly better performance than GAO on both node and graph embedding tasks. Efficiency comparison shows that our cGAO leads to dramatic savings in computational resources, making them applicable to large graphs.
△ Less
Submitted 5 July, 2019;
originally announced July 2019.
-
Global Pixel Transformers for Virtual Staining of Microscopy Images
Authors:
Yi Liu,
Hao Yuan,
Zhengyang Wang,
Shuiwang Ji
Abstract:
Visualizing the details of different cellular structures is of great importance to elucidate cellular functions. However, it is challenging to obtain high quality images of different structures directly due to complex cellular environments. Fluorescence staining is a popular technique to label different structures but has several drawbacks. In particular, label staining is time consuming and may a…
▽ More
Visualizing the details of different cellular structures is of great importance to elucidate cellular functions. However, it is challenging to obtain high quality images of different structures directly due to complex cellular environments. Fluorescence staining is a popular technique to label different structures but has several drawbacks. In particular, label staining is time consuming and may affect cell morphology, and simultaneous labels are inherently limited. This raises the need of building computational models to learn relationships between unlabeled microscopy images and labeled fluorescence images, and to infer fluorescence labels of other microscopy images excluding the physical staining process. We propose to develop a novel deep model for virtual staining of unlabeled microscopy images. We first propose a novel network layer, known as the global pixel transformer layer, that fuses global information from inputs effectively. The proposed global pixel transformer layer can generate outputs with arbitrary dimensions, and can be employed for all the regular, down-sampling, and up-sampling operators. We then incorporate our proposed global pixel transformer layers and dense blocks to build an U-Net like network. We believe such a design can promote feature reusing between layers. In addition, we propose a multi-scale input strategy to encourage networks to capture features at different scales. We conduct evaluations across various fluorescence image prediction tasks to demonstrate the effectiveness of our approach. Both quantitative and qualitative results show that our method outperforms the state-of-the-art approach significantly. It is also shown that our proposed global pixel transformer layer is useful to improve the fluorescence image prediction results.
△ Less
Submitted 30 September, 2019; v1 submitted 1 July, 2019;
originally announced July 2019.
-
Graph U-Nets
Authors:
Hongyang Gao,
Shuiwang Ji
Abstract:
We consider the problem of representation learning for graph data. Convolutional neural networks can naturally operate on images, but have significant challenges in dealing with graph data. Given images are special cases of graphs with nodes lie on 2D lattices, graph embedding tasks have a natural correspondence with image pixel-wise prediction tasks such as segmentation. While encoder-decoder arc…
▽ More
We consider the problem of representation learning for graph data. Convolutional neural networks can naturally operate on images, but have significant challenges in dealing with graph data. Given images are special cases of graphs with nodes lie on 2D lattices, graph embedding tasks have a natural correspondence with image pixel-wise prediction tasks such as segmentation. While encoder-decoder architectures like U-Nets have been successfully applied on many image pixel-wise prediction tasks, similar methods are lacking for graph data. This is due to the fact that pooling and up-sampling operations are not natural on graph data. To address these challenges, we propose novel graph pooling (gPool) and unpooling (gUnpool) operations in this work. The gPool layer adaptively selects some nodes to form a smaller graph based on their scalar projection values on a trainable projection vector. We further propose the gUnpool layer as the inverse operation of the gPool layer. The gUnpool layer restores the graph into its original structure using the position information of nodes selected in the corresponding gPool layer. Based on our proposed gPool and gUnpool layers, we develop an encoder-decoder model on graph, known as the graph U-Nets. Our experimental results on node classification and graph classification tasks demonstrate that our methods achieve consistently better performance than previous models.
△ Less
Submitted 11 May, 2019;
originally announced May 2019.
-
$L_0$-ARM: Network Sparsification via Stochastic Binary Optimization
Authors:
Yang Li,
Shihao Ji
Abstract:
We consider network sparsification as an $L_0$-norm regularized binary optimization problem, where each unit of a neural network (e.g., weight, neuron, or channel, etc.) is attached with a stochastic binary gate, whose parameters are jointly optimized with original network parameters. The Augment-Reinforce-Merge (ARM), a recently proposed unbiased gradient estimator, is investigated for this binar…
▽ More
We consider network sparsification as an $L_0$-norm regularized binary optimization problem, where each unit of a neural network (e.g., weight, neuron, or channel, etc.) is attached with a stochastic binary gate, whose parameters are jointly optimized with original network parameters. The Augment-Reinforce-Merge (ARM), a recently proposed unbiased gradient estimator, is investigated for this binary optimization problem. Compared to the hard concrete gradient estimator from Louizos et al., ARM demonstrates superior performance of pruning network architectures while retaining almost the same accuracies of baseline methods. Similar to the hard concrete estimator, ARM also enables conditional computation during model training but with improved effectiveness due to the exact binary stochasticity. Thanks to the flexibility of ARM, many smooth or non-smooth parametric functions, such as scaled sigmoid or hard sigmoid, can be used to parameterize this binary optimization problem and the unbiasness of the ARM estimator is retained, while the hard concrete estimator has to rely on the hard sigmoid function to achieve conditional computation and thus accelerated training. Extensive experiments on multiple public datasets demonstrate state-of-the-art pruning rates with almost the same accuracies of baseline methods. The resulting algorithm $L_0$-ARM sparsifies the Wide-ResNet models on CIFAR-10 and CIFAR-100 while the hard concrete estimator cannot. The code is public available at https://github.com/leo-yangli/l0-arm.
△ Less
Submitted 11 September, 2019; v1 submitted 8 April, 2019;
originally announced April 2019.
-
OPENMENDEL: A Cooperative Programming Project for Statistical Genetics
Authors:
Hua Zhou,
Janet S. Sinsheimer,
Christopher A. German,
Sarah S. Ji,
Douglas M. Bates,
Benjamin B. Chu,
Kevin L. Keys,
Juhyun Kim,
Seyoon Ko,
Gordon D. Mosher,
Jeanette C. Papp,
Eric M. Sobel,
Jing Zhai,
Jin J. Zhou,
Kenneth Lange
Abstract:
Statistical methods for genomewide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet…
▽ More
Statistical methods for genomewide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet this need is called the OPENMENDELproject (https://openmendel.github.io). It aims to (1) enable interactive and reproducible analyses with informative intermediate results, (2) scale to big data analytics, (3) embrace parallel and distributed computing, (4) adapt to rapid hardware evolution, (5) allow cloud computing, (6) allow integration of varied genetic data types, and (7) foster easy communication between clinicians, geneticists, statisticians, and computer scientists. This article reviews and makes recommendations to the genetic epidemiology community in the context of the OPENMENDEL project.
△ Less
Submitted 13 February, 2019;
originally announced February 2019.
-
Non-local U-Net for Biomedical Image Segmentation
Authors:
Zhengyang Wang,
Na Zou,
Dinggang Shen,
Shuiwang Ji
Abstract:
Deep learning has shown its great promise in various biomedical image segmentation tasks. Existing models are typically based on U-Net and rely on an encoder-decoder architecture with stacked local operators to aggregate long-range information gradually. However, only using the local operators limits the efficiency and effectiveness. In this work, we propose the non-local U-Nets, which are equippe…
▽ More
Deep learning has shown its great promise in various biomedical image segmentation tasks. Existing models are typically based on U-Net and rely on an encoder-decoder architecture with stacked local operators to aggregate long-range information gradually. However, only using the local operators limits the efficiency and effectiveness. In this work, we propose the non-local U-Nets, which are equipped with flexible global aggregation blocks, for biomedical image segmentation. These blocks can be inserted into U-Net as size-preserving processes, as well as down-sampling and up-sampling layers. We perform thorough experiments on the 3D multimodality isointense infant brain MR image segmentation task to evaluate the non-local U-Nets. Results show that our proposed models achieve top performances with fewer parameters and faster computation.
△ Less
Submitted 18 February, 2020; v1 submitted 10 December, 2018;
originally announced December 2018.
-
A Stable Minutia Descriptor based on Gabor Wavelet and Linear Discriminant Analysis
Authors:
Gwang-Il Ri,
Mun-Chol Kim,
Su-Rim Ji
Abstract:
The minutia descriptor which describes characteristics of minutia, plays a major role in fingerprint recognition. Typically, fingerprint recognition systems employ minutia descriptors to find potential correspondence between minutiae, and they use similarity between two minutia descriptors to calculate overall similarity between two fingerprint images. A good minutia descriptor can improve recogni…
▽ More
The minutia descriptor which describes characteristics of minutia, plays a major role in fingerprint recognition. Typically, fingerprint recognition systems employ minutia descriptors to find potential correspondence between minutiae, and they use similarity between two minutia descriptors to calculate overall similarity between two fingerprint images. A good minutia descriptor can improve recognition accuracy of fingerprint recognition system and largely reduce comparing time. A good minutia descriptor should have high ability to distinguish between different minutiae and at the same time should be robust in difficult conditions including poor quality image and small size image. It also should be effective in computational cost of similarity among descriptors. In this paper, a robust minutia descriptor is constructed using Gabor wavelet and linear discriminant analysis. This minutia descriptor has high distinguishing ability, stability and simple comparing method. Experimental results on FVC2004 and FVC2006 databases show that the proposed minutia descriptor is very effective in fingerprint recognition.
△ Less
Submitted 6 September, 2018;
originally announced September 2018.
-
Large-Scale Learnable Graph Convolutional Networks
Authors:
Hongyang Gao,
Zhengyang Wang,
Shuiwang Ji
Abstract:
Convolutional neural networks (CNNs) have achieved great success on grid-like data such as images, but face tremendous challenges in learning from more generic data such as graphs. In CNNs, the trainable local filters enable the automatic extraction of high-level features. The computation with filters requires a fixed number of ordered units in the receptive fields. However, the number of neighbor…
▽ More
Convolutional neural networks (CNNs) have achieved great success on grid-like data such as images, but face tremendous challenges in learning from more generic data such as graphs. In CNNs, the trainable local filters enable the automatic extraction of high-level features. The computation with filters requires a fixed number of ordered units in the receptive fields. However, the number of neighboring units is neither fixed nor are they ordered in generic graphs, thereby hindering the applications of convolutional operations. Here, we address these challenges by proposing the learnable graph convolutional layer (LGCL). LGCL automatically selects a fixed number of neighboring nodes for each feature based on value ranking in order to transform graph data into grid-like structures in 1-D format, thereby enabling the use of regular convolutional operations on generic graphs. To enable model training on large-scale graphs, we propose a sub-graph training method to reduce the excessive memory and computational resource requirements suffered by prior methods on graph convolutions. Our experimental results on node classification tasks in both transductive and inductive learning settings demonstrate that our methods can achieve consistently better performance on the Cora, Citeseer, Pubmed citation network, and protein-protein interaction network datasets. Our results also indicate that the proposed methods using sub-graph training strategy are more efficient as compared to prior approaches.
△ Less
Submitted 12 August, 2018;
originally announced August 2018.
-
Dense Transformer Networks
Authors:
Jun Li,
Yongjun Chen,
Lei Cai,
Ian Davidson,
Shuiwang Ji
Abstract:
The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by network architecture instead of learned from data. In this work, we propose the dense transformer networks, which can learn the shapes and sizes of patches from data…
▽ More
The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by network architecture instead of learned from data. In this work, we propose the dense transformer networks, which can learn the shapes and sizes of patches from data. The dense transformer networks employ an encoder-decoder architecture, and a pair of dense transformer modules are inserted into each of the encoder and decoder paths. The novelty of this work is that we provide technical solutions for learning the shapes and sizes of patches from data and efficiently restoring the spatial correspondence required for dense prediction. The proposed dense transformer modules are differentiable, thus the entire network can be trained. We apply the proposed networks on natural and biological image segmentation tasks and show superior performance is achieved in comparison to baseline methods.
△ Less
Submitted 7 June, 2017; v1 submitted 24 May, 2017;
originally announced May 2017.
-
Learning Convolutional Text Representations for Visual Question Answering
Authors:
Zhengyang Wang,
Shuiwang Ji
Abstract:
Visual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object re…
▽ More
Visual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object recognition and image classification, visual question answering raises a different need for textual representation as compared to other natural language processing tasks. In this work, we perform a detailed analysis on natural language questions in visual question answering. Based on the analysis, we propose to rely on convolutional neural networks for learning textual representations. By exploring the various properties of convolutional neural networks specialized for text data, such as width and depth, we present our "CNN Inception + Gate" model. We show that our model improves question representations and thus the overall accuracy of visual question answering models. We also show that the text representation requirement in visual question answering is more complicated and comprehensive than that in conventional natural language processing tasks, making it a better task to evaluate textual representation methods. Shallow models like fastText, which can obtain comparable results with deep learning models in tasks like text classification, are not suitable in visual question answering.
△ Less
Submitted 18 April, 2018; v1 submitted 18 May, 2017;
originally announced May 2017.
-
Spatial Variational Auto-Encoding via Matrix-Variate Normal Distributions
Authors:
Zhengyang Wang,
Hao Yuan,
Shuiwang Ji
Abstract:
The key idea of variational auto-encoders (VAEs) resembles that of traditional auto-encoder models in which spatial information is supposed to be explicitly encoded in the latent space. However, the latent variables in VAEs are vectors, which can be interpreted as multiple feature maps of size 1x1. Such representations can only convey spatial information implicitly when coupled with powerful decod…
▽ More
The key idea of variational auto-encoders (VAEs) resembles that of traditional auto-encoder models in which spatial information is supposed to be explicitly encoded in the latent space. However, the latent variables in VAEs are vectors, which can be interpreted as multiple feature maps of size 1x1. Such representations can only convey spatial information implicitly when coupled with powerful decoders. In this work, we propose spatial VAEs that use feature maps of larger size as latent variables to explicitly capture spatial information. This is achieved by allowing the latent variables to be sampled from matrix-variate normal (MVN) distributions whose parameters are computed from the encoder network. To increase dependencies among locations on latent feature maps and reduce the number of parameters, we further propose spatial VAEs via low-rank MVN distributions. Experimental results show that the proposed spatial VAEs outperform original VAEs in capturing rich structural and spatial information.
△ Less
Submitted 22 January, 2019; v1 submitted 18 May, 2017;
originally announced May 2017.
-
Pixel Deconvolutional Networks
Authors:
Hongyang Gao,
Hao Yuan,
Zhengyang Wang,
Shuiwang Ji
Abstract:
Deconvolutional layers have been widely used in a variety of deep models for up-sampling, including encoder-decoder networks for semantic segmentation and deep generative models for unsupervised learning. One of the key limitations of deconvolutional operations is that they result in the so-called checkerboard problem. This is caused by the fact that no direct relationship exists among adjacent pi…
▽ More
Deconvolutional layers have been widely used in a variety of deep models for up-sampling, including encoder-decoder networks for semantic segmentation and deep generative models for unsupervised learning. One of the key limitations of deconvolutional operations is that they result in the so-called checkerboard problem. This is caused by the fact that no direct relationship exists among adjacent pixels on the output feature map. To address this problem, we propose the pixel deconvolutional layer (PixelDCL) to establish direct relationships among adjacent pixels on the up-sampled feature map. Our method is based on a fresh interpretation of the regular deconvolution operation. The resulting PixelDCL can be used to replace any deconvolutional layer in a plug-and-play manner without compromising the fully trainable capabilities of original models. The proposed PixelDCL may result in slight decrease in efficiency, but this can be overcome by an implementation trick. Experimental results on semantic segmentation demonstrate that PixelDCL can consider spatial features such as edges and shapes and yields more accurate segmentation outputs than deconvolutional layers. When used in image generation tasks, our PixelDCL can largely overcome the checkerboard problem suffered by regular deconvolution operations.
△ Less
Submitted 26 November, 2017; v1 submitted 18 May, 2017;
originally announced May 2017.
-
Parallelizing Word2Vec in Multi-Core and Many-Core Architectures
Authors:
Shihao Ji,
Nadathur Satish,
Sheng Li,
Pradeep Dubey
Abstract:
Word2vec is a widely used algorithm for extracting low-dimensional vector representations of words. State-of-the-art algorithms including those by Mikolov et al. have been parallelized for multi-core CPU architectures, but are based on vector-vector operations with "Hogwild" updates that are memory-bandwidth intensive and do not efficiently use computational resources. In this paper, we propose "H…
▽ More
Word2vec is a widely used algorithm for extracting low-dimensional vector representations of words. State-of-the-art algorithms including those by Mikolov et al. have been parallelized for multi-core CPU architectures, but are based on vector-vector operations with "Hogwild" updates that are memory-bandwidth intensive and do not efficiently use computational resources. In this paper, we propose "HogBatch" by improving reuse of various data structures in the algorithm through the use of minibatching and negative sample sharing, hence allowing us to express the problem using matrix multiply operations. We also explore different techniques to distribute word2vec computation across nodes in a compute cluster, and demonstrate good strong scalability up to 32 nodes. The new algorithm is particularly suitable for modern multi-core/many-core architectures, especially Intel's latest Knights Landing processors, and allows us to scale up the computation near linearly across cores and nodes, and process hundreds of millions of words per second, which is the fastest word2vec implementation to the best of our knowledge.
△ Less
Submitted 23 December, 2016; v1 submitted 18 November, 2016;
originally announced November 2016.
-
Extreme Stochastic Variational Inference: Distributed and Asynchronous
Authors:
Jiong Zhang,
Parameswaran Raman,
Shihao Ji,
Hsiang-Fu Yu,
S. V. N. Vishwanathan,
Inderjit S. Dhillon
Abstract:
Stochastic variational inference (SVI), the state-of-the-art algorithm for scaling variational inference to large-datasets, is inherently serial. Moreover, it requires the parameters to fit in the memory of a single processor; this is problematic when the number of parameters is in billions. In this paper, we propose extreme stochastic variational inference (ESVI), an asynchronous and lock-free al…
▽ More
Stochastic variational inference (SVI), the state-of-the-art algorithm for scaling variational inference to large-datasets, is inherently serial. Moreover, it requires the parameters to fit in the memory of a single processor; this is problematic when the number of parameters is in billions. In this paper, we propose extreme stochastic variational inference (ESVI), an asynchronous and lock-free algorithm to perform variational inference for mixture models on massive real world datasets. ESVI overcomes the limitations of SVI by requiring that each processor only access a subset of the data and a subset of the parameters, thus providing data and model parallelism simultaneously. We demonstrate the effectiveness of ESVI by running Latent Dirichlet Allocation (LDA) on UMBC-3B, a dataset that has a vocabulary of 3 million and a token size of 3 billion. In our experiments, we found that ESVI not only outperforms VI and SVI in wallclock-time, but also achieves a better quality solution. In addition, we propose a strategy to speed up computation and save memory when fitting large number of topics.
△ Less
Submitted 3 August, 2018; v1 submitted 31 May, 2016;
originally announced May 2016.
-
Parallelizing Word2Vec in Shared and Distributed Memory
Authors:
Shihao Ji,
Nadathur Satish,
Sheng Li,
Pradeep Dubey
Abstract:
Word2Vec is a widely used algorithm for extracting low-dimensional vector representations of words. It generated considerable excitement in the machine learning and natural language processing (NLP) communities recently due to its exceptional performance in many NLP applications such as named entity recognition, sentiment analysis, machine translation and question answering. State-of-the-art algor…
▽ More
Word2Vec is a widely used algorithm for extracting low-dimensional vector representations of words. It generated considerable excitement in the machine learning and natural language processing (NLP) communities recently due to its exceptional performance in many NLP applications such as named entity recognition, sentiment analysis, machine translation and question answering. State-of-the-art algorithms including those by Mikolov et al. have been parallelized for multi-core CPU architectures but are based on vector-vector operations that are memory-bandwidth intensive and do not efficiently use computational resources. In this paper, we improve reuse of various data structures in the algorithm through the use of minibatching, hence allowing us to express the problem using matrix multiply operations. We also explore different techniques to distribute word2vec computation across nodes in a compute cluster, and demonstrate good strong scalability up to 32 nodes. In combination, these techniques allow us to scale up the computation near linearly across cores and nodes, and process hundreds of millions of words per second, which is the fastest word2vec implementation to the best of our knowledge.
△ Less
Submitted 8 August, 2016; v1 submitted 15 April, 2016;
originally announced April 2016.
-
BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies
Authors:
Shihao Ji,
S. V. N. Vishwanathan,
Nadathur Satish,
Michael J. Anderson,
Pradeep Dubey
Abstract:
We propose BlackOut, an approximation algorithm to efficiently train massive recurrent neural network language models (RNNLMs) with million word vocabularies. BlackOut is motivated by using a discriminative loss, and we describe a new sampling strategy which significantly reduces computation while improving stability, sample efficiency, and rate of convergence. One way to understand BlackOut is to…
▽ More
We propose BlackOut, an approximation algorithm to efficiently train massive recurrent neural network language models (RNNLMs) with million word vocabularies. BlackOut is motivated by using a discriminative loss, and we describe a new sampling strategy which significantly reduces computation while improving stability, sample efficiency, and rate of convergence. One way to understand BlackOut is to view it as an extension of the DropOut strategy to the output layer, wherein we use a discriminative training loss and a weighted sampling scheme. We also establish close connections between BlackOut, importance sampling, and noise contrastive estimation (NCE). Our experiments, on the recently released one billion word language modeling benchmark, demonstrate scalability and accuracy of BlackOut; we outperform the state-of-the art, and achieve the lowest perplexity scores on this dataset. Moreover, unlike other established methods which typically require GPUs or CPU clusters, we show that a carefully implemented version of BlackOut requires only 1-10 days on a single machine to train a RNNLM with a million word vocabulary and billions of parameters on one billion words. Although we describe BlackOut in the context of RNNLM training, it can be used to any networks with large softmax output layers.
△ Less
Submitted 31 March, 2016; v1 submitted 21 November, 2015;
originally announced November 2015.
-
WordRank: Learning Word Embeddings via Robust Ranking
Authors:
Shihao Ji,
Hyokun Yun,
Pinar Yanardag,
Shin Matsushima,
S. V. N. Vishwanathan
Abstract:
Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics. Then, based on…
▽ More
Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics. Then, based on this insight, we propose a novel framework WordRank that efficiently estimates word representations via robust ranking, in which the attention mechanism and robustness to noise are readily achieved via the DCG-like ranking losses. The performance of WordRank is measured in word similarity and word analogy benchmarks, and the results are compared to the state-of-the-art word embedding techniques. Our algorithm is very competitive to the state-of-the- arts on large corpora, while outperforms them by a significant margin when the training set is limited (i.e., sparse and noisy). With 17 million tokens, WordRank performs almost as well as existing methods using 7.2 billion tokens on a popular word similarity benchmark. Our multi-node distributed implementation of WordRank is publicly available for general usage.
△ Less
Submitted 27 September, 2016; v1 submitted 8 June, 2015;
originally announced June 2015.
-
Multi-Task Feature Learning Via Efficient l2,1-Norm Minimization
Authors:
Jun Liu,
Shuiwang Ji,
Jieping Ye
Abstract:
The problem of joint feature selection across a group of related tasks has applications in many areas including biomedical informatics and computer vision. We consider the l2,1-norm regularized regression model for joint feature selection from multiple tasks, which can be derived in the probabilistic framework by assuming a suitable prior from the exponential family. One appealing feature of the l…
▽ More
The problem of joint feature selection across a group of related tasks has applications in many areas including biomedical informatics and computer vision. We consider the l2,1-norm regularized regression model for joint feature selection from multiple tasks, which can be derived in the probabilistic framework by assuming a suitable prior from the exponential family. One appealing feature of the l2,1-norm regularization is that it encourages multiple predictors to share similar sparsity patterns. However, the resulting optimization problem is challenging to solve due to the non-smoothness of the l2,1-norm regularization. In this paper, we propose to accelerate the computation by reformulating it as two equivalent smooth convex optimization problems which are then solved via the Nesterov's method-an optimal first-order black-box method for smooth convex optimization. A key building block in solving the reformulations is the Euclidean projection. We show that the Euclidean projection for the first reformulation can be analytically computed, while the Euclidean projection for the second one can be computed in linear time. Empirical evaluations on several data sets verify the efficiency of the proposed algorithms.
△ Less
Submitted 9 May, 2012;
originally announced May 2012.