Showing 1–40 of 40 results for author: Ji, S

  1. arXiv:2502.14944

    cs.LG cs.AI cs.NE q-bio.QM stat.ML

    Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design

    Authors: Masatoshi Uehara, Xingyu Su, Yulai Zhao, Xiner Li, Aviv Regev, Shuiwang Ji, Sergey Levine, Tommaso Biancalani

    Abstract: To fully leverage the capabilities of diffusion models, we are often interested in optimizing downstream reward functions during inference. While numerous algorithms for reward-guided generation have been recently proposed due to their significance, current approaches predominantly focus on single-shot generation, transitioning from fully noised to denoised states. We propose a novel framework for…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: Under review. If you have any suggestions/missing references, please let us know

  2. arXiv:2408.08252

    cs.LG cs.AI q-bio.GN stat.ML

    Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding

    Authors: Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gokcen Eraslan, Surag Nair, Tommaso Biancalani, Shuiwang Ji, Aviv Regev, Sergey Levine, Masatoshi Uehara

    Abstract: Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely generating designs that are natural, we often aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Existing methods for achieving this goal often require "differentiable" proxy models (e.g., class…

    Submitted 24 October, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: The code is available at https://github.com/masa-ue/SVDD

  3. arXiv:2408.02122

    stat.CO stat.AP stat.ME

    Graph-Enabled Fast MCMC Sampling with an Unknown High-Dimensional Prior Distribution

    Authors: Chenyang Zhong, Shouxuan Ji, Tian Zheng

    Abstract: Posterior sampling is a task of central importance in Bayesian inference. For many applications in Bayesian meta-analysis and Bayesian transfer learning, the prior distribution is unknown and needs to be estimated from samples. In practice, the prior distribution can be high-dimensional, adding to the difficulty of efficient posterior inference. In this paper, we propose a novel Markov chain Monte…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 45 pages, 11 figures

  4. arXiv:2211.07429

    q-bio.NC cs.LG eess.IV stat.CO stat.ME

    Accounting for Temporal Variability in Functional Magnetic Resonance Imaging Improves Prediction of Intelligence

    Authors: Yang Li, Xin Ma, Raj Sunderraman, Shihao Ji, Suprateek Kundu

    Abstract: Neuroimaging-based prediction methods for intelligence and cognitive abilities have seen a rapid development in literature. Among different neuroimaging modalities, prediction based on functional connectivity (FC) has shown great promise. Most literature has focused on prediction using static FC, but there are limited investigations on the merits of such analysis compared to prediction based on dy…

    Submitted 14 December, 2022; v1 submitted 11 November, 2022; originally announced November 2022.

  5. arXiv:2205.03505

    stat.ME

    A Flexible Quasi-Copula Distribution for Statistical Modeling

    Authors: Sarah S. Ji, Benjamin B. Chu, Hua Zhou, Kenneth Lange

    Abstract: Copulas, generalized estimating equations, and generalized linear mixed models promote the analysis of grouped data where non-normal responses are correlated. Unfortunately, parameter estimation remains challenging in these three frameworks. Based on prior work of Tonda, we derive a new class of probability density functions that allow explicit calculation of moments, marginal and conditional dist…

    Submitted 14 October, 2024; v1 submitted 6 May, 2022; originally announced May 2022.

  6. arXiv:2009.12027

    cs.LG cs.CV stat.ML

    A Unified Plug-and-Play Framework for Effective Data Denoising and Robust Abstention

    Authors: Krishanu Sarker, Xiulong Yang, Yang Li, Saeid Belkasim, Shihao Ji

    Abstract: The success of Deep Neural Networks (DNNs) highly depends on data quality. Moreover, predictive uncertainty makes high performing DNNs risky for real-world deployment. In this paper, we aim to address these two issues by proposing a unified filtering framework leveraging underlying data density, that can effectively denoise training data as well as avoid predicting uncertain test data points. Our…

    Submitted 25 September, 2020; originally announced September 2020.

    Comments: Under review

  7. arXiv:2008.13072

    cs.LG cs.CR stat.ML

    Adversarial Privacy Preserving Graph Embedding against Inference Attack

    Authors: Kaiyang Li, Guangchun Luo, Yang Ye, Wei Li, Shihao Ji, Zhipeng Cai

    Abstract: Recently, the surge in popularity of Internet of Things (IoT), mobile devices, social media, etc. has opened up a large source for graph data. Graph embedding has been proved extremely useful to learn low-dimensional feature representations from graph structured data. These feature representations can be used for a variety of prediction tasks from node classification to link prediction. However, e…

    Submitted 29 August, 2020; originally announced August 2020.

  8. arXiv:2007.09334

    cs.LG q-bio.MN stat.ML

    Deep Learning of High-Order Interactions for Protein Interface Prediction

    Authors: Yi Liu, Hao Yuan, Lei Cai, Shuiwang Ji

    Abstract: Protein interactions are important in a broad range of biological processes. Traditionally, computational methods have been developed to automatically predict protein interface from hand-crafted features. Recent approaches employ deep neural networks and predict the interaction of each amino acid pair independently. However, these methods do not incorporate the important sequential information fro…

    Submitted 18 July, 2020; originally announced July 2020.

    Comments: 10 pages, 3 figures, 4 tables. KDD2020

  9. Towards Deeper Graph Neural Networks

    Authors: Meng Liu, Hongyang Gao, Shuiwang Ji

    Abstract: Graph neural networks have shown significant success in the field of graph representation learning. Graph convolutions perform neighborhood aggregation and represent one of the most important graph operations. Nevertheless, one layer of these neighborhood aggregation methods only considers immediate neighbors, and the performance decreases when going deeper to enable larger receptive fields. Severa…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: 11 pages, KDD2020

  10. arXiv:2007.04395

    cs.LG cs.AI stat.ML

    Multilevel Graph Matching Networks for Deep Graph Similarity Learning

    Authors: Xiang Ling, Lingfei Wu, Saizhuo Wang, Tengfei Ma, Fangli Xu, Alex X. Liu, Chunming Wu, Shouling Ji

    Abstract: While the celebrated graph neural networks yield effective representations for individual nodes of a graph, there has been relatively less success in extending to the task of graph similarity learning. Recent work on graph similarity learning has considered either global-level graph-graph interactions or low-level node-node interactions, however ignoring the rich cross-level interactions (e.g., be…

    Submitted 7 August, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS)

  11. arXiv:2006.11890

    cs.LG cs.CR stat.ML

    Graph Backdoor

    Authors: Zhaohan Xi, Ren Pang, Shouling Ji, Ting Wang

    Abstract: One intriguing property of deep neural networks (DNNs) is their inherent vulnerability to backdoor attacks -- a trojan model responds to trigger-embedded inputs in a highly predictable manner while functioning normally otherwise. Despite the plethora of prior work on DNNs for continuous data (e.g., images), the vulnerability of graph neural networks (GNNs) for discrete-structured data (e.g., graph…

    Submitted 9 August, 2021; v1 submitted 21 June, 2020; originally announced June 2020.

    Comments: USENIX Security Symposium 2021, implementation: https://github.com/HarrialX/GraphBackdoor

  12. arXiv:2006.09539

    cs.LG cs.CR stat.ML

    AdvMind: Inferring Adversary Intent of Black-Box Attacks

    Authors: Ren Pang, Xinyang Zhang, Shouling Ji, Xiapu Luo, Ting Wang

    Abstract: Deep neural networks (DNNs) are inherently susceptible to adversarial attacks even under black-box settings, in which the adversary only has query access to the target models. In practice, while it may be possible to effectively detect such attacks (e.g., observing massive similar but non-identical queries), it is often challenging to exactly infer the adversary intent (e.g., the target class of t…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: Accepted as a full paper at KDD 2020

  13. XGNN: Towards Model-Level Explanations of Graph Neural Networks

    Authors: Hao Yuan, Jiliang Tang, Xia Hu, Shuiwang Ji

    Abstract: Graph neural networks (GNNs) learn node features by aggregating and combining neighbor information, which have achieved promising performance on many graph tasks. However, GNNs are mostly treated as black-boxes and lack human intelligible explanations. Thus, they cannot be fully trusted and used in certain application domains if GNN models cannot be explained. In this work, we propose a novel app…

    Submitted 3 June, 2020; originally announced June 2020.

  14. Non-Local Graph Neural Networks

    Authors: Meng Liu, Zhengyang Wang, Shuiwang Ji

    Abstract: Modern graph neural networks (GNNs) learn node embeddings through multilayer local aggregation and achieve great success in applications on assortative graphs. However, tasks on disassortative graphs usually require non-local aggregation. In addition, we find that local aggregation is even harmful for some disassortative graphs. In this work, we propose a simple yet effective non-local aggregation…

    Submitted 10 December, 2021; v1 submitted 29 May, 2020; originally announced May 2020.

    Comments: 8 pages, 2 figures, accepted by TPAMI

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021

  15. arXiv:2003.00653

    cs.LG cs.CR stat.ML

    Adversarial Attacks and Defenses on Graphs: A Review, A Tool and Empirical Studies

    Authors: Wei Jin, Yaxin Li, Han Xu, Yiqi Wang, Shuiwang Ji, Charu Aggarwal, Jiliang Tang

    Abstract: Deep neural networks (DNNs) have achieved significant performance in various tasks. However, recent studies have shown that DNNs can be easily fooled by small perturbation on the input, called adversarial attacks. As the extensions of DNNs to graphs, Graph Neural Networks (GNNs) have been demonstrated to inherit this vulnerability. Adversary can mislead GNNs to give wrong predictions by modifying…

    Submitted 12 December, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

    Comments: Accepted by SIGKDD Explorations

  16. arXiv:1912.09015

    cs.LG cs.AI eess.IV eess.SP stat.ML

    Deep Reinforcement Learning Designed Shinnar-Le Roux RF Pulse using Root-Flipping: DeepRF_SLR

    Authors: Dongmyung Shin, Sooyeon Ji, Doohee Lee, Jieun Lee, Se-Hong Oh, Jongho Lee

    Abstract: A novel approach of applying deep reinforcement learning to an RF pulse design is introduced. This method, which is referred to as DeepRF_SLR, is designed to minimize the peak amplitude or, equivalently, minimize the pulse duration of a multiband refocusing pulse generated by the Shinnar-Le Roux (SLR) algorithm. In the method, the root pattern of SLR polynomial, which determines the RF pulse shape,…

    Submitted 1 September, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

    Comments: Accepted at IEEE Transactions on Medical Imaging (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9174664)

  17. arXiv:1912.01810

    cs.LG stat.ML

    Learning with Multiplicative Perturbations

    Authors: Xiulong Yang, Shihao Ji

    Abstract: Adversarial Training (AT) and Virtual Adversarial Training (VAT) are the regularization techniques that train Deep Neural Networks (DNNs) with adversarial examples generated by adding small but worst-case perturbations to input examples. In this paper, we propose xAT and xVAT, new adversarial training algorithms, that generate multiplicative perturbations to input examples for robust trai…

    Submitted 22 June, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Accepted as a conference paper at ICPR 2020

  18. arXiv:1912.00552

    cs.LG stat.ML

    Sparse Graph Attention Networks

    Authors: Yang Ye, Shihao Ji

    Abstract: Graph Neural Networks (GNNs) have proved to be an effective representation learning framework for graph-structured data, and have achieved state-of-the-art performance on many practical predictive tasks, such as node classification, link prediction and graph classification. Among the variants of GNNs, Graph Attention Networks (GATs) learn to assign dense attention coefficients over all neighbors o…

    Submitted 10 April, 2021; v1 submitted 1 December, 2019; originally announced December 2019.

    Comments: Published as a journal paper at IEEE TKDE 2021

  19. arXiv:1911.11121

    cs.LG stat.ML

    Efficient Global String Kernel with Random Features: Beyond Counting Substructures

    Authors: Lingfei Wu, Ian En-Hsu Yen, Siyu Huo, Liang Zhao, Kun Xu, Liang Ma, Shouling Ji, Charu Aggarwal

    Abstract: Analysis of large-scale sequential data has been one of the most crucial tasks in areas such as bioinformatics, text, and audio mining. Existing string kernels, however, either (i) rely on local features of short substructures in the string, which hardly capture long discriminative patterns, (ii) sum over too many substructures, such as all possible subsequences, which leads to diagonal dominance…

    Submitted 25 November, 2019; originally announced November 2019.

    Comments: KDD'19 Oral Paper, Data and Code link available in the paper

  20. arXiv:1911.01559

    cs.LG cs.CR stat.ML

    A Tale of Evil Twins: Adversarial Inputs versus Poisoned Models

    Authors: Ren Pang, Hua Shen, Xinyang Zhang, Shouling Ji, Yevgeniy Vorobeychik, Xiapu Luo, Alex Liu, Ting Wang

    Abstract: Despite their tremendous success in a range of domains, deep learning systems are inherently susceptible to two types of manipulations: adversarial inputs -- maliciously crafted samples that deceive target deep neural network (DNN) models, and poisoned models -- adversely forged DNNs that misbehave on pre-defined inputs. While prior work has intensively studied the two attack vectors in parallel,…

    Submitted 20 November, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

    Comments: Accepted as a full paper at ACM CCS 2020

  21. arXiv:1908.08988

    cs.CV cs.LG stat.ML

    Neural Image Compression and Explanation

    Authors: Xiang Li, Shihao Ji

    Abstract: Explaining the prediction of deep neural networks (DNNs) and semantic image compression are two active research areas of deep learning with numerous applications in decision-critical systems, such as surveillance cameras, drones and self-driving cars, where interpretable decision is critical and storage/network bandwidth is limited. In this paper, we propose a novel end-to-end Neural Image Co…

    Submitted 7 December, 2020; v1 submitted 9 August, 2019; originally announced August 2019.

    Comments: Published as a journal paper at IEEE Access 2020

  22. arXiv:1908.08118

    cs.NE cs.LG stat.ML

    Neural Plasticity Networks

    Authors: Yang Li, Shihao Ji

    Abstract: Neural plasticity is an important functionality of the human brain, in which the number of neurons and synapses can shrink or expand in response to stimuli throughout the span of life. We model this dynamic learning process as an $L_0$-norm regularized binary optimization problem, in which each unit of a neural network (e.g., weight, neuron or channel, etc.) is attached with a stochastic binary gate, whos…

    Submitted 1 May, 2021; v1 submitted 13 August, 2019; originally announced August 2019.

    Comments: Published as a conference paper at IJCNN 2021

  23. arXiv:1907.04652

    cs.LG stat.ML

    Graph Representation Learning via Hard and Channel-Wise Attention Networks

    Authors: Hongyang Gao, Shuiwang Ji

    Abstract: Attention operators have been widely applied in various fields, including computer vision, natural language processing, and network embedding learning. Attention operators on graph data enable learnable weights when aggregating information from neighboring nodes. However, graph attention operators (GAOs) consume excessive computational resources, preventing their applications on large graphs. In…

    Submitted 5 July, 2019; originally announced July 2019.

    Comments: 9 pages, KDD19

  24. arXiv:1907.00941

    eess.IV cs.LG stat.ML

    Global Pixel Transformers for Virtual Staining of Microscopy Images

    Authors: Yi Liu, Hao Yuan, Zhengyang Wang, Shuiwang Ji

    Abstract: Visualizing the details of different cellular structures is of great importance to elucidate cellular functions. However, it is challenging to obtain high quality images of different structures directly due to complex cellular environments. Fluorescence staining is a popular technique to label different structures but has several drawbacks. In particular, label staining is time consuming and may a…

    Submitted 30 September, 2019; v1 submitted 1 July, 2019; originally announced July 2019.

    Comments: 10 pages, 6 figures, 5 tables

  25. arXiv:1905.05178

    cs.LG stat.ML

    Graph U-Nets

    Authors: Hongyang Gao, Shuiwang Ji

    Abstract: We consider the problem of representation learning for graph data. Convolutional neural networks can naturally operate on images, but have significant challenges in dealing with graph data. Given that images are special cases of graphs whose nodes lie on 2D lattices, graph embedding tasks have a natural correspondence with image pixel-wise prediction tasks such as segmentation. While encoder-decoder arc…

    Submitted 11 May, 2019; originally announced May 2019.

    Comments: 10 pages, ICML19

  26. $L_0$-ARM: Network Sparsification via Stochastic Binary Optimization

    Authors: Yang Li, Shihao Ji

    Abstract: We consider network sparsification as an $L_0$-norm regularized binary optimization problem, where each unit of a neural network (e.g., weight, neuron, or channel, etc.) is attached with a stochastic binary gate, whose parameters are jointly optimized with original network parameters. The Augment-Reinforce-Merge (ARM), a recently proposed unbiased gradient estimator, is investigated for this binar…

    Submitted 11 September, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

    Comments: Published as a conference paper at ECML 2019

  27. OPENMENDEL: A Cooperative Programming Project for Statistical Genetics

    Authors: Hua Zhou, Janet S. Sinsheimer, Christopher A. German, Sarah S. Ji, Douglas M. Bates, Benjamin B. Chu, Kevin L. Keys, Juhyun Kim, Seyoon Ko, Gordon D. Mosher, Jeanette C. Papp, Eric M. Sobel, Jing Zhai, Jin J. Zhou, Kenneth Lange

    Abstract: Statistical methods for genomewide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet…

    Submitted 13 February, 2019; originally announced February 2019.

    Comments: 16 pages, 2 figures, 2 tables

    Journal ref: Human Genetics, pp 1-11, 2019 Mar 26

  28. arXiv:1812.04103

    cs.CV cs.LG stat.AP stat.ML

    Non-local U-Net for Biomedical Image Segmentation

    Authors: Zhengyang Wang, Na Zou, Dinggang Shen, Shuiwang Ji

    Abstract: Deep learning has shown its great promise in various biomedical image segmentation tasks. Existing models are typically based on U-Net and rely on an encoder-decoder architecture with stacked local operators to aggregate long-range information gradually. However, only using the local operators limits the efficiency and effectiveness. In this work, we propose the non-local U-Nets, which are equippe…

    Submitted 18 February, 2020; v1 submitted 10 December, 2018; originally announced December 2018.

    Comments: In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), 2019

  29. arXiv:1809.03326

    cs.CV stat.AP

    A Stable Minutia Descriptor based on Gabor Wavelet and Linear Discriminant Analysis

    Authors: Gwang-Il Ri, Mun-Chol Kim, Su-Rim Ji

    Abstract: The minutia descriptor, which describes characteristics of minutiae, plays a major role in fingerprint recognition. Typically, fingerprint recognition systems employ minutia descriptors to find potential correspondence between minutiae, and they use similarity between two minutia descriptors to calculate overall similarity between two fingerprint images. A good minutia descriptor can improve recogni…

    Submitted 6 September, 2018; originally announced September 2018.

  30. Large-Scale Learnable Graph Convolutional Networks

    Authors: Hongyang Gao, Zhengyang Wang, Shuiwang Ji

    Abstract: Convolutional neural networks (CNNs) have achieved great success on grid-like data such as images, but face tremendous challenges in learning from more generic data such as graphs. In CNNs, the trainable local filters enable the automatic extraction of high-level features. The computation with filters requires a fixed number of ordered units in the receptive fields. However, the number of neighbor…

    Submitted 12 August, 2018; originally announced August 2018.

    Journal ref: In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1416-1424). ACM (2018)

  31. arXiv:1705.08881

    cs.CV cs.LG cs.NE stat.ML

    Dense Transformer Networks

    Authors: Jun Li, Yongjun Chen, Lei Cai, Ian Davidson, Shuiwang Ji

    Abstract: The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by network architecture instead of learned from data. In this work, we propose the dense transformer networks, which can learn the shapes and sizes of patches from data…

    Submitted 7 June, 2017; v1 submitted 24 May, 2017; originally announced May 2017.

  32. arXiv:1705.06824

    cs.LG cs.CL cs.NE stat.ML

    Learning Convolutional Text Representations for Visual Question Answering

    Authors: Zhengyang Wang, Shuiwang Ji

    Abstract: Visual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object re…

    Submitted 18 April, 2018; v1 submitted 18 May, 2017; originally announced May 2017.

    Comments: Conference paper at SDM 2018. https://github.com/divelab/svae

    Journal ref: In Proceedings of the 2018 SIAM International Conference on Data Mining (pp. 594-602). 2018

  33. arXiv:1705.06821

    cs.LG cs.CV cs.NE stat.ML

    Spatial Variational Auto-Encoding via Matrix-Variate Normal Distributions

    Authors: Zhengyang Wang, Hao Yuan, Shuiwang Ji

    Abstract: The key idea of variational auto-encoders (VAEs) resembles that of traditional auto-encoder models in which spatial information is supposed to be explicitly encoded in the latent space. However, the latent variables in VAEs are vectors, which can be interpreted as multiple feature maps of size 1x1. Such representations can only convey spatial information implicitly when coupled with powerful decod…

    Submitted 22 January, 2019; v1 submitted 18 May, 2017; originally announced May 2017.

    Comments: Accepted by SDM2019. Code is publicly available at https://github.com/divelab/svae

  34. arXiv:1705.06820

    cs.LG cs.CV cs.NE stat.ML

    Pixel Deconvolutional Networks

    Authors: Hongyang Gao, Hao Yuan, Zhengyang Wang, Shuiwang Ji

    Abstract: Deconvolutional layers have been widely used in a variety of deep models for up-sampling, including encoder-decoder networks for semantic segmentation and deep generative models for unsupervised learning. One of the key limitations of deconvolutional operations is that they result in the so-called checkerboard problem. This is caused by the fact that no direct relationship exists among adjacent pi…

    Submitted 26 November, 2017; v1 submitted 18 May, 2017; originally announced May 2017.

    Comments: 11 pages

  35. arXiv:1611.06172

    cs.DC stat.ML

    Parallelizing Word2Vec in Multi-Core and Many-Core Architectures

    Authors: Shihao Ji, Nadathur Satish, Sheng Li, Pradeep Dubey

    Abstract: Word2vec is a widely used algorithm for extracting low-dimensional vector representations of words. State-of-the-art algorithms including those by Mikolov et al. have been parallelized for multi-core CPU architectures, but are based on vector-vector operations with "Hogwild" updates that are memory-bandwidth intensive and do not efficiently use computational resources. In this paper, we propose "H…

    Submitted 23 December, 2016; v1 submitted 18 November, 2016; originally announced November 2016.

    Comments: NIPS Workshop on Efficient Methods for Deep Neural Networks (2016)

  36. arXiv:1605.09499

    stat.ML

    Extreme Stochastic Variational Inference: Distributed and Asynchronous

    Authors: Jiong Zhang, Parameswaran Raman, Shihao Ji, Hsiang-Fu Yu, S. V. N. Vishwanathan, Inderjit S. Dhillon

    Abstract: Stochastic variational inference (SVI), the state-of-the-art algorithm for scaling variational inference to large datasets, is inherently serial. Moreover, it requires the parameters to fit in the memory of a single processor; this is problematic when the number of parameters is in billions. In this paper, we propose extreme stochastic variational inference (ESVI), an asynchronous and lock-free al…

    Submitted 3 August, 2018; v1 submitted 31 May, 2016; originally announced May 2016.

  37. arXiv:1604.04661

    cs.DC cs.CL stat.ML

    Parallelizing Word2Vec in Shared and Distributed Memory

    Authors: Shihao Ji, Nadathur Satish, Sheng Li, Pradeep Dubey

    Abstract: Word2Vec is a widely used algorithm for extracting low-dimensional vector representations of words. It generated considerable excitement in the machine learning and natural language processing (NLP) communities recently due to its exceptional performance in many NLP applications such as named entity recognition, sentiment analysis, machine translation and question answering. State-of-the-art algor…

    Submitted 8 August, 2016; v1 submitted 15 April, 2016; originally announced April 2016.

    Comments: Added more results

  38. arXiv:1511.06909

    cs.LG cs.CL cs.NE stat.ML

    BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies

    Authors: Shihao Ji, S. V. N. Vishwanathan, Nadathur Satish, Michael J. Anderson, Pradeep Dubey

    Abstract: We propose BlackOut, an approximation algorithm to efficiently train massive recurrent neural network language models (RNNLMs) with million word vocabularies. BlackOut is motivated by using a discriminative loss, and we describe a new sampling strategy which significantly reduces computation while improving stability, sample efficiency, and rate of convergence. One way to understand BlackOut is to…

    Submitted 31 March, 2016; v1 submitted 21 November, 2015; originally announced November 2015.

    Comments: Published as a conference paper at ICLR 2016

  39. arXiv:1506.02761

    cs.CL cs.LG stat.ML

    WordRank: Learning Word Embeddings via Robust Ranking

    Authors: Shihao Ji, Hyokun Yun, Pinar Yanardag, Shin Matsushima, S. V. N. Vishwanathan

    Abstract: Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics. Then, based on…

    Submitted 27 September, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

    Comments: Conference on Empirical Methods in Natural Language Processing (EMNLP), November 1-5, 2016, Austin, Texas, USA

  40. arXiv:1205.2631

    cs.LG cs.CV stat.ML

    Multi-Task Feature Learning Via Efficient l2,1-Norm Minimization

    Authors: Jun Liu, Shuiwang Ji, Jieping Ye

    Abstract: The problem of joint feature selection across a group of related tasks has applications in many areas including biomedical informatics and computer vision. We consider the l2,1-norm regularized regression model for joint feature selection from multiple tasks, which can be derived in the probabilistic framework by assuming a suitable prior from the exponential family. One appealing feature of the l…

    Submitted 9 May, 2012; originally announced May 2012.

    Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

    Report number: UAI-P-2009-PG-339-348