-
Stream-level flow matching with Gaussian processes
Authors:
Ganchao Wei,
Li Ma
Abstract:
Flow matching (FM) is a family of training algorithms for fitting continuous normalizing flows (CNFs). Conditional flow matching (CFM) exploits the fact that the marginal vector field of a CNF can be learned by fitting least-squares regression to the conditional vector field specified given one or both ends of the flow path. In this paper, we extend the CFM algorithm by defining conditional probability paths along "streams", instances of latent stochastic paths that connect data pairs of source and target, which are modeled with Gaussian process (GP) distributions. The unique distributional properties of GPs help preserve the "simulation-free" nature of CFM training. We show that this generalization of the CFM can effectively reduce the variance in the estimated marginal vector field at a moderate computational cost, thereby improving the quality of the generated samples under common metrics. Additionally, adopting the GP on the streams allows for flexibly linking multiple correlated training data points (e.g., time series). We empirically validate our claim through both simulations and applications to image and neural time series data.
Submitted 3 February, 2025; v1 submitted 30 September, 2024;
originally announced September 2024.
-
Covariance Regression for High Dimensional Neural Data via Graph
Authors:
Ganchao Wei
Abstract:
Modern recording techniques enable neuroscientists to simultaneously study neural activity across large populations of neurons, with capturing predictor-dependent correlations being a fundamental challenge in neuroscience. Moreover, the fact that input covariates often lie in restricted subdomains, according to experimental settings, makes inference even more challenging. To address these challenges, we propose a set of nonparametric mean-covariance regression models for high-dimensional neural activity with restricted inputs. These models reduce the dimensionality of neural responses by employing a lower-dimensional latent factor model, where both factor loadings and latent factors are predictor-dependent, to jointly model mean and covariance across covariates. The smoothness of neural activity across experimental conditions is modeled nonparametrically using two Gaussian processes (GPs), applied to both loading basis and latent factors. Additionally, to account for the covariates lying in restricted subspace, we incorporate graph information into the covariance structure. To flexibly infer the model, we use an MCMC algorithm to sample from posterior distributions. After validating and studying the properties of proposed methods by simulations, we apply them to two neural datasets (local field potential and neural spiking data) to demonstrate the usage of models for continuous and counting observations. Overall, the proposed methods provide a framework to jointly model covariate-dependent mean and covariance in high dimensional neural data, especially when the covariates lie in restricted domains. The framework is general and can be easily adapted to various applications beyond neuroscience.
Submitted 3 February, 2025; v1 submitted 29 September, 2024;
originally announced September 2024.
-
Position: Topological Deep Learning is the New Frontier for Relational Learning
Authors:
Theodore Papamarkou,
Tolga Birdal,
Michael Bronstein,
Gunnar Carlsson,
Justin Curry,
Yue Gao,
Mustafa Hajij,
Roland Kwitt,
Pietro Liò,
Paolo Di Lorenzo,
Vasileios Maroulas,
Nina Miolane,
Farzana Nasrin,
Karthikeyan Natesan Ramamurthy,
Bastian Rieck,
Simone Scardapane,
Michael T. Schaub,
Petar Veličković,
Bei Wang,
Yusu Wang,
Guo-Wei Wei,
Ghada Zamzmi
Abstract:
Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL is the new frontier for relational learning. TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities. At the same time, this paper serves as an invitation to the scientific community to actively participate in TDL research to unlock the potential of this emerging field.
Submitted 6 August, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization
Authors:
Yuta Hozumi,
Guo-Wei Wei
Abstract:
Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to its meta-gene interpretation of resulting low-dimensional components. However, NMF approaches suffer from the lack of multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). By employing a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also utilized TNMF and rTNMF for the visualization of popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE).
Submitted 24 October, 2023;
originally announced October 2023.
-
Bayesian Bi-clustering of Neural Spiking Activity with Latent Structures
Authors:
Ganchao Wei
Abstract:
Modern neural recording techniques allow neuroscientists to obtain spiking activity of multiple neurons from different brain regions over long time periods, which requires new statistical methods to be developed for understanding structure of the large-scale data. In this paper, we develop a bi-clustering method to cluster the neural spiking activity spatially and temporally, according to their low-dimensional latent structures. The spatial (neuron) clusters are defined by the latent trajectories within each neural population, while the temporal (state) clusters are defined by (populationally) synchronous local linear dynamics shared with different periods. To flexibly extract the bi-clustering structure, we build the model non-parametrically, and develop an efficient Markov chain Monte Carlo (MCMC) algorithm to sample the posterior distributions of model parameters. Validating our proposed MCMC algorithm through simulations, we find the method can recover unknown parameters and true bi-clustering structures successfully. We then apply the proposed bi-clustering method to multi-regional neural recordings under different experiment settings, where we find that simultaneously considering latent trajectories and spatial-temporal clustering structures can provide us with a more accurate and interpretable result. Overall, the proposed method provides scientific insights for large-scale (counting) time series with elongated recording periods, and it can potentially have application beyond neuroscience.
Submitted 26 December, 2023; v1 submitted 5 September, 2023;
originally announced September 2023.
-
Convolutional Non-homogeneous Poisson Process with Application to Wildfire Risk Quantification for Power Delivery Networks
Authors:
Guanzhou Wei,
Feng Qiu,
Xiao Liu
Abstract:
Current projections show that much of the continental U.S. will have significantly hotter and drier days in the coming decades, leading to more wildfire hazards that threaten the safety of the power grid. Unfortunately, the U.S. power industry is not well prepared and still predominantly relies on empirical fire indices, which do not consider the full spectrum of dynamic environmental factors. This paper proposes a new spatio-temporal point process model, the Convolutional Non-homogeneous Poisson Process (cNHPP), to quantify wildfire risks for power delivery networks. The proposed model captures both the current short-term and cumulative long-term effects of covariates on wildfire risks, and the spatio-temporal dependency among different segments of the power delivery network. The computation and interpretation of the intensity function are thoroughly investigated, and the connection between cNHPP and Recurrent Neural Networks is also discussed. We apply the proposed approach to estimate wildfire risks on major transmission lines in California, utilizing historical fire data and meteorological and vegetation data obtained from the National Oceanic and Atmospheric Administration and the National Aeronautics and Space Administration. Comparison studies are performed to show the applicability and predictive capability of the proposed approach. Useful insights are obtained that potentially enhance power grid resilience against wildfires.
Submitted 30 December, 2022;
originally announced January 2023.
-
Gibbs Phenomenon Suppression in PDE-Based Statistical Spatio-Temporal Models
Authors:
Guanzhou Wei,
Xiao Liu,
Russell Barton
Abstract:
A class of physics-informed spatio-temporal models has recently been proposed for modeling spatio-temporal processes governed by advection-diffusion equations. The central idea is to approximate the process by a truncated Fourier series and let the governing physics determine the dynamics of the spectral coefficients. However, because many spatio-temporal processes in real applications are non-periodic with boundary discontinuities, the well-known Gibbs phenomenon and ripple artifact almost always exist in the outputs generated by such models due to truncation of the Fourier series. Hence, the key contribution of this paper is to propose a physics-informed spatio-temporal modeling approach that significantly suppresses the Gibbs phenomenon when modeling spatio-temporal advection-diffusion processes. The proposed approach starts with a data flipping procedure for the process respectively along the horizontal and vertical directions (as if we were unfolding a piece of paper that has been folded twice along the two directions). Because the flipped process becomes spatially periodic and has a complete waveform without any boundary discontinuities, the Gibbs phenomenon disappears even if the Fourier series is truncated. Then, for the flipped process and given the Partial Differential Equation (PDE) that governs the process, this paper extends an existing PDE-based spatio-temporal model by obtaining the new temporal dynamics of the spectral coefficients, while maintaining the physical interpretation of the flipped process. Numerical investigations based on a real dataset have been performed to demonstrate the advantages of the proposed approach. It is found that the proposed approach effectively suppresses the Gibbs Phenomenon and significantly reduces the ripple artifact in modeling spatio-temporal advection-diffusion processes. Computer code is available on GitHub.
Submitted 6 August, 2022;
originally announced August 2022.
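The data flipping step described in the abstract above (mirroring the process along the horizontal and vertical directions, like unfolding a sheet folded twice) can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so the function name `flip_extend` and the choice of a plain mirror reflection on a regular grid are assumptions made here for illustration only.

```python
import numpy as np

def flip_extend(field):
    """Mirror a 2-D field along both axes (hypothetical helper).

    The result is four times larger and its opposite edges match,
    so a periodic extension has no jump at the boundary.
    """
    top = np.hstack([field, field[:, ::-1]])                  # original | left-right mirror
    bottom = np.hstack([field[::-1, :], field[::-1, ::-1]])   # up-down mirror | both mirrors
    return np.vstack([top, bottom])

# Toy non-periodic field: values grow steadily across the grid.
field = np.arange(12, dtype=float).reshape(3, 4)
extended = flip_extend(field)
```

Because the first and last rows (and columns) of `extended` coincide, wrapping the field around introduces no boundary discontinuity, which is the property the abstract relies on when truncating the Fourier series.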
-
An integer grid bridge sampler for the Bayesian inference of incomplete birth-death records
Authors:
Lin Sun,
Gang Wei
Abstract:
A one-to-one correspondence is established between the bridge path space of birth-death processes and the exclusive union of the product spaces of simplexes and integer grids. Formulae are derived for the exact counting of the integer grid bridges with fixed number of upward jumps. Then a uniform sampler over such restricted bridge path space is constructed. This leads to a Monte Carlo scheme, the integer grid bridge sampler, IGBS, to evaluate the transition probabilities of birth-death processes. Even the near zero probability of rare event could now be evaluated with controlled relative error. The IGBS based Bayesian inference for the incomplete birth-death observations is readily performed in demonstrating examples and in the analysis of a severely incomplete data set recording a real epidemic event. Comparison is performed with the basic bootstrap filter, an elementary sequential importance resampling algorithm. The haunting filtering failure has found no position in the new scheme.
Submitted 8 August, 2022;
originally announced August 2022.
-
Physics-Informed Statistical Modeling for Wildfire Aerosols Process Using Multi-Source Geostationary Satellite Remote-Sensing Data Streams
Authors:
Guanzhou Wei,
Venkat Krishnan,
Yu Xie,
Manajit Sengupta,
Yingchen Zhang,
Haitao Liao,
Xiao Liu
Abstract:
Increasingly frequent wildfires significantly affect solar energy production as the atmospheric aerosols generated by wildfires diminish the incoming solar radiation to the earth. Atmospheric aerosols are measured by Aerosol Optical Depth (AOD), and AOD data streams can be retrieved and monitored by geostationary satellites. However, multi-source remote-sensing data streams often present heterogeneous characteristics, including different data missing rates, measurement errors, systematic biases, and so on. To accurately estimate and predict the underlying AOD propagation process, there exist practical needs and theoretical interests to propose a physics-informed statistical approach for modeling wildfire AOD propagation by simultaneously utilizing, or fusing, multi-source heterogeneous satellite remote-sensing data streams. Leveraging a spectral approach, the proposed approach integrates multi-source satellite data streams with a fundamental advection-diffusion equation that governs the AOD propagation process. A bias correction process is included in the statistical model to account for the bias of the physics model and the truncation error of the Fourier series. The proposed approach is applied to California wildfires AOD data streams obtained from the National Oceanic and Atmospheric Administration. Comprehensive numerical examples are provided to demonstrate the predictive capabilities and model interpretability of the proposed approach. Computer code has been made available on GitHub.
Submitted 23 June, 2022;
originally announced June 2022.
-
CCP: Correlated Clustering and Projection for Dimensionality Reduction
Authors:
Yuta Hozumi,
Rui Wang,
Guo-Wei Wei
Abstract:
Most dimensionality reduction methods employ frequency domain representations obtained from matrix diagonalization and may not be efficient for large datasets with relatively high intrinsic dimensions. To address this challenge, Correlated Clustering and Projection (CCP) offers a novel data domain strategy that does not need to solve any matrix. CCP partitions high-dimensional features into correlated clusters and then projects correlated features in each cluster into a one-dimensional representation based on sample correlations. Residue-Similarity (R-S) scores and indexes, the shape of data in Riemannian manifolds, and algebraic topology-based persistent Laplacian are introduced for visualization and analysis. Proposed methods are validated with benchmark datasets associated with various machine learning algorithms.
Submitted 8 June, 2022;
originally announced June 2022.
-
A Flexible Bayesian Clustering of Dynamic Subpopulations in Neural Spiking Activity
Authors:
Ganchao Wei,
Ian H. Stevenson,
Xiaojing Wang
Abstract:
With advances in neural recording techniques, neuroscientists are now able to record the spiking activity of many hundreds of neurons simultaneously, and new statistical methods are needed to understand the structure of this large-scale neural population activity. Although previous work has tried to summarize neural activity within and between known populations by extracting low-dimensional latent factors, in many cases what determines a unique population may be unclear. Neurons differ in their anatomical location, but also, in their cell types and response properties. To identify populations directly related to neural activity, we develop a clustering method based on a mixture of dynamic Poisson factor analyzers (mixDPFA) model, with the number of clusters and dimension of latent factors for each cluster treated as unknown parameters. To analyze the proposed mixDPFA model, we propose a Markov chain Monte Carlo (MCMC) algorithm to efficiently sample its posterior distribution. Validating our proposed MCMC algorithm through simulations, we find that it can accurately recover the unknown parameters and the true clustering in the model, and is insensitive to the initial cluster assignments. We then apply the proposed mixDPFA model to multi-region experimental recordings, where we find that the proposed method can identify novel, reliable clusters of neurons based on their activity, and may, thus, be a useful tool for neural data analysis.
Submitted 2 March, 2023; v1 submitted 21 May, 2022;
originally announced May 2022.
-
Dynamic modeling of spike count data with Conway-Maxwell Poisson variability
Authors:
Ganchao Wei,
Ian H. Stevenson
Abstract:
In many areas of the brain, neural spiking activity covaries with features of the external world, such as sensory stimuli or an animal's movement. Experimental findings suggest that the variability of neural activity changes over time and may provide information about the external world beyond the information provided by the average neural activity. To flexibly track time-varying neural response properties, here we developed a dynamic model with Conway-Maxwell Poisson (CMP) observations. The CMP distribution can flexibly describe firing patterns that are both under- and over-dispersed relative to the Poisson distribution. Here we track parameters of the CMP distribution as they vary over time. Using simulations, we show that a normal approximation can accurately track dynamics in state vectors for both the centering and shape parameters ($λ$ and $ν$). We then fit our model to neural data from neurons in primary visual cortex and "place cells" in the hippocampus. We find that this method out-performs previous dynamic models based on the Poisson distribution. The dynamic CMP model provides a flexible framework for tracking time-varying non-Poisson count data and may also have applications beyond neuroscience.
Submitted 8 October, 2022; v1 submitted 1 May, 2022;
originally announced May 2022.
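For reference, the standard Conway-Maxwell Poisson probability mass function that the abstract above builds on can be written as follows. This is the textbook parameterization in terms of $λ$ and $ν$; whether the paper uses exactly this form or a reparameterized one is not stated in the abstract.

```latex
P(Y = y \mid \lambda, \nu) = \frac{\lambda^{y}}{(y!)^{\nu}\, Z(\lambda, \nu)},
\qquad
Z(\lambda, \nu) = \sum_{k=0}^{\infty} \frac{\lambda^{k}}{(k!)^{\nu}},
\qquad y = 0, 1, 2, \dots
```

Setting $ν = 1$ recovers the Poisson distribution, while $ν > 1$ gives under-dispersed counts and $ν < 1$ gives over-dispersed counts.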
-
Tracking fast and slow changes in synaptic weights from simultaneously observed pre- and postsynaptic spiking
Authors:
Ganchao Wei,
Ian H. Stevenson
Abstract:
Synapses change on multiple timescales, ranging from milliseconds to minutes, due to a combination of both short- and long-term plasticity. Here we develop an extension of the common Generalized Linear Model to infer both short- and long-term changes in the coupling between a pre- and post-synaptic neuron based on observed spiking activity. We model short-term synaptic plasticity using additive effects that depend on the presynaptic spike timing, and we model long-term changes in both synaptic weight and baseline firing rate using point process adaptive smoothing. Using simulations, we first show that this model can accurately recover time-varying synaptic weights 1) for both depressing and facilitating synapses, 2) with a variety of long-term changes (including realistic changes, such as due to STDP), 3) with a range of pre- and post-synaptic firing rates, and 4) for both excitatory and inhibitory synapses. We then apply our model to two experimentally recorded putative synaptic connections. We find that simultaneously tracking fast changes in synaptic weights, slow changes in synaptic weights, and unexplained variations in baseline firing is essential. Omitting any one of these factors can lead to spurious inferences for the others. Altogether, this model provides a flexible framework for tracking short- and long-term variation in spike transmission.
Submitted 8 April, 2021; v1 submitted 2 February, 2021;
originally announced February 2021.
-
A binary-activation, multi-level weight RNN and training algorithm for ADC-/DAC-free and noise-resilient processing-in-memory inference with eNVM
Authors:
Siming Ma,
David Brooks,
Gu-Yeon Wei
Abstract:
We propose a new algorithm for training neural networks with binary activations and multi-level weights, which enables efficient processing-in-memory circuits with embedded nonvolatile memories (eNVM). Binary activations obviate costly DACs and ADCs. Multi-level weights leverage multi-level eNVM cells. Compared to existing algorithms, our method not only works for feed-forward networks (e.g., fully-connected and convolutional), but also achieves higher accuracy and noise resilience for recurrent networks. In particular, we present an RNN-based trigger-word detection PIM accelerator, with detailed hardware noise models and circuit co-design techniques, and validate our algorithm's high inference accuracy and robustness against a variety of real hardware non-idealities.
Submitted 12 October, 2020; v1 submitted 29 November, 2019;
originally announced December 2019.
-
MLPerf Training Benchmark
Authors:
Peter Mattson,
Christine Cheng,
Cody Coleman,
Greg Diamos,
Paulius Micikevicius,
David Patterson,
Hanlin Tang,
Gu-Yeon Wei,
Peter Bailis,
Victor Bittorf,
David Brooks,
Dehao Chen,
Debojyoti Dutta,
Udit Gupta,
Kim Hazelwood,
Andrew Hock,
Xinyuan Huang,
Atsushi Ike,
Bill Jia,
Daniel Kang,
David Kanter,
Naveen Kumar,
Jeffery Liao,
Guokai Ma,
Deepak Narayanan
, et al. (12 additional authors not shown)
Abstract:
Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML. But ML training presents three unique benchmarking challenges absent from other domains: optimizations that improve training throughput can increase the time to solution, training is stochastic and time to solution exhibits high variance, and software and hardware systems are so diverse that fair benchmarking with the same binary, code, and even hyperparameters is difficult. We therefore present MLPerf, an ML benchmark that overcomes these challenges. Our analysis quantitatively evaluates MLPerf's efficacy at driving performance and scalability improvements across two rounds of results from multiple vendors.
Submitted 2 March, 2020; v1 submitted 2 October, 2019;
originally announced October 2019.
-
AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference
Authors:
Thierry Tambe,
En-Yu Yang,
Zishen Wan,
Yuntian Deng,
Vijay Janapa Reddi,
Alexander Rush,
David Brooks,
Gu-Yeon Wei
Abstract:
Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes as their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point inspired number representation format for deep learning that dynamically maximizes and optimally clips its available dynamic range, at a layer granularity, in order to create faithful encoding of neural network parameters. AdaptivFloat consistently produces higher inference accuracies compared to block floating-point, uniform, IEEE-like float or posit encodings at very low precision ($\leq$ 8-bit) across a diverse set of state-of-the-art neural network topologies. And notably, AdaptivFloat is seen surpassing baseline FP32 performance by up to +0.3 in BLEU score and -0.75 in word error rate at weight bit widths that are $\leq$ 8-bit. Experimental results on a deep neural network (DNN) hardware accelerator, exploiting AdaptivFloat logic in its computational datapath, demonstrate per-operation energy and area that is 0.9$\times$ and 1.14$\times$, respectively, that of equivalent bit width integer-based accelerator variants.
Submitted 11 February, 2020; v1 submitted 29 September, 2019;
originally announced September 2019.
-
Benchmarking TPU, GPU, and CPU Platforms for Deep Learning
Authors:
Yu Emma Wang,
Gu-Yeon Wei,
David Brooks
Abstract:
Training deep learning models is compute-intensive and there is an industry-wide trend towards hardware specialization to improve performance. To systematically benchmark deep learning platforms, we introduce ParaDnn, a parameterized benchmark suite for deep learning that generates end-to-end models for fully connected (FC), convolutional (CNN), and recurrent (RNN) neural networks. Along with six real-world models, we benchmark Google's Cloud TPU v2/v3, NVIDIA's V100 GPU, and an Intel Skylake CPU platform. We take a deep dive into TPU architecture, reveal its bottlenecks, and highlight valuable lessons learned for future specialized system design. We also provide a thorough comparison of the platforms and find that each has unique strengths for some types of models. Finally, we quantify the rapid performance improvements that specialized software stacks provide for the TPU and GPU platforms.
Submitted 22 October, 2019; v1 submitted 24 July, 2019;
originally announced July 2019.
-
Seq-SetNet: Exploring Sequence Sets for Inferring Structures
Authors:
Fusong Ju,
Jianwei Zhu,
Guozheng Wei,
Qi Zhang,
Shiwei Sun,
Dongbo Bu
Abstract:
Sequence sets are a widely used type of data source in a large variety of fields. A typical example is protein structure prediction, which takes a multiple sequence alignment (MSA) as input and aims to infer structural information from it. Almost all existing approaches exploit MSAs in an indirect fashion, i.e., they transform MSAs into position-specific scoring matrices (PSSMs) that represent the distribution of amino acid types at each column. A PSSM can capture column-wise characteristics of an MSA; however, the column-wise characteristics embedded in each individual component sequence are almost entirely neglected.
The drawback of PSSM is rooted in the fact that an MSA is essentially an unordered sequence set rather than a matrix. Specifically, the interchange of any two sequences will not affect the whole MSA. In contrast, the pixels in an image essentially form a matrix since any two rows of pixels cannot be interchanged. Therefore, traditional deep neural networks designed for image processing cannot be directly applied to sequence sets. Here, we propose a novel deep neural network framework (called Seq-SetNet) for sequence set processing. By employing a {\it symmetric function} module to integrate features calculated from preceding layers, Seq-SetNet is immune to the order of sequences in the input MSA. This advantage enables us to directly and fully exploit MSAs by considering each component protein individually. We evaluated Seq-SetNet by using it to extract structural information from MSAs for protein secondary structure prediction. Experimental results on popular benchmark sets suggest that Seq-SetNet outperforms the state-of-the-art approaches by 3.6% in precision. These results clearly suggest the advantages of Seq-SetNet in sequence set processing, and it can be readily used in a wide range of fields, such as natural language processing.
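The order-invariance property described in this abstract can be illustrated with a small sketch. This is not the Seq-SetNet architecture: the shared per-sequence transform, the choice of element-wise max as the symmetric function, and all names and array shapes below are assumptions made only for illustration.

```python
import numpy as np

def encode_sequences(msa_features, weights):
    # Apply the same (shared) transform to every sequence independently.
    # msa_features: (n_sequences, n_columns, n_features)
    return np.tanh(msa_features @ weights)

def symmetric_pool(per_sequence):
    # Element-wise max over the sequence axis is a symmetric function:
    # reordering the sequences cannot change the result.
    return per_sequence.max(axis=0)

rng = np.random.default_rng(0)
msa = rng.normal(size=(5, 7, 4))      # hypothetical MSA: 5 sequences, 7 columns
weights = rng.normal(size=(4, 3))     # hypothetical shared weights

pooled = symmetric_pool(encode_sequences(msa, weights))
shuffled_msa = msa[rng.permutation(5)]
pooled_shuffled = symmetric_pool(encode_sequences(shuffled_msa, weights))
```

Because the pooling step discards sequence order, `pooled` and `pooled_shuffled` hold the same per-column features.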
Submitted 6 June, 2019;
originally announced June 2019.
-
Learning Low-Rank Approximation for CNNs
Authors:
Dongsoo Lee,
Se Jung Kwon,
Byeongwook Kim,
Gu-Yeon Wei
Abstract:
Low-rank approximation is an effective model compression technique that reduces not only parameter storage requirements but also computations. For convolutional neural networks (CNNs), however, well-known low-rank approximation methods, such as Tucker or CP decomposition, result in degraded model accuracy because decomposed layers hinder training convergence. In this paper, we propose a new training technique that finds a flat minimum in the view of low-rank approximation without a decomposed structure during training. By preserving the original model structure, 2-dimensional low-rank approximation, which requires lowering (such as im2col), is available in our proposed scheme. We show that CNN models can be compressed by low-rank approximation with a much higher compression ratio than conventional training methods while maintaining or even enhancing model accuracy. We also discuss various 2-dimensional low-rank approximation techniques for CNNs.
Submitted 24 May, 2019;
originally announced May 2019.
-
Structured Compression by Weight Encryption for Unstructured Pruning and Quantization
Authors:
Se Jung Kwon,
Dongsoo Lee,
Byeongwook Kim,
Parichay Kapoor,
Baeseong Park,
Gu-Yeon Wei
Abstract:
Model compression techniques, such as pruning and quantization, are becoming increasingly important for reducing memory footprints and the amount of computation. Despite model size reduction, achieving performance enhancement on devices is, however, still challenging, mainly due to the irregular representations of sparse matrix formats. This paper proposes a new weight representation scheme for Sparse Quantized Neural Networks, specifically those obtained by fine-grained, unstructured pruning. The representation is encrypted in a structured regular format, which can be efficiently decoded through an XOR-gate network during inference in a parallel manner. We demonstrate various deep learning models that can be compressed and represented by our proposed format with a fixed and high compression ratio. For example, for fully-connected layers of AlexNet on the ImageNet dataset, we can represent the sparse weights by only 0.28 bits/weight for 1-bit quantization and a 91% pruning rate with a fixed decoding rate and full memory bandwidth usage. Decoding through the XOR-gate network can be performed without any model accuracy degradation, with additional patch data associated with small overhead.
Submitted 5 March, 2020; v1 submitted 24 May, 2019;
originally announced May 2019.
-
Network Pruning for Low-Rank Binary Indexing
Authors:
Dongsoo Lee,
Se Jung Kwon,
Byeongwook Kim,
Parichay Kapoor,
Gu-Yeon Wei
Abstract:
Pruning is an efficient model compression technique to remove redundancy in the connectivity of deep neural networks (DNNs). Computations using sparse matrices obtained by pruning parameters, however, exhibit vastly different parallelism depending on the index representation scheme. As a result, fine-grained pruning has not gained much attention due to its irregular index form leading to large memory footprint and low parallelism for convolutions and matrix multiplications. In this paper, we propose a new network pruning technique that generates a low-rank binary index matrix to compress index data while decompressing index data is performed by simple binary matrix multiplication. This proposed compression method finds a particular fine-grained pruning mask that can be decomposed into two binary matrices. We also propose a tile-based factorization technique that not only lowers memory requirements but also enhances compression ratio. Various DNN models can be pruned with much fewer indexes compared to previous sparse matrix formats while maintaining the same pruning rate.
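As a rough illustration of the idea in this abstract, the sketch below rebuilds a pruning mask from two small binary factors. It assumes a Boolean matrix product (an entry is kept if any inner term is 1); the factor values, shapes, and function name are hypothetical, and the paper's method for finding such factors is not shown.

```python
import numpy as np

def decompress_mask(left, right):
    # Boolean matrix product: mask[i, j] is True when left[i, k] and
    # right[k, j] are both 1 for at least one k (assumed decoding rule).
    return (left.astype(np.int64) @ right.astype(np.int64)) > 0

# Hypothetical binary factors of rank 2 for a 4 x 6 pruning mask.
left = np.array([[1, 0],
                 [0, 1],
                 [1, 1],
                 [0, 0]], dtype=np.uint8)
right = np.array([[1, 0, 1, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0]], dtype=np.uint8)

mask = decompress_mask(left, right)

# Index storage in bits: the two factors versus one bit per weight.
index_bits_factored = left.size + right.size
index_bits_dense = mask.size
```

The factors store fewer index bits than the dense mask, and the gap grows with matrix size as long as the rank stays small.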
Submitted 14 May, 2019;
originally announced May 2019.
-
Weightless: Lossy Weight Encoding For Deep Neural Network Compression
Authors:
Brandon Reagen,
Udit Gupta,
Robert Adolf,
Michael M. Mitzenmacher,
Alexander M. Rush,
Gu-Yeon Wei,
David Brooks
Abstract:
The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce the memory requirements of these models, usually through applying transformations such as weight pruning or quantization. In this paper, we present a novel scheme for lossy weight encoding which complements conventional compression techniques. The encoding is based on the Bloomier filter, a probabilistic data structure that can save space at the cost of introducing random errors. Leveraging the ability of neural networks to tolerate these imperfections and by re-training around the errors, the proposed technique, Weightless, can compress DNN weights by up to 496x with the same model accuracy. This results in up to a 1.51x improvement over the state-of-the-art.
Submitted 13 November, 2017;
originally announced November 2017.
-
Comparison of multi-task convolutional neural network (MT-CNN) and a few other methods for toxicity prediction
Authors:
Kedi Wu,
Guo-Wei Wei
Abstract:
Toxicity analysis and prediction are of paramount importance to human health and environmental protection. Existing computational methods are built from a wide variety of descriptors and regressors, which makes their performance analysis difficult. For example, the deep neural network (DNN), a successful approach on many occasions, acts like a black box and offers little conceptual elegance or physical understanding. The present work constructs a common set of microscopic descriptors based on established physical models for charges, surface areas and free energies to assess the performance of multi-task convolutional neural network (MT-CNN) architectures and a few other approaches, including random forest (RF) and gradient boosting decision tree (GBDT), on an equal footing. Comparison is also made with convolutional neural network (CNN) and non-convolutional deep neural network (DNN) algorithms. Four benchmark toxicity data sets (i.e., endpoints) are used to evaluate the various approaches. Extensive numerical studies indicate that the present MT-CNN architecture is able to outperform the state-of-the-art methods.
Submitted 31 March, 2017;
originally announced March 2017.
-
The factorization and simulation for fundamental solution of Cauchy problem
Authors:
Xinjun Gan,
Gang Wei,
Jie Zhang,
Qi Zhang
Abstract:
In this paper, we demonstrate the simulation of the fundamental solution of the parabolic equation through its relationship with the Ito diffusion. The factorization of the fundamental solution and Monte Carlo methods for it are considered. Using the fact that the fundamental solution can be written as a product of the transition function and the expectation of a bridge path integral, we give a novel and efficient algorithm to simulate the fundamental solution by importance sampling, especially for the multi-dimensional case.
Submitted 4 July, 2014; v1 submitted 20 March, 2014;
originally announced March 2014.