-
Task Affinity with Maximum Bipartite Matching in Few-Shot Learning
Authors:
Cat P. Le,
Juncheng Dong,
Mohammadreza Soltani,
Vahid Tarokh
Abstract:
We propose an asymmetric affinity score for representing the complexity of utilizing the knowledge of one task for learning another one. Our method is based on the maximum bipartite matching algorithm and utilizes the Fisher Information matrix. We provide theoretical analyses demonstrating that the proposed score is mathematically well-defined, and subsequently use the affinity score to propose a novel algorithm for the few-shot learning problem. In particular, using this score, we find the training data labels relevant to the test data and leverage the discovered relevant data for episodically fine-tuning a few-shot model. Results on various few-shot benchmark datasets demonstrate the efficacy of the proposed approach by improving the classification accuracy over the state-of-the-art methods even when using smaller models.
Submitted 21 January, 2022; v1 submitted 5 October, 2021;
originally announced October 2021.
-
Semi-Empirical Objective Functions for MCMC Proposal Optimization
Authors:
Chris Cannella,
Vahid Tarokh
Abstract:
Current objective functions used for training neural MCMC proposal distributions implicitly rely on architectural restrictions to yield sensible optimization results, which hampers the development of highly expressive neural MCMC proposal architectures. In this work, we introduce and demonstrate a semi-empirical procedure for determining approximate objective functions suitable for optimizing arbitrarily parameterized proposal distributions in MCMC methods. Our proposed Ab Initio objective functions consist of the weighted combination of functions following constraints on their global optima and transformation invariances that we argue should be upheld by general measures of MCMC efficiency for use in proposal optimization. Our experimental results demonstrate that Ab Initio objective functions maintain favorable performance and preferable optimization behavior compared to existing objective functions for neural MCMC optimization. We find that Ab Initio objective functions are sufficiently robust to enable the confident optimization of neural proposal distributions parameterized by deep generative networks extending beyond the regimes of traditional MCMC schemes.
Submitted 9 April, 2022; v1 submitted 3 June, 2021;
originally announced June 2021.
-
SemiFL: Semi-Supervised Federated Learning for Unlabeled Clients with Alternate Training
Authors:
Enmao Diao,
Jie Ding,
Vahid Tarokh
Abstract:
Federated Learning allows the training of machine learning models by using the computation and private data resources of many distributed clients. Most existing results on Federated Learning (FL) assume the clients have ground-truth labels. However, in many practical scenarios, clients may be unable to label task-specific data due to a lack of expertise or resources. We propose SemiFL to address the problem of combining communication-efficient FL such as FedAvg with Semi-Supervised Learning (SSL). In SemiFL, clients have completely unlabeled data and can train multiple local epochs to reduce communication costs, while the server has a small amount of labeled data. We provide a theoretical understanding of the success of data augmentation-based SSL methods to illustrate the bottleneck of a vanilla combination of communication-efficient FL with SSL. To address this issue, we propose alternate training to "fine-tune the global model with labeled data" and "generate pseudo-labels with the global model." We conduct extensive experiments and demonstrate that our approach significantly improves the performance of a labeled server with unlabeled clients training with multiple local epochs. Moreover, our method outperforms many existing SSFL baselines and performs competitively with the state-of-the-art FL and SSL results.
Submitted 11 October, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
GAL: Gradient Assisted Learning for Decentralized Multi-Organization Collaborations
Authors:
Enmao Diao,
Jie Ding,
Vahid Tarokh
Abstract:
Collaborations among multiple organizations, such as financial institutions, medical centers, and retail markets, in decentralized settings are crucial to providing improved service and performance. However, the underlying organizations may have little interest in sharing their local data, models, and objective functions. These requirements have created new challenges for multi-organization collaboration. In this work, we propose Gradient Assisted Learning (GAL), a new method for multiple organizations to assist each other in supervised learning tasks without sharing local data, models, or objective functions. In this framework, all participants collaboratively optimize the aggregate of local loss functions, and each participant autonomously builds its own model by iteratively fitting the gradients of the overarching objective function. We also provide asymptotic convergence analysis and practical case studies of GAL. Experimental studies demonstrate that GAL can achieve performance close to that of centralized learning, in which all data, models, and objective functions are fully disclosed.
Submitted 11 October, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
A Methodology for Exploring Deep Convolutional Features in Relation to Hand-Crafted Features with an Application to Music Audio Modeling
Authors:
Anna K. Yanchenko,
Mohammadreza Soltani,
Robert J. Ravier,
Sayan Mukherjee,
Vahid Tarokh
Abstract:
Understanding the features learned by deep models is important from a model trust perspective, especially as deep systems are deployed in the real world. Most recent approaches for deep feature understanding or model explanation focus on highlighting input data features that are relevant for classification decisions. In this work, we instead take the perspective of relating deep features to well-studied, hand-crafted features that are meaningful for the application of interest. We propose a methodology and set of systematic experiments for exploring deep features in this setting, where input feature importance approaches for deep feature understanding do not apply. Our experiments focus on understanding which hand-crafted and deep features are useful for the classification task of interest, how robust these features are for related tasks and how similar the deep features are to the meaningful hand-crafted features. Our proposed method is general to many application areas and we demonstrate its utility on orchestral music audio data.
Submitted 9 October, 2021; v1 submitted 31 May, 2021;
originally announced June 2021.
-
Fisher Task Distance and Its Application in Neural Architecture Search
Authors:
Cat P. Le,
Mohammadreza Soltani,
Juncheng Dong,
Vahid Tarokh
Abstract:
We formulate an asymmetric (or non-commutative) distance between tasks based on Fisher Information Matrices, called Fisher task distance. This distance represents the complexity of transferring the knowledge from one task to another. We provide a proof of consistency for our distance through theorems and experiments on various classification tasks from MNIST, CIFAR-10, CIFAR-100, ImageNet, and Taskonomy datasets. Next, we construct an online neural architecture search framework using the Fisher task distance, in which we have access to the past learned tasks. By using the Fisher task distance, we can identify the closest learned tasks to the target task, and utilize the knowledge learned from these related tasks for the target task. Here, we show how the proposed distance between a target task and a set of learned tasks can be used to reduce the neural architecture search space for the target task. The complexity reduction in search space for task-specific architectures is achieved by building on the optimized architectures for similar tasks instead of performing a full search that ignores this side information. Experimental results for tasks in MNIST, CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate the efficacy of the proposed approach and its improvements, in terms of the performance and the number of parameters, over other gradient-based search methods, such as ENAS, DARTS, and PC-DARTS.
Submitted 30 April, 2022; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Talaria: A Framework for Simulation of Permissioned Blockchains for Logistics and Beyond
Authors:
Jiali Xing,
David Fischer,
Nitya Labh,
Ryan Piersma,
Benjamin C. Lee,
Yu Amy Xia,
Tuhin Sahai,
Vahid Tarokh
Abstract:
In this paper, we present Talaria, a novel permissioned blockchain simulator that supports numerous protocols and use cases, most notably in supply chain management. Talaria extends the capability of BlockSim, an existing blockchain simulator, to include permissioned blockchains and serves as a foundation for further private blockchain assessment. Talaria is designed with both practical Byzantine Fault Tolerance (pBFT) and a simplified version of Proof-of-Authority consensus protocols, but can be revised to include other permissioned protocols within its modular framework. Moreover, Talaria is able to simulate different types of malicious authorities and a variable daily transaction load at each node. In using Talaria, business practitioners and policy planners have an opportunity to measure, evaluate, and adapt a range of blockchain solutions for commercial operations.
Submitted 30 March, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Dimension Reduced Turbulent Flow Data From Deep Vector Quantizers
Authors:
Mohammadreza Momenifar,
Enmao Diao,
Vahid Tarokh,
Andrew D. Bragg
Abstract:
Analyzing large-scale data from simulations of turbulent flows is memory intensive, requiring significant resources. This major challenge highlights the need for data compression techniques. In this study, we apply a physics-informed Deep Learning technique based on vector quantization to generate a discrete, low-dimensional representation of data from simulations of three-dimensional turbulent flows. The deep learning framework is composed of convolutional layers and incorporates physical constraints on the flow, such as preserving incompressibility and global statistical characteristics of the velocity gradients. The accuracy of the model is assessed using statistical, comparison-based similarity and physics-based metrics. The training data set is produced from Direct Numerical Simulation of an incompressible, statistically stationary, isotropic turbulent flow.
The performance of this lossy data compression scheme is evaluated not only with unseen data from the stationary, isotropic turbulent flow, but also with data from decaying isotropic turbulence, a Taylor-Green vortex flow, and a turbulent channel flow. Defining the compression ratio (CR) as the ratio of the original data size to the compressed one, the results show that our model based on vector quantization can offer CR$=85$ with a mean square error (MSE) of $O(10^{-3})$, and predictions that faithfully reproduce the statistics of the flow, except at the very smallest scales where there is some loss. Compared to the recent study of Glaws et al. (Physical Review Fluids, 5(11):114602, 2020), which was based on a conventional autoencoder (where compression is performed in a continuous space), our model improves the CR by more than $30$ percent...
Submitted 24 May, 2021; v1 submitted 1 March, 2021;
originally announced March 2021.
-
Improved Automated Machine Learning from Transfer Learning
Authors:
Cat P. Le,
Mohammadreza Soltani,
Robert Ravier,
Vahid Tarokh
Abstract:
In this paper, we propose a neural architecture search framework based on a similarity measure between some baseline tasks and a target task. We first define the notion of the task similarity based on the log-determinant of the Fisher Information matrix. Next, we compute the task similarity from each of the baseline tasks to the target task. By utilizing the relation between a target and a set of learned baseline tasks, the search space of architectures for the target task can be significantly reduced, making the discovery of the best candidates in the set of possible architectures tractable and efficient, in terms of GPU days. This method eliminates the requirement for training the networks from scratch for a given target task, as well as the human-introduced bias in the initialization of the search space.
Submitted 29 January, 2022; v1 submitted 27 February, 2021;
originally announced March 2021.
-
Generative Archimedean Copulas
Authors:
Yuting Ng,
Ali Hasan,
Khalil Elkhalil,
Vahid Tarokh
Abstract:
We propose a new generative modeling technique for learning multidimensional cumulative distribution functions (CDFs) in the form of copulas. Specifically, we consider certain classes of copulas known as Archimedean and hierarchical Archimedean copulas, popular for their parsimonious representation and ability to model different tail dependencies. We consider their representation as mixture models with Laplace transforms of latent random variables from generative neural networks. This alternative representation allows for computational efficiencies and easy sampling, especially in high dimensions. We describe multiple methods for optimizing the network parameters. Finally, we present empirical results that demonstrate the efficacy of our proposed method in learning multidimensional CDFs and its computational efficiency compared to existing methods.
Submitted 10 June, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
-
Modeling Extremes with d-max-decreasing Neural Networks
Authors:
Ali Hasan,
Khalil Elkhalil,
Yuting Ng,
Joao M. Pereira,
Sina Farsiu,
Jose H. Blanchet,
Vahid Tarokh
Abstract:
We propose a novel neural network architecture that enables non-parametric calibration and generation of multivariate extreme value distributions (MEVs). MEVs arise from Extreme Value Theory (EVT) as the necessary class of models when extrapolating a distributional fit over large spatial and temporal scales based on data observed in intermediate scales. In turn, EVT dictates that $d$-max-decreasing, a stronger form of convexity, is an essential shape constraint in the characterization of MEVs. As far as we know, our proposed architecture provides the first class of non-parametric estimators for MEVs that preserve these essential shape constraints. We show that our architecture approximates the dependence structure encoded by MEVs at a parametric rate. Moreover, we present a new method for sampling high-dimensional MEVs using a generative model. We demonstrate our methodology on a wide range of experimental settings, ranging from environmental sciences to financial mathematics, and verify that the structural properties of MEVs are retained compared to existing methods.
Submitted 1 March, 2022; v1 submitted 17 February, 2021;
originally announced February 2021.
-
On Statistical Efficiency in Learning
Authors:
Jie Ding,
Enmao Diao,
Jiawei Zhou,
Vahid Tarokh
Abstract:
A central issue of many statistical learning problems is to select an appropriate model from a set of candidate models. Large models tend to inflate the variance (or overfitting), while small models tend to cause biases (or underfitting) for a given fixed dataset. In this work, we address the critical challenge of model selection to strike a balance between model fitting and model complexity, thus gaining reliable predictive power. We consider the task of approaching the theoretical limit of statistical learning, meaning that the selected model has predictive performance that is as good as the best possible model given a class of potentially misspecified candidate models. We propose a generalized notion of Takeuchi's information criterion and prove that the proposed method can asymptotically achieve the optimal out-of-sample prediction loss under reasonable assumptions. To the best of our knowledge, it is the first proof of the asymptotic property of Takeuchi's information criterion. Our proof applies to a wide variety of nonlinear models, loss functions, and high dimensionality (in the sense that the models' complexity can grow with sample size). The proposed method can be used as a computationally efficient surrogate for leave-one-out cross-validation. Moreover, for modeling streaming data, we propose an online algorithm that sequentially expands the model complexity to enhance selection stability and reduce computation cost. Experimental studies show that the proposed method has desirable predictive power and significantly less computational cost than some popular methods.
Submitted 24 December, 2020;
originally announced December 2020.
-
Task-Aware Neural Architecture Search
Authors:
Cat P. Le,
Mohammadreza Soltani,
Robert Ravier,
Vahid Tarokh
Abstract:
The design of handcrafted neural networks requires a lot of time and resources. Recent techniques in Neural Architecture Search (NAS) have proven to be competitive with or better than traditional handcrafted design, although they require domain knowledge and have generally used limited search spaces. In this paper, we propose a novel framework for neural architecture search that utilizes a dictionary of models of base tasks and the similarity between the target task and the atoms of the dictionary, thereby generating an adaptive search space based on the base models of the dictionary. By introducing a gradient-based search algorithm, we can evaluate and discover the best architecture in the search space without fully training the networks. The experimental results show the efficacy of our proposed task-aware approach.
Submitted 15 March, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
-
HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients
Authors:
Enmao Diao,
Jie Ding,
Vahid Tarokh
Abstract:
Federated Learning (FL) is a method of training machine learning models on private data distributed over a large number of possibly heterogeneous clients such as mobile phones and IoT devices. In this work, we propose a new federated learning framework named HeteroFL to address heterogeneous clients equipped with very different computation and communication capabilities. Our solution can enable the training of heterogeneous local models with varying computation complexities and still produce a single global inference model. For the first time, our method challenges the underlying assumption of existing work that local models have to share the same architecture as the global model. We demonstrate several strategies to enhance FL training and conduct extensive empirical evaluations, including five computation complexity levels of three model architectures on three datasets. We show that adaptively distributing subnetworks according to clients' capabilities is both computation and communication efficient.
Submitted 13 December, 2021; v1 submitted 2 October, 2020;
originally announced October 2020.
-
GeoStat Representations of Time Series for Fast Classification
Authors:
Robert J. Ravier,
Mohammadreza Soltani,
Miguel Simões,
Denis Garagic,
Vahid Tarokh
Abstract:
Recent advances in time series classification have largely focused on methods that either employ deep learning or utilize other machine learning models for feature extraction. Though successful, their power often comes at the cost of computational complexity. In this paper, we introduce GeoStat representations for time series. GeoStat representations are based on a generalization of recent methods for trajectory classification, and summarize the information of a time series in terms of comprehensive statistics of (possibly windowed) distributions of easy-to-compute differential geometric quantities, requiring no dynamic time warping. The features used are intuitive and require minimal parameter tuning. We perform an exhaustive evaluation of GeoStat on a number of real datasets, showing that simple KNN and SVM classifiers trained on these representations exhibit surprising performance relative to modern single-model methods requiring significant computational power, achieving state-of-the-art results in many cases. In particular, we show that this methodology performs well on a challenging dataset involving the classification of fishing vessels, remaining competitive with the state of the art despite only having access to approximately two percent of the dataset used in training and evaluating that state of the art.
Submitted 11 January, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Deep Cross-Subject Mapping of Neural Activity
Authors:
Marko Angjelichinoski,
Bijan Pesaran,
Vahid Tarokh
Abstract:
Objective. In this paper, we consider the problem of cross-subject decoding, where neural activity data collected from the prefrontal cortex of a given subject (destination) is used to decode motor intentions from the neural activity of a different subject (source). Approach. We cast the problem of neural activity mapping in a probabilistic framework where we adopt deep generative modelling. Our proposed algorithm uses a deep conditional variational autoencoder to infer the representation of the neural activity of the source subject in an adequate feature space of the destination subject where neural decoding takes place. Results. We verify our approach on an experimental data set in which two macaque monkeys perform memory-guided visual saccades to one of eight target locations. The results show a peak cross-subject decoding improvement of $8\%$ over subject-specific decoding. Conclusion. We demonstrate that a neural decoder trained on neural activity signals of one subject can be used to robustly decode the motor intentions of a different subject with high reliability. This is achieved in spite of the non-stationary nature of neural activity signals and the subject-specific variations of the recording conditions. Significance. The findings reported in this paper are an important step towards the development of cross-subject brain-computer interfaces that generalize well across a population.
Submitted 21 February, 2022; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Projected Latent Markov Chain Monte Carlo: Conditional Sampling of Normalizing Flows
Authors:
Chris Cannella,
Mohammadreza Soltani,
Vahid Tarokh
Abstract:
We introduce Projected Latent Markov Chain Monte Carlo (PL-MCMC), a technique for sampling from the high-dimensional conditional distributions learned by a normalizing flow. We prove that a Metropolis-Hastings implementation of PL-MCMC asymptotically samples from the exact conditional distributions associated with a normalizing flow. As a conditional sampling method, PL-MCMC enables Monte Carlo Expectation Maximization (MC-EM) training of normalizing flows from incomplete data. Through experimental tests applying normalizing flows to missing data tasks for a variety of data sets, we demonstrate the efficacy of PL-MCMC for conditional sampling from normalizing flows.
Submitted 26 February, 2021; v1 submitted 12 July, 2020;
originally announced July 2020.
-
Fisher Auto-Encoders
Authors:
Khalil Elkhalil,
Ali Hasan,
Jie Ding,
Sina Farsiu,
Vahid Tarokh
Abstract:
It has been conjectured that the Fisher divergence is more robust to model uncertainty than the conventional Kullback-Leibler (KL) divergence. This motivates the design of a new class of robust generative auto-encoders (AE) referred to as Fisher auto-encoders. Our approach is to design Fisher AEs by minimizing the Fisher divergence between the intractable joint distribution of observed data and latent variables and the postulated/modeled joint distribution. In contrast to KL-based variational AEs (VAEs), the Fisher AE can exactly quantify the distance between the true and the model-based posterior distributions. Qualitative and quantitative results are provided on both MNIST and CelebA datasets, demonstrating the competitive performance of Fisher AEs in terms of robustness compared to other AEs such as VAEs and Wasserstein AEs.
Submitted 23 October, 2020; v1 submitted 12 July, 2020;
originally announced July 2020.
-
Identifying Latent Stochastic Differential Equations
Authors:
Ali Hasan,
João M. Pereira,
Sina Farsiu,
Vahid Tarokh
Abstract:
We present a method for learning latent stochastic differential equations (SDEs) from high-dimensional time series data. Given a high-dimensional time series generated from a lower dimensional latent unknown Itô process, the proposed method learns the mapping from ambient to latent space, and the underlying SDE coefficients, through a self-supervised learning approach. Using the framework of variational autoencoders, we consider a conditional generative model for the data based on the Euler-Maruyama approximation of SDE solutions. Furthermore, we use recent results on identifiability of latent variable models to show that the proposed model can recover not only the underlying SDE coefficients, but also the original latent variables, up to an isometry, in the limit of infinite data. We validate the method through several simulated video processing tasks, where the underlying SDE is known, and through real world datasets.
Submitted 26 November, 2021; v1 submitted 12 July, 2020;
originally announced July 2020.
-
Model Linkage Selection for Cooperative Learning
Authors:
Jiaying Zhou,
Jie Ding,
Kean Ming Tan,
Vahid Tarokh
Abstract:
We consider a distributed learning setting where each agent/learner holds a specific parametric model and data source. The goal is to integrate information across a set of learners to enhance the prediction accuracy of a given learner. A natural way to integrate information is to build a joint model across a group of learners that shares common parameters of interest. However, the underlying parameter sharing patterns across a set of learners may not be a priori known. Misspecifying the parameter sharing patterns or the parametric model for each learner often yields a biased estimation and degrades the prediction accuracy. We propose a general method to integrate information across a set of learners that is robust against misspecifications of both models and parameter sharing patterns. The main crux is to sequentially incorporate additional learners that can enhance the prediction accuracy of an existing joint model based on user-specified parameter sharing patterns across a set of learners. Theoretically, we show that the proposed method can data-adaptively select the most suitable way of parameter sharing and thus enhance the predictive performance of any particular learner of interest. Extensive numerical studies show the promising performance of the proposed method.
Submitted 20 September, 2021; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization
Authors:
Yi Zhou,
Zhe Wang,
Kaiyi Ji,
Yingbin Liang,
Vahid Tarokh
Abstract:
Various types of parameter restart schemes have been proposed for accelerated gradient algorithms to facilitate their practical convergence in convex optimization. However, the convergence properties of accelerated gradient algorithms under parameter restart remain obscure in nonconvex optimization. In this paper, we propose a novel accelerated proximal gradient algorithm with parameter restart (named APG-restart) for solving nonconvex and nonsmooth problems. Our APG-restart is designed to 1) allow for adopting flexible parameter restart schemes that cover many existing ones; 2) have a global sub-linear convergence rate in nonconvex and nonsmooth optimization; and 3) have guaranteed convergence to a critical point and have various types of asymptotic convergence rates depending on the parameterization of local geometry in nonconvex and nonsmooth optimization. Numerical experiments demonstrate the effectiveness of our proposed algorithm.
Submitted 27 April, 2020; v1 submitted 26 February, 2020;
originally announced February 2020.
-
Multimodal Controller for Generative Models
Authors:
Enmao Diao,
Jie Ding,
Vahid Tarokh
Abstract:
Class-conditional generative models are crucial tools for data generation from user-specified class labels. Existing approaches for class-conditional generative models require nontrivial modifications of backbone generative architectures to model conditional information fed into the model. This paper introduces a plug-and-play module named `multimodal controller' to generate multimodal data without introducing additional learning parameters. In the absence of the controllers, our model reduces to non-conditional generative models. We test the efficacy of multimodal controllers on CIFAR10, COIL100, and Omniglot benchmark datasets. We demonstrate that multimodal controlled generative models (including VAE, PixelCNN, Glow, and GAN) can generate class-conditional images of significantly better quality when compared with conditional generative models. Moreover, we show that multimodal controlled models can also create novel modalities of images.
Submitted 3 August, 2022; v1 submitted 6 February, 2020;
originally announced February 2020.
-
Robust Marine Buoy Placement for Ship Detection Using Dropout K-Means
Authors:
Yuting Ng,
João M. Pereira,
Denis Garagic,
Vahid Tarokh
Abstract:
Marine buoys aid in the battle against Illegal, Unreported and Unregulated (IUU) fishing by detecting fishing vessels in their vicinity. Marine buoys, however, may be disrupted by natural causes and buoy vandalism. In this paper, we formulate marine buoy placement as a clustering problem, and propose dropout k-means and dropout k-median to improve placement robustness to buoy disruption.
We simulated the passage of ships in the Gabonese waters near West Africa using historical Automatic Identification System (AIS) data, then compared the ship detection probability of dropout k-means to classic k-means and dropout k-median to classic k-median. With 5 buoys, the buoy arrangements computed by classic k-means, dropout k-means, classic k-median and dropout k-median have ship detection probabilities of 38%, 45%, 48% and 52%, respectively.
Submitted 20 February, 2020; v1 submitted 2 January, 2020;
originally announced January 2020.
-
Distributed Online Convex Optimization with Improved Dynamic Regret
Authors:
Yan Zhang,
Robert J. Ravier,
Vahid Tarokh,
Michael M. Zavlanos
Abstract:
In this paper, we consider the problem of distributed online convex optimization, where a group of agents collaborate to track the global minimizers of a sum of time-varying objective functions in an online manner. Specifically, we propose a novel distributed online gradient descent algorithm that relies on an online adaptation of the gradient tracking technique used in static optimization. We show that the dynamic regret bound of this algorithm has no explicit dependence on the time horizon and, therefore, can be tighter than existing bounds, especially for problems with long horizons. Our bound depends on a new regularity measure that quantifies the total change in the gradients at the optimal points at each time instant. Furthermore, when the optimizer is approximately subject to linear dynamics, we show that the dynamic regret bound can be further tightened by replacing the regularity measure that captures the path length of the optimizer with the accumulated prediction errors, which can be much lower in this special case. We present numerical experiments to corroborate our theoretical results.
Submitted 13 October, 2020; v1 submitted 12 November, 2019;
originally announced November 2019.
-
A Distributed Online Convex Optimization Algorithm with Improved Dynamic Regret
Authors:
Yan Zhang,
Robert J. Ravier,
Michael M. Zavlanos,
Vahid Tarokh
Abstract:
In this paper, we consider the problem of distributed online convex optimization, where a network of local agents aim to jointly optimize a convex function over a period of multiple time steps. The agents do not have any information about the future. Existing algorithms have established dynamic regret bounds that have explicit dependence on the number of time steps. In this work, we show that we can remove this dependence assuming that the local objective functions are strongly convex. More precisely, we propose a gradient tracking algorithm where agents jointly communicate and descend based on corrected gradient steps. We verify our theoretical results through numerical experiments.
Submitted 12 November, 2019;
originally announced November 2019.
-
Cross-subject Decoding of Eye Movement Goals from Local Field Potentials
Authors:
Marko Angjelichinoski,
John Choi,
Taposh Banerjee,
Bijan Pesaran,
Vahid Tarokh
Abstract:
Objective. We consider the cross-subject decoding problem from local field potential (LFP) signals, where training data collected from the prefrontal cortex (PFC) of a source subject is used to decode intended motor actions in a destination subject. Approach. We propose a novel supervised transfer learning technique, referred to as data centering, which is used to adapt the feature space of the source to the feature space of the destination. The key ingredients of data centering are the transfer functions used to model the deterministic component of the relationship between the source and destination feature spaces. We propose an efficient data-driven estimation approach for linear transfer functions that uses the first and second order moments of the class-conditional distributions. Main result. We apply our data centering technique with linear transfer functions for cross-subject decoding of eye movement intentions in an experiment where two macaque monkeys perform memory-guided visual saccades to one of eight target locations. The results show peak cross-subject decoding performance of $80\%$, which marks a substantial improvement over a random-choice decoder. In addition, data centering outperforms standard sampling-based methods in setups with imbalanced training data. Significance. The analyses presented herein demonstrate that the proposed data centering is a viable novel technique for reliable LFP-based cross-subject brain-computer interfacing and neural prostheses.
Submitted 6 January, 2020; v1 submitted 8 November, 2019;
originally announced November 2019.
-
Supervised Encoding for Discrete Representation Learning
Authors:
Cat P. Le,
Yi Zhou,
Jie Ding,
Vahid Tarokh
Abstract:
Classical supervised classification tasks search for a nonlinear mapping that maps each encoded feature directly to a probability mass over the labels. Such a learning framework typically lacks the intuition that encoded features from the same class tend to be similar and thus has little interpretability for the learned features. In this paper, we propose a novel supervised learning model named Supervised-Encoding Quantizer (SEQ). The SEQ applies a quantizer to cluster and classify the encoded features. We found that the quantizer provides an interpretable graph where each cluster in the graph represents a class of data samples that have a particular style. We also trained a decoder that can decode convex combinations of the encoded features from similar and different clusters and provide guidance on style transfer between sub-classes.
Submitted 14 October, 2019;
originally announced October 2019.
-
Deep Clustering of Compressed Variational Embeddings
Authors:
Suya Wu,
Enmao Diao,
Jie Ding,
Vahid Tarokh
Abstract:
Motivated by the ever-increasing demands for limited communication bandwidth and low-power consumption, we propose a new methodology, named joint Variational Autoencoders with Bernoulli mixture models (VAB), for performing clustering in the compressed data domain. The idea is to reduce the data dimension by Variational Autoencoders (VAEs) and group data representations by Bernoulli mixture models (BMMs). Once jointly trained for compression and clustering, the model can be decomposed into two parts: a data vendor that encodes the raw data into compressed data, and a data consumer that classifies the received (compressed) data. In this way, the data vendor benefits from data security and communication bandwidth, while the data consumer benefits from low computational complexity. To enable training using the gradient descent algorithm, we propose to use the Gumbel-Softmax distribution to resolve the infeasibility of the back-propagation algorithm when assessing categorical samples.
Submitted 23 October, 2019;
originally announced October 2019.
-
Learning Partial Differential Equations from Data Using Neural Networks
Authors:
Ali Hasan,
João M. Pereira,
Robert Ravier,
Sina Farsiu,
Vahid Tarokh
Abstract:
We develop a framework for estimating unknown partial differential equations from noisy data, using a deep learning approach. Given noisy samples of a solution to an unknown PDE, our method interpolates the samples using a neural network, and extracts the PDE by equating derivatives of the neural network approximation. Our method applies to PDEs which are linear combinations of user-defined dictionary functions, and generalizes previous methods that only consider parabolic PDEs. We introduce a regularization scheme that prevents the function approximation from overfitting the data and forces it to be a solution of the underlying PDE. We validate the model on simulated data generated by the known PDEs and added Gaussian noise, and we study our method under different levels of noise. We also compare the error of our method with a Cramer-Rao lower bound for an ordinary differential equation. Our results indicate that our method outperforms other methods in estimating PDEs, especially in the low signal-to-noise regime.
Submitted 22 October, 2019;
originally announced October 2019.
-
Perception-Distortion Trade-off with Restricted Boltzmann Machines
Authors:
Chris Cannella,
Jie Ding,
Mohammadreza Soltani,
Vahid Tarokh
Abstract:
In this work, we introduce a new procedure for applying Restricted Boltzmann Machines (RBMs) to missing data inference tasks, based on linearization of the effective energy function governing the distribution of observations. We compare the performance of our proposed procedure with those obtained using existing reconstruction procedures trained on incomplete data. We place these performance comparisons within the context of the perception-distortion trade-off observed in other data reconstruction tasks, which has, until now, remained unexplored in tasks relying on incomplete training data.
Submitted 20 October, 2019;
originally announced October 2019.
-
Speech Emotion Recognition with Dual-Sequence LSTM Architecture
Authors:
Jianyou Wang,
Michael Xue,
Ryan Culhane,
Enmao Diao,
Jie Ding,
Vahid Tarokh
Abstract:
Speech Emotion Recognition (SER) has emerged as a critical component of the next generation human-machine interfacing technologies. In this work, we propose a new dual-level model that predicts emotions based on both MFCC features and mel-spectrograms produced from raw audio signals. Each utterance is preprocessed into MFCC features and two mel-spectrograms at different time-frequency resolutions. A standard LSTM processes the MFCC features, while a novel LSTM architecture, denoted as Dual-Sequence LSTM (DS-LSTM), processes the two mel-spectrograms simultaneously. The outputs are later averaged to produce a final classification of the utterance. Our proposed model achieves, on average, a weighted accuracy of 72.7% and an unweighted accuracy of 73.3%---a 6% improvement over current state-of-the-art unimodal models---and is comparable with multimodal models that leverage textual information as well as audio signals.
Submitted 12 February, 2020; v1 submitted 19 October, 2019;
originally announced October 2019.
-
Restricted Recurrent Neural Networks
Authors:
Enmao Diao,
Jie Ding,
Vahid Tarokh
Abstract:
Recurrent Neural Network (RNN) and its variations, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have become standard building blocks for learning online data of sequential nature in many research areas, including natural language processing and speech data analysis. In this paper, we present a new methodology to significantly reduce the number of parameters in RNNs while maintaining performance that is comparable to or even better than that of classical RNNs. The new proposal, referred to as Restricted Recurrent Neural Network (RRNN), restricts the weight matrices corresponding to the input data and hidden states at each time step to share a large proportion of parameters. The new architecture can be regarded as a compression of its classical counterpart, but it does not require pre-training or sophisticated parameter fine-tuning, both of which are major issues in most existing compression techniques. Experiments on natural language modeling show that, compared with its classical counterpart, the restricted recurrent architecture generally produces comparable results at about a 50\% compression rate. In particular, the Restricted LSTM can outperform the classical RNN with even fewer parameters.
Submitted 14 November, 2019; v1 submitted 21 August, 2019;
originally announced August 2019.
-
DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression
Authors:
Enmao Diao,
Jie Ding,
Vahid Tarokh
Abstract:
We propose a new architecture for distributed image compression from a group of distributed data sources. The work is motivated by practical needs of data-driven codec design, low power consumption, robustness, and data privacy. The proposed architecture, which we refer to as Distributed Recurrent Autoencoder for Scalable Image Compression (DRASIC), is able to train distributed encoders and one joint decoder on correlated data sources. Its compression capability is much better than the method of training codecs separately. Meanwhile, the performance of our distributed system with 10 distributed sources is only within 2 dB peak signal-to-noise ratio (PSNR) of the performance of a single codec trained with all data sources. We experiment with distributed sources with different correlations and show how our data-driven methodology closely matches the Slepian-Wolf Theorem in Distributed Source Coding (DSC). To the best of our knowledge, this is the first data-driven DSC framework for general distributed code design with deep learning.
Submitted 27 December, 2019; v1 submitted 23 March, 2019;
originally announced March 2019.
-
Convergence Rate of Empirical Spectral Distribution of Random Matrices from Linear Codes
Authors:
Chin Hei Chan,
Vahid Tarokh,
Maosheng Xiong
Abstract:
It is known that the empirical spectral distribution of random matrices obtained from linear codes of increasing length converges to the well-known Marchenko-Pastur law, if the Hamming distance of the dual codes is at least 5. In this paper, we prove that the convergence in probability is at least of the order $n^{-1/4}$ where $n$ is the length of the code.
Submitted 12 May, 2020; v1 submitted 22 February, 2019;
originally announced February 2019.
-
Momentum Schemes with Stochastic Variance Reduction for Nonconvex Composite Optimization
Authors:
Yi Zhou,
Zhe Wang,
Kaiyi Ji,
Yingbin Liang,
Vahid Tarokh
Abstract:
Two new stochastic variance-reduced algorithms named SARAH and SPIDER have been recently proposed, and SPIDER has been shown to achieve a near-optimal gradient oracle complexity for nonconvex optimization. However, the theoretical advantage of SPIDER does not lead to substantial improvement of practical performance over SVRG. To address this issue, momentum technique can be a good candidate to improve the performance of SPIDER. However, existing momentum schemes used in variance-reduced algorithms are designed specifically for convex optimization, and are not applicable to nonconvex scenarios. In this paper, we develop novel momentum schemes with flexible coefficient settings to accelerate SPIDER for nonconvex and nonsmooth composite optimization, and show that the resulting algorithms achieve the near-optimal gradient oracle complexity for achieving a generalized first-order stationary condition. Furthermore, we generalize our algorithm to online nonconvex and nonsmooth optimization, and establish an oracle complexity result that matches the state-of-the-art. Our extensive experiments demonstrate the superior performance of our proposed algorithm over other stochastic variance-reduced algorithms.
Submitted 15 May, 2019; v1 submitted 7 February, 2019;
originally announced February 2019.
-
Prediction in Online Convex Optimization for Parametrizable Objective Functions
Authors:
Robert Ravier,
Vahid Tarokh
Abstract:
Many techniques for online optimization problems involve making decisions based solely on presently available information: fewer works take advantage of potential predictions. In this paper, we discuss the problem of online convex optimization for parametrizable objectives, i.e. optimization problems that depend solely on the value of a parameter at a given time. We introduce a new regularity for dynamic regret based on the accuracy of predicted values of the parameters and show that, under mild assumptions, accurate prediction can yield tighter bounds on dynamic regret. Inspired by recent advances on learning how to optimize, we also propose a novel algorithm to simultaneously predict and optimize for parametrizable objectives and study its performance using simulated and real data.
Submitted 31 January, 2019; v1 submitted 31 January, 2019;
originally announced January 2019.
-
Minimax-optimal decoding of movement goals from local field potentials using complex spectral features
Authors:
Marko Angjelichinoski,
Taposh Banerjee,
John Choi,
Bijan Pesaran,
Vahid Tarokh
Abstract:
We consider the problem of predicting eye movement goals from local field potentials (LFP) recorded through a multielectrode array in the macaque prefrontal cortex. The monkey is tasked with performing memory-guided saccades to one of eight targets during which LFP activity is recorded and used to train a decoder. Previous reports have mainly relied on the spectral amplitude of the LFPs as a feature in the decoding step to limited success, while neglecting the phase without proper theoretical justification. This paper formulates the problem of decoding eye movement intentions in a statistically optimal framework and uses Gaussian sequence modeling and Pinsker's theorem to generate minimax-optimal estimates of the LFP signals which are later used as features in the decoding step. The approach is shown to act as a low-pass filter and each LFP in the feature space is represented via its complex Fourier coefficients after appropriate shrinking such that higher frequency components are attenuated; this way, the phase information inherently present in the LFP signal is naturally embedded into the feature space. The proposed complex spectrum-based decoder achieves prediction accuracy of up to $94\%$ at superficial electrode depths near the surface of the prefrontal cortex, which marks a significant performance improvement over conventional power spectrum-based decoders.
Submitted 29 January, 2019;
originally announced January 2019.
-
SGD Converges to Global Minimum in Deep Learning via Star-convex Path
Authors:
Yi Zhou,
Junjie Yang,
Huishuai Zhang,
Yingbin Liang,
Vahid Tarokh
Abstract:
Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks. However, there is still a lack of understanding of how and why SGD can train these complex networks towards a global minimum. In this study, we establish the convergence of SGD to a global minimum for nonconvex optimization problems that are commonly encountered in neural network training. Our argument exploits the following two important properties: 1) the training loss can achieve zero value (approximately), which has been widely observed in deep learning; 2) SGD follows a star-convex path, which is verified by various experiments in this paper. In such a context, our analysis shows that SGD, although it has long been considered a randomized algorithm, converges in an intrinsically deterministic manner to a global minimum.
Submitted 2 January, 2019;
originally announced January 2019.
-
SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms
Authors:
Zhe Wang,
Kaiyi Ji,
Yi Zhou,
Yingbin Liang,
Vahid Tarokh
Abstract:
SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms, and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in smooth nonconvex optimization. However, SPIDER uses an accuracy-dependent stepsize that slows down the convergence in practice, and cannot handle objective functions that involve nonsmooth regularizers. In this paper, we propose SpiderBoost as an improved scheme, which allows the use of a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and can be extended with proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with a provable convergence guarantee. In particular, we show that proximal SpiderBoost achieves an oracle complexity of $\mathcal{O}(\min\{n^{1/2}\epsilon^{-2},\epsilon^{-3}\})$ in composite nonconvex optimization, improving the state-of-the-art result by a factor of $\mathcal{O}(\min\{n^{1/6},\epsilon^{-1/3}\})$. We further develop a novel momentum scheme to accelerate SpiderBoost for composite optimization, which achieves the near-optimal oracle complexity in theory and substantial improvement in experiments.
Submitted 15 May, 2020; v1 submitted 24 October, 2018;
originally announced October 2018.
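A minimal sketch of the kind of update the SpiderBoost abstract describes may help: a SPIDER-type recursive gradient estimator that is refreshed with a full gradient once per epoch and driven by a constant stepsize. Everything below is an assumption made for illustration and is not taken from the paper: the function name `spiderboost`, the stepsize `1 / (2 * lipschitz)`, sampling with replacement, the scalar toy objective, and the omission of the proximal and momentum variants.

```python
import random


def spiderboost(grad_i, n, x0, lipschitz, epoch_len, batch_size, num_iters, seed=0):
    """Sketch of a SPIDER-type estimator with a constant stepsize (scalar x)."""
    rng = random.Random(seed)
    eta = 1.0 / (2.0 * lipschitz)  # constant stepsize; an assumption of this sketch
    x_prev, x, v = x0, x0, 0.0
    for k in range(num_iters):
        if k % epoch_len == 0:
            # Start of an epoch: reset the estimator with the full gradient.
            v = sum(grad_i(i, x) for i in range(n)) / n
        else:
            # Recursive update: add the minibatch gradient difference
            # between the current and the previous iterate.
            batch = [rng.randrange(n) for _ in range(batch_size)]
            v += sum(grad_i(i, x) - grad_i(i, x_prev) for i in batch) / batch_size
        x_prev, x = x, x - eta * v
    return x


# Hypothetical smooth nonconvex toy problem:
# f_i(x) = u^2 / (1 + u^2) with u = x - centers[i].
data_rng = random.Random(1)
centers = [data_rng.gauss(1.0, 0.1) for _ in range(64)]


def grad_i(i, x):
    u = x - centers[i]
    return 2.0 * u / (1.0 + u * u) ** 2  # derivative of u^2 / (1 + u^2)


def full_grad(x):
    return sum(grad_i(i, x) for i in range(len(centers))) / len(centers)


# Each f_i has a gradient that is Lipschitz with constant 2 (attained at u = 0).
x_final = spiderboost(grad_i, n=64, x0=0.0, lipschitz=2.0,
                      epoch_len=8, batch_size=8, num_iters=200)
```

The epoch length and batch size are both set to the square root of `n` here, a common choice for this family of estimators; treat it as a starting point rather than a tuned setting.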
-
Model Selection Techniques -- An Overview
Authors:
Jie Ding,
Vahid Tarokh,
Yuhong Yang
Abstract:
In the era of big data, analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. There has been a long history of model selection techniques that arise from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performances. The purpose of this article is to provide a comprehensive overview of them, in terms of their motivation, large sample performance, and applicability. We provide integrated and practically relevant discussions on theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection.
Submitted 22 October, 2018;
originally announced October 2018.
-
Learning Bounds for Greedy Approximation with Explicit Feature Maps from Multiple Kernels
Authors:
Shahin Shahrampour,
Vahid Tarokh
Abstract:
Nonlinear kernels can be approximated using finite-dimensional feature maps for efficient risk minimization. Due to the inherent trade-off between the dimension of the (mapped) feature space and the approximation accuracy, the key problem is to identify promising (explicit) features leading to a satisfactory out-of-sample performance. In this work, we tackle this problem by efficiently choosing such features from multiple kernels in a greedy fashion. Our method sequentially selects these explicit features from a set of candidate features using a correlation metric. We establish an out-of-sample error bound capturing the trade-off between the error in terms of explicit features (approximation error) and the error due to spectral properties of the best model in the Hilbert space associated to the combined kernel (spectral error). The result verifies that when the (best) underlying data model is sparse enough, i.e., the spectral error is negligible, one can control the test error with a small number of explicit features, that can scale poly-logarithmically with data. Our empirical results show that given a fixed number of explicit features, the method can achieve a lower test error with a smaller time cost, compared to the state-of-the-art in data-dependent random features.
Submitted 9 October, 2018;
originally announced October 2018.
-
Asymptotically Pseudo-Independent Matrices
Authors:
Ilya Soloveychik,
Vahid Tarokh
Abstract:
We show that the family of pseudo-random matrices recently discovered by Soloveychik, Xiang, and Tarokh in their work `Symmetric Pseudo-Random Matrices' exhibits asymptotic independence. More specifically, any two sequences of matrices of matching sizes from that construction generated using sequences of different non-reciprocal primitive polynomials are asymptotically independent.
Submitted 29 October, 2018; v1 submitted 2 September, 2018;
originally announced September 2018.
-
Sequential Detection of Regime Changes in Neural Data
Authors:
Taposh Banerjee,
Stephen Allsop,
Kay M. Tye,
Demba Ba,
Vahid Tarokh
Abstract:
The problem of detecting changes in firing patterns in neural data is studied. The problem is formulated as a quickest change detection problem. Important algorithms from the literature are reviewed. A new algorithmic technique is discussed to detect deviations from learned baseline behavior. The algorithms studied can be applied to both spike and local field potential data. The algorithms are applied to mice spike data to verify the presence of behavioral learning.
Submitted 2 September, 2018;
originally announced September 2018.
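The quickest change detection formulation mentioned in the abstract above can be illustrated with CUSUM, one classical algorithm for that problem. The abstract does not say which algorithms the paper reviews, so the choice of CUSUM, the unit-variance Gaussian mean-shift model, and the name `cusum_alarm` are all assumptions of this sketch.

```python
def cusum_alarm(samples, mu0, mu1, threshold):
    """Return the index of the first CUSUM alarm, or None if no alarm is raised.

    Assumes unit-variance Gaussian samples whose mean shifts from mu0 to mu1
    at an unknown time (an illustrative model, not the one used in the paper).
    """
    stat = 0.0
    for t, x in enumerate(samples):
        # Log-likelihood ratio of N(mu1, 1) against N(mu0, 1) for one sample.
        llr = (mu1 - mu0) * (x - 0.5 * (mu0 + mu1))
        # Accumulate evidence for a change, clipping at zero.
        stat = max(0.0, stat + llr)
        if stat > threshold:
            return t
    return None


# Noise-free toy sequence: the mean jumps from 0 to 2 at index 50.
toy = [0.0] * 50 + [2.0] * 50
alarm_index = cusum_alarm(toy, mu0=0.0, mu1=2.0, threshold=5.0)
```

Raising `threshold` lowers the false alarm rate at the cost of a longer detection delay, which is the trade-off that quickest change detection formalizes.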
-
Cyclostationary Statistical Models and Algorithms for Anomaly Detection Using Multi-Modal Data
Authors:
Taposh Banerjee,
Gene Whipps,
Prudhvi Gurram,
Vahid Tarokh
Abstract:
A framework is proposed to detect anomalies in multi-modal data. A deep neural network-based object detector is employed to extract counts of objects and sub-events from the data. A cyclostationary model is proposed to model regular patterns of behavior in the count sequences. The anomaly detection problem is formulated as a problem of detecting deviations from learned cyclostationary behavior. Sequential algorithms are proposed to detect anomalies using the proposed model. The proposed algorithms are shown to be asymptotically efficient in a well-defined sense. The developed algorithms are applied to multi-modal data consisting of CCTV imagery and social media posts to detect a 5K run in New York City.
Submitted 2 July, 2018;
originally announced July 2018.
-
Stationary Geometric Graphical Model Selection
Authors:
Ilya Soloveychik,
Vahid Tarokh
Abstract:
We consider the problem of model selection in Gaussian Markov fields in the sample deficient scenario. In many practically important cases, the underlying networks are embedded into Euclidean spaces. Using the natural geometric structure, we introduce the notion of spatially stationary distributions over geometric graphs. This directly generalizes the notion of stationary time series to the multidimensional setting lacking a time axis. We show that the idea of spatial stationarity leads to a dramatic decrease in the sample complexity of the model selection compared to abstract graphs with the same level of sparsity. For geometric graphs on randomly spread vertices and edges of bounded length, we develop tight information-theoretic bounds on sample complexity and show that a finite number of independent samples is sufficient for a consistent recovery. Finally, we develop an efficient technique capable of reliably and consistently reconstructing graphs with a bounded number of measurements.
Submitted 29 October, 2018; v1 submitted 9 June, 2018;
originally announced June 2018.
-
Sequential Event Detection Using Multimodal Data in Nonstationary Environments
Authors:
Taposh Banerjee,
Gene Whipps,
Prudhvi Gurram,
Vahid Tarokh
Abstract:
The problem of sequential detection of anomalies in multimodal data is considered. The objective is to observe physical sensor data from CCTV cameras, and social media data from Twitter and Instagram to detect anomalous behaviors or events. Data from each modality is transformed to discrete time count data by using an artificial neural network to obtain counts of objects in CCTV images and by counting the number of tweets or Instagram posts in a geographical area. The anomaly detection problem is then formulated as a problem of quickest detection of changes in count statistics. The quickest detection problem is then solved using the framework of partially observable Markov decision processes (POMDP), and structural results on the optimal policy are obtained. The resulting optimal policy is then applied to real multimodal data collected from New York City around a 5K race to detect the race. The count data both before and after the change is found to be nonstationary in nature. The proposed mathematical approach to this problem provides a framework for event detection in such nonstationary environments and across multiple data modalities.
Submitted 23 March, 2018;
originally announced March 2018.
-
Estimation of the Evolutionary Spectra with Application to Stationarity Test
Authors:
Yu Xiang,
Jie Ding,
Vahid Tarokh
Abstract:
In this work, we propose a new inference procedure for understanding non-stationary processes, under the framework of evolutionary spectra developed by Priestley. Among various frameworks of modeling non-stationary processes, the distinguishing feature of the evolutionary spectra is its focus on the physical meaning of frequency. The classical estimate of the evolutionary spectral density is based on a double-window technique consisting of a short-time Fourier transform and a smoothing. However, smoothing is known to suffer from the so-called bias leakage problem. By incorporating Thomson's multitaper method that was originally designed for stationary processes, we propose an improved estimate of the evolutionary spectral density, and analyze its bias/variance/resolution tradeoff. As an application of the new estimate, we further propose a non-parametric rank-based stationarity test, and provide various experimental studies.
Submitted 17 January, 2019; v1 submitted 25 February, 2018;
originally announced February 2018.
-
Large Deviations of Convex Polyominoes
Authors:
Ilya Soloveychik,
Vahid Tarokh
Abstract:
Enumeration of various types of lattice polygons and in particular polyominoes is of primary importance in many machine learning, pattern recognition, and geometric analysis problems. In this work, we develop a large deviation principle for convex polyominoes under different restrictions, such as fixed area and/or perimeter.
Submitted 18 April, 2018; v1 submitted 11 February, 2018;
originally announced February 2018.
-
Region Detection in Markov Random Fields: Gaussian Case
Authors:
Ilya Soloveychik,
Vahid Tarokh
Abstract:
We consider the problem of model selection in Gaussian Markov fields in the sample deficient scenario. The benchmark information-theoretic results in the case of d-regular graphs require the number of samples to be at least proportional to the logarithm of the number of vertices to allow consistent graph recovery. When the number of samples is less than this amount, reliable detection of all edges is impossible. In many applications, it is more important to learn the distribution of the edge (coupling) parameters over the network than the specific locations of the edges. Assuming that the entire graph can be partitioned into a number of spatial regions with similar edge parameters and reasonably regular boundaries, we develop new information-theoretic sample complexity bounds and show that a bounded number of samples can be sufficient to consistently recover these regions. Finally, we introduce and analyze an efficient region growing algorithm capable of recovering the regions with high accuracy. We show that it is consistent and demonstrate its performance benefits in synthetic simulations.
Submitted 28 March, 2018; v1 submitted 11 February, 2018;
originally announced February 2018.
-
On Data-Dependent Random Features for Improved Generalization in Supervised Learning
Authors:
Shahin Shahrampour,
Ahmad Beirami,
Vahid Tarokh
Abstract:
The randomized-feature approach has been successfully employed in large-scale kernel approximation and supervised learning. The distribution from which the random features are drawn impacts the number of features required to efficiently perform a learning task. Recently, it has been shown that employing data-dependent randomization improves the performance in terms of the required number of random features. In this paper, we are concerned with the randomized-feature approach in supervised learning for good generalizability. We propose the Energy-based Exploration of Random Features (EERF) algorithm based on a data-dependent score function that explores the set of possible features and exploits the promising regions. We prove that the proposed score function with high probability recovers the spectrum of the best fit within the model class. Our empirical results on several benchmark datasets further verify that our method requires a smaller number of random features to achieve a certain generalization error compared to the state-of-the-art while introducing negligible pre-processing overhead. EERF can be implemented in a few lines of code and requires no additional tuning parameters.
Submitted 19 December, 2017;
originally announced December 2017.
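For readers unfamiliar with the randomized-feature approach that the abstract above builds on, here is a minimal sketch of plain (data-independent) random Fourier features for a Gaussian kernel. It is not the EERF algorithm: the data-dependent score function is not reproduced, and the names `make_rff` and `gaussian_kernel`, the bandwidth, and the feature count are assumptions chosen for illustration.

```python
import math
import random


def make_rff(dim, num_features, bandwidth, seed=0):
    """Build a random Fourier feature map approximating a Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density, N(0, I / bandwidth^2).
    freqs = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
             for _ in range(num_features)]
    # Phases are uniform on [0, 2*pi).
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return feature_map


def gaussian_kernel(x, y, bandwidth):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


phi = make_rff(dim=2, num_features=2000, bandwidth=1.0)
x, y = [0.3, -0.2], [1.0, 0.5]
# The inner product of the feature vectors approximates the kernel value.
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = gaussian_kernel(x, y, bandwidth=1.0)
```

The approximation error shrinks roughly with the inverse square root of `num_features`; reducing the number of features needed for a given error is the question that data-dependent sampling schemes address.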