-
A Scale Invariant Flatness Measure for Deep Network Minima
Authors:
Akshay Rangamani,
Nam H. Nguyen,
Abhishek Kumar,
Dzung Phan,
Sang H. Chin,
Trac D. Tran
Abstract:
It has been empirically observed that the flatness of minima obtained from training deep networks seems to correlate with better generalization. However, for deep networks with positively homogeneous activations, most measures of sharpness/flatness are not invariant to rescaling of the network parameters, corresponding to the same function. This means that the measure of flatness/sharpness can be…
▽ More
It has been empirically observed that the flatness of minima obtained from training deep networks seems to correlate with better generalization. However, for deep networks with positively homogeneous activations, most measures of sharpness/flatness are not invariant to rescaling of the network parameters, corresponding to the same function. This means that the measure of flatness/sharpness can be made as small or as large as possible through rescaling, rendering the quantitative measures meaningless. In this paper we show that for deep networks with positively homogenous activations, these rescalings constitute equivalence relations, and that these equivalence relations induce a quotient manifold structure in the parameter space. Using this manifold structure and an appropriate metric, we propose a Hessian-based measure for flatness that is invariant to rescaling. We use this new measure to confirm the proposition that Large-Batch SGD minima are indeed sharper than Small-Batch SGD minima.
△ Less
Submitted 6 February, 2019;
originally announced February 2019.
-
Reducing Sampling Ratios Improves Bagging in Sparse Regression
Authors:
Luoluo Liu,
Sang Peter Chin,
Trac D. Tran
Abstract:
Bagging, a powerful ensemble method from machine learning, improves the performance of unstable predictors. Although the power of Bagging has been shown mostly in classification problems, we demonstrate the success of employing Bagging in sparse regression over the baseline method (L1 minimization). The framework employs the generalized version of the original Bagging with various bootstrap ratios…
▽ More
Bagging, a powerful ensemble method from machine learning, improves the performance of unstable predictors. Although the power of Bagging has been shown mostly in classification problems, we demonstrate the success of employing Bagging in sparse regression over the baseline method (L1 minimization). The framework employs the generalized version of the original Bagging with various bootstrap ratios. The performance limits associated with different choices of bootstrap sampling ratio L/m and number of estimates K is analyzed theoretically. Simulation shows that the proposed method yields state-of-the-art recovery performance, outperforming L1 minimization and Bolasso in the challenging case of low levels of measurements. A lower L/m ratio (60% - 90%) leads to better performance, especially with a small number of measurements. With the reduced sampling rate, SNR improves over the original Bagging by up to 24%. With a properly chosen sampling ratio, a reasonably small number of estimates K = 30 gives satisfying result, even though increasing K is discovered to always improve or at least maintain the performance.
△ Less
Submitted 1 May, 2019; v1 submitted 20 December, 2018;
originally announced December 2018.
-
Generative Adversarial Networks for Recovering Missing Spectral Information
Authors:
Dung N. Tran,
Trac D. Tran,
Lam Nguyen
Abstract:
Ultra-wideband (UWB) radar systems nowadays typical operate in the low frequency spectrum to achieve penetration capability. However, this spectrum is also shared by many others communication systems, which causes missing information in the frequency bands. To recover this missing spectral information, we propose a generative adversarial network, called SARGAN, that learns the relationship between…
▽ More
Ultra-wideband (UWB) radar systems nowadays typical operate in the low frequency spectrum to achieve penetration capability. However, this spectrum is also shared by many others communication systems, which causes missing information in the frequency bands. To recover this missing spectral information, we propose a generative adversarial network, called SARGAN, that learns the relationship between original and missing band signals by observing these training pairs in a clever way. Initial results shows that this approach is promising in tackling this challenging missing band problem.
△ Less
Submitted 12 December, 2018; v1 submitted 11 December, 2018;
originally announced December 2018.
-
JOBS: Joint-Sparse Optimization from Bootstrap Samples
Authors:
Luoluo Liu,
Sang Peter Chin,
Trac D. Tran
Abstract:
Classical signal recovery based on $\ell_1$ minimization solves the least squares problem with all available measurements via sparsity-promoting regularization. In practice, it is often the case that not all measurements are available or required for recovery. Measurements might be corrupted/missing or they arrive sequentially in streaming fashion. In this paper, we propose a global sparse recover…
▽ More
Classical signal recovery based on $\ell_1$ minimization solves the least squares problem with all available measurements via sparsity-promoting regularization. In practice, it is often the case that not all measurements are available or required for recovery. Measurements might be corrupted/missing or they arrive sequentially in streaming fashion. In this paper, we propose a global sparse recovery strategy based on subsets of measurements, named JOBS, in which multiple measurements vectors are generated from the original pool of measurements via bootstrapping, and then a joint-sparse constraint is enforced to ensure support consistency among multiple predictors. The final estimate is obtained by averaging over the $K$ predictors. The performance limits associated with different choices of number of bootstrap samples $L$ and number of estimates $K$ is analyzed theoretically. Simulation results validate some of the theoretical analysis, and show that the proposed method yields state-of-the-art recovery performance, outperforming $\ell_1$ minimization and a few other existing bootstrap-based techniques in the challenging case of low levels of measurements and is preferable over other bagging-based methods in the streaming setting since it performs better with small $K$ and $L$ for data-sets with large sizes.
△ Less
Submitted 10 December, 2018; v1 submitted 8 October, 2018;
originally announced October 2018.
-
Sparse Coding and Autoencoders
Authors:
Akshay Rangamani,
Anirbit Mukherjee,
Amitabh Basu,
Tejaswini Ganapathy,
Ashish Arora,
Sang Chin,
Trac D. Tran
Abstract:
In "Dictionary Learning" one tries to recover incoherent matrices $A^* \in \mathbb{R}^{n \times h}$ (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors $x^* \in \mathbb{R}^h$ with a small support of size $h^p$ for some $0 <p < 1$ while having access to observations $y \in \mathbb{R}^n$ where $y = A^*x^*$. In this work we undertake a rigorous analysis of wheth…
▽ More
In "Dictionary Learning" one tries to recover incoherent matrices $A^* \in \mathbb{R}^{n \times h}$ (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors $x^* \in \mathbb{R}^h$ with a small support of size $h^p$ for some $0 <p < 1$ while having access to observations $y \in \mathbb{R}^n$ where $y = A^*x^*$. In this work we undertake a rigorous analysis of whether gradient descent on the squared loss of an autoencoder can solve the dictionary learning problem. The "Autoencoder" architecture we consider is a $\mathbb{R}^n \rightarrow \mathbb{R}^n$ mapping with a single ReLU activation layer of size $h$.
Under very mild distributional assumptions on $x^*$, we prove that the norm of the expected gradient of the standard squared loss function is asymptotically (in sparse code dimension) negligible for all points in a small neighborhood of $A^*$. This is supported with experimental evidence using synthetic data. We also conduct experiments to suggest that $A^*$ is a local minimum. Along the way we prove that a layer of ReLU gates can be set up to automatically recover the support of the sparse codes. This property holds independent of the loss function. We believe that it could be of independent interest.
△ Less
Submitted 20 October, 2017; v1 submitted 11 August, 2017;
originally announced August 2017.
-
Linear Disentangled Representation Learning for Facial Actions
Authors:
Xiang Xiang,
Trac D. Tran
Abstract:
Limited annotated data available for the recognition of facial expression and action units embarrasses the training of deep networks, which can learn disentangled invariant features. However, a linear model with just several parameters normally is not demanding in terms of training data. In this paper, we propose an elegant linear model to untangle confounding factors in challenging realistic mult…
▽ More
Limited annotated data available for the recognition of facial expression and action units embarrasses the training of deep networks, which can learn disentangled invariant features. However, a linear model with just several parameters normally is not demanding in terms of training data. In this paper, we propose an elegant linear model to untangle confounding factors in challenging realistic multichannel signals such as 2D face videos. The simple yet powerful model does not rely on huge training data and is natural for recognizing facial actions without explicitly disentangling the identity. Base on well-understood intuitive linear models such as Sparse Representation based Classification (SRC), previous attempts require a prepossessing of explicit decoupling which is practically inexact. Instead, we exploit the low-rank property across frames to subtract the underlying neutral faces which are modeled jointly with sparse representation on the action components with group sparsity enforced. On the extended Cohn-Kanade dataset (CK+), our one-shot automatic method on raw face videos performs as competitive as SRC applied on manually prepared action components and performs even better than SRC in terms of true positive rate. We apply the model to the even more challenging task of facial action unit recognition, verified on the MPI Face Video Database (MPI-VDB) achieving a decent performance. All the programs and data have been made publicly available.
△ Less
Submitted 11 January, 2017;
originally announced January 2017.
-
Pose-Selective Max Pooling for Measuring Similarity
Authors:
Xiang Xiang,
Trac D. Tran
Abstract:
In this paper, we deal with two challenges for measuring the similarity of the subject identities in practical video-based face recognition - the variation of the head pose in uncontrolled environments and the computational expense of processing videos. Since the frame-wise feature mean is unable to characterize the pose diversity among frames, we define and preserve the overall pose diversity and…
▽ More
In this paper, we deal with two challenges for measuring the similarity of the subject identities in practical video-based face recognition - the variation of the head pose in uncontrolled environments and the computational expense of processing videos. Since the frame-wise feature mean is unable to characterize the pose diversity among frames, we define and preserve the overall pose diversity and closeness in a video. Then, identity will be the only source of variation across videos since the pose varies even within a single video. Instead of simply using all the frames, we select those faces whose pose point is closest to the centroid of the K-means cluster containing that pose point. Then, we represent a video as a bag of frame-wise deep face features while the number of features has been reduced from hundreds to K. Since the video representation can well represent the identity, now we measure the subject similarity between two videos as the max correlation among all possible pairs in the two bags of features. On the official 5,000 video-pairs of the YouTube Face dataset for face verification, our algorithm achieves a comparable performance with VGG-face that averages over deep features of all frames. Other vision tasks can also benefit from the generic idea of employing geometric cues to improve the descriptiveness of deep features.
△ Less
Submitted 13 November, 2016; v1 submitted 22 September, 2016;
originally announced September 2016.
-
ICR: Iterative Convex Refinement for Sparse Signal Recovery Using Spike and Slab Priors
Authors:
Hojjat S. Mousavi,
Vishal Monga,
Trac D. Tran
Abstract:
In this letter, we address sparse signal recovery using spike and slab priors. In particular, we focus on a Bayesian framework where sparsity is enforced on reconstruction coefficients via probabilistic priors. The optimization resulting from spike and slab prior maximization is known to be a hard non-convex problem, and existing solutions involve simplifying assumptions and/or relaxations. We pro…
▽ More
In this letter, we address sparse signal recovery using spike and slab priors. In particular, we focus on a Bayesian framework where sparsity is enforced on reconstruction coefficients via probabilistic priors. The optimization resulting from spike and slab prior maximization is known to be a hard non-convex problem, and existing solutions involve simplifying assumptions and/or relaxations. We propose an approach called Iterative Convex Refinement (ICR) that aims to solve the aforementioned optimization problem directly allowing for greater generality in the sparse structure. Essentially, ICR solves a sequence of convex optimization problems such that sequence of solutions converges to a sub-optimal solution of the original hard optimization problem. We propose two versions of our algorithm: a.) an unconstrained version, and b.) with a non-negativity constraint on sparse coefficients, which may be required in some real-world problems. Experimental validation is performed on both synthetic data and for a real-world image recovery problem, which illustrates merits of ICR over state of the art alternatives.
△ Less
Submitted 16 February, 2015;
originally announced February 2015.
-
Collaborative Multi-sensor Classification via Sparsity-based Representation
Authors:
Minh Dao,
Nam H. Nguyen,
Nasser M. Nasrabadi,
Trac D. Tran
Abstract:
In this paper, we propose a general collaborative sparse representation framework for multi-sensor classification, which takes into account the correlations as well as complementary information between heterogeneous sensors simultaneously while considering joint sparsity within each sensor's observations. We also robustify our models to deal with the presence of sparse noise and low-rank interfere…
▽ More
In this paper, we propose a general collaborative sparse representation framework for multi-sensor classification, which takes into account the correlations as well as complementary information between heterogeneous sensors simultaneously while considering joint sparsity within each sensor's observations. We also robustify our models to deal with the presence of sparse noise and low-rank interference signals. Specifically, we demonstrate that incorporating the noise or interference signal as a low-rank component in our models is essential in a multi-sensor classification problem when multiple co-located sources/sensors simultaneously record the same physical event. We further extend our frameworks to kernelized models which rely on sparsely representing a test sample in terms of all the training samples in a feature space induced by a kernel function. A fast and efficient algorithm based on alternative direction method is proposed where its convergence to an optimal solution is guaranteed. Extensive experiments are conducted on several real multi-sensor data sets and results are compared with the conventional classifiers to verify the effectiveness of the proposed methods.
△ Less
Submitted 16 June, 2016; v1 submitted 29 October, 2014;
originally announced October 2014.
-
Structured Priors for Sparse-Representation-Based Hyperspectral Image Classification
Authors:
Xiaoxia Sun,
Qing Qu,
Nasser M. Nasrabadi,
Trac D. Tran
Abstract:
Pixel-wise classification, where each pixel is assigned to a predefined class, is one of the most important procedures in hyperspectral image (HSI) analysis. By representing a test pixel as a linear combination of a small subset of labeled pixels, a sparse representation classifier (SRC) gives rather plausible results compared with that of traditional classifiers such as the support vector machine…
▽ More
Pixel-wise classification, where each pixel is assigned to a predefined class, is one of the most important procedures in hyperspectral image (HSI) analysis. By representing a test pixel as a linear combination of a small subset of labeled pixels, a sparse representation classifier (SRC) gives rather plausible results compared with that of traditional classifiers such as the support vector machine (SVM). Recently, by incorporating additional structured sparsity priors, the second generation SRCs have appeared in the literature and are reported to further improve the performance of HSI. These priors are based on exploiting the spatial dependencies between the neighboring pixels, the inherent structure of the dictionary, or both. In this paper, we review and compare several structured priors for sparse-representation-based HSI classification. We also propose a new structured prior called the low rank group prior, which can be considered as a modification of the low rank prior. Furthermore, we will investigate how different structured priors improve the result for the HSI classification.
△ Less
Submitted 15 January, 2014;
originally announced January 2014.