-
Detection Selection Algorithm: A Likelihood based Optimization Method to Perform Post Processing for Object Detection
Authors:
Angzhi Fan,
Benjamin Ticknor,
Yali Amit
Abstract:
In object detection, post-processing methods like Non-maximum Suppression (NMS) are widely used. NMS can substantially reduce the number of false positive detections but may still keep some detections with low objectness scores. In order to find the exact number of objects and their labels in the image, we propose a post processing method called Detection Selection Algorithm (DSA) which is used af…
▽ More
In object detection, post-processing methods like Non-maximum Suppression (NMS) are widely used. NMS can substantially reduce the number of false positive detections but may still keep some detections with low objectness scores. In order to find the exact number of objects and their labels in the image, we propose a post processing method called Detection Selection Algorithm (DSA) which is used after NMS or related methods. DSA greedily selects a subset of detected bounding boxes, together with full object reconstructions that give the interpretation of the whole image with highest likelihood, taking into account object occlusions. The algorithm consists of four components. First, we add an occlusion branch to Faster R-CNN to obtain occlusion relationships between objects. Second, we develop a single reconstruction algorithm which can reconstruct the whole appearance of an object given its visible part, based on the optimization of latent variables of a trained generative network which we call the decoder. Third, we propose a whole reconstruction algorithm which generates the joint reconstruction of all objects in a hypothesized interpretation, taking into account occlusion ordering. Finally we propose a greedy algorithm that incrementally adds or removes detections from a list to maximize the likelihood of the corresponding interpretation. DSA with NMS or Soft-NMS can achieve better results than NMS or Soft-NMS themselves, as is illustrated in our experiments on synthetic images with mutiple 3d objects.
△ Less
Submitted 3 April, 2023; v1 submitted 12 December, 2022;
originally announced December 2022.
-
Biologically Plausible Training Mechanisms for Self-Supervised Learning in Deep Networks
Authors:
Mufeng Tang,
Yibo Yang,
Yali Amit
Abstract:
We develop biologically plausible training mechanisms for self-supervised learning (SSL) in deep networks. Specifically, by biological plausible training we mean (i) All updates of weights are based on current activities of pre-synaptic units and current, or activity retrieved from short term memory of post synaptic units, including at the top-most error computing layer, (ii) Complex computations…
▽ More
We develop biologically plausible training mechanisms for self-supervised learning (SSL) in deep networks. Specifically, by biological plausible training we mean (i) All updates of weights are based on current activities of pre-synaptic units and current, or activity retrieved from short term memory of post synaptic units, including at the top-most error computing layer, (ii) Complex computations such as normalization, inner products and division are avoided (iii) Asymmetric connections between units, (iv) Most learning is carried out in an unsupervised manner. SSL with a contrastive loss satisfies the third condition as it does not require labelled data and it introduces robustness to observed perturbations of objects, which occur naturally as objects or observer move in 3d and with variable lighting over time. We propose a contrastive hinge based loss whose error involves simple local computations satisfying (ii), as opposed to the standard contrastive losses employed in the literature, which do not lend themselves easily to implementation in a network architecture due to complex computations involving ratios and inner products. Furthermore we show that learning can be performed with one of two more plausible alternatives to backpropagation that satisfy conditions (i) and (ii). The first is difference target propagation (DTP) and the second is layer-wise learning (LL), where each layer is directly connected to a layer computing the loss error. Both methods represent alternatives to the symmetric weight issue of backpropagation. By training convolutional neural networks (CNNs) with SSL and DTP, LL, we find that our proposed framework achieves comparable performance to standard BP learning downstream linear classifier evaluation of the learned embeddings.
△ Less
Submitted 1 February, 2022; v1 submitted 30 September, 2021;
originally announced September 2021.
-
Do We Really Need to Learn Representations from In-domain Data for Outlier Detection?
Authors:
Zhisheng Xiao,
Qing Yan,
Yali Amit
Abstract:
Unsupervised outlier detection, which predicts if a test sample is an outlier or not using only the information from unlabelled inlier data, is an important but challenging task. Recently, methods based on the two-stage framework achieve state-of-the-art performance on this task. The framework leverages self-supervised representation learning algorithms to train a feature extractor on inlier data,…
▽ More
Unsupervised outlier detection, which predicts if a test sample is an outlier or not using only the information from unlabelled inlier data, is an important but challenging task. Recently, methods based on the two-stage framework achieve state-of-the-art performance on this task. The framework leverages self-supervised representation learning algorithms to train a feature extractor on inlier data, and applies a simple outlier detector in the feature space. In this paper, we explore the possibility of avoiding the high cost of training a distinct representation for each outlier detection task, and instead using a single pre-trained network as the universal feature extractor regardless of the source of in-domain data. In particular, we replace the task-specific feature extractor by one network pre-trained on ImageNet with a self-supervised loss. In experiments, we demonstrate competitive or better performance on a variety of outlier detection benchmarks compared with previous two-stage methods, suggesting that learning representations from in-domain data may be unnecessary for outlier detection.
△ Less
Submitted 19 May, 2021;
originally announced May 2021.
-
EBMs Trained with Maximum Likelihood are Generator Models Trained with a Self-adverserial Loss
Authors:
Zhisheng Xiao,
Qing Yan,
Yali Amit
Abstract:
Maximum likelihood estimation is widely used in training Energy-based models (EBMs). Training requires samples from an unnormalized distribution, which is usually intractable, and in practice, these are obtained by MCMC algorithms such as Langevin dynamics. However, since MCMC in high-dimensional space converges extremely slowly, the current understanding of maximum likelihood training, which assu…
▽ More
Maximum likelihood estimation is widely used in training Energy-based models (EBMs). Training requires samples from an unnormalized distribution, which is usually intractable, and in practice, these are obtained by MCMC algorithms such as Langevin dynamics. However, since MCMC in high-dimensional space converges extremely slowly, the current understanding of maximum likelihood training, which assumes approximate samples from the model can be drawn, is problematic. In this paper, we try to understand this training procedure by replacing Langevin dynamics with deterministic solutions of the associated gradient descent ODE. Doing so allows us to study the density induced by the dynamics (if the dynamics are invertible), and connect with GANs by treating the dynamics as generator models, the initial values as latent variables and the loss as optimizing a critic defined by the very same energy that determines the generator through its gradient. Hence the term - self-adversarial loss. We show that reintroducing the noise in the dynamics does not lead to a qualitative change in the behavior, and merely reduces the quality of the generator. We thus show that EBM training is effectively a self-adversarial procedure rather than maximum likelihood estimation.
△ Less
Submitted 21 April, 2021; v1 submitted 23 February, 2021;
originally announced February 2021.
-
Exponential Tilting of Generative Models: Improving Sample Quality by Training and Sampling from Latent Energy
Authors:
Zhisheng Xiao,
Qing Yan,
Yali Amit
Abstract:
In this paper, we present a general method that can improve the sample quality of pre-trained likelihood based generative models. Our method constructs an energy function on the latent variable space that yields an energy function on samples produced by the pre-trained generative model. The energy based model is efficiently trained by maximizing the data likelihood, and after training, new samples…
▽ More
In this paper, we present a general method that can improve the sample quality of pre-trained likelihood based generative models. Our method constructs an energy function on the latent variable space that yields an energy function on samples produced by the pre-trained generative model. The energy based model is efficiently trained by maximizing the data likelihood, and after training, new samples in the latent space are generated from the energy based model and passed through the generator to producing samples in observation space. We show that using our proposed method, we can greatly improve the sample quality of popular likelihood based generative models, such as normalizing flows and VAEs, with very little computational overhead.
△ Less
Submitted 14 June, 2020;
originally announced June 2020.
-
Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder
Authors:
Zhisheng Xiao,
Qing Yan,
Yali Amit
Abstract:
Deep probabilistic generative models enable modeling the likelihoods of very high dimensional data. An important application of generative modeling should be the ability to detect out-of-distribution (OOD) samples by setting a threshold on the likelihood. However, some recent studies show that probabilistic generative models can, in some cases, assign higher likelihoods on certain types of OOD sam…
▽ More
Deep probabilistic generative models enable modeling the likelihoods of very high dimensional data. An important application of generative modeling should be the ability to detect out-of-distribution (OOD) samples by setting a threshold on the likelihood. However, some recent studies show that probabilistic generative models can, in some cases, assign higher likelihoods on certain types of OOD samples, making the OOD detection rules based on likelihood threshold problematic. To address this issue, several OOD detection methods have been proposed for deep generative models. In this paper, we make the observation that many of these methods fail when applied to generative models based on Variational Auto-encoders (VAE). As an alternative, we propose Likelihood Regret, an efficient OOD score for VAEs. We benchmark our proposed method over existing approaches, and empirical results suggest that our method obtains the best overall OOD detection performances when applied to VAEs.
△ Less
Submitted 10 October, 2020; v1 submitted 5 March, 2020;
originally announced March 2020.
-
A Method to Model Conditional Distributions with Normalizing Flows
Authors:
Zhisheng Xiao,
Qing Yan,
Yali Amit
Abstract:
In this work, we investigate the use of normalizing flows to model conditional distributions. In particular, we use our proposed method to analyze inverse problems with invertible neural networks by maximizing the posterior likelihood. Our method uses only a single loss and is easy to train. This is an improvement on the previous method that solves similar inverse problems with invertible neural n…
▽ More
In this work, we investigate the use of normalizing flows to model conditional distributions. In particular, we use our proposed method to analyze inverse problems with invertible neural networks by maximizing the posterior likelihood. Our method uses only a single loss and is easy to train. This is an improvement on the previous method that solves similar inverse problems with invertible neural networks but which involves a combination of several loss terms with ad-hoc weighting. In addition, our method provides a natural framework to incorporate conditioning in normalizing flows, and therefore, we can train an invertible network to perform conditional generation. We analyze our method and perform a careful comparison with previous approaches. Simple experiments show the effectiveness of our method, and more comprehensive experimental evaluations are undergoing.
△ Less
Submitted 5 November, 2019;
originally announced November 2019.
-
Generative Latent Flow
Authors:
Zhisheng Xiao,
Qing Yan,
Yali Amit
Abstract:
In this work, we propose the Generative Latent Flow (GLF), an algorithm for generative modeling of the data distribution. GLF uses an Auto-encoder (AE) to learn latent representations of the data, and a normalizing flow to map the distribution of the latent variables to that of simple i.i.d noise. In contrast to some other Auto-encoder based generative models, which use various regularizers that e…
▽ More
In this work, we propose the Generative Latent Flow (GLF), an algorithm for generative modeling of the data distribution. GLF uses an Auto-encoder (AE) to learn latent representations of the data, and a normalizing flow to map the distribution of the latent variables to that of simple i.i.d noise. In contrast to some other Auto-encoder based generative models, which use various regularizers that encourage the encoded latent distribution to match the prior distribution, our model explicitly constructs a mapping between these two distributions, leading to better density matching while avoiding over regularizing the latent variables. We compare our model with several related techniques, and show that it has many relative advantages including fast convergence, single stage training and minimal reconstruction trade-off. We also study the relationship between our model and its stochastic counterpart, and show that our model can be viewed as a vanishing noise limit of VAEs with flow prior. Quantitatively, under standardized evaluations, our method achieves state-of-the-art sample quality among AE based models on commonly used datasets, and is competitive with GANs' benchmarks.
△ Less
Submitted 22 September, 2019; v1 submitted 24 May, 2019;
originally announced May 2019.
-
Deep learning with asymmetric connections and Hebbian updates
Authors:
Yali Amit
Abstract:
We show that deep networks can be trained using Hebbian updates yielding similar performance to ordinary back-propagation on challenging image datasets. To overcome the unrealistic symmetry in connections between layers, implicit in back-propagation, the feedback weights are separate from the feedforward weights. The feedback weights are also updated with a local rule, the same as the feedforward…
▽ More
We show that deep networks can be trained using Hebbian updates yielding similar performance to ordinary back-propagation on challenging image datasets. To overcome the unrealistic symmetry in connections between layers, implicit in back-propagation, the feedback weights are separate from the feedforward weights. The feedback weights are also updated with a local rule, the same as the feedforward weights - a weight is updated solely based on the product of activity of the units it connects. With fixed feedback weights as proposed in Lillicrap et. al (2016) performance degrades quickly as the depth of the network increases. If the feedforward and feedback weights are initialized with the same values, as proposed in Zipser and Rumelhart (1990), they remain the same throughout training thus precisely implementing back-propagation. We show that even when the weights are initialized differently and at random, and the algorithm is no longer performing back-propagation, performance is comparable on challenging datasets. We also propose a cost function whose derivative can be represented as a local Hebbian update on the last layer. Convolutional layers are updated with tied weights across space, which is not biologically plausible. We show that similar performance is achieved with untied layers, also known as locally connected layers, corresponding to the connectivity implied by the convolutional layers, but where weights are untied and updated separately. In the linear case we show theoretically that the convergence of the error to zero is accelerated by the update of the feedback weights.
△ Less
Submitted 12 March, 2019; v1 submitted 19 November, 2018;
originally announced December 2018.
-
Deformable Classifiers
Authors:
Jiajun Shen,
Yali Amit
Abstract:
Geometric variations of objects, which do not modify the object class, pose a major challenge for object recognition. These variations could be rigid as well as non-rigid transformations. In this paper, we design a framework for training deformable classifiers, where latent transformation variables are introduced, and a transformation of the object image to a reference instantiation is computed in…
▽ More
Geometric variations of objects, which do not modify the object class, pose a major challenge for object recognition. These variations could be rigid as well as non-rigid transformations. In this paper, we design a framework for training deformable classifiers, where latent transformation variables are introduced, and a transformation of the object image to a reference instantiation is computed in terms of the classifier output, separately for each class. The classifier outputs for each class, after transformation, are compared to yield the final decision. As a by-product of the classification this yields a transformation of the input object to a reference pose, which can be used for downstream tasks such as the computation of object support. We apply a two-step training mechanism for our framework, which alternates between optimizing over the latent transformation variables and the classifier parameters to minimize the loss function. We show that multilayer perceptrons, also known as deep networks, are well suited for this approach and achieve state of the art results on the rotated MNIST and the Google Earth dataset, and produce competitive results on MNIST and CIFAR-10 when training on smaller subsets of training data.
△ Less
Submitted 18 December, 2017;
originally announced December 2017.
-
Dynamic Partition Models
Authors:
Marc Goessling,
Yali Amit
Abstract:
We present a new approach for learning compact and intuitive distributed representations with binary encoding. Rather than summing up expert votes as in products of experts, we employ for each variable the opinion of the most reliable expert. Data points are hence explained through a partitioning of the variables into expert supports. The partitions are dynamically adapted based on which experts a…
▽ More
We present a new approach for learning compact and intuitive distributed representations with binary encoding. Rather than summing up expert votes as in products of experts, we employ for each variable the opinion of the most reliable expert. Data points are hence explained through a partitioning of the variables into expert supports. The partitions are dynamically adapted based on which experts are active. During the learning phase we adopt a smoothed version of this model that uses separate mixtures for each data dimension. In our experiments we achieve accurate reconstructions of high-dimensional data points with at most a dozen experts.
△ Less
Submitted 15 February, 2017;
originally announced February 2017.
-
Mixtures of Sparse Autoregressive Networks
Authors:
Marc Goessling,
Yali Amit
Abstract:
We consider high-dimensional distribution estimation through autoregressive networks. By combining the concepts of sparsity, mixtures and parameter sharing we obtain a simple model which is fast to train and which achieves state-of-the-art or better results on several standard benchmark datasets. Specifically, we use an L1-penalty to regularize the conditional distributions and introduce a procedu…
▽ More
We consider high-dimensional distribution estimation through autoregressive networks. By combining the concepts of sparsity, mixtures and parameter sharing we obtain a simple model which is fast to train and which achieves state-of-the-art or better results on several standard benchmark datasets. Specifically, we use an L1-penalty to regularize the conditional distributions and introduce a procedure for automatic parameter sharing between mixture components. Moreover, we propose a simple distributed representation which permits exact likelihood evaluations since the latent variables are interleaved with the observable variables and can be easily integrated out. Our model achieves excellent generalization performance and scales well to extremely high dimensions.
△ Less
Submitted 26 April, 2016; v1 submitted 15 November, 2015;
originally announced November 2015.
-
Compact Compositional Models
Authors:
Marc Goessling,
Yali Amit
Abstract:
Learning compact and interpretable representations is a very natural task, which has not been solved satisfactorily even for simple binary datasets. In this paper, we review various ways of composing experts for binary data and argue that competitive forms of interaction are best suited to learn low-dimensional representations. We propose a new composition rule that discourages experts from focusi…
▽ More
Learning compact and interpretable representations is a very natural task, which has not been solved satisfactorily even for simple binary datasets. In this paper, we review various ways of composing experts for binary data and argue that competitive forms of interaction are best suited to learn low-dimensional representations. We propose a new composition rule that discourages experts from focusing on similar structures and that penalizes opposing votes strongly so that abstaining from voting becomes more attractive. We also introduce a novel sequential initialization procedure, which is based on a process of oversimplification and correction. Experiments show that with our approach very intuitive models can be learned.
△ Less
Submitted 29 October, 2016; v1 submitted 11 December, 2014;
originally announced December 2014.