-
Automating Control of Overestimation Bias for Reinforcement Learning
Authors:
Arsenii Kuznetsov,
Alexander Grishin,
Artem Tsypin,
Arsenii Ashukha,
Artur Kadurin,
Dmitry Vetrov
Abstract:
Overestimation bias control techniques are used by the majority of high-performing off-policy reinforcement learning algorithms. However, most of these techniques rely on pre-defined bias correction policies that are either not flexible enough or require environment-specific tuning of hyperparameters. In this work, we present a general data-driven approach for the automatic selection of bias contr…
▽ More
Overestimation bias control techniques are used by the majority of high-performing off-policy reinforcement learning algorithms. However, most of these techniques rely on pre-defined bias correction policies that are either not flexible enough or require environment-specific tuning of hyperparameters. In this work, we present a general data-driven approach for the automatic selection of bias control hyperparameters. We demonstrate its effectiveness on three algorithms: Truncated Quantile Critics, Weighted Delayed DDPG, and Maxmin Q-learning. The proposed technique eliminates the need for an extensive hyperparameter search. We show that it leads to a significant reduction of the actual number of interactions while preserving the performance.
△ Less
Submitted 28 January, 2022; v1 submitted 26 October, 2021;
originally announced October 2021.
-
Resolution-robust Large Mask Inpainting with Fourier Convolutions
Authors:
Roman Suvorov,
Elizaveta Logacheva,
Anton Mashikhin,
Anastasia Remizova,
Arsenii Ashukha,
Aleksei Silvestrov,
Naejin Kong,
Harshith Goka,
Kiwoong Park,
Victor Lempitsky
Abstract:
Modern image inpainting systems, despite the significant progress, often struggle with large missing areas, complex geometric structures, and high-resolution images. We find that one of the main reasons for that is the lack of an effective receptive field in both the inpainting network and the loss function. To alleviate this issue, we propose a new method called large mask inpainting (LaMa). LaMa…
▽ More
Modern image inpainting systems, despite the significant progress, often struggle with large missing areas, complex geometric structures, and high-resolution images. We find that one of the main reasons for that is the lack of an effective receptive field in both the inpainting network and the loss function. To alleviate this issue, we propose a new method called large mask inpainting (LaMa). LaMa is based on i) a new inpainting network architecture that uses fast Fourier convolutions (FFCs), which have the image-wide receptive field; ii) a high receptive field perceptual loss; iii) large training masks, which unlocks the potential of the first two components. Our inpainting network improves the state-of-the-art across a range of datasets and achieves excellent performance even in challenging scenarios, e.g. completion of periodic structures. Our model generalizes surprisingly well to resolutions that are higher than those seen at train time, and achieves this at lower parameter&time costs than the competitive baselines. The code is available at \url{https://github.com/saic-mdal/lama}.
△ Less
Submitted 10 November, 2021; v1 submitted 15 September, 2021;
originally announced September 2021.
-
Mean Embeddings with Test-Time Data Augmentation for Ensembling of Representations
Authors:
Arsenii Ashukha,
Andrei Atanov,
Dmitry Vetrov
Abstract:
Averaging predictions over a set of models -- an ensemble -- is widely used to improve predictive performance and uncertainty estimation of deep learning models. At the same time, many machine learning systems, such as search, matching, and recommendation systems, heavily rely on embeddings. Unfortunately, due to misalignment of features of independently trained models, embeddings, cannot be impro…
▽ More
Averaging predictions over a set of models -- an ensemble -- is widely used to improve predictive performance and uncertainty estimation of deep learning models. At the same time, many machine learning systems, such as search, matching, and recommendation systems, heavily rely on embeddings. Unfortunately, due to misalignment of features of independently trained models, embeddings, cannot be improved with a naive deep ensemble like approach. In this work, we look at the ensembling of representations and propose mean embeddings with test-time augmentation (MeTTA) simple yet well-performing recipe for ensembling representations. Empirically we demonstrate that MeTTA significantly boosts the quality of linear evaluation on ImageNet for both supervised and self-supervised models. Even more exciting, we draw connections between MeTTA, image retrieval, and transformation invariant models. We believe that spreading the success of ensembles to inference higher-quality representations is the important step that will open many new applications of ensembling.
△ Less
Submitted 14 July, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Greedy Policy Search: A Simple Baseline for Learnable Test-Time Augmentation
Authors:
Dmitry Molchanov,
Alexander Lyzhov,
Yuliya Molchanova,
Arsenii Ashukha,
Dmitry Vetrov
Abstract:
Test-time data augmentation$-$averaging the predictions of a machine learning model across multiple augmented samples of data$-$is a widely used technique that improves the predictive performance. While many advanced learnable data augmentation techniques have emerged in recent years, they are focused on the training phase. Such techniques are not necessarily optimal for test-time augmentation and…
▽ More
Test-time data augmentation$-$averaging the predictions of a machine learning model across multiple augmented samples of data$-$is a widely used technique that improves the predictive performance. While many advanced learnable data augmentation techniques have emerged in recent years, they are focused on the training phase. Such techniques are not necessarily optimal for test-time augmentation and can be outperformed by a policy consisting of simple crops and flips. The primary goal of this paper is to demonstrate that test-time augmentation policies can be successfully learned too. We introduce greedy policy search (GPS), a simple but high-performing method for learning a policy of test-time augmentation. We demonstrate that augmentation policies learned with GPS achieve superior predictive performance on image classification problems, provide better in-domain uncertainty estimation, and improve the robustness to domain shift.
△ Less
Submitted 20 June, 2020; v1 submitted 20 February, 2020;
originally announced February 2020.
-
Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning
Authors:
Arsenii Ashukha,
Alexander Lyzhov,
Dmitry Molchanov,
Dmitry Vetrov
Abstract:
Uncertainty estimation and ensembling methods go hand-in-hand. Uncertainty estimation is one of the main benchmarks for assessment of ensembling performance. At the same time, deep learning ensembles have provided state-of-the-art results in uncertainty estimation. In this work, we focus on in-domain uncertainty for image classification. We explore the standards for its quantification and point ou…
▽ More
Uncertainty estimation and ensembling methods go hand-in-hand. Uncertainty estimation is one of the main benchmarks for assessment of ensembling performance. At the same time, deep learning ensembles have provided state-of-the-art results in uncertainty estimation. In this work, we focus on in-domain uncertainty for image classification. We explore the standards for its quantification and point out pitfalls of existing metrics. Avoiding these pitfalls, we perform a broad study of different ensembling techniques. To provide more insight in this study, we introduce the deep ensemble equivalent score (DEE) and show that many sophisticated ensembling techniques are equivalent to an ensemble of only few independently trained networks in terms of test performance.
△ Less
Submitted 18 July, 2021; v1 submitted 15 February, 2020;
originally announced February 2020.
-
Semi-Conditional Normalizing Flows for Semi-Supervised Learning
Authors:
Andrei Atanov,
Alexandra Volokhova,
Arsenii Ashukha,
Ivan Sosnovik,
Dmitry Vetrov
Abstract:
This paper proposes a semi-conditional normalizing flow model for semi-supervised learning. The model uses both labelled and unlabeled data to learn an explicit model of joint distribution over objects and labels. Semi-conditional architecture of the model allows us to efficiently compute a value and gradients of the marginal likelihood for unlabeled objects. The conditional part of the model is b…
▽ More
This paper proposes a semi-conditional normalizing flow model for semi-supervised learning. The model uses both labelled and unlabeled data to learn an explicit model of joint distribution over objects and labels. Semi-conditional architecture of the model allows us to efficiently compute a value and gradients of the marginal likelihood for unlabeled objects. The conditional part of the model is based on a proposed conditional coupling layer. We demonstrate performance of the model for semi-supervised classification problem on different datasets. The model outperforms the baseline approach based on variational auto-encoders on MNIST dataset.
△ Less
Submitted 22 June, 2020; v1 submitted 1 May, 2019;
originally announced May 2019.
-
The Deep Weight Prior
Authors:
Andrei Atanov,
Arsenii Ashukha,
Kirill Struminsky,
Dmitry Vetrov,
Max Welling
Abstract:
Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models via carefully choosing a prior distribution. In this work, we propose a new type of prior distributions for convolutional neural networks, deep weight prior (DWP), that exploit generative models to encourage a specific structure of trained convolutional f…
▽ More
Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models via carefully choosing a prior distribution. In this work, we propose a new type of prior distributions for convolutional neural networks, deep weight prior (DWP), that exploit generative models to encourage a specific structure of trained convolutional filters e.g., spatial correlations of weights. We define DWP in the form of an implicit distribution and propose a method for variational inference with such type of implicit priors. In experiments, we show that DWP improves the performance of Bayesian neural networks when training data are limited, and initialization of weights with samples from DWP accelerates training of conventional convolutional neural networks.
△ Less
Submitted 18 February, 2019; v1 submitted 16 October, 2018;
originally announced October 2018.
-
Bayesian Incremental Learning for Deep Neural Networks
Authors:
Max Kochurov,
Timur Garipov,
Dmitry Podoprikhin,
Dmitry Molchanov,
Arsenii Ashukha,
Dmitry Vetrov
Abstract:
In industrial machine learning pipelines, data often arrive in parts. Particularly in the case of deep neural networks, it may be too expensive to train the model from scratch each time, so one would rather use a previously learned model and the new data to improve performance. However, deep neural networks are prone to getting stuck in a suboptimal solution when trained on only new data as compar…
▽ More
In industrial machine learning pipelines, data often arrive in parts. Particularly in the case of deep neural networks, it may be too expensive to train the model from scratch each time, so one would rather use a previously learned model and the new data to improve performance. However, deep neural networks are prone to getting stuck in a suboptimal solution when trained on only new data as compared to the full dataset. Our work focuses on a continuous learning setup where the task is always the same and new parts of data arrive sequentially. We apply a Bayesian approach to update the posterior approximation with each new piece of data and find this method to outperform the traditional approach in our experiments.
△ Less
Submitted 27 March, 2018; v1 submitted 20 February, 2018;
originally announced February 2018.
-
Uncertainty Estimation via Stochastic Batch Normalization
Authors:
Andrei Atanov,
Arsenii Ashukha,
Dmitry Molchanov,
Kirill Neklyudov,
Dmitry Vetrov
Abstract:
In this work, we investigate Batch Normalization technique and propose its probabilistic interpretation. We propose a probabilistic model and show that Batch Normalization maximazes the lower bound of its marginalized log-likelihood. Then, according to the new probabilistic model, we design an algorithm which acts consistently during train and test. However, inference becomes computationally ineff…
▽ More
In this work, we investigate Batch Normalization technique and propose its probabilistic interpretation. We propose a probabilistic model and show that Batch Normalization maximazes the lower bound of its marginalized log-likelihood. Then, according to the new probabilistic model, we design an algorithm which acts consistently during train and test. However, inference becomes computationally inefficient. To reduce memory and computational cost, we propose Stochastic Batch Normalization -- an efficient approximation of proper inference procedure. This method provides us with a scalable uncertainty estimation technique. We demonstrate the performance of Stochastic Batch Normalization on popular architectures (including deep convolutional architectures: VGG-like and ResNets) for MNIST and CIFAR-10 datasets.
△ Less
Submitted 20 March, 2018; v1 submitted 13 February, 2018;
originally announced February 2018.
-
Variational Dropout Sparsifies Deep Neural Networks
Authors:
Dmitry Molchanov,
Arsenii Ashukha,
Dmitry Vetrov
Abstract:
We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation to Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator and report first experimental results with individual dropout rates per weight. Interestingly, it leads to extremely sparse soluti…
▽ More
We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation to Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator and report first experimental results with individual dropout rates per weight. Interestingly, it leads to extremely sparse solutions both in fully-connected and convolutional layers. This effect is similar to automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters up to 280 times on LeNet architectures and up to 68 times on VGG-like networks with a negligible decrease of accuracy.
△ Less
Submitted 13 June, 2017; v1 submitted 19 January, 2017;
originally announced January 2017.