
Showing 1–50 of 79 results for author: Vetrov, D

Searching in archive cs.
  1. arXiv:2503.10488

    cs.LG cs.CV cs.HC

    Streaming Generation of Co-Speech Gestures via Accelerated Rolling Diffusion

    Authors: Evgeniia Vu, Andrei Boiarov, Dmitry Vetrov

    Abstract: Generating co-speech gestures in real time requires both temporal coherence and efficient sampling. We introduce Accelerated Rolling Diffusion, a novel framework for streaming gesture generation that extends rolling diffusion models with structured progressive noise scheduling, enabling seamless long-sequence motion synthesis while preserving realism and diversity. We further propose Rolling Diffu…

    Submitted 4 April, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

  2. arXiv:2502.02472

    stat.ML cs.LG

    SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations

    Authors: Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth

    Abstract: The Latent Stochastic Differential Equation (SDE) is a powerful tool for time series and sequence modeling. However, training Latent SDEs typically relies on adjoint sensitivity methods, which depend on simulation and backpropagation through approximate SDE solutions and therefore limit scalability. In this work, we propose SDE Matching, a new simulation-free method for training Latent SDEs. Inspired by…

    Submitted 4 February, 2025; originally announced February 2025.

  3. arXiv:2410.22113

    cs.LG stat.ML

    Where Do Large Learning Rates Lead Us?

    Authors: Ildus Sadrtdinov, Maxim Kodryan, Eduard Pokonechny, Ekaterina Lobacheva, Dmitry Vetrov

    Abstract: It is generally accepted that starting neural network training with large learning rates (LRs) improves generalization. Following a line of research devoted to understanding this effect, we conduct an empirical study in a controlled setting focusing on two questions: 1) how large an initial LR is required for obtaining optimal quality, and 2) what are the key differences between models trained wi…

    Submitted 29 October, 2024; originally announced October 2024.

    Comments: Published in NeurIPS 2024. First three authors contributed equally, last two authors share senior authorship

  4. arXiv:2409.01322

    cs.CV

    Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

    Authors: Vadim Titov, Madina Khalmatova, Alexandra Ivanova, Dmitry Vetrov, Aibek Alanov

    Abstract: Despite recent advances in large-scale text-to-image generative models, manipulating real images with these models remains a challenging problem. The main limitations of existing editing methods are that they either fail to perform with consistent quality on a wide range of image edits or require time-consuming hyperparameter tuning or fine-tuning of the diffusion model to preserve the image-speci…

    Submitted 25 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted to ECCV 2024. The project page is available at https://macderru.github.io/Guide-and-Rescale

  5. arXiv:2406.14762

    cs.CV cs.LG

    Regularized Distribution Matching Distillation for One-step Unpaired Image-to-Image Translation

    Authors: Denis Rakitin, Ivan Shchekotov, Dmitry Vetrov

    Abstract: Diffusion distillation methods aim to compress diffusion models into efficient one-step generators while trying to preserve quality. Among them, Distribution Matching Distillation (DMD) offers a suitable framework for training general-form one-step generators, applicable beyond unconditional generation. In this work, we introduce its modification, called Regularized Distribution Matching Disti…

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.13655

    cs.LG cs.AI

    Improving GFlowNets with Monte Carlo Tree Search

    Authors: Nikita Morozov, Daniil Tiapkin, Sergey Samsonov, Alexey Naumov, Dmitry Vetrov

    Abstract: Generative Flow Networks (GFlowNets) treat sampling from distributions over compositional discrete spaces as a sequential decision-making problem, training a stochastic policy to construct objects step by step. Recent studies have revealed strong connections between GFlowNets and entropy-regularized reinforcement learning. Building on these insights, we propose to enhance planning capabilities of…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ICML 2024 SPIGM Workshop

  7. arXiv:2406.10601

    cs.CV

    The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing

    Authors: Denis Bobkov, Vadim Titov, Aibek Alanov, Dmitry Vetrov

    Abstract: The task of manipulating real image attributes through StyleGAN inversion has been extensively researched. This process involves searching latent variables from a well-trained StyleGAN generator that can synthesize a real image, modifying these latent variables, and then synthesizing an image with the desired edits. A balance must be struck between the quality of the reconstruction and the ability…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024

  8. arXiv:2404.12940

    stat.ML cs.CV cs.LG

    Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling

    Authors: Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth

    Abstract: Conventional diffusion models typically rely on a fixed forward process, which implicitly defines complex marginal distributions over latent variables. This can often complicate the reverse process's task of learning generative trajectories, and results in costly inference for diffusion models. To address these limitations, we introduce Neural Flow Diffusion Models (NFDM), a novel framework that…

    Submitted 1 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  9. arXiv:2404.01094

    cs.CV

    HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach

    Authors: Maxim Nikolaev, Mikhail Kuznetsov, Dmitry Vetrov, Aibek Alanov

    Abstract: Our paper addresses the complex task of transferring a hairstyle from a reference image to an input photo for virtual hair try-on. This task is challenging due to the need to adapt to various photo poses, the sensitivity of hairstyles, and the lack of objective metrics. The current state-of-the-art hairstyle transfer methods use an optimization process for different parts of the approach, making t…

    Submitted 25 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  10. arXiv:2403.03726

    cs.LG cs.AI q-bio.BM

    Diffusion on language model encodings for protein sequence generation

    Authors: Viacheslav Meshchaninov, Pavel Strashnov, Andrey Shevtsov, Fedor Nikolaev, Nikita Ivanisenko, Olga Kardymon, Dmitry Vetrov

    Abstract: Protein sequence design has seen significant advances through discrete diffusion and autoregressive approaches, yet the potential of continuous diffusion remains underexplored. Here, we present DiMA, a latent diffusion framework that operates on protein language model representations. Through systematic exploration of architectural choices and diffusion components, we develop a robust methodology…

    Submitted 5 February, 2025; v1 submitted 6 March, 2024; originally announced March 2024.

  11. arXiv:2402.19097

    cs.CL

    TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings

    Authors: Alexander Shabalin, Viacheslav Meshchaninov, Egor Chimbulatov, Vladislav Lapikov, Roman Kim, Grigory Bartosh, Dmitry Molchanov, Sergey Markov, Dmitry Vetrov

    Abstract: This paper presents the Text Encoding Diffusion Model (TEncDM), a novel approach to diffusion modeling that operates in the space of pre-trained language model encodings. In contrast to traditionally used embeddings, encodings integrate contextual information. In our approach, we also employ a transformer-based decoder, specifically designed to incorporate context in the token prediction process.…

    Submitted 24 February, 2025; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: 15 pages, 13 figures

    ACM Class: I.2; I.7

  12. arXiv:2311.11303

    cs.LG stat.ML

    Large Learning Rates Improve Generalization: But How Large Are We Talking About?

    Authors: Ekaterina Lobacheva, Eduard Pockonechnyy, Maxim Kodryan, Dmitry Vetrov

    Abstract: Inspired by recent research that recommends starting neural network training with large learning rates (LRs) to achieve the best generalization, we explore this hypothesis in detail. Our study clarifies the initial LR ranges that provide optimal results for subsequent training with a small LR or weight averaging. We find that these ranges are in fact significantly narrower than generally assumed.…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: Published in Mathematics of Modern Machine Learning Workshop at NeurIPS 2023. First two authors contributed equally

  13. arXiv:2311.06295

    physics.chem-ph cs.LG

    Gradual Optimization Learning for Conformational Energy Minimization

    Authors: Artem Tsypin, Leonid Ugadiarov, Kuzma Khrabrov, Alexander Telepov, Egor Rumiantsev, Alexey Skrynnik, Aleksandr I. Panov, Dmitry Vetrov, Elena Tutubalina, Artur Kadurin

    Abstract: Molecular conformation optimization is crucial to computer-aided drug discovery and materials design. Traditional energy minimization techniques rely on iterative optimization methods that use molecular forces calculated by a physical simulator (oracle) as anti-gradients. However, this is a computationally expensive approach that requires many interactions with a physical simulator. One way to acc…

    Submitted 12 March, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: Published as a conference paper at ICLR 2024 (Poster)

  14. arXiv:2310.12934

    cs.LG stat.ML

    Generative Flow Networks as Entropy-Regularized RL

    Authors: Daniil Tiapkin, Nikita Morozov, Alexey Naumov, Dmitry Vetrov

    Abstract: The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects with probabilities proportional to a given reward via a sequence of actions. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to a general case. We de…

    Submitted 25 February, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: AISTATS 2024 (Oral)

  15. arXiv:2310.08337

    cs.LG stat.ML

    Neural Diffusion Models

    Authors: Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth

    Abstract: Diffusion models have shown remarkable performance on many generative tasks. Despite recent success, most diffusion models are restricted in that they only allow linear transformation of the data distribution. In contrast, a broader family of transformations can potentially help train generative distributions more efficiently, simplifying the reverse process and closing the gap between the true nega…

    Submitted 1 June, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

  16. arXiv:2306.00721

    cs.SD cs.AI eess.AS

    UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model

    Authors: Anastasiia Iashchenko, Pavel Andreev, Ivan Shchekotov, Nicholas Babaev, Dmitry Vetrov

    Abstract: This paper introduces UnDiff, a diffusion probabilistic model capable of solving various speech inverse tasks. Once trained for speech waveform generation in an unconditional manner, it can be adapted to different tasks including degradation inversion, neural vocoding, and source separation. In this paper, we first tackle the challenging problem of unconditional waveform generation by comp…

    Submitted 12 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech 2023

  17. arXiv:2303.03374

    cs.LG stat.ML

    To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in Transfer Learning

    Authors: Ildus Sadrtdinov, Dmitrii Pozdeev, Dmitry Vetrov, Ekaterina Lobacheva

    Abstract: Transfer learning and ensembling are two popular techniques for improving the performance and robustness of neural networks. Due to the high cost of pre-training, ensembles of models fine-tuned from a single pre-trained checkpoint are often used in practice. Such models end up in the same basin of the loss landscape, which we call the pre-train basin, and thus have limited diversity. In this work,…

    Submitted 15 January, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Published in NeurIPS 2023. First two authors contributed equally

  18. arXiv:2302.10970

    cs.CV

    Differentiable Rendering with Reparameterized Volume Sampling

    Authors: Nikita Morozov, Denis Rakitin, Oleg Desheulin, Dmitry Vetrov, Kirill Struminsky

    Abstract: In view synthesis, a neural radiance field approximates underlying density and radiance fields based on a sparse set of scene pictures. To generate a pixel of a novel view, it marches a ray through the pixel and computes a weighted sum of radiance emitted from a dense set of ray points. This rendering algorithm is fully differentiable and facilitates gradient-based optimization of the fields. Howe…

    Submitted 1 March, 2024; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Accepted at AISTATS 2024. Short version of this paper appeared in ICLR 2023 Neural Fields workshop

  19. arXiv:2302.05259

    stat.ML cs.LG

    Star-Shaped Denoising Diffusion Probabilistic Models

    Authors: Andrey Okhotin, Dmitry Molchanov, Vladimir Arkhipkin, Grigory Bartosh, Viktor Ohanesian, Aibek Alanov, Dmitry Vetrov

    Abstract: Denoising Diffusion Probabilistic Models (DDPMs) provide the foundation for the recent breakthroughs in generative modeling. Their Markovian structure makes it difficult to define DDPMs with distributions other than Gaussian or discrete. In this paper, we introduce Star-Shaped DDPM (SS-DDPM). Its star-shaped diffusion process allows us to bypass the need to define the transition probabilities or c…

    Submitted 28 October, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

    Comments: Accepted at NeurIPS 2023

  20. arXiv:2212.10229

    cs.CV cs.LG

    StyleDomain: Efficient and Lightweight Parameterizations of StyleGAN for One-shot and Few-shot Domain Adaptation

    Authors: Aibek Alanov, Vadim Titov, Maksim Nakhodnov, Dmitry Vetrov

    Abstract: Domain adaptation of GANs is a problem of fine-tuning GAN models pretrained on a large dataset (e.g. StyleGAN) to a specific domain with few samples (e.g. painting faces, sketches, etc.). While there are many methods that tackle this problem in different ways, there are still many important questions that remain unanswered. In this paper, we provide a systematic and in-depth analysis of the domain…

    Submitted 12 September, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted to ICCV 2023

  21. arXiv:2211.01156

    cs.LG

    Entropic Neural Optimal Transport via Diffusion Processes

    Authors: Nikita Gushchin, Alexander Kolesov, Alexander Korotin, Dmitry Vetrov, Evgeny Burnaev

    Abstract: We propose a novel neural algorithm for the fundamental problem of computing the entropic optimal transport (EOT) plan between continuous probability distributions which are accessible by samples. Our algorithm is based on the saddle point reformulation of the dynamic version of EOT, which is known as the Schrödinger Bridge problem. In contrast to the prior methods for large-scale EOT, our algorith…

    Submitted 1 November, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  22. arXiv:2210.08884

    cs.CV

    HyperDomainNet: Universal Domain Adaptation for Generative Adversarial Networks

    Authors: Aibek Alanov, Vadim Titov, Dmitry Vetrov

    Abstract: The domain adaptation framework of GANs has achieved great progress in recent years as a main successful approach to training contemporary GANs in the case of very limited training data. In this work, we significantly improve this framework by proposing an extremely compact parameter space for fine-tuning the generator. We introduce a novel domain-modulation technique that allows optimizing only 6 th…

    Submitted 30 March, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022

  23. arXiv:2209.03695

    cs.LG stat.ML

    Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes

    Authors: Maxim Kodryan, Ekaterina Lobacheva, Maksim Nakhodnov, Dmitry Vetrov

    Abstract: A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with varying effective learning rate (ELR), which was studied previously. However, the v…

    Submitted 15 January, 2023; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: Published in NeurIPS 2022. First three authors contributed equally

  24. arXiv:2204.03042

    cs.SD cs.AI eess.AS

    FFC-SE: Fast Fourier Convolution for Speech Enhancement

    Authors: Ivan Shchekotov, Pavel Andreev, Oleg Ivanov, Aibek Alanov, Dmitry Vetrov

    Abstract: Fast Fourier convolution (FFC) is a recently proposed neural operator showing promising performance in several computer vision problems. The FFC operator allows employing large receptive field operations within early layers of the neural network. It was shown to be especially helpful for inpainting of periodic structures which are common in audio processing. In this work, we design neural networ…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  25. HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement

    Authors: Pavel Andreev, Aibek Alanov, Oleg Ivanov, Dmitry Vetrov

    Abstract: Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models. In this paper, we show that this success can be extended to other tasks of conditional audio generation. In particular, building upon HiFi vocoders, we propose a novel HiFi++ general framework for bandwidth extension and speech enhancement.…

    Submitted 10 December, 2023; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted to ICASSP 2023

  26. arXiv:2112.14423

    eess.SP cs.LG cs.NI

    Machine Learning Methods for Spectral Efficiency Prediction in Massive MIMO Systems

    Authors: Evgeny Bobrov, Sergey Troshin, Nadezhda Chirkova, Ekaterina Lobacheva, Sviatoslav Panchenko, Dmitry Vetrov, Dmitry Kropotov

    Abstract: Channel decoding, channel detection, channel assessment, and resource management for wireless multiple-input multiple-output (MIMO) systems are all examples of problems where machine learning (ML) can be successfully applied. In this paper, we study several ML approaches to solve the problem of estimating the spectral efficiency (SE) value for a certain precoding scheme, preferably in the shortest…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: To appear in Optimization Methods & Software, 22 pages, 10 figures, 2 tables

  27. arXiv:2111.15626

    eess.SP cs.AI cs.IT

    Variational Autoencoders for Precoding Matrices with High Spectral Efficiency

    Authors: Evgeny Bobrov, Alexander Markov, Sviatoslav Panchenko, Dmitry Vetrov

    Abstract: Neural networks are used for channel decoding, channel detection, channel evaluation, and resource management in multi-input and multi-output (MIMO) wireless communication systems. In this paper, we consider the problem of finding precoding matrices with high spectral efficiency (SE) using a variational autoencoder (VAE). We propose a computationally efficient algorithm for sampling precoding matric…

    Submitted 5 May, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: The work is prepared for the MOTOR 22 conference, it contains 12 pages and 3 figures

  28. arXiv:2110.15072

    cs.LG

    Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

    Authors: Kirill Struminsky, Artyom Gadetsky, Denis Rakitin, Danil Karpushkin, Dmitry Vetrov

    Abstract: Structured latent variables allow incorporating meaningful prior knowledge into deep learning models. However, learning with such variables remains challenging because of their discrete nature. Nowadays, the standard learning approach is to define a latent variable as a perturbed algorithm output and to use a differentiable surrogate for training. In general, the surrogate puts additional constrai…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: Accepted as a conference paper at NeurIPS 2021

  29. arXiv:2110.13523

    cs.LG cs.AI cs.RO stat.ML

    Automating Control of Overestimation Bias for Reinforcement Learning

    Authors: Arsenii Kuznetsov, Alexander Grishin, Artem Tsypin, Arsenii Ashukha, Artur Kadurin, Dmitry Vetrov

    Abstract: Overestimation bias control techniques are used by the majority of high-performing off-policy reinforcement learning algorithms. However, most of these techniques rely on pre-defined bias correction policies that are either not flexible enough or require environment-specific tuning of hyperparameters. In this work, we present a general data-driven approach for the automatic selection of bias contr…

    Submitted 28 January, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

  30. arXiv:2108.13996

    cs.AI cs.LG

    Quantization of Generative Adversarial Networks for Efficient Inference: a Methodological Study

    Authors: Pavel Andreev, Alexander Fritzler, Dmitry Vetrov

    Abstract: Generative adversarial networks (GANs) have an enormous potential impact on digital content creation, e.g., photo-realistic digital avatars, semantic content editing, and quality enhancement of speech and images. However, the performance of modern GANs comes with massive amounts of computation performed during inference and high energy consumption. That complicates, or even makes imp…

    Submitted 31 August, 2021; originally announced August 2021.

  31. arXiv:2106.15739

    cs.LG stat.ML

    On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay

    Authors: Ekaterina Lobacheva, Maxim Kodryan, Nadezhda Chirkova, Andrey Malinin, Dmitry Vetrov

    Abstract: Training neural networks with batch normalization and weight decay has become a common practice in recent years. In this work, we show that their combined use may result in a surprising periodic behavior of optimization dynamics: the training process regularly exhibits destabilizations that, however, do not lead to complete divergence but cause a new period of training. We rigorously investigate t…

    Submitted 15 January, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: Published in NeurIPS 2021. First two authors contributed equally

  32. arXiv:2106.08038

    cs.LG cs.CV

    Mean Embeddings with Test-Time Data Augmentation for Ensembling of Representations

    Authors: Arsenii Ashukha, Andrei Atanov, Dmitry Vetrov

    Abstract: Averaging predictions over a set of models -- an ensemble -- is widely used to improve predictive performance and uncertainty estimation of deep learning models. At the same time, many machine learning systems, such as search, matching, and recommendation systems, heavily rely on embeddings. Unfortunately, due to misalignment of features of independently trained models, embeddings cannot be impro…

    Submitted 14 July, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

  33. arXiv:2106.04499

    cs.LG cs.AI

    Towards Practical Credit Assignment for Deep Reinforcement Learning

    Authors: Vyacheslav Alipov, Riley Simmons-Edler, Nikita Putintsev, Pavel Kalinin, Dmitry Vetrov

    Abstract: Credit assignment is a fundamental problem in reinforcement learning, the problem of measuring an action's influence on future rewards. Explicit credit assignment methods have the potential to boost the performance of RL algorithms on many tasks, but thus far remain impractical for general use. Recently, a family of methods called Hindsight Credit Assignment (HCA) was proposed, which explicitly as…

    Submitted 11 February, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: 8 pages plus 8 page appendix

  34. arXiv:2007.08483

    cs.LG stat.ML

    On Power Laws in Deep Ensembles

    Authors: Ekaterina Lobacheva, Nadezhda Chirkova, Maxim Kodryan, Dmitry Vetrov

    Abstract: Ensembles of deep neural networks are known to achieve state-of-the-art performance in uncertainty estimation and lead to accuracy improvement. In this work, we focus on a classification problem and investigate the behavior of both non-calibrated and calibrated negative log-likelihood (CNLL) of a deep ensemble as a function of the ensemble size and the member network size. We indicate the conditio…

    Submitted 28 June, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: Published in NeurIPS 2020 and Workshop on Uncertainty and Robustness in Deep Learning at ICML 2020

  35. arXiv:2006.16653

    cs.LG stat.CO stat.ME stat.ML

    Involutive MCMC: a Unifying Framework

    Authors: Kirill Neklyudov, Max Welling, Evgenii Egorov, Dmitry Vetrov

    Abstract: Markov Chain Monte Carlo (MCMC) is a computational approach to fundamental problems such as inference, integration, optimization, and simulation. The field has developed a broad spectrum of algorithms, varying in the way they are motivated, the way they are applied and how efficiently they sample. Despite all the differences, many of them share the same core principle, which we unify as the Involu…

    Submitted 30 June, 2020; originally announced June 2020.

  36. arXiv:2006.10859

    cs.LG stat.ML

    MARS: Masked Automatic Ranks Selection in Tensor Decompositions

    Authors: Maxim Kodryan, Dmitry Kropotov, Dmitry Vetrov

    Abstract: Tensor decomposition methods have proven effective in various applications, including compression and acceleration of neural networks. At the same time, the problem of determining optimal decomposition ranks, which are the crucial parameter controlling the compression-accuracy trade-off, is still acute. In this paper, we introduce MARS -- a new efficient method for the automatic selection of r…

    Submitted 4 April, 2023; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: AISTATS 2023

  37. arXiv:2005.07292

    cs.LG stat.ML

    Deep Ensembles on a Fixed Memory Budget: One Wide Network or Several Thinner Ones?

    Authors: Nadezhda Chirkova, Ekaterina Lobacheva, Dmitry Vetrov

    Abstract: One of the generally accepted views of modern deep learning is that increasing the number of parameters usually leads to better quality. The two easiest ways to increase the number of parameters are to increase the size of the network, e.g. width, or to train a deep ensemble; both approaches improve the performance in practice. In this work, we consider a fixed memory budget setting, and investigat…

    Submitted 14 May, 2020; originally announced May 2020.

    Comments: Under review by the International Conference on Machine Learning (ICML 2020)

  38. arXiv:2005.04269

    cs.LG cs.AI stat.ML

    Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics

    Authors: Arsenii Kuznetsov, Pavel Shvechikov, Alexander Grishin, Dmitry Vetrov

    Abstract: The overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: distributional representation of a critic, truncation of critics' predictions, and ensembling of multiple critics. Distributional represent…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: Under review by the International Conference on Machine Learning

  39. arXiv:2003.02174

    cs.LG stat.ML

    Deterministic Decoding for Discrete Data in Variational Autoencoders

    Authors: Daniil Polykovskiy, Dmitry Vetrov

    Abstract: Variational autoencoders are prominent generative models for modeling discrete data. However, with flexible decoders, they tend to ignore the latent codes. In this paper, we study a VAE model with a deterministic decoder (DD-VAE) for sequential data that selects the highest-scoring tokens instead of sampling. Deterministic decoding solely relies on latent codes as the only way to produce diverse o…

    Submitted 4 March, 2020; originally announced March 2020.

    Comments: AISTATS 2020; GitHub: https://github.com/insilicomedicine/DD-VAE

  40. arXiv:2002.09779

    cs.LG stat.ML

    Stochasticity in Neural ODEs: An Empirical Study

    Authors: Viktor Oganesyan, Alexandra Volokhova, Dmitry Vetrov

    Abstract: Stochastic regularization of neural networks (e.g. dropout) is a widespread technique in deep learning that allows for better generalization. Despite its success, continuous-time models, such as neural ordinary differential equations (ODEs), usually rely on a completely deterministic feed-forward operation. This work provides an empirical study of stochastically regularized neural ODE on several im…

    Submitted 26 June, 2020; v1 submitted 22 February, 2020; originally announced February 2020.

  41. arXiv:2002.09103

    stat.ML cs.CV cs.LG

    Greedy Policy Search: A Simple Baseline for Learnable Test-Time Augmentation

    Authors: Dmitry Molchanov, Alexander Lyzhov, Yuliya Molchanova, Arsenii Ashukha, Dmitry Vetrov

    Abstract: Test-time data augmentation, averaging the predictions of a machine learning model across multiple augmented samples of data, is a widely used technique that improves the predictive performance. While many advanced learnable data augmentation techniques have emerged in recent years, they are focused on the training phase. Such techniques are not necessarily optimal for test-time augmentation and…

    Submitted 20 June, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

  42. arXiv:2002.06470  [pdf, other]

    stat.ML cs.LG

    Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning

    Authors: Arsenii Ashukha, Alexander Lyzhov, Dmitry Molchanov, Dmitry Vetrov

    Abstract: Uncertainty estimation and ensembling methods go hand-in-hand. Uncertainty estimation is one of the main benchmarks for assessment of ensembling performance. At the same time, deep learning ensembles have provided state-of-the-art results in uncertainty estimation. In this work, we focus on in-domain uncertainty for image classification. We explore the standards for its quantification and point ou…

    Submitted 18 July, 2021; v1 submitted 15 February, 2020; originally announced February 2020.

    Journal ref: Eighth International Conference on Learning Representations (ICLR 2020)

  43. arXiv:1911.10036  [pdf, other]

    cs.LG stat.ML

    Low-variance Black-box Gradient Estimates for the Plackett-Luce Distribution

    Authors: Artyom Gadetsky, Kirill Struminsky, Christopher Robinson, Novi Quadrianto, Dmitry Vetrov

    Abstract: Learning models with discrete latent variables using stochastic gradient descent remains a challenge due to the high variance of gradient estimates. Modern variance reduction techniques mostly consider categorical distributions and have limited applicability when the number of possible outcomes becomes large. In this work, we consider models with latent permutations and propose control variates fo…

    Submitted 22 November, 2019; originally announced November 2019.

    Comments: Accepted as a conference paper at AAAI 2020. Shortened version of the paper appears at BDL NeurIPS 2019 workshop

  44. arXiv:1911.05585  [pdf, other]

    cs.LG cs.CL stat.ML

    Structured Sparsification of Gated Recurrent Neural Networks

    Authors: Ekaterina Lobacheva, Nadezhda Chirkova, Alexander Markovich, Dmitry Vetrov

    Abstract: Recently, many techniques have been developed to sparsify the weights of neural networks and to remove networks' structural units, e.g. neurons. We adjust the existing sparsification approaches to the gated recurrent architectures. Specifically, in addition to the sparsification of weights and neurons, we propose sparsifying the preactivations of gates. This makes some gates constant and simplifies…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: Published in Workshop on Context and Compositionality in Biological and Artificial Neural Systems, NeurIPS 2019

  45. arXiv:1910.13148  [pdf, other]

    cs.LG stat.ML

    A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models

    Authors: Maksim Kuznetsov, Daniil Polykovskiy, Dmitry Vetrov, Alexander Zhebrak

    Abstract: Generative models produce realistic objects in many domains, including text, image, video, and audio synthesis. The most popular models, Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), usually employ a standard Gaussian distribution as a prior. Previous works show that a richer family of prior distributions may help to avoid the mode collapse problem in GANs and to impr…

    Submitted 29 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019; GitHub: https://github.com/insilicomedicine/TRIP

  46. arXiv:1907.07504  [pdf, other]

    cs.LG stat.ML

    Subspace Inference for Bayesian Deep Learning

    Authors: Pavel Izmailov, Wesley J. Maddox, Polina Kirichenko, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson

    Abstract: Bayesian inference was once a gold standard for learning with neural networks, providing accurate full predictive distributions and well calibrated uncertainty. However, scaling Bayesian inference techniques to deep neural networks is challenging due to the high dimensionality of the parameter space. In this paper, we construct low-dimensional subspaces of parameter space, such as the first princi…

    Submitted 17 July, 2019; originally announced July 2019.

    Comments: Published at UAI 2019

  47. arXiv:1906.03644  [pdf, other]

    stat.ML cs.LG

    The Implicit Metropolis-Hastings Algorithm

    Authors: Kirill Neklyudov, Evgenii Egorov, Dmitry Vetrov

    Abstract: Recent works propose using the discriminator of a GAN to filter out unrealistic samples of the generator. We generalize these ideas by introducing the implicit Metropolis-Hastings algorithm. For any implicit probabilistic model and a target distribution represented by a set of samples, implicit Metropolis-Hastings operates by learning a discriminator to estimate the density-ratio and then generati…

    Submitted 9 June, 2019; originally announced June 2019.

  48. arXiv:1905.03290  [pdf, other]

    stat.ML cs.LG

    Importance Weighted Hierarchical Variational Inference

    Authors: Artem Sobolev, Dmitry Vetrov

    Abstract: Variational Inference is a powerful tool in the Bayesian modeling toolkit; however, its effectiveness is determined by the expressivity of the utilized variational distributions in terms of their ability to match the true posterior distribution. In turn, the expressivity of the variational family is largely limited by the requirement of having a tractable density function. To overcome this roadblo…

    Submitted 8 May, 2019; originally announced May 2019.

  49. arXiv:1905.00505  [pdf, other]

    stat.ML cs.LG

    Semi-Conditional Normalizing Flows for Semi-Supervised Learning

    Authors: Andrei Atanov, Alexandra Volokhova, Arsenii Ashukha, Ivan Sosnovik, Dmitry Vetrov

    Abstract: This paper proposes a semi-conditional normalizing flow model for semi-supervised learning. The model uses both labeled and unlabeled data to learn an explicit model of the joint distribution over objects and labels. The semi-conditional architecture of the model allows us to efficiently compute the value and gradients of the marginal likelihood for unlabeled objects. The conditional part of the model is b…

    Submitted 22 June, 2020; v1 submitted 1 May, 2019; originally announced May 2019.

  50. arXiv:1904.04751  [pdf, other]

    cs.CV cs.LG stat.ML

    User-Controllable Multi-Texture Synthesis with Generative Adversarial Networks

    Authors: Aibek Alanov, Max Kochurov, Denis Volkhonskiy, Daniil Yashkov, Evgeny Burnaev, Dmitry Vetrov

    Abstract: We propose a novel multi-texture synthesis model based on generative adversarial networks (GANs) with a user-controllable mechanism. This user control makes it possible to explicitly specify the texture that the model should generate. This property follows from using an encoder part which learns a latent representation for each texture from the dataset. To ensure dataset coverage, we use an…

    Submitted 24 April, 2019; v1 submitted 9 April, 2019; originally announced April 2019.

    Comments: 8 pages paper, 17 pages supplementary material