Showing 1–8 of 8 results for author: Voronov, A

  1. arXiv:2412.01819

    cs.CV

    Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis

    Authors: Anton Voronov, Denis Kuznedelev, Mikhail Khoroshikh, Valentin Khrulkov, Dmitry Baranchuk

    Abstract: This work presents Switti, a scale-wise transformer for text-to-image generation. We start by adapting an existing next-scale prediction autoregressive (AR) architecture to T2I generation, investigating and mitigating training stability issues in the process. Next, we argue that scale-wise transformers do not require causality and propose a non-causal counterpart facilitating ~21% faster sampling…

    Submitted 20 March, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: CVPR 2025

  2. arXiv:2409.00492

    cs.CV

    Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

    Authors: Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk

    Abstract: Text-to-image diffusion models have emerged as a powerful framework for high-quality image generation given textual prompts. Their success has driven the rapid development of production-grade diffusion models that consistently increase in size and already contain billions of parameters. As a result, state-of-the-art text-to-image models are becoming less accessible in practice, especially in resou…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: project page: https://yandex-research.github.io/vqdm

  3. arXiv:2401.06766

    cs.CL

    Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements

    Authors: Anton Voronov, Lena Wolf, Max Ryabinin

    Abstract: Large language models demonstrate a remarkable capability for learning to solve new tasks from a few examples. The prompt template, or the way the input examples are formatted to obtain the prompt, is an important yet often overlooked aspect of in-context learning. In this work, we conduct a comprehensive study of the template format's influence on the in-context learning performance. We evaluate…

    Submitted 6 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted to Findings of ACL 2024. 24 pages, 10 figures. Code: https://github.com/yandex-research/mind-your-format

  4. arXiv:2302.04841

    cs.CV cs.LG

    Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics

    Authors: Anton Voronov, Mikhail Khoroshikh, Artem Babenko, Max Ryabinin

    Abstract: Text-to-image generation models represent the next step of evolution in image synthesis, offering a natural way to achieve flexible yet fine-grained control over the result. One emerging area of research is the fast adaptation of large text-to-image models to smaller datasets or new visual concepts. However, many efficient methods of adaptation have a long training time, which limits their practic…

    Submitted 1 November, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2023. 20 pages, 15 figures. Code: https://github.com/yandex-research/DVAR

  5. arXiv:2211.00688

    cs.AI cs.CL

    Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions

    Authors: Alexey Skrynnik, Zoya Volovikova, Marc-Alexandre Côté, Anton Voronov, Artem Zholus, Negar Arabzadeh, Shrestha Mohanty, Milagro Teruel, Ahmed Awadallah, Aleksandr Panov, Mikhail Burtsev, Julia Kiseleva

    Abstract: The adoption of pre-trained language models to generate action plans for embodied agents is a promising research strategy. However, execution of instructions in real or simulated environments requires verification of the feasibility of actions as well as their relevance to the completion of a goal. We propose a new method that combines a language model and reinforcement learning for the task of bu…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: 6 pages, 3 figures

  6. arXiv:2111.10974

    cs.CV cs.AI cs.CL

    Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture

    Authors: Daria Bakshandaeva, Denis Dimitrov, Vladimir Arkhipkin, Alex Shonenkov, Mark Potanin, Denis Karachev, Andrey Kuznetsov, Anton Voronov, Vera Davydova, Elena Tutubalina, Aleksandr Petiushko

    Abstract: Supporting the current trend in the AI community, we present the AI Journey 2021 Challenge called Fusion Brain, the first competition which is targeted to make the universal architecture which could process different modalities (in this case, images, texts, and code) and solve multiple tasks for vision and language. The Fusion Brain Challenge combines the following specific tasks: Code2code Transl…

    Submitted 28 December, 2022; v1 submitted 21 November, 2021; originally announced November 2021.

  7. arXiv:2109.08914

    cs.CL cs.LG

    Text Detoxification using Large Pre-trained Neural Models

    Authors: David Dale, Anton Voronov, Daryna Dementieva, Varvara Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko

    Abstract: We present two novel unsupervised methods for eliminating toxicity in text. Our first method combines two recent ideas: (1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer. We use a well-performing paraphraser guided by style-trained language models to keep the text content and remove toxicity. Our second…

    Submitted 3 November, 2021; v1 submitted 18 September, 2021; originally announced September 2021.

    Comments: Accepted to the EMNLP 2021 conference

  8. arXiv:2107.05951

    math.OC cs.DC

    One-Point Feedback for Composite Optimization with Applications to Distributed and Federated Learning

    Authors: Aleksandr Beznosikov, Ivan Stepanov, Artyom Voronov, Alexander Gasnikov

    Abstract: This work is devoted to solving the composite optimization problem with the mixture oracle: for the smooth part of the problem, we have access to the gradient, and for the non-smooth part, only the one-point zero-order oracle is available. For such a setup, we present a new method based on the sliding algorithm. Our method allows to separate the oracle complexities and to compute the gradient for…

    Submitted 10 May, 2025; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: New in v3: completely new text of the paper; 33 pages, 1 figure, 2 tables, 1 algorithm