Showing 1–21 of 21 results for author: Kuznedelev, D

Searching in archive cs.

  1. arXiv:2504.06261  [pdf, other]

    cs.LG cs.CL

    Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

    Authors: Gleb Rodionov, Roman Garipov, Alina Shutova, George Yakushev, Erik Schultheis, Vage Egiazarian, Anton Sinitsin, Denis Kuznedelev, Dan Alistarh

    Abstract: Large Language Models (LLMs) have demonstrated the ability to tackle increasingly complex tasks through advanced reasoning, long-form content generation, and tool use. Solving these tasks often involves long inference-time computations. In human problem solving, a common strategy to expedite work is collaboration: by dividing the problem into sub-tasks, exploring different strategies concurrently,…

    Submitted 23 May, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

    Comments: Preprint

  2. arXiv:2503.16397  [pdf, other]

    cs.CV

    Scale-wise Distillation of Diffusion Models

    Authors: Nikita Starodubcev, Denis Kuznedelev, Artem Babenko, Dmitry Baranchuk

    Abstract: We present SwD, a scale-wise distillation framework for diffusion models (DMs), which effectively employs next-scale prediction ideas for diffusion-based few-step generators. In more detail, SwD is inspired by the recent insights relating diffusion processes to the implicit spectral autoregression. We suppose that DMs can initiate generation at lower data resolutions and gradually upscale the samp…

    Submitted 20 March, 2025; originally announced March 2025.

  3. arXiv:2501.19392  [pdf, other]

    cs.LG

    Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models

    Authors: Alina Shutova, Vladimir Malinovskii, Vage Egiazarian, Denis Kuznedelev, Denis Mazur, Nikita Surkov, Ivan Ermakov, Dan Alistarh

    Abstract: Efficient real-world deployments of large language models (LLMs) rely on Key-Value (KV) caching for processing and generating long outputs, reducing the need for repetitive computation. For large contexts, Key-Value caches can take up tens of gigabytes of device memory, as they store vector representations for each token and layer. Recent work has shown that the cached vectors can be compressed th…

    Submitted 28 February, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

    Comments: Preprint, under review

  4. arXiv:2412.16669  [pdf, other]

    cs.LG cs.CR

    Label Privacy in Split Learning for Large Models with Parameter-Efficient Training

    Authors: Philip Zmushko, Marat Mansurov, Ruslan Svirschevski, Denis Kuznedelev, Max Ryabinin, Aleksandr Beznosikov

    Abstract: As deep learning models become larger and more expensive, many practitioners turn to fine-tuning APIs. These web services allow fine-tuning a model between two parties: the client that provides the data, and the server that hosts the model. While convenient, these APIs raise a new concern: the data of the client is at risk of privacy breach during the training procedure. This challenge presents an…

    Submitted 21 December, 2024; originally announced December 2024.

  5. arXiv:2412.01819  [pdf, other]

    cs.CV

    Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis

    Authors: Anton Voronov, Denis Kuznedelev, Mikhail Khoroshikh, Valentin Khrulkov, Dmitry Baranchuk

    Abstract: This work presents Switti, a scale-wise transformer for text-to-image generation. We start by adapting an existing next-scale prediction autoregressive (AR) architecture to T2I generation, investigating and mitigating training stability issues in the process. Next, we argue that scale-wise transformers do not require causality and propose a non-causal counterpart facilitating ~21% faster sampling…

    Submitted 20 March, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: CVPR 2025

  6. arXiv:2410.14649  [pdf, ps, other]

    cs.LG

    EvoPress: Accurate Dynamic Model Compression via Evolutionary Search

    Authors: Oliver Sieberling, Denis Kuznedelev, Eldar Kurtic, Dan Alistarh

    Abstract: The high computational costs of large language models (LLMs) have led to a flurry of research on LLM compression, via methods such as quantization, sparsification, or structured pruning. A new frontier in this area is given by dynamic, non-uniform compression methods, which adjust the compression levels (e.g., sparsity) per-block or even per-layer in order to minimize accuracy loss, while guarante…

    Submitted 1 July, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

    Comments: ICML camera-ready

  7. arXiv:2409.00492  [pdf, other]

    cs.CV

    Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

    Authors: Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk

    Abstract: Text-to-image diffusion models have emerged as a powerful framework for high-quality image generation given textual prompts. Their success has driven the rapid development of production-grade diffusion models that consistently increase in size and already contain billions of parameters. As a result, state-of-the-art text-to-image models are becoming less accessible in practice, especially in resou…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: project page: https://yandex-research.github.io/vqdm

  8. arXiv:2408.17163  [pdf, other]

    cs.LG

    The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information

    Authors: Diyuan Wu, Ionut-Vlad Modoranu, Mher Safaryan, Denis Kuznedelev, Dan Alistarh

    Abstract: The rising footprint of machine learning has led to a focus on imposing model sparsity as a means of reducing computational and memory costs. For deep neural networks (DNNs), the state-of-the-art accuracy-vs-sparsity trade-off is achieved by heuristics inspired by the classical Optimal Brain Surgeon (OBS) framework (LeCun et al., 1990; Hassibi and Stork, 1992; Hassibi et al., 1993), which leverages loss curv…

    Submitted 30 August, 2024; originally announced August 2024.

  9. arXiv:2405.17261  [pdf, other]

    eess.IV cs.CV

    Does Diffusion Beat GAN in Image Super Resolution?

    Authors: Denis Kuznedelev, Valerii Startsev, Daniil Shlenskii, Sergey Kastryulin

    Abstract: There is a prevalent opinion that diffusion-based models outperform GAN-based counterparts in the Image Super Resolution (ISR) problem. However, in most studies, diffusion-based ISR models employ larger networks and are trained longer than the GAN baselines. This raises the question of whether the high performance stems from the superiority of the diffusion paradigm or is a consequence of th…

    Submitted 25 October, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  10. arXiv:2405.14852  [pdf, other]

    cs.LG

    PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression

    Authors: Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtarik

    Abstract: There has been significant interest in "extreme" compression of large language models (LLMs), i.e., to 1-2 bits per parameter, which allows such models to be executed efficiently on resource-constrained devices. Existing work focused on improved one-shot quantization techniques and weight representations; yet, purely post-training approaches are reaching diminishing returns in terms of the accurac…

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint

  11. arXiv:2404.05666  [pdf, other]

    cs.CV

    YaART: Yet Another ART Rendering Technology

    Authors: Sergey Kastryulin, Artem Konev, Alexander Shishenya, Eugene Lyapustin, Artem Khurshudov, Alexander Tselousov, Nikita Vinokurov, Denis Kuznedelev, Alexander Markovich, Grigoriy Livshits, Alexey Kirillov, Anastasiia Tabisheva, Liubov Chubarova, Marina Kaminskaia, Alexander Ustyuzhanin, Artemii Shvetsov, Daniil Shlenskii, Valerii Startsev, Dmitrii Kornilov, Mikhail Romanov, Artem Babenko, Sergei Ovcharenko, Valentin Khrulkov

    Abstract: In the rapidly progressing field of generative models, the development of efficient and high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we especially focus…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Prompts and additional information are available on the project page, see https://ya.ru/ai/art/paper-yaart-v1

  12. arXiv:2401.06118  [pdf, other]

    cs.LG cs.CL

    Extreme Compression of Large Language Models via Additive Quantization

    Authors: Vage Egiazarian, Andrei Panferov, Denis Kuznedelev, Elias Frantar, Artem Babenko, Dan Alistarh

    Abstract: The emergence of accurate open large language models (LLMs) has led to a race towards performant quantization techniques which can enable their execution on end-user devices. In this paper, we revisit the problem of "extreme" LLM compression, defined as targeting extremely low bit counts such as 2 to 3 bits per parameter, from the point of view of classic methods in Multi-Codebook Quantization (MCQ…

    Submitted 11 September, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: ICML 2024

  13. arXiv:2310.06927  [pdf, other]

    cs.CL cs.AI

    Sparse Fine-tuning for Inference Acceleration of Large Language Models

    Authors: Eldar Kurtic, Denis Kuznedelev, Elias Frantar, Michael Goin, Dan Alistarh

    Abstract: We consider the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fine-tuning pretrained LLMs on specialized tasks, while inducing sparsity in their weights. On the accuracy side, we observe that standard loss-based fine-tuning may fail to recover accuracy, especially at high sparsities. To address this, we perform a detailed study of distillation-type losses, determ…

    Submitted 13 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

  14. arXiv:2308.02060  [pdf, other]

    cs.LG cs.AI

    Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

    Authors: Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh

    Abstract: Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the community. Yet, much less is known about the interaction between sparsity and the standard stochastic optimization techniques used for training sparse networks, and mo…

    Submitted 8 September, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

  15. arXiv:2306.03078  [pdf, other]

    cs.CL cs.LG

    SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

    Authors: Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh

    Abstract: Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities. Compressing such LLMs via quantization to 3-4 bits per parameter lets them fit into memory-limited devices such as laptops and mobile phones, enabling personalized use. However, quantization down to 3-4 bits per parameter usually leads to moderate-to-high accuracy losses, especiall…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Extended preprint

  16. arXiv:2303.14409  [pdf, other]

    cs.CV

    Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression

    Authors: Denis Kuznedelev, Soroush Tabesh, Kimia Noorbakhsh, Elias Frantar, Sara Beery, Eldar Kurtic, Dan Alistarh

    Abstract: Recent vision architectures and self-supervised training methods enable vision models that are extremely accurate and general, but come with massive parameter and computational costs. In practical settings, such as camera traps, users have limited resources, and may fine-tune a pretrained model on (often limited) data from a small set of specific categories of interest. These users may wish to mak…

    Submitted 25 March, 2023; originally announced March 2023.

    MSC Class: 68T07; ACM Class: I.m

  17. arXiv:2302.13875  [pdf, other]

    cs.LG stat.ML

    Evaluating Robustness and Uncertainty of Graph Models Under Structural Distributional Shifts

    Authors: Gleb Bazhenov, Denis Kuznedelev, Andrey Malinin, Artem Babenko, Liudmila Prokhorenkova

    Abstract: In reliable decision-making systems based on machine learning, models have to be robust to distributional shifts or provide the uncertainty of their predictions. In node-level problems of graph learning, distributional shifts can be especially complex since the samples are interdependent. To evaluate the performance of graph models, it is important to test them on diverse and meaningful distributi…

    Submitted 1 November, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  18. arXiv:2302.11640  [pdf, ps, other]

    cs.LG

    A critical look at the evaluation of GNNs under heterophily: Are we really making progress?

    Authors: Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, Liudmila Prokhorenkova

    Abstract: Node classification is a classical graph machine learning task on which Graph Neural Networks (GNNs) have recently achieved strong results. However, it is often believed that standard GNNs only work well for homophilous graphs, i.e., graphs where edges tend to connect nodes of the same class. Graphs without this property are called heterophilous, and it is typically assumed that specialized method…

    Submitted 2 March, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

  19. arXiv:2210.09223  [pdf, other]

    cs.CV cs.LG

    CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models

    Authors: Denis Kuznedelev, Eldar Kurtic, Elias Frantar, Dan Alistarh

    Abstract: Driven by significant improvements in architectural design and training pipelines, computer vision has recently experienced dramatic progress in terms of accuracy on classic benchmarks such as ImageNet. These highly-accurate models are challenging to deploy, as they appear harder to compress using standard techniques such as pruning. We address this issue by introducing the Correlation Aware Prune…

    Submitted 31 May, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    MSC Class: 68T07; ACM Class: I.m

  20. arXiv:2209.06177  [pdf, other]

    cs.SI cs.DM cs.LG math.PR

    Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond

    Authors: Oleg Platonov, Denis Kuznedelev, Artem Babenko, Liudmila Prokhorenkova

    Abstract: Homophily is a graph property describing the tendency of edges to connect similar nodes; the opposite is called heterophily. It is often believed that heterophilous graphs are challenging for standard message-passing graph neural networks (GNNs), and much effort has been put into developing efficient methods for this setting. However, there is no universally agreed-upon measure of homophily in the…

    Submitted 15 April, 2024; v1 submitted 13 September, 2022; originally announced September 2022.

  21. arXiv:2206.11124  [pdf, other]

    cs.LG math.OC stat.ML

    A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta

    Authors: Maksim Velikanov, Denis Kuznedelev, Dmitry Yarotsky

    Abstract: Mini-batch SGD with momentum is a fundamental algorithm for learning large predictive models. In this paper we develop a new analytic framework to analyze noise-averaged properties of mini-batch SGD for linear models at constant learning rates, momenta and sizes of batches. Our key idea is to consider the dynamics of the second moments of model parameters for a special family of "Spectrally Expres…

    Submitted 9 March, 2023; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: The revised version accepted at ICLR 2023