
Showing 1–5 of 5 results for author: Zmushko, P

  1. arXiv:2506.03725  [pdf, ps, other]

    cs.LG math.OC

    Sign-SGD is the Golden Gate between Multi-Node to Single-Node Learning: Significant Boost via Parameter-Free Optimization

    Authors: Daniil Medyakov, Sergey Stanko, Gleb Molodtsov, Philip Zmushko, Grigoriy Evseev, Egor Petrov, Aleksandr Beznosikov

    Abstract: Quite recently, large language models have made a significant breakthrough across various disciplines. However, training them is an extremely resource-intensive task, even for major players with vast computing resources. One of the methods gaining popularity in light of these challenges is Sign-SGD. This method can be applied both as a memory-efficient approach in single-node training and as a gra…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 58 pages, 5 figures, 5 tables

  2. arXiv:2502.07923  [pdf, other]

    math.OC cs.LG

    Sign Operator for Coping with Heavy-Tailed Noise in Non-Convex Optimization: High Probability Bounds Under $(L_0, L_1)$-Smoothness

    Authors: Nikita Kornilov, Philip Zmushko, Andrei Semenov, Mark Ikonnikov, Alexander Gasnikov, Alexander Beznosikov

    Abstract: In recent years, non-convex optimization problems are more often described by generalized $(L_0, L_1)$-smoothness assumption rather than standard one. Meanwhile, severely corrupted data used in these problems has increased the demand for methods capable of handling heavy-tailed noises, i.e., noises with bounded $κ$-th moment. Motivated by these real-world trends and challenges, we explore sign-bas…

    Submitted 27 May, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

  3. arXiv:2412.16669  [pdf, other]

    cs.LG cs.CR

    Label Privacy in Split Learning for Large Models with Parameter-Efficient Training

    Authors: Philip Zmushko, Marat Mansurov, Ruslan Svirschevski, Denis Kuznedelev, Max Ryabinin, Aleksandr Beznosikov

    Abstract: As deep learning models become larger and more expensive, many practitioners turn to fine-tuning APIs. These web services allow fine-tuning a model between two parties: the client that provides the data, and the server that hosts the model. While convenient, these APIs raise a new concern: the data of the client is at risk of privacy breach during the training procedure. This challenge presents an…

    Submitted 21 December, 2024; originally announced December 2024.

  4. arXiv:2412.11689  [pdf, other]

    cs.LG cs.CR

    Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning

    Authors: Andrei Semenov, Philip Zmushko, Alexander Pichugin, Aleksandr Beznosikov

    Abstract: Vertical Federated Learning (VFL) aims to enable collaborative training of deep learning models while maintaining privacy protection. However, the VFL procedure still has components that are vulnerable to attacks by malicious parties. In our work, we consider feature reconstruction attacks, a common risk targeting input data compromise. We theoretically claim that feature reconstruction attacks ca…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: 29 pages, 12 figures, 3 tables

    ACM Class: I.2.m; F.2.0

  5. arXiv:2411.07837  [pdf, other]

    cs.LG

    FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

    Authors: Philip Zmushko, Aleksandr Beznosikov, Martin Takáč, Samuel Horváth

    Abstract: With the increase in the number of parameters in large language models, the process of pre-training and fine-tuning increasingly demands larger volumes of GPU memory. A significant portion of this memory is typically consumed by the optimizer state. To overcome this challenge, recent approaches such as low-rank adaptation (LoRA (Hu et al., 2021)), low-rank gradient projection (GaLore (Zhao et al.,…

    Submitted 12 November, 2024; originally announced November 2024.