Showing 1–50 of 154 results for author: Oseledets, I

Searching in archive cs.
  1. arXiv:2506.22832  [pdf, ps, other]

    cs.CV cs.AI

    Listener-Rewarded Thinking in VLMs for Image Preferences

    Authors: Alexander Gambashidze, Li Pengyi, Matvey Skripkin, Andrey Galichin, Anton Gusarov, Konstantin Sobolev, Andrey Kuznetsov, Ivan Oseledets

    Abstract: Training robust and generalizable reward models for human visual preferences is essential for aligning text-to-image and text-to-video generative models with human intent. However, current reward models often fail to generalize, and supervised fine-tuning leads to memorization, demanding complex annotation pipelines. While reinforcement learning (RL), specifically Group Relative Policy Optimizatio…

    Submitted 1 July, 2025; v1 submitted 28 June, 2025; originally announced June 2025.

  2. arXiv:2506.06751  [pdf, ps, other]

    cs.CL

    Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models

    Authors: Mikhail Salnikov, Dmitrii Korzh, Ivan Lazichny, Elvir Karimov, Artyom Iudin, Ivan Oseledets, Oleg Y. Rogov, Natalia Loukachevitch, Alexander Panchenko, Elena Tutubalina

    Abstract: This paper evaluates geopolitical biases in LLMs with respect to various countries through an analysis of their interpretation of historical events with conflicting national perspectives (USA, UK, USSR, and China). We introduce a novel dataset with neutral event descriptions and contrasting viewpoints from different countries. Our findings show significant geopolitical biases, with models favoring…

    Submitted 20 June, 2025; v1 submitted 7 June, 2025; originally announced June 2025.

  3. arXiv:2506.06395  [pdf, ps, other]

    cs.CL cs.LG

    Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

    Authors: Pengyi Li, Matvey Skripkin, Alexander Zubrey, Andrey Kuznetsov, Ivan Oseledets

    Abstract: Large language models (LLMs) excel at reasoning, yet post-training remains critical for aligning their behavior with task goals. Existing reinforcement learning (RL) methods often depend on costly human annotations or external reward models. We propose Reinforcement Learning via Self-Confidence (RLSC), which uses the model's own confidence as reward signals, eliminating the need for labels, prefere…

    Submitted 11 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  4. arXiv:2506.05229  [pdf, ps, other]

    cs.LG cs.CL

    Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts

    Authors: Danil Sivtsov, Ivan Rodkin, Gleb Kuzmin, Yuri Kuratov, Ivan Oseledets

    Abstract: Transformer models struggle with long-context inference due to their quadratic time and linear memory complexity. Recurrent Memory Transformers (RMTs) offer a solution by reducing the asymptotic cost to linear time and constant memory usage. However, their memory update mechanism leads to sequential execution, causing a performance bottleneck. We introduce Diagonal Batching, a scheduling scheme…

    Submitted 5 June, 2025; originally announced June 2025.

  5. arXiv:2506.04869  [pdf, other]

    cs.CV

    Geological Field Restoration through the Lens of Image Inpainting

    Authors: Vladislav Trifonov, Ivan Oseledets, Ekaterina Muravleva

    Abstract: We present a new viewpoint on reconstructing multidimensional geological fields from sparse observations. Drawing inspiration from deterministic image inpainting techniques, we model a partially observed spatial field as a multidimensional tensor and recover missing values by enforcing a global low-rank structure. Our approach combines ideas from tensor completion and geostatistics, providing a…

    Submitted 5 June, 2025; originally announced June 2025.

  6. arXiv:2506.04053  [pdf]

    cs.LG cs.IT

    Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence

    Authors: Alexander Semenenko, Ivan Butakov, Alexey Frolov, Ivan Oseledets

    Abstract: Sliced Mutual Information (SMI) is widely used as a scalable alternative to mutual information for measuring non-linear statistical dependence. Despite its advantages, such as faster convergence, robustness to high dimensionality, and nullification only under statistical independence, we demonstrate that SMI is highly susceptible to data manipulation and exhibits counterintuitive behavior. Through…

    Submitted 4 June, 2025; originally announced June 2025.

    MSC Class: 94A16; 68T07; 94A17 ACM Class: E.4; H.1.1

  7. arXiv:2505.23911  [pdf, ps, other]

    cs.CL

    One Task Vector is not Enough: A Large-Scale Study for In-Context Learning

    Authors: Pavel Tikhonov, Ivan Oseledets, Elena Tutubalina

    Abstract: In-context learning (ICL) enables Large Language Models (LLMs) to adapt to new tasks using few examples, with task vectors - specific hidden state activations - hypothesized to encode task information. Existing studies are limited by small-scale benchmarks, restricting comprehensive analysis. We introduce QuiteAFew, a novel dataset of 3,096 diverse few-shot tasks, each with 30 input-output pairs d…

    Submitted 29 May, 2025; originally announced May 2025.

  8. arXiv:2505.21189  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Exploring the Latent Capacity of LLMs for One-Step Text Generation

    Authors: Gleb Mezentsev, Ivan Oseledets

    Abstract: A recent study showed that large language models (LLMs) can reconstruct surprisingly long texts - up to thousands of tokens - via autoregressive generation from just one specially trained input embedding. In this work, we explore whether such reconstruction is possible without autoregression. We show that frozen LLMs can generate hundreds of accurate tokens in just one forward pass, when provided…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: under review

  9. arXiv:2505.03648  [pdf, ps, other]

    q-bio.NC cs.AI cs.LG

    Binding threshold units with artificial oscillatory neurons

    Authors: Vladimir Fanaskov, Ivan Oseledets

    Abstract: Artificial Kuramoto oscillatory neurons were recently introduced as an alternative to threshold units. Empirical evidence suggests that oscillatory units outperform threshold units in several tasks including unsupervised object discovery and certain reasoning problems. The proposed coupling mechanism for these oscillatory neurons is heterogeneous, combining a generalized Kuramoto equation with sta…

    Submitted 6 May, 2025; originally announced May 2025.

  10. arXiv:2504.13236  [pdf, other]

    cs.LG cs.MS

    NNTile: a machine learning framework capable of training extremely large GPT language models on a single node

    Authors: Aleksandr Mikhalev, Aleksandr Katrutsa, Konstantin Sozykin, Ivan Oseledets

    Abstract: This study presents the NNTile framework for training large deep neural networks in heterogeneous clusters. NNTile is based on the StarPU library, which implements task-based parallelism and schedules all provided tasks onto all available processing units (CPUs and GPUs). This means that a particular operation, necessary to train a large neural network, can be performed on any of the CPU cores or G…

    Submitted 17 April, 2025; originally announced April 2025.

  11. arXiv:2504.04444  [pdf]

    cs.CL cs.AI cs.LG

    On the Spatial Structure of Mixture-of-Experts in Transformers

    Authors: Daniel Bershatsky, Ivan Oseledets

    Abstract: A common assumption is that MoE routers primarily leverage semantic features for expert selection. However, our study challenges this notion by demonstrating that positional token information also plays a crucial role in routing decisions. Through extensive empirical analysis, we provide evidence supporting this hypothesis, develop a phenomenological explanation of the observed behavior, and discu…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: Accepted to ICLR 2025 Workshop on Sparsity in LLMs (SLLM)

  12. arXiv:2503.19948   

    cs.CV cs.AI

    Test-Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards

    Authors: Alexander Gambashidze, Konstantin Sobolev, Andrey Kuznetsov, Ivan Oseledets

    Abstract: Can Visual Language Models (VLMs) effectively capture human visual preferences? This work addresses this question by training VLMs to think about preferences at test time, employing reinforcement learning methods inspired by DeepSeek R1 and OpenAI O1. Using datasets such as ImageReward and Human Preference Score v2 (HPSv2), our models achieve accuracies of 64.9% on the ImageReward test set (traine…

    Submitted 28 June, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

    Comments: We are withdrawing this paper because the main contributions and methodology have significantly changed after further research and experimental updates. The current version no longer reflects our results and main contribution / topic

  13. arXiv:2503.18878  [pdf, other]

    cs.CL

    I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

    Authors: Andrey Galichin, Alexey Dontsov, Polina Druzhinina, Anton Razzhigaev, Oleg Y. Rogov, Elena Tutubalina, Ivan Oseledets

    Abstract: Large Language Models (LLMs) have achieved remarkable success in natural language processing. Recent advances have led to the development of a new class of reasoning LLMs; for example, open-source DeepSeek-R1 has achieved state-of-the-art performance by integrating deep thinking and complex reasoning. Despite these impressive capabilities, the internal reasoning mechanisms of such models remain une…

    Submitted 24 March, 2025; originally announced March 2025.

  14. arXiv:2503.03283  [pdf, other]

    stat.ML cs.AI cs.LG math.NA

    Exploring specialization and sensitivity of convolutional neural networks in the context of simultaneous image augmentations

    Authors: Pavel Kharyuk, Sergey Matveev, Ivan Oseledets

    Abstract: Drawing parallels with the way biological networks are studied, we adapt the treatment–control paradigm to explainable artificial intelligence research and enrich it through multi-parametric input alterations. In this study, we propose a framework for investigating the internal inference impacted by input data augmentations. The internal changes in network operation are reflected in activation ch…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: 26 pages; main text: 5 figures, 4 tables; appendix: 4 sections, 3 tables; supplementary: 7 files (figures S1-S6: packed as 7z archive, S7: single pdf file)

    MSC Class: 68T07 ACM Class: I.2.6; G.3; I.2.10

  15. arXiv:2503.01375  [pdf, other]

    cs.LG cs.AI

    Bayesian Inverse Problems Meet Flow Matching: Efficient and Flexible Inference via Transformers

    Authors: Daniil Sherki, Ivan Oseledets, Ekaterina Muravleva

    Abstract: The efficient resolution of Bayesian inverse problems remains challenging due to the high computational cost of traditional sampling methods. In this paper, we propose a novel framework that integrates Conditional Flow Matching (CFM) with a transformer-based architecture to enable fast and flexible sampling from complex posterior distributions. The proposed methodology involves the direct learning…

    Submitted 16 May, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

  16. arXiv:2502.17029  [pdf, other]

    eess.IV cs.CV

    M3DA: Benchmark for Unsupervised Domain Adaptation in 3D Medical Image Segmentation

    Authors: Boris Shirokikh, Anvar Kurmukov, Mariia Donskova, Valentin Samokhin, Mikhail Belyaev, Ivan Oseledets

    Abstract: Domain shift presents a significant challenge in applying Deep Learning to the segmentation of 3D medical images from sources like Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). Although numerous Domain Adaptation methods have been developed to address this issue, they are often evaluated under impractical data shift scenarios. Specifically, the medical imaging datasets used are of…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: 17 pages, 7 figures, 11 tables

  17. arXiv:2502.15007  [pdf, other]

    cs.CL cs.AI

    LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

    Authors: Anton Razzhigaev, Matvey Mikhalchuk, Temurbek Rahmatullaev, Elizaveta Goncharova, Polina Druzhinina, Ivan Oseledets, Andrey Kuznetsov

    Abstract: We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry surprisingly high context. Notably, removing these tokens (especially stopwords, articles, and commas) consistently degrades performance on MMLU and BABILong-4k, even if removing only irrelevant tokens. Our a…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: accepted to NAACL 2025

  18. arXiv:2502.09175  [pdf, other]

    cs.CR cs.AI cs.CL

    FLAME: Flexible LLM-Assisted Moderation Engine

    Authors: Ivan Bakulin, Ilia Kopanichuk, Iaroslav Bespalov, Nikita Radchenko, Vladimir Shaposhnikov, Dmitry Dylov, Ivan Oseledets

    Abstract: The rapid advancement of Large Language Models (LLMs) has introduced significant challenges in moderating user-model interactions. While LLMs demonstrate remarkable capabilities, they remain vulnerable to adversarial attacks, particularly "jailbreaking" techniques that bypass content safety measures. Current content moderation systems, which primarily rely on input prompt filtering, have proven…

    Submitted 13 February, 2025; originally announced February 2025.

  19. arXiv:2502.08321  [pdf, other]

    cs.CV

    Screener: Self-supervised Pathology Segmentation Model for 3D Medical Images

    Authors: Mikhail Goncharov, Eugenia Soboleva, Mariia Donskova, Ivan Oseledets, Marina Munkhoeva, Maxim Panov

    Abstract: Accurate segmentation of all pathological findings in 3D medical images remains a significant challenge, as supervised models are limited to detecting only the few pathology classes annotated in existing datasets. To address this, we frame pathology segmentation as an unsupervised visual anomaly segmentation (UVAS) problem, leveraging the inherent rarity of pathological patterns compared to health…

    Submitted 12 February, 2025; originally announced February 2025.

  20. arXiv:2502.07845  [pdf, other]

    cs.CV cs.AI

    Spread them Apart: Towards Robust Watermarking of Generated Content

    Authors: Mikhail Pautov, Danil Ivanov, Andrey V. Galichin, Oleg Rogov, Ivan Oseledets

    Abstract: Generative models that can produce realistic images have improved significantly in recent years. The quality of the generated content has increased drastically, so sometimes it is very difficult to distinguish between the real images and the generated ones. Such an improvement comes at a price of ethical concerns about the usage of the generative models: the users of generative models can improper…

    Submitted 11 February, 2025; originally announced February 2025.

  21. arXiv:2502.03183  [pdf, other]

    cs.CV cs.LG

    MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding

    Authors: Pengyi Li, Irina Abdullaeva, Alexander Gambashidze, Andrey Kuznetsov, Ivan Oseledets

    Abstract: Modern Video Large Language Models (VLLMs) often rely on uniform frame sampling for video understanding, but this approach frequently fails to capture critical information due to frame redundancy and variations in video content. We propose MaxInfo, a training-free method based on the maximum volume principle, which selects and retains the most representative frames from the input video. By maximiz…

    Submitted 27 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  22. arXiv:2502.01397  [pdf, other]

    cs.LG cs.AI math.NA

    Message-Passing GNNs Fail to Approximate Sparse Triangular Factorizations

    Authors: Vladislav Trifonov, Ekaterina Muravleva, Ivan Oseledets

    Abstract: Graph Neural Networks (GNNs) have been proposed as a tool for learning sparse matrix preconditioners, which are key components in accelerating linear solvers. This position paper argues that message-passing GNNs are fundamentally incapable of approximating sparse triangular factorizations. We demonstrate that message-passing GNNs fundamentally fail to approximate sparse triangular factorizations f…

    Submitted 28 May, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

  23. arXiv:2411.16821  [pdf, other]

    cs.CL cs.LG

    KL-geodesics flow matching with a novel sampling scheme

    Authors: Egor Sevriugov, Ivan Oseledets

    Abstract: Non-autoregressive language models generate all tokens simultaneously, offering potential speed advantages over traditional autoregressive models, but they face challenges in modeling the complex dependencies inherent in text data. In this work, we investigate a conditional flow matching approach for text generation. We represent tokens as one-hot vectors in a \(V\)-dimensional simplex and utilize…

    Submitted 25 March, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

  24. arXiv:2410.18057  [pdf, ps, other]

    cs.CV cs.CL

    CLEAR: Character Unlearning in Textual and Visual Modalities

    Authors: Alexey Dontsov, Dmitrii Korzh, Alexey Zhavoronkin, Boris Mikheev, Denis Bobkov, Aibek Alanov, Oleg Y. Rogov, Ivan Oseledets, Elena Tutubalina

    Abstract: Machine Unlearning (MU) is critical for removing private or hazardous information from deep learning models. While MU has advanced significantly in unimodal (text or vision) settings, multimodal unlearning (MMU) remains underexplored due to the lack of open benchmarks for evaluating cross-modal data removal. To address this gap, we introduce CLEAR, the first open-source benchmark designed specific…

    Submitted 31 May, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

  25. arXiv:2410.17765  [pdf, other]

    cs.LG

    Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition

    Authors: Artem Basharin, Andrei Chertkov, Ivan Oseledets

    Abstract: We propose a new model for multi-token prediction in transformers, aiming to enhance sampling efficiency without compromising accuracy. Motivated by recent work that predicts the probabilities of subsequent tokens using multiple heads, we connect this approach to rank-$1$ canonical tensor decomposition. By generalizing it to a rank-$r$ canonical probability decomposition, we develop an improved mo…

    Submitted 10 February, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

  26. arXiv:2410.13866  [pdf, other]

    q-bio.NC cs.AI cs.NE

    Associative memory and dead neurons

    Authors: Vladimir Fanaskov, Ivan Oseledets

    Abstract: In "Large Associative Memory Problem in Neurobiology and Machine Learning," Dmitry Krotov and John Hopfield introduced a general technique for the systematic construction of neural ordinary differential equations with non-increasing energy or Lyapunov function. We study this energy function and identify that it is vulnerable to the problem of dead neurons. Each point in the state space where the n…

    Submitted 26 February, 2025; v1 submitted 1 October, 2024; originally announced October 2024.

    Comments: Reviewed in https://openreview.net/forum?id=mkNVPGpEPm, accepted to ICLR 2025

  27. arXiv:2410.07383  [pdf, other]

    cs.CL cs.AI

    SparseGrad: A Selective Method for Efficient Fine-tuning of MLP Layers

    Authors: Viktoriia Chekalina, Anna Rudenko, Gleb Mezentsev, Alexander Mikhalev, Alexander Panchenko, Ivan Oseledets

    Abstract: The performance of Transformer models has been enhanced by increasing the number of parameters and the length of the processed text. Consequently, fine-tuning the entire model becomes a memory-intensive process. High-performance methods for parameter-efficient fine-tuning (PEFT) typically work with Attention blocks and often overlook MLP blocks, which contain about half of the model parameters. We…

    Submitted 9 October, 2024; originally announced October 2024.

  28. arXiv:2410.04462  [pdf, other]

    cs.CV cs.LG

    Tensor-Train Point Cloud Compression and Efficient Approximate Nearest-Neighbor Search

    Authors: Georgii Novikov, Alexander Gneushev, Alexey Kadeishvili, Ivan Oseledets

    Abstract: Nearest-neighbor search in large vector databases is crucial for various machine learning applications. This paper introduces a novel method using tensor-train (TT) low-rank tensor decomposition to efficiently represent point clouds and enable fast approximate nearest-neighbor searches. We propose a probabilistic interpretation and utilize density estimation losses like Sliced Wasserstein to train…

    Submitted 6 October, 2024; originally announced October 2024.

  29. arXiv:2410.04096  [pdf, other]

    cs.LG cs.AI cs.NE math.NA physics.comp-ph

    Sinc Kolmogorov-Arnold Network and Its Applications on Physics-informed Neural Networks

    Authors: Tianchi Yu, Jingwei Qiu, Jiang Yang, Ivan Oseledets

    Abstract: In this paper, we propose to use Sinc interpolation in the context of Kolmogorov-Arnold Networks, neural networks with learnable activation functions, which recently gained attention as alternatives to multilayer perceptrons. Many different function representations have already been tried, but we show that Sinc interpolation offers a viable alternative, since it is known in numerical analysis to…

    Submitted 5 October, 2024; originally announced October 2024.

  30. Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs

    Authors: Gleb Mezentsev, Danil Gusak, Ivan Oseledets, Evgeny Frolov

    Abstract: Scalability issues play a crucial role in productionizing modern recommender systems. Even lightweight architectures may suffer from high computational overload due to intermediate calculations, limiting their practicality in real-world applications. Specifically, applying full Cross-Entropy (CE) loss often yields state-of-the-art performance in terms of recommendation quality. Still, it suffers…

    Submitted 30 November, 2024; v1 submitted 27 September, 2024; originally announced September 2024.

    Comments: 11 pages, fixed some typos

  31. arXiv:2409.10291  [pdf, other]

    cs.CV

    Anatomical Positional Embeddings

    Authors: Mikhail Goncharov, Valentin Samokhin, Eugenia Soboleva, Roman Sokolov, Boris Shirokikh, Mikhail Belyaev, Anvar Kurmukov, Ivan Oseledets

    Abstract: We propose a self-supervised model producing 3D anatomical positional embeddings (APE) of individual medical image voxels. APE encodes voxels' anatomical closeness, i.e., voxels of the same organ or nearby organs always have closer positional embeddings than the voxels of more distant body parts. In contrast to the existing models of anatomical positional embeddings, our method is able to efficien…

    Submitted 16 September, 2024; originally announced September 2024.

  32. arXiv:2408.16414  [pdf, other]

    cs.LG cs.AI math.NA physics.comp-ph

    Spectral Informed Neural Network: An Efficient and Low-Memory PINN

    Authors: Tianchi Yu, Yiming Qi, Ivan Oseledets, Shiyi Chen

    Abstract: With growing investigations into solving partial differential equations by physics-informed neural networks (PINNs), more accurate and efficient PINNs are required to meet the practical demands of scientific computing. One bottleneck of current PINNs is computing the high-order derivatives via automatic differentiation which often necessitates substantial computing resources. In this paper, we foc…

    Submitted 8 October, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

  33. RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders

    Authors: Danil Gusak, Gleb Mezentsev, Ivan Oseledets, Evgeny Frolov

    Abstract: Scalability is a major challenge in modern recommender systems. In sequential recommendations, full Cross-Entropy (CE) loss achieves state-of-the-art recommendation quality but consumes excessive GPU memory with large item catalogs, limiting its practicality. Using a GPU-efficient locality-sensitive hashing-like algorithm for approximating a large tensor of logits, this paper introduces a novel RECE…

    Submitted 14 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: 5 pages, accepted for CIKM'24

  34. arXiv:2407.15545  [pdf, other]

    cs.LG

    Inverted Activations: Reducing Memory Footprint in Neural Network Training

    Authors: Georgii Novikov, Ivan Oseledets

    Abstract: The scaling of neural networks with increasing data and model sizes necessitates the development of more efficient deep learning algorithms. A significant challenge in neural network training is the memory footprint associated with activation tensors, particularly in pointwise nonlinearity layers that traditionally save the entire input tensor for the backward pass, leading to substantial memory c…

    Submitted 6 October, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  35. arXiv:2406.04709  [pdf, other]

    cs.LG

    ConDiff: A Challenging Dataset for Neural Solvers of Partial Differential Equations

    Authors: Vladislav Trifonov, Alexander Rudikov, Oleg Iliev, Yuri M. Laevsky, Ivan Oseledets, Ekaterina Muravleva

    Abstract: We present ConDiff, a novel dataset for scientific machine learning. ConDiff focuses on the parametric diffusion equation with space-dependent coefficients, a fundamental problem in many applications of partial differential equations (PDEs). The main novelty of the proposed dataset is that we consider discontinuous coefficients with high contrast. These coefficient functions are sampled from a sel…

    Submitted 3 February, 2025; v1 submitted 7 June, 2024; originally announced June 2024.

  36. arXiv:2406.02645  [pdf, ps, other]

    physics.comp-ph cs.AI cs.LG math.NA

    Astral: training physics-informed neural networks with error majorants

    Authors: Vladimir Fanaskov, Tianchi Yu, Alexander Rudikov, Ivan Oseledets

    Abstract: The primal approach to physics-informed learning is residual minimization. We argue that the residual is, at best, an indirect measure of the error of the approximate solution and propose to train with an error majorant instead. Since the error majorant provides a direct upper bound on the error, one can reliably estimate how close PiNN is to the exact solution and stop the optimization process when the desired ac…

    Submitted 4 June, 2024; originally announced June 2024.

  37. arXiv:2405.15557  [pdf, other]

    cs.LG math.NA

    Learning from Linear Algebra: A Graph Neural Network Approach to Preconditioner Design for Conjugate Gradient Solvers

    Authors: Vladislav Trifonov, Alexander Rudikov, Oleg Iliev, Yuri M. Laevsky, Ivan Oseledets, Ekaterina Muravleva

    Abstract: Large linear systems are ubiquitous in modern computational science and engineering. The main recipe for solving them is the use of Krylov subspace iterative methods with well-designed preconditioners. Recently, GNNs have been shown to be a promising tool for designing preconditioners to reduce the overall computational cost of iterative methods by constructing them more efficiently than with clas…

    Submitted 3 February, 2025; v1 submitted 24 May, 2024; originally announced May 2024.

  38. arXiv:2405.12250  [pdf, other]

    cs.LG cs.AI cs.CL

    Your Transformer is Secretly Linear

    Authors: Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Nikolai Gerasimenko, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

    Abstract: This paper reveals a novel linear characteristic exclusive to transformer decoders, including models such as GPT, LLaMA, OPT, BLOOM and others. We analyze embedding transformations between sequential layers, uncovering a near-perfect linear relationship (Procrustes similarity score of 0.99). However, linearity decreases when the residual component is removed due to a consistently low output norm o…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 9 pages, 9 figures

  39. arXiv:2405.07562  [pdf, other]

    cs.LG cs.AI

    GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation

    Authors: Andrey V. Galichin, Mikhail Pautov, Alexey Zhavoronkin, Oleg Y. Rogov, Ivan Oseledets

    Abstract: While Deep Neural Networks (DNNs) have demonstrated remarkable performance in tasks related to perception and control, there are still several unresolved concerns regarding the privacy of their training data, particularly in the context of vulnerability to Membership Inference Attacks (MIAs). In this paper, we explore a connection between the susceptibility to membership inference attacks and the…

    Submitted 13 May, 2024; originally announced May 2024.

  40. arXiv:2404.18791  [pdf, other]

    cs.SD cs.AI eess.AS

    Certification of Speaker Recognition Models to Additive Perturbations

    Authors: Dmitrii Korzh, Elvir Karimov, Mikhail Pautov, Oleg Y. Rogov, Ivan Oseledets

    Abstract: Speaker recognition technology is applied to various tasks, from personal virtual assistants to secure access systems. However, the robustness of these systems against adversarial attacks, particularly to additive perturbations, remains a significant challenge. In this paper, we pioneer applying robustness certification techniques, initially developed for the image domain, to speaker recognition. O…

    Submitted 18 December, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: 13 pages, 10 figures; AAAI-2025 accepted paper

  41. arXiv:2404.09737  [pdf, other]

    cs.LG cs.CL

    Quantization of Large Language Models with an Overdetermined Basis

    Authors: Daniil Merkulov, Daria Cherniuk, Alexander Rudikov, Ivan Oseledets, Ekaterina Muravleva, Aleksandr Mikhalev, Boris Kashin

    Abstract: In this paper, we introduce an algorithm for data quantization based on the principles of Kashin representation. This approach hinges on decomposing any given vector, matrix, or tensor into two factors. The first factor maintains a small infinity norm, while the second exhibits a similarly constrained norm when multiplied by an orthogonal matrix. Surprisingly, the entries of factors after decompos…

    Submitted 15 April, 2024; originally announced April 2024.

  42. arXiv:2404.06212  [pdf, other]

    cs.CV cs.AI cs.LG

    OmniFusion Technical Report

    Authors: Elizaveta Goncharova, Anton Razzhigaev, Matvey Mikhalchuk, Maxim Kurkin, Irina Abdullaeva, Matvey Skripkin, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

    Abstract: Last year, multimodal architectures served up a revolution in AI-based approaches and solutions, extending the capabilities of large language models (LLMs). We propose an OmniFusion model based on a pretrained LLM and adapters for visual modality. We evaluated and compared several architecture design principles for better text and visual data coupling: MLP and transformer adapters, various…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 17 pages, 4 figures, 9 tables, 2 appendices

    MSC Class: 6804; 68T50 (Primary) ACM Class: I.2.7; I.2.10; I.4.9

  43. arXiv:2402.03232  [pdf, other]

    cs.LG

    Explicit Flow Matching: On The Theory of Flow Matching Algorithms with Applications

    Authors: Gleb Ryzhakov, Svetlana Pavlova, Egor Sevriugov, Ivan Oseledets

    Abstract: This paper proposes a novel method, Explicit Flow Matching (ExFM), for training and analyzing flow-based generative models. ExFM leverages a theoretically grounded loss function, ExFM loss (a tractable form of Flow Matching (FM) loss), to demonstrably reduce variance during training, leading to faster convergence and more stable learning. Based on theoretical analysis of these formulas, we derived…

    Submitted 1 July, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  44. arXiv:2402.02890  [pdf, other]

    cs.LG math.OC

    Black-Box Approximation and Optimization with Hierarchical Tucker Decomposition

    Authors: Gleb Ryzhakov, Andrei Chertkov, Artem Basharin, Ivan Oseledets

    Abstract: We develop a new method HTBB for the multidimensional black-box approximation and gradient-free optimization, which is based on the low-rank hierarchical Tucker decomposition with the use of the MaxVol indices selection procedure. Numerical experiments for 14 complex model problems demonstrate the robustness of the proposed method for dimensions up to 1000, while it shows significantly more accura…

    Submitted 5 February, 2024; originally announced February 2024.

  45. arXiv:2402.01376  [pdf]

    cs.CL cs.AI cs.LG

    LoTR: Low Tensor Rank Weight Adaptation

    Authors: Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev, Ivan Oseledets

    Abstract: In this paper we generalize and extend an idea of low-rank adaptation (LoRA) of large language models (LLMs) based on Transformer architecture. Widely used LoRA-like methods of fine-tuning LLMs are based on matrix factorization of gradient update. We introduce LoTR, a novel approach for parameter-efficient fine-tuning of LLMs which represents a gradient update to parameters in a form of tensor dec…

    Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Submitted; missing author and sections were added;

  46. arXiv:2401.16367  [pdf, other]

    cs.LG cs.AI cs.CL

    TQCompressor: improving tensor decomposition methods in neural networks via permutations

    Authors: V. Abronin, A. Naumov, D. Mazur, D. Bystrov, K. Tsarova, Ar. Melnikov, I. Oseledets, S. Dolgov, R. Brasher, M. Perelshtein

    Abstract: We introduce TQCompressor, a novel method for neural network model compression with improved tensor decompositions. We explore the challenges posed by the computational and storage demands of pre-trained language models in NLP tasks and propose a permutation-based enhancement to Kronecker decomposition. This enhancement makes it possible to reduce loss in model expressivity which is usually associ…

    Submitted 29 January, 2024; originally announced January 2024.

  47. arXiv:2401.14031  [pdf, other]

    cs.LG cs.CR cs.CV

    Sparse and Transferable Universal Singular Vectors Attack

    Authors: Kseniia Kuvshinova, Olga Tsymboi, Ivan Oseledets

    Abstract: The research in the field of adversarial attacks and models' vulnerability is one of the fundamental directions in modern machine learning. Recent studies reveal the vulnerability phenomenon, and understanding the mechanisms behind this is essential for improving neural network characteristics and interpretability. In this paper, we propose a novel sparse universal white-box adversarial attack. Ou…

    Submitted 25 January, 2024; originally announced January 2024.

  48. arXiv:2401.10748  [pdf, other]

    cs.NE cs.LG

    Fast gradient-free activation maximization for neurons in spiking neural networks

    Authors: Nikita Pospelov, Andrei Chertkov, Maxim Beketov, Ivan Oseledets, Konstantin Anokhin

    Abstract: Elements of neural networks, both biological and artificial, can be described by their selectivity for specific cognitive features. Understanding these features is important for understanding the inner workings of neural networks. For a living system, such as a neuron, whose response to a stimulus is unknown and not differentiable, the only way to reveal these features is through a feedback loop t…

    Submitted 25 June, 2024; v1 submitted 28 December, 2023; originally announced January 2024.

  49. Probabilistically Robust Watermarking of Neural Networks

    Authors: Mikhail Pautov, Nikita Bogdanov, Stanislav Pyatkin, Oleg Rogov, Ivan Oseledets

    Abstract: As deep learning (DL) models are widely and effectively used in Machine Learning as a Service (MLaaS) platforms, there is a rapidly growing interest in DL watermarking techniques that can be used to confirm the ownership of a particular model. Unfortunately, these methods usually produce watermarks susceptible to model stealing attacks. In our research, we introduce a novel trigger set-based water…

    Submitted 18 September, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Journal ref: Proceedings of the International Joint Conferences on Artificial Intelligence, 33 (2024), 4778-4787

  50. arXiv:2312.10064  [pdf, other]

    cs.IR cs.AI

    Dynamic Collaborative Filtering for Matrix- and Tensor-based Recommender Systems

    Authors: Albert Saiapin, Ivan Oseledets, Evgeny Frolov

    Abstract: In production applications of recommender systems, a continuous data flow is employed to update models in real-time. Many recommender models often require complete retraining to adapt to new data. In this work, we introduce a novel collaborative filtering model for sequential problems known as Tucker Integrator Recommender - TIRecA. TIRecA efficiently updates its parameters using only the new data…

    Submitted 4 December, 2023; originally announced December 2023.