Showing 1–7 of 7 results for author: Laptev, D

Searching in archive cs.
  1. arXiv:2509.06608  [pdf, ps, other]

    cs.LG

    Small Vectors, Big Effects: A Mechanistic Study of RL-Induced Reasoning via Steering Vectors

    Authors: Viacheslav Sinii, Nikita Balagansky, Gleb Gerasimov, Daniil Laptev, Yaroslav Aksenov, Vadim Kurochkin, Alexey Gorbatovski, Boris Shaposhnikov, Daniil Gavrilov

    Abstract: The mechanisms by which reasoning training reshapes LLMs' internal computations remain unclear. We study lightweight steering vectors inserted into the base model's residual stream and trained with a reinforcement-learning objective. These vectors match full fine-tuning performance while preserving the interpretability of small, additive interventions. Using logit-lens readouts and path-patching a…

    Submitted 29 September, 2025; v1 submitted 8 September, 2025; originally announced September 2025.

    Comments: Preprint

  2. arXiv:2507.17509  [pdf, ps, other]

    cond-mat.dis-nn cs.LG

    Graph Neural Network Approach to Predicting Magnetization in Quasi-One-Dimensional Ising Systems

    Authors: V. Slavin, O. Kryvchikov, D. Laptev

    Abstract: We present a graph-based deep learning framework for predicting the magnetic properties of quasi-one-dimensional Ising spin systems. The lattice geometry is encoded as a graph and processed by a graph neural network (GNN) followed by fully connected layers. The model is trained on Monte Carlo simulation data and accurately reproduces key features of the magnetization curve, including plateaus, cri…

    Submitted 23 July, 2025; originally announced July 2025.

    Comments: 18 pages, 4 figures

  3. arXiv:2507.12990  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Teach Old SAEs New Domain Tricks with Boosting

    Authors: Nikita Koriagin, Yaroslav Aksenov, Daniil Laptev, Gleb Gerasimov, Nikita Balagansky, Daniil Gavrilov

    Abstract: Sparse Autoencoders have emerged as powerful tools for interpreting the internal representations of Large Language Models, yet they often fail to capture domain-specific features not prevalent in their training corpora. This paper introduces a residual learning approach that addresses this feature blindness without requiring complete retraining. We propose training a secondary SAE specifically to…

    Submitted 17 July, 2025; originally announced July 2025.

  4. arXiv:2505.24473  [pdf, ps, other]

    cs.LG cs.AI

    Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy

    Authors: Nikita Balagansky, Yaroslav Aksenov, Daniil Laptev, Vadim Kurochkin, Gleb Gerasimov, Nikita Koryagin, Daniil Gavrilov

    Abstract: Sparse Autoencoders (SAEs) have proven to be powerful tools for interpreting neural networks by decomposing hidden representations into disentangled, interpretable features via sparsity constraints. However, conventional SAEs are constrained by the fixed sparsity level chosen during training; meeting different sparsity requirements therefore demands separate models and increases the computational…

    Submitted 5 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

  5. arXiv:2505.22255  [pdf, ps, other]

    cs.LG cs.CL

    Train Sparse Autoencoders Efficiently by Utilizing Features Correlation

    Authors: Vadim Kurochkin, Yaroslav Aksenov, Daniil Laptev, Daniil Gavrilov, Nikita Balagansky

    Abstract: Sparse Autoencoders (SAEs) have demonstrated significant promise in interpreting the hidden states of language models by decomposing them into interpretable latent directions. However, training SAEs at scale remains challenging, especially when large dictionary sizes are used. While decoders can leverage sparse-aware kernels for efficiency, encoders still require computationally intensive linear o…

    Submitted 28 May, 2025; originally announced May 2025.

  6. arXiv:2502.03032  [pdf, ps, other]

    cs.LG cs.CL

    Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

    Authors: Daniil Laptev, Nikita Balagansky, Yaroslav Aksenov, Daniil Gavrilov

    Abstract: We introduce a new approach to systematically map features discovered by sparse autoencoder across consecutive layers of large language models, extending earlier work that examined inter-layer feature links. By using a data-free cosine similarity technique, we trace how specific features persist, transform, or first appear at each stage. This method yields granular flow graphs of feature evolution…

    Submitted 24 July, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  7. arXiv:1604.06318  [pdf, other]

    cs.CV

    TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Networks

    Authors: Dmitry Laptev, Nikolay Savinov, Joachim M. Buhmann, Marc Pollefeys

    Abstract: In this paper we present a deep neural network topology that incorporates a simple to implement transformation invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods usually make use of dataset augmentation to address this issue, but this requires larger number…

    Submitted 22 September, 2016; v1 submitted 21 April, 2016; originally announced April 2016.

    Comments: Accepted at CVPR 2016. The first two authors assert equal contribution and joint first authorship
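For entry 1 (Small Vectors, Big Effects), the additive intervention and the logit-lens readout named in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' code: the toy hidden states, the steering vector, the three-token vocabulary, the unembedding matrix, and the choice to add the vector at every token position are all hypothetical, and the reinforcement-learning training of the vector is not shown.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def add_steering_vector(hidden: Matrix, steer: Vector, scale: float = 1.0) -> Matrix:
    """Add the same vector to the residual stream at every token position (assumed)."""
    return [[h + scale * s for h, s in zip(row, steer)] for row in hidden]


def logit_lens(vec: Vector, unembed: Matrix) -> Vector:
    """Project a residual-stream vector onto the vocabulary: one score per token.

    `unembed` holds one row per vocabulary token (a hypothetical toy W_U).
    """
    return [sum(w * v for w, v in zip(row, vec)) for row in unembed]


# Toy residual stream: 3 token positions, hidden size 4 (all values hypothetical).
hidden = [
    [0.1, 0.0, -0.2, 0.3],
    [0.0, 0.5, 0.1, -0.1],
    [0.2, -0.3, 0.0, 0.0],
]
steer = [1.0, 0.0, 0.0, -1.0]        # hypothetical trained steering vector
vocab = ["tok_a", "tok_b", "tok_c"]  # hypothetical 3-token vocabulary
unembed = [
    [1.0, 0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

steered = add_steering_vector(hidden, steer)
scores = logit_lens(steer, unembed)  # which tokens the vector itself promotes
top = vocab[scores.index(max(scores))]
print(steered[0], dict(zip(vocab, scores)), top)
```

Reading the vector through the unembedding, rather than a full forward pass, is what makes the logit lens cheap: it asks which output tokens the added direction pushes up on its own.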
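For entry 4 (Train One Sparse Autoencoder Across Multiple Sparsity Budgets), a "sparsity budget" can be made concrete with a TopK activation, one common way to fix how many SAE latents may be active per input. The truncated abstract does not say which mechanism the paper uses, so this is only a sketch under that assumption: one hypothetical set of encoder pre-activations and one shared decoder are read out at several budgets k, and the reconstruction errors are compared.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def topk_sparsify(pre_acts: Vector, k: int) -> Vector:
    """Keep the k largest ReLU'd latents and zero the rest (a TopK activation)."""
    acts = [max(a, 0.0) for a in pre_acts]
    keep = set(sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k])
    return [acts[i] if i in keep else 0.0 for i in range(len(acts))]


def decode(codes: Vector, decoder: Matrix) -> Vector:
    """Reconstruct x_hat = sum_i codes[i] * decoder[i] (one decoder row per latent)."""
    dim = len(decoder[0])
    return [sum(c * row[j] for c, row in zip(codes, decoder)) for j in range(dim)]


def sq_error(x: Vector, x_hat: Vector) -> float:
    """Squared reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


# Hypothetical input, encoder pre-activations, and shared decoder dictionary.
x = [1.0, 2.0]
pre_acts = [0.9, 1.8, 0.1, -0.5]
decoder = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]

# The same dictionary evaluated at several sparsity budgets.
errors: Dict[int, float] = {}
for k in (1, 2, 3):
    codes = topk_sparsify(pre_acts, k)
    errors[k] = sq_error(x, decode(codes, decoder))
print(errors)
```

Larger budgets keep more latents and, in this toy case, lower the squared error; a conventional SAE would fix a single k at training time.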
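For entry 6 (Analyze Feature Flow), the data-free matching step can be sketched as a nearest-neighbour search by cosine similarity between the feature directions of two consecutive layers. It is an assumption that each feature is represented by one weight vector per layer (for example, an SAE decoder row) and that a fixed threshold separates persisting features from unmatched ones; the vectors and the 0.7 cutoff below are made up for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match_features(
    layer_a: Matrix, layer_b: Matrix, threshold: float = 0.7
) -> Dict[int, Optional[Tuple[int, float]]]:
    """Map each feature in layer_a to its most cosine-similar feature in layer_b.

    Features whose best match falls below the hypothetical `threshold`
    map to None, i.e. they do not appear to persist into layer_b.
    No activations or input data are used, only the weight vectors.
    """
    links: Dict[int, Optional[Tuple[int, float]]] = {}
    for i, fa in enumerate(layer_a):
        sims = [cosine(fa, fb) for fb in layer_b]
        j = max(range(len(sims)), key=sims.__getitem__)
        links[i] = (j, sims[j]) if sims[j] >= threshold else None
    return links


# Toy feature directions (one row per feature) for two consecutive layers.
layer_3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
layer_4 = [[0.0, 0.9, 0.1], [1.0, 0.1, 0.0], [0.5, 0.5, -0.7]]

links = match_features(layer_3, layer_4)
print(links)
```

Chaining such layer-to-layer links over every pair of consecutive layers is one way to assemble a flow graph; features in a later layer that no earlier feature links to are candidates for features that first appear there.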
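For entry 7 (TI-POOLING), the pooling idea can be illustrated by running one shared feature extractor on every transformed copy of the input and taking an element-wise max over the copies. In this sketch the extractor is a hypothetical hand-written function standing in for the paper's convolutional branches, and the transformation set is assumed to be the four 90-degree rotations. Because those rotations form a closed set, rotating the input only permutes the copies, so the pooled features come out identical.

```python
from typing import Callable, List

Image = List[List[float]]


def rot90(img: Image) -> Image:
    """Rotate a square image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def toy_features(img: Image) -> List[float]:
    """Hypothetical stand-in for the shared CNN branch; not rotation-invariant."""
    top_row = sum(img[0])
    left_col = sum(row[0] for row in img)
    corner = img[0][0]
    return [top_row, left_col, corner]


def ti_pool(img: Image, f: Callable[[Image], List[float]]) -> List[float]:
    """Apply f to all four rotations of img and max-pool each feature over them."""
    views = [img]
    for _ in range(3):
        views.append(rot90(views[-1]))
    outputs = [f(v) for v in views]
    return [max(vals) for vals in zip(*outputs)]


img = [[1.0, 2.0], [3.0, 4.0]]
print(toy_features(img), toy_features(rot90(img)))            # differ
print(ti_pool(img, toy_features), ti_pool(rot90(img), toy_features))  # match
```

The branch itself stays sensitive to orientation; invariance comes only from pooling over the transformation set, which is why it holds exactly here but only approximately for transformations outside that set.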