Showing 1–4 of 4 results for author: Kharlapenko, D

  1. arXiv:2505.24360  [pdf, ps, other]

    cs.LG

    Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning

    Authors: Stepan Shabalin, Ayush Panda, Dmitrii Kharlapenko, Abdur Raheem Ali, Yixiong Hao, Arthur Conmy

    Abstract: Sparse autoencoders are a promising new approach for decomposing language model activations for interpretation and control. They have been applied successfully to vision transformer image encoders and to small-scale diffusion models. Inference-Time Decomposition of Activations (ITDA) is a recently proposed variant of dictionary learning that takes the dictionary to be a set of data points from the…

    Submitted 2 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: 10 pages, 10 figures, Mechanistic Interpretability for Vision at CVPR 2025

  2. arXiv:2504.13756  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Scaling sparse feature circuit finding for in-context learning

    Authors: Dmitrii Kharlapenko, Stepan Shabalin, Fazl Barez, Arthur Conmy, Neel Nanda

    Abstract: Sparse autoencoders (SAEs) are a popular tool for interpreting large language model activations, but their utility in addressing open questions in interpretability remains unclear. In this work, we demonstrate their effectiveness by using SAEs to deepen our understanding of the mechanism behind in-context learning (ICL). We identify abstract SAE features that (i) encode the model's knowledge of wh…

    Submitted 18 April, 2025; originally announced April 2025.

  3. arXiv:2406.06309  [pdf, other]

    cs.LG cs.AI

    Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?

    Authors: Denis Tarasov, Kirill Brilliantov, Dmitrii Kharlapenko

    Abstract: In deep Reinforcement Learning (RL), value functions are typically approximated using deep neural networks and trained via mean squared error regression objectives to fit the true value functions. Recent research has proposed an alternative approach, utilizing the cross-entropy classification objective, which has demonstrated improved performance and scalability of RL algorithms. However, existing…

    Submitted 16 November, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/DT6A/ClORL

  4. arXiv:2405.20318  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Quriosity: Analyzing Human Questioning Behavior and Causal Inquiry through Curiosity-Driven Queries

    Authors: Roberto Ceraolo, Dmitrii Kharlapenko, Ahmad Khan, Amélie Reymond, Rada Mihalcea, Bernhard Schölkopf, Mrinmaya Sachan, Zhijing Jin

    Abstract: Recent progress in Large Language Model (LLM) technology has changed our role in interacting with these models. Instead of primarily testing these models with questions we already know answers to, we are now using them for queries where the answers are unknown to us, driven by human curiosity. This shift highlights the growing need to understand curiosity-driven human questions - those that are mo…

    Submitted 24 February, 2025; v1 submitted 30 May, 2024; originally announced May 2024.