
Showing 1–6 of 6 results for author: Arad, D

  1. arXiv:2506.09047  [pdf, ps, other]

    cs.CL

    Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

    Authors: Yaniv Nikankin, Dana Arad, Yossi Gandelsman, Yonatan Belinkov

    Abstract: Vision-Language models (VLMs) show impressive abilities to answer questions on visual inputs (e.g., counting objects in an image), yet demonstrate higher accuracies when performing an analogous task on text (e.g., counting words in a text). We investigate this accuracy gap by identifying and comparing the \textit{circuits} - the task-specific computational sub-graphs - in different modalities. We…

    Submitted 11 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    MSC Class: 68T5 ACM Class: I.2.7

  2. arXiv:2505.20063  [pdf, other]

    cs.LG cs.AI cs.CL

    SAEs Are Good for Steering -- If You Select the Right Features

    Authors: Dana Arad, Aaron Mueller, Yonatan Belinkov

    Abstract: Sparse Autoencoders (SAEs) have been proposed as an unsupervised approach to learn a decomposition of a model's latent space. This enables useful applications such as steering - influencing the output of a model towards a desired concept - without requiring labeled data. Current methods identify SAE features to steer by analyzing the input tokens that activate them. However, recent work has highli…

    Submitted 26 May, 2025; originally announced May 2025.

  3. arXiv:2504.13151  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    MIB: A Mechanistic Interpretability Benchmark

    Authors: Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov

    Abstract: How can we know whether new mechanistic interpretability methods achieve real improvements? In pursuit of lasting evaluation standards, we propose MIB, a Mechanistic Interpretability Benchmark, with two tracks spanning four tasks and five models. MIB favors methods that precisely and concisely recover relevant causal pathways or causal variables in neural language models. The circuit localization…

    Submitted 9 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Accepted to ICML 2025. Project website at https://mib-bench.github.io

  4. Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines

    Authors: Michael Toker, Hadas Orgad, Mor Ventura, Dana Arad, Yonatan Belinkov

    Abstract: Text-to-image diffusion models (T2I) use a latent representation of a text prompt to guide the image generation process. However, the process by which the encoder produces the text representation is unknown. We propose the Diffusion Lens, a method for analyzing the text encoder of T2I models by generating images from its intermediate representations. Using the Diffusion Lens, we perform an extensi…

    Submitted 21 October, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: Published in: ACL 2024. Project webpage: tokeron.github.io/DiffusionLensWeb

    ACM Class: I.2.7; I.4.0

  5. arXiv:2306.00738  [pdf, other]

    cs.CL cs.CV

    ReFACT: Updating Text-to-Image Models by Editing the Text Encoder

    Authors: Dana Arad, Hadas Orgad, Yonatan Belinkov

    Abstract: Our world is marked by unprecedented technological, global, and socio-political transformations, posing a significant challenge to text-to-image generative models. These models encode factual associations within their parameters that can quickly become outdated, diminishing their utility for end-users. To that end, we introduce ReFACT, a novel approach for editing factual associations in text-to-i…

    Submitted 7 May, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted to NAACL 2024 (Main Conference)

    MSC Class: 68T50 ACM Class: I.2.7

  6. arXiv:1405.2530  [pdf, ps, other]

    cs.DS

    Tighter Bounds for Makespan Minimization on Unrelated Machines

    Authors: Dor Arad, Yael Mordechai, Hadas Shachnai

    Abstract: We consider the problem of scheduling $n$ jobs to minimize the makespan on $m$ unrelated machines, where job $j$ requires time $p_{ij}$ if processed on machine $i$. A classic algorithm of Lenstra et al. yields the best known approximation ratio of $2$ for the problem. Improving this bound has been a prominent open problem for over two decades. In this paper we obtain a tighter bound for a wide sub…

    Submitted 23 June, 2014; v1 submitted 11 May, 2014; originally announced May 2014.

    Comments: 12 pages, 2 figures. arXiv admin note: text overlap with arXiv:1011.1168 by other authors