Showing 1–13 of 13 results for author: Engels, J

  1. arXiv:2506.15679

    cs.LG cs.AI cs.CL

    Dense SAE Latents Are Features, Not Bugs

    Authors: Xiaoqing Sun, Alessandro Stolfo, Joshua Engels, Ben Wu, Senthooran Rajamanoharan, Mrinmaya Sachan, Max Tegmark

    Abstract: Sparse autoencoders (SAEs) are designed to extract interpretable features from language models by enforcing a sparsity constraint. Ideally, training an SAE would yield latents that are both sparse and semantically meaningful. However, many SAE latents activate frequently (i.e., are dense), raising concerns that they may be undesirable artifacts of the training procedure. In this work, we sy…

    Submitted 18 June, 2025; originally announced June 2025.

  2. arXiv:2504.18530

    cs.AI cs.CY cs.LG

    Scaling Laws For Scalable Oversight

    Authors: Joshua Engels, David D. Baek, Subhash Kantamneni, Max Tegmark

    Abstract: Scalable oversight, the process by which weaker AI systems supervise stronger ones, has been proposed as a key strategy to control future superintelligent systems. However, it is still unclear how scalable oversight itself scales. To address this gap, we propose a framework that quantifies the probability of successful oversight as a function of the capabilities of the overseer and the system bein…

    Submitted 9 May, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

    Comments: 32 pages, 18 figures; The first three authors contributed equally

  3. arXiv:2502.16681

    cs.LG cs.AI

    Are Sparse Autoencoders Useful? A Case Study in Sparse Probing

    Authors: Subhash Kantamneni, Joshua Engels, Senthooran Rajamanoharan, Max Tegmark, Neel Nanda

    Abstract: Sparse autoencoders (SAEs) are a popular method for interpreting concepts represented in large language model (LLM) activations. However, there is a lack of evidence regarding the validity of their interpretations due to the lack of a ground truth for the concepts used by an LLM, and a growing number of works have presented problems with current SAEs. One alternative source of evidence would be de…

    Submitted 23 February, 2025; originally announced February 2025.

  4. arXiv:2501.19406

    cs.LG

    Low-Rank Adapting Models for Sparse Autoencoders

    Authors: Matthew Chen, Joshua Engels, Max Tegmark

    Abstract: Sparse autoencoders (SAEs) decompose language model representations into a sparse set of linear latent vectors. Recent works have improved SAEs using language model gradients, but these techniques require many expensive backward passes during training and still cause a significant increase in cross entropy loss when SAE reconstructions are inserted into the model. In this work, we improve on these…

    Submitted 27 May, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

    Comments: Code available at https://github.com/matchten/LoRA-Models-for-SAEs

  5. arXiv:2410.19750

    q-bio.NC cs.AI cs.LG

    The Geometry of Concepts: Sparse Autoencoder Feature Structure

    Authors: Yuxiao Li, Eric J. Michaud, David D. Baek, Joshua Engels, Xiaoqing Sun, Max Tegmark

    Abstract: Sparse autoencoders have recently produced dictionaries of high-dimensional vectors corresponding to the universe of concepts represented by large language models. We find that this concept universe has interesting structure at three levels: 1) The "atomic" small-scale structure contains "crystals" whose faces are parallelograms or trapezoids, generalizing well-known examples such as (man-woman-ki…

    Submitted 30 March, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

    Comments: 16 pages, 12 figures

    Journal ref: Entropy 2025, 27(4), 344

  6. arXiv:2410.14670

    cs.LG

    Decomposing The Dark Matter of Sparse Autoencoders

    Authors: Joshua Engels, Logan Riggs, Max Tegmark

    Abstract: Sparse autoencoders (SAEs) are a promising technique for decomposing language model activations into interpretable linear features. However, current SAEs fall short of completely explaining model performance, resulting in "dark matter": unexplained variance in activations. This work investigates dark matter as an object of study in its own right. Surprisingly, we find that much of SAE dark matter…

    Submitted 25 March, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

    Comments: Published in TMLR. Code at https://github.com/JoshEngels/SAE-Dark-Matter

  7. arXiv:2410.08201

    cs.LG

    Efficient Dictionary Learning with Switch Sparse Autoencoders

    Authors: Anish Mudide, Joshua Engels, Eric J. Michaud, Max Tegmark, Christian Schroeder de Witt

    Abstract: Sparse autoencoders (SAEs) are a recent technique for decomposing neural network activations into human-interpretable features. However, in order for SAEs to identify all features represented in frontier models, it will be necessary to scale them up to very high width, posing a computational challenge. In this work, we introduce Switch Sparse Autoencoders, a novel SAE architecture aimed at reducin…

    Submitted 2 June, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

    Comments: Code available at https://github.com/amudide/switch_sae

  8. arXiv:2405.14860

    cs.LG

    Not All Language Model Features Are One-Dimensionally Linear

    Authors: Joshua Engels, Eric J. Michaud, Isaac Liao, Wes Gurnee, Max Tegmark

    Abstract: Recent work has proposed that language models perform computation by manipulating one-dimensional representations of concepts ("features") in activation space. In contrast, we explore whether some language model representations may be inherently multi-dimensional. We begin by developing a rigorous definition of irreducible multi-dimensional features based on whether they can be decomposed into eit…

    Submitted 26 February, 2025; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted to ICLR 2025. Code and data at https://github.com/JoshEngels/MultiDimensionalFeatures

  9. arXiv:2402.00943

    cs.DS cs.IR cs.LG

    Approximate Nearest Neighbor Search with Window Filters

    Authors: Joshua Engels, Benjamin Landrum, Shangdi Yu, Laxman Dhulipala, Julian Shun

    Abstract: We define and investigate the problem of c-approximate window search: approximate nearest neighbor search where each point in the dataset has a numeric label, and the goal is to find nearest neighbors to queries within arbitrary label ranges. Many semantic search problems, such as image and document search with timestamp filters, or product search with cost filters, are natural examples…

    Submitted 4 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Code available: https://github.com/JoshEngels/RangeFilteredANN

  10. arXiv:2312.03940

    cs.DS cs.DC cs.LG

    PECANN: Parallel Efficient Clustering with Graph-Based Approximate Nearest Neighbor Search

    Authors: Shangdi Yu, Joshua Engels, Yihao Huang, Julian Shun

    Abstract: This paper studies density-based clustering of point sets. These methods use dense regions of points to detect clusters of arbitrary shapes. In particular, we study variants of density peaks clustering, a popular type of algorithm that has been shown to work well in practice. Our goal is to cluster large high-dimensional datasets, which are prevalent in practice. Prior solutions are either sequent…

    Submitted 3 June, 2025; v1 submitted 6 December, 2023; originally announced December 2023.

  11. BOLT: An Automated Deep Learning Framework for Training and Deploying Large-Scale Search and Recommendation Models on Commodity CPU Hardware

    Authors: Nicholas Meisburger, Vihan Lakshman, Benito Geordie, Joshua Engels, David Torres Ramos, Pratik Pranav, Benjamin Coleman, Benjamin Meisburger, Shubh Gupta, Yashwanth Adunukota, Tharun Medini, Anshumali Shrivastava

    Abstract: Efficient large-scale neural network training and inference on commodity CPU hardware is of immense practical significance in democratizing deep learning (DL) capabilities. Presently, the process of training massive models consisting of hundreds of millions to billions of parameters requires the extensive use of specialized hardware accelerators, such as GPUs, which are only accessible to a limite…

    Submitted 12 September, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: 6 pages, 5 tables, 3 figures. CIKM 2023 (Applied Research Track)

  12. arXiv:2210.15748

    cs.DS

    DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries

    Authors: Joshua Engels, Benjamin Coleman, Vihan Lakshman, Anshumali Shrivastava

    Abstract: We study the problem of vector set search with vector set queries. This task is analogous to traditional near-neighbor search, with the exception that both the query and each element in the collection are sets of vectors. We identify this problem as a core subroutine for semantic search applications and find that existing solutions are unacceptably slow. Towards th…

    Submitted 26 October, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Code available, https://github.com/ThirdAIResearch/Dessert

  13. arXiv:2106.11565

    cs.DS

    Practical Near Neighbor Search via Group Testing

    Authors: Joshua Engels, Benjamin Coleman, Anshumali Shrivastava

    Abstract: We present a new algorithm for the approximate near neighbor problem that combines classical ideas from group testing with locality-sensitive hashing (LSH). We reduce the near neighbor search problem to a group testing problem by designating neighbors as "positives," non-neighbors as "negatives," and approximate membership queries as group tests. We instantiate this framework using distance-sensit…

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: For source code see https://github.com/JoshuaEng/FLINNG