
Showing 1–12 of 12 results for author: Neo, C

Searching in archive cs.
  1. arXiv:2506.00051

    cs.CY

    Beyond Monoliths: Expert Orchestration for More Capable, Democratic, and Safe Large Language Models

    Authors: Philip Quirke, Narmeen Oozeer, Chaithanya Bandi, Amir Abdullah, Jason Hoelscher-Obermaier, Jeff M. Phillips, Joshua Greaves, Clement Neo, Fazl Barez, Shriyash Upadhyay

    Abstract: This position paper argues that the prevailing trajectory toward ever larger, more expensive generalist foundation models controlled by a handful of big companies limits innovation and constrains progress. We challenge this approach by advocating for an "Expert Orchestration" framework as a superior alternative that democratizes LLM advancement. Our proposed framework intelligently selects from th…

    Submitted 28 May, 2025; originally announced June 2025.

    Comments: 9 pages, 2 figures

  2. arXiv:2505.23556

    cs.CL

    Understanding Refusal in Language Models with Sparse Autoencoders

    Authors: Wei Jie Yeo, Nirmalendu Prakash, Clement Neo, Roy Ka-Wei Lee, Erik Cambria, Ranjan Satapathy

    Abstract: Refusal is a key safety behavior in aligned language models, yet the internal mechanisms driving refusals remain opaque. In this work, we conduct a mechanistic study of refusal in instruction-tuned LLMs using sparse autoencoders to identify latent features that causally mediate refusal behaviors. We apply our method to two open-source chat models and intervene on refusal-related features to assess…

    Submitted 29 May, 2025; originally announced May 2025.

  3. arXiv:2503.12730

    cs.LG cs.AI cs.DB

    TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research

    Authors: Abir Harrasse, Philip Quirke, Clement Neo, Dhruv Nathawani, Luke Marks, Amir Abdullah

    Abstract: Mechanistic interpretability research faces a gap between analyzing simple circuits in toy tasks and discovering features in large models. To bridge this gap, we propose text-to-SQL generation as an ideal task to study, as it combines the formal structure of toy tasks with real-world complexity. We introduce TinySQL, a synthetic dataset, progressing from basic to advanced SQL operations, and train…

    Submitted 6 June, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: 9 pages, 19 figures, 7 tables, 18 trained models

  4. arXiv:2412.06700

    cs.CR

    Facade: High-Precision Insider Threat Detection Using Deep Contextual Anomaly Detection

    Authors: Alex Kantchelian, Casper Neo, Ryan Stevens, Hyungwon Kim, Zhaohao Fu, Sadegh Momeni, Birkett Huber, Elie Bursztein, Yanis Pavlidis, Senaka Buthpitiya, Martin Cochran, Massimiliano Poletto

    Abstract: We present Facade (Fast and Accurate Contextual Anomaly DEtection): a high-precision deep-learning-based anomaly detection system deployed at Google (a large technology company) as the last line of defense against insider threats since 2018. Facade is an innovative unsupervised action-context system that detects suspicious actions by considering the context surrounding each action, including relev…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: Under review

  5. arXiv:2411.02645

    cs.CR cs.LG

    Fine Grained Insider Risk Detection

    Authors: Birkett Huber, Casper Neo, Keiran Sampson, Alex Kantchelian, Brett Ksobiech, Yanis Pavlidis

    Abstract: We present a method to detect departures from business-justified workflows among support agents. Our goal is to assist auditors in identifying agent actions that cannot be explained by the activity within their surrounding context, where normal activity patterns are established from historical data. We apply our method to help audit millions of actions of over three thousand support agents. We c…

    Submitted 4 November, 2024; originally announced November 2024.

  6. arXiv:2410.09247

    cs.LG cs.AI cs.CL

    Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts

    Authors: Jacob Haimes, Cenny Wenner, Kunvar Thaman, Vassil Tashev, Clement Neo, Esben Kran, Jason Schreiber

    Abstract: The training data for many Large Language Models (LLMs) is contaminated with test data. This means that public benchmarks used to assess LLMs are compromised, suggesting a performance gap between benchmark scores and actual capabilities. Ideally, a private holdout set could be used to accurately verify scores. Unfortunately, such datasets do not exist for most benchmarks, and post-hoc construction…

    Submitted 11 October, 2024; originally announced October 2024.

  7. arXiv:2410.07149

    cs.CV cs.LG

    Towards Interpreting Visual Information Processing in Vision-Language Models

    Authors: Clement Neo, Luke Ong, Philip Torr, Mor Geva, David Krueger, Fazl Barez

    Abstract: Vision-Language Models (VLMs) are powerful tools for processing and understanding text and images. We study the processing of visual tokens in the language model component of LLaVA, a prominent VLM. Our approach focuses on analyzing the localization of object information, the evolution of visual token representations across layers, and the mechanism of integrating visual information for prediction…

    Submitted 26 April, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: Published at ICLR 2025

  8. arXiv:2407.01082

    cs.CL

    Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs

    Authors: Minh Nhat Nguyen, Andrew Baker, Clement Neo, Allen Roush, Andreas Kirsch, Ravid Shwartz-Ziv

    Abstract: Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step. Popular sampling methods like top-p (nucleus sampling) often struggle to balance quality and diversity, especially at higher temperatures which lead to incoherent or repetitive outputs. We propose min-p sampling, a dynamic truncation method that adjusts t…

    Submitted 27 June, 2025; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Oral presentation at ICLR 2025. Camera-ready version available at https://iclr.cc/virtual/2025/poster/30358

    Journal ref: In Proceedings of the 2025 International Conference on Learning Representations (ICLR), 2025

  9. arXiv:2402.15055

    cs.CL cs.AI cs.LG

    Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions

    Authors: Clement Neo, Shay B. Cohen, Fazl Barez

    Abstract: Understanding the inner workings of large language models (LLMs) is crucial for advancing their theoretical foundations and real-world applications. While the attention mechanism and multi-layer perceptrons (MLPs) have been studied independently, their interactions remain largely unexplored. This study investigates how attention heads and next-token neurons interact in LLMs to predict new words. W…

    Submitted 23 October, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted to EMNLP 2024 Main Conference

  10. arXiv:2402.02619

    cs.LG cs.CL

    Arithmetic in Transformers Explained

    Authors: Philip Quirke, Clement Neo, Fazl Barez

    Abstract: While recent work has shown transformers can learn addition, previous models exhibit poor prediction accuracy and are limited to small numbers. Furthermore, the relationship between single-task and multitask arithmetic capabilities remains unexplored. In this work, we analyze 44 autoregressive transformer models trained on addition, subtraction, or both. These include 16 addition-only models, 2 su…

    Submitted 13 February, 2025; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: 8 pages, 5 figures, 4 tables

  11. arXiv:2310.08164

    cs.LG

    Interpreting Learned Feedback Patterns in Large Language Models

    Authors: Luke Marks, Amir Abdullah, Clement Neo, Rauno Arike, David Krueger, Philip Torr, Fazl Barez

    Abstract: Reinforcement learning from human feedback (RLHF) is widely used to train large language models (LLMs). However, it is unclear whether LLMs accurately learn the underlying preferences in human feedback data. We coin the term Learned Feedback Pattern (LFP) for patterns in an LLM's activations learned during RLHF that improve its performance on the fine-tuning task. We hypothesize that LLMs…

    Submitted 19 August, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: 19 pages, 8 figures

  12. arXiv:2209.15585

    physics.ao-ph cs.LG

    Cloud Classification with Unsupervised Deep Learning

    Authors: Takuya Kurihana, Ian Foster, Rebecca Willett, Sydney Jenkins, Kathryn Koenig, Ruby Werman, Ricardo Barros Lourenco, Casper Neo, Elisabeth Moyer

    Abstract: We present a framework for cloud characterization that leverages modern unsupervised deep learning technologies. While previous neural network-based cloud classification models have used supervised learning methods, unsupervised learning allows us to avoid restricting the model to artificial categories based on historical cloud classification schemes and enables the discovery of novel, more detail…

    Submitted 30 September, 2022; originally announced September 2022.

    Comments: 5 pages, 6 figures, Proceedings for Climate Informatics Workshop 2019 Paris