Showing 1–12 of 12 results for author: Hendryx, S

  1. arXiv:2502.08859

    cs.AI cs.CL

    EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges

    Authors: Clinton J. Wang, Dean Lee, Cristina Menghini, Johannes Mols, Jack Doughty, Adam Khoja, Jayson Lynch, Sean Hendryx, Summer Yue, Dan Hendrycks

    Abstract: As language models master existing reasoning benchmarks, we need new challenges to evaluate their cognitive frontiers. Puzzle-solving events are rich repositories of challenging multimodal problems that test a wide range of advanced reasoning and knowledge capabilities, making them a unique testbed for evaluating frontier language models. We introduce EnigmaEval, a dataset of problems and solution…

    Submitted 14 February, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

  2. arXiv:2501.01290

    cs.CL

    ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark

    Authors: Vaskar Nath, Pranav Raja, Claire Yoon, Sean Hendryx

    Abstract: Despite recent advances in AI, the development of systems capable of executing complex, multi-step reasoning tasks involving multiple tools remains a significant challenge. Current benchmarks fall short in capturing the real-world complexity of tool-use reasoning, where verifying the correctness of not only the final answer but also the intermediate steps is important for evaluation, development,…

    Submitted 2 January, 2025; originally announced January 2025.

  3. arXiv:2410.13886

    cs.CR cs.LG

    Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents

    Authors: Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, Elaine Chang, Vaughn Robinson, Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang

    Abstract: For safety reasons, large language models (LLMs) are trained to refuse harmful user instructions, such as assisting dangerous activities. We study an open question in this work: does the desired safety refusal, typically enforced in chat contexts, generalize to non-chat and agentic use cases? Unlike chatbots, LLM agents equipped with general-purpose tools, such as web browsers and mobile devices,…

    Submitted 21 October, 2024; v1 submitted 11 October, 2024; originally announced October 2024.

  4. arXiv:2410.03717

    cs.CL cs.AI cs.LG

    Revisiting the Superficial Alignment Hypothesis

    Authors: Mohit Raghavendra, Vaskar Nath, Sean Hendryx

    Abstract: The Superficial Alignment Hypothesis posits that almost all of a language model's abilities and knowledge are learned during pre-training, while post-training is about giving a model the right style and format. We re-examine these claims by empirically studying the scaling behavior of post-training with increasing finetuning examples and evaluating them using objective task-specific standardized b…

    Submitted 27 September, 2024; originally announced October 2024.

  5. arXiv:2409.03733

    cs.LG cs.AI cs.CL

    Planning In Natural Language Improves LLM Search For Code Generation

    Authors: Evan Wang, Federico Cassano, Catherine Wu, Yunfeng Bai, Will Song, Vaskar Nath, Ziwen Han, Sean Hendryx, Summer Yue, Hugh Zhang

    Abstract: While scaling training compute has led to remarkable improvements in large language models (LLMs), scaling inference compute has not yet yielded analogous gains. We hypothesize that a core missing component is a lack of diverse LLM outputs, leading to inefficient search due to models repeatedly sampling highly similar, yet incorrect generations. We empirically demonstrate that this lack of diversi…

    Submitted 18 October, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

  6. arXiv:2409.00238

    cs.CL cs.CV

    Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data

    Authors: Spencer Whitehead, Jacob Phillips, Sean Hendryx

    Abstract: Multimodal language models can exhibit hallucinations in their outputs, which limits their reliability. The ability to automatically detect these errors is important for mitigating them, but has been less explored and existing efforts do not localize hallucinations, instead framing this as a classification task. In this work, we first pose multimodal hallucination detection as a sequence labeling…

    Submitted 30 August, 2024; originally announced September 2024.

  7. arXiv:2407.13887

    cs.CL

    Learning Goal-Conditioned Representations for Language Reward Models

    Authors: Vaskar Nath, Dylan Slack, Jeff Da, Yuntao Ma, Hugh Zhang, Spencer Whitehead, Sean Hendryx

    Abstract: Techniques that learn improved representations via offline data or self-supervised objectives have shown impressive results in traditional reinforcement learning (RL). Nevertheless, it is unclear how improved representation learning can benefit reinforcement learning from human feedback (RLHF) on language models (LMs). In this work, we propose training reward models (RMs) in a contrastive,…

    Submitted 23 October, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  8. arXiv:2405.00332

    cs.CL cs.AI cs.LG

    A Careful Examination of Large Language Model Performance on Grade School Arithmetic

    Authors: Hugh Zhang, Jeff Da, Dean Lee, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Charlotte Zhuang, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer Yue

    Abstract: Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1…

    Submitted 22 November, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 2024 NeurIPS Camera Ready (Datasets and Benchmarks Track)

  9. arXiv:2401.12129

    cs.CV cs.LG

    Out-of-Distribution Detection & Applications With Ablated Learned Temperature Energy

    Authors: Will LeVine, Benjamin Pikus, Jacob Phillips, Berk Norman, Fernando Amat Gil, Sean Hendryx

    Abstract: As deep neural networks become adopted in high-stakes domains, it is crucial to identify when inference inputs are Out-of-Distribution (OOD) so that users can be alerted of likely drops in performance and calibration despite high confidence -- ultimately to know when networks' decisions (and their uncertainty in those decisions) should be trusted. In this paper we introduce Ablated Learned Tempera…

    Submitted 14 April, 2025; v1 submitted 22 January, 2024; originally announced January 2024.

  10. arXiv:2311.14743

    cs.CL cs.LG

    A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift

    Authors: Will LeVine, Benjamin Pikus, Anthony Chen, Sean Hendryx

    Abstract: Foundation models, specifically Large Language Models (LLMs), have lately gained widespread attention and adoption. Reinforcement Learning with Human Feedback (RLHF) involves training a reward model to capture desired behaviors, which is then used to align LLMs. These reward models are additionally used at inference-time to estimate LLM responses' adherence to those desired behaviors. However, t…

    Submitted 24 January, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  11. arXiv:2109.00150

    cs.LG

    Federated Reconnaissance: Efficient, Distributed, Class-Incremental Learning

    Authors: Sean M. Hendryx, Dharma Raj KC, Bradley Walls, Clayton T. Morrison

    Abstract: We describe federated reconnaissance, a class of learning problems in which distributed clients learn new concepts independently and communicate that knowledge efficiently. In particular, we propose an evaluation framework and methodological baseline for a system in which each client is expected to learn a growing set of classes and communicate knowledge of those classes efficiently with other cli…

    Submitted 31 August, 2021; originally announced September 2021.

  12. arXiv:1912.06290

    cs.LG cs.CV eess.IV stat.ML

    Meta-Learning Initializations for Image Segmentation

    Authors: Sean M. Hendryx, Andrew B. Leach, Paul D. Hein, Clayton T. Morrison

    Abstract: We extend first-order model agnostic meta-learning algorithms (including FOMAML and Reptile) to image segmentation, present a novel neural network architecture built for fast learning which we call EfficientLab, and leverage a formal definition of the test error of meta-learning algorithms to decrease error on out of distribution tasks. We show state of the art results on the FSS-1000 dataset by m…

    Submitted 7 May, 2020; v1 submitted 12 December, 2019; originally announced December 2019.