Showing 1–17 of 17 results for author: Bergen, L

  1. arXiv:2505.24033  [pdf, ps, other]

    cs.CL cs.CE cs.LG

    The Surprising Soupability of Documents in State Space Models

    Authors: Yasaman Jafari, Zixian Wang, Leon Bergen, Taylor Berg-Kirkpatrick

    Abstract: We investigate whether hidden states from Structured State Space Models (SSMs) can be merged post-hoc to support downstream reasoning. Inspired by model souping, we propose a strategy where documents are encoded independently and their representations are pooled -- via simple operations like averaging -- into a single context state. This approach, which we call document souping, enables modular en…

    Submitted 29 May, 2025; originally announced May 2025.

  2. arXiv:2505.03997  [pdf, other]

    cs.LG cs.CL

    Quiet Feature Learning in Algorithmic Tasks

    Authors: Prudhviraj Naidu, Zixian Wang, Leon Bergen, Ramamohan Paturi

    Abstract: We train Transformer-based language models on ten foundational algorithmic tasks and observe pronounced phase transitions in their loss curves that deviate from established power-law scaling trends. Over large ranges of compute, the validation loss barely improves, then abruptly decreases. Probing the models' internal representations reveals the learning of quiet features during the stagnant phase…

    Submitted 6 May, 2025; originally announced May 2025.

  3. arXiv:2504.18736  [pdf, other]

    cs.CL

    EvidenceBench: A Benchmark for Extracting Evidence from Biomedical Papers

    Authors: Jianyou Wang, Weili Cao, Kaicheng Wang, Xiaoyue Wang, Ashish Dalvi, Gino Prasad, Qishan Liang, Hsuan-lin Her, Ming Wang, Qin Yang, Gene W. Yeo, David E. Neal, Maxim Khan, Christopher D. Rosin, Ramamohan Paturi, Leon Bergen

    Abstract: We study the task of automatically finding evidence relevant to hypotheses in biomedical papers. Finding relevant evidence is an important step when researchers investigate scientific hypotheses. We introduce EvidenceBench to measure models' performance on this task, which is created by a novel pipeline that consists of hypothesis generation and sentence-by-sentence annotation of biomedical papers…

    Submitted 25 April, 2025; originally announced April 2025.

  4. arXiv:2504.03101  [pdf, other]

    cs.CL

    Single-Pass Document Scanning for Question Answering

    Authors: Weili Cao, Jianyou Wang, Youze Zheng, Longtian Bao, Qirui Zheng, Taylor Berg-Kirkpatrick, Ramamohan Paturi, Leon Bergen

    Abstract: Handling extremely large documents for question answering is challenging: chunk-based embedding methods often lose track of important global context, while full-context transformers can be prohibitively expensive for hundreds of thousands of tokens. We propose a single-pass document scanning approach that processes the entire text in linear time, preserving global coherence while deciding which se…

    Submitted 3 April, 2025; originally announced April 2025.

  5. arXiv:2411.18831  [pdf, other]

    cs.CL

    Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark

    Authors: Jianyou Wang, Weili Cao, Longtian Bao, Youze Zheng, Gil Pasternak, Kaicheng Wang, Xiaoyue Wang, Ramamohan Paturi, Leon Bergen

    Abstract: Systems that answer questions by reviewing the scientific literature are becoming increasingly feasible. To draw reliable conclusions, these systems should take into account the quality of available evidence, placing more weight on studies that use a valid methodology. We present a benchmark for measuring the methodological strength of biomedical papers, drawing on the risk-of-bias framework used…

    Submitted 27 November, 2024; originally announced November 2024.

  6. arXiv:2411.00412  [pdf, other]

    cs.LG cs.AI cs.CL

    Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

    Authors: Bohan Lyu, Yadi Cao, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu

    Abstract: Large Language Models (LLMs) demonstrate promising capabilities in solving simple scientific problems but, even with domain-specific fine-tuning, often produce hallucinations for complex ones. While integrating LLMs with tools can mitigate this reliability issue, models finetuned on tool usage only often over-rely on them, incurring unnecessary costs from resource-intensive scientific tools even f…

    Submitted 5 February, 2025; v1 submitted 1 November, 2024; originally announced November 2024.

    Comments: 32 pages, 16 figures

    ACM Class: I.2.6; I.2.7

  7. arXiv:2410.16701  [pdf, other]

    cs.LG

    ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models

    Authors: Veeramakali Vignesh Manivannan, Yasaman Jafari, Srikar Eranky, Spencer Ho, Rose Yu, Duncan Watson-Parris, Yian Ma, Leon Bergen, Taylor Berg-Kirkpatrick

    Abstract: The use of Large Language Models (LLMs) in climate science has recently gained significant attention. However, a critical issue remains: the lack of a comprehensive evaluation framework capable of assessing the quality and scientific validity of model outputs. To address this issue, we develop ClimaGen (Climate QA Generator), an adaptive learning framework that generates question-answer pairs from…

    Submitted 9 March, 2025; v1 submitted 22 October, 2024; originally announced October 2024.

    Comments: Accepted to ICLR 2025

    Journal ref: ICLR 2025

  8. arXiv:2405.15092  [pdf, other]

    cs.AI cs.CL

    Dissociation of Faithful and Unfaithful Reasoning in LLMs

    Authors: Evelyn Yee, Alice Li, Chenyu Tang, Yeon Ho Jung, Ramamohan Paturi, Leon Bergen

    Abstract: Large language models (LLMs) often improve their performance in downstream tasks when they generate Chain of Thought reasoning text before producing an answer. We investigate how LLMs recover from errors in Chain of Thought. Through analysis of error recovery behaviors, we find evidence for unfaithfulness in Chain of Thought, which occurs when models arrive at the correct answer despite invalid re…

    Submitted 2 September, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: code published at https://github.com/CoTErrorRecovery/CoTErrorRecovery

  9. arXiv:2402.16200  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    IR2: Information Regularization for Information Retrieval

    Authors: Jianyou Wang, Kaicheng Wang, Xiaoyue Wang, Weili Cao, Ramamohan Paturi, Leon Bergen

    Abstract: Effective information retrieval (IR) in settings with limited training data, particularly for complex queries, remains a challenging task. This paper introduces IR2, Information Regularization for Information Retrieval, a technique for reducing overfitting during synthetic data generation. This approach, representing a novel application of regularization techniques in synthetic data creation for I…

    Submitted 1 April, 2025; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: Accepted by LREC-COLING 2024 - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation

  10. arXiv:2402.14151  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    BIRCO: A Benchmark of Information Retrieval Tasks with Complex Objectives

    Authors: Xiaoyue Wang, Jianyou Wang, Weili Cao, Kaicheng Wang, Ramamohan Paturi, Leon Bergen

    Abstract: We present the Benchmark of Information Retrieval (IR) tasks with Complex Objectives (BIRCO). BIRCO evaluates the ability of IR systems to retrieve documents given multi-faceted user objectives. The benchmark's complexity and compact size make it suitable for evaluating large language model (LLM)-based information retrieval systems. We present a modular framework for investigating factors that may…

    Submitted 3 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  11. arXiv:2310.04678  [pdf, other]

    cs.IR cs.CL

    DORIS-MAE: Scientific Document Retrieval using Multi-level Aspect-based Queries

    Authors: Jianyou Wang, Kaicheng Wang, Xiaoyue Wang, Prudhviraj Naidu, Leon Bergen, Ramamohan Paturi

    Abstract: In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high cost and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect…

    Submitted 28 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: To appear in NeurIPS 2023 Datasets and Benchmarks Track

  12. arXiv:2112.00578  [pdf, other]

    cs.CL cs.LG

    Systematic Generalization with Edge Transformers

    Authors: Leon Bergen, Timothy J. O'Donnell, Dzmitry Bahdanau

    Abstract: Recent research suggests that systematic generalization in natural language understanding remains a challenge for state-of-the-art neural models such as Transformers and Graph Neural Networks. To tackle this challenge, we propose Edge Transformer, a new model that combines inspiration from Transformers and rule-based symbolic AI. The first key idea in Edge Transformers is to associate vector state…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: Accepted as a conference paper at NeurIPS 2021

  13. arXiv:2104.06645  [pdf, other]

    cs.CL

    Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention

    Authors: Leon Bergen, Dzmitry Bahdanau, Timothy J. O'Donnell

    Abstract: We present a model that jointly learns the denotations of words together with their groundings using a truth-conditional semantics. Our model builds on the neurosymbolic approach of Mao et al. (2019), learning to ground objects in the CLEVR dataset (Johnson et al., 2017) using a novel parallel attention mechanism. The model achieves state-of-the-art performance on visual question answering, learni…

    Submitted 14 April, 2021; originally announced April 2021.

  14. arXiv:2010.13870  [pdf, other]

    cs.CL

    Word Frequency Does Not Predict Grammatical Knowledge in Language Models

    Authors: Charles Yu, Ryan Sie, Nico Tedeschi, Leon Bergen

    Abstract: Neural language models learn, to varying degrees of accuracy, the grammatical properties of natural languages. In this work, we investigate whether there are systematic sources of variation in the language models' accuracy. Focusing on subject-verb agreement and reflexive anaphora, we find that certain nouns are systematically understood better than others, an effect which is robust across grammat…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  15. arXiv:2004.05357  [pdf, ps, other]

    cond-mat.mtrl-sci cond-mat.str-el

    Unusual magnetoelectric effect in paramagnetic rare-earth langasite

    Authors: L. Weymann, L. Bergen, Th. Kain, Anna Pimenov, A. Shuvaev, E. Constable, D. Szaller, A. Pimenov, B. V. Mill, A. M. Kuzmenko, V. Yu. Ivanov, N. V. Kostyuchenko, A. I. Popov, A. K. Zvezdin, A. A. Mukhin, M. Mostovoy

    Abstract: Violation of time reversal and spatial inversion symmetries has profound consequences for elementary particles and cosmology. Spontaneous breaking of these symmetries at phase transitions gives rise to unconventional physical phenomena in condensed matter systems, such as ferroelectricity induced by magnetic spirals, electromagnons, non-reciprocal propagation of light and spin waves, and the linea…

    Submitted 11 April, 2020; originally announced April 2020.

    Comments: 8 pages, 3 figures

    Journal ref: npj Quantum Mater. 5, 61 (2020)

  16. arXiv:1710.11350  [pdf, other]

    cs.CL

    Grammar Induction for Minimalist Grammars using Variational Bayesian Inference : A Technical Report

    Authors: Eva Portelance, Amelia Bruno, Daniel Harasim, Leon Bergen, Timothy J. O'Donnell

    Abstract: The following technical report presents a formal approach to probabilistic minimalist grammar parameter estimation. We describe a formalization of a minimalist grammar. We then present an algorithm for the application of variational Bayesian inference to this formalization.

    Submitted 28 August, 2019; v1 submitted 31 October, 2017; originally announced October 2017.

  17. arXiv:0708.2096  [pdf, ps, other]

    quant-ph

    Non-uniform mixing of quantum walk on cycles

    Authors: William Adamczak, Kevin Andrew, Leon Bergen, Dillon Ethier, Peter Hernberg, Jennifer Lin, Christino Tamon

    Abstract: A classical lazy random walk on cycles is known to mix to the uniform distribution. In contrast, we show that a continuous-time quantum walk on cycles exhibits strong non-uniform mixing properties. Our results include the following: - The instantaneous distribution of a quantum walk on most even-length cycles is never uniform. - The average distribution of a quantum walk on any Abelian circulan…

    Submitted 15 August, 2007; originally announced August 2007.

    Comments: 12 pages, 1 figure

    Journal ref: International Journal of Quantum Information 5(6):781-793, 2007