Showing 1–19 of 19 results for author: Kuehl, B

Searching in archive cs.
  1. arXiv:2504.07096  [pdf, other]

    cs.CL

    OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

    Authors: Jiacheng Liu, Taylor Blanton, Yanai Elazar, Sewon Min, YenSung Chen, Arnavi Chheda-Kothary, Huy Tran, Byron Bischoff, Eric Marsh, Michael Schmitz, Cassidy Trier, Aaron Sarnat, Jenna James, Jon Borchardt, Bailey Kuehl, Evie Cheng, Karen Farley, Sruthi Sreeram, Taira Anderson, David Albright, Carissa Schoenick, Luca Soldaini, Dirk Groeneveld, Rock Yuren Pang, Pang Wei Koh, et al. (6 additional authors not shown)

    Abstract: We present OLMoTrace, the first system that traces the outputs of language models back to their full, multi-trillion-token training data in real time. OLMoTrace finds and shows verbatim matches between segments of language model output and documents in the training text corpora. Powered by an extended version of infini-gram (Liu et al., 2024), our system returns tracing results within a few second…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Under submission at ACL 2025 demo track

  2. arXiv:2407.16148  [pdf, other]

    cs.CL

    CHIME: LLM-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support

    Authors: Chao-Chun Hsu, Erin Bransom, Jenna Sparks, Bailey Kuehl, Chenhao Tan, David Wadden, Lucy Lu Wang, Aakanksha Naik

    Abstract: Literature review requires researchers to synthesize a large amount of information and is increasingly challenging as the scientific literature expands. In this work, we investigate the potential of LLMs for producing hierarchical organizations of scientific studies to assist researchers with literature review. We define hierarchical organizations as tree structures where nodes refer to topical ca…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 2024 ACL Findings

  3. arXiv:2406.08446  [pdf, other]

    cs.CL cs.AI

    OLMES: A Standard for Language Model Evaluations

    Authors: Yuling Gu, Oyvind Tafjord, Bailey Kuehl, Dany Haddad, Jesse Dodge, Hannaneh Hajishirzi

    Abstract: Progress in AI is often demonstrated by new models claiming improved performance on tasks measuring model capabilities. Evaluating language models can be particularly challenging, as choices of how a model is evaluated on a task can lead to large changes in measured performance. There is no common standard setup, so different models are evaluated on the same tasks in different ways, leading to cla…

    Submitted 11 February, 2025; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Findings of NAACL 2025

  4. arXiv:2403.03866  [pdf, other]

    cs.CL

    KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions

    Authors: Fangyuan Xu, Kyle Lo, Luca Soldaini, Bailey Kuehl, Eunsol Choi, David Wadden

    Abstract: Large language models (LLMs) adapted to follow user instructions are now widely deployed as conversational agents. In this work, we examine one increasingly common instruction-following task: providing writing assistance to compose a long-form answer. To evaluate the capabilities of current LLMs on this task, we construct KIWI, a dataset of knowledge-intensive writing instructions in the scientifi…

    Submitted 6 March, 2024; originally announced March 2024.

  5. arXiv:2311.09736  [pdf, other]

    cs.CL

    CARE: Extracting Experimental Findings From Clinical Literature

    Authors: Aakanksha Naik, Bailey Kuehl, Erin Bransom, Doug Downey, Tom Hope

    Abstract: Extracting fine-grained experimental findings from literature can provide dramatic utility for scientific applications. Prior work has developed annotation schemas and datasets for limited aspects of this problem, failing to capture the real-world complexity and nuance required. Focusing on biomedicine, this work presents CARE -- a new IE dataset for the task of extracting clinical findings. We de…

    Submitted 24 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: To appear at NAACL Findings 2024

  6. arXiv:2306.12587  [pdf, other]

    cs.CL

    ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews

    Authors: Mike D'Arcy, Alexis Ross, Erin Bransom, Bailey Kuehl, Jonathan Bragg, Tom Hope, Doug Downey

    Abstract: We introduce the task of automatically revising scientific papers based on peer feedback and release ARIES, a dataset of review comments and their corresponding paper edits. The data is drawn from real reviewer-author interactions from computer science, and we provide labels linking each reviewer comment to the specific paper edits made by the author in response. We automatically create a high-pre…

    Submitted 5 August, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: ACL 2024, 10 pages, 2 figures

  7. arXiv:2305.13693  [pdf, other]

    cs.CL

    Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations

    Authors: Lucy Lu Wang, Yulia Otmakhova, Jay DeYoung, Thinh Hung Truong, Bailey E. Kuehl, Erin Bransom, Byron C. Wallace

    Abstract: Evaluating multi-document summarization (MDS) quality is difficult. This is especially true in the case of MDS for biomedical literature reviews, where models must synthesize contradicting evidence reported across different documents. Prior work has shown that rather than performing the task, models may exploit shortcuts that are difficult to detect using standard n-gram similarity metrics such as…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: ACL 2023; Github: https://github.com/allenai/mslr-annotated-dataset

  8. arXiv:2305.00366  [pdf, other]

    cs.CL cs.IR cs.LG

    S2abEL: A Dataset for Entity Linking from Scientific Tables

    Authors: Yuze Lou, Bailey Kuehl, Erin Bransom, Sergey Feldman, Aakanksha Naik, Doug Downey

    Abstract: Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications. When applied to tables in scientific papers, EL is a step toward large-scale scientific knowledge bases that could enable advanced scientific question answering and analytics. We present the first dataset for EL in scientific ta…

    Submitted 29 April, 2023; originally announced May 2023.

  9. arXiv:2303.14334  [pdf, other]

    cs.HC cs.AI cs.CL

    The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces

    Authors: Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney, et al. (30 additional authors not shown)

    Abstract: Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has chan…

    Submitted 23 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

  10. arXiv:2301.13298  [pdf, other]

    cs.CL

    LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization

    Authors: Kalpesh Krishna, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, Kyle Lo

    Abstract: While human evaluation remains best practice for accurately judging the faithfulness of automatically-generated summaries, few solutions exist to address the increased difficulty and workload when evaluating long-form summaries. Through a survey of 162 papers on long-form summarization, we first shed light on current human evaluation practices surrounding long-form summaries. We find that 73% of t…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: EACL 2023 camera ready. Code and data can be found in https://github.com/martiansideofthemoon/longeval-summarization

  11. arXiv:2301.10140  [pdf, other]

    cs.DL cs.CL

    The Semantic Scholar Open Data Platform

    Authors: Rodney Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, David Graham, Fangzhou Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, Bailey Kuehl, Michael Langan, Daniel Lin, et al. (23 additional authors not shown)

    Abstract: The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF conte…

    Submitted 25 April, 2025; v1 submitted 24 January, 2023; originally announced January 2023.

    Comments: 8 pages, 6 figures

  12. arXiv:2210.13777  [pdf, other]

    cs.CL cs.AI

    SciFact-Open: Towards open-domain scientific claim verification

    Authors: David Wadden, Kyle Lo, Bailey Kuehl, Arman Cohan, Iz Beltagy, Lucy Lu Wang, Hannaneh Hajishirzi

    Abstract: While research on scientific claim verification has led to the development of powerful systems that appear to approach human performance, these approaches have yet to be tested in a realistic setting against large corpora of scientific literature. Moving to this open-domain evaluation setting, however, poses unique challenges; in particular, it is infeasible to exhaustively annotate all evidence d…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: EMNLP Findings 2022. GitHub: https://github.com/dwadden/scifact-open-2022

  13. arXiv:2205.06982  [pdf, other]

    cs.CL cs.AI cs.HC

    ACCoRD: A Multi-Document Approach to Generating Diverse Descriptions of Scientific Concepts

    Authors: Sonia K. Murthy, Kyle Lo, Daniel King, Chandra Bhagavatula, Bailey Kuehl, Sophie Johnson, Jonathan Borchardt, Daniel S. Weld, Tom Hope, Doug Downey

    Abstract: Systems that can automatically define unfamiliar terms hold the promise of improving the accessibility of scientific texts, especially for readers who may lack prerequisite background knowledge. However, current systems assume a single "best" description per concept, which fails to account for the many potentially useful ways a concept can be described. We present ACCoRD, an end-to-end system tack…

    Submitted 14 May, 2022; originally announced May 2022.

  14. arXiv:2203.12990  [pdf, other]

    cs.CL

    Generating Scientific Claims for Zero-Shot Scientific Fact Checking

    Authors: Dustin Wright, David Wadden, Kyle Lo, Bailey Kuehl, Arman Cohan, Isabelle Augenstein, Lucy Lu Wang

    Abstract: Automated scientific fact checking is difficult due to the complexity of scientific language and a lack of significant amounts of training data, as annotation requires domain expertise. To address this challenge, we propose scientific claim generation, the task of generating one or more atomic and verifiable claims from scientific sentences, and demonstrate its usefulness in zero-shot fact checkin…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted to ACL 2022; 13 pages, 3 figures, 8 tables

  15. arXiv:2108.13751  [pdf, other]

    cs.CL cs.HC cs.IR

    A Search Engine for Discovery of Scientific Challenges and Directions

    Authors: Dan Lahav, Jon Saad Falcon, Bailey Kuehl, Sophie Johnson, Sravanthi Parasa, Noam Shomron, Duen Horng Chau, Diyi Yang, Eric Horvitz, Daniel S. Weld, Tom Hope

    Abstract: Keeping track of scientific challenges, advances and emerging directions is a fundamental part of research. However, researchers face a flood of papers that hinders discovery of important knowledge. In biomedicine, this directly impacts human lives. To address this problem, we present a novel task of extraction and search of scientific challenges and directions, to facilitate rapid knowledge disco…

    Submitted 19 January, 2022; v1 submitted 31 August, 2021; originally announced August 2021.

    Comments: AAAI 2022

    Journal ref: AAAI 2022

  16. arXiv:2107.00414  [pdf, other]

    cs.CL

    MultiCite: Modeling realistic citations requires moving beyond the single-sentence single-label setting

    Authors: Anne Lauscher, Brandon Ko, Bailey Kuehl, Sophie Johnson, David Jurgens, Arman Cohan, Kyle Lo

    Abstract: Citation context analysis (CCA) is an important task in natural language processing that studies how and why scholars discuss each others' work. Despite decades of study, traditional frameworks for CCA have largely relied on overly-simplistic assumptions of how authors cite, which ignore several important phenomena. For instance, scholarly papers often contain rich discussions of cited work that s…

    Submitted 31 July, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

  17. arXiv:2106.00676  [pdf, other]

    cs.CL cs.CV

    VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups

    Authors: Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey

    Abstract: Accurately extracting structured content from PDFs is a critical first step for NLP over scientific papers. Recent work has improved extraction accuracy by incorporating elementary layout information, e.g., each token's 2D position on the page, into language model pretraining. We introduce new methods that explicitly model VIsual LAyout (VILA) groups, i.e., text lines or text blocks, to further im…

    Submitted 5 January, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: To appear in TACL 2022. The arXiv version is a pre-MIT Press publication version. (17 pages, 5 figures, 9 tables)

  18. arXiv:2105.00076  [pdf, other]

    cs.DL cs.HC

    Improving the Accessibility of Scientific Documents: Current State, User Needs, and a System Solution to Enhance Scientific PDF Accessibility for Blind and Low Vision Users

    Authors: Lucy Lu Wang, Isabel Cachola, Jonathan Bragg, Evie Yu-Yen Cheng, Chelsea Haupt, Matt Latzke, Bailey Kuehl, Madeleine van Zuylen, Linda Wagner, Daniel S. Weld

    Abstract: The majority of scientific papers are distributed in PDF, which pose challenges for accessibility, especially for blind and low vision (BLV) readers. We characterize the scope of this problem by assessing the accessibility of 11,397 PDFs published 2010--2019 sampled across various fields of study, finding that only 2.4% of these PDFs satisfy all of our defined accessibility criteria. We introduce…

    Submitted 30 April, 2021; originally announced May 2021.

    Comments: 44 pages, 11 figures, 10 tables, 4 appendices; accessible PDF is available at https://llwang.net/publications/2021_wang_scia11y.pdf

  19. arXiv:2104.06486  [pdf, other]

    cs.CL cs.AI cs.LG

    MS2: Multi-Document Summarization of Medical Studies

    Authors: Jay DeYoung, Iz Beltagy, Madeleine van Zuylen, Bailey Kuehl, Lucy Lu Wang

    Abstract: To assess the effectiveness of any medical intervention, researchers must conduct a time-intensive and highly manual literature review. NLP systems can help to automate or assist in parts of this expensive process. In support of this goal, we release MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20k summaries derived from the scientific literature. Th…

    Submitted 22 November, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: 8 pages of content, 20 pages including references and appendix. See https://github.com/allenai/ms2/ for code, https://ai2-s2-ms2.s3-us-west-2.amazonaws.com/ms_data_2021-04-12.zip for data (1.8G, zipped). Published in EMNLP 2021 @ https://aclanthology.org/2021.emnlp-main.594/