
Showing 1–50 of 115 results for author: Wallace, B

Searching in archive cs.
  1. arXiv:2504.03022  [pdf, other]

    cs.CL cs.AI

    The Dual-Route Model of Induction

    Authors: Sheridan Feucht, Eric Todd, Byron Wallace, David Bau

    Abstract: Prior work on in-context copying has shown the existence of induction heads, which attend to and promote individual tokens during copying. In this work we introduce a new type of induction head: concept-level induction heads, which copy entire lexical units instead of individual tokens. Concept induction heads learn to attend to the ends of multi-token words throughout training, working in paralle…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 36 pages, 39 figures. Code and data at https://dualroute.baulab.info

    ACM Class: I.2.7

  2. arXiv:2502.13319  [pdf, other]

    cs.CL

    Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare

    Authors: Hiba Ahsan, Arnab Sen Sharma, Silvio Amir, David Bau, Byron C. Wallace

    Abstract: We know from prior work that LLMs encode social biases, and that this manifests in clinical tasks. In this work we adopt tools from mechanistic interpretability to unveil sociodemographic representations and biases within LLMs in the context of healthcare. Specifically, we ask: Can we identify activations within LLMs that encode sociodemographic information (e.g., gender, race)? We find that gende…

    Submitted 18 February, 2025; originally announced February 2025.

  3. arXiv:2502.07963  [pdf, other]

    cs.CL cs.AI

    Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?

    Authors: Hye Sun Yun, Karen Y. C. Zhang, Ramez Kouzy, Iain J. Marshall, Junyi Jessy Li, Byron C. Wallace

    Abstract: Medical research faces well-documented challenges in translating novel treatments into clinical practice. Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Consequently, it is well-documented that authors often spin study results, especially in article abstracts. Such spin can influence clinician interpretation of evidence and ma…

    Submitted 5 May, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: 22 pages, 12 figures, 4 tables, CHIL 2025

  4. arXiv:2502.06659  [pdf, other]

    cs.CL

    Who Taught You That? Tracing Teachers in Model Distillation

    Authors: Somin Wadhwa, Chantal Shaib, Silvio Amir, Byron C. Wallace

    Abstract: Model distillation -- using outputs from a large teacher model to teach a small student model -- is a practical means of creating efficient models for a particular task. We ask: Can we identify a student's teacher based on its outputs? Such "footprints" left by teacher LLMs would be interesting artifacts. Beyond this, reliable teacher inference may have practical implications as actors seek to dis…

    Submitted 20 May, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: Findings of ACL 2025

  5. arXiv:2411.16638  [pdf, other]

    cs.CL cs.AI

    Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation

    Authors: Sanjana Ramprasad, Byron C. Wallace

    Abstract: Modern LLMs can now produce highly readable abstractive summaries, to the point where traditional automated metrics for evaluating summary quality, such as ROUGE, have become saturated. However, LLMs still sometimes introduce unwanted content into summaries, i.e., information inconsistent with or unsupported by their source. Measuring the occurrence of these often subtle "hallucinations" automat…

    Submitted 28 November, 2024; v1 submitted 25 November, 2024; originally announced November 2024.

  6. arXiv:2411.05697  [pdf, other]

    eess.IV cs.DC cs.LG

    IPMN Risk Assessment under Federated Learning Paradigm

    Authors: Hongyi Pan, Ziliang Hong, Gorkem Durak, Elif Keles, Halil Ertugrul Aktas, Yavuz Taktak, Alpay Medetalibeyoglu, Zheyuan Zhang, Yury Velichko, Concetto Spampinato, Ivo Schoots, Marco J. Bruno, Pallavi Tiwari, Candice Bolan, Tamas Gonda, Frank Miller, Rajesh N. Keswani, Michael B. Wallace, Ziyue Xu, Ulas Bagci

    Abstract: Accurate classification of Intraductal Papillary Mucinous Neoplasms (IPMN) is essential for identifying high-risk cases that require timely intervention. In this study, we develop a federated learning framework for multi-center IPMN classification utilizing a comprehensive pancreas MRI dataset. This dataset includes 652 T1-weighted and 655 T2-weighted MRI images, accompanied by corresponding IPMN…

    Submitted 22 January, 2025; v1 submitted 8 November, 2024; originally announced November 2024.

    Comments: This paper has been accepted to ISBI 2025

  7. arXiv:2410.22530  [pdf, other]

    eess.IV cs.CV cs.DC

    Adaptive Aggregation Weights for Federated Segmentation of Pancreas MRI

    Authors: Hongyi Pan, Gorkem Durak, Zheyuan Zhang, Yavuz Taktak, Elif Keles, Halil Ertugrul Aktas, Alpay Medetalibeyoglu, Yury Velichko, Concetto Spampinato, Ivo Schoots, Marco J. Bruno, Rajesh N. Keswani, Pallavi Tiwari, Candice Bolan, Tamas Gonda, Michael G. Goggins, Michael B. Wallace, Ziyue Xu, Ulas Bagci

    Abstract: Federated learning (FL) enables collaborative model training across institutions without sharing sensitive data, making it an attractive solution for medical imaging tasks. However, traditional FL methods, such as Federated Averaging (FedAvg), face difficulties in generalizing across domains due to variations in imaging protocols and patient demographics across institutions. This challenge is part…

    Submitted 6 May, 2025; v1 submitted 29 October, 2024; originally announced October 2024.

    Comments: This paper has been accepted to ISBI 2025

  8. arXiv:2410.08507  [pdf, ps, other]

    cs.RO

    Decentralized Uncertainty-Aware Active Search with a Team of Aerial Robots

    Authors: Wennie Tabib, John Stecklein, Caleb McDowell, Kshitij Goel, Felix Jonathan, Abhishek Rathod, Meghan Kokoski, Edsel Burkholder, Brian Wallace, Luis Ernesto Navarro-Serment, Nikhil Angad Bakshi, Tejus Gupta, Norman Papernick, David Guttendorf, Erik E. Kahn, Jessica Kasemer, Jesse Holdaway, Jeff Schneider

    Abstract: Rapid search and rescue is critical to maximizing survival rates following natural disasters. However, these efforts are challenged by the need to search large disaster zones, lack of reliability in the communications infrastructure, and a priori unknown numbers of objects of interest (OOIs), such as injured survivors. Aerial robots are increasingly being deployed for search and rescue due to thei…

    Submitted 10 June, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

    Comments: accepted at ISER 2025

  9. arXiv:2407.19284  [pdf, other]

    eess.IV cs.CV

    Optimizing Synthetic Data for Enhanced Pancreatic Tumor Segmentation

    Authors: Linkai Peng, Zheyuan Zhang, Gorkem Durak, Frank H. Miller, Alpay Medetalibeyoglu, Michael B. Wallace, Ulas Bagci

    Abstract: Pancreatic cancer remains one of the leading causes of cancer-related mortality worldwide. Precise segmentation of pancreatic tumors from medical images is a bottleneck for effective clinical decision-making. However, achieving high accuracy is often limited by the small size and availability of real patient data for training deep learning models. Recent approaches have employed synthetic data g…

    Submitted 1 October, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

    Comments: MICCAI Workshop AIPAD 2024

  10. arXiv:2407.14561  [pdf, other]

    cs.LG cs.AI

    NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals

    Authors: Jaden Fiotto-Kaufman, Alexander R. Loftus, Eric Todd, Jannik Brinkmann, Koyena Pal, Dmitrii Troitskii, Michael Ripa, Adam Belfki, Can Rager, Caden Juang, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Nikhil Prakash, Carla Brodley, Arjun Guha, Jonathan Bell, Byron C. Wallace, David Bau

    Abstract: We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks. NNsight is an open-source system that extends PyTorch to introduce deferred remote execution. The National Deep Inference Fabric (NDIF) is a scalable inference service that executes NNsight requests, allowing users to share GPU re…

    Submitted 1 April, 2025; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: Code at https://nnsight.net

  11. arXiv:2407.09429  [pdf, other]

    cs.CL

    Open (Clinical) LLMs are Sensitive to Instruction Phrasings

    Authors: Alberto Mario Ceballos Arroyo, Monica Munnangi, Jiuding Sun, Karen Y. C. Zhang, Denis Jered McInerney, Byron C. Wallace, Silvio Amir

    Abstract: Instruction-tuned Large Language Models (LLMs) can perform a wide range of tasks given natural language instructions to do so, but they are sensitive to how such instructions are phrased. This issue is especially concerning in healthcare, as clinicians are unlikely to be experienced prompt engineers and the potential consequences of inaccurate outputs are heightened in this domain. This raises a…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: To appear at BioNLP, ACL 2024

  12. arXiv:2407.00211  [pdf, other]

    cs.CL

    Detection and Measurement of Syntactic Templates in Generated Text

    Authors: Chantal Shaib, Yanai Elazar, Junyi Jessy Li, Byron C. Wallace

    Abstract: Recent work on evaluating the diversity of text generated by LLMs has focused on word-level features. Here we offer an analysis of syntactic features to characterize general repetition in models, beyond frequent n-grams. Specifically, we define syntactic templates and show that models tend to produce templated text in downstream tasks at a higher rate than what is found in human-reference texts. W…

    Submitted 6 October, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

    Comments: EMNLP 2024

  13. arXiv:2406.20086  [pdf, other]

    cs.CL cs.LG

    Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs

    Authors: Sheridan Feucht, David Atkinson, Byron Wallace, David Bau

    Abstract: LLMs process text as sequences of tokens that roughly correspond to words, where less common words are represented by multiple tokens. However, individual tokens are often semantically unrelated to the meanings of the words/concepts they comprise. For example, Llama-2-7b's tokenizer splits the word "northeastern" into the tokens ['_n', 'ort', 'he', 'astern'], none of which correspond to semantical…

    Submitted 11 October, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: 13 pages, 14 figures. Code and data at https://footprints.baulab.info/

    ACM Class: I.2.7

  14. arXiv:2406.14511  [pdf, other]

    cs.CL

    Investigating Mysteries of CoT-Augmented Distillation

    Authors: Somin Wadhwa, Silvio Amir, Byron C. Wallace

    Abstract: Eliciting "chain of thought" (CoT) rationales -- sequences of tokens that convey a "reasoning" process -- has been shown to consistently improve LLM performance on tasks like question answering. More recent efforts have shown that such rationales can also be used for model distillation: Including CoT sequences (elicited from a large "teacher" model) in addition to target labels when fine-tuning a s…

    Submitted 27 September, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted to EMNLP 2024

  15. arXiv:2406.09330  [pdf, other]

    cs.CL

    Learning from Natural Language Explanations for Generalizable Entity Matching

    Authors: Somin Wadhwa, Adit Krishnan, Runhui Wang, Byron C. Wallace, Chris Kong

    Abstract: Entity matching is the task of linking records from different sources that refer to the same real-world entity. Past work has primarily treated entity linking as a standard supervised learning problem. However, supervised entity matching models often do not generalize well to new data, and collecting exhaustive labeled training data is often cost prohibitive. Further, recent efforts have adopted L…

    Submitted 27 September, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to EMNLP 2024

  16. arXiv:2405.12367  [pdf, other]

    eess.IV cs.CV

    Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning

    Authors: Zheyuan Zhang, Elif Keles, Gorkem Durak, Yavuz Taktak, Onkar Susladkar, Vandan Gorade, Debesh Jha, Asli C. Ormeci, Alpay Medetalibeyoglu, Lanhong Yao, Bin Wang, Ilkin Sevgi Isler, Linkai Peng, Hongyi Pan, Camila Lopes Vendrami, Amir Bourhani, Yury Velichko, Boqing Gong, Concetto Spampinato, Ayis Pyrros, Pallavi Tiwari, Derk C. F. Klatte, Megan Engels, Sanne Hoogenboom, Candice W. Bolan, et al. (13 additional authors not shown)

    Abstract: Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective st…

    Submitted 24 October, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Peer-reviewer version

  17. arXiv:2405.01686  [pdf, other]

    cs.CL cs.AI

    Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models

    Authors: Hye Sun Yun, David Pogrebitskiy, Iain J. Marshall, Byron C. Wallace

    Abstract: Meta-analyses statistically aggregate the findings of different randomized controlled trials (RCTs) to assess treatment effectiveness. Because this yields robust estimates of treatment effectiveness, results from meta-analyses are considered the strongest form of evidence. However, rigorous evidence syntheses are time-consuming and labor-intensive, requiring manual extraction of data from individu…

    Submitted 24 July, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 25 pages, 7 figures, 6 tables, MLHC 2024

  18. arXiv:2404.00152  [pdf, other]

    cs.CL

    On-the-fly Definition Augmentation of LLMs for Biomedical NER

    Authors: Monica Munnangi, Sergey Feldman, Byron C Wallace, Silvio Amir, Tom Hope, Aakanksha Naik

    Abstract: Despite their general capabilities, LLMs still struggle on biomedical NER tasks, which are difficult due to the presence of specialized terminology and lack of training data. In this work we set out to improve LLM performance on biomedical NER in limited data settings via a new knowledge augmentation approach which incorporates definitions of relevant concepts on-the-fly. During this process, to p…

    Submitted 23 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

    Comments: To appear at NAACL 2024 (Main)

  19. arXiv:2403.00553  [pdf, other]

    cs.CL

    Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores

    Authors: Chantal Shaib, Joe Barrow, Jiuding Sun, Alexa F. Siu, Byron C. Wallace, Ani Nenkova

    Abstract: The diversity across outputs generated by LLMs shapes perception of their quality and utility. High lexical diversity is often desirable, but there is no standard method to measure this property. Templated answer structures and "canned" responses across different documents are readily noticeable, but difficult to visualize across large corpora. This work aims to standardize measurement of text d…

    Submitted 20 March, 2025; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Preprint

  20. arXiv:2402.18756  [pdf, other]

    cs.CL

    How Much Annotation is Needed to Compare Summarization Models?

    Authors: Chantal Shaib, Joe Barrow, Alexa F. Siu, Byron C. Wallace, Ani Nenkova

    Abstract: Modern instruction-tuned models have become highly capable in text generation tasks such as summarization, and are expected to be released at a steady pace. In practice one may now wish to choose confidently, but with minimal effort, the best performing summarization model when applied to a new domain or purpose. In this work, we empirically investigate the test sample size necessary to select a p…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Preprint

  21. arXiv:2402.15663  [pdf, other]

    cs.CL

    Leveraging ChatGPT in Pharmacovigilance Event Extraction: An Empirical Study

    Authors: Zhaoyue Sun, Gabriele Pergola, Byron C. Wallace, Yulan He

    Abstract: With the advent of large language models (LLMs), there has been growing interest in exploring their potential for medical applications. This research aims to investigate the ability of LLMs, specifically ChatGPT, in the context of pharmacovigilance event extraction, of which the main goal is to identify and extract adverse events or potential therapeutic events from textual medical sources. We con…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 14 pages, 2 figures, accepted by EACL 2024

  22. arXiv:2402.12566  [pdf, other]

    cs.CL cs.LG

    GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence

    Authors: Kundan Krishna, Sanjana Ramprasad, Prakhar Gupta, Byron C. Wallace, Zachary C. Lipton, Jeffrey P. Bigham

    Abstract: LLMs can generate factually incorrect statements even when provided access to reference documents. Such errors can be dangerous in high-stakes applications (e.g., document-grounded QA for healthcare or finance). We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks. GenAudit suggests edits to the LLM response by revising or removing claims that ar…

    Submitted 19 January, 2025; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Code and models available at https://genaudit.org

  23. arXiv:2402.11456  [pdf, other]

    cs.CL

    FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence

    Authors: Sebastian Antony Joseph, Lily Chen, Jan Trienes, Hannah Louisa Göke, Monika Coers, Wei Xu, Byron C Wallace, Junyi Jessy Li

    Abstract: Plain language summarization with LLMs can be useful for improving textual accessibility of technical content. But how factual are these summaries in a high-stakes domain like medicine? This paper presents FactPICO, a factuality benchmark for plain language summarization of medical texts describing randomized controlled trials (RCTs), which are the basis of evidence-based medicine and can directly…

    Submitted 4 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: Preprint has been updated to match the final revision for ACL 2024

  24. arXiv:2402.10109  [pdf, other]

    cs.AI cs.CL cs.LG

    Towards Reducing Diagnostic Errors with Interpretable Risk Prediction

    Authors: Denis Jered McInerney, William Dickinson, Lucy C. Flynn, Andrea C. Young, Geoffrey S. Young, Jan-Willem van de Meent, Byron C. Wallace

    Abstract: Many diagnostic errors occur because clinicians cannot easily access relevant information in patient Electronic Health Records (EHRs). In this work we propose a method to use LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses; our ultimate aim is to increase access to evidence and reduce diagnostic errors. In particular, we propo…

    Submitted 19 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  25. arXiv:2402.03509  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains

    Authors: Sanjana Ramprasad, Kundan Krishna, Zachary C Lipton, Byron C Wallace

    Abstract: Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focussed almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (pote…

    Submitted 5 February, 2024; originally announced February 2024.

  26. arXiv:2402.01700  [pdf]

    cs.CL cs.AI

    Question answering systems for health professionals at the point of care -- a systematic review

    Authors: Gregory Kell, Angus Roberts, Serge Umansky, Linglong Qian, Davide Ferrari, Frank Soboczenski, Byron Wallace, Nikhil Patel, Iain J Marshall

    Abstract: Objective: Question answering (QA) systems have the potential to improve the quality of clinical care by providing health professionals with the latest and most relevant evidence. However, QA systems have not been widely adopted. This systematic review aims to characterize current medical QA systems, assess their suitability for healthcare, and identify areas of improvement. Materials and method…

    Submitted 24 January, 2024; originally announced February 2024.

    Comments: Accepted to the Journal of the American Medical Informatics Association (JAMIA)

  27. arXiv:2401.16475  [pdf, other]

    cs.CL

    InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification

    Authors: Jan Trienes, Sebastian Joseph, Jörg Schlötterer, Christin Seifert, Kyle Lo, Wei Xu, Byron C. Wallace, Junyi Jessy Li

    Abstract: Text simplification aims to make technical texts more accessible to laypeople but often results in deletion of information and vagueness. This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. Building on the theory of Question Under Discussion, the QA pairs are designed to help readers deepen their…

    Submitted 4 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted at ACL 2024 (main conference)

  28. arXiv:2311.13978  [pdf, other]

    cs.LG eess.IV

    MedISure: Towards Assuring Machine Learning-based Medical Image Classifiers using Mixup Boundary Analysis

    Authors: Adam Byfield, William Poulett, Ben Wallace, Anusha Jose, Shatakshi Tyagi, Smita Shembekar, Adnan Qayyum, Junaid Qadir, Muhammad Bilal

    Abstract: Machine learning (ML) models are becoming integral in healthcare technologies, presenting a critical need for formal assurance to validate their safety, fairness, robustness, and trustworthiness. These models are inherently prone to errors, potentially posing serious risks to patient health and even causing irreparable harm. Traditional software assurance techniques rely on fixed code and do n…

    Submitted 23 November, 2023; originally announced November 2023.

  29. arXiv:2311.12908  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Diffusion Model Alignment Using Direct Preference Optimization

    Authors: Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik

    Abstract: Large language models (LLMs) are fine-tuned using human comparison data with Reinforcement Learning from Human Feedback (RLHF) methods to make them better aligned with users' preferences. In contrast to LLMs, human preference learning has not been widely explored in text-to-image diffusion models; the best existing approach is to fine-tune a pretrained model using carefully curated high quality im…

    Submitted 21 November, 2023; originally announced November 2023.

  30. arXiv:2311.11211  [pdf]

    cs.AI

    Leveraging Generative AI for Clinical Evidence Summarization Needs to Ensure Trustworthiness

    Authors: Gongbo Zhang, Qiao Jin, Denis Jered McInerney, Yong Chen, Fei Wang, Curtis L. Cole, Qian Yang, Yanshan Wang, Bradley A. Malin, Mor Peleg, Byron C. Wallace, Zhiyong Lu, Chunhua Weng, Yifan Peng

    Abstract: Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, ho…

    Submitted 31 March, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

  31. Future Lens: Anticipating Subsequent Tokens from a Single Hidden State

    Authors: Koyena Pal, Jiuding Sun, Andrew Yuan, Byron C. Wallace, David Bau

    Abstract: We conjecture that hidden state vectors corresponding to individual input tokens encode information sufficient to accurately predict several tokens ahead. More concretely, in this paper we ask: Given a hidden (internal) representation of a single token at position $t$ in an input, can we reliably anticipate the tokens that will appear at positions $\geq t + 2$? To test this, we measure linear appr…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Accepted at CoNLL 2023

  32. arXiv:2310.15213  [pdf, other]

    cs.CL cs.LG

    Function Vectors in Large Language Models

    Authors: Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau

    Abstract: We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV). FVs are…

    Submitted 25 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. 52 pages, 30 figures, 23 tables. Code and data at https://functions.baulab.info

  33. arXiv:2309.04550  [pdf, other]

    cs.CL

    Retrieving Evidence from EHRs with LLMs: Possibilities and Challenges

    Authors: Hiba Ahsan, Denis Jered McInerney, Jisoo Kim, Christopher Potter, Geoffrey Young, Silvio Amir, Byron C. Wallace

    Abstract: Unstructured data in Electronic Health Records (EHRs) often contains critical information -- complementary to imaging -- that could inform radiologists' diagnoses. But the large volume of notes often associated with patients together with time constraints renders manually identifying relevant evidence practically infeasible. In this work we propose and evaluate a zero-shot strategy for using LLMs…

    Submitted 10 June, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

  34. arXiv:2307.08920  [pdf, other]

    eess.SY cs.AI cs.LG

    Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees

    Authors: Brent A. Wallace, Jennie Si

    Abstract: Continuous-time nonlinear optimal control problems hold great promise in real-world applications. After decades of development, reinforcement learning (RL) has achieved some of the greatest successes as a general nonlinear control design method. However, a recent comprehensive analysis of state-of-the-art continuous-time RL (CT-RL) methods, namely, adaptive dynamic programming (ADP)-based CT-RL al…

    Submitted 17 July, 2023; originally announced July 2023.

  35. arXiv:2306.11270  [pdf, other]

    cs.CL cs.LG

    Evaluating the Zero-shot Robustness of Instruction-tuned Language Models

    Authors: Jiuding Sun, Chantal Shaib, Byron C. Wallace

    Abstract: Instruction fine-tuning has recently emerged as a promising approach for improving the zero-shot capabilities of Large Language Models (LLMs) on new tasks. This technique has shown particular strength in improving the performance of modestly sized LLMs, sometimes inducing performance competitive with much larger model variants. In this paper we ask two questions: (1) How sensitive are instruction-…

    Submitted 8 July, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

  36. arXiv:2305.14296  [pdf, other]

    cs.CL cs.LG

    USB: A Unified Summarization Benchmark Across Tasks and Domains

    Authors: Kundan Krishna, Prakhar Gupta, Sanjana Ramprasad, Byron C. Wallace, Jeffrey P. Bigham, Zachary C. Lipton

    Abstract: While the NLP community has produced numerous summarization benchmarks, none provide the rich annotations required to simultaneously address many important problems related to control and reliability. We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports 8 interrelated tasks: (i) extractive summarization; (ii) abstractive summarization…

    Submitted 4 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP Findings 2023 Camera Ready

  37. arXiv:2305.13693  [pdf, other]

    cs.CL

    Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations

    Authors: Lucy Lu Wang, Yulia Otmakhova, Jay DeYoung, Thinh Hung Truong, Bailey E. Kuehl, Erin Bransom, Byron C. Wallace

    Abstract: Evaluating multi-document summarization (MDS) quality is difficult. This is especially true in the case of MDS for biomedical literature reviews, where models must synthesize contradicting evidence reported across different documents. Prior work has shown that rather than performing the task, models may exploit shortcuts that are difficult to detect using standard n-gram similarity metrics such as…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: ACL 2023; Github: https://github.com/allenai/mslr-annotated-dataset

  38. arXiv:2305.12532  [pdf, other]

    cs.CL

    Multilingual Simplification of Medical Texts

    Authors: Sebastian Joseph, Kathryn Kazanas, Keziah Reina, Vishnesh J. Ramanathan, Wei Xu, Byron C. Wallace, Junyi Jessy Li

    Abstract: Automated text simplification aims to produce simple versions of complex texts. This task is especially useful in the medical domain, where the latest medical findings are typically communicated via complex and technical articles. This creates barriers for laypeople seeking access to up-to-date medical findings, consequently impeding progress on health literacy. Most existing work on medical text…

    Submitted 18 October, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: This version will be in EMNLP 2023 main

  39. arXiv:2305.11828  [pdf, other]

    cs.CL cs.AI cs.HC

    Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews

    Authors: Hye Sun Yun, Iain J. Marshall, Thomas A. Trikalinos, Byron C. Wallace

    Abstract: Medical systematic reviews play a vital role in healthcare decision making and policy. However, their production is time-consuming, limiting the availability of high-quality and up-to-date evidence summaries. Recent advancements in large language models (LLMs) offer the potential to automatically generate literature reviews on demand, addressing this issue. However, LLMs sometimes generate inaccur…

    Submitted 18 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: 18 pages, 2 figures, 8 tables. Accepted as an EMNLP 2023 main paper

  40. arXiv:2305.06299  [pdf, other]

    cs.CL

    Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3 (with Varying Success)

    Authors: Chantal Shaib, Millicent L. Li, Sebastian Joseph, Iain J. Marshall, Junyi Jessy Li, Byron C. Wallace

    Abstract: Large language models, particularly GPT-3, are able to produce high quality summaries of general domain news articles in few- and zero-shot settings. However, it is unclear if such models are similarly capable in more specialized, high-stakes domains such as biomedicine. In this paper, we enlist domain experts (individuals with medical training) to evaluate summaries of biomedical articles generat…

    Submitted 11 May, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted as a short paper to ACL 2023

  41. arXiv:2305.05003  [pdf, other]

    cs.CL

    Revisiting Relation Extraction in the era of Large Language Models

    Authors: Somin Wadhwa, Silvio Amir, Byron C. Wallace

    Abstract: Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a sequence-to-sequence task, linearizing relations between entities as target strings to be…

    Submitted 16 July, 2024; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  42. arXiv:2305.03642  [pdf, other]

    cs.CL

    Jointly Extracting Interventions, Outcomes, and Findings from RCT Reports with LLMs

    Authors: Somin Wadhwa, Jay DeYoung, Benjamin Nye, Silvio Amir, Byron C. Wallace

    Abstract: Results from Randomized Controlled Trials (RCTs) establish the comparative effectiveness of interventions, and are in turn critical inputs for evidence-based care. However, results from RCTs are presented in (often unstructured) natural language articles describing the design, execution, and outcomes of trials; clinicians must manually extract findings pertaining to interventions and outcomes of i…

    Submitted 17 July, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted to MLHC 2023

  43. arXiv:2303.13703  [pdf, other]

    cs.CV cs.AI cs.LG

    End-to-End Diffusion Latent Optimization Improves Classifier Guidance

    Authors: Bram Wallace, Akash Gokul, Stefano Ermon, Nikhil Naik

    Abstract: Classifier guidance -- using the gradients of an image classifier to steer the generations of a diffusion model -- has the potential to dramatically expand the creative control over image generation and editing. However, currently classifier guidance requires either training new noise-aware models to obtain accurate gradients or using a one-step denoising approximation of the final generation, whi…

    Submitted 31 May, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

  44. arXiv:2303.05392  [pdf, other]

    cs.CL cs.IR cs.LG

    Automatically Summarizing Evidence from Clinical Trials: A Prototype Highlighting Current Challenges

    Authors: Sanjana Ramprasad, Denis Jered McInerney, Iain J. Marshall, Byron C. Wallace

    Abstract: We present TrialsSummarizer, a system that aims to automatically summarize evidence presented in the set of randomized controlled trials most relevant to a given query. Building on prior work, the system retrieves trial publications matching a query specifying a combination of condition, intervention(s), and outcome(s), and ranks these according to sample size and estimated study quality. The top-…

    Submitted 7 March, 2023; originally announced March 2023.

  45. arXiv:2302.12343  [pdf, other]

    cs.CL cs.AI cs.LG

    CHiLL: Zero-shot Custom Interpretable Feature Extraction from Clinical Notes with Large Language Models

    Authors: Denis Jered McInerney, Geoffrey Young, Jan-Willem van de Meent, Byron C. Wallace

    Abstract: We propose CHiLL (Crafting High-Level Latents), an approach for natural-language specification of features for linear models. CHiLL prompts LLMs with expert-crafted queries to generate interpretable features from health records. The resulting noisy labels are then used to train a simple linear classifier. Generating features based on queries to an LLM can empower physicians to use their domain exp…

    Submitted 19 October, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: To be published at EMNLP Findings 2023

  46. arXiv:2302.05574  [pdf, other]

    cs.CL

    NapSS: Paragraph-level Medical Text Simplification via Narrative Prompting and Sentence-matching Summarization

    Authors: Junru Lu, Jiazheng Li, Byron C. Wallace, Yulan He, Gabriele Pergola

    Abstract: Accessing medical literature is difficult for laypeople as the content is written for specialists and contains medical jargon. Automated text simplification methods offer a potential means to address this issue. In this work, we propose a summarize-then-simplify two-stage strategy, which we call NapSS, identifying the relevant content to simplify while ensuring that the original narrative flow is…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Findings of EACL 2023

  47. arXiv:2302.02169  [pdf, other]

    cs.LG cs.AI cs.CL

    How Many and Which Training Points Would Need to be Removed to Flip this Prediction?

    Authors: Jinghan Yang, Sarthak Jain, Byron C. Wallace

    Abstract: We consider the problem of identifying a minimal subset of training data $\mathcal{S}_t$ such that if the instances comprising $\mathcal{S}_t$ had been removed prior to training, the categorization of a given test point $x_t$ would have been different. Identifying such a set may be of interest for a few reasons. First, the cardinality of $\mathcal{S}_t$ provides a measure of robustness (if…

    Submitted 8 February, 2023; v1 submitted 4 February, 2023; originally announced February 2023.

    Comments: Accepted to EACL 2023

  48. arXiv:2301.13844  [pdf, other]

    cs.CL

    Do Multi-Document Summarization Models Synthesize?

    Authors: Jay DeYoung, Stephanie C. Martinez, Iain J. Marshall, Byron C. Wallace

    Abstract: Multi-document summarization entails producing concise synopses of collections of inputs. For some applications, the synopsis should accurately synthesize inputs with respect to a key aspect, e.g., a synopsis of film reviews written about a particular movie should reflect the average critic consensus. As a more consequential example, narrative summaries that accompany biomedical systematic reviews…

    Submitted 12 July, 2024; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: Accepted to TACL, to be presented at ACL 2024 in Bangkok, Thailand. 9 Figures, 11 Tables, 14 pages of main content, 20 pages total. This paper has some _history_. Buy me a drink if you want to hear about it

    Report number: TACL 6011

  49. arXiv:2212.01641  [pdf, other]

    cs.CL cs.LG

    Intermediate Entity-based Sparse Interpretable Representation Learning

    Authors: Diego Garcia-Olano, Yasumasa Onoe, Joydeep Ghosh, Byron C. Wallace

    Abstract: Interpretable entity representations (IERs) are sparse embeddings that are "human-readable" in that dimensions correspond to fine-grained entity types and values are predicted probabilities that a given entity is of the corresponding type. These methods perform well in zero-shot and low supervision settings. Compared to standard dense neural embeddings, such interpretable representations may permi…

    Submitted 3 December, 2022; originally announced December 2022.

    Comments: Accepted into BlackBox NLP Workshop at EMNLP 2022

  50. arXiv:2211.12446  [pdf, other]

    cs.CV cs.AI cs.LG

    EDICT: Exact Diffusion Inversion via Coupled Transformations

    Authors: Bram Wallace, Akash Gokul, Nikhil Naik

    Abstract: Finding an initial noise vector that produces an input image when fed into the diffusion process (known as inversion) is an important problem in denoising diffusion models (DDMs), with applications for real image editing. The state-of-the-art approach for real image editing with inversion uses denoising diffusion implicit models (DDIMs) to deterministically noise the image to the intermediate stat…

    Submitted 22 December, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: 24 pages, 22 figures. Code now available