Showing 1–50 of 95 results for author: McKeown, K

  1. arXiv:2506.15925  [pdf, ps, other]

    cs.CL

    Reranking-based Generation for Unbiased Perspective Summarization

    Authors: Narutatsu Ri, Nicholas Deas, Kathleen McKeown

    Abstract: Generating unbiased summaries in real-world settings such as political perspective summarization remains a crucial application of Large Language Models (LLMs). Yet, existing evaluation frameworks rely on traditional metrics for measuring key attributes such as coverage and faithfulness without verifying their applicability, and efforts to develop improved summarizers are still nascent. We address…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: ACL 2025 Findings

  2. arXiv:2506.06273  [pdf, ps, other]

    cs.CL

    AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization

    Authors: Mukur Gupta, Nikhil Reddy Varimalla, Nicholas Deas, Melanie Subbiah, Kathleen McKeown

    Abstract: Large Language Models (LLMs) have achieved impressive performance in text summarization and are increasingly deployed in real-world applications. However, these systems often inherit associative and framing biases from pre-training data, leading to inappropriate or unfair outputs in downstream tasks. In this work, we present AdvSumm (Adversarial Summarization), a domain-agnostic training framework…

    Submitted 6 June, 2025; originally announced June 2025.

  3. arXiv:2505.21740  [pdf, ps, other]

    cs.CL cs.AI

    Counterfactual Simulatability of LLM Explanations for Generation Tasks

    Authors: Marvin Limpijankit, Yanda Chen, Melanie Subbiah, Nicholas Deas, Kathleen McKeown

    Abstract: LLMs can be unpredictable, as even slight alterations to the prompt can cause the output to change in unexpected ways. Thus, the ability of models to accurately explain their behavior is critical, especially in high-stakes settings. One approach for evaluating explanations is counterfactual simulatability, how well an explanation allows users to infer the model's output on related counterfactuals.…

    Submitted 27 May, 2025; originally announced May 2025.

  4. arXiv:2505.15068  [pdf, other]

    cs.AI cs.CL cs.LG

    ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges

    Authors: Cheng Qian, Hongyi Du, Hongru Wang, Xiusi Chen, Yuji Zhang, Avirup Sil, Chengxiang Zhai, Kathleen McKeown, Heng Ji

    Abstract: Recent progress in large language models (LLMs) has enabled substantial advances in solving mathematical problems. However, existing benchmarks often fail to reflect the complexity of real-world problems, which demand open-ended, interdisciplinary reasoning and integration of computational tools. To address this gap, we introduce ModelingBench, a novel benchmark featuring real-world-inspired, open…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 36 Pages, 26 Figures, 5 Tables

  5. arXiv:2505.03209  [pdf, other]

    cs.LG

    DYSTIL: Dynamic Strategy Induction with Large Language Models for Reinforcement Learning

    Authors: Borui Wang, Kathleen McKeown, Rex Ying

    Abstract: Reinforcement learning from expert demonstrations has long remained a challenging research problem, and existing state-of-the-art methods using behavioral cloning plus further RL training often suffer from poor generalization, low sample efficiency, and poor model interpretability. Inspired by the strong reasoning abilities of large language models (LLMs), we propose a novel strategy-based reinfor…

    Submitted 6 May, 2025; originally announced May 2025.

  6. arXiv:2504.08905  [pdf, other]

    cs.CL

    Forecasting Communication Derailments Through Conversation Generation

    Authors: Yunfan Zhang, Kathleen McKeown, Smaranda Muresan

    Abstract: Forecasting communication derailment can be useful in real-world settings such as online content moderation, conflict resolution, and business negotiations. However, despite language models' success at identifying offensive speech present in conversations, they struggle to forecast future communication derailments. In contrast to prior work that predicts conversation outcomes solely based on the p…

    Submitted 11 April, 2025; originally announced April 2025.

  7. arXiv:2504.01132  [pdf, other]

    cs.CL cs.AI

    Is the Top Still Spinning? Evaluating Subjectivity in Narrative Understanding

    Authors: Melanie Subbiah, Akankshya Mishra, Grace Kim, Liyan Tang, Greg Durrett, Kathleen McKeown

    Abstract: Determining faithfulness of a claim to a source document is an important problem across many domains. This task is generally treated as a binary judgment of whether the claim is supported or unsupported in relation to the source. In many cases, though, whether a claim is supported can be ambiguous. For instance, it may depend on making inferences from given evidence, and different people can reaso…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Preprint

  8. arXiv:2503.10789  [pdf, other]

    cs.CL

    Data Caricatures: On the Representation of African American Language in Pretraining Corpora

    Authors: Nicholas Deas, Blake Vente, Amith Ananthram, Jessica A. Grieser, Desmond Patton, Shana Kleiner, James Shepard, Kathleen McKeown

    Abstract: With a combination of quantitative experiments, human judgments, and qualitative analyses, we evaluate the quantity and quality of African American Language (AAL) representation in 12 predominantly English, open-source pretraining corpora. We specifically focus on the sources, variation, and naturalness of included AAL texts representing the AAL-speaking community. We find that AAL is underreprese…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Preprint

  9. arXiv:2503.00958  [pdf, ps, other]

    cs.CL

    Layered Insights: Generalizable Analysis of Authorial Style by Leveraging All Transformer Layers

    Authors: Milad Alshomary, Nikhil Reddy Varimalla, Vishal Anand, Smaranda Muresan, Kathleen McKeown

    Abstract: We propose a new approach for the authorship attribution task that leverages the various linguistic representations learned at different layers of pre-trained transformer-based models. We evaluate our approach on three datasets, comparing it to a state-of-the-art baseline in in-domain and out-of-domain scenarios. We found that utilizing various transformer layers improves the robustness of authors…

    Submitted 3 July, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

  10. arXiv:2502.16143  [pdf, other]

    cs.CL

    The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

    Authors: Yuji Zhang, Sha Li, Cheng Qian, Jiateng Liu, Pengfei Yu, Chi Han, Yi R. Fung, Kathleen McKeown, Chengxiang Zhai, Manling Li, Heng Ji

    Abstract: Hallucination is a persistent challenge in large language models (LLMs), where even with rigorous quality control, models often generate distorted facts. This paradox, in which error generation continues despite high-quality training data, calls for a deeper understanding of the underlying LLM mechanisms. To address it, we propose a novel concept: knowledge overshadowing, where model's dominant kn…

    Submitted 22 February, 2025; originally announced February 2025.

    Comments: 19 pages, 5 figures

  11. arXiv:2501.06848  [pdf, other]

    cs.LG cs.CL cs.CV

    A General Framework for Inference-time Scaling and Steering of Diffusion Models

    Authors: Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, Rajesh Ranganath

    Abstract: Diffusion models produce impressive results in modalities ranging from images and video to protein design and text. However, generating samples with user-specified properties remains a challenge. Recent research proposes fine-tuning models to maximize rewards that capture desired properties, but these methods require expensive training and are prone to mode collapse. In this work, we propose Feynm…

    Submitted 15 January, 2025; v1 submitted 12 January, 2025; originally announced January 2025.

  12. arXiv:2411.04093  [pdf, other]

    cs.CL

    Summarization of Opinionated Political Documents with Varied Perspectives

    Authors: Nicholas Deas, Kathleen McKeown

    Abstract: Global partisan hostility and polarization have increased, and this polarization is heightened around presidential elections. Models capable of generating accurate summaries of diverse perspectives can help reduce such polarization by exposing users to alternative perspectives. In this work, we introduce a novel dataset and task for independently summarizing each political perspective in a set of p…

    Submitted 6 November, 2024; originally announced November 2024.

  13. arXiv:2410.16407  [pdf, other]

    cs.CL cs.AI cs.MM

    Enhancing Multimodal Affective Analysis with Learned Live Comment Features

    Authors: Zhaoyuan Deng, Amith Ananthram, Kathleen McKeown

    Abstract: Live comments, also known as Danmaku, are user-generated messages that are synchronized with video content. These comments overlay directly onto streaming videos, capturing viewer emotions and reactions in real-time. While prior work has leveraged live comments in affective analysis, their use has been limited due to the relative rarity of live comments across different video platforms. To address t…

    Submitted 21 October, 2024; originally announced October 2024.

  14. arXiv:2410.12757  [pdf, other]

    cs.CL cs.LG

    StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples

    Authors: Ajay Patel, Jiacheng Zhu, Justin Qiu, Zachary Horvitz, Marianna Apidianaki, Kathleen McKeown, Chris Callison-Burch

    Abstract: Style representations aim to embed texts with similar writing styles closely and texts with different styles far apart, regardless of content. However, the contrastive triplets often used for training these representations may vary in both style and content, leading to potential content leakage in the representations. We introduce StyleDistance, a novel approach to training stronger content-indepe…

    Submitted 8 February, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: To appear at NAACL 2025

  15. arXiv:2409.07072  [pdf, other]

    cs.CL

    Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution

    Authors: Milad Alshomary, Narutatsu Ri, Marianna Apidianaki, Ajay Patel, Smaranda Muresan, Kathleen McKeown

    Abstract: Recent state-of-the-art authorship attribution methods learn authorship representations of texts in a latent, non-interpretable space, hindering their usability in real-world applications. Our work proposes a novel approach to interpreting these learned embeddings by identifying representative points in the latent space and utilizing LLMs to generate informative natural language descriptions of th…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 8 pages, 8 figures, under review

  16. arXiv:2407.12196  [pdf, other]

    cs.CL

    MASIVE: Open-Ended Affective State Identification in English and Spanish

    Authors: Nicholas Deas, Elsbeth Turcan, Iván Pérez Mejía, Kathleen McKeown

    Abstract: In the field of emotion analysis, much NLP research focuses on identifying a limited number of discrete emotion categories, often applied across languages. These basic sets, however, are rarely designed with textual data in mind, and culture, language, and dialect can influence how particular emotions are interpreted. In this work, we broaden our scope to a practically unbounded set of aff…

    Submitted 12 November, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: EMNLP 2024

  17. arXiv:2407.06501  [pdf, other]

    cs.AI cs.CL

    STORYSUMM: Evaluating Faithfulness in Story Summarization

    Authors: Melanie Subbiah, Faisal Ladhak, Akankshya Mishra, Griffin Adams, Lydia B. Chilton, Kathleen McKeown

    Abstract: Human evaluation has been the gold standard for checking faithfulness in abstractive summarization. However, with a challenging source domain like narrative, multiple annotators can agree a summary is faithful, while missing details that are obvious errors only once pointed out. We therefore introduce a new dataset, STORYSUMM, comprising LLM summaries of short stories with localized faithfulness l…

    Submitted 1 April, 2025; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: EMNLP Main 2024

  18. arXiv:2407.03956  [pdf, other]

    cs.MA cs.CL

    Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems

    Authors: Shmuel Berman, Kathleen McKeown, Baishakhi Ray

    Abstract: Prior research has enhanced the ability of Large Language Models (LLMs) to solve logic puzzles using techniques such as chain-of-thought prompting or introducing a symbolic representation. These frameworks are still usually insufficient to solve complicated logical problems, such as Zebra puzzles, due to the inherent complexity of translating natural language clues into logical statements. We intr…

    Submitted 9 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    MSC Class: 68T01; 68T20; 68T27; ACM Class: I.2.3; I.2.6; I.2.7; I.2.11

  19. arXiv:2406.15586  [pdf, other]

    cs.CL

    TinyStyler: Efficient Few-Shot Text Style Transfer with Authorship Embeddings

    Authors: Zachary Horvitz, Ajay Patel, Kanishk Singh, Chris Callison-Burch, Kathleen McKeown, Zhou Yu

    Abstract: The goal of text style transfer is to transform the style of texts while preserving their original meaning, often with only a few examples of the target style. Existing style transfer methods generally rely on the few-shot capabilities of large language models or on complex controllable text generation approaches that are inefficient and underperform on fluency metrics. We introduce TinyStyler, a…

    Submitted 7 November, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  20. arXiv:2406.11665  [pdf, other]

    cs.CL cs.AI cs.CV

    See It from My Perspective: How Language Affects Cultural Bias in Image Understanding

    Authors: Amith Ananthram, Elias Stengel-Eskin, Mohit Bansal, Kathleen McKeown

    Abstract: Vision-language models (VLMs) can respond to queries about images in many languages. However, beyond language, culture affects how we see things. For example, individuals from Western cultures focus more on the central figure in an image while individuals from East Asian cultures attend more to scene context. In this work, we characterize the Western bias of VLMs in image understanding and investi…

    Submitted 28 February, 2025; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at ICLR 2025. 22 pages, 6 figures. Code/models: https://github.com/amith-ananthram/see-it-from-my-perspective

  21. arXiv:2403.04770  [pdf, other]

    cs.CL cs.LG

    Social Orientation: A New Feature for Dialogue Analysis

    Authors: Todd Morrill, Zhaoyuan Deng, Yanda Chen, Amith Ananthram, Colin Wayne Leach, Kathleen McKeown

    Abstract: There are many settings where it is useful to predict and explain the success or failure of a dialogue. Circumplex theory from psychology models the social orientations (e.g., Warm-Agreeable, Arrogant-Calculating) of conversation participants and can be used to predict and explain the outcome of social interactions. Our work is novel in its systematic application of social orientation tags to mode…

    Submitted 25 February, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  22. arXiv:2403.01061  [pdf, other]

    cs.CL

    Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers

    Authors: Melanie Subbiah, Sean Zhang, Lydia B. Chilton, Kathleen McKeown

    Abstract: We evaluate recent Large Language Models (LLMs) on the challenging task of summarizing short stories, which can be lengthy, and include nuanced subtext or scrambled timelines. Importantly, we work directly with authors to ensure that the stories have not been shared online (and therefore are unseen by the models), and to obtain informed evaluations of summary quality using judgments from the autho…

    Submitted 11 July, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: pre-MIT Press publication version

  23. arXiv:2403.00794  [pdf, other]

    cs.CL cs.AI cs.LG

    Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models

    Authors: Zachary Horvitz, Jingru Chen, Rahul Aditya, Harshvardhan Srivastava, Robert West, Zhou Yu, Kathleen McKeown

    Abstract: Humor is a fundamental facet of human cognition and interaction. Yet, despite recent advances in natural language processing, humor detection remains a challenging task that is complicated by the scarcity of datasets that pair humorous texts with similar non-humorous counterparts. In our work, we investigate whether large language models (LLMs) can generate synthetic data for humor detection via…

    Submitted 21 June, 2024; v1 submitted 22 February, 2024; originally announced March 2024.

  24. arXiv:2402.18479  [pdf, other]

    cs.CL

    NewsQs: Multi-Source Question Generation for the Inquiring Mind

    Authors: Alyssa Hwang, Kalpit Dixit, Miguel Ballesteros, Yassine Benajiba, Vittorio Castelli, Markus Dreyer, Mohit Bansal, Kathleen McKeown

    Abstract: We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents. To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles from the News On the Web corpus. We show that fine-tuning a model with control codes produces questions that are judg…

    Submitted 15 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: minor wording change

  25. arXiv:2402.13249  [pdf, other]

    cs.CL cs.AI

    TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

    Authors: Liyan Tang, Igor Shalyminov, Amy Wing-mei Wong, Jon Burnsky, Jake W. Vincent, Yu'an Yang, Siffi Singh, Song Feng, Hwanjun Song, Hang Su, Lijia Sun, Yi Zhang, Saab Mansour, Kathleen McKeown

    Abstract: Single document news summarization has seen substantial progress on faithfulness in recent years, driven by research on the evaluation of factual consistency, or hallucinations. We ask whether these advances carry over to other text summarization domains. We propose a new evaluation benchmark on topic-focused dialogue summarization, generated by LLMs of varying sizes. We provide binary sentence-le…

    Submitted 31 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: NAACL 2024; Linguistic annotations available at https://github.com/amazon-science/tofueval

  26. arXiv:2402.12530  [pdf, other]

    cs.CL cs.AI cs.LG

    Parallel Structures in Pre-training Data Yield In-Context Learning

    Authors: Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He

    Abstract: Pre-trained language models (LMs) are capable of in-context learning (ICL): they can adapt to a task with only a few examples given in the prompt without any parameter update. However, it is unclear where this capability comes from as there is a stark distribution shift between pre-training text and ICL prompts. In this work, we study what patterns of the pre-training data contribute to ICL. We fi…

    Submitted 19 February, 2024; originally announced February 2024.

  27. arXiv:2311.07884  [pdf, other]

    cs.CL

    Fair Abstractive Summarization of Diverse Perspectives

    Authors: Yusen Zhang, Nan Zhang, Yixin Liu, Alexander Fabbri, Junru Liu, Ryo Kamoi, Xiaoxin Lu, Caiming Xiong, Jieyu Zhao, Dragomir Radev, Kathleen McKeown, Rui Zhang

    Abstract: People from different social and demographic groups express diverse perspectives and conflicting opinions on a broad set of topics such as product reviews, healthcare, law, and politics. A fair summary should provide a comprehensive coverage of diverse perspectives without underrepresenting certain groups. However, current work in summarization metrics and Large Language Models (LLMs) evaluation h…

    Submitted 29 March, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  28. arXiv:2308.15459  [pdf, other]

    cs.CL cs.AI

    ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer

    Authors: Zachary Horvitz, Ajay Patel, Chris Callison-Burch, Zhou Yu, Kathleen McKeown

    Abstract: Textual style transfer is the task of transforming stylistic properties of text while preserving meaning. Target "styles" can be defined in numerous ways, ranging from single attributes (e.g., formality) to authorship (e.g., Shakespeare). Previous unsupervised style-transfer approaches generally rely on significant amounts of labeled data for only a fixed set of styles or require large language mode…

    Submitted 22 February, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

  29. arXiv:2308.05317  [pdf, other]

    cs.CL

    Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning

    Authors: Alexander Hanbo Li, Mingyue Shang, Evangelia Spiliopoulou, Jie Ma, Patrick Ng, Zhiguo Wang, Bonan Min, William Wang, Kathleen McKeown, Vittorio Castelli, Dan Roth, Bing Xiang

    Abstract: We present a novel approach for structured data-to-text generation that addresses the limitations of existing methods that primarily focus on specific types of structured data. Our proposed method aims to improve performance in multi-task training, zero-shot and few-shot scenarios by providing a unified representation that can handle various forms of structured data such as tables, knowledge graph…

    Submitted 9 August, 2023; originally announced August 2023.

  30. arXiv:2307.08678  [pdf, other]

    cs.CL cs.AI cs.LG

    Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

    Authors: Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown

    Abstract: Large language models (LLMs) are trained to imitate humans to explain human decisions. However, do LLMs explain themselves? Can they help humans build mental models of how LLMs process different inputs? To answer these questions, we propose to evaluate counterfactual simulatability of natural language explanations: whether an explanation can enable humans to precisely infer the model's…

    Submitted 17 July, 2023; originally announced July 2023.

  31. arXiv:2305.18265  [pdf, other]

    cs.CL cs.AI cs.CY

    Check-COVID: Fact-Checking COVID-19 News Claims with Scientific Evidence

    Authors: Gengyu Wang, Kate Harwood, Lawrence Chillrud, Amith Ananthram, Melanie Subbiah, Kathleen McKeown

    Abstract: We present a new fact-checking benchmark, Check-COVID, that requires systems to verify claims about COVID-19 from news using evidence from scientific articles. This approach to fact-checking is particularly challenging as it requires checking internet text written in everyday language against evidence from journal articles written in formal academic language. Check-COVID contains 1,504 expert-ann…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted as ACL 2023 Findings

  32. arXiv:2305.17779  [pdf, other]

    cs.CL

    Generating EDU Extracts for Plan-Guided Summary Re-Ranking

    Authors: Griffin Adams, Alexander R. Fabbri, Faisal Ladhak, Kathleen McKeown, Noémie Elhadad

    Abstract: Two-step approaches, in which summary candidates are generated-then-reranked to return a single summary, can improve ROUGE scores over the standard single-step approach. Yet, standard decoding methods (i.e., beam search, nucleus sampling, and diverse beam search) produce candidates with redundant, and often low quality, content. In this paper, we design a novel method to generate candidates for re…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  33. arXiv:2305.17534  [pdf, other]

    cs.CL cs.AI cs.LG

    Unsupervised Selective Rationalization with Noise Injection

    Authors: Adam Storek, Melanie Subbiah, Kathleen McKeown

    Abstract: A major issue with using deep learning models in sensitive applications is that they provide no explanation for their output. To address this problem, unsupervised selective rationalization produces rationales alongside predictions by chaining two jointly-trained components, a rationale generator and a predictor. Although this architecture guarantees that the prediction relies solely on the ration…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  34. arXiv:2305.14291  [pdf, other]

    cs.CL

    Evaluation of African American Language Bias in Natural Language Generation

    Authors: Nicholas Deas, Jessi Grieser, Shana Kleiner, Desmond Patton, Elsbeth Turcan, Kathleen McKeown

    Abstract: We evaluate how well LLMs understand African American Language (AAL) in comparison to their performance on White Mainstream English (WME), the encouraged "standard" form of English taught in American classrooms. We measure LLM performance using automatic metrics and human judgments for two tasks: a counterpart generation task, where a model generates AAL (or WME) given WME (or AAL), and a masked s…

    Submitted 12 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Camera-Ready

  35. arXiv:2305.14225  [pdf, other]

    cs.CL

    ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media

    Authors: Kung-Hsiang Huang, Hou Pong Chan, Kathleen McKeown, Heng Ji

    Abstract: Considerable advancements have been made to tackle the misrepresentation of information derived from reference articles in the domains of fact-checking and faithful summarization. However, an unaddressed aspect remains - the identification of social media posts that manipulate information within associated news articles. This task presents a significant challenge, primarily due to the prevalence o…

    Submitted 1 December, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: COLING 2025

  36. arXiv:2305.12696  [pdf, other]

    cs.CL

    Learning Interpretable Style Embeddings via Prompting LLMs

    Authors: Ajay Patel, Delip Rao, Ansh Kothary, Kathleen McKeown, Chris Callison-Burch

    Abstract: Style representation learning builds content-independent representations of author style in text. Stylometry, the analysis of style in text, is often performed by expert forensic linguists and no large dataset of stylometric annotations exists for training. Current style representation learning uses neural methods to disentangle style from content to create style vectors; however, these approaches…

    Submitted 9 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  37. arXiv:2303.03278  [pdf, other]

    cs.CL cs.AI cs.LG

    Faithfulness-Aware Decoding Strategies for Abstractive Summarization

    Authors: David Wan, Mengwen Liu, Kathleen McKeown, Markus Dreyer, Mohit Bansal

    Abstract: Despite significant progress in understanding and improving faithfulness in abstractive summarization, the question of how decoding strategies affect faithfulness is less studied. We present a systematic study of the effect of generation techniques such as beam search and nucleus sampling on faithfulness in abstractive summarization. We find a consistent trend where beam search with large beam siz…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: EACL 2023 (17 pages)

  38. arXiv:2302.00102  [pdf, other]

    cs.CL cs.LG

    Towards Detecting Harmful Agendas in News Articles

    Authors: Melanie Subbiah, Amrita Bhattacharjee, Yilun Hua, Tharindu Kumarage, Huan Liu, Kathleen McKeown

    Abstract: Manipulated news online is a growing problem which necessitates the use of automated systems to curtail its spread. We argue that while misinformation and disinformation detection have been studied, there has been a lack of investment in the important open challenge of detecting harmful agendas in news articles; identifying harmful agendas is critical to flag news campaigns with the greatest poten…

    Submitted 2 August, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

    Comments: Camera-ready for ACL-WASSA 2023. First two authors contributed equally

  39. arXiv:2301.13848  [pdf, other]

    cs.CL cs.AI cs.LG

    Benchmarking Large Language Models for News Summarization

    Authors: Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B. Hashimoto

    Abstract: Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood. By conducting a human evaluation on ten LLMs across different pretraining methods, prompts, and model scales, we make two important observations. First, we find instruction tuning, and not model size, is the key to the LLM's zero-shot summarization capability. S…

    Submitted 31 January, 2023; originally announced January 2023.

  40. arXiv:2301.10483  [pdf, other]

    cs.CL

    SWING: Balancing Coverage and Faithfulness for Dialogue Summarization

    Authors: Kung-Hsiang Huang, Siffi Singh, Xiaofei Ma, Wei Xiao, Feng Nan, Nicholas Dingwall, William Yang Wang, Kathleen McKeown

    Abstract: Missing information is a common issue of dialogue summarization where some information in the reference summaries is not covered in the generated summaries. To address this issue, we propose to utilize natural language inference (NLI) models to improve coverage while avoiding introducing factual inconsistencies. Specifically, we use NLI to compute fine-grained training signals to encourage the mod…

    Submitted 25 January, 2023; originally announced January 2023.

    Comments: Accepted by Findings of EACL 2023

  41. arXiv:2212.10670  [pdf, other]

    cs.CL cs.LG

    In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models

    Authors: Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown

    Abstract: Given the success with in-context learning of large pre-trained language models, we introduce in-context learning distillation to transfer in-context few-shot learning ability from large models to smaller models. We propose to combine in-context learning objectives with language modeling objectives to distill both the ability to read in-context examples and task knowledge to the smaller models. We…

    Submitted 20 December, 2022; originally announced December 2022.

  42. arXiv:2211.11724  [pdf, other]

    cs.CL

    Legal and Political Stance Detection of SCOTUS Language

    Authors: Noah Bergam, Emily Allaway, Kathleen McKeown

    Abstract: We analyze publicly available US Supreme Court documents using automated stance detection. In the first phase of our work, we investigate the extent to which the Court's public-facing language is political. We propose and calculate two distinct ideology metrics of SCOTUS justices using oral argument transcripts. We then compare these language-based metrics to existing social scientific measures of…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: Natural Legal Language Processing Workshop at EMNLP 2022

  43. arXiv:2211.05886  [pdf, ps, other]

    cs.CL

    CREATIVESUMM: Shared Task on Automatic Summarization for Creative Writing

    Authors: Divyansh Agarwal, Alexander R. Fabbri, Simeng Han, Wojciech Kryściński, Faisal Ladhak, Bryan Li, Kathleen McKeown, Dragomir Radev, Tianyi Zhang, Sam Wiseman

    Abstract: This paper introduces the shared task of summarizing documents in several creative domains, namely literary texts, movie scripts, and television scripts. Summarizing these creative documents requires making complex literary interpretations, as well as understanding non-trivial temporal dependencies in texts containing varied styles of plot development and narrative structure. This poses unique cha…

    Submitted 6 December, 2022; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: 4 pages + 3 for references and appendix

  44. arXiv:2211.04903  [pdf, other]

    cs.CL

    Novel Chapter Abstractive Summarization using Spinal Tree Aware Sub-Sentential Content Selection

    Authors: Hardy Hardy, Miguel Ballesteros, Faisal Ladhak, Muhammad Khalifa, Vittorio Castelli, Kathleen McKeown

    Abstract: Summarizing novel chapters is a difficult task due to the input length and the fact that sentences that appear in the desired summaries draw content from multiple places throughout the chapter. We present a pipelined extractive-abstractive approach where the extractive step filters the content that is passed to the abstractive component. Extremely lengthy input also results in a highly skewed data…

    Submitted 9 November, 2022; originally announced November 2022.

  45. arXiv:2210.10045  [pdf, other]

    cs.CL cs.AI

    SafeText: A Benchmark for Exploring Physical Safety in Language Models

    Authors: Sharon Levy, Emily Allaway, Melanie Subbiah, Lydia Chilton, Desmond Patton, Kathleen McKeown, William Yang Wang

    Abstract: Understanding what constitutes safe text is an important issue in natural language processing and can often prevent the deployment of models deemed harmful and unsafe. One such type of safety that has been scarcely studied is commonsense physical safety, i.e. text that is not explicitly violent and requires additional commonsense knowledge to comprehend that it leads to physical harm. We create th…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP 2022

  46. arXiv:2210.09306  [pdf, other]

    cs.AI cs.CL cs.LG

    Mitigating Covertly Unsafe Text within Natural Language Systems

    Authors: Alex Mei, Anisha Kabir, Sharon Levy, Melanie Subbiah, Emily Allaway, John Judge, Desmond Patton, Bruce Bimber, Kathleen McKeown, William Yang Wang

    Abstract: An increasingly prevalent problem for intelligent technologies is text safety, as uncontrolled systems may generate recommendations to their users that lead to injury or life-threatening consequences. However, the degree of explicitness of a generated statement that can cause physical harm varies. In this paper, we distinguish types of text that can lead to physical harm and establish one particul…

    Submitted 20 March, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: In Findings of the 2022 Conference on Empirical Methods in Natural Language Processing

  47. arXiv:2209.07661  [pdf, other]

    cs.CL cs.AI cs.LG

    On the Relation between Sensitivity and Accuracy in In-context Learning

    Authors: Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He

    Abstract: In-context learning (ICL) suffers from oversensitivity to the prompt, making it unreliable in real-world scenarios. We study the sensitivity of ICL with respect to multiple perturbation types. First, we find that label bias obscures the true sensitivity, and therefore prior work may have significantly underestimated ICL sensitivity. Second, we observe a strong negative correlation between ICL sens…

    Submitted 27 January, 2024; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: EMNLP 2023 camera-ready

  48. arXiv:2205.11658  [pdf, other]

    cs.CL

    Penguins Don't Fly: Reasoning about Generics through Instantiations and Exceptions

    Authors: Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Kathleen McKeown, Doug Downey, Yejin Choi

    Abstract: Generics express generalizations about the world (e.g., birds can fly) that are not universally true (e.g., newborn birds and penguins cannot fly). Commonsense knowledge bases, used extensively in NLP, encode some generic knowledge but rarely enumerate such exceptions and knowing when a generic statement holds or does not hold true is crucial for developing a comprehensive understanding of generic…

    Submitted 24 March, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: EACL 2023

  49. arXiv:2205.11602  [pdf, other]

    cs.CL

    Seeded Hierarchical Clustering for Expert-Crafted Taxonomies

    Authors: Anish Saha, Amith Ananthram, Emily Allaway, Heng Ji, Kathleen McKeown

    Abstract: Practitioners from many disciplines (e.g., political science) use expert-crafted taxonomies to make sense of large, unlabeled corpora. In this work, we study Seeded Hierarchical Clustering (SHC): the task of automatically fitting unlabeled data to such taxonomies using only a small set of labeled examples. We propose HierSeed, a novel weakly supervised algorithm for this task that uses only a smal…

    Submitted 23 May, 2022; originally announced May 2022.

  50. arXiv:2204.10290  [pdf, other]

    cs.CL

    Learning to Revise References for Faithful Summarization

    Authors: Griffin Adams, Han-Chin Shing, Qing Sun, Christopher Winestock, Kathleen McKeown, Noémie Elhadad

    Abstract: In real-world scenarios with naturally occurring datasets, reference summaries are noisy and may contain information that cannot be inferred from the source text. On large news corpora, removing low quality samples has been shown to reduce model hallucinations. Yet, for smaller, and/or noisier corpora, filtering is detrimental to performance. To improve reference quality while retaining all data,…

    Submitted 11 October, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Findings of EMNLP 2022