Showing 1–6 of 6 results for author: Ramprasad, S

Searching in archive cs.

  1. arXiv:2411.16638  [pdf, other]

    cs.CL cs.AI

    Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation

    Authors: Sanjana Ramprasad, Byron C. Wallace

    Abstract: Modern LLMs can now produce highly readable abstractive summaries, to the point where traditional automated metrics for evaluating summary quality, such as ROUGE, have become saturated. However, LLMs still sometimes introduce unwanted content into summaries, i.e., information inconsistent with or unsupported by their source. Measuring the occurrence of these often subtle "hallucinations" automat…

    Submitted 28 November, 2024; v1 submitted 25 November, 2024; originally announced November 2024.

  2. arXiv:2406.03487  [pdf, other]

    cs.CL cs.AI

    Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends

    Authors: Sanjana Ramprasad, Elisa Ferracane, Zachary C. Lipton

    Abstract: Recent advancements in large language models (LLMs) have considerably advanced the capabilities of summarization systems. However, they continue to face concerns about hallucinations. While prior work has evaluated LLMs extensively in news domains, most evaluation of dialogue summarization has focused on BART-based models, leaving a gap in our understanding of their faithfulness. Our work benchmar…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024

  3. arXiv:2402.12566  [pdf, other]

    cs.CL cs.LG

    GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence

    Authors: Kundan Krishna, Sanjana Ramprasad, Prakhar Gupta, Byron C. Wallace, Zachary C. Lipton, Jeffrey P. Bigham

    Abstract: LLMs can generate factually incorrect statements even when provided access to reference documents. Such errors can be dangerous in high-stakes applications (e.g., document-grounded QA for healthcare or finance). We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks. GenAudit suggests edits to the LLM response by revising or removing claims that ar…

    Submitted 19 January, 2025; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Code and models available at https://genaudit.org

  4. arXiv:2402.03509  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains

    Authors: Sanjana Ramprasad, Kundan Krishna, Zachary C Lipton, Byron C Wallace

    Abstract: Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focussed almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (pote…

    Submitted 5 February, 2024; originally announced February 2024.

  5. arXiv:2305.14296  [pdf, other]

    cs.CL cs.LG

    USB: A Unified Summarization Benchmark Across Tasks and Domains

    Authors: Kundan Krishna, Prakhar Gupta, Sanjana Ramprasad, Byron C. Wallace, Jeffrey P. Bigham, Zachary C. Lipton

    Abstract: While the NLP community has produced numerous summarization benchmarks, none provide the rich annotations required to simultaneously address many important problems related to control and reliability. We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports 8 interrelated tasks: (i) extractive summarization; (ii) abstractive summarization…

    Submitted 4 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP Findings 2023 Camera Ready

  6. arXiv:2303.05392  [pdf, other]

    cs.CL cs.IR cs.LG

    Automatically Summarizing Evidence from Clinical Trials: A Prototype Highlighting Current Challenges

    Authors: Sanjana Ramprasad, Denis Jered McInerney, Iain J. Marshal, Byron C. Wallace

    Abstract: We present TrialsSummarizer, a system that aims to automatically summarize evidence presented in the set of randomized controlled trials most relevant to a given query. Building on prior work, the system retrieves trial publications matching a query specifying a combination of condition, intervention(s), and outcome(s), and ranks these according to sample size and estimated study quality. The top-…

    Submitted 7 March, 2023; originally announced March 2023.