Showing 1–6 of 6 results for author: DeLucia, A

Searching in archive cs.

  1. arXiv:2505.18452 [pdf, ps, other]

    cs.CL

    MedScore: Factuality Evaluation of Free-Form Medical Answers

    Authors: Heyuan Huang, Alexandra DeLucia, Vijay Murari Tiyyala, Mark Dredze

    Abstract: While Large Language Models (LLMs) can generate fluent and convincing responses, they are not necessarily correct. This is especially apparent in the popular decompose-then-verify factuality evaluation pipeline, where LLMs evaluate generations by decomposing the generations into individual, valid claims. Factuality evaluation is especially important for medical answers, since incorrect medical inf…

    Submitted 23 May, 2025; originally announced May 2025.

  2. arXiv:2503.15768 [pdf, other]

    cs.CL cs.AI

    Can one size fit all?: Measuring Failure in Multi-Document Summarization Domain Transfer

    Authors: Alexandra DeLucia, Mark Dredze

    Abstract: Abstractive multi-document summarization (MDS) is the task of automatically summarizing information in multiple documents, from news articles to conversations with multiple speakers. The training approaches for current MDS models can be grouped into four approaches: end-to-end with special pre-training ("direct"), chunk-then-summarize, extract-then-summarize, and inference with GPT-style models. I…

    Submitted 19 March, 2025; originally announced March 2025.

  3. arXiv:2401.06742 [pdf, other]

    cs.CL cs.AI

    Using Natural Language Inference to Improve Persona Extraction from Dialogue in a New Domain

    Authors: Alexandra DeLucia, Mengjie Zhao, Yoshinori Maeda, Makoto Yoda, Keiichi Yamada, Hiromi Wakaki

    Abstract: While valuable datasets such as PersonaChat provide a foundation for training persona-grounded dialogue agents, they lack diversity in conversational and narrative settings, primarily existing in the "real" world. To develop dialogue agents with unique personas, models are trained to converse given a specific persona, but hand-crafting these persona can be time-consuming, thus methods exist to aut…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Code and models will be released upon publication

  4. arXiv:2311.08324 [pdf, other]

    cs.CL cs.AI

    Anti-LM Decoding for Zero-shot In-context Machine Translation

    Authors: Suzanna Sia, Alexandra DeLucia, Kevin Duh

    Abstract: Zero-shot In-context learning is the phenomenon where models can perform the task simply given the instructions. However, pre-trained large language models are known to be poorly calibrated for this task. One of the most effective approaches to handling this bias is to adopt a contrastive decoding objective, which accounts for the prior probability of generating the next token by conditioning on s…

    Submitted 2 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL Findings 2024

  5. arXiv:2010.07375 [pdf, other]

    cs.CL

    Decoding Methods for Neural Narrative Generation

    Authors: Alexandra DeLucia, Aaron Mueller, Xiang Lisa Li, João Sedoc

    Abstract: Narrative generation is an open-ended NLP task in which a model generates a story given a prompt. The task is similar to neural response generation for chatbots; however, innovations in response generation are often not applied to narrative generation, despite the similarity between these tasks. We aim to bridge this gap by applying and evaluating advances in decoding methods for neural response g…

    Submitted 8 July, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: 20 pages. Updated to the accepted version in Workshop on Generation Evaluation and Metrics at ACL 2021 (GEM'21)

  6. arXiv:2010.04321 [pdf, other]

    cs.DC cs.IR

    Analyzing HPC Support Tickets: Experience and Recommendations

    Authors: Alexandra DeLucia, Elisabeth Moore

    Abstract: High performance computing (HPC) user support teams are the first line of defense against large-scale problems, as they are often the first to learn of problems reported by users. Developing tools to better assist support teams in solving user problems and tracking issue trends is critical for maintaining system health. Our work examines the Los Alamos National Laboratory HPC Consult Team's user s…

    Submitted 8 October, 2020; originally announced October 2020.