Showing 1–17 of 17 results for author: Gatto, J

  1. arXiv:2503.17509

    cs.CL cs.AI

    Follow-up Question Generation For Enhanced Patient-Provider Conversations

    Authors: Joseph Gatto, Parker Seegmiller, Timothy Burdick, Inas S. Khayal, Sarah DeLozier, Sarah M. Preum

    Abstract: Follow-up question generation is an essential feature of dialogue systems as it can reduce conversational ambiguity and enhance modeling complex interactions. Conversational contexts often pose core NLP challenges such as (i) extracting relevant information buried in fragmented data sources, and (ii) modeling parallel thought processes. These two challenges occur frequently in medical dialogue as…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: 17 Pages, 7 Figures, 6 Tables

  2. arXiv:2502.16838

    cs.CL

    REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction

    Authors: Omar Sharif, Joseph Gatto, Madhusudan Basak, Sarah M. Preum

    Abstract: Event argument extraction identifies arguments for predefined event roles in text. Traditional evaluations rely on exact match (EM), requiring predicted arguments to match annotated spans exactly. However, this approach fails for generative models like large language models (LLMs), which produce diverse yet semantically accurate responses. EM underestimates performance by disregarding valid variat…

    Submitted 23 February, 2025; originally announced February 2025.

    Comments: 20 pages, 9 figures, 13 tables

  3. arXiv:2411.06549

    cs.AI cs.CL

    In-Context Learning for Preserving Patient Privacy: A Framework for Synthesizing Realistic Patient Portal Messages

    Authors: Joseph Gatto, Parker Seegmiller, Timothy E. Burdick, Sarah Masud Preum

    Abstract: Since the COVID-19 pandemic, clinicians have seen a large and sustained influx in patient portal messages, significantly contributing to clinician burnout. To the best of our knowledge, there are no large-scale public patient portal messages corpora researchers can use to build tools to optimize clinician portal workflows. Informed by our ongoing work with a regional hospital, this study introduce…

    Submitted 10 November, 2024; originally announced November 2024.

    Comments: Findings paper presented at Machine Learning for Health (ML4H) symposium 2024, December 15-16, 2024, Vancouver, Canada, 8 pages

  4. arXiv:2410.03594

    cs.CL

    Explicit, Implicit, and Scattered: Revisiting Event Extraction to Capture Complex Arguments

    Authors: Omar Sharif, Joseph Gatto, Madhusudan Basak, Sarah M. Preum

    Abstract: Prior works formulate the extraction of event-specific arguments as a span extraction problem, where event arguments are explicit -- i.e. assumed to be contiguous spans of text in a document. In this study, we revisit this definition of Event Extraction (EE) by introducing two key argument types that cannot be modeled by existing EE frameworks. First, implicit arguments are event arguments which a…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted in EMNLP-2024 (Main). 21 pages, 8 figures, and 11 tables

  5. arXiv:2406.14695

    cs.CL cs.AI

    Depth $F_1$: Improving Evaluation of Cross-Domain Text Classification by Measuring Semantic Generalizability

    Authors: Parker Seegmiller, Joseph Gatto, Sarah Masud Preum

    Abstract: Recent evaluations of cross-domain text classification models aim to measure the ability of a model to obtain domain-invariant performance in a target domain given labeled samples in a source domain. The primary strategy for this evaluation relies on assumed differences between source domain samples and target domain samples in benchmark datasets. This evaluation strategy fails to account for the…

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2404.01147

    cs.CL cs.LG

    Do LLMs Find Human Answers To Fact-Driven Questions Perplexing? A Case Study on Reddit

    Authors: Parker Seegmiller, Joseph Gatto, Omar Sharif, Madhusudan Basak, Sarah Masud Preum

    Abstract: Large language models (LLMs) have been shown to be proficient in correctly answering questions in the context of online discourse. However, the study of using LLMs to model human-like answers to fact-driven social media questions is still under-explored. In this work, we investigate how LLMs model the wide variety of human answers to fact-driven questions posed on several topic-specific Reddit com…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 4 pages, 2 figures

  7. arXiv:2403.03336

    cs.CL cs.SI

    Scope of Large Language Models for Mining Emerging Opinions in Online Health Discourse

    Authors: Joseph Gatto, Madhusudan Basak, Yash Srivastava, Philip Bohlman, Sarah M. Preum

    Abstract: In this paper, we develop an LLM-powered framework for the curation and evaluation of emerging opinion mining in online health communities. We formulate emerging opinion mining as a pairwise stance detection problem between (title, comment) pairs sourced from Reddit, where post titles contain emerging health-related claims on a topic that is not predefined. The claims are either explicitly or impl…

    Submitted 5 March, 2024; originally announced March 2024.

  8. arXiv:2403.03304

    cs.CL cs.LG

    Large Language Models for Document-Level Event-Argument Data Augmentation for Challenging Role Types

    Authors: Joseph Gatto, Parker Seegmiller, Omar Sharif, Sarah M. Preum

    Abstract: Event Argument Extraction (EAE) is an extremely difficult information extraction problem -- with significant limitations in few-shot cross-domain (FSCD) settings. A common solution to FSCD modeling is data augmentation. Unfortunately, existing augmentation methods are not well-suited to a variety of real-world EAE contexts including (i) The need to model long documents (10+ sentences) (ii) The nee…

    Submitted 12 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Paper in submission (8 pages)

  9. arXiv:2310.19750

    cs.CL

    Chain-of-Thought Embeddings for Stance Detection on Social Media

    Authors: Joseph Gatto, Omar Sharif, Sarah Masud Preum

    Abstract: Stance detection on social media is challenging for Large Language Models (LLMs), as emerging slang and colloquial language in online conversations often contain deeply implicit stance labels. Chain-of-Thought (COT) prompting has recently been shown to improve performance on stance detection tasks -- alleviating some of these issues. However, COT prompting still struggles with implicit stance iden…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP-2023, 8 pages

  10. arXiv:2309.09877

    cs.CL

    Not Enough Labeled Data? Just Add Semantics: A Data-Efficient Method for Inferring Online Health Texts

    Authors: Joseph Gatto, Sarah M. Preum

    Abstract: User-generated texts available on the web and social platforms are often long and semantically challenging, making them difficult to annotate. Obtaining human annotation becomes increasingly difficult as problem domains become more specialized. For example, many health NLP problems require domain experts to be a part of the annotation pipeline. Thus, it is crucial that we develop low-resource NLP…

    Submitted 18 September, 2023; originally announced September 2023.

  11. arXiv:2309.06541

    cs.CL

    Text Encoders Lack Knowledge: Leveraging Generative LLMs for Domain-Specific Semantic Textual Similarity

    Authors: Joseph Gatto, Omar Sharif, Parker Seegmiller, Philip Bohlman, Sarah Masud Preum

    Abstract: Amidst the sharp rise in the evaluation of large language models (LLMs) on various tasks, we find that semantic textual similarity (STS) has been under-explored. In this study, we show that STS can be cast as a text generation problem while maintaining strong performance on multiple STS benchmarks. Additionally, we show generative LLMs significantly outperform existing encoder-based STS models whe…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: Under review GEM@EMNLP-2023, 12 pages

  12. arXiv:2303.09366

    cs.CL cs.LG

    The Scope of In-Context Learning for the Extraction of Medical Temporal Constraints

    Authors: Parker Seegmiller, Joseph Gatto, Madhusudan Basak, Diane Cook, Hassan Ghasemzadeh, John Stankovic, Sarah Preum

    Abstract: Medications often impose temporal constraints on everyday patient activity. Violations of such medical temporal constraints (MTCs) lead to a lack of treatment adherence, in addition to poor health outcomes and increased healthcare expenses. These MTCs are found in drug usage guidelines (DUGs) in both patient education materials and clinical texts. Computationally representing MTCs in DUGs will adv…

    Submitted 16 October, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

  13. arXiv:2301.11508

    cs.CL

    Theme-driven Keyphrase Extraction to Analyze Social Media Discourse

    Authors: William Romano, Omar Sharif, Madhusudan Basak, Joseph Gatto, Sarah Preum

    Abstract: Social media platforms are vital resources for sharing self-reported health experiences, offering rich data on various health topics. Despite advancements in Natural Language Processing (NLP) enabling large-scale social media data analysis, a gap remains in applying keyphrase extraction to health-related content. Keyphrase extraction is used to identify salient concepts in social media discourse w…

    Submitted 28 May, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: 11 pages, 2 figures, submitted to ICWSM. This version represents a substantial expansion and refocus of the previous manuscript, including new experiments, expanded data analysis, and comprehensive discussions

  14. arXiv:2301.07051

    cs.LG

    ActSafe: Predicting Violations of Medical Temporal Constraints for Medication Adherence

    Authors: Parker Seegmiller, Joseph Gatto, Abdullah Mamun, Hassan Ghasemzadeh, Diane Cook, John Stankovic, Sarah Masud Preum

    Abstract: Prescription medications often impose temporal constraints on regular health behaviors (RHBs) of patients, e.g., eating before taking medication. Violations of such medical temporal constraints (MTCs) can result in adverse effects. Detecting and predicting such violations before they occur can help alert the patient. We formulate the problem of modeling MTCs and develop a proof-of-concept solution…

    Submitted 17 January, 2023; originally announced January 2023.

  15. arXiv:2210.03246

    cs.CL

    HealthE: Classifying Entities in Online Textual Health Advice

    Authors: Joseph Gatto, Parker Seegmiller, Garrett Johnston, Sarah M. Preum

    Abstract: The processing of entities in natural language is essential to many medical NLP systems. Unfortunately, existing datasets vastly under-represent the entities required to model public health relevant texts such as health advice often found on sites like WebMD. People rely on such information for personal health management and clinically relevant decision making. In this work, we release a new annot…

    Submitted 6 October, 2022; originally announced October 2022.

  16. arXiv:2209.11102

    cs.CL

    Scope of Pre-trained Language Models for Detecting Conflicting Health Information

    Authors: Joseph Gatto, Madhusudan Basak, Sarah M. Preum

    Abstract: An increasing number of people now rely on online platforms to meet their health information needs. Thus identifying inconsistent or conflicting textual health information has become a safety-critical task. Health advice data poses a unique challenge where information that is accurate in the context of one diagnosis can be conflicting in the context of another. For example, people suffering from d…

    Submitted 22 September, 2022; originally announced September 2022.

  17. arXiv:1911.11901

    cs.LG stat.ML

    Single Sample Feature Importance: An Interpretable Algorithm for Low-Level Feature Analysis

    Authors: Joseph Gatto, Ravi Lanka, Yumi Iwashita, Adrian Stoica

    Abstract: Have you ever wondered how your feature space is impacting the prediction of a specific sample in your dataset? In this paper, we introduce Single Sample Feature Importance (SSFI), which is an interpretable feature importance algorithm that allows for the identification of the most important features that contribute to the prediction of a single sample. When a dataset can be learned by a Random Fo…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The work of Joseph Gatto was sponsored by the JPL Summer Internship Program and the National Aeronautics and Space Administration