Showing 1–29 of 29 results for author: Preum, S

  1. arXiv:2503.21986  [pdf, other]

    cs.HC cs.CL cs.CY

    Socially Constructed Treatment Plans: Analyzing Online Peer Interactions to Understand How Patients Navigate Complex Medical Conditions

    Authors: Madhusudan Basak, Omar Sharif, Jessica Hulsey, Elizabeth C. Saunders, Daisy J. Goodman, Luke J. Archibald, Sarah M. Preum

    Abstract: When faced with complex and uncertain medical conditions (e.g., cancer, mental health conditions, recovery from substance dependency), millions of patients seek online peer support. In this study, we leverage content analysis of online discourse and ethnographic studies with clinicians and patient representatives to characterize how treatment plans for complex conditions are "socially constructed.…

    Submitted 27 March, 2025; originally announced March 2025.

  2. arXiv:2503.17509  [pdf, other]

    cs.CL cs.AI

    Follow-up Question Generation For Enhanced Patient-Provider Conversations

    Authors: Joseph Gatto, Parker Seegmiller, Timothy Burdick, Inas S. Khayal, Sarah DeLozier, Sarah M. Preum

    Abstract: Follow-up question generation is an essential feature of dialogue systems as it can reduce conversational ambiguity and enhance modeling complex interactions. Conversational contexts often pose core NLP challenges such as (i) extracting relevant information buried in fragmented data sources, and (ii) modeling parallel thought processes. These two challenges occur frequently in medical dialogue as…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: 17 Pages, 7 Figures, 6 Tables

  3. arXiv:2502.16838  [pdf, other]

    cs.CL

    REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction

    Authors: Omar Sharif, Joseph Gatto, Madhusudan Basak, Sarah M. Preum

    Abstract: Event argument extraction identifies arguments for predefined event roles in text. Traditional evaluations rely on exact match (EM), requiring predicted arguments to match annotated spans exactly. However, this approach fails for generative models like large language models (LLMs), which produce diverse yet semantically accurate responses. EM underestimates performance by disregarding valid variat…

    Submitted 23 February, 2025; originally announced February 2025.

    Comments: 20 pages, 9 figures, 13 tables

  4. arXiv:2411.06549  [pdf, other]

    cs.AI cs.CL

    In-Context Learning for Preserving Patient Privacy: A Framework for Synthesizing Realistic Patient Portal Messages

    Authors: Joseph Gatto, Parker Seegmiller, Timothy E. Burdick, Sarah Masud Preum

    Abstract: Since the COVID-19 pandemic, clinicians have seen a large and sustained influx in patient portal messages, significantly contributing to clinician burnout. To the best of our knowledge, there are no large-scale public patient portal messages corpora researchers can use to build tools to optimize clinician portal workflows. Informed by our ongoing work with a regional hospital, this study introduce…

    Submitted 10 November, 2024; originally announced November 2024.

    Comments: Findings paper presented at Machine Learning for Health (ML4H) symposium 2024, December 15-16, 2024, Vancouver, Canada, 8 pages

  5. arXiv:2410.03594  [pdf, other]

    cs.CL

    Explicit, Implicit, and Scattered: Revisiting Event Extraction to Capture Complex Arguments

    Authors: Omar Sharif, Joseph Gatto, Madhusudan Basak, Sarah M. Preum

    Abstract: Prior works formulate the extraction of event-specific arguments as a span extraction problem, where event arguments are explicit -- i.e. assumed to be contiguous spans of text in a document. In this study, we revisit this definition of Event Extraction (EE) by introducing two key argument types that cannot be modeled by existing EE frameworks. First, implicit arguments are event arguments which a…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted in EMNLP-2024 (Main). 21 pages, 8 figures, and 11 tables

  6. arXiv:2410.01633  [pdf]

    cs.CY cs.CL

    A Thematic Framework for Analyzing Large-scale Self-reported Social Media Data on Opioid Use Disorder Treatment Using Buprenorphine Product

    Authors: Madhusudan Basak, Omar Sharif, Sarah E. Lord, Jacob T. Borodovsky, Lisa A. Marsch, Sandra A. Springer, Edward Nunes, Charlie D. Brackett, Luke J. Archibald, Sarah M. Preum

    Abstract: Background: One of the key FDA-approved medications for Opioid Use Disorder (OUD) is buprenorphine. Despite its popularity, individuals often report various information needs regarding buprenorphine treatment on social media platforms like Reddit. However, the key challenge is to characterize these needs. In this study, we propose a theme-based framework to curate and analyze large-scale data from…

    Submitted 2 October, 2024; originally announced October 2024.

  7. arXiv:2409.09570  [pdf, other]

    cs.HC cs.AI

    MindScape Study: Integrating LLM and Behavioral Sensing for Personalized AI-Driven Journaling Experiences

    Authors: Subigya Nepal, Arvind Pillai, William Campbell, Talie Massachi, Michael V. Heinz, Ashmita Kunwar, Eunsol Soul Choi, Orson Xu, Joanna Kuc, Jeremy Huckins, Jason Holden, Sarah M. Preum, Colin Depp, Nicholas Jacobson, Mary Czerwinski, Eric Granholm, Andrew T. Campbell

    Abstract: Mental health concerns are prevalent among college students, highlighting the need for effective interventions that promote self-awareness and holistic well-being. MindScape pioneers a novel approach to AI-powered journaling by integrating passively collected behavioral patterns such as conversational engagement, sleep, and location with Large Language Models (LLMs). This integration creates a hig…

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: arXiv admin note: text overlap with arXiv:2404.00487

    ACM Class: H.5.0; H.5.3; H.5.m; J.0

  8. arXiv:2406.14695  [pdf, other]

    cs.CL cs.AI

    Depth $F_1$: Improving Evaluation of Cross-Domain Text Classification by Measuring Semantic Generalizability

    Authors: Parker Seegmiller, Joseph Gatto, Sarah Masud Preum

    Abstract: Recent evaluations of cross-domain text classification models aim to measure the ability of a model to obtain domain-invariant performance in a target domain given labeled samples in a source domain. The primary strategy for this evaluation relies on assumed differences between source domain samples and target domain samples in benchmark datasets. This evaluation strategy fails to account for the…

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2404.01147  [pdf, other]

    cs.CL cs.LG

    Do LLMs Find Human Answers To Fact-Driven Questions Perplexing? A Case Study on Reddit

    Authors: Parker Seegmiller, Joseph Gatto, Omar Sharif, Madhusudan Basak, Sarah Masud Preum

    Abstract: Large language models (LLMs) have been shown to be proficient in correctly answering questions in the context of online discourse. However, the study of using LLMs to model human-like answers to fact-driven social media questions is still under-explored. In this work, we investigate how LLMs model the wide variety of human answers to fact-driven questions posed on several topic-specific Reddit com…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 4 pages, 2 figures

  10. arXiv:2403.10829  [pdf, other]

    cs.CL

    Deciphering Hate: Identifying Hateful Memes and Their Targets

    Authors: Eftekhar Hossain, Omar Sharif, Mohammed Moshiul Hoque, Sarah M. Preum

    Abstract: Internet memes have become a powerful means for individuals to express emotions, thoughts, and perspectives on social media. While often considered as a source of humor and entertainment, memes can also disseminate hateful content targeting individuals or communities. Most existing research focuses on the negative aspects of memes in high-resource languages, overlooking the distinctive challenges…

    Submitted 22 September, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: Accepted to ACL 2024, 13 pages

  11. arXiv:2403.03336  [pdf, other]

    cs.CL cs.SI

    Scope of Large Language Models for Mining Emerging Opinions in Online Health Discourse

    Authors: Joseph Gatto, Madhusudan Basak, Yash Srivastava, Philip Bohlman, Sarah M. Preum

    Abstract: In this paper, we develop an LLM-powered framework for the curation and evaluation of emerging opinion mining in online health communities. We formulate emerging opinion mining as a pairwise stance detection problem between (title, comment) pairs sourced from Reddit, where post titles contain emerging health-related claims on a topic that is not predefined. The claims are either explicitly or impl…

    Submitted 5 March, 2024; originally announced March 2024.

  12. arXiv:2403.03304  [pdf, other]

    cs.CL cs.LG

    Large Language Models for Document-Level Event-Argument Data Augmentation for Challenging Role Types

    Authors: Joseph Gatto, Parker Seegmiller, Omar Sharif, Sarah M. Preum

    Abstract: Event Argument Extraction (EAE) is an extremely difficult information extraction problem -- with significant limitations in few-shot cross-domain (FSCD) settings. A common solution to FSCD modeling is data augmentation. Unfortunately, existing augmentation methods are not well-suited to a variety of real-world EAE contexts including (i) The need to model long documents (10+ sentences) (ii) The nee…

    Submitted 12 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Paper in submission (8 pages)

  13. Sketching AI Concepts with Capabilities and Examples: AI Innovation in the Intensive Care Unit

    Authors: Nur Yildirim, Susanna Zlotnikov, Deniz Sayar, Jeremy M. Kahn, Leigh A. Bukowski, Sher Shah Amin, Kathryn A. Riman, Billie S. Davis, John S. Minturn, Andrew J. King, Dan Ricketts, Lu Tang, Venkatesh Sivaraman, Adam Perer, Sarah M. Preum, James McCann, John Zimmerman

    Abstract: Advances in artificial intelligence (AI) have enabled unprecedented capabilities, yet innovation teams struggle when envisioning AI concepts. Data science teams think of innovations users do not want, while domain experts think of innovations that cannot be built. A lack of effective ideation seems to be a breakdown point. How might multidisciplinary teams identify buildable and desirable use case…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: to appear at CHI 2024

  14. arXiv:2402.09738  [pdf, other]

    cs.CL

    Align before Attend: Aligning Visual and Textual Features for Multimodal Hateful Content Detection

    Authors: Eftekhar Hossain, Omar Sharif, Mohammed Moshiul Hoque, Sarah M. Preum

    Abstract: Multimodal hateful content detection is a challenging task that requires complex reasoning across visual and textual modalities. Therefore, creating a meaningful multimodal representation that effectively captures the interplay between visual and textual features through intermediate fusion is critical. Conventional fusion techniques are unable to attend to the modality-specific features effective…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted to EACL-SRW, 2024

  15. arXiv:2310.19750  [pdf, other]

    cs.CL

    Chain-of-Thought Embeddings for Stance Detection on Social Media

    Authors: Joseph Gatto, Omar Sharif, Sarah Masud Preum

    Abstract: Stance detection on social media is challenging for Large Language Models (LLMs), as emerging slang and colloquial language in online conversations often contain deeply implicit stance labels. Chain-of-Thought (COT) prompting has recently been shown to improve performance on stance detection tasks -- alleviating some of these issues. However, COT prompting still struggles with implicit stance iden…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP-2023, 8 pages

  16. arXiv:2310.15010  [pdf, other]

    cs.CL

    Statistical Depth for Ranking and Characterizing Transformer-Based Text Embeddings

    Authors: Parker Seegmiller, Sarah Masud Preum

    Abstract: The popularity of transformer-based text embeddings calls for better statistical tools for measuring distributions of such embeddings. One such tool would be a method for ranking texts within a corpus by centrality, i.e. assigning each text a number signifying how representative that text is of the corpus as a whole. However, an intrinsic center-outward ordering of high-dimensional text representa…

    Submitted 23 October, 2023; originally announced October 2023.

  17. arXiv:2309.09877  [pdf, other]

    cs.CL

    Not Enough Labeled Data? Just Add Semantics: A Data-Efficient Method for Inferring Online Health Texts

    Authors: Joseph Gatto, Sarah M. Preum

    Abstract: User-generated texts available on the web and social platforms are often long and semantically challenging, making them difficult to annotate. Obtaining human annotation becomes increasingly difficult as problem domains become more specialized. For example, many health NLP problems require domain experts to be a part of the annotation pipeline. Thus, it is crucial that we develop low-resource NLP…

    Submitted 18 September, 2023; originally announced September 2023.

  18. arXiv:2309.06541  [pdf, other]

    cs.CL

    Text Encoders Lack Knowledge: Leveraging Generative LLMs for Domain-Specific Semantic Textual Similarity

    Authors: Joseph Gatto, Omar Sharif, Parker Seegmiller, Philip Bohlman, Sarah Masud Preum

    Abstract: Amidst the sharp rise in the evaluation of large language models (LLMs) on various tasks, we find that semantic textual similarity (STS) has been under-explored. In this study, we show that STS can be cast as a text generation problem while maintaining strong performance on multiple STS benchmarks. Additionally, we show generative LLMs significantly outperform existing encoder-based STS models whe…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: Under review GEM@EMNLP-2023, 12 pages

  19. arXiv:2308.09156  [pdf, other]

    cs.CL

    Characterizing Information Seeking Events in Health-Related Social Discourse

    Authors: Omar Sharif, Madhusudan Basak, Tanzia Parvin, Ava Scharfstein, Alphonso Bradham, Jacob T. Borodovsky, Sarah E. Lord, Sarah M. Preum

    Abstract: Social media sites have become a popular platform for individuals to seek and share health information. Despite the progress in natural language processing for social media mining, a gap remains in analyzing health-related texts on social discourse in the context of events. Event-driven analysis can offer insights into different facets of healthcare at an individual and collective level, including…

    Submitted 19 December, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted at AAAI-2024. 9 pages, 6 tables, 2 figures

  20. arXiv:2303.09366  [pdf, other]

    cs.CL cs.LG

    The Scope of In-Context Learning for the Extraction of Medical Temporal Constraints

    Authors: Parker Seegmiller, Joseph Gatto, Madhusudan Basak, Diane Cook, Hassan Ghasemzadeh, John Stankovic, Sarah Preum

    Abstract: Medications often impose temporal constraints on everyday patient activity. Violations of such medical temporal constraints (MTCs) lead to a lack of treatment adherence, in addition to poor health outcomes and increased healthcare expenses. These MTCs are found in drug usage guidelines (DUGs) in both patient education materials and clinical texts. Computationally representing MTCs in DUGs will adv…

    Submitted 16 October, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

  21. arXiv:2302.09665  [pdf, other]

    cs.AI

    CitySpec with Shield: A Secure Intelligent Assistant for Requirement Formalization

    Authors: Zirong Chen, Issa Li, Haoxiang Zhang, Sarah Preum, John A. Stankovic, Meiyi Ma

    Abstract: An increasing number of monitoring systems have been developed in smart cities to ensure that the real-time operations of a city satisfy safety and performance requirements. However, many existing city requirements are written in English with missing, inaccurate, or ambiguous information. There is a high demand for assisting city policymakers in converting human-specified requirements to machine-u…

    Submitted 30 March, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.03132

  22. arXiv:2301.11508  [pdf, other]

    cs.CL

    Theme-driven Keyphrase Extraction to Analyze Social Media Discourse

    Authors: William Romano, Omar Sharif, Madhusudan Basak, Joseph Gatto, Sarah Preum

    Abstract: Social media platforms are vital resources for sharing self-reported health experiences, offering rich data on various health topics. Despite advancements in Natural Language Processing (NLP) enabling large-scale social media data analysis, a gap remains in applying keyphrase extraction to health-related content. Keyphrase extraction is used to identify salient concepts in social media discourse w…

    Submitted 28 May, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: 11 pages, 2 figures, submitted to ICWSM. This version represents a substantial expansion and refocus of the previous manuscript, including new experiments, expanded data analysis, and comprehensive discussions

  23. arXiv:2301.07051  [pdf, other]

    cs.LG

    ActSafe: Predicting Violations of Medical Temporal Constraints for Medication Adherence

    Authors: Parker Seegmiller, Joseph Gatto, Abdullah Mamun, Hassan Ghasemzadeh, Diane Cook, John Stankovic, Sarah Masud Preum

    Abstract: Prescription medications often impose temporal constraints on regular health behaviors (RHBs) of patients, e.g., eating before taking medication. Violations of such medical temporal constraints (MTCs) can result in adverse effects. Detecting and predicting such violations before they occur can help alert the patient. We formulate the problem of modeling MTCs and develop a proof-of-concept solution…

    Submitted 17 January, 2023; originally announced January 2023.

  24. arXiv:2210.03246  [pdf, other]

    cs.CL

    HealthE: Classifying Entities in Online Textual Health Advice

    Authors: Joseph Gatto, Parker Seegmiller, Garrett Johnston, Sarah M. Preum

    Abstract: The processing of entities in natural language is essential to many medical NLP systems. Unfortunately, existing datasets vastly under-represent the entities required to model public health relevant texts such as health advice often found on sites like WebMD. People rely on such information for personal health management and clinically relevant decision making. In this work, we release a new annot…

    Submitted 6 October, 2022; originally announced October 2022.

  25. arXiv:2209.11102  [pdf, other]

    cs.CL

    Scope of Pre-trained Language Models for Detecting Conflicting Health Information

    Authors: Joseph Gatto, Madhusudan Basak, Sarah M. Preum

    Abstract: An increasing number of people now rely on online platforms to meet their health information needs. Thus identifying inconsistent or conflicting textual health information has become a safety-critical task. Health advice data poses a unique challenge where information that is accurate in the context of one diagnosis can be conflicting in the context of another. For example, people suffering from d…

    Submitted 22 September, 2022; originally announced September 2022.

  26. arXiv:2206.07152  [pdf, other]

    cs.AI cs.FL cs.LG

    An Intelligent Assistant for Converting City Requirements to Formal Specification

    Authors: Zirong Chen, Isaac Li, Haoxiang Zhang, Sarah Preum, John Stankovic, Meiyi Ma

    Abstract: As more and more monitoring systems have been deployed to smart cities, there comes a higher demand for converting new human-specified requirements to machine-understandable formal specifications automatically. However, these human-specific requirements are often written in English and bring missing, inaccurate, or ambiguous information. In this paper, we present CitySpec, an intelligent assistant…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: This demo paper is accepted by SMARTCOMP 2022

  27. arXiv:2206.03132  [pdf, other]

    cs.AI cs.CL cs.LG cs.SE

    CitySpec: An Intelligent Assistant System for Requirement Specification in Smart Cities

    Authors: Zirong Chen, Isaac Li, Haoxiang Zhang, Sarah Preum, John A. Stankovic, Meiyi Ma

    Abstract: An increasing number of monitoring systems have been developed in smart cities to ensure that real-time operations of a city satisfy safety and performance requirements. However, many existing city requirements are written in English with missing, inaccurate, or ambiguous information. There is a high demand for assisting city policy makers in converting human-specified requirements to machine-unde…

    Submitted 14 June, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: This paper is accepted by SMARTCOMP 2022

  28. arXiv:2007.05831  [pdf, other]

    cs.CY

    MFED: A System for Monitoring Family Eating Dynamics

    Authors: Md Abu Sayeed Mondol, Brooke Bell, Meiyi Ma, Ridwan Alam, Ifat Emi, Sarah Masud Preum, Kayla de la Haye, Donna Spruijt-Metz, John C. Lach, John A. Stankovic

    Abstract: Obesity is a risk factor for many health issues, including heart disease, diabetes, osteoarthritis, and certain cancers. One of the primary behavioral causes, dietary intake, has proven particularly challenging to measure and track. Current behavioral science suggests that family eating dynamics (FED) have high potential to impact child and parent dietary intake, and ultimately the risk of obesity…

    Submitted 11 July, 2020; originally announced July 2020.

  29. arXiv:1910.12444  [pdf]

    cs.HC cs.CY

    Information Seeking and Information Processing Behaviors Among Type 2 Diabetics

    Authors: Sarah Masud Preum, Kate Clark, Ashley Davis, Konstantine Khutsishvilli, Rupa S Valdez

    Abstract: Effective patient education is critical for managing Type 2 Diabetes Mellitus (T2DM), one of the most common chronic diseases in the United States. While some studies focus on the information-seeking behavior of T2DM patients, other self-education behaviors including information processing and utilization are rarely explored in the context of T2DM. This study sought to assess two self-education be…

    Submitted 28 October, 2019; originally announced October 2019.