Showing 1–11 of 11 results for author: Seegmiller, P

  1. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  2. arXiv:2503.17509  [pdf, other]

    cs.CL cs.AI

    Follow-up Question Generation For Enhanced Patient-Provider Conversations

    Authors: Joseph Gatto, Parker Seegmiller, Timothy Burdick, Inas S. Khayal, Sarah DeLozier, Sarah M. Preum

    Abstract: Follow-up question generation is an essential feature of dialogue systems as it can reduce conversational ambiguity and enhance modeling complex interactions. Conversational contexts often pose core NLP challenges such as (i) extracting relevant information buried in fragmented data sources, and (ii) modeling parallel thought processes. These two challenges occur frequently in medical dialogue as…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: 17 Pages, 7 Figures, 6 Tables

  3. arXiv:2411.06549  [pdf, other]

    cs.AI cs.CL

    In-Context Learning for Preserving Patient Privacy: A Framework for Synthesizing Realistic Patient Portal Messages

    Authors: Joseph Gatto, Parker Seegmiller, Timothy E. Burdick, Sarah Masud Preum

    Abstract: Since the COVID-19 pandemic, clinicians have seen a large and sustained influx in patient portal messages, significantly contributing to clinician burnout. To the best of our knowledge, there are no large-scale public patient portal messages corpora researchers can use to build tools to optimize clinician portal workflows. Informed by our ongoing work with a regional hospital, this study introduce…

    Submitted 10 November, 2024; originally announced November 2024.

    Comments: Findings paper presented at Machine Learning for Health (ML4H) symposium 2024, December 15-16, 2024, Vancouver, Canada, 8 pages

  4. arXiv:2406.14695  [pdf, other]

    cs.CL cs.AI

    Depth $F_1$: Improving Evaluation of Cross-Domain Text Classification by Measuring Semantic Generalizability

    Authors: Parker Seegmiller, Joseph Gatto, Sarah Masud Preum

    Abstract: Recent evaluations of cross-domain text classification models aim to measure the ability of a model to obtain domain-invariant performance in a target domain given labeled samples in a source domain. The primary strategy for this evaluation relies on assumed differences between source domain samples and target domain samples in benchmark datasets. This evaluation strategy fails to account for the…

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2404.01147  [pdf, other]

    cs.CL cs.LG

    Do LLMs Find Human Answers To Fact-Driven Questions Perplexing? A Case Study on Reddit

    Authors: Parker Seegmiller, Joseph Gatto, Omar Sharif, Madhusudan Basak, Sarah Masud Preum

    Abstract: Large language models (LLMs) have been shown to be proficient in correctly answering questions in the context of online discourse. However, the study of using LLMs to model human-like answers to fact-driven social media questions is still under-explored. In this work, we investigate how LLMs model the wide variety of human answers to fact-driven questions posed on several topic-specific Reddit com…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 4 pages, 2 figures

  6. arXiv:2403.03304  [pdf, other]

    cs.CL cs.LG

    Large Language Models for Document-Level Event-Argument Data Augmentation for Challenging Role Types

    Authors: Joseph Gatto, Parker Seegmiller, Omar Sharif, Sarah M. Preum

    Abstract: Event Argument Extraction (EAE) is an extremely difficult information extraction problem -- with significant limitations in few-shot cross-domain (FSCD) settings. A common solution to FSCD modeling is data augmentation. Unfortunately, existing augmentation methods are not well-suited to a variety of real-world EAE contexts including (i) The need to model long documents (10+ sentences) (ii) The nee…

    Submitted 12 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Paper in submission (8 pages)

  7. arXiv:2310.15010  [pdf, other]

    cs.CL

    Statistical Depth for Ranking and Characterizing Transformer-Based Text Embeddings

    Authors: Parker Seegmiller, Sarah Masud Preum

    Abstract: The popularity of transformer-based text embeddings calls for better statistical tools for measuring distributions of such embeddings. One such tool would be a method for ranking texts within a corpus by centrality, i.e. assigning each text a number signifying how representative that text is of the corpus as a whole. However, an intrinsic center-outward ordering of high-dimensional text representa…

    Submitted 23 October, 2023; originally announced October 2023.

  8. arXiv:2309.06541  [pdf, other]

    cs.CL

    Text Encoders Lack Knowledge: Leveraging Generative LLMs for Domain-Specific Semantic Textual Similarity

    Authors: Joseph Gatto, Omar Sharif, Parker Seegmiller, Philip Bohlman, Sarah Masud Preum

    Abstract: Amidst the sharp rise in the evaluation of large language models (LLMs) on various tasks, we find that semantic textual similarity (STS) has been under-explored. In this study, we show that STS can be cast as a text generation problem while maintaining strong performance on multiple STS benchmarks. Additionally, we show generative LLMs significantly outperform existing encoder-based STS models whe…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: Under review GEM@EMNLP-2023, 12 pages

  9. arXiv:2303.09366  [pdf, other]

    cs.CL cs.LG

    The Scope of In-Context Learning for the Extraction of Medical Temporal Constraints

    Authors: Parker Seegmiller, Joseph Gatto, Madhusudan Basak, Diane Cook, Hassan Ghasemzadeh, John Stankovic, Sarah Preum

    Abstract: Medications often impose temporal constraints on everyday patient activity. Violations of such medical temporal constraints (MTCs) lead to a lack of treatment adherence, in addition to poor health outcomes and increased healthcare expenses. These MTCs are found in drug usage guidelines (DUGs) in both patient education materials and clinical texts. Computationally representing MTCs in DUGs will adv…

    Submitted 16 October, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

  10. arXiv:2301.07051  [pdf, other]

    cs.LG

    ActSafe: Predicting Violations of Medical Temporal Constraints for Medication Adherence

    Authors: Parker Seegmiller, Joseph Gatto, Abdullah Mamun, Hassan Ghasemzadeh, Diane Cook, John Stankovic, Sarah Masud Preum

    Abstract: Prescription medications often impose temporal constraints on regular health behaviors (RHBs) of patients, e.g., eating before taking medication. Violations of such medical temporal constraints (MTCs) can result in adverse effects. Detecting and predicting such violations before they occur can help alert the patient. We formulate the problem of modeling MTCs and develop a proof-of-concept solution…

    Submitted 17 January, 2023; originally announced January 2023.

  11. arXiv:2210.03246  [pdf, other]

    cs.CL

    HealthE: Classifying Entities in Online Textual Health Advice

    Authors: Joseph Gatto, Parker Seegmiller, Garrett Johnston, Sarah M. Preum

    Abstract: The processing of entities in natural language is essential to many medical NLP systems. Unfortunately, existing datasets vastly under-represent the entities required to model public health relevant texts such as health advice often found on sites like WebMD. People rely on such information for personal health management and clinically relevant decision making. In this work, we release a new annot…

    Submitted 6 October, 2022; originally announced October 2022.