
Showing 1–16 of 16 results for author: Dligach, D

  1. arXiv:2412.01955  [pdf]

    cs.CL cs.AI

    The use of large language models to enhance cancer clinical trial educational materials

    Authors: Mingye Gao, Aman Varshney, Shan Chen, Vikram Goddla, Jack Gallifant, Patrick Doyle, Claire Novack, Maeve Dillon-Martin, Teresia Perkins, Xinrong Correia, Erik Duhaime, Howard Isenstein, Elad Sharon, Lisa Soleymani Lehmann, David Kozono, Brian Anthony, Dmitriy Dligach, Danielle S. Bitterman

    Abstract: Cancer clinical trials often face challenges in recruitment and engagement due to a lack of participant-facing informational and educational resources. This study investigated the potential of Large Language Models (LLMs), specifically GPT4, in generating patient-friendly educational content from clinical trial informed consent forms. Using data from ClinicalTrials.gov, we employed zero-shot learn…

    Submitted 3 December, 2024; v1 submitted 2 December, 2024; originally announced December 2024.

  2. arXiv:2411.04962  [pdf, other]

    cs.AI cs.CL

    Position Paper On Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-test Probability

    Authors: Yanjun Gao, Skatje Myers, Shan Chen, Dmitriy Dligach, Timothy A Miller, Danielle Bitterman, Guanhua Chen, Anoop Mayampurath, Matthew Churpek, Majid Afshar

    Abstract: Large language models (LLMs) are being explored for diagnostic decision support, yet their ability to estimate pre-test probabilities, vital for clinical decision-making, remains limited. This study evaluates two LLMs, Mistral-7B and Llama3-70B, using structured electronic health record data on three diagnosis tasks. We examined three current methods of extracting LLM probability estimations and r…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: Accepted to GenAI4Health Workshop at NeurIPS 2024

  3. arXiv:2409.15163  [pdf, other]

    cs.CL cs.IR

    Lessons Learned on Information Retrieval in Electronic Health Records: A Comparison of Embedding Models and Pooling Strategies

    Authors: Skatje Myers, Timothy A. Miller, Yanjun Gao, Matthew M. Churpek, Anoop Mayampurath, Dmitriy Dligach, Majid Afshar

    Abstract: Objective: Applying large language models (LLMs) to the clinical domain is challenging due to the context-heavy nature of processing medical records. Retrieval-augmented generation (RAG) offers a solution by facilitating reasoning over large text sources. However, there are many parameters to optimize in just the retrieval system alone. This paper presents an ablation study exploring how different…

    Submitted 23 September, 2024; originally announced September 2024.

  4. arXiv:2408.11854  [pdf, other]

    cs.CL cs.AI cs.LG

    When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications?

    Authors: Yanjun Gao, Skatje Myers, Shan Chen, Dmitriy Dligach, Timothy A Miller, Danielle Bitterman, Matthew Churpek, Majid Afshar

    Abstract: The introduction of Large Language Models (LLMs) has advanced data representation and analysis, bringing significant progress in their use for medical questions and answering. Despite these advancements, integrating tabular data, especially numerical data pivotal in clinical contexts, into LLM paradigms has not been thoroughly explored. In this study, we examine the effectiveness of vector represe…

    Submitted 19 September, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted to Findings of EMNLP 2024

  5. arXiv:2403.19511  [pdf]

    cs.CL

    Improving Clinical NLP Performance through Language Model-Generated Synthetic Clinical Data

    Authors: Shan Chen, Jack Gallifant, Marco Guevara, Yanjun Gao, Majid Afshar, Timothy Miller, Dmitriy Dligach, Danielle S. Bitterman

    Abstract: Generative models have been showing potential for producing data in mass. This study explores the enhancement of clinical natural language processing performance by utilizing synthetic data generated from advanced language models. Promising results show feasible applications in such a high-stakes domain.

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: submitted to review

  6. arXiv:2308.14321  [pdf]

    cs.CL cs.AI

    Leveraging Medical Knowledge Graphs Into Large Language Models for Diagnosis Prediction: Design and Application Study

    Authors: Yanjun Gao, Ruizhe Li, Emma Croxford, John Caskey, Brian W Patterson, Matthew Churpek, Timothy Miller, Dmitriy Dligach, Majid Afshar

    Abstract: Electronic Health Records (EHRs) and routine documentation practices play a vital role in patients' daily care, providing a holistic record of health, diagnoses, and treatment. However, complex and verbose EHR narratives overload healthcare providers, risking diagnostic inaccuracies. While Large Language Models (LLMs) have showcased their potential in diverse language tasks, their application in t…

    Submitted 24 February, 2025; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Published in JMIR AI

  7. arXiv:2306.05270  [pdf, other]

    cs.CL

    Overview of the Problem List Summarization (ProbSum) 2023 Shared Task on Summarizing Patients' Active Diagnoses and Problems from Electronic Health Record Progress Notes

    Authors: Yanjun Gao, Dmitriy Dligach, Timothy Miller, Matthew M. Churpek, Majid Afshar

    Abstract: The BioNLP Workshop 2023 initiated the launch of a shared task on Problem List Summarization (ProbSum) in January 2023. The aim of this shared task is to attract future research efforts in building NLP models for real-world diagnostic decision support applications, where a system generating relevant and accurate diagnoses will augment the healthcare providers' decision-making process and improve th…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: To appear in the Proceedings of the 5th BioNLP Workshop at ACL

  8. arXiv:2306.04551  [pdf, other]

    cs.CL cs.LG

    Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning

    Authors: Brihat Sharma, Yanjun Gao, Timothy Miller, Matthew M. Churpek, Majid Afshar, Dmitriy Dligach

    Abstract: Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework, comprised of six tasks representing key components in…

    Submitted 13 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Accepted to the Proceedings of the 5th Clinical NLP Workshop at ACL

  9. arXiv:2303.08038  [pdf, other]

    cs.AI cs.CL

    Progress Note Understanding -- Assessment and Plan Reasoning: Overview of the 2022 N2C2 Track 3 Shared Task

    Authors: Yanjun Gao, Dmitriy Dligach, Timothy Miller, Matthew M Churpek, Ozlem Uzuner, Majid Afshar

    Abstract: Daily progress notes are common types in the electronic health record (EHR) where healthcare providers document the patient's daily progress and treatment plans. The EHR is designed to document all the care provided to patients, but it also enables note bloat with extraneous information that distracts from the diagnoses and treatment plans. Applications of natural language processing (NLP) in the…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: To appear in Journal of Biomedical Informatics

  10. DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing

    Authors: Yanjun Gao, Dmitriy Dligach, Timothy Miller, John Caskey, Brihat Sharma, Matthew M Churpek, Majid Afshar

    Abstract: The meaningful use of electronic health records (EHR) continues to progress in the digital era with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce the cognitive burden so fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is…

    Submitted 13 December, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Under review

  11. arXiv:2208.08408  [pdf, other]

    cs.CL cs.AI

    Summarizing Patients Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models

    Authors: Yanjun Gao, Dmitriy Dligach, Timothy Miller, Dongfang Xu, Matthew M. Churpek, Majid Afshar

    Abstract: Automatically summarizing patients' main problems from daily progress notes using natural language processing methods helps to battle against information and cognitive overload in hospital settings and potentially assists providers with computerized diagnostic decision support. Problem list summarization requires a model to understand, abstract, and generate clinical documentation. In this work, w…

    Submitted 14 September, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: Paper is accepted to COLING 2022

  12. arXiv:2204.03035  [pdf, other]

    cs.CL cs.AI cs.CY

    Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding

    Authors: Yanjun Gao, Dmitriy Dligach, Timothy Miller, Samuel Tesch, Ryan Laffin, Matthew M. Churpek, Majid Afshar

    Abstract: Applying methods in natural language processing on electronic health records (EHR) data is a growing field. Existing corpus and annotation focus on modeling textual features and relation prediction. However, there is a paucity of annotated corpus built to model clinical diagnostic thinking, a process involving text understanding, domain knowledge abstraction and reasoning. This work introduces a h…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: To appear in 13th Language Resources and Evaluation Conference (LREC 2022)

  13. arXiv:2112.05780  [pdf, other]

    cs.CL cs.AI

    A Scoping Review of Publicly Available Language Tasks in Clinical Natural Language Processing

    Authors: Yanjun Gao, Dmitriy Dligach, Leslie Christensen, Samuel Tesch, Ryan Laffin, Dongfang Xu, Timothy Miller, Ozlem Uzuner, Matthew M Churpek, Majid Afshar

    Abstract: Objective: to provide a scoping review of papers on clinical natural language processing (NLP) tasks that use publicly available electronic health record data from a cohort of patients. Materials and Methods: We searched six databases, including biomedical research and computer science literature database. A round of title/abstract screening and full-text screening were conducted by two reviewers.…

    Submitted 7 December, 2021; originally announced December 2021.

    Comments: Paper submitted to Journal of American Medical Informatics Association (JAMIA)

  14. arXiv:2105.06752  [pdf, other]

    cs.CL

    Classifying Long Clinical Documents with Pre-trained Transformers

    Authors: Xin Su, Timothy Miller, Xiyu Ding, Majid Afshar, Dmitriy Dligach

    Abstract: Automatic phenotyping is a task of identifying cohorts of patients that match a predefined set of criteria. Phenotyping typically involves classifying long clinical documents that contain thousands of tokens. At the same time, recent state-of-art transformer-based pre-trained language models limit the input to a few hundred tokens (e.g. 512 tokens for BERT). We evaluate several strategies for inco…

    Submitted 14 May, 2021; originally announced May 2021.

  15. arXiv:1908.05596  [pdf, other]

    cs.IR cs.CL cs.LG

    Two-stage Federated Phenotyping and Patient Representation Learning

    Authors: Dianbo Liu, Dmitriy Dligach, Timothy Miller

    Abstract: A large percentage of medical information is in unstructured text format in electronic medical record systems. Manual extraction of information from clinical notes is extremely time consuming. Natural language processing has been widely used in recent years for automatic information extraction from medical texts. However, algorithms trained on data from a single healthcare provider are not general…

    Submitted 14 August, 2019; originally announced August 2019.

    Comments: 9 pages; Proceedings of the 18th BioNLP Workshop and Shared Task

  16. arXiv:1805.02096  [pdf, other]

    cs.CL

    Learning Patient Representations from Text

    Authors: Dmitriy Dligach, Timothy Miller

    Abstract: Mining electronic health records for patients who satisfy a set of predefined criteria is known in medical informatics as phenotyping. Phenotyping has numerous applications such as outcome prediction, clinical trial recruitment, and retrospective studies. Supervised machine learning for phenotyping typically relies on sparse patient representations such as bag-of-words. We consider an alternative…

    Submitted 5 May, 2018; originally announced May 2018.

    Comments: Accepted to *SEM 2018