Showing 1–15 of 15 results for author: Pakhomov, S

  1. arXiv:2506.05610  [pdf, ps, other]

    cs.CL

    Mitigating Confounding in Speech-Based Dementia Detection through Weight Masking

    Authors: Zhecheng Sheng, Xiruo Ding, Brian Hur, Changye Li, Trevor Cohen, Serguei Pakhomov

    Abstract: Deep transformer models have been used to detect linguistic anomalies in patient transcripts for early Alzheimer's disease (AD) screening. While pre-trained neural language models (LMs) fine-tuned on AD transcripts perform well, little research has explored the effects of the gender of the speakers represented by these transcripts. This work addresses gender confounding in dementia detection and p…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 16 pages, 20 figures. Accepted to ACL 2025 Main Conference

  2. arXiv:2503.20104  [pdf, other]

    cs.CL

    "Is There Anything Else?": Examining Administrator Influence on Linguistic Features from the Cookie Theft Picture Description Cognitive Test

    Authors: Changye Li, Zhecheng Sheng, Trevor Cohen, Serguei Pakhomov

    Abstract: Alzheimer's Disease (AD) dementia is a progressive neurodegenerative disease that negatively impacts patients' cognitive ability. Previous studies have demonstrated that changes in naturalistic language samples can be useful for early screening of AD dementia. However, the nature of language deficits often requires test administrators to use various speech elicitation techniques during spontaneous…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted to CMCL 2025 workshop, co-located with NAACL 2025

  3. arXiv:2503.20103  [pdf, other]

    cs.CL

    Bigger But Not Better: Small Neural Language Models Outperform Large Language Models in Detection of Thought Disorder

    Authors: Changye Li, Weizhe Xu, Serguei Pakhomov, Ellen Bradley, Dror Ben-Zeev, Trevor Cohen

    Abstract: Disorganized thinking is a key diagnostic indicator of schizophrenia-spectrum disorders. Recently, clinical estimates of the severity of disorganized thinking have been shown to correlate with measures of how difficult speech transcripts would be for large language models (LLMs) to predict. However, LLMs' deployment challenges -- including privacy concerns, computational and financial costs, and l…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted to CLPsych 2025 workshop, co-located with NAACL 2025

  4. arXiv:2407.13982  [pdf, other]

    cs.CL

    Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance

    Authors: Changye Li, Trevor Cohen, Serguei Pakhomov

    Abstract: Automatic speech recognition (ASR) models trained on large amounts of audio data are now widely used to convert speech to written text in a variety of applications from video captioning to automated assistants used in healthcare and other domains. As such, it is important that ASR models and their use are fair and equitable. Prior work examining the performance of commercial ASR systems on the Corp…

    Submitted 18 July, 2024; originally announced July 2024.

  5. arXiv:2406.02830  [pdf, other]

    cs.CL

    Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies

    Authors: Changye Li, Zhecheng Sheng, Trevor Cohen, Serguei Pakhomov

    Abstract: As artificial neural networks grow in complexity, understanding their inner workings becomes increasingly challenging, which is particularly important in healthcare applications. The intrinsic evaluation metrics of autoregressive neural language models (NLMs), perplexity (PPL), can reflect how "surprised" an NLM model is at novel input. PPL has been widely used to understand the behavior of NLMs.…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  6. arXiv:2401.05551  [pdf, other]

    cs.CL cs.SD eess.AS

    Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification?

    Authors: Changye Li, Weizhe Xu, Trevor Cohen, Serguei Pakhomov

    Abstract: Objectives: We aimed to investigate how errors from automatic speech recognition (ASR) systems affect dementia classification accuracy, specifically in the "Cookie Theft" picture description task. We aimed to assess whether imperfect ASR-generated transcripts could provide valuable information for distinguishing between language samples from cognitively healthy individuals and those wit…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: To appear in the Journal of Biomedical Informatics

  7. arXiv:2312.05435  [pdf, other]

    cs.CL

    Enhancing Robustness of Foundation Model Representations under Provenance-related Distribution Shifts

    Authors: Xiruo Ding, Zhecheng Sheng, Brian Hur, Feng Chen, Serguei V. S. Pakhomov, Trevor Cohen

    Abstract: Foundation models are a current focus of attention in both industry and academia. While they have shown their capabilities in a variety of tasks, in-depth research is required to determine their robustness to distribution shift when used as a basis for supervised machine learning. This is especially important in the context of clinical data, with particular limitations related to data accessibilit…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Accepted at the Workshop on Distribution Shifts, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  8. arXiv:2310.02451  [pdf, other]

    cs.CL

    Backdoor Adjustment of Confounding by Provenance for Robust Text Classification of Multi-institutional Clinical Notes

    Authors: Xiruo Ding, Zhecheng Sheng, Meliha Yetişgen, Serguei Pakhomov, Trevor Cohen

    Abstract: Natural Language Processing (NLP) methods have been broadly applied to clinical tasks. Machine learning and deep learning approaches have been used to improve the performance of clinical NLP. However, these approaches require sufficiently large datasets for training, and trained models have been shown to transfer poorly across sites. These issues have led to the promotion of data collection and in…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted at the AMIA 2023 Annual Symposium

  9. arXiv:2307.07544  [pdf, other]

    cs.CL cs.AI

    A Dialogue System for Assessing Activities of Daily Living: Improving Consistency with Grounded Knowledge

    Authors: Zhecheng Sheng, Raymond Finzel, Michael Lucke, Sheena Dufresne, Maria Gini, Serguei Pakhomov

    Abstract: In healthcare, the ability to care for oneself is reflected in the "Activities of Daily Living (ADL)," which serve as a measure of functional ability (functioning). A lack of functioning may lead to poor living conditions requiring personal care and assistance. To accurately identify those in need of support, assistance programs continuously evaluate participants' functioning across various domain…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: Accepted to ACL 2023 DialDoc Workshop

    Journal ref: In Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2023, pages 68–79

  10. arXiv:2302.07322  [pdf, other]

    cs.CL

    TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments

    Authors: Changye Li, Weizhe Xu, Trevor Cohen, Martin Michalowski, Serguei Pakhomov

    Abstract: The evidence is growing that machine and deep learning methods can learn the subtle differences between the language produced by people with various forms of cognitive impairment such as dementia and cognitively healthy individuals. Valuable public data repositories such as TalkBank have made it possible for researchers in the computational community to join forces and learn from each other to mak…

    Submitted 14 March, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: Accepted at AMIA Informatics Summit

  11. arXiv:2211.07430  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG q-bio.QM

    The Far Side of Failure: Investigating the Impact of Speech Recognition Errors on Subsequent Dementia Classification

    Authors: Changye Li, Trevor Cohen, Serguei Pakhomov

    Abstract: Linguistic anomalies detectable in spontaneous speech have shown promise for various clinical applications including screening for dementia and other forms of cognitive impairment. The feasibility of deploying automated tools that can classify language samples obtained from speech in large-scale clinical settings depends on the ability to capture and automatically transcribe the speech for subsequ…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: Accepted as extended abstract for ML4H 2022

  12. arXiv:2203.13397  [pdf, other]

    cs.CL

    GPT-D: Inducing Dementia-related Linguistic Anomalies by Deliberate Degradation of Artificial Neural Language Models

    Authors: Changye Li, David Knopman, Weizhe Xu, Trevor Cohen, Serguei Pakhomov

    Abstract: Deep learning (DL) techniques involving fine-tuning large numbers of model parameters have delivered impressive performance on the task of discriminating between language produced by cognitively healthy individuals, and those with Alzheimer's disease (AD). However, questions remain about their ability to generalize beyond the small reference sets that are publicly available for research. As an alt…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: This paper has been accepted by ACL 2022

  13. arXiv:2108.02255  [pdf]

    cs.CL

    An Empirical Study of UMLS Concept Extraction from Clinical Notes using Boolean Combination Ensembles

    Authors: Greg M. Silverman, Raymond L. Finzel, Michael V. Heinz, Jake Vasilakes, Jacob C. Solinsky, Reed McEwan, Benjamin C. Knoll, Christopher J. Tignanelli, Hongfang Liu, Hua Xu, Xiaoqian Jiang, Genevieve B. Melton, Serguei VS Pakhomov

    Abstract: Our objective in this study is to investigate the behavior of Boolean operators on combining annotation output from multiple Natural Language Processing (NLP) systems across multiple corpora and to assess how filtering by aggregation of Unified Medical Language System (UMLS) Metathesaurus concepts affects system performance for Named Entity Recognition (NER) of UMLS concepts. We used three corpora…

    Submitted 4 August, 2021; originally announced August 2021.

  14. arXiv:2104.01543  [pdf, other]

    cs.CL cs.AI cs.HC

    A Conversational Agent System for Dietary Supplements Use

    Authors: Esha Singh, Anu Bompelli, Ruyuan Wan, Jiang Bian, Serguei Pakhomov, Rui Zhang

    Abstract: Dietary supplements (DS) have been widely used by consumers, but the information around the efficacy and safety of DS is disparate or incomplete, thus creating barriers for consumers to find information effectively. Conversational agent (CA) systems have been applied to the healthcare domain, but no such system exists to answer consumers regarding DS use, despite the widespread use of DS. In this…

    Submitted 10 May, 2021; v1 submitted 4 April, 2021; originally announced April 2021.

  15. arXiv:2005.03593  [pdf, other]

    cs.CL

    A Tale of Two Perplexities: Sensitivity of Neural Language Models to Lexical Retrieval Deficits in Dementia of the Alzheimer's Type

    Authors: Trevor Cohen, Serguei Pakhomov

    Abstract: In recent years there has been a burgeoning interest in the use of computational methods to distinguish between elicited speech samples produced by patients with dementia, and those from healthy controls. The difference between perplexity estimates from two neural language models (LMs) - one trained on transcripts of speech produced by healthy participants and the other trained on transcripts from…

    Submitted 28 June, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: To be published in the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)