Search | arXiv e-print repository

Early Dementia Detection Using Multiple Spontaneous Speech Prompts: The PROCESS Challenge

Authors: Fuxiang Tao, Bahman Mirheidari, Madhurananda Pahar, Sophie Young, Yao Xiao, Hend Elghazaly, Fritz Peters, Caitlin Illingworth, Dorota Braun, Ronan O'Malley, Simon Bell, Daniel Blackburn, Fasih Haider, Saturnino Luz, Heidi Christensen

Abstract: Dementia is associated with various cognitive impairments and typically manifests only after significant progression, making intervention at this stage often ineffective. To address this issue, the Prediction and Recognition of Cognitive Decline through Spontaneous Speech (PROCESS) Signal Processing Grand Challenge invites participants to focus on early-stage dementia detection. We provide a new s… ▽ More Dementia is associated with various cognitive impairments and typically manifests only after significant progression, making intervention at this stage often ineffective. To address this issue, the Prediction and Recognition of Cognitive Decline through Spontaneous Speech (PROCESS) Signal Processing Grand Challenge invites participants to focus on early-stage dementia detection. We provide a new spontaneous speech corpus for this challenge. This corpus includes answers from three prompts designed by neurologists to better capture the cognition of speakers. Our baseline models achieved an F1-score of 55.0% on the classification task and an RMSE of 2.98 on the regression task. △ Less

Submitted 5 December, 2024; originally announced December 2024.

Comments: 2 pages, no figure, conference

arXiv:2412.12965 [pdf]

The IBEX Imaging Knowledge-Base: A Community Resource Enabling Adoption and Development of Immunofluoresence Imaging Methods

Authors: Ziv Yaniv, Ifeanyichukwu U. Anidi, Leanne Arakkal, Armando J. Arroyo-Mejías, Rebecca T. Beuschel, Katy Börner, Colin J. Chu, Beatrice Clark, Menna R. Clatworthy, Jake Colautti, Fabian Coscia, Joshua Croteau, Saven Denha, Rose Dever, Walderez O. Dutra, Sonja Fritzsche, Spencer Fullam, Michael Y. Gerner, Anita Gola, Kenneth J. Gollob, Jonathan M. Hernandez, Jyh Liang Hor, Hiroshi Ichise, Zhixin Jing, Danny Jonigk , et al. (36 additional authors not shown)

Abstract: The iterative bleaching extends multiplexity (IBEX) Knowledge-Base is a central portal for researchers adopting IBEX and related 2D and 3D immunofluorescence imaging methods. The design of the Knowledge-Base is modeled after efforts in the open-source software community and includes three facets: a development platform (GitHub), static website, and service for data archiving. The Knowledge-Base fa… ▽ More The iterative bleaching extends multiplexity (IBEX) Knowledge-Base is a central portal for researchers adopting IBEX and related 2D and 3D immunofluorescence imaging methods. The design of the Knowledge-Base is modeled after efforts in the open-source software community and includes three facets: a development platform (GitHub), static website, and service for data archiving. The Knowledge-Base facilitates the practice of open science throughout the research life cycle by providing validation data for recommended and non-recommended reagents, e.g., primary and secondary antibodies. In addition to reporting negative data, the Knowledge-Base empowers method adoption and evolution by providing a venue for sharing protocols, videos, datasets, software, and publications. A dedicated discussion forum fosters a sense of community among researchers while addressing questions not covered in published manuscripts. Together, scientists from around the world are advancing scientific discovery at a faster pace, reducing wasted time and effort, and instilling greater confidence in the resulting data. △ Less

Submitted 17 December, 2024; originally announced December 2024.

arXiv:2406.10272 [pdf, other]

Connected Speech-Based Cognitive Assessment in Chinese and English

Authors: Saturnino Luz, Sofia De La Fuente Garcia, Fasih Haider, Davida Fromm, Brian MacWhinney, Alyssa Lanzi, Ya-Ning Chang, Chia-Ju Chou, Yi-Chien Liu

Abstract: We present a novel benchmark dataset and prediction tasks for investigating approaches to assess cognitive function through analysis of connected speech. The dataset consists of speech samples and clinical information for speakers of Mandarin Chinese and English with different levels of cognitive impairment as well as individuals with normal cognition. These data have been carefully matched by age… ▽ More We present a novel benchmark dataset and prediction tasks for investigating approaches to assess cognitive function through analysis of connected speech. The dataset consists of speech samples and clinical information for speakers of Mandarin Chinese and English with different levels of cognitive impairment as well as individuals with normal cognition. These data have been carefully matched by age and sex by propensity score analysis to ensure balance and representativity in model training. The prediction tasks encompass mild cognitive impairment diagnosis and cognitive test score prediction. This framework was designed to encourage the development of approaches to speech-based cognitive assessment which generalise across languages. We illustrate it by presenting baseline prediction models that employ language-agnostic and comparable features for diagnosis and cognitive test score prediction. The models achieved unweighted average recall was 59.2% in diagnosis, and root mean squared error of 2.89 in score prediction. △ Less

Submitted 18 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: To appear in Proceedings of Interspeech 2024

ACM Class: J.3; I.5.4

arXiv:2406.03138 [pdf, other]

An interpretable speech foundation model for depression detection by revealing prediction-relevant acoustic features from long speech

Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

Abstract: Speech-based depression detection tools could aid early screening. Here, we propose an interpretable speech foundation model approach to enhance the clinical applicability of such tools. We introduce a speech-level Audio Spectrogram Transformer (AST) to detect depression using long-duration speech instead of short segments, along with a novel interpretation method that reveals prediction-relevant… ▽ More Speech-based depression detection tools could aid early screening. Here, we propose an interpretable speech foundation model approach to enhance the clinical applicability of such tools. We introduce a speech-level Audio Spectrogram Transformer (AST) to detect depression using long-duration speech instead of short segments, along with a novel interpretation method that reveals prediction-relevant acoustic features for clinician interpretation. Our experiments show the proposed model outperforms a segment-level AST, highlighting the impact of segment-level labelling noise and the advantage of leveraging longer speech duration for more reliable depression detection. Through interpretation, we observe our model identifies reduced loudness and F0 as relevant depression signals, aligning with documented clinical findings. This interpretability supports a responsible AI approach for speech-based depression detection, rendering such tools more clinically applicable. △ Less

Submitted 19 May, 2025; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2309.13476

arXiv:2404.15367 [pdf, other]

Leveraging Visibility Graphs for Enhanced Arrhythmia Classification with Graph Convolutional Networks

Authors: Rafael F. Oliveira, Gladston J. P. Moreira, Vander L. S. Freitas, Eduardo J. S. Luz

Abstract: Arrhythmias, detectable through electrocardiograms (ECGs), pose significant health risks, underscoring the need for accurate and efficient automated detection techniques. While recent advancements in graph-based methods have demonstrated potential to enhance arrhythmia classification, the challenge lies in effectively representing ECG signals as graphs. This study investigates the use of Visibilit… ▽ More Arrhythmias, detectable through electrocardiograms (ECGs), pose significant health risks, underscoring the need for accurate and efficient automated detection techniques. While recent advancements in graph-based methods have demonstrated potential to enhance arrhythmia classification, the challenge lies in effectively representing ECG signals as graphs. This study investigates the use of Visibility Graph (VG) and Vector Visibility Graph (VVG) representations combined with Graph Convolutional Networks (GCNs) for arrhythmia classification under the ANSI/AAMI standard, ensuring reproducibility and fair comparison with other techniques. Through extensive experiments on the MIT-BIH dataset, we evaluate various GCN architectures and preprocessing parameters. Our findings demonstrate that VG and VVG mappings enable GCNs to classify arrhythmias directly from raw ECG signals, without the need for preprocessing or noise removal. Notably, VG offers superior computational efficiency, while VVG delivers enhanced classification performance by leveraging additional lead features. The proposed approach outperforms baseline methods in several metrics, although challenges persist in classifying the supraventricular ectopic beat (S) class, particularly under the inter-patient paradigm. △ Less

Submitted 3 December, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2309.13476 [pdf, other]

Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection

Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

Abstract: Depression is a common mental disorder. Automatic depression detection tools using speech, enabled by machine learning, help early screening of depression. This paper addresses two limitations that may hinder the clinical implementations of such tools: noise resulting from segment-level labelling and a lack of model interpretability. We propose a bi-modal speech-level transformer to avoid segment-… ▽ More Depression is a common mental disorder. Automatic depression detection tools using speech, enabled by machine learning, help early screening of depression. This paper addresses two limitations that may hinder the clinical implementations of such tools: noise resulting from segment-level labelling and a lack of model interpretability. We propose a bi-modal speech-level transformer to avoid segment-level labelling and introduce a hierarchical interpretation approach to provide both speech-level and sentence-level interpretations, based on gradient-weighted attention maps derived from all attention layers to track interactions between input features. We show that the proposed model outperforms a model that learns at a segment level ($p$=0.854, $r$=0.947, $F1$=0.897 compared to $p$=0.732, $r$=0.808, $F1$=0.768). For model interpretation, using one true positive sample, we show which sentences within a given speech are most relevant to depression detection; and which text tokens and Mel-spectrogram regions within these sentences are most relevant to depression detection. These interpretations allow clinicians to verify the validity of predictions made by depression detection tools, promoting their clinical implementations. △ Less

Submitted 6 October, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: 5 pages, 3 figures, submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing

ACM Class: F.2.2; I.2.7

arXiv:2301.05562 [pdf, ps, other]

Multilingual Alzheimer's Dementia Recognition through Spontaneous Speech: a Signal Processing Grand Challenge

Authors: Saturnino Luz, Fasih Haider, Davida Fromm, Ioulietta Lazarou, Ioannis Kompatsiaris, Brian MacWhinney

Abstract: This Signal Processing Grand Challenge (SPGC) targets a difficult automatic prediction problem of societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD). Participants were invited to employ signal processing and machine learning methods to create predictive models based on spontaneous speech data. The Challenge has been designed to assess the extent to which predictive… ▽ More This Signal Processing Grand Challenge (SPGC) targets a difficult automatic prediction problem of societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD). Participants were invited to employ signal processing and machine learning methods to create predictive models based on spontaneous speech data. The Challenge has been designed to assess the extent to which predictive models built based on speech in one language (English) generalise to another language (Greek). To the best of our knowledge no work has investigated acoustic features of the speech signal in multilingual AD detection. Our baseline system used conventional machine learning algorithms with Active Data Representation of acoustic features, achieving accuracy of 73.91% on AD detection, and 4.95 root mean squared error on cognitive score prediction. △ Less

Submitted 13 January, 2023; originally announced January 2023.

Comments: ICASSP 2023 SPGC description

MSC Class: 68T10 (Primary) 92C55 (Secondary) ACM Class: J.3; I.2.6; I.5.1

arXiv:2104.09356 [pdf, other]

Detecting cognitive decline using speech only: The ADReSSo Challenge

Authors: Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, Brian MacWhinney

Abstract: Building on the success of the ADReSS Challenge at Interspeech 2020, which attracted the participation of 34 teams from across the world, the ADReSSo Challenge targets three difficult automatic prediction problems of societal and medical relevance, namely: detection of Alzheimer's Dementia, inference of cognitive testing scores, and prediction of cognitive decline. This paper presents these predic… ▽ More Building on the success of the ADReSS Challenge at Interspeech 2020, which attracted the participation of 34 teams from across the world, the ADReSSo Challenge targets three difficult automatic prediction problems of societal and medical relevance, namely: detection of Alzheimer's Dementia, inference of cognitive testing scores, and prediction of cognitive decline. This paper presents these prediction tasks in detail, describes the datasets used, and reports the results of the baseline classification and regression models we developed for each task. A combination of acoustic and linguistic features extracted directly from audio recordings, without human intervention, yielded a baseline accuracy of 78.87% for the AD classification task, an MMSE prediction root mean squared (RMSE) error of 5.28, and 68.75% accuracy for the cognitive decline prediction task. △ Less

Submitted 22 March, 2021; originally announced April 2021.

arXiv:2010.06047 [pdf, other]

Artificial Intelligence, speech and language processing approaches to monitoring Alzheimer's Disease: a systematic review

Authors: Sofia de la Fuente Garcia, Craig Ritchie, Saturnino Luz

Abstract: Language is a valuable source of clinical information in Alzheimer's Disease, as it declines concurrently with neurodegeneration. Consequently, speech and language data have been extensively studied in connection with its diagnosis. This paper summarises current findings on the use of artificial intelligence, speech and language processing to predict cognitive decline in the context of Alzheimer's… ▽ More Language is a valuable source of clinical information in Alzheimer's Disease, as it declines concurrently with neurodegeneration. Consequently, speech and language data have been extensively studied in connection with its diagnosis. This paper summarises current findings on the use of artificial intelligence, speech and language processing to predict cognitive decline in the context of Alzheimer's Disease, detailing current research procedures, highlighting their limitations and suggesting strategies to address them. We conducted a systematic review of original research between 2000 and 2019, registered in PROSPERO (reference CRD42018116606). An interdisciplinary search covered six databases on engineering (ACM and IEEE), psychology (PsycINFO), medicine (PubMed and Embase) and Web of Science. Bibliographies of relevant papers were screened until December 2019. From 3,654 search results 51 articles were selected against the eligibility criteria. Four tables summarise their findings: study details (aim, population, interventions, comparisons, methods and outcomes), data details (size, type, modalities, annotation, balance, availability and language of study), methodology (pre-processing, feature generation, machine learning, evaluation and results) and clinical applicability (research implications, clinical potential, risk of bias and strengths/limitations). While promising results are reported across nearly all 51 studies, very few have been implemented in clinical research or practice. We concluded that the main limitations of the field are poor standardisation, limited comparability of results, and a degree of disconnect between study aims and clinical applications. Attempts to close these gaps should support translation of future research into clinical practice. △ Less

Submitted 12 October, 2020; originally announced October 2020.

Comments: Pre-print submitted to the Journal of Alzheimer's Disease

ACM Class: J.3; I.2.7; I.2.6; I.5.4

arXiv:2004.06833 [pdf, ps, other]

Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge

Authors: Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, Brian MacWhinney

Abstract: The ADReSS Challenge at INTERSPEECH 2020 defines a shared task through which different approaches to the automated recognition of Alzheimer's dementia based on spontaneous speech can be compared. ADReSS provides researchers with a benchmark speech dataset which has been acoustically pre-processed and balanced in terms of age and gender, defining two cognitive assessment tasks, namely: the Alzheime… ▽ More The ADReSS Challenge at INTERSPEECH 2020 defines a shared task through which different approaches to the automated recognition of Alzheimer's dementia based on spontaneous speech can be compared. ADReSS provides researchers with a benchmark speech dataset which has been acoustically pre-processed and balanced in terms of age and gender, defining two cognitive assessment tasks, namely: the Alzheimer's speech classification task and the neuropsychological score regression task. In the Alzheimer's speech classification task, ADReSS challenge participants create models for classifying speech as dementia or healthy control speech. In the the neuropsychological score regression task, participants create models to predict mini-mental state examination scores. This paper describes the ADReSS Challenge in detail and presents a baseline for both tasks, including feature extraction procedures and results for classification and regression models. ADReSS aims to provide the speech and language Alzheimer's research community with a platform for comprehensive methodological comparisons. This will hopefully contribute to addressing the lack of standardisation that currently affects the field and shed light on avenues for future research and clinical applicability. △ Less

Submitted 5 August, 2020; v1 submitted 14 April, 2020; originally announced April 2020.

Comments: To appear in the Proceedings of INTERSPEECH 2020, Oct 2020, Shanghai, China

arXiv:1811.09919 [pdf, other]

A Method for Analysis of Patient Speech in Dialogue for Dementia Detection

Authors: Saturnino Luz, Sofia de la Fuente, Pierre Albert

Abstract: We present an approach to automatic detection of Alzheimer's type dementia based on characteristics of spontaneous spoken language dialogue consisting of interviews recorded in natural settings. The proposed method employs additive logistic regression (a machine learning boosting method) on content-free features extracted from dialogical interaction to build a predictive model. The model training… ▽ More We present an approach to automatic detection of Alzheimer's type dementia based on characteristics of spontaneous spoken language dialogue consisting of interviews recorded in natural settings. The proposed method employs additive logistic regression (a machine learning boosting method) on content-free features extracted from dialogical interaction to build a predictive model. The model training data consisted of 21 dialogues between patients with Alzheimer's and interviewers, and 17 dialogues between patients with other health conditions and interviewers. Features analysed included speech rate, turn-taking patterns and other speech parameters. Despite relying solely on content-free features, our method obtains overall accuracy of 86.5\%, a result comparable to those of state-of-the-art methods that employ more complex lexical, syntactic and semantic features. While further investigation is needed, the fact that we were able to obtain promising results using only features that can be easily extracted from spontaneous dialogues suggests the possibility of designing non-invasive and low-cost mental health monitoring tools for use at scale. △ Less

Submitted 24 November, 2018; originally announced November 2018.

Comments: 8 pages, Resources and ProcessIng of linguistic, paralinguistic and extra-linguistic Data from people with various forms of cognitive impairment, LREC 2018

Showing 1–11 of 11 results for author: Luz, S