Search | arXiv e-print repository

Predicting Generalization of AI Colonoscopy Models to Unseen Data

Authors: Joel Shor, Carson McNeil, Yotam Intrator, Joseph R Ledsam, Hiro-o Yamano, Daisuke Tsurumaru, Hiroki Kayama, Atsushi Hamabe, Koji Ando, Mitsuhiko Ota, Haruei Ogino, Hiroshi Nakase, Kaho Kobayashi, Masaaki Miyo, Eiji Oki, Ichiro Takemasa, Ehud Rivlin, Roman Goldenberg

Abstract: $\textbf{Background}$: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. $\textbf{Methods}… ▽ More $\textbf{Background}$: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. $\textbf{Methods}$: We use a "Masked Siamese Network" (MSN) to identify novel phenomena in unseen data and predict polyp detector performance. MSN is trained to predict masked out regions of polyp images, without any labels. We test MSN's ability to be trained on data only from Israel and detect unseen techniques, narrow-band imaging (NBI) and chromendoscoy (CE), on colonoscopes from Japan (354 videos, 128 hours). We also test MSN's ability to predict performance of Computer Aided Detection (CADe) of polyps on colonoscopies from both countries, even though MSN is not trained on data from Japan. $\textbf{Results}$: MSN correctly identifies NBI and CE as less similar to Israel whitelight than Japan whitelight (bootstrapped z-test, |z| > 496, p < 10^-8 for both) using the label-free Frechet distance. MSN detects NBI with 99% accuracy, predicts CE better than our heuristic (90% vs 79% accuracy) despite being trained only on whitelight, and is the only method that is robust to noisy labels. MSN predicts CADe polyp detector performance on in-domain Israel and out-of-domain Japan colonoscopies (r=0.79, 0.37 respectively). With few examples of Japan detector performance to train on, MSN prediction of Japan performance improves (r=0.56). $\textbf{Conclusion}$: Our technique can identify distribution shifts in clinical data and can predict CADe detector performance on unseen data, without labels. Our self-supervised approach can aid in detecting when data in practice is different from training, such as between hospitals or data has meaningfully shifted from training. MSN has potential for application to medical image domains beyond colonoscopy. △ Less

Submitted 22 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

arXiv:2309.08623 [pdf]

doi 10.1109/ICDH60066.2023.00014

Balance Measures Derived from Insole Sensor Differentiate Prodromal Dementia with Lewy Bodies

Authors: Masatomo Kobayashi, Yasunori Yamada, Kaoru Shinkawa, Miyuki Nemoto, Miho Ota, Kiyotaka Nemoto, Tetsuaki Arai

Abstract: Dementia with Lewy bodies is the second most common type of neurodegenerative dementia, and identification at the prodromal stage$-$i.e., mild cognitive impairment due to Lewy bodies (MCI-LB)$-$is important for providing appropriate care. However, MCI-LB is often underrecognized because of its diversity in clinical manifestations and similarities with other conditions such as mild cognitive impair… ▽ More Dementia with Lewy bodies is the second most common type of neurodegenerative dementia, and identification at the prodromal stage$-$i.e., mild cognitive impairment due to Lewy bodies (MCI-LB)$-$is important for providing appropriate care. However, MCI-LB is often underrecognized because of its diversity in clinical manifestations and similarities with other conditions such as mild cognitive impairment due to Alzheimer's disease (MCI-AD). In this study, we propose a machine learning-based automatic pipeline that helps identify MCI-LB by exploiting balance measures acquired with an insole sensor during a 30-s standing task. An experiment with 98 participants (14 MCI-LB, 38 MCI-AD, 46 cognitively normal) showed that the resultant models could discriminate MCI-LB from the other groups with up to 78.0% accuracy (AUC: 0.681), which was 6.8% better than the accuracy of a reference model based on demographic and clinical neuropsychological measures. Our findings may open up a new approach for timely identification of MCI-LB, enabling better care for patients. △ Less

Submitted 11 September, 2023; originally announced September 2023.

Journal ref: 2023 IEEE International Conference on Digital Health (ICDH)

arXiv:2309.05777 [pdf]

doi 10.1109/ICDH60066.2023.00015

Smartwatch-derived Acoustic Markers for Deficits in Cognitively Relevant Everyday Functioning

Authors: Yasunori Yamada, Kaoru Shinkawa, Masatomo Kobayashi, Miyuki Nemoto, Miho Ota, Kiyotaka Nemoto, Tetsuaki Arai

Abstract: Detection of subtle deficits in everyday functioning due to cognitive impairment is important for early detection of neurodegenerative diseases, particularly Alzheimer's disease. However, current standards for assessment of everyday functioning are based on qualitative, subjective ratings. Speech has been shown to provide good objective markers for cognitive impairments, but the association with c… ▽ More Detection of subtle deficits in everyday functioning due to cognitive impairment is important for early detection of neurodegenerative diseases, particularly Alzheimer's disease. However, current standards for assessment of everyday functioning are based on qualitative, subjective ratings. Speech has been shown to provide good objective markers for cognitive impairments, but the association with cognition-relevant everyday functioning remains uninvestigated. In this study, we demonstrate the feasibility of using a smartwatch-based application to collect acoustic features as objective markers for detecting deficits in everyday functioning. We collected voice data during the performance of cognitive tasks and daily conversation, as possible application scenarios, from 54 older adults, along with a measure of everyday functioning. Machine learning models using acoustic features could detect individuals with deficits in everyday functioning with up to 77.8% accuracy, which was higher than the 68.5% accuracy with standard neuropsychological tests. We also identified common acoustic features for robustly discriminating deficits in everyday functioning across both types of voice data (cognitive tasks and daily conversation). Our results suggest that common acoustic features extracted from different types of voice data can be used as markers for deficits in everyday functioning. △ Less

Submitted 11 September, 2023; originally announced September 2023.

Journal ref: 2023 IEEE International Conference on Digital Health (ICDH)

arXiv:2211.08685 [pdf]

doi 10.1109/ICDH55609.2022.00008

Automated Analysis of Drawing Process for Detecting Prodromal and Clinical Dementia

Authors: Yasunori Yamada, Masatomo Kobayashi, Kaoru Shinkawa, Miyuki Nemoto, Miho Ota, Kiyotaka Nemoto, Tetsuaki Arai

Abstract: Early diagnosis of dementia, particularly in the prodromal stage (i.e., mild cognitive impairment, or MCI), has become a research and clinical priority but remains challenging. Automated analysis of the drawing process has been studied as a promising means for screening prodromal and clinical dementia, providing multifaceted information encompassing features, such as drawing speed, pen posture, wr… ▽ More Early diagnosis of dementia, particularly in the prodromal stage (i.e., mild cognitive impairment, or MCI), has become a research and clinical priority but remains challenging. Automated analysis of the drawing process has been studied as a promising means for screening prodromal and clinical dementia, providing multifaceted information encompassing features, such as drawing speed, pen posture, writing pressure, and pauses. We examined the feasibility of using these features not only for detecting prodromal and clinical dementia but also for predicting the severity of cognitive impairments assessed using Mini-Mental State Examination (MMSE) as well as the severity of neuropathological changes assessed by medial temporal lobe (MTL) atrophy. We collected drawing data with a digitizing tablet and pen from 145 older adults of cognitively normal (CN), MCI, and dementia. The nested cross-validation results indicate that the combination of drawing features could be used to classify CN, MCI, and dementia with an AUC of 0.909 and 75.1% accuracy (CN vs. MCI: 82.4% accuracy; CN vs. dementia: 92.2% accuracy; MCI vs. dementia: 80.3% accuracy) and predict MMSE scores with an $R^2$ of 0.491 and severity of MTL atrophy with an $R^2$ of 0.293. Our findings suggest that automated analysis of the drawing process can provide information about cognitive impairments and neuropathological changes due to dementia, which can help identify prodromal and clinical dementia as a digital biomarker. △ Less

Submitted 16 November, 2022; originally announced November 2022.

Journal ref: 2022 IEEE International Conference on Digital Health (ICDH)

Showing 1–4 of 4 results for author: Ota, M