-
Evaluating the Usefulness of Non-Diagnostic Speech Data for Developing Parkinson's Disease Classifiers
Authors:
Terry Yi Zhong,
Esther Janse,
Cristian Tejedor-Garcia,
Louis ten Bosch,
Martha Larson
Abstract:
Speech-based Parkinson's disease (PD) detection has gained attention for its automated, cost-effective, and non-intrusive nature. As research studies usually rely on data from diagnostic-oriented speech tasks, this work explores the feasibility of diagnosing PD on the basis of speech data not originally intended for diagnostic purposes, using the Turn-Taking (TT) dataset. Our findings indicate tha…
▽ More
Speech-based Parkinson's disease (PD) detection has gained attention for its automated, cost-effective, and non-intrusive nature. As research studies usually rely on data from diagnostic-oriented speech tasks, this work explores the feasibility of diagnosing PD on the basis of speech data not originally intended for diagnostic purposes, using the Turn-Taking (TT) dataset. Our findings indicate that TT can be as useful as diagnostic-oriented PD datasets like PC-GITA. We also investigate which specific dataset characteristics impact PD classification performance. The results show that concatenating audio recordings and balancing participants' gender and status distributions can be beneficial. Cross-dataset evaluation reveals that models trained on PC-GITA generalize poorly to TT, whereas models trained on TT perform better on PC-GITA. Furthermore, we provide insights into the high variability across folds, which is mainly due to large differences in individual speaker performance.
△ Less
Submitted 24 May, 2025;
originally announced May 2025.
-
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation
Authors:
Tianyun Zhong,
Chao Liang,
Jianwen Jiang,
Gaojie Lin,
Jiaqi Yang,
Zhou Zhao
Abstract:
Diffusion-based audio-driven talking avatar methods have recently gained attention for their high-fidelity, vivid, and expressive results. However, their slow inference speed limits practical applications. Despite the development of various distillation techniques for diffusion models, we found that naive diffusion distillation methods do not yield satisfactory results. Distilled models exhibit re…
▽ More
Diffusion-based audio-driven talking avatar methods have recently gained attention for their high-fidelity, vivid, and expressive results. However, their slow inference speed limits practical applications. Despite the development of various distillation techniques for diffusion models, we found that naive diffusion distillation methods do not yield satisfactory results. Distilled models exhibit reduced robustness with open-set input images and a decreased correlation between audio and video compared to teacher models, undermining the advantages of diffusion models. To address this, we propose FADA (Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation). We first designed a mixed-supervised loss to leverage data of varying quality and enhance the overall model capability as well as robustness. Additionally, we propose a multi-CFG distillation with learnable tokens to utilize the correlation between audio and reference image conditions, reducing the threefold inference runs caused by multi-CFG with acceptable quality degradation. Extensive experiments across multiple datasets show that FADA generates vivid videos comparable to recent diffusion model-based methods while achieving an NFE speedup of 4.17-12.5 times. Demos are available at our webpage http://fadavatar.github.io.
△ Less
Submitted 4 April, 2025; v1 submitted 22 December, 2024;
originally announced December 2024.
-
Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports
Authors:
Yutong Zhang,
Yi Pan,
Tianyang Zhong,
Peixin Dong,
Kangni Xie,
Yuxiao Liu,
Hanqi Jiang,
Zhengliang Liu,
Shijie Zhao,
Tuo Zhang,
Xi Jiang,
Dinggang Shen,
Tianming Liu,
Xin Zhang
Abstract:
Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecti…
▽ More
Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence (AGI) for computer vision, showcasing their potential in the biomedical domain. In this study, we evaluated the performance of the Gemini, GPT-4, and 4 popular large models for an exhaustive evaluation across 14 medical imaging datasets, including 5 medical imaging categories (dermatology, radiology, dentistry, ophthalmology, and endoscopy), and 3 radiology report datasets. The investigated tasks encompass disease classification, lesion segmentation, anatomical localization, disease diagnosis, report generation, and lesion detection. Our experimental results demonstrated that Gemini-series models excelled in report generation and lesion detection but faces challenges in disease classification and anatomical localization. Conversely, GPT-series models exhibited proficiency in lesion segmentation and anatomical localization but encountered difficulties in disease diagnosis and lesion detection. Additionally, both the Gemini series and GPT series contain models that have demonstrated commendable generation efficiency. While both models hold promise in reducing physician workload, alleviating pressure on limited healthcare resources, and fostering collaboration between clinical practitioners and artificial intelligence technologies, substantial enhancements and comprehensive validations remain imperative before clinical deployment.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Holistic Evaluation of GPT-4V for Biomedical Imaging
Authors:
Zhengliang Liu,
Hanqi Jiang,
Tianyang Zhong,
Zihao Wu,
Chong Ma,
Yiwei Li,
Xiaowei Yu,
Yutong Zhang,
Yi Pan,
Peng Shu,
Yanjun Lyu,
Lu Zhang,
Junjie Yao,
Peixin Dong,
Chao Cao,
Zhenxiang Xiao,
Jiaqi Wang,
Huan Zhao,
Shaochen Xu,
Yaonai Wei,
Jingyuan Chen,
Haixing Dai,
Peilong Wang,
Hao He,
Zewei Wang
, et al. (25 additional authors not shown)
Abstract:
In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and mor…
▽ More
In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.
△ Less
Submitted 10 November, 2023;
originally announced December 2023.
-
Nonconvex optimization for optimum retrieval of the transmission matrix of a multimode fiber
Authors:
Shengfu Cheng,
Xuyu Zhang,
Tianting Zhong,
Huanhao Li,
Haoran Li,
Lei Gong,
Honglin Liu,
Puxiang Lai
Abstract:
Transmission matrix (TM) allows light control through complex media such as multimode fibers (MMFs), gaining great attention in areas like biophotonics over the past decade. The measurement of a complex-valued TM is highly desired as it supports full modulation of the light field, yet demanding as the holographic setup is usually entailed. Efforts have been taken to retrieve a TM directly from int…
▽ More
Transmission matrix (TM) allows light control through complex media such as multimode fibers (MMFs), gaining great attention in areas like biophotonics over the past decade. The measurement of a complex-valued TM is highly desired as it supports full modulation of the light field, yet demanding as the holographic setup is usually entailed. Efforts have been taken to retrieve a TM directly from intensity measurements with several representative phase retrieval algorithms, which still see limitations like slow or suboptimum recovery, especially under noisy environment. Here, a modified non-convex optimization approach is proposed. Through numerical evaluations, it shows that the nonconvex method offers an optimum efficiency of focusing with less running time or sampling rate. The comparative test under different signal-to-noise levels further indicates its improved robustness for TM retrieval. Experimentally, the optimum retrieval of the TM of a MMF is collectively validated by multiple groups of single-spot and multi-spot focusing demonstrations. Focus scanning on the working plane of the MMF is also conducted where our method achieves 93.6% efficiency of the gold standard holography method when the sampling rate is 8. Based on the recovered TM, image transmission through the MMF with high fidelity can be realized via another phase retrieval. Thanks to parallel operation and GPU acceleration, the nonconvex approach can retrieve an 8685$\times$1024 TM (sampling rate=8) with 42.3 s on a regular computer. In brief, the proposed method provides optimum efficiency and fast implementation for TM retrieval, which will facilitate wide applications in deep-tissue optical imaging, manipulation and treatment.
△ Less
Submitted 2 August, 2023;
originally announced August 2023.
-
Preprocessing Enhanced Image Compression for Machine Vision
Authors:
Guo Lu,
Xingtong Ge,
Tianxiong Zhong,
Jing Geng,
Qiang Hu
Abstract:
Recently, more and more images are compressed and sent to the back-end devices for the machine analysis tasks~(\textit{e.g.,} object detection) instead of being purely watched by humans. However, most traditional or learned image codecs are designed to minimize the distortion of the human visual system without considering the increased demand from machine vision systems. In this work, we propose a…
▽ More
Recently, more and more images are compressed and sent to the back-end devices for the machine analysis tasks~(\textit{e.g.,} object detection) instead of being purely watched by humans. However, most traditional or learned image codecs are designed to minimize the distortion of the human visual system without considering the increased demand from machine vision systems. In this work, we propose a preprocessing enhanced image compression method for machine vision tasks to address this challenge. Instead of relying on the learned image codecs for end-to-end optimization, our framework is built upon the traditional non-differential codecs, which means it is standard compatible and can be easily deployed in practical applications. Specifically, we propose a neural preprocessing module before the encoder to maintain the useful semantic information for the downstream tasks and suppress the irrelevant information for bitrate saving. Furthermore, our neural preprocessing module is quantization adaptive and can be used in different compression ratios. More importantly, to jointly optimize the preprocessing module with the downstream machine vision tasks, we introduce the proxy network for the traditional non-differential codecs in the back-propagation stage. We provide extensive experiments by evaluating our compression method for two representative downstream tasks with different backbone networks. Experimental results show our method achieves a better trade-off between the coding bitrate and the performance of the downstream machine vision tasks by saving about 20% bitrate.
△ Less
Submitted 11 June, 2022;
originally announced June 2022.
-
Multi-Site Infant Brain Segmentation Algorithms: The iSeg-2019 Challenge
Authors:
Yue Sun,
Kun Gao,
Zhengwang Wu,
Zhihao Lei,
Ying Wei,
Jun Ma,
Xiaoping Yang,
Xue Feng,
Li Zhao,
Trung Le Phan,
Jitae Shin,
Tao Zhong,
Yu Zhang,
Lequan Yu,
Caizi Li,
Ramesh Basnet,
M. Omair Ahmad,
M. N. S. Swamy,
Wenao Ma,
Qi Dou,
Toan Duc Bui,
Camilo Bermudez Noguera,
Bennett Landman,
Ian H. Gotlib,
Kathryn L. Humphreys
, et al. (8 additional authors not shown)
Abstract:
To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of major limitations is that the learning-based methods may suffer from the multi-site i…
▽ More
To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of major limitations is that the learning-based methods may suffer from the multi-site issue, that is, the models trained on a dataset from one site may not be applicable to the datasets acquired from other sites with different imaging protocols/scanners. To promote methodological development in the community, iSeg-2019 challenge (http://iseg2019.web.unc.edu) provides a set of 6-month infant subjects from multiple sites with different protocols/scanners for the participating methods. Training/validation subjects are from UNC (MAP) and testing subjects are from UNC/UMN (BCP), Stanford University, and Emory University. By the time of writing, there are 30 automatic segmentation methods participating in iSeg-2019. We review the 8 top-ranked teams by detailing their pipelines/implementations, presenting experimental results and evaluating performance in terms of the whole brain, regions of interest, and gyral landmark curves. We also discuss their limitations and possible future directions for the multi-site issue. We hope that the multi-site dataset in iSeg-2019 and this review article will attract more researchers on the multi-site issue.
△ Less
Submitted 11 July, 2020; v1 submitted 4 July, 2020;
originally announced July 2020.
-
Histopathological imaging features- versus molecular measurements-based cancer prognosis modeling
Authors:
Sanguo Zhang,
Yu Fan,
Tingyan Zhong,
Shuangge Ma
Abstract:
For most if not all cancers, prognosis is of significant importance, and extensive modeling research has been conducted. With the genetic nature of cancer, in the past two decades, multiple types of molecular data (such as gene expressions and DNA mutations) have been explored. More recently, histopathological imaging data, which is routinely collected in biopsy, has been shown as informative for…
▽ More
For most if not all cancers, prognosis is of significant importance, and extensive modeling research has been conducted. With the genetic nature of cancer, in the past two decades, multiple types of molecular data (such as gene expressions and DNA mutations) have been explored. More recently, histopathological imaging data, which is routinely collected in biopsy, has been shown as informative for modeling prognosis. In this study, using the TCGA LUAD and LUSC data as a showcase, we examine and compare modeling lung cancer overall survival using gene expressions versus histopathological imaging features. High-dimensional regularization methods are adopted for estimation and selection. Our analysis shows that gene expressions have slightly better prognostic performance. In addition, most of the gene expressions are found to be weakly correlated imaging features. It is expected that this study can provide some insight into utilizing the two types of important data in cancer prognosis modeling and into lung cancer overall survival.
△ Less
Submitted 5 June, 2020;
originally announced June 2020.