Search | arXiv e-print repository

Diffusion-based Counterfactual Augmentation: Towards Robust and Interpretable Knee Osteoarthritis Grading

Authors: Zhe Wang, Yuhua Ru, Aladine Chetouani, Tina Shiang, Fang Chen, Fabian Bauer, Liping Zhang, Didier Hans, Rachid Jennane, William Ewing Palmer, Mohamed Jarraya, Yung Hsin Chen

Abstract: Automated grading of Knee Osteoarthritis (KOA) from radiographs is challenged by significant inter-observer variability and the limited robustness of deep learning models, particularly near critical decision boundaries. To address these limitations, this paper proposes a novel framework, Diffusion-based Counterfactual Augmentation (DCA), which enhances model robustness and interpretability by gene… ▽ More Automated grading of Knee Osteoarthritis (KOA) from radiographs is challenged by significant inter-observer variability and the limited robustness of deep learning models, particularly near critical decision boundaries. To address these limitations, this paper proposes a novel framework, Diffusion-based Counterfactual Augmentation (DCA), which enhances model robustness and interpretability by generating targeted counterfactual examples. The method navigates the latent space of a diffusion model using a Stochastic Differential Equation (SDE), governed by balancing a classifier-informed boundary drive with a manifold constraint. The resulting counterfactuals are then used within a self-corrective learning strategy to improve the classifier by focusing on its specific areas of uncertainty. Extensive experiments on the public Osteoarthritis Initiative (OAI) and Multicenter Osteoarthritis Study (MOST) datasets demonstrate that this approach significantly improves classification accuracy across multiple model architectures. Furthermore, the method provides interpretability by visualizing minimal pathological changes and revealing that the learned latent space topology aligns with clinical knowledge of KOA progression. The DCA framework effectively converts model uncertainty into a robust training signal, offering a promising pathway to developing more accurate and trustworthy automated diagnostic systems. Our code is available at https://github.com/ZWang78/DCA. △ Less

Submitted 18 June, 2025; originally announced June 2025.

arXiv:2504.07308 [pdf, other]

MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution

Authors: Zhe Wang, Yuhua Ru, Aladine Chetouani, Fang Chen, Fabian Bauer, Liping Zhang, Didier Hans, Rachid Jennane, Mohamed Jarraya, Yung Hsin Chen

Abstract: Magnetic Resonance Imaging (MRI) at lower field strengths (e.g., 3T) suffers from limited spatial resolution, making it challenging to capture fine anatomical details essential for clinical diagnosis and neuroimaging research. To overcome this limitation, we propose MoEDiff-SR, a Mixture of Experts (MoE)-guided diffusion model for region-adaptive MRI Super-Resolution (SR). Unlike conventional diff… ▽ More Magnetic Resonance Imaging (MRI) at lower field strengths (e.g., 3T) suffers from limited spatial resolution, making it challenging to capture fine anatomical details essential for clinical diagnosis and neuroimaging research. To overcome this limitation, we propose MoEDiff-SR, a Mixture of Experts (MoE)-guided diffusion model for region-adaptive MRI Super-Resolution (SR). Unlike conventional diffusion-based SR models that apply a uniform denoising process across the entire image, MoEDiff-SR dynamically selects specialized denoising experts at a fine-grained token level, ensuring region-specific adaptation and enhanced SR performance. Specifically, our approach first employs a Transformer-based feature extractor to compute multi-scale patch embeddings, capturing both global structural information and local texture details. The extracted feature embeddings are then fed into an MoE gating network, which assigns adaptive weights to multiple diffusion-based denoisers, each specializing in different brain MRI characteristics, such as centrum semiovale, sulcal and gyral cortex, and grey-white matter junction. The final output is produced by aggregating the denoised results from these specialized experts according to dynamically assigned gating probabilities. Experimental results demonstrate that MoEDiff-SR outperforms existing state-of-the-art methods in terms of quantitative image quality metrics, perceptual fidelity, and computational efficiency. Difference maps from each expert further highlight their distinct specializations, confirming the effective region-specific denoising capability and the interpretability of expert contributions. Additionally, clinical evaluation validates its superior diagnostic capability in identifying subtle pathological features, emphasizing its practical relevance in clinical neuroimaging. Our code is available at https://github.com/ZWang78/MoEDiff-SR. △ Less

Submitted 9 April, 2025; originally announced April 2025.

arXiv:2501.18736 [pdf, other]

Distillation-Driven Diffusion Model for Multi-Scale MRI Super-Resolution: Make 1.5T MRI Great Again

Authors: Zhe Wang, Yuhua Ru, Fabian Bauer, Aladine Chetouani, Fang Chen, Liping Zhang, Didier Hans, Rachid Jennane, Mohamed Jarraya, Yung Hsin Chen

Abstract: Magnetic Resonance Imaging (MRI) offers critical insights into microstructural details, however, the spatial resolution of standard 1.5T imaging systems is often limited. In contrast, 7T MRI provides significantly enhanced spatial resolution, enabling finer visualization of anatomical structures. Though this, the high cost and limited availability of 7T MRI hinder its widespread use in clinical se… ▽ More Magnetic Resonance Imaging (MRI) offers critical insights into microstructural details, however, the spatial resolution of standard 1.5T imaging systems is often limited. In contrast, 7T MRI provides significantly enhanced spatial resolution, enabling finer visualization of anatomical structures. Though this, the high cost and limited availability of 7T MRI hinder its widespread use in clinical settings. To address this challenge, a novel Super-Resolution (SR) model is proposed to generate 7T-like MRI from standard 1.5T MRI scans. Our approach leverages a diffusion-based architecture, incorporating gradient nonlinearity correction and bias field correction data from 7T imaging as guidance. Moreover, to improve deployability, a progressive distillation strategy is introduced. Specifically, the student model refines the 7T SR task with steps, leveraging feature maps from the inference phase of the teacher model as guidance, aiming to allow the student model to achieve progressively 7T SR performance with a smaller, deployable model size. Experimental results demonstrate that our baseline teacher model achieves state-of-the-art SR performance. The student model, while lightweight, sacrifices minimal performance. Furthermore, the student model is capable of accepting MRI inputs at varying resolutions without the need for retraining, significantly further enhancing deployment flexibility. The clinical relevance of our proposed method is validated using clinical data from Massachusetts General Hospital. Our code is available at https://github.com/ZWang78/SR. △ Less

Submitted 30 January, 2025; originally announced January 2025.

arXiv:2410.06997 [pdf, other]

Feasibility Study of a Diffusion-Based Model for Cross-Modal Generation of Knee MRI from X-ray: Integrating Radiographic Feature Information

Authors: Zhe Wang, Yung Hsin Chen, Aladine Chetouani, Fabian Bauer, Yuhua Ru, Fang Chen, Liping Zhang, Rachid Jennane, Mohamed Jarraya

Abstract: Knee osteoarthritis (KOA) is a prevalent musculoskeletal disorder, often diagnosed using X-rays due to its cost-effectiveness. While Magnetic Resonance Imaging (MRI) provides superior soft tissue visualization and serves as a valuable supplementary diagnostic tool, its high cost and limited accessibility significantly restrict its widespread use. To explore the feasibility of bridging this imaging… ▽ More Knee osteoarthritis (KOA) is a prevalent musculoskeletal disorder, often diagnosed using X-rays due to its cost-effectiveness. While Magnetic Resonance Imaging (MRI) provides superior soft tissue visualization and serves as a valuable supplementary diagnostic tool, its high cost and limited accessibility significantly restrict its widespread use. To explore the feasibility of bridging this imaging gap, we conducted a feasibility study leveraging a diffusion-based model that uses an X-ray image as conditional input, alongside target depth and additional patient-specific feature information, to generate corresponding MRI sequences. Our findings demonstrate that the MRI volumes generated by our approach is visually closer to real MRI scans. Moreover, increasing inference steps enhances the continuity and smoothness of the synthesized MRI sequences. Through ablation studies, we further validate that integrating supplementary patient-specific information, beyond what X-rays alone can provide, enhances the accuracy and clinical relevance of the generated MRI, which underscores the potential of leveraging external patient-specific information to improve the MRI generation. This study is available at https://zwang78.github.io/. △ Less

Submitted 27 December, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

arXiv:2309.06945 [pdf, ps, other]

doi 10.1109/ISM.2018.00063

Improving HEVC Encoding of Rendered Video Data Using True Motion Information

Authors: Christian Herglotz, David Müller, Andreas Weinlich, Frank Bauer, Michael Ortner, Marc Stamminger, André Kaup

Abstract: This paper shows that motion vectors representing the true motion of an object in a scene can be exploited to improve the encoding process of computer generated video sequences. Therefore, a set of sequences is presented for which the true motion vectors of the corresponding objects were generated on a per-pixel basis during the rendering process. In addition to conventional motion estimation meth… ▽ More This paper shows that motion vectors representing the true motion of an object in a scene can be exploited to improve the encoding process of computer generated video sequences. Therefore, a set of sequences is presented for which the true motion vectors of the corresponding objects were generated on a per-pixel basis during the rendering process. In addition to conventional motion estimation methods, it is proposed to exploit the computer generated motion vectors to enhance the ratedistortion performance. To this end, a motion vector mapping method including disocclusion handling is presented. It is shown that mean rate savings of 3.78% can be achieved. △ Less

Submitted 13 September, 2023; originally announced September 2023.

Comments: 4 pages, 4 figures

Journal ref: Proc. 2018 IEEE International Symposium on Multimedia (ISM)

arXiv:2303.13203 [pdf, other]

Confidence-Driven Deep Learning Framework for Early Detection of Knee Osteoarthritis

Authors: Zhe Wang, Aladine Chetouani, Yung Hsin Chen, Yuhua Ru, Fang Chen, Mohamed Jarraya, Fabian Bauer, Liping Zhang, Didier Hans, Rachid Jennane

Abstract: Knee Osteoarthritis (KOA) is a prevalent musculoskeletal disorder that severely impacts mobility and quality of life, particularly among older adults. Its diagnosis often relies on subjective assessments using the Kellgren-Lawrence (KL) grading system, leading to variability in clinical evaluations. To address these challenges, we propose a confidence-driven deep learning framework for early KOA d… ▽ More Knee Osteoarthritis (KOA) is a prevalent musculoskeletal disorder that severely impacts mobility and quality of life, particularly among older adults. Its diagnosis often relies on subjective assessments using the Kellgren-Lawrence (KL) grading system, leading to variability in clinical evaluations. To address these challenges, we propose a confidence-driven deep learning framework for early KOA detection, focusing on distinguishing KL-0 and KL-2 stages. The Siamese-based framework integrates a novel multi-level feature extraction architecture with a hybrid loss strategy. Specifically, multi-level Global Average Pooling (GAP) layers are employed to extract features from varying network depths, ensuring comprehensive feature representation, while the hybrid loss strategy partitions training samples into high-, medium-, and low-confidence subsets. Tailored loss functions are applied to improve model robustness and effectively handle uncertainty in annotations. Experimental results on the Osteoarthritis Initiative (OAI) dataset demonstrate that the proposed framework achieves competitive accuracy, sensitivity, and specificity, comparable to those of expert radiologists. Cohen's kappa values (k > 0.85)) confirm substantial agreement, while McNemar's test (p > 0.05) indicates no statistically significant differences between the model and radiologists. Additionally, Confidence distribution analysis reveals that the model emulates radiologists' decision-making patterns. These findings highlight the potential of the proposed approach to serve as an auxiliary diagnostic tool, enhancing early KOA detection and reducing clinical workload. △ Less

Submitted 15 January, 2025; v1 submitted 23 March, 2023; originally announced March 2023.

arXiv:2302.13336 [pdf, other]

Key-Exchange Convolutional Auto-Encoder for Data Augmentation in Early Knee Osteoarthritis Detection

Authors: Zhe Wang, Aladine Chetouani, Mohamed Jarraya, Yung Hsin Chen, Yuhua Ru, Fang Chen, Fabian Bauer, Liping Zhang, Didier Hans, Rachid Jennane

Abstract: Knee Osteoarthritis (KOA) is a common musculoskeletal condition that significantly affects mobility and quality of life, particularly in elderly populations. However, training deep learning models for early KOA classification is often hampered by the limited availability of annotated medical datasets, owing to the high costs and labour-intensive nature of data labelling. Traditional data augmentat… ▽ More Knee Osteoarthritis (KOA) is a common musculoskeletal condition that significantly affects mobility and quality of life, particularly in elderly populations. However, training deep learning models for early KOA classification is often hampered by the limited availability of annotated medical datasets, owing to the high costs and labour-intensive nature of data labelling. Traditional data augmentation techniques, while useful, rely on simple transformations and fail to introduce sufficient diversity into the dataset. To address these challenges, we propose the Key-Exchange Convolutional Auto-Encoder (KECAE) as an innovative Artificial Intelligence (AI)-based data augmentation strategy for early KOA classification. Our model employs a convolutional autoencoder with a novel key-exchange mechanism that generates synthetic images by selectively exchanging key pathological features between X-ray images, which not only diversifies the dataset but also ensures the clinical validity of the augmented data. A hybrid loss function is introduced to supervise feature learning and reconstruction, integrating multiple components, including reconstruction, supervision, and feature separation losses. Experimental results demonstrate that the KECAE-generated data significantly improve the performance of KOA classification models, with accuracy gains of up to 1.98% across various standard and state-of-the-art architectures. Furthermore, a clinical validation study involving expert radiologists confirms the anatomical plausibility and diagnostic realism of the synthetic outputs. These findings highlight the potential of KECAE as a robust tool for augmenting medical datasets in early KOA detection. △ Less

Submitted 15 January, 2025; v1 submitted 26 February, 2023; originally announced February 2023.

arXiv:2012.01582 [pdf, other]

doi 10.1007/s11548-021-02372-7

Generation of annotated multimodal ground truth datasets for abdominal medical image registration

Authors: Dominik F. Bauer, Tom Russ, Barbara I. Waldkirch, Christian Tönnes, William P. Segars, Lothar R. Schad, Frank G. Zöllner, Alena-Kathrin Golla

Abstract: Sparsity of annotated data is a major limitation in medical image processing tasks such as registration. Registered multimodal image data are essential for the diagnosis of medical conditions and the success of interventional medical procedures. To overcome the shortage of data, we present a method that allows the generation of annotated multimodal 4D datasets. We use a CycleGAN network architectu… ▽ More Sparsity of annotated data is a major limitation in medical image processing tasks such as registration. Registered multimodal image data are essential for the diagnosis of medical conditions and the success of interventional medical procedures. To overcome the shortage of data, we present a method that allows the generation of annotated multimodal 4D datasets. We use a CycleGAN network architecture to generate multimodal synthetic data from the 4D extended cardiac-torso (XCAT) phantom and real patient data. Organ masks are provided by the XCAT phantom, therefore the generated dataset can serve as ground truth for image segmentation and registration. Realistic simulation of respiration and heartbeat is possible within the XCAT framework. To underline the usability as a registration ground truth, a proof of principle registration is performed. Compared to real patient data, the synthetic data showed good agreement regarding the image voxel intensity distribution and the noise characteristics. The generated T1-weighted magnetic resonance imaging (MRI), computed tomography (CT), and cone beam CT (CBCT) images are inherently co-registered. Thus, the synthetic dataset allowed us to optimize registration parameters of a multimodal non-rigid registration, utilizing liver organ masks for evaluation. Our proposed framework provides not only annotated but also multimodal synthetic data which can serve as a ground truth for various tasks in medical imaging processing. We demonstrated the applicability of synthetic data for the development of multimodal medical image registration algorithms. △ Less

Submitted 23 July, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

Comments: 12 pages, 5 figures. This work has been published in the International Journal of Computer Assisted Radiology and Surgery volume

Journal ref: Int J CARS 16, 1277-1285 (2021)

arXiv:1911.05521 [pdf, other]

doi 10.1109/TBCAS.2019.2953001

Real-time ultra-low power ECG anomaly detection using an event-driven neuromorphic processor

Authors: Felix Christian Bauer, Dylan Richard Muir, Giacomo Indiveri

Abstract: Accurate detection of pathological conditions in human subjects can be achieved through off-line analysis of recorded biological signals such as electrocardiograms (ECGs). However, human diagnosis is time-consuming and expensive, as it requires the time of medical professionals. This is especially inefficient when indicative patterns in the biological signals are infrequent. Moreover, patients wit… ▽ More Accurate detection of pathological conditions in human subjects can be achieved through off-line analysis of recorded biological signals such as electrocardiograms (ECGs). However, human diagnosis is time-consuming and expensive, as it requires the time of medical professionals. This is especially inefficient when indicative patterns in the biological signals are infrequent. Moreover, patients with suspected pathologies are often monitored for extended periods, requiring the storage and examination of large amounts of non-pathological data, and entailing a difficult visual search task for diagnosing professionals. In this work we propose a compact and sub-mW low power neural processing system that can be used to perform on-line and real-time preliminary diagnosis of pathological conditions, to raise warnings for the existence of possible pathological conditions, or to trigger an off-line data recording system for further analysis by a medical professional. We apply the system to real-time classification of ECG data for distinguishing between healthy heartbeats and pathological rhythms. Multi-channel analog ECG traces are encoded as asynchronous streams of binary events and processed using a spiking recurrent neural network operated in a reservoir computing paradigm. An event-driven neuron output layer is then trained to recognize one of several pathologies. Finally, the filtered activity of this output layer is used to generate a binary trigger signal indicating the presence or absence of a pathological pattern. We validate the approach proposed using a Dynamic Neuromorphic Asynchronous Processor (DYNAP) chip, implemented using a standard 180 nm CMOS VLSI process, and present experimental results measured from the chip. △ Less

Submitted 13 November, 2019; originally announced November 2019.

Showing 1–9 of 9 results for author: Bauer, F