-
Cracking Instance Jigsaw Puzzles: An Alternative to Multiple Instance Learning for Whole Slide Image Analysis
Authors:
Xiwen Chen,
Peijie Qiu,
Wenhui Zhu,
Hao Wang,
Huayu Li,
Xuanzhao Dong,
Xiaotong Sun,
Xiaobing Yu,
Yalin Wang,
Abolfazl Razi,
Aristeidis Sotiras
Abstract:
While multiple instance learning (MIL) has been shown to be a promising approach for histopathological whole slide image (WSI) analysis, its reliance on permutation invariance significantly limits its capacity to effectively uncover semantic correlations between instances within WSIs. Based on our empirical and theoretical investigations, we argue that approaches that are not permutation-invariant but better capture spatial correlations between instances can offer more effective solutions. In light of these findings, we propose a novel alternative to existing MIL for WSI analysis by learning to restore the order of instances from their randomly shuffled arrangement. We term this task cracking an instance jigsaw puzzle, in which semantic correlations between instances are uncovered. To tackle the instance jigsaw puzzles, we propose a novel Siamese network solution, which is theoretically justified by optimal transport theory. We validate the proposed method on WSI classification and survival prediction tasks, where the proposed method outperforms the recent state-of-the-art MIL competitors. The code is available at https://github.com/xiwenc1/MIL-JigsawPuzzles.
Submitted 10 July, 2025;
originally announced July 2025.
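The contrast between permutation-invariant pooling and order recovery described in the abstract above can be illustrated with a minimal sketch. The function names (`mean_pool`, `shuffle_bag`, `restore_order`) and the toy feature vectors are hypothetical. The permutation is simply handed to `restore_order` here, whereas the paper describes learning to predict it with a Siamese network, which this sketch does not attempt to reproduce.

```python
import random


def mean_pool(bag):
    """Permutation-invariant aggregation: average each feature over the bag."""
    dim = len(bag[0])
    return [sum(inst[d] for inst in bag) / len(bag) for d in range(dim)]


def shuffle_bag(bag, seed=0):
    """Shuffle the instances; return the shuffled bag and the permutation used.

    perm[pos] is the original index of the instance now at position pos.
    """
    rng = random.Random(seed)
    perm = list(range(len(bag)))
    rng.shuffle(perm)
    return [bag[i] for i in perm], perm


def restore_order(shuffled, perm):
    """Undo the shuffle given the permutation (assumed known in this sketch)."""
    restored = [None] * len(shuffled)
    for pos, original_index in enumerate(perm):
        restored[original_index] = shuffled[pos]
    return restored


# Toy bag of three 2-D instance features (hypothetical values).
bag = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
shuffled, perm = shuffle_bag(bag, seed=0)

# Mean pooling cannot tell the two orderings apart ...
print(mean_pool(bag) == mean_pool(shuffled))
# ... while recovering the original order requires order-aware information.
print(restore_order(shuffled, perm) == bag)
```

Because mean pooling returns the same vector for every ordering, it carries no signal about where an instance sits, which is the limitation the abstract attributes to permutation-invariant MIL.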
-
How Effective Can Dropout Be in Multiple Instance Learning?
Authors:
Wenhui Zhu,
Peijie Qiu,
Xiwen Chen,
Zhangsihao Yang,
Aristeidis Sotiras,
Abolfazl Razi,
Yalin Wang
Abstract:
Multiple Instance Learning (MIL) is a popular weakly-supervised method for various applications, with a particular interest in histological whole slide image (WSI) classification. Due to the gigapixel resolution of WSIs, applications of MIL to WSIs typically necessitate a two-stage training scheme: first, extract features with a pre-trained backbone, and then perform MIL aggregation. However, it is well known that this suboptimal training scheme suffers from "noisy" feature embeddings from the backbone and inherent weak supervision, hindering MIL from learning rich and generalizable features. Moreover, the most commonly used technique for mitigating this issue (i.e., dropout) has yet to be explored in MIL. In this paper, we empirically explore how effective dropout can be in MIL. Interestingly, we observe that dropping the top-k most important instances within a bag leads to better performance and generalization, even under noise attack. Based on this key observation, we propose a novel MIL-specific dropout method, termed MIL-Dropout, which systematically determines which instances to drop. Experiments on five MIL benchmark datasets and two WSI datasets demonstrate that MIL-Dropout boosts the performance of current MIL methods with a negligible computational cost. The code is available at https://github.com/ChongQingNoSubway/MILDropout.
Submitted 20 May, 2025; v1 submitted 20 April, 2025;
originally announced April 2025.
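The key observation in the abstract above, dropping the top-k most important instances within a bag, might be sketched as follows. This is only an illustration under stated assumptions: the function name `drop_top_k` is hypothetical, the importance scores are assumed to be given (for example, attention weights from an aggregator), and the rule MIL-Dropout actually uses to decide which instances to drop is not reproduced here.

```python
def drop_top_k(instances, scores, k, training=True):
    """Return the bag with its k highest-scoring instances removed.

    instances: list of instance features (any objects)
    scores:    one importance score per instance (assumed given)
    k:         number of top-scoring instances to drop during training
    """
    if len(instances) != len(scores):
        raise ValueError("instances and scores must have the same length")
    if not training or k <= 0:
        return list(instances)  # no dropout at inference time
    # Always keep at least one instance in the bag.
    k = min(k, len(instances) - 1)
    # Indices sorted by descending score; the first k are dropped.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:k])
    return [inst for i, inst in enumerate(instances) if i not in dropped]


bag = ["patch_a", "patch_b", "patch_c", "patch_d"]
importance = [0.10, 0.70, 0.05, 0.15]  # hypothetical attention weights
print(drop_top_k(bag, importance, k=1))  # ['patch_a', 'patch_c', 'patch_d']
```

Dropping only during training mirrors how standard dropout is usually applied; whether the published method does the same is an assumption of this sketch.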
-
EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement
Authors:
Wenhui Zhu,
Xuanzhao Dong,
Xin Li,
Yujian Xiong,
Xiwen Chen,
Peijie Qiu,
Vamsi Krishna Vasa,
Zhangsihao Yang,
Yi Su,
Oana Dumitrascu,
Yalin Wang
Abstract:
Over the past decade, generative models have achieved significant success in enhancing fundus images. However, the evaluation of these models still presents a considerable challenge. A comprehensive evaluation benchmark for fundus image enhancement is indispensable for three main reasons: 1) The existing denoising metrics (e.g., PSNR, SSIM) are hard to extend to downstream real-world clinical research (e.g., vessel morphology consistency). 2) There is a lack of comprehensive evaluation for both paired and unpaired enhancement methods, along with the need for expert protocols to accurately assess clinical value. 3) An ideal evaluation system should provide insights to inform future developments of fundus image enhancement. To this end, we propose a novel comprehensive benchmark, EyeBench, to provide insights that align enhancement models with clinical needs, offering a foundation for future work to improve the clinical relevance and applicability of generative models for fundus image enhancement. EyeBench has three appealing properties: 1) Multi-dimensional, clinically aligned downstream evaluation: in addition to evaluating the enhancement task, we provide several clinically significant downstream tasks for fundus images, including vessel segmentation, DR grading, denoising generalization, and lesion segmentation. 2) Medical expert-guided evaluation design: we introduce a novel dataset that promotes comprehensive and fair comparisons between paired and unpaired methods and includes a manual evaluation protocol by medical experts. 3) Valuable insights: our benchmark study provides a comprehensive and rigorous evaluation of existing methods across different downstream tasks, assisting medical experts in making informed choices. Additionally, we offer further analysis of the challenges faced by existing methods. The code is available at https://github.com/Retinal-Research/EyeBench.
Submitted 19 February, 2025;
originally announced February 2025.
-
QCResUNet: Joint Subject-level and Voxel-level Segmentation Quality Prediction
Authors:
Peijie Qiu,
Satrajit Chakrabarty,
Phuc Nguyen,
Soumyendu Sekhar Ghosh,
Aristeidis Sotiras
Abstract:
Deep learning has made significant strides in automated brain tumor segmentation from magnetic resonance imaging (MRI) scans in recent years. However, the reliability of these tools is hampered by the presence of poor-quality segmentation outliers, particularly in out-of-distribution samples, making their implementation in clinical practice difficult. Therefore, there is a need for quality control (QC) to screen the quality of the segmentation results. Although numerous automatic QC methods have been developed for segmentation quality screening, most were designed for cardiac MRI segmentation, which involves a single modality and a single tissue type. Furthermore, most prior works only provided subject-level predictions of segmentation quality and did not identify erroneous parts of the segmentation that may require refinement. To address these limitations, we propose a novel multi-task deep learning architecture, termed QCResUNet, which produces subject-level segmentation-quality measures as well as voxel-level segmentation error maps for each available tissue class. To validate the effectiveness of the proposed method, we conducted experiments assessing its performance in evaluating segmentation quality on two distinct segmentation tasks. First, we aimed to assess the quality of brain tumor segmentation results. For this task, we performed experiments on one internal and two external datasets. Second, we aimed to evaluate the segmentation quality of cardiac MRI data from the Automated Cardiac Diagnosis Challenge. The proposed method achieved high performance in predicting subject-level segmentation-quality metrics and accurately identifying segmentation errors on a voxel basis. This has the potential to be used to guide human-in-the-loop feedback to improve segmentations in clinical settings.
Submitted 3 July, 2025; v1 submitted 9 December, 2024;
originally announced December 2024.
-
STA-Unet: Rethink the semantic redundant for Medical Imaging Segmentation
Authors:
Vamsi Krishna Vasa,
Wenhui Zhu,
Xiwen Chen,
Peijie Qiu,
Xuanzhao Dong,
Yalin Wang
Abstract:
In recent years, significant progress has been made in the medical image analysis domain using convolutional neural networks (CNNs). In particular, deep neural networks based on a U-shaped architecture (UNet) with skip connections have been adopted for several medical imaging tasks, including organ segmentation. Despite their great success, CNNs are not good at learning global or semantic features, especially those that require human-like reasoning to understand the context. Many UNet architectures have attempted to compensate by introducing Transformer-based self-attention mechanisms, with notable gains in performance. However, transformers are inherently prone to learning redundant features at shallow layers, which often leads to increased attention computation over nearby pixels that offer limited information. The recently introduced Super Token Attention (STA) mechanism adapts the concept of superpixels from pixel space to token space, using super tokens as compact visual representations. This approach tackles the redundancy by learning efficient global representations in vision transformers, especially for the shallow layers. In this work, we introduce the STA module into the UNet architecture (STA-UNet) to limit redundancy without losing rich information. Experimental results on four publicly available datasets demonstrate the superiority of STA-UNet over existing state-of-the-art architectures in terms of Dice score and IoU for organ segmentation tasks. The code is available at https://github.com/Retinal-Research/STA-UNet.
Submitted 13 October, 2024;
originally announced October 2024.
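The Dice score and IoU named in the abstract above are standard overlap metrics for segmentation. A minimal sketch for binary masks follows; the function name `dice_and_iou`, the flattened-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions of this example, not details taken from the paper.

```python
def dice_and_iou(pred, target):
    """Compute Dice and IoU for two binary masks given as flat 0/1 lists.

    Dice = 2 * |P ∩ T| / (|P| + |T|)
    IoU  = |P ∩ T| / |P ∪ T|
    Both are defined as 1.0 here when the two masks are empty (a convention).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    dice = 2 * intersection / (pred_area + target_area) if union else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou


# Toy 4-pixel masks: one pixel overlaps, each mask has two foreground pixels.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice)  # 0.5
print(iou)   # one overlapping pixel out of a union of three
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank single predictions identically even though their averages over a dataset can differ.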
-
CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement
Authors:
Xuanzhao Dong,
Vamsi Krishna Vasa,
Wenhui Zhu,
Peijie Qiu,
Xiwen Chen,
Yi Su,
Yujian Xiong,
Zhangsihao Yang,
Yanxi Chen,
Yalin Wang
Abstract:
Retinal fundus photography is significant in diagnosing and monitoring retinal diseases. However, systemic imperfections and operator/patient-related factors can hinder the acquisition of high-quality retinal images. Previous efforts in retinal image enhancement primarily relied on GANs, which are limited by the trade-off between training stability and output diversity. In contrast, the Schrödinger Bridge (SB) offers a more stable solution by utilizing Optimal Transport (OT) theory to model a stochastic differential equation (SDE) between two arbitrary distributions. This allows SB to effectively transform low-quality retinal images into their high-quality counterparts. In this work, we leverage the SB framework to propose an image-to-image translation pipeline for retinal image enhancement. Additionally, previous methods often fail to capture fine structural details, such as blood vessels. To address this, we enhance our pipeline by introducing Dynamic Snake Convolution, whose tortuous receptive field can better preserve tubular structures. We name the resulting retinal fundus image enhancement framework the Context-aware Unpaired Neural Schrödinger Bridge (CUNSB-RFIE). To the best of our knowledge, this is the first endeavor to use the SB approach for retinal image enhancement. Experimental results on a large-scale dataset demonstrate the advantage of the proposed method compared to several state-of-the-art supervised and unsupervised methods in terms of image quality and performance on downstream tasks. The code is available at https://github.com/Retinal-Research/CUNSB-RFIE.
Submitted 17 September, 2024;
originally announced September 2024.
-
D2-MLP: Dynamic Decomposed MLP Mixer for Medical Image Segmentation
Authors:
Jin Yang,
Xiaobing Yu,
Peijie Qiu
Abstract:
Convolutional neural networks are widely used in various segmentation tasks in medical images. However, they struggle to learn global features adaptively due to the inherent locality of convolutional operations. In contrast, MLP Mixers are proposed as a backbone to learn global information across channels with low complexity. However, they cannot capture spatial features efficiently. Additionally, they lack effective mechanisms to fuse and mix features adaptively. To tackle these limitations, we propose a novel Dynamic Decomposed Mixer module. It is designed to employ novel Mixers to extract features and aggregate information across different spatial locations and channels. Additionally, it employs novel dynamic mixing mechanisms to model inter-dependencies between channel and spatial feature representations and to fuse them adaptively. Subsequently, we incorporate it into a U-shaped Transformer-based architecture to generate a novel network, termed the Dynamic Decomposed MLP Mixer. We evaluated it for medical image segmentation on two datasets, and it achieved segmentation performance superior to that of other state-of-the-art methods.
Submitted 13 September, 2024;
originally announced September 2024.
-
Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement
Authors:
Vamsi Krishna Vasa,
Peijie Qiu,
Wenhui Zhu,
Yujian Xiong,
Oana Dumitrascu,
Yalin Wang
Abstract:
Retinal fundus photography offers a non-invasive way to diagnose and monitor a variety of retinal diseases, but is prone to inherent quality glitches arising from systemic imperfections or operator/patient-related factors. However, high-quality retinal images are crucial for carrying out accurate diagnoses and automated analyses. Fundus image enhancement is typically formulated as a distribution alignment problem, by finding a one-to-one mapping between a low-quality image and its high-quality counterpart. This paper proposes a context-informed optimal transport (OT) learning framework for tackling unpaired fundus image enhancement. In contrast to standard generative image enhancement methods, which struggle with handling contextual information (e.g., over-tampered local structures and unwanted artifacts), the proposed context-aware OT learning paradigm better preserves local structures and minimizes unwanted artifacts. Leveraging deep contextual features, we derive the proposed context-aware OT using the earth mover's distance and show that the proposed context-OT has a solid theoretical guarantee. Experimental results on a large-scale dataset demonstrate the superiority of the proposed method over several state-of-the-art supervised and unsupervised methods in terms of signal-to-noise ratio, structural similarity index, as well as two downstream tasks. The code is available at https://github.com/Retinal-Research/Contextual-OT.
Submitted 12 September, 2024;
originally announced September 2024.
-
RBAD: A Dataset and Benchmark for Retinal Vessels Branching Angle Detection
Authors:
Hao Wang,
Wenhui Zhu,
Jiayou Qin,
Xin Li,
Oana Dumitrascu,
Xiwen Chen,
Peijie Qiu,
Abolfazl Razi
Abstract:
Retinal image analysis, particularly of the geometrical features of branching points, plays an essential role in diagnosing eye diseases. However, existing methods used for this purpose are often coarse-grained and lack the fine-grained analysis needed for efficient annotation. To mitigate these issues, this paper proposes a novel method for detecting retinal branching angles using a self-configured image processing technique. Additionally, we offer an open-source annotation tool and a benchmark dataset comprising 40 images annotated with retinal branching angles. Our methodology for retinal branching angle detection and calculation is detailed, followed by a benchmark analysis comparing our method with previous approaches. The results indicate that our method is robust under various conditions with high accuracy and efficiency, which offers a valuable instrument for ophthalmic research and clinical applications.
Submitted 16 July, 2024;
originally announced July 2024.
-
DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification
Authors:
Wenhui Zhu,
Xiwen Chen,
Peijie Qiu,
Aristeidis Sotiras,
Abolfazl Razi,
Yalin Wang
Abstract:
Multiple instance learning (MIL) stands as a powerful approach in weakly supervised learning, regularly employed in histological whole slide image (WSI) classification for detecting tumorous lesions. However, existing mainstream MIL methods focus on modeling correlation between instances while overlooking the inherent diversity among instances. Moreover, the few MIL methods that have aimed at diversity modeling empirically show inferior performance at a high computational cost. To bridge this gap, we propose a novel MIL aggregation method based on diverse global representation (DGR-MIL), by modeling diversity among instances through a set of global vectors that serve as a summary of all instances. First, we turn the instance correlation into the similarity between instance embeddings and the predefined global vectors through a cross-attention mechanism. This stems from the fact that similar instance embeddings typically would result in a higher correlation with a certain global vector. Second, we propose two mechanisms to enforce the diversity among the global vectors to be more descriptive of the entire bag: (i) positive instance alignment and (ii) a novel, efficient, and theoretically guaranteed diversification learning paradigm. Specifically, the positive instance alignment module encourages the global vectors to align with the center of positive instances (e.g., instances containing tumors in WSI). To further diversify the global representations, we propose a novel diversification learning paradigm leveraging the determinantal point process. The proposed model outperforms the state-of-the-art MIL aggregation models by a substantial margin on the CAMELYON-16 and the TCGA-lung cancer datasets. The code is available at https://github.com/ChongQingNoSubway/DGR-MIL.
Submitted 3 July, 2024;
originally announced July 2024.
-
SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation
Authors:
Wenhui Zhu,
Xiwen Chen,
Peijie Qiu,
Mohammad Farazi,
Aristeidis Sotiras,
Abolfazl Razi,
Yalin Wang
Abstract:
Since its introduction, UNet has been leading a variety of medical image segmentation tasks. Although numerous follow-up studies have also been dedicated to improving the performance of standard UNet, few have conducted in-depth analyses of the underlying interest pattern of UNet in medical image segmentation. In this paper, we explore the patterns learned in a UNet and observe two important factors that potentially affect its performance: (i) irrelevant features learned due to asymmetric supervision; (ii) feature redundancy in the feature map. To this end, we propose to balance the supervision between encoder and decoder and reduce the redundant information in the UNet. Specifically, we use the feature map that contains the most semantic information (i.e., the last layer of the decoder) to provide additional supervision to other blocks, and we reduce feature redundancy by leveraging feature distillation. The proposed method can be easily integrated into existing UNet architecture in a plug-and-play fashion with negligible computational cost. The experimental results suggest that the proposed method consistently improves the performance of standard UNets on four medical image segmentation datasets. The code is available at https://github.com/ChongQingNoSubway/SelfReg-UNet.
Submitted 21 June, 2024;
originally announced June 2024.
-
AgileFormer: Spatially Agile Transformer UNet for Medical Image Segmentation
Authors:
Peijie Qiu,
Jin Yang,
Sayantan Kumar,
Soumyendu Sekhar Ghosh,
Aristeidis Sotiras
Abstract:
In the past decades, deep neural networks, particularly convolutional neural networks, have achieved state-of-the-art performance in a variety of medical image segmentation tasks. Recently, the introduction of the vision transformer (ViT) has significantly altered the landscape of deep segmentation models. There has been a growing focus on ViTs, driven by their excellent performance and scalability. However, we argue that the current design of the vision transformer-based UNet (ViT-UNet) segmentation models may not effectively handle the heterogeneous appearance (e.g., varying shapes and sizes) of objects of interest in medical image segmentation tasks. To tackle this challenge, we present a structured approach to introduce spatially dynamic components to the ViT-UNet. This adaptation enables the model to effectively capture features of target objects with diverse appearances. This is achieved by three main components: (i) deformable patch embedding; (ii) spatially dynamic multi-head attention; (iii) deformable positional encoding. These components were integrated into a novel architecture, termed AgileFormer. AgileFormer is a spatially agile ViT-UNet designed for medical image segmentation. Experiments in three segmentation tasks using publicly available datasets demonstrated the effectiveness of the proposed method. The code is available at https://github.com/sotiraslab/AgileFormer.
Submitted 16 September, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
-
D-Net: Dynamic Large Kernel with Dynamic Feature Fusion for Volumetric Medical Image Segmentation
Authors:
Jin Yang,
Peijie Qiu,
Yichi Zhang,
Daniel S. Marcus,
Aristeidis Sotiras
Abstract:
Hierarchical transformers have achieved significant success in medical image segmentation due to their large receptive field and capabilities of effectively leveraging global long-range contextual information. Convolutional neural networks (CNNs) can also deliver a large receptive field by using large kernels, enabling them to achieve competitive performance with fewer model parameters. However, CNNs incorporated with large convolutional kernels remain constrained in adaptively capturing multi-scale features from organs with large variations in shape and size due to the employment of fixed-sized kernels. Additionally, they are unable to utilize global contextual information efficiently. To address these limitations, we propose Dynamic Large Kernel (DLK) and Dynamic Feature Fusion (DFF) modules. The DLK module employs multiple large kernels with varying kernel sizes and dilation rates to capture multi-scale features. Subsequently, a dynamic selection mechanism is utilized to adaptively highlight the most important spatial features based on global information. Additionally, the DFF module is proposed to adaptively fuse multi-scale local feature maps based on their global information. We integrate DLK and DFF in a hierarchical transformer architecture to develop a novel architecture, termed D-Net. D-Net is able to effectively utilize a multi-scale large receptive field and adaptively harness global contextual information. Extensive experimental results demonstrate that D-Net outperforms other state-of-the-art models in the two volumetric segmentation tasks, including abdominal multi-organ segmentation and multi-modality brain tumor segmentation. Our code is available at https://github.com/sotiraslab/DLK.
Submitted 16 October, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
nnMobileNet: Rethinking CNN for Retinopathy Research
Authors:
Wenhui Zhu,
Peijie Qiu,
Xiwen Chen,
Xin Li,
Natasha Lepore,
Oana M. Dumitrascu,
Yalin Wang
Abstract:
Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability, that is, their ability to improve as more parameters are added. As a result, ViT-based models tend to outshine traditional CNNs in RD applications, albeit at the cost of increased data and computational demands. ViTs also differ from CNNs in their approach to processing images, working with patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In our study, we revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics. We found that an optimized MobileNet, through selective modifications, can surpass ViT-based models in various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code is available at https://github.com/Retinal-Research/NN-MOBILENET.
Submitted 15 April, 2024; v1 submitted 2 June, 2023;
originally announced June 2023.
-
SC-VAE: Sparse Coding-based Variational Autoencoder with Learned ISTA
Authors:
Pan Xiao,
Peijie Qiu,
Sungmin Ha,
Abdalla Bani,
Shuang Zhou,
Aristeidis Sotiras
Abstract:
Learning rich data representations from unlabeled data is a key challenge towards applying deep learning algorithms in downstream tasks. Several variants of variational autoencoders (VAEs) have been proposed to learn compact data representations by encoding high-dimensional data in a lower dimensional space. Two main classes of VAE methods may be distinguished depending on the characteristics of the meta-priors that are enforced in the representation learning step. The first class of methods derives a continuous encoding by assuming a static prior distribution in the latent space. The second class of methods learns instead a discrete latent representation using vector quantization (VQ) along with a codebook. However, both classes of methods suffer from certain challenges, which may lead to suboptimal image reconstruction results. The first class suffers from posterior collapse, whereas the second class suffers from codebook collapse. To address these challenges, we introduce a new VAE variant, termed sparse coding-based VAE with learned ISTA (SC-VAE), which integrates sparse coding within the variational autoencoder framework. The proposed method learns sparse data representations that consist of a linear combination of a small number of predetermined orthogonal atoms. The sparse coding problem is solved using a learnable version of the iterative shrinkage thresholding algorithm (ISTA). Experiments on two image datasets demonstrate that our model achieves improved image reconstruction results compared to state-of-the-art methods. Moreover, we demonstrate that the use of learned sparse code vectors allows us to perform downstream tasks like image generation and unsupervised image segmentation through clustering image patches.
Submitted 10 January, 2024; v1 submitted 29 March, 2023;
originally announced March 2023.
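The SC-VAE abstract above refers to the iterative shrinkage thresholding algorithm (ISTA). As a rough illustration only, the sketch below shows the classical, non-learned ISTA iteration for the sparse coding objective 0.5 * ||y - D z||^2 + lam * ||z||_1. It is not the learned variant used in the paper; the function names, the pure-Python list representation, and the fixed step size are all assumptions made for this example.

```python
def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal step for t * L1 norm)."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    """Return the transpose of matrix A (list of rows)."""
    return [list(col) for col in zip(*A)]


def ista(D, y, lam, step, n_iter=100):
    """Classical ISTA for 0.5 * ||y - D z||^2 + lam * ||z||_1.

    D is a hypothetical dictionary whose columns are the atoms. The step
    size is assumed to satisfy step <= 1 / L, where L is the largest
    eigenvalue of D^T D (L = 1 for an orthonormal dictionary).
    """
    Dt = transpose(D)
    z = [0.0] * len(D[0])
    for _ in range(n_iter):
        # Gradient of the quadratic term: D^T (D z - y)
        residual = [p - q for p, q in zip(matvec(D, z), y)]
        grad = matvec(Dt, residual)
        # Gradient step followed by soft thresholding
        z = soft_threshold([zi - step * gi for zi, gi in zip(z, grad)], step * lam)
    return z
```

With an orthonormal dictionary and a unit step, the iteration reduces to soft thresholding the projections of `y` onto the atoms, which is why small coefficients are set exactly to zero.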
-
OTRE: Where Optimal Transport Guided Unpaired Image-to-Image Translation Meets Regularization by Enhancing
Authors:
Wenhui Zhu,
Peijie Qiu,
Oana M. Dumitrascu,
Jacob M. Sobczak,
Mohammad Farazi,
Zhangsihao Yang,
Keshav Nandakumar,
Yalin Wang
Abstract:
Non-mydriatic retinal color fundus photography (CFP) is widely available because it does not require pupillary dilation; however, it is prone to poor quality due to operators, systemic imperfections, or patient-related causes. Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses. Herein, we leveraged Optimal Transport (OT) theory to propose an unpaired image-to-image translation scheme for mapping low-quality retinal CFPs to high-quality counterparts. Furthermore, to improve the flexibility, robustness, and applicability of our image enhancement pipeline in clinical practice, we generalized a state-of-the-art model-based image reconstruction method, regularization by denoising, by plugging in priors learned by our OT-guided image-to-image translation network. We named it regularization by enhancing (RE). We validated the integrated framework, OTRE, on three publicly available retinal image datasets by assessing the quality after enhancement and their performance on various downstream tasks, including diabetic retinopathy grading, vessel segmentation, and diabetic lesion segmentation. The experimental results demonstrated the superiority of our proposed framework over some state-of-the-art unsupervised competitors and a state-of-the-art supervised method.
Submitted 8 April, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Optimal Transport Guided Unsupervised Learning for Enhancing low-quality Retinal Images
Authors:
Wenhui Zhu,
Peijie Qiu,
Mohammad Farazi,
Keshav Nandakumar,
Oana M. Dumitrascu,
Yalin Wang
Abstract:
Real-world non-mydriatic retinal fundus photography is prone to artifacts, imperfections, and low quality when certain ocular or systemic co-morbidities exist. Artifacts may result in inaccuracy or ambiguity in clinical diagnoses. In this paper, we proposed a simple but effective end-to-end framework for enhancing poor-quality retinal fundus images. Leveraging optimal transport theory, we proposed an unpaired image-to-image translation scheme for transporting low-quality images to their high-quality counterparts. We theoretically proved that a Generative Adversarial Network (GAN) model with a generator and discriminator is sufficient for this task. Furthermore, to mitigate the inconsistency of information between the low-quality images and their enhancements, an information consistency mechanism was proposed to maximally maintain structural consistency (optical discs, blood vessels, lesions) between the source and enhanced domains. Extensive experiments were conducted on the EyeQ dataset to demonstrate the superiority of our proposed method perceptually and quantitatively.
Submitted 6 February, 2023;
originally announced February 2023.
-
Self-Supervised Equivariant Regularization Reconciles Multiple Instance Learning: Joint Referable Diabetic Retinopathy Classification and Lesion Segmentation
Authors:
Wenhui Zhu,
Peijie Qiu,
Natasha Lepore,
Oana M. Dumitrascu,
Yalin Wang
Abstract:
Lesion appearance is a crucial clue for medical providers to distinguish referable diabetic retinopathy (rDR) from non-referable DR. Most existing large-scale DR datasets contain only image-level labels rather than pixel-based annotations. This motivates us to develop algorithms to classify rDR and segment lesions via image-level labels. This paper leverages self-supervised equivariant learning and attention-based multi-instance learning (MIL) to tackle this problem. MIL is an effective strategy to differentiate positive and negative instances, helping us discard background regions (negative instances) while localizing lesion regions (positive ones). However, MIL only provides coarse lesion localization and cannot distinguish lesions located across adjacent patches. Conversely, a self-supervised equivariant attention mechanism (SEAM) generates a segmentation-level class activation map (CAM) that can guide patch extraction of lesions more accurately. Our work aims at integrating both methods to improve rDR classification accuracy. We conduct extensive validation experiments on the Eyepacs dataset, achieving an area under the receiver operating characteristic curve (AU ROC) of 0.958, outperforming current state-of-the-art algorithms.
Submitted 12 October, 2022;
originally announced October 2022.
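The abstract above relies on attention-based multi-instance learning, where each patch (instance) receives a weight and the bag representation is the weighted sum of instance features. The sketch below illustrates only that pooling step under simplifying assumptions: the attention scores are passed in directly rather than produced by a trained network, and the function name and list-based feature vectors are hypothetical, not taken from the paper.

```python
import math


def attention_pool(instances, scores):
    """Pool instance features into one bag embedding with softmax attention.

    instances: list of feature vectors (one per patch), all the same length.
    scores: one unnormalised attention score per instance (assumed given;
            in a real model a small network would compute these).
    Returns (weights, bag_embedding).
    """
    # Numerically stable softmax over the instance scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of instance features, dimension by dimension
    dim = len(instances[0])
    bag = [sum(w * inst[d] for w, inst in zip(weights, instances)) for d in range(dim)]
    return weights, bag
```

Instances with high weights are the ones treated as likely positive regions, which is how this kind of pooling gives the coarse lesion localization the abstract describes.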
-
CHARLES: Channel-Quality-Adaptive Over-the-Air Federated Learning over Wireless Networks
Authors:
Jiayu Mao,
Haibo Yang,
Peiwen Qiu,
Jia Liu,
Aylin Yener
Abstract:
Over-the-air federated learning (OTA-FL) has emerged as an efficient mechanism that exploits the superposition property of the wireless medium and performs model aggregation for federated learning in the air. OTA-FL is naturally sensitive to wireless channel fading, which could significantly diminish its learning accuracy. To address this challenge, in this paper, we propose an OTA-FL algorithm called CHARLES (channel-quality-aware over-the-air local estimating and scaling). Our CHARLES algorithm performs channel state information (CSI) estimation and adaptive scaling to mitigate the impacts of wireless channel fading. We establish the theoretical convergence rate performance of CHARLES and analyze the impacts of CSI error on the convergence of CHARLES. We show that the adaptive channel inversion scaling scheme in CHARLES is robust under imperfect CSI scenarios. We also demonstrate through numerical results that CHARLES outperforms existing OTA-FL algorithms with heterogeneous data under imperfect CSI.
Submitted 19 May, 2022;
originally announced May 2022.
-
Image watermarking and fusion based on Fourier single-pixel imaging with weighed light source
Authors:
Zhiyuan Ye,
Panghe Qiu,
Haibo Wang,
Jun Xiong,
Kaige Wang
Abstract:
In previous single-pixel imaging systems, the light source was generally idle with respect to time. Here, we propose a novel image fusion and visible watermarking scheme based on Fourier single-pixel imaging (FSPI) with a multiplexed time-varying (TV) signal, which is generated by the watermark pattern hidden in the light source. We call this scheme TV-FSPI. With TV-FSPI, we can realize high-quality visible image watermarking, encrypted image watermarking, and full-color visible image watermarking. We also discuss the extension to invisible watermarking based on TV-FSPI. Furthermore, we don't have to recode illumination patterns, because TV-FSPI can be extended to existing mainstream illumination patterns, such as random illumination mode and Hadamard illumination mode. Thus, TV-FSPI has the potential to be used in single-pixel broadcasting systems and multi-spectral single-pixel imaging systems.
Submitted 10 October, 2019;
originally announced October 2019.