Search | arXiv e-print repository

Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement

Authors: Haofan Wu, Yin Huang, Yuqing Wu, Qiuyu Yang, Bingfang Wang, Li Zhang, Muhammad Fahadullah Khan, Ali Zia, M. Saleh Memon, Syed Sohail Bukhari, Abdul Fattah Memon, Daizong Ji, Ya Zhang, Ghulam Mustafa, Yin Fang

Abstract: High-quality fundus images provide essential anatomical information for clinical screening and ophthalmic disease diagnosis. Yet, due to hardware limitations, operational variability, and patient compliance, fundus images often suffer from low resolution and low signal-to-noise ratio. Recent years have witnessed promising progress in fundus image enhancement. However, existing works usually focus on restoring structural details or global characteristics of fundus images, lacking a unified image enhancement framework to recover comprehensive multi-scale information. Moreover, few methods pinpoint the target of image enhancement, e.g., lesions, which is crucial for medical image-based diagnosis. To address these challenges, we propose a multi-scale target-aware representation learning framework (MTRL-FIE) for efficient fundus image enhancement. Specifically, we propose a multi-scale feature encoder (MFE) that employs wavelet decomposition to embed both low-frequency structural information and high-frequency details. Next, we design a structure-preserving hierarchical decoder (SHD) to fuse multi-scale feature embeddings for real fundus image restoration. SHD integrates hierarchical fusion and group attention mechanisms to achieve adaptive feature fusion while retaining local structural smoothness. Meanwhile, a target-aware feature aggregation (TFA) module is used to enhance pathological regions and reduce artifacts. Experimental results on multiple fundus image datasets demonstrate the effectiveness and generalizability of MTRL-FIE for fundus image enhancement. Compared to state-of-the-art methods, MTRL-FIE achieves superior enhancement performance with a more lightweight architecture. Furthermore, our approach generalizes to other ophthalmic image processing tasks without supervised fine-tuning, highlighting its potential for clinical applications.

Submitted 3 May, 2025; originally announced May 2025.

Comments: Under review at Neural Networks

arXiv:2309.00005 [pdf, other]

High Spectral Spatial Resolution Synthetic HyperSpectral Dataset from multi-source fusion

Authors: Yajie Sun, Ali Zia, Jun Zhou

Abstract: This research paper introduces a synthetic hyperspectral dataset that combines high spectral and spatial resolution imaging to achieve a comprehensive, accurate, and detailed representation of observed scenes or objects. Obtaining such desirable qualities is challenging when relying on a single camera. The proposed dataset addresses this limitation by leveraging three modalities: RGB, push-broom visible hyperspectral camera, and snapshot infrared hyperspectral camera, each offering distinct spatial and spectral resolutions. Different camera systems exhibit varying photometric properties, resulting in a trade-off between spatial and spectral resolution. RGB cameras typically offer high spatial resolution but limited spectral resolution, while hyperspectral cameras possess high spectral resolution at the expense of spatial resolution. Moreover, hyperspectral cameras themselves employ different capturing techniques and spectral ranges, further complicating the acquisition of comprehensive data. By integrating the photometric properties of these modalities, a single synthetic hyperspectral image can be generated, facilitating the exploration of broader spectral-spatial relationships for improved analysis, monitoring, and decision-making across various fields. This paper emphasizes the importance of multi-modal fusion in producing a high-quality synthetic hyperspectral dataset with consistent spectral intervals between bands.

Submitted 25 June, 2023; originally announced September 2023.

Comments: IJCNN workshop on Multimodal Synthetic Data for Deep Neural Networks (MSynD), 2023

arXiv:2103.00286 [pdf, other]

A Novel Adaptive Deep Network for Building Footprint Segmentation

Authors: A. Ziaee, R. Dehbozorgi, M. Döller

Abstract: Building footprint segmentations for high-resolution images are increasingly in demand for many remote sensing applications. With emerging deep learning approaches, segmentation networks have made significant advances in the semantic segmentation of objects. However, these advances and the increased access to satellite images require the generation of accurate object boundaries in satellite images. In the current paper, we propose a novel network based on the Pix2Pix methodology to solve the problem of inaccurate boundaries obtained by converting satellite images into maps using segmentation networks in order to segment building footprints. To define the new network, named G2G, our framework includes two generators, where the first generator extracts localization features in order to merge them with the boundary features extracted from the second generator to segment all detailed building edges. Moreover, different strategies are implemented to enhance the quality of the proposed network's results, and the proposed network outperforms state-of-the-art networks in segmentation accuracy by a large margin on all evaluation metrics. The implementation is available at https://github.com/A2Amir/A-Novel-Adaptive-Deep-Network-for-Building-Footprint-Segmentation.

Submitted 27 February, 2021; originally announced March 2021.

Comments: Deep Learning Semantic Segmentation, Building Footprint Segmentation, Conditional Generative Adversarial Networks (CGANs), Pix2Pix Network

arXiv:1907.02060 [pdf, ps, other]

Novel evaluation of surgical activity recognition models using task-based efficiency metrics

Authors: Aneeq Zia, Liheng Guo, Linlin Zhou, Irfan Essa, Anthony Jarc

Abstract: Purpose: Surgical task-based metrics (rather than entire procedure metrics) can be used to improve surgeon training and, ultimately, patient care through focused training interventions. Machine learning models to automatically recognize individual tasks or activities are needed to overcome the otherwise manual effort of video review. Traditionally, these models have been evaluated using frame-level accuracy. Here, we propose evaluating surgical activity recognition models by their effect on task-based efficiency metrics. In this way, we can determine when models have achieved adequate performance for providing surgeon feedback via metrics from individual tasks. Methods: We propose a new CNN-LSTM model, RP-Net-V2, to recognize the 12 steps of robotic-assisted radical prostatectomies (RARP). We evaluated our model both in terms of conventional methods (e.g. Jaccard Index, task boundary accuracy) as well as novel ways, such as the accuracy of efficiency metrics computed from instrument movements and system events. Results: Our proposed model achieves a Jaccard Index of 0.85 thereby outperforming previous models on robotic-assisted radical prostatectomies. Additionally, we show that metrics computed from tasks automatically identified using RP-Net-V2 correlate well with metrics from tasks labeled by clinical experts. Conclusions: We demonstrate that metrics-based evaluation of surgical activity recognition models is a viable approach to determine when models can be used to quantify surgical efficiencies. We believe this approach and our results illustrate the potential for fully automated, post-operative efficiency reports.

Submitted 3 July, 2019; originally announced July 2019.

Journal ref: International Journal of Computer Assisted Radiology and Surgery (IJCARS) 2019

Showing 1–4 of 4 results for author: Zia, A