FocusNet: Transformer-enhanced Polyp Segmentation with Local and Pooling Attention

Zeng, Jun; Santosh, KC; Nayak, Deepak Rajan; de Lange, Thomas; Varkey, Jonas; Berzin, Tyler; Jha, Debesh

arXiv:2504.13597 (eess)

[Submitted on 18 Apr 2025]

Abstract: Colonoscopy is vital for the early diagnosis of colorectal polyps, and regular screening can effectively prevent benign polyps from progressing to colorectal cancer (CRC). While deep learning has made impressive strides in polyp segmentation, most existing models are trained on single-modality, single-center data, making them less effective in real-world clinical environments. To overcome these limitations, we propose FocusNet, a Transformer-enhanced focus attention network designed to improve polyp segmentation. FocusNet incorporates three essential modules: the Cross-semantic Interaction Decoder Module (CIDM), which generates coarse segmentation maps; the Detail Enhancement Module (DEM), which refines shallow features; and the Focus Attention Module (FAM), which balances local detail and global context through local and pooling attention mechanisms. We evaluate our model on PolypDB, a newly introduced multi-modality, multi-center dataset for building more reliable segmentation methods. Extensive experiments show that FocusNet consistently outperforms existing state-of-the-art approaches, achieving high Dice coefficients of 82.47% on the BLI modality, 88.46% on FICE, 92.04% on LCI, 82.09% on NBI, and 93.42% on WLI, demonstrating its accuracy and robustness across five different modalities. The source code for FocusNet is available at this https URL.
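
For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of the standard Dice coefficient for binary masks, 2|A ∩ B| / (|A| + |B|). It is not taken from the FocusNet source code: the function name `dice_coefficient`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for illustration, and the abstract does not state how scores are averaged across images.

```python
def dice_coefficient(pred, target):
    """Dice coefficient between two binary masks (nested lists of 0/1).

    Illustrative helper only; not the authors' evaluation code.
    """
    # Flatten both masks and coerce every pixel to 0 or 1.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")

    # |A ∩ B|: pixels marked as polyp in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    # |A| + |B|: total polyp pixels across the two masks.
    total = sum(flat_pred) + sum(flat_target)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3) = 0.666...
```

A score such as 93.42% corresponds to a value of 0.9342 on this scale, where 1.0 means the predicted and ground-truth masks coincide exactly.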

Comments:	9 pages, 6 figures
Subjects:	Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2504.13597 [eess.IV]
	(or arXiv:2504.13597v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2504.13597

Submission history

From: Jun Zeng
[v1] Fri, 18 Apr 2025 09:59:26 UTC (4,021 KB)
