S$^2$-FPN: Scale-ware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation

Elhassan, Mohammed A. M.; Yang, Chenhui; Huang, Chenxi; Munea, Tewodros Legesse; Hong, Xin; Adam, Abuzar B. M.; Benabid, Amina

Computer Science > Computer Vision and Pattern Recognition

arXiv:2206.07298 (cs)

[Submitted on 15 Jun 2022 (v1), last revised 19 May 2024 (this version, v3)]

Title:S$^2$-FPN: Scale-ware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation

Authors:Mohammed A. M. Elhassan, Chenhui Yang, Chenxi Huang, Tewodros Legesse Munea, Xin Hong, Abuzar B. M. Adam, Amina Benabid

View PDF HTML (experimental)

Abstract:Modern high-performance semantic segmentation methods employ a heavy backbone and dilated convolution to extract the relevant feature. Although extracting features with both contextual and semantic information is critical for the segmentation tasks, it brings a memory footprint and high computation cost for real-time applications. This paper presents a new model to achieve a trade-off between accuracy/speed for real-time road scene semantic segmentation. Specifically, we proposed a lightweight model named Scale-aware Strip Attention Guided Feature Pyramid Network (S$^2$-FPN). Our network consists of three main modules: Attention Pyramid Fusion (APF) module, Scale-aware Strip Attention Module (SSAM), and Global Feature Upsample (GFU) module. APF adopts an attention mechanisms to learn discriminative multi-scale features and help close the semantic gap between different levels. APF uses the scale-aware attention to encode global context with vertical stripping operation and models the long-range dependencies, which helps relate pixels with similar semantic label. In addition, APF employs channel-wise reweighting block (CRB) to emphasize the channel features. Finally, the decoder of S$^2$-FPN then adopts GFU, which is used to fuse features from APF and the encoder. Extensive experiments have been conducted on two challenging semantic segmentation benchmarks, which demonstrate that our approach achieves better accuracy/speed trade-off with different model settings. The proposed models have achieved a results of 76.2\%mIoU/87.3FPS, 77.4\%mIoU/67FPS, and 77.8\%mIoU/30.5FPS on Cityscapes dataset, and 69.6\%mIoU,71.0\% mIoU, and 74.2\% mIoU on Camvid dataset. The code for this work will be made available at \url{this https URL

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2206.07298 [cs.CV]
	(or arXiv:2206.07298v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2206.07298

Submission history

From: Mohammed A. M. Elhassan [view email]
[v1] Wed, 15 Jun 2022 05:02:49 UTC (24,461 KB)
[v2] Thu, 16 Jun 2022 05:42:21 UTC (24,461 KB)
[v3] Sun, 19 May 2024 00:31:40 UTC (24,487 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:S$^2$-FPN: Scale-ware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:S$^2$-FPN: Scale-ware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators