-
SDRNET: Stacked Deep Residual Network for Accurate Semantic Segmentation of Fine-Resolution Remotely Sensed Images
Authors:
Naftaly Wambugu,
Ruisheng Wang,
Bo Guo,
Tianshu Yu,
Sheng Xu,
Mohammed Elhassan
Abstract:
Land cover maps generated from semantic segmentation of high-resolution remotely sensed images have drawn mucon in the photogrammetry and remote sensing research community. Currently, massive fine-resolution remotely sensed (FRRS) images acquired by improving sensing and imaging technologies become available. However, accurate semantic segmentation of such FRRS images is greatly affected by substa…
▽ More
Land cover maps generated from semantic segmentation of high-resolution remotely sensed images have drawn mucon in the photogrammetry and remote sensing research community. Currently, massive fine-resolution remotely sensed (FRRS) images acquired by improving sensing and imaging technologies become available. However, accurate semantic segmentation of such FRRS images is greatly affected by substantial class disparities, the invisibility of key ground objects due to occlusion, and object size variation. Despite the extraordinary potential in deep convolutional neural networks (DCNNs) in image feature learning and representation, extracting sufficient features from FRRS images for accurate semantic segmentation is still challenging. These challenges demand the deep learning models to learn robust features and generate sufficient feature descriptors. Specifically, learning multi-contextual features to guarantee adequate coverage of varied object sizes from the ground scene and harnessing global-local contexts to overcome class disparities challenge even profound networks. Deeper networks significantly lose spatial details due to gradual downsampling processes resulting in poor segmentation results and coarse boundaries. This article presents a stacked deep residual network (SDRNet) for semantic segmentation from FRRS images. The proposed framework utilizes two stacked encoder-decoder networks to harness long-range semantics yet preserve spatial information and dilated residual blocks (DRB) between each encoder and decoder network to capture sufficient global dependencies thus improving segmentation performance. Our experimental results obtained using the ISPRS Vaihingen and Potsdam datasets demonstrate that the SDRNet performs effectively and competitively against current DCNNs in semantic segmentation.
△ Less
Submitted 27 June, 2025;
originally announced June 2025.
-
Optimizing Wireless Networks with Deep Unfolding: Comparative Study on Two Deep Unfolding Mechanisms
Authors:
Abuzar B. M. Adam,
Mohammed A. M. Elhassan,
Elhadj Moustapha Diallo
Abstract:
In this work, we conduct a comparative study on two deep unfolding mechanisms to efficiently perform power control in the next generation wireless networks. The power control problem is formulated as energy efficiency over multiple interference links. The problem is nonconvex. We employ fractional programming transformation to design two solutions for the problem. The first solution is a numerical…
▽ More
In this work, we conduct a comparative study on two deep unfolding mechanisms to efficiently perform power control in the next generation wireless networks. The power control problem is formulated as energy efficiency over multiple interference links. The problem is nonconvex. We employ fractional programming transformation to design two solutions for the problem. The first solution is a numerical solution while the second solution is a closed-form solution. Based on the first solution, we design a semi-unfolding deep learning model where we combine the domain knowledge of the wireless communications and the recent advances in the data-driven deep learning. Moreover, on the highlights of the closed-form solution, fully deep unfolded deep learning model is designed in which we fully leveraged the expressive closed-form power control solution and deep learning advances. In the simulation results, we compare the performance of the proposed deep learning models and the iterative solutions in terms of accuracy and inference speed to show their suitability for the real-time application in next generation networks.
△ Less
Submitted 17 February, 2024;
originally announced March 2024.
-
Real-Time and Security-Aware Precoding in RIS-Empowered Multi-User Wireless Networks
Authors:
Abuzar B. M. Adam,
Mohamed Amine Ouamri,
Mohammed Saleh Ali Muthanna,
Xingwang Li,
Mohammed A. M. Elhassan,
Ammar Muthanna
Abstract:
In this letter, we propose a deep-unfolding-based framework (DUNet) to maximize the secrecy rate in reconfigurable intelligent surface (RIS) empowered multi-user wireless networks. To tailor DUNet, first we relax the problem, decouple it into beamforming and phase shift subproblems, and propose an alternative optimization (AO) based solution for the relaxed problem. Second, we apply Karush-Kuhn-Tu…
▽ More
In this letter, we propose a deep-unfolding-based framework (DUNet) to maximize the secrecy rate in reconfigurable intelligent surface (RIS) empowered multi-user wireless networks. To tailor DUNet, first we relax the problem, decouple it into beamforming and phase shift subproblems, and propose an alternative optimization (AO) based solution for the relaxed problem. Second, we apply Karush-Kuhn-Tucker (KKT) conditions to obtain a closed-form solutions for the beamforming and the phase shift. Using deep-unfolding mechanism, we transform the closed-form solutions into a deep learning model (i.e., DUNet) that achieves a comparable performance to that of AO in terms of accuracy and about 25.6 times faster.
△ Less
Submitted 30 December, 2023;
originally announced January 2024.
-
P2AT: Pyramid Pooling Axial Transformer for Real-time Semantic Segmentation
Authors:
Mohammed A. M. Elhassan,
Changjun Zhou,
Amina Benabid,
Abuzar B. M. Adam
Abstract:
Recently, Transformer-based models have achieved promising results in various vision tasks, due to their ability to model long-range dependencies. However, transformers are computationally expensive, which limits their applications in real-time tasks such as autonomous driving. In addition, an efficient local and global feature selection and fusion are vital for accurate dense prediction, especial…
▽ More
Recently, Transformer-based models have achieved promising results in various vision tasks, due to their ability to model long-range dependencies. However, transformers are computationally expensive, which limits their applications in real-time tasks such as autonomous driving. In addition, an efficient local and global feature selection and fusion are vital for accurate dense prediction, especially driving scene understanding tasks. In this paper, we propose a real-time semantic segmentation architecture named Pyramid Pooling Axial Transformer (P2AT). The proposed P2AT takes a coarse feature from the CNN encoder to produce scale-aware contextual features, which are then combined with the multi-level feature aggregation scheme to produce enhanced contextual features. Specifically, we introduce a pyramid pooling axial transformer to capture intricate spatial and channel dependencies, leading to improved performance on semantic segmentation. Then, we design a Bidirectional Fusion module (BiF) to combine semantic information at different levels. Meanwhile, a Global Context Enhancer is introduced to compensate for the inadequacy of concatenating different semantic levels. Finally, a decoder block is proposed to help maintain a larger receptive field. We evaluate P2AT variants on three challenging scene-understanding datasets. In particular, our P2AT variants achieve state-of-art results on the Camvid dataset 80.5%, 81.0%, 81.1% for P2AT-S, P2ATM, and P2AT-L, respectively. Furthermore, our experiment on Cityscapes and Pascal VOC 2012 have demonstrated the efficiency of the proposed architecture, with results showing that P2AT-M, achieves 78.7% on Cityscapes. The source code will be available at
△ Less
Submitted 23 October, 2023;
originally announced October 2023.
-
Enhancing Secrecy in UAV RSMA Networks: Deep Unfolding Meets Deep Reinforcement Learning
Authors:
Abuzar B. M. Adam,
Mohammed A. M. Elhassan
Abstract:
In this paper, we consider the maximization of the secrecy rate in multiple unmanned aerial vehicles (UAV) rate-splitting multiple access (RSMA) network. A joint beamforming, rate allocation, and UAV trajectory optimization problem is formulated which is nonconvex. Hence, the problem is transformed into a Markov decision problem and a novel multiagent deep reinforcement learning (DRL) framework is…
▽ More
In this paper, we consider the maximization of the secrecy rate in multiple unmanned aerial vehicles (UAV) rate-splitting multiple access (RSMA) network. A joint beamforming, rate allocation, and UAV trajectory optimization problem is formulated which is nonconvex. Hence, the problem is transformed into a Markov decision problem and a novel multiagent deep reinforcement learning (DRL) framework is designed. The proposed framework (named DUN-DRL) combines deep unfolding to design beamforming and rate allocation, data-driven to design the UAV trajectory, and deep deterministic policy gradient (DDPG) for the learning procedure. The proposed DUN-DRL have shown great performance and outperformed other DRL-based methods in the literature.
△ Less
Submitted 30 September, 2023;
originally announced October 2023.
-
S$^2$-FPN: Scale-ware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation
Authors:
Mohammed A. M. Elhassan,
Chenhui Yang,
Chenxi Huang,
Tewodros Legesse Munea,
Xin Hong,
Abuzar B. M. Adam,
Amina Benabid
Abstract:
Modern high-performance semantic segmentation methods employ a heavy backbone and dilated convolution to extract the relevant feature. Although extracting features with both contextual and semantic information is critical for the segmentation tasks, it brings a memory footprint and high computation cost for real-time applications. This paper presents a new model to achieve a trade-off between accu…
▽ More
Modern high-performance semantic segmentation methods employ a heavy backbone and dilated convolution to extract the relevant feature. Although extracting features with both contextual and semantic information is critical for the segmentation tasks, it brings a memory footprint and high computation cost for real-time applications. This paper presents a new model to achieve a trade-off between accuracy/speed for real-time road scene semantic segmentation. Specifically, we proposed a lightweight model named Scale-aware Strip Attention Guided Feature Pyramid Network (S$^2$-FPN). Our network consists of three main modules: Attention Pyramid Fusion (APF) module, Scale-aware Strip Attention Module (SSAM), and Global Feature Upsample (GFU) module. APF adopts an attention mechanisms to learn discriminative multi-scale features and help close the semantic gap between different levels. APF uses the scale-aware attention to encode global context with vertical stripping operation and models the long-range dependencies, which helps relate pixels with similar semantic label. In addition, APF employs channel-wise reweighting block (CRB) to emphasize the channel features. Finally, the decoder of S$^2$-FPN then adopts GFU, which is used to fuse features from APF and the encoder. Extensive experiments have been conducted on two challenging semantic segmentation benchmarks, which demonstrate that our approach achieves better accuracy/speed trade-off with different model settings. The proposed models have achieved a results of 76.2\%mIoU/87.3FPS, 77.4\%mIoU/67FPS, and 77.8\%mIoU/30.5FPS on Cityscapes dataset, and 69.6\%mIoU,71.0\% mIoU, and 74.2\% mIoU on Camvid dataset. The code for this work will be made available at \url{https://github.com/mohamedac29/S2-FPN
△ Less
Submitted 18 May, 2024; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Technical Report on Subspace Pyramid Fusion Network for Semantic Segmentation
Authors:
Mohammed A. M. Elhassan,
Chenhui Yang,
Chenxi Huang,
Tewodros Legesse Munea
Abstract:
The following is a technical report to test the validity of the proposed Subspace Pyramid Fusion Module (SPFM) to capture multi-scale feature representations, which is more useful for semantic segmentation. In this investigation, we have proposed the Efficient Shuffle Attention Module(ESAM) to reconstruct the skip-connections paths by fusing multi-level global context features. Experimental result…
▽ More
The following is a technical report to test the validity of the proposed Subspace Pyramid Fusion Module (SPFM) to capture multi-scale feature representations, which is more useful for semantic segmentation. In this investigation, we have proposed the Efficient Shuffle Attention Module(ESAM) to reconstruct the skip-connections paths by fusing multi-level global context features. Experimental results on two well-known semantic segmentation datasets, including Camvid and Cityscapes, show the effectiveness of our proposed method.
△ Less
Submitted 6 December, 2023; v1 submitted 4 April, 2022;
originally announced April 2022.