-
BEFUnet: A Hybrid CNN-Transformer Architecture for Precise Medical Image Segmentation
Authors:
Omid Nejati Manzari,
Javad Mirzapour Kaleybar,
Hooman Saadat,
Shahin Maleki
Abstract:
The accurate segmentation of medical images is critical for various healthcare applications. Convolutional neural networks (CNNs), especially Fully Convolutional Networks (FCNs) like U-Net, have shown remarkable success in medical image segmentation tasks. However, they have limitations in capturing global context and long-range relations, especially for objects with significant variations in shape, scale, and texture. While transformers have achieved state-of-the-art results in natural language processing and image recognition, they face challenges in medical image segmentation due to image locality and translational invariance issues. To address these challenges, this paper proposes an innovative U-shaped network called BEFUnet, which enhances the fusion of body and edge information for precise medical image segmentation. BEFUnet comprises three main modules: a novel Local Cross-Attention Feature (LCAF) fusion module, a novel Double-Level Fusion (DLF) module, and a dual-branch encoder. The dual-branch encoder consists of an edge encoder and a body encoder. The edge encoder employs PDC blocks for effective edge information extraction, while the body encoder uses the Swin Transformer to capture semantic information with global attention. The LCAF module efficiently fuses edge and body features by selectively performing local cross-attention on features that are spatially close between the two modalities. This local approach significantly reduces computational complexity compared to global cross-attention while ensuring accurate feature matching. BEFUnet demonstrates superior performance over existing methods across various evaluation metrics on medical image segmentation datasets.
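The idea of local cross-attention described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' LCAF module: the function name `local_cross_attention`, the `window` parameter, and the absence of learned query/key/value projections are all assumptions made for illustration. Each body-feature position acts as a query and attends only to edge features inside a small spatial neighbourhood.

```python
import numpy as np

def local_cross_attention(body, edge, window=3):
    """Toy local cross-attention (hypothetical, no learned projections).

    body, edge: arrays of shape (H, W, C) on the same spatial grid.
    Each body position queries the edge features inside a
    window x window neighbourhood centred on the same location.
    """
    H, W, C = body.shape
    r = window // 2
    out = np.zeros((H, W, C), dtype=float)
    for i in range(H):
        for j in range(W):
            # Clip the neighbourhood at the image border.
            i0, i1 = max(0, i - r), min(H, i + r + 1)
            j0, j1 = max(0, j - r), min(W, j + r + 1)
            keys = edge[i0:i1, j0:j1].reshape(-1, C)
            # Scaled dot-product scores between one query and its local keys.
            scores = keys @ body[i, j] / np.sqrt(C)
            scores = scores - scores.max()  # numerical stability
            weights = np.exp(scores)
            weights /= weights.sum()
            # Attention output: weighted average of the local edge features.
            out[i, j] = weights @ keys
    return out
```

With N = H x W positions, this sketch scores N x window^2 query-key pairs, whereas global cross-attention would score N^2 pairs, which is the source of the complexity reduction the abstract refers to.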
Submitted 13 February, 2024;
originally announced February 2024.
-
Energy-Aware Service Offloading for Semantic Communications in Wireless Networks
Authors:
Hassan Saadat,
Abdullatif Albaseer,
Mohamed Abdallah,
Amr Mohamed,
Aiman Erbad
Abstract:
Today, wireless networks are becoming responsible for serving intelligent applications, such as extended reality and metaverse, holographic telepresence, autonomous transportation, and collaborative robots. Although current fifth-generation (5G) networks can provide high data rates in terms of gigabytes per second, they cannot cope with the high demands of the aforementioned applications, especially in terms of the size of the high-quality live videos and images that need to be communicated in real time. Therefore, with the help of artificial intelligence (AI)-based future sixth-generation (6G) networks, the semantic communication concept can provide the services demanded by these applications. Unlike Shannon's classical information theory, semantic communication urges the use of the semantics (meaningful contents) of the data in designing more efficient data communication schemes. Hence, in this paper, we model semantic communication as an energy minimization framework in heterogeneous wireless networks with respect to delay and quality-of-service constraints. Then, we propose a sub-optimal solution to the NP-hard combinatorial mixed-integer nonlinear programming (MINLP) problem by utilizing efficient techniques such as relaxation of the discrete optimization variables. In addition, an AI-based autoencoder and classifier are trained and deployed to perform semantic extraction, reconstruction, and classification services. Finally, we compare our proposed sub-optimal solution with different state-of-the-art methods, and the obtained results demonstrate its superiority.
Submitted 29 January, 2024;
originally announced January 2024.
-
Capturing Local and Global Features in Medical Images by Using Ensemble CNN-Transformer
Authors:
Javad Mirzapour Kaleybar,
Hooman Saadat,
Hooman Khaloo
Abstract:
This paper introduces a groundbreaking classification model called the Controllable Ensemble Transformer and CNN (CETC) for the analysis of medical images. The CETC model combines the powerful capabilities of convolutional neural networks (CNNs) and transformers to effectively capture both local and global features present in medical images. The model architecture comprises three main components: a convolutional encoder block (CEB), a transposed-convolutional decoder block (TDB), and a transformer classification block (TCB). The CEB is responsible for capturing multi-local features at different scales and draws upon components from VGGNet, ResNet, and MobileNet as backbones. By leveraging this combination, the CEB is able to effectively detect and encode local features. The TDB, on the other hand, consists of sub-decoders that decode and sum the captured features using ensemble coefficients. This enables the model to efficiently integrate the information from multiple scales. Finally, the TCB utilizes the SwT backbone and a specially designed prediction head to capture global features, ensuring a comprehensive understanding of the entire image. The paper provides detailed information on the experimental setup and implementation, including the use of transfer learning, data preprocessing techniques, and training settings. The CETC model is trained and evaluated using two publicly available COVID-19 datasets. Remarkably, the model outperforms existing state-of-the-art models across various evaluation metrics. The experimental results clearly demonstrate the superiority of the CETC model, emphasizing its potential for accurately and efficiently analyzing medical images.
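The "decode and sum using ensemble coefficients" step can be pictured as a weighted sum of feature maps. The sketch below is only an illustration under stated assumptions: the function name `ensemble_sum` is hypothetical, the sub-decoder outputs are assumed to have already been brought to a common shape, and the abstract does not say whether the coefficients are normalized or learned.

```python
import numpy as np

def ensemble_sum(decoded_maps, coeffs):
    """Weighted sum of K decoded feature maps (hypothetical helper).

    decoded_maps: list of K arrays with identical shape (H, W, C).
    coeffs: K ensemble coefficients, one per sub-decoder output.
    """
    stacked = np.stack(decoded_maps)        # shape (K, H, W, C)
    weights = np.asarray(coeffs, dtype=float)
    if weights.shape != (stacked.shape[0],):
        raise ValueError("need exactly one coefficient per feature map")
    # Contract the K axis: sum_k weights[k] * stacked[k]
    return np.tensordot(weights, stacked, axes=1)
```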
Submitted 3 November, 2023;
originally announced November 2023.
-
Object detection-based inspection of power line insulators: Incipient fault detection in the low data-regime
Authors:
Laya Das,
Mohammad Hossein Saadat,
Blazhe Gjorgiev,
Etienne Auger,
Giovanni Sansavini
Abstract:
Deep learning-based object detection is a powerful approach for detecting faulty insulators in power lines. This involves training an object detection model from scratch, or fine-tuning a model that is pre-trained on benchmark computer vision datasets. This approach works well with a large number of insulator images, but can result in unreliable models in the low-data regime. The current literature mainly focuses on detecting the presence or absence of insulator caps, which is a relatively easy detection task, and does not consider detection of finer faults such as flashed and broken disks. In this article, we formulate three object detection tasks for insulator and asset inspection from aerial images, focusing on incipient faults in disks. We curate a large reference dataset of insulator images that can be used to learn robust features for detecting healthy and faulty insulators. We study the advantage of using this dataset in the low target data regime by pre-training on the reference dataset followed by fine-tuning on the target dataset. The results suggest that object detection models can be used to detect faults in insulators at a much more incipient stage, and that transfer learning adds value depending on the type of object detection model. We identify key factors that dictate performance in the low-data regime and outline potential approaches to improve the state-of-the-art.
Submitted 21 December, 2022;
originally announced December 2022.
-
Efficient Real-Time Selective Genome Sequencing on Resource-Constrained Devices
Authors:
Po Jui Shih,
Hassaan Saadat,
Sri Parameswaran,
Hasindu Gamaarachchi
Abstract:
Third-generation nanopore sequencers offer a feature called selective sequencing or 'Read Until' that allows genomic reads to be analyzed in real time and abandoned halfway if they do not belong to a genomic region of 'interest'. This selective sequencing opens the door to important applications such as rapid and low-cost genetic tests. The analysis latency should be as low as possible for selective sequencing to be effective, so that unnecessary reads can be rejected as early as possible. However, existing methods that employ the subsequence Dynamic Time Warping (sDTW) algorithm for this problem are so computationally intensive that a massive workstation with dozens of CPU cores still struggles to keep up with the data rate of a mobile phone-sized MinION sequencer. In this paper, we present Hardware Accelerated Read Until (HARU), a resource-efficient hardware-software co-design-based method that exploits a low-cost and portable heterogeneous MPSoC platform with on-chip FPGA to accelerate the sDTW-based Read Until algorithm. Experimental results show that HARU on a Xilinx FPGA embedded with a 4-core ARM processor is around 2.5X faster than a highly optimized multi-threaded software version (around 85X faster than the existing unoptimized multi-threaded software) running on a sophisticated server with a 36-core Intel Xeon processor for a SARS-CoV-2 dataset. The energy consumption of HARU is two orders of magnitude lower than the same application executing on the 36-core server. Source code for the HARU sDTW module is available as open-source at https://github.com/beebdev/HARU and an example application that utilises HARU is at https://github.com/beebdev/sigfish-haru.
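For readers unfamiliar with sDTW, the following is a minimal pure-Python sketch of the textbook subsequence DTW recurrence. It is not the HARU implementation: the function name `sdtw`, the absolute-difference cost, and the floating-point arithmetic are assumptions chosen for clarity. The key difference from ordinary DTW is that the first row of the cost matrix is zero, so the query may start matching anywhere in the reference.

```python
def sdtw(query, reference):
    """Subsequence DTW: best alignment of `query` to any span of `reference`.

    Returns (cost, end_index), where end_index is the 0-based position in
    `reference` at which the best-matching span ends.
    Assumes both sequences are non-empty lists of numbers.
    """
    m, n = len(query), len(reference)
    inf = float("inf")
    # Row 0 is all zeros: the alignment may start at any reference position.
    prev = [0.0] * (n + 1)
    for i in range(1, m + 1):
        # Column 0 is infinite: a non-empty query prefix needs some reference.
        curr = [inf] * (n + 1)
        for j in range(1, n + 1):
            cost = abs(query[i - 1] - reference[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    # The alignment may also end anywhere: take the minimum of the last row.
    best_j = min(range(1, n + 1), key=prev.__getitem__)
    return prev[best_j], best_j - 1
```

In a Read Until setting, a read would typically be kept or rejected by comparing the returned cost against a threshold; that threshold is application-specific and is not given in the abstract. The double loop costs O(m x n) per read, which is the kind of workload the abstract describes offloading to an FPGA.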
Submitted 14 November, 2022;
originally announced November 2022.
-
ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference
Authors:
Jing Gong,
Hassaan Saadat,
Hasindu Gamaarachchi,
Haris Javaid,
Xiaobo Sharon Hu,
Sri Parameswaran
Abstract:
Edge training of Deep Neural Networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required by training. Hardware approximate multipliers have shown their effectiveness for gaining resource-efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource-efficient accelerators with approximate multipliers supporting DNN training, a thorough evaluation of training convergence and accuracy for different DNN architectures and different approximate multipliers is needed. This paper presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires only a high-level description of a DNN architecture along with C/C++ functional models of the approximate multiplier. We improve the speed of the simulation at the multiplier level by using a novel LUT-based approximate floating-point (FP) multiplier simulator on GPU (AMSim). ApproxTrain leverages CUDA and efficiently integrates AMSim into the TensorFlow library, in order to overcome the absence of native hardware approximate multipliers in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy of DNN training with approximate multipliers for small and large datasets (including ImageNet) using LeNet and ResNet architectures. The evaluations demonstrate similar convergence behavior and negligible change in test accuracy compared to FP32 and bfloat16 multipliers. Compared to CPU-based approximate multiplier simulations in training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster. Based on highly optimized closed-source cuDNN/cuBLAS libraries with native hardware multipliers, the original TensorFlow is only 8x faster than ApproxTrain.
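The general idea of a LUT-based approximate FP multiplier simulator can be sketched in a few lines of Python. This is a simplified illustration, not AMSim: the 4-bit mantissa width, the names `MANT_BITS`, `approx_mantissa_product`, `LUT`, and `approx_multiply`, and the toy functional model (which drops the cross term of the mantissa product) are all assumptions. The functional model is evaluated once for every pair of quantized mantissas; each multiplication then becomes a table lookup plus exact sign and exponent handling.

```python
import math

MANT_BITS = 4              # assumed mantissa width for this toy example
SIZE = 1 << MANT_BITS

def approx_mantissa_product(fa, fb):
    """Hypothetical functional model: (1+fa)(1+fb) ~ 1 + fa + fb."""
    return 1.0 + fa + fb   # the fa*fb cross term is dropped on purpose

# Precompute the model once for every pair of quantized mantissa fractions.
LUT = [[approx_mantissa_product(i / SIZE, j / SIZE) for j in range(SIZE)]
       for i in range(SIZE)]

def _split(x):
    """Return (lut_index, exponent) with |x| = (1 + frac) * 2**exponent."""
    m, e = math.frexp(abs(x))          # m in [0.5, 1)
    frac = 2.0 * m - 1.0               # fraction of the 1.f significand
    return int(frac * SIZE), e - 1     # truncate the fraction to MANT_BITS

def approx_multiply(x, y):
    """Multiply via LUT lookup; sign and exponent are handled exactly."""
    if x == 0.0 or y == 0.0:
        return 0.0
    ix, ex = _split(x)
    iy, ey = _split(y)
    sign = -1.0 if (x < 0) != (y < 0) else 1.0
    return sign * math.ldexp(LUT[ix][iy], ex + ey)
```

For example, `approx_multiply(1.5, 1.5)` returns 2.0 rather than the exact 2.25, showing the error introduced by the toy model, while powers of two are multiplied exactly.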
Submitted 23 September, 2022; v1 submitted 9 September, 2022;
originally announced September 2022.