Capturing Local and Global Features in Medical Images by Using Ensemble CNN-Transformer

Kaleybar, Javad Mirzapour; Saadat, Hooman; Khaloo, Hooman

arXiv:2311.01731 (eess)

COVID-19 e-print

Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.

[Submitted on 3 Nov 2023]

Abstract: This paper introduces the Controllable Ensemble Transformer and CNN (CETC), a classification model for medical image analysis. CETC combines convolutional neural networks (CNNs) and transformers to capture both the local and the global features present in medical images. The architecture comprises three main components: a convolutional encoder block (CEB), a transposed-convolutional decoder block (TDB), and a transformer classification block (TCB). The CEB captures multi-local features at different scales, drawing on components from VGGNet, ResNet, and MobileNet as backbones. The TDB consists of sub-decoders that decode the captured features and sum them using ensemble coefficients, which lets the model integrate information from multiple scales. Finally, the TCB uses the SwT backbone and a specially designed prediction head to capture global features across the entire image. The paper details the experimental setup and implementation, including transfer learning, data preprocessing, and training settings. CETC is trained and evaluated on two publicly available COVID-19 datasets, where the authors report that it outperforms existing state-of-the-art models across various evaluation metrics, indicating its potential for accurate and efficient medical image analysis.
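
The abstract says the TDB sums the decoded features "using ensemble coefficients" but does not say how the coefficients are chosen or what shape the decoded maps have. As a rough illustration only, the sketch below treats each sub-decoder output as a small single-channel map and forms a weighted element-wise sum. The function name ensemble_sum, the 2x2 maps, and the coefficient values are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence

# A single-channel 2D feature map, stored as rows of floats (an assumption
# for illustration; real decoder outputs would be multi-channel tensors).
FeatureMap = List[List[float]]


def ensemble_sum(decoded: Sequence[FeatureMap], coeffs: Sequence[float]) -> FeatureMap:
    """Weighted element-wise sum of same-sized feature maps."""
    if not decoded:
        raise ValueError("need at least one decoded map")
    if len(decoded) != len(coeffs):
        raise ValueError("need one coefficient per decoded map")
    rows, cols = len(decoded[0]), len(decoded[0][0])
    for fmap in decoded:
        if len(fmap) != rows or any(len(row) != cols for row in fmap):
            raise ValueError("all decoded maps must share one shape")
    return [
        [sum(c * fmap[i][j] for c, fmap in zip(coeffs, decoded)) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical sub-decoder outputs, one per backbone family named in the
# abstract, assumed to be already upsampled to a common 2x2 resolution.
vgg_like = [[1.0, 2.0], [3.0, 4.0]]
resnet_like = [[0.0, 2.0], [4.0, 6.0]]
mobilenet_like = [[2.0, 2.0], [2.0, 2.0]]

# Hypothetical ensemble coefficients; the paper may set or learn them differently.
fused = ensemble_sum([vgg_like, resnet_like, mobilenet_like], [0.5, 0.25, 0.25])
print(fused)  # [[1.0, 2.0], [3.0, 4.0]]
```

Changing a coefficient shifts how much each backbone's decoded features contribute to the fused map, which is one plausible reading of "controllable" in the model's name; the paper itself should be consulted for the actual mechanism.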

Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2311.01731 [eess.IV]
	(or arXiv:2311.01731v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2311.01731

Submission history

From: Javad Mirzapour Kaleybar
[v1] Fri, 3 Nov 2023 05:55:28 UTC (870 KB)
