U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration?

Jia, Xi; Bartlett, Joseph; Zhang, Tianyang; Lu, Wenqi; Qiu, Zhaowen; Duan, Jinming


arXiv:2208.04939 (eess)

[Submitted on 7 Aug 2022 (v1), last revised 13 Aug 2022 (this version, v2)]

Abstract: Due to their strong long-range modeling capability, vision transformer-based networks have become increasingly popular in deformable image registration. We believe, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate deformations without needing long-range dependencies. The purpose of this study is therefore to investigate whether U-Net-based methods are outdated compared to modern transformer-based approaches when applied to medical image registration. For this, we propose a large kernel U-Net (LKU-Net) by embedding a parallel convolutional block into a vanilla U-Net in order to enhance the effective receptive field. On the public 3D IXI brain dataset for atlas-based registration, we show that the performance of the vanilla U-Net is already comparable with that of state-of-the-art transformer-based networks (such as TransMorph), and that the proposed LKU-Net outperforms TransMorph while using only 1.12% of its parameters and 10.8% of its multiply-add operations. We further evaluate LKU-Net on a MICCAI Learn2Reg 2021 challenge dataset for inter-subject registration. LKU-Net also outperforms TransMorph on this dataset and ranks first on the public leaderboard as of the submission of this work. With only modest modifications to the vanilla U-Net, we show that U-Net can outperform transformer-based architectures on inter-subject and atlas-based 3D medical image registration. Code is available at this https URL.
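The receptive-field argument in the abstract can be illustrated with a small calculation. The sketch below computes the theoretical receptive field (along one axis) of a chain of convolutions, then compares a 3-wide kernel with a 5-wide kernel at each resolution level. The encoder layout is an assumption made for illustration only (five levels, one stride-1 convolution per level, a stride-2 kernel-3 convolution between levels); it is not taken from the paper, and the helper names `receptive_field` and `hypothetical_encoder` are likewise invented here. The theoretical value is only an upper bound on the effective receptive field that the abstract refers to.

```python
def receptive_field(layers):
    """Theoretical receptive field along one axis of a chain of conv layers.

    layers: iterable of (kernel_size, stride) pairs, ordered input to output.
    """
    rf = 1    # a single output voxel sees one input voxel before any layer
    jump = 1  # spacing, in input voxels, between adjacent feature positions
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def hypothetical_encoder(kernel, levels=5):
    """Assumed layout (not the paper's): one stride-1 conv with the given
    kernel per level, and a stride-2 conv with kernel 3 between levels."""
    layers = []
    for level in range(levels):
        layers.append((kernel, 1))
        if level < levels - 1:
            layers.append((3, 2))  # downsampling convolution
    return layers


if __name__ == "__main__":
    for k in (3, 5):
        rf = receptive_field(hypothetical_encoder(k))
        print(f"kernel {k}: theoretical receptive field = {rf} voxels per axis")
```

Under these assumptions, widening only the stride-1 kernels enlarges the theoretical receptive field substantially, because each extra kernel tap at a deep level is multiplied by the accumulated stride. For a parallel block whose branches are summed, the widest branch determines the theoretical value.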

Comments:	Accepted to MICCAI-MLMI 2022
Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2208.04939 [eess.IV]
	(or arXiv:2208.04939v2 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2208.04939

Submission history

From: Jinming Duan
[v1] Sun, 7 Aug 2022 20:33:53 UTC (4,588 KB)
[v2] Sat, 13 Aug 2022 14:12:32 UTC (1,436 KB)
