HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-Training

Tang, Fenghe; Xu, Ronghao; Yao, Qingsong; Fu, Xueming; Quan, Quan; Zhu, Heqin; Liu, Zaiyi; Zhou, S. Kevin

arXiv:2408.05815 (cs)

[Submitted on 11 Aug 2024]

Abstract: Generative self-supervised learning exhibits remarkable representation learning capabilities. However, limited attention has been paid to end-to-end pre-training methods for hybrid CNN-Transformer architectures, which can learn strong local and global representations simultaneously. To address this issue, we propose a generative pre-training strategy called Hybrid Sparse masKing (HySparK), based on masked image modeling, and apply it to large-scale pre-training on medical images. First, we perform a bottom-up 3D hybrid masking strategy on the encoder to keep the masking consistent; we use sparse convolution for the top CNNs and encode unmasked patches for the bottom vision Transformers. Second, we employ a simple hierarchical decoder with skip connections to achieve dense multi-scale feature reconstruction. Third, we implement our pre-training method on a collection of multiple large-scale 3D medical imaging datasets. Extensive experiments indicate that the proposed pre-training strategy transfers robustly to supervised downstream tasks, highlighting HySparK's promising prospects. The code is available at this https URL
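
The bottom-up masking idea in the abstract can be illustrated with a minimal sketch. It assumes the mask is sampled once on the coarsest 3D patch grid (the resolution seen by the Transformer stage) and then copied by nearest-neighbour upsampling to each finer CNN stage, so that every stage hides the same regions. The function names, the 75% mask ratio, and the stage factors are hypothetical choices for illustration, not taken from the paper or its code.

```python
import random


def make_coarse_mask(grid, mask_ratio, seed=0):
    """Sample a random mask on the coarsest patch grid (assumed design).

    Returns a D x H x W nested list of booleans, where True means "masked".
    """
    d, h, w = grid
    n = d * h * w
    n_masked = round(n * mask_ratio)
    flat = [True] * n_masked + [False] * (n - n_masked)
    random.Random(seed).shuffle(flat)
    return [[[flat[(z * h + y) * w + x] for x in range(w)]
             for y in range(h)]
            for z in range(d)]


def upsample_mask(mask, factor):
    """Nearest-neighbour upsampling: one coarse cell covers factor**3 fine cells."""
    d, h, w = len(mask), len(mask[0]), len(mask[0][0])
    return [[[mask[z // factor][y // factor][x // factor]
              for x in range(w * factor)]
             for y in range(h * factor)]
            for z in range(d * factor)]


def masks_for_stages(coarse, factors):
    """Build one mask per encoder stage from the same coarse mask."""
    return [upsample_mask(coarse, f) for f in factors]


def count_masked(mask):
    """Count masked cells in a 3D nested-list mask."""
    return sum(cell for plane in mask for row in plane for cell in row)


# Hypothetical setup: a 2x2x2 coarse grid and three stages at 4x, 2x, 1x resolution.
coarse = make_coarse_mask((2, 2, 2), mask_ratio=0.75)
stage_masks = masks_for_stages(coarse, factors=[4, 2, 1])
```

Because every finer mask is derived from the same coarse mask, the masked fraction is identical at each stage, and a sparse convolution at a fine stage would skip exactly the voxels whose coarse patch is hidden from the Transformer stage.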

Comments:	Early accept at MICCAI 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
ACM classes:	I.4.10; I.4.6
Cite as:	arXiv:2408.05815 [cs.CV]
	(or arXiv:2408.05815v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.05815

Submission history

From: Fenghe Tang
[v1] Sun, 11 Aug 2024 16:31:39 UTC (13,877 KB)
