PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling

Liu, Yuan; Zhang, Songyang; Chen, Jiacheng; Chen, Kai; Lin, Dahua

arXiv:2303.02416 (cs)

[Submitted on 4 Mar 2023 (v1), last revised 24 Mar 2023 (this version, v2)]

Abstract: Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT. However, subsequent works have complicated the framework with new auxiliary tasks or extra pre-trained models, inevitably increasing computational overhead. This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction, which examines the input image patches and reconstruction target, and highlights two critical but previously overlooked bottlenecks. Based on this analysis, we propose a remarkably simple and effective method, PixMIM, that entails two strategies: 1) filtering the high-frequency components from the reconstruction target to de-emphasize the network's focus on texture-rich details and 2) adopting a conservative data transform strategy to alleviate the problem of missing foreground in MIM training. PixMIM can be easily integrated into most existing pixel-based MIM approaches (i.e., those using raw images as the reconstruction target) with negligible additional computation. Without bells and whistles, our method consistently improves three MIM approaches, MAE, ConvMAE, and LSMAE, across various downstream tasks. We believe this effective plug-and-play method will serve as a strong baseline for self-supervised learning and provide insights for future improvements of the MIM framework. Code and models are available at this https URL.
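The first strategy in the abstract, removing high-frequency components from the reconstruction target, can be illustrated with a minimal sketch. The abstract does not say which filter is used, so this assumes an ideal circular low-pass filter applied in the Fourier domain; the function name `low_pass_target` and the `radius` cutoff are hypothetical and not taken from the paper or its code.

```python
import numpy as np


def low_pass_target(image: np.ndarray, radius: float) -> np.ndarray:
    """Keep only spatial frequencies within `radius` bins of the spectrum centre.

    image:  float array of shape (H, W) or (H, W, C).
    radius: hypothetical cutoff in frequency bins (an assumption, not from the paper).
    """
    h, w = image.shape[:2]
    # Centred 2-D spectrum over the two spatial axes.
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    # Circular mask around the zero-frequency bin, which fftshift places at (h // 2, w // 2).
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    if image.ndim == 3:
        mask = mask[..., None]  # broadcast the mask over channels
    filtered = np.fft.ifft2(
        np.fft.ifftshift(spectrum * mask, axes=(0, 1)), axes=(0, 1)
    )
    # The mask is symmetric about the centre, so the imaginary part is numerical noise.
    return filtered.real
```

Under these assumptions, a pixel-based MIM loss would compare the decoder output against `low_pass_target(image, radius)` rather than the raw image; a smaller `radius` discards more texture detail, and `radius = 0` leaves only the per-channel mean.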

Comments:	Update code link and add additional results
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2303.02416 [cs.CV]
	(or arXiv:2303.02416v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.02416

Submission history

From: Yuan Liu
[v1] Sat, 4 Mar 2023 13:38:51 UTC (1,157 KB)
[v2] Fri, 24 Mar 2023 05:37:41 UTC (1,429 KB)
