RobustFormer: Noise-Robust Pre-training for images and videos

Bastola, Ashish; Luitel, Nishant; Wang, Hao; Paudel, Danda Pani; Poudel, Roshani; Razi, Abolfazl

arXiv:2411.13040 (cs)

[Submitted on 20 Nov 2024]

Abstract: While deep learning models are powerful tools that have revolutionized many areas, they are also vulnerable to noise because they rely heavily on learning patterns and features from the exact details of clean data. Transformers, which have become the backbone of modern vision models, are no exception. Current Discrete Wavelet Transform (DWT)-based methods do not benefit from masked autoencoder (MAE) pre-training, since the inverse DWT (iDWT) introduced in these approaches is computationally inefficient and lacks compatibility with video inputs in transformer architectures.
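For readers unfamiliar with the DWT/iDWT pair mentioned above, the following is a minimal sketch, not RobustFormer's implementation: a single-level, orthonormal 2D Haar transform in NumPy. The Haar wavelet and the function names `haar_dwt2` and `haar_idwt2` are assumptions chosen for illustration. The forward transform splits an image into one low-frequency subband (LL) and three high-frequency subbands; the inverse reassembles them exactly.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """Single-level orthonormal 2D Haar DWT (hypothetical helper).

    x: (H, W) array with even H and W.
    Returns four (H/2, W/2) subbands: LL (low-pass) and three detail bands.
    Subband naming (LH vs. HL) varies between libraries.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # local average (scaled): low frequencies
    lh = (a - b + c - d) / 2.0  # horizontal differences
    hl = (a + b - c - d) / 2.0  # vertical differences
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh) -> np.ndarray:
    """Inverse of haar_dwt2: rebuilds the (H, W) array from its four subbands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.result_type(ll, lh, hl, hh))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.outer(np.linspace(0, 1, 64), np.linspace(0, 1, 64))  # smooth test image
    noisy = clean + 0.1 * rng.standard_normal(clean.shape)

    ll, lh, hl, hh = haar_dwt2(noisy)
    # The iDWT undoes the DWT up to floating-point error.
    print("round-trip error:", np.abs(haar_idwt2(ll, lh, hl, hh) - noisy).max())

    # Keeping only LL discards high-frequency content, where much of the noise lives.
    zeros = np.zeros_like(ll)
    smoothed = haar_idwt2(ll, zeros, zeros, zeros)
    print("MSE noisy vs clean:  ", np.mean((noisy - clean) ** 2))
    print("MSE LL-only vs clean:", np.mean((smoothed - clean) ** 2))
```

In a DWT-based pipeline, an inverse call like `haar_idwt2` is the kind of extra step the abstract identifies as costly in prior approaches; how RobustFormer avoids it is described in the paper, not in this sketch.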
In this work, we present RobustFormer, a method that overcomes these limitations by enabling noise-robust pre-training for both images and videos, and improves the efficiency of DWT-based methods by removing the need for computationally expensive iDWT steps and simplifying the attention mechanism. To our knowledge, the proposed method is the first DWT-based method compatible with video inputs and masked pre-training. Our experiments show that MAE-based pre-training allows us to bypass the iDWT step, greatly reducing computation. Through extensive tests on benchmark datasets, RobustFormer achieves state-of-the-art results for both image and video tasks.
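The abstract credits MAE-style pre-training with letting RobustFormer skip the iDWT step but does not spell out the mechanism. As background only, here is a minimal NumPy sketch of the random token masking used in standard MAE pre-training, where a large fraction of patch tokens (75% in the original image recipe) is hidden and only the visible tokens are encoded. The function name `random_masking` and the 0.75 default are assumptions; how RobustFormer combines masking with wavelet features is not shown here. For video inputs, the `L` tokens would be spatiotemporal patches, but the masking logic is unchanged.

```python
import numpy as np


def random_masking(tokens: np.ndarray, mask_ratio: float = 0.75, rng=None):
    """Per-sample random masking of patch tokens (hypothetical helper).

    tokens: (N, L, D) batch of N sequences with L tokens of dimension D.
    Returns:
      visible:     (N, L_keep, D) tokens passed to the encoder
      mask:        (N, L) with 1 = masked, 0 = visible, in original token order
      ids_restore: (N, L) indices that undo the random shuffle
    """
    rng = np.random.default_rng() if rng is None else rng
    n, l, _ = tokens.shape
    len_keep = int(l * (1 - mask_ratio))

    noise = rng.random((n, l))                      # one random score per token
    ids_shuffle = np.argsort(noise, axis=1)         # lowest scores are kept
    ids_restore = np.argsort(ids_shuffle, axis=1)   # inverse permutation
    ids_keep = ids_shuffle[:, :len_keep]

    # Gather the kept tokens; the trailing axis of size 1 broadcasts over D.
    visible = np.take_along_axis(tokens, ids_keep[:, :, None], axis=1)

    # Build the mask in shuffled order, then map it back to original order.
    mask = np.ones((n, l))
    mask[:, :len_keep] = 0
    mask = np.take_along_axis(mask, ids_restore, axis=1)
    return visible, mask, ids_restore


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((2, 196, 64))  # e.g. 14x14 patches per image
    visible, mask, ids_restore = random_masking(tokens, 0.75, rng)
    print(visible.shape, mask.sum(axis=1))
```

Because the encoder only sees `L_keep` tokens, most of the pre-training compute scales with the visible fraction; the decoder uses `ids_restore` to put mask tokens back in their original positions.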

Comments:	13 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.13040 [cs.CV]
	(or arXiv:2411.13040v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.13040

Submission history

From: Ashish Bastola
[v1] Wed, 20 Nov 2024 05:10:48 UTC (13,023 KB)