Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

Takashima, Sora; Hayamizu, Ryo; Inoue, Nakamasa; Kataoka, Hirokatsu; Yokota, Rio

arXiv:2303.01112 (cs)

[Submitted on 2 Mar 2023]

Abstract: Formula-driven supervised learning (FDSL) has been shown to be an effective method for pre-training vision transformers; in particular, ExFractalDB-21k was shown to exceed the pre-training effect of ImageNet-21k. These studies also indicate that contours matter more than textures when pre-training vision transformers. However, the lack of a systematic investigation into why these contour-oriented synthetic datasets can achieve the same accuracy as real datasets leaves much room for skepticism. In the present work, we develop a novel methodology based on circular harmonics for systematically investigating the design space of contour-oriented synthetic datasets. This allows us to efficiently search for the optimal range of FDSL parameters and maximize the variety of synthetic images in the dataset, which we found to be a critical factor. When the resulting new dataset, VisualAtom-21k, is used for pre-training ViT-Base, the top-1 accuracy reaches 83.7% when fine-tuning on ImageNet-1k. This is close to the top-1 accuracy (84.2%) achieved by JFT-300M pre-training, even though the number of images is 1/14 that of JFT-300M. Unlike JFT-300M, which is a static dataset, the quality of synthetic datasets will continue to improve, and the current work is a testament to this possibility. FDSL is also free of the common issues associated with real images, e.g., privacy/copyright issues, labeling costs/errors, and ethical biases.
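
To give a rough sense of what a contour built from sinusoidal waves can look like, the sketch below modulates the radius of a circle with two sine waves of integer frequency, so the curve closes on itself. This is only an illustration under assumed names and parameters: the function `wave_contour`, the frequencies `n1` and `n2`, the amplitudes `a1` and `a2`, and the radius formula itself are all hypothetical and are not taken from the abstract, which does not state the actual generation formula.

```python
import numpy as np


def wave_contour(n1, n2, a1=0.2, a2=0.1, base_radius=1.0, num_points=512):
    """Return (num_points, 2) xy-coordinates of a closed contour.

    Hypothetical parameterization for illustration only: the radius of a
    circle is perturbed by two sinusoids with integer frequencies n1, n2.
    """
    # Sample angles on [0, 2*pi); integer frequencies keep the curve closed.
    theta = np.linspace(0.0, 2.0 * np.pi, num_points, endpoint=False)
    # Radius = base circle plus two sinusoidal perturbations.
    radius = base_radius + a1 * np.sin(n1 * theta) + a2 * np.sin(n2 * theta)
    # Convert polar coordinates to Cartesian points along the contour.
    x = radius * np.cos(theta)
    y = radius * np.sin(theta)
    return np.stack([x, y], axis=1)


contour = wave_contour(n1=3, n2=7)
print(contour.shape)  # (512, 2)
```

Varying the assumed frequencies and amplitudes yields visibly different shapes, which is one simple way to picture how a formula with a few parameters can produce a wide variety of contour images.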

Comments:	Accepted to CVPR 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2303.01112 [cs.CV]
	(or arXiv:2303.01112v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.01112

Submission history

From: Sora Takashima
[v1] Thu, 2 Mar 2023 09:47:28 UTC (8,816 KB)
