Navigating Scaling Laws: Accelerating Vision Transformer's Training via Adaptive Strategies

Anagnostidis, Sotiris; Bachmann, Gregor; Hofmann, Thomas

arXiv:2311.03233v1 (cs)

[Submitted on 6 Nov 2023 (this version), latest version 23 May 2024 (v3)]

Abstract: In recent years, the state-of-the-art in deep learning has been dominated by very large models that have been pre-trained on vast amounts of data. The paradigm is very simple: Investing more computational resources (optimally) leads to better performance, and even predictably so; neural scaling laws have been derived that accurately forecast the performance of a network for a desired level of compute. This leads to the notion of a "compute-optimal" model, i.e. a model that allocates a given level of compute during training optimally to maximise performance. In this work, we extend the concept of optimality by allowing for an "adaptive" model, i.e. a model that can change its shape during the course of training. By allowing the shape to adapt, we can optimally traverse between the underlying scaling laws, leading to a significant reduction in the required compute to reach a given target performance. We focus on vision tasks and the family of Vision Transformers, where the patch size as well as the width naturally serve as adaptive shape parameters. We demonstrate that, guided by scaling laws, we can design compute-optimal adaptive models that beat their "static" counterparts.
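
The idea of traversing between scaling laws can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the power-law form error(C) = a * C^(-b) + c, the made-up coefficients for two hypothetical shapes (`large_patch`, `small_patch`), the greedy rule of spending each compute increment on the shape whose curve is steepest at the current error, and the simplification that switching shape preserves the current error at no cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerLaw:
    """Hypothetical scaling law: error(C) = a * C**(-b) + c."""
    a: float
    b: float
    c: float

    def error(self, compute: float) -> float:
        return self.a * compute ** (-self.b) + self.c

    def compute_for(self, error: float) -> float:
        """Compute at which this law reaches `error` (infinite at or below c)."""
        if error <= self.c:
            return float("inf")
        return ((error - self.c) / self.a) ** (-1.0 / self.b)

    def slope_at(self, error: float) -> float:
        """dE/dC at the point where this law passes through `error`."""
        compute = self.compute_for(error)
        if compute == float("inf"):
            return 0.0  # this shape cannot improve on `error`
        return -self.a * self.b * compute ** (-self.b - 1.0)

    def advance(self, error: float, extra_compute: float) -> float:
        """Error after spending `extra_compute` more on this shape."""
        if error <= self.c:
            return error
        return self.error(self.compute_for(error) + extra_compute)


def greedy_schedule(laws, start_error, steps, step_compute):
    """Spend each compute increment on the shape with the steepest descent."""
    error = start_error
    schedule = []
    for _ in range(steps):
        # The most negative slope gives the largest error drop per unit compute.
        name = min(laws, key=lambda n: laws[n].slope_at(error))
        error = laws[name].advance(error, step_compute)
        schedule.append(name)
    return schedule, error


# Made-up coefficients: the cheap shape improves quickly but saturates early.
LAWS = {
    "large_patch": PowerLaw(a=1.0, b=0.5, c=0.5),
    "small_patch": PowerLaw(a=4.0, b=0.5, c=0.2),
}
START_ERROR, STEPS, STEP_COMPUTE = 0.95, 40, 1.0

schedule, adaptive_error = greedy_schedule(LAWS, START_ERROR, STEPS, STEP_COMPUTE)
static_errors = {
    name: law.advance(START_ERROR, STEPS * STEP_COMPUTE)
    for name, law in LAWS.items()
}

print("first switch at step:", schedule.index("small_patch"))
print("adaptive error:", round(adaptive_error, 4))
print("static errors:", {k: round(v, 4) for k, v in static_errors.items()})
```

Under these assumed curves, the schedule starts on the cheap shape and moves to the expensive one once the cheap curve flattens, ending below both static baselines for the same compute budget. Real scaling laws would have to be fitted to measured training runs, and changing a model's shape mid-training is not free in practice.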

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2311.03233 [cs.LG]
	(or arXiv:2311.03233v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.03233

Submission history

From: Sotiris Anagnostidis
[v1] Mon, 6 Nov 2023 16:20:28 UTC (12,739 KB)
[v2] Wed, 21 Feb 2024 19:50:49 UTC (12,848 KB)
[v3] Thu, 23 May 2024 08:28:56 UTC (12,866 KB)
