Pay Attention to Small Weights

Zhou, Chao; Jacobs, Tom; Gadhikar, Advait; Burkholz, Rebekka

arXiv:2506.21374 (cs)

[Submitted on 26 Jun 2025]

Abstract: Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages. First, the selection criterion is gradient-free: the parameter subset can be determined without gradient computation. Second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting. Third, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.
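The idea of updating only small-magnitude weights can be illustrated with a minimal sketch. This is not the authors' NANOADAM implementation: the abstract does not give the selection rule, the fraction of weights kept, or how often the subset is refreshed. The function names `small_weight_mask` and `masked_adam_step`, the `fraction` parameter, and the use of a standard Adam update are all assumptions made for illustration, and plain Python lists stand in for the tensors a real implementation would use.

```python
import math

def small_weight_mask(weights, fraction):
    """Return a boolean mask selecting the `fraction` smallest-magnitude weights.

    Hypothetical helper: the selection uses only weight magnitudes,
    so no gradient is needed to choose the subset.
    """
    k = max(1, int(len(weights) * fraction))
    # Indices sorted by absolute value; the first k are the smallest weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    chosen = set(order[:k])
    return [i in chosen for i in range(len(weights))]

def masked_adam_step(weights, grads, m, v, mask, step,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one standard Adam update in place, only where mask is True.

    Assumption: masked-out weights keep their value and their
    optimizer state untouched.
    """
    for i, active in enumerate(mask):
        if not active:
            continue  # large-magnitude weight: left frozen
        m[i] = beta1 * m[i] + (1 - beta1) * grads[i]
        v[i] = beta2 * v[i] + (1 - beta2) * grads[i] ** 2
        m_hat = m[i] / (1 - beta1 ** step)  # bias-corrected first moment
        v_hat = v[i] / (1 - beta2 ** step)  # bias-corrected second moment
        weights[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)

# Toy example with made-up numbers.
weights = [2.0, -0.01, 0.5, 0.02, -1.5, 0.003]
grads = [0.1, 0.4, -0.2, -0.3, 0.05, 0.6]
m = [0.0] * len(weights)
v = [0.0] * len(weights)

mask = small_weight_mask(weights, fraction=0.5)
masked_adam_step(weights, grads, m, v, mask, step=1)
```

In this toy run the mask selects indices 1, 3, and 5 (the three smallest magnitudes), so the entries 2.0, 0.5, and -1.5 are left unchanged. To make the selection dynamic, as the abstract describes, `small_weight_mask` would be recomputed during training; the refresh schedule is left open here because the abstract does not state it.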

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2506.21374 [cs.LG]
	(or arXiv:2506.21374v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2506.21374

Submission history

From: Chao Zhou
[v1] Thu, 26 Jun 2025 15:22:55 UTC (16,136 KB)
