Skip to main content

Showing 1–1 of 1 results for author: DeVito, Z

Searching in archive math. Search in all archives.
.
  1. arXiv:2304.09871  [pdf, other

    cs.LG cs.AI math.OC

    A Theory on Adam Instability in Large-Scale Machine Learning

    Authors: Igor Molybog, Peter Albert, Moya Chen, Zachary DeVito, David Esiobu, Naman Goyal, Punit Singh Koura, Sharan Narang, Andrew Poulton, Ruan Silva, Binh Tang, Diana Liskovich, Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang

    Abstract: We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, called Adam. We observe that Adam can enter a state in which the parameter update vector has a relatively large norm and is essentially uncorrelated with the direction of descent… ▽ More

    Submitted 25 April, 2023; v1 submitted 19 April, 2023; originally announced April 2023.