Multiple Descents in Deep Learning as a Sequence of Order-Chaos Transitions
Authors:
Wenbo Wei,
Nicholas Chong Jia Le,
Choy Heng Lai,
Ling Feng
Abstract:
We observe a novel 'multiple-descent' phenomenon during the training process of LSTM, in which the test loss goes through long cycles of up and down trend multiple times after the model is overtrained. By carrying out asymptotic stability analysis of the models, we found that the cycles in test loss are closely associated with the phase transition process between order and chaos, and the local opt…
▽ More
We observe a novel 'multiple-descent' phenomenon during the training process of LSTM, in which the test loss goes through long cycles of up and down trend multiple times after the model is overtrained. By carrying out asymptotic stability analysis of the models, we found that the cycles in test loss are closely associated with the phase transition process between order and chaos, and the local optimal epochs are consistently at the critical transition point between the two phases. More importantly, the global optimal epoch occurs at the first transition from order to chaos, where the 'width' of the 'edge of chaos' is the widest, allowing the best exploration of better weight configurations for learning.
△ Less
Submitted 26 May, 2025;
originally announced May 2025.
Self-Organization Towards $1/f$ Noise in Deep Neural Networks
Authors:
Nicholas Chong Jia Le,
Ling Feng
Abstract:
The presence of $1/f$ noise, also known as pink noise, is a well-established phenomenon in biological neural networks, and is thought to play an important role in information processing in the brain. In this study, we find that such $1/f$ noise is also found in deep neural networks trained on natural language, resembling that of their biological counterparts. Specifically, we trained Long Short-Te…
▽ More
The presence of $1/f$ noise, also known as pink noise, is a well-established phenomenon in biological neural networks, and is thought to play an important role in information processing in the brain. In this study, we find that such $1/f$ noise is also found in deep neural networks trained on natural language, resembling that of their biological counterparts. Specifically, we trained Long Short-Term Memory (LSTM) networks on the `IMDb' AI benchmark dataset, then measured the neuron activations. The detrended fluctuation analysis (DFA) on the time series of the different neurons demonstrate clear $1/f$ patterns, which is absent in the time series of the inputs to the LSTM. Interestingly, when the neural network is at overcapacity, having more than enough neurons to achieve the learning task, the activation patterns deviate from $1/f$ noise and shifts towards white noise. This is because many of the neurons are not effectively used, showing little fluctuations when fed with input data. We further examine the exponent values in the $1/f$ noise in ``internal" and ``external" activations in the LSTM cell, finding some resemblance in the variations of the exponents in fMRI signals of the human brain. Our findings further supports the hypothesis that $1/f$ noise is a signature of optimal learning. With deep learning models approaching or surpassing humans in certain tasks, and being more ``experimentable'' than their biological counterparts, our study suggests that they are good candidates to understand the fundamental origins of $1/f$ noise.
△ Less
Submitted 1 April, 2024; v1 submitted 20 January, 2023;
originally announced January 2023.