-
Fundamental operating regimes, hyper-parameter fine-tuning and glassiness: towards an interpretable replica-theory for trained restricted Boltzmann machines
Authors:
Alberto Fachechi,
Elena Agliari,
Miriam Aquaro,
Anthony Coolen,
Menno Mulder
Abstract:
We consider restricted Boltzmann machines with a binary visible layer and a Gaussian hidden layer trained by an unlabelled dataset composed of noisy realizations of a single ground pattern. We develop a statistical mechanics framework to describe the network generative capabilities, by exploiting the replica trick and assuming self-averaging of the underlying order parameters (i.e., replica symmetry). In particular, we outline the effective control parameters (e.g., the relative number of weights to be trained, the regularization parameter), whose tuning can yield qualitatively-different operative regimes. Further, we provide analytical and numerical evidence for the existence of a sub-region in the space of the hyperparameters where replica-symmetry breaking occurs.
Submitted 14 June, 2024;
originally announced June 2024.
-
Regularization, early-stopping and dreaming: a Hopfield-like setup to address generalization and overfitting
Authors:
Elena Agliari,
Francesco Alemanno,
Miriam Aquaro,
Alberto Fachechi
Abstract:
In this work we approach attractor neural networks from a machine learning perspective: we look for optimal network parameters by applying gradient descent over a regularized loss function. Within this framework, the optimal neuron-interaction matrices turn out to be a class of matrices which correspond to Hebbian kernels revised by a reiterated unlearning protocol. Remarkably, the extent of such unlearning is proved to be related to the regularization hyperparameter of the loss function and to the training time. Thus, we can design strategies to avoid overfitting that are formulated in terms of regularization and early-stopping tuning. The generalization capabilities of these attractor networks are also investigated: analytical results are obtained for random synthetic datasets; next, the emerging picture is corroborated by numerical experiments that highlight the existence of several regimes (i.e., overfitting, failure and success) as the dataset parameters are varied.
Submitted 20 February, 2024; v1 submitted 1 August, 2023;
originally announced August 2023.
-
Supervised Hebbian Learning
Authors:
Francesco Alemanno,
Miriam Aquaro,
Ido Kanter,
Adriano Barra,
Elena Agliari
Abstract:
In the neural-network literature, Hebbian learning traditionally refers to the procedure by which the Hopfield model and its generalizations store archetypes (i.e., definite patterns that are experienced just once to form the synaptic matrix). However, the term "Learning" in Machine Learning refers to the ability of the machine to extract features from the supplied dataset (e.g., made of blurred examples of these archetypes), in order to make its own representation of the unavailable archetypes. Here, given a sample of examples, we define a supervised learning protocol by which the Hopfield network can infer the archetypes, and we detect the correct control parameters (including size and quality of the dataset) to depict a phase diagram for the system performance. We also prove that, for structureless datasets, the Hopfield model equipped with this supervised learning rule is equivalent to a restricted Boltzmann machine and this suggests an optimal and interpretable training routine. Finally, this approach is generalized to structured datasets: we highlight a quasi-ultrametric organization (reminiscent of replica-symmetry-breaking) in the analyzed datasets and, consequently, we introduce an additional "replica hidden layer" for its (partial) disentanglement, which is shown to improve MNIST classification from 75% to 95%, and to offer a new perspective on deep architectures.
Submitted 7 September, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
Dreaming neural networks: rigorous results
Authors:
Elena Agliari,
Francesco Alemanno,
Adriano Barra,
Alberto Fachechi
Abstract:
Recently a daily routine for associative neural networks has been proposed: the network Hebbian-learns during the awake state (thus behaving as a standard Hopfield model), then, during its sleep state, optimizing information storage, it consolidates pure patterns and removes spurious ones: this forces the synaptic matrix to collapse to the projector one (ultimately approaching the Kanter-Sompolinsky model). This procedure keeps the learning Hebbian-based (a biological must) but, by taking advantage of a (properly stylized) sleep phase, still reaches the maximal critical capacity (for symmetric interactions). So far this emerging picture (as well as the bulk of papers on unlearning techniques) was supported solely by mathematically-challenging routes, e.g. mainly replica-trick analysis and numerical simulations: here we rely extensively on Guerra's interpolation techniques developed for neural networks and, in particular, we extend the generalized stochastic stability approach to the case. Confining our description within the replica symmetric approximation (where the previous ones lie), the picture painted regarding this generalization (and the previously existing variations on the theme) is here entirely confirmed. Further, still relying on Guerra's schemes, we develop a systematic fluctuation analysis to check where ergodicity is broken (an analysis entirely absent in previous investigations). We find that, as long as the network is awake, ergodicity is bounded by the Amit-Gutfreund-Sompolinsky critical line (as it should), but, as the network sleeps, sleeping destroys spin glass states by extending both the retrieval and the ergodic regions: after an entire sleeping session the only surviving regions are the retrieval and ergodic ones and this allows the network to achieve the perfect retrieval regime (the number of storable patterns equals the number of neurons in the network).
Submitted 21 December, 2018;
originally announced December 2018.
-
Dreaming neural networks: forgetting spurious memories and reinforcing pure ones
Authors:
Alberto Fachechi,
Elena Agliari,
Adriano Barra
Abstract:
The standard Hopfield model for associative neural networks accounts for biological Hebbian learning and acts as the harmonic oscillator for pattern recognition; however, its maximal storage capacity is $α\sim 0.14$, far from the theoretical bound for symmetric networks, i.e. $α=1$. Inspired by sleeping and dreaming mechanisms in mammalian brains, we propose an extension of this model displaying the standard on-line (awake) learning mechanism (that allows the storage of external information in terms of patterns) and an off-line (sleep) unlearning & consolidating mechanism (that allows spurious-pattern removal and pure-pattern reinforcement): this obtained daily prescription is able to saturate the theoretical bound $α=1$, remaining also extremely robust against thermal noise. Neural and synaptic features are analyzed both analytically and numerically. In particular, beyond obtaining a phase diagram for neural dynamics, we focus on synaptic plasticity and we give explicit prescriptions on the temporal evolution of the synaptic matrix. We analytically prove that our algorithm makes the Hebbian kernel converge with high probability to the projection matrix built over the pure stored patterns. Furthermore, we obtain a sharp and explicit estimate for the "sleep rate" in order to ensure such a convergence. Finally, we run extensive numerical simulations (mainly Monte Carlo sampling) to check the approximations underlying the analytical investigations (e.g., we developed the whole theory at the so-called replica-symmetric level, as standard in the Amit-Gutfreund-Sompolinsky reference framework) and possible finite-size effects, finding overall full agreement with the theory.
Submitted 29 October, 2018;
originally announced October 2018.
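The convergence of the Hebbian kernel to the projection matrix described in this abstract can be illustrated numerically. The sketch below is not taken from the abstract: it assumes a sleep-time-dependent kernel of the form J(t) = ξᵀ (1+t)(I + tC)⁻¹ ξ / N, with C = ξξᵀ/N the pattern correlation matrix, and the names `hebbian_kernel`, `dreaming_kernel` and the sizes P = 20, N = 200 are illustrative choices. Under that assumption, t = 0 gives the Hebbian kernel and large t approaches the projector onto the stored patterns.

```python
import numpy as np


def hebbian_kernel(xi):
    """Hebbian coupling matrix J = xi^T xi / N for patterns xi of shape (P, N)."""
    n_neurons = xi.shape[1]
    return xi.T @ xi / n_neurons


def dreaming_kernel(xi, t):
    """Assumed interpolating kernel J(t) = xi^T (1+t)(I + t C)^{-1} xi / N.

    This explicit form is an assumption made for illustration; the abstract
    only states that the Hebbian kernel converges to the projection matrix.
    """
    n_patterns, n_neurons = xi.shape
    corr = xi @ xi.T / n_neurons  # P x P pattern correlation matrix C
    # Solve (I + t C) X = xi instead of forming the inverse explicitly.
    solved = np.linalg.solve(np.eye(n_patterns) + t * corr, xi)
    return (1.0 + t) * xi.T @ solved / n_neurons


rng = np.random.default_rng(0)
xi = rng.choice([-1.0, 1.0], size=(20, 200))  # P = 20 random binary patterns, N = 200

J_awake = dreaming_kernel(xi, 0.0)   # no sleep: plain Hebbian kernel
J_asleep = dreaming_kernel(xi, 1e8)  # long sleep: close to the projector
```

In this sketch, `J_asleep @ xi.T` reproduces `xi.T` up to numerical error, i.e. every stored pattern is a fixed direction of the long-sleep kernel, which is the defining property of the projection matrix.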
-
Instability and network effects in innovative markets
Authors:
Paolo Sgrignoli,
Elena Agliari,
Raffaella Burioni,
Augusto Schianchi
Abstract:
We consider a network of interacting agents and we model the process of choice on the adoption of a given innovative product by means of statistical-mechanics tools. The modelization allows us to focus on the effects of direct interactions among agents in establishing the success or failure of the product itself. Mimicking real systems, the whole population is divided into two sub-communities called, respectively, Innovators and Followers, where the former are assumed to display more influence power. We study in detail and via numerical simulations on a random graph two different scenarios: no-feedback interaction, where innovators are cohesive and not sensitively affected by the remaining population, and feedback interaction, where the influence of followers on innovators is non-negligible. The outcomes are markedly different: in the former case, which corresponds to the creation of a niche in the market, Innovators are able to drive and polarize the whole market. In the latter case the behavior of the market cannot be definitely predicted and becomes unstable. In both cases we highlight the emergence of collective phenomena and we show how the final outcome, in terms of the number of buyers, is affected by the concentration of innovators and by the interaction strengths among agents.
Submitted 12 September, 2014;
originally announced September 2014.
-
Percolation on correlated random networks
Authors:
Elena Agliari,
Claudia Cioli,
Enore Guadagnini
Abstract:
We consider a class of random, weighted networks, obtained through a redefinition of patterns in a Hopfield-like model and, by performing percolation processes, we get information about topology and resilience properties of the networks themselves. Given the weighted nature of the graphs, different kinds of bond percolation can be studied: stochastic (deleting links randomly) and deterministic (deleting links based on rank weights), each mimicking a different physical process. The evolution of the network is accordingly different, as evidenced by the behavior of the largest component size and of the distribution of cluster sizes. In particular, we can derive that weak ties are crucial in order to maintain the graph connected and that, when they are the most prone to failure, the giant component typically shrinks without abruptly breaking apart; these results have been recently evidenced in several kinds of social networks.
Submitted 5 September, 2011;
originally announced September 2011.
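The two bond-percolation protocols named in this abstract can be sketched on a toy weighted graph. Everything below is illustrative rather than the authors' procedure: the graph `TOY_EDGES` (two strongly bound triangles joined by one weak tie), the function names, and the choice of "weakest links first" as the deterministic ranking are assumptions. The largest component size is computed with a standard union-find.

```python
import random


def largest_component(n_nodes, edges):
    """Size of the largest connected component, computed via union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _weight in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    sizes = {}
    for node in range(n_nodes):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


def percolate(n_nodes, edges, n_remove, mode, seed=0):
    """Remove n_remove links and return the largest component size.

    mode="deterministic": delete links in order of increasing weight
    (an assumed ranking). mode="stochastic": delete links at random.
    """
    if mode == "deterministic":
        kept = sorted(edges, key=lambda e: e[2])[n_remove:]
    elif mode == "stochastic":
        shuffled = list(edges)
        random.Random(seed).shuffle(shuffled)
        kept = shuffled[n_remove:]
    else:
        raise ValueError("mode must be 'deterministic' or 'stochastic'")
    return largest_component(n_nodes, kept)


# Hypothetical toy graph: (node, node, weight); the single weak tie is (2, 3).
TOY_EDGES = [
    (0, 1, 3.0), (1, 2, 3.0), (0, 2, 3.0),
    (3, 4, 3.0), (4, 5, 3.0), (3, 5, 3.0),
    (2, 3, 1.0),
]
```

On this toy graph, deleting only the weakest link with `percolate(6, TOY_EDGES, 1, "deterministic")` splits the six nodes into two components of three, which is a small-scale picture of the role of weak ties in keeping the graph connected.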
-
A statistical mechanics approach to Granovetter theory
Authors:
Adriano Barra,
Elena Agliari
Abstract:
In this paper we try to bridge breakthroughs in quantitative sociology/econometrics pioneered during the last decades by McFadden, Brock-Durlauf, Granovetter and Watts-Strogatz through introducing a minimal model able to reproduce essentially all the features of social behavior highlighted by these authors. Our model relies on a pairwise Hamiltonian for decision-maker interactions which naturally extends the multi-populations approaches by shifting and biasing the pattern definitions of a Hopfield model of neural networks. Once introduced, the model is investigated through graph theory (to recover Granovetter and Watts-Strogatz results) and statistical mechanics (to recover McFadden and Brock-Durlauf results). Due to internal symmetries of our model, the latter is obtained as the relaxation of a proper Markov process, which even allows us to study its out-of-equilibrium properties. The method used to solve its equilibrium is an adaptation of the Hamilton-Jacobi technique recently introduced by Guerra in the spin glass scenario and the picture obtained is the following: just assuming that the larger the amount of similarities among decision makers, the stronger their relative influence, is enough to explain both the different roles of strong and weak ties in the social network and its small-world properties. As a result, imitative interaction strengths seem essentially a robust request (enough to break the gauge symmetry in the couplings); furthermore, this naturally leads to a discrete choice modelization when dealing with the external influences and to imitative behavior à la Curie-Weiss as the one introduced by Brock and Durlauf.
Submitted 6 December, 2010;
originally announced December 2010.