The importance of being empty: a spectral approach to Hopfield neural networks with diluted examples

Authors: Elena Agliari, Alberto Fachechi, Domenico Luongo

Abstract: We consider Hopfield networks, where neurons interact pair-wise by Hebbian couplings built over (i) a set of definite patterns (ground truths), (ii) a sample of labeled examples (supervised setting), (iii) a sample of unlabeled examples (unsupervised setting). We focus on the case where ground-truths are Rademacher vectors and examples are noisy versions of these ground-truths, possibly displaying some blank entries (e.g., mimicking missing or dropped data), and we determine the spectral distribution of the coupling matrices in the three scenarios, by exploiting and extending the Marchenko-Pastur theorem. By leveraging this knowledge, we are able to analytically inspect the stability and attractiveness of the ground truths, as well as the generalization capabilities of the networks. In particular, as corroborated by long-running Monte Carlo simulations, the presence of blank entries can have benefits in some specific conditions, suggesting strategies based on data sparsification; the robustness of these results in structured datasets is confirmed numerically. Finally, we demonstrate that the Hebbian matrix, built on sparse examples, can be recovered as the fixed point of a gradient descent algorithm with dropout, over a suitable loss function.

Submitted 19 March, 2025; originally announced March 2025.

Report number: Roma01.Math

arXiv:2503.06274 [pdf, other]

Multi-channel pattern reconstruction through $L$-directional associative memories

Authors: Elena Agliari, Andrea Alessandrelli, Paulo Duarte Mourao, Alberto Fachechi

Abstract: We consider $L$-directional associative memories, composed of $L$ Hopfield networks, displaying imitative Hebbian intra-network interactions and anti-imitative Hebbian inter-network interactions, where couplings are built over a set of hidden binary patterns. We evaluate the model's performance in reconstructing the whole set of hidden binary patterns when provided with mixtures of noisy versions of these patterns. Our numerical results demonstrate the model's high effectiveness in the reconstruction task for structureless and structured datasets.

Submitted 10 April, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

Report number: Roma01.Math

arXiv:2409.10145 [pdf, ps, other]

The thermodynamic limit in mean field neural networks

Authors: Elena Agliari, Adriano Barra, Pierluigi Bianco, Alberto Fachechi, Diego Pallara

Abstract: In the last five decades, mean-field neural-networks have played a crucial role in modelling associative memories and, in particular, the Hopfield model has been extensively studied using tools borrowed from the statistical mechanics of spin glasses. However, achieving mathematical control of the infinite-volume limit of the model's free-energy has remained elusive, as the standard treatments developed for spin-glasses have proven unfeasible. Here we address this long-standing problem by proving that a measure-concentration assumption for the order parameters of the theory is sufficient for the existence of the asymptotic limit of the model's free energy. The proof leverages the equivalence between the free energy of the Hopfield model and a linear combination of the free energies of a hard and a soft spin-glass, whose thermodynamic limits are rigorously known. Our work focuses on the replica-symmetry level of description (for which we recover the explicit expression of the free-energy found in the eighties via heuristic methods), yet, our scheme is expected to work also under (at least) the first step of replica symmetry breaking.

Submitted 16 September, 2024; originally announced September 2024.

Report number: Roma01.Math

arXiv:2406.09924 [pdf, other]

Fundamental operating regimes, hyper-parameter fine-tuning and glassiness: towards an interpretable replica-theory for trained restricted Boltzmann machines

Authors: Alberto Fachechi, Elena Agliari, Miriam Aquaro, Anthony Coolen, Menno Mulder

Abstract: We consider restricted Boltzmann machines with a binary visible layer and a Gaussian hidden layer trained by an unlabelled dataset composed of noisy realizations of a single ground pattern. We develop a statistical mechanics framework to describe the network generative capabilities, by exploiting the replica trick and assuming self-averaging of the underlying order parameters (i.e., replica symmetry). In particular, we outline the effective control parameters (e.g., the relative number of weights to be trained, the regularization parameter), whose tuning can yield qualitatively-different operative regimes. Further, we provide analytical and numerical evidence for the existence of a sub-region in the space of the hyperparameters where replica-symmetry breaking occurs.

Submitted 14 June, 2024; originally announced June 2024.

Report number: Roma01.Math

arXiv:2401.16114 [pdf, other]

doi 10.1016/j.amc.2024.128689

A spectral approach to Hebbian-like neural networks

Authors: Elena Agliari, Domenico Luongo, Alberto Fachechi

Abstract: We consider the Hopfield neural network as a model of associative memory and we define its neuronal interaction matrix $\mathbf{J}$ as a function of a set of $K \times M$ binary vectors $\{\boldsymbol{\xi}^{\mu, A}\}_{\mu=1,\dots,K}^{A=1,\dots,M}$ representing a sample of the reality that we want to retrieve. In particular, any item $\boldsymbol{\xi}^{\mu, A}$ is meant as a corrupted version of an unknown ground pattern $\boldsymbol{\zeta}^{\mu}$, that is the target of our retrieval process. We consider and compare two definitions for $\mathbf{J}$, referred to as supervised and unsupervised, according to whether the class $\mu$ each example belongs to is unveiled or not; these definitions also recover the paradigmatic Hebb's rule under suitable limits. The spectral properties of the resulting matrices are studied and used to inspect the retrieval capabilities of the related models as a function of their control parameters.

Submitted 29 January, 2024; originally announced January 2024.

Report number: Roma01.Math.MP

arXiv:2308.01421 [pdf, other]

Regularization, early-stopping and dreaming: a Hopfield-like setup to address generalization and overfitting

Authors: Elena Agliari, Francesco Alemanno, Miriam Aquaro, Alberto Fachechi

Abstract: In this work we approach attractor neural networks from a machine learning perspective: we look for optimal network parameters by applying a gradient descent over a regularized loss function. Within this framework, the optimal neuron-interaction matrices turn out to be a class of matrices which correspond to Hebbian kernels revised by a reiterated unlearning protocol. Remarkably, the extent of such unlearning is proved to be related to the regularization hyperparameter of the loss function and to the training time. Thus, we can design strategies to avoid overfitting that are formulated in terms of regularization and early-stopping tuning. The generalization capabilities of these attractor networks are also investigated: analytical results are obtained for random synthetic datasets; next, the emerging picture is corroborated by numerical experiments that highlight the existence of several regimes (i.e., overfitting, failure and success) as the dataset parameters are varied.

Submitted 20 February, 2024; v1 submitted 1 August, 2023; originally announced August 2023.

Comments: 29 pages, 10 figures, 4 appendices

Report number: Roma01.Math

arXiv:2207.00790 [pdf, other]

Pavlov Learning Machines

Authors: Elena Agliari, Miriam Aquaro, Adriano Barra, Alberto Fachechi, Chiara Marullo

Abstract: As is well known, Hebb's learning traces its origin in Pavlov's Classical Conditioning; however, while the former has been extensively modelled in the past decades (e.g., by the Hopfield model and countless variations on theme), modelling of the latter has remained largely unaddressed so far; further, a bridge between these two pillars is totally lacking. The main difficulty towards this goal lies in the intrinsically different scales of the information involved: Pavlov's theory is about correlations among \emph{concepts} that are (dynamically) stored in the synaptic matrix as exemplified by the celebrated experiment starring a dog and a ring bell; conversely, Hebb's theory is about correlations among pairs of adjacent neurons as summarized by the famous statement {\em neurons that fire together wire together}. In this paper we rely on stochastic-process theory and model neural and synaptic dynamics via Langevin equations, to prove that -- as long as we keep neurons' and synapses' timescales largely split -- Pavlov mechanism spontaneously takes place and ultimately gives rise to synaptic weights that recover the Hebbian kernel.

Submitted 2 July, 2022; originally announced July 2022.

arXiv:2203.14273 [pdf, other]

doi 10.1063/5.0095411

Non-linear PDEs approach to statistical mechanics of Dense Associative Memories

Authors: Elena Agliari, Alberto Fachechi, Chiara Marullo

Abstract: Dense associative memories (DAMs) are widespread models in artificial intelligence used for pattern recognition tasks; computationally, they have been proven to be robust against adversarial input and theoretically, leveraging their analogy with spin-glass systems, they are usually treated by means of statistical-mechanics tools. Here we develop analytical methods, based on nonlinear PDEs, to investigate their functioning. In particular, we prove differential identities involving DAM partition function and macroscopic observables useful for a qualitative and quantitative analysis of the system. These results allow for a deeper comprehension of the mechanisms underlying DAMs and provide interdisciplinary tools for their study.

Submitted 27 March, 2022; originally announced March 2022.

Report number: Roma01.Math

arXiv:2106.08978 [pdf, other]

Pattern recognition in Deep Boltzmann machines

Authors: Elena Agliari, Linda Albanese, Francesco Alemanno, Alberto Fachechi

Abstract: We consider a multi-layer Sherrington-Kirkpatrick spin-glass as a model for deep restricted Boltzmann machines and we solve for its quenched free energy, in the thermodynamic limit and allowing for a first step of replica symmetry breaking. This result is accomplished rigorously exploiting interpolating techniques and recovering the expression already known for the replica-symmetry case. Further, we drop the restriction constraint by introducing intra-layer connections among spins and we show that the resulting system can be mapped into a modular Hopfield network, which is also addressed rigorously via interpolating techniques up to the first step of replica symmetry breaking.

Submitted 16 June, 2021; originally announced June 2021.

Comments: 24 pages, 2 figures

Report number: Roma01.Math

Journal ref: Journal of Physics A: Mathematical and Theoretical, Volume 54, Number 50 (2021)

arXiv:2103.13116 [pdf, other]

doi 10.1007/s10955-021-02747-9

PDE/statistical mechanics duality: relation between Guerra's interpolated $p$-spin ferromagnets and the Burgers hierarchy

Authors: Alberto Fachechi

Abstract: We examine the duality relating the equilibrium dynamics of the mean-field $p$-spin ferromagnets at finite size in Guerra's interpolation scheme and the Burgers hierarchy. In particular, we prove that - for fixed $p$ - the expectation value of the order parameter on the first side w.r.t. the generalized partition function satisfies the $(p-1)$-th element in the aforementioned class of nonlinear equations. In the light of this duality, we interpret the phase transitions in the thermodynamic limit of the statistical mechanics model with the development of shock waves on the PDE side. We also obtain the solutions for the $p$-spin ferromagnets at fixed $N$, allowing us to easily generate specific solutions of the corresponding equation in the Burgers hierarchy. Finally, we obtain an effective description of the finite-$N$ equilibrium dynamics of the $p=2$ model with some standard tools on the PDE side.

Submitted 24 March, 2021; originally announced March 2021.

arXiv:1912.00666 [pdf, other]

doi 10.1088/1751-8121/ab6943

Interpolating between boolean and extremely high noisy patterns through Minimal Dense Associative Memories

Authors: Francesco Alemanno, Martino Centonze, Alberto Fachechi

Abstract: Recently, Hopfield and Krotov introduced the concept of {\em dense associative memories} [DAM] (close to spin-glasses with $P$-wise interactions in a disordered statistical mechanical jargon): they proved a number of remarkable features these networks share and suggested their use to (partially) explain the success of the new generation of Artificial Intelligence. Thanks to a remarkable ante-litteram analysis by Baldi & Venkatesh, among these properties, it is known these networks can handle a maximal amount of stored patterns $K$ scaling as $K \sim N^{P-1}$. In this paper, after introducing a {\em minimal dense associative network} as one of the most elementary cost-functions falling in this class of DAM, we sacrifice this high-load regime (namely, we force the storage of {\em solely} a linear amount of patterns, i.e. $K = \alpha N$ with $\alpha>0$) to prove that, in this regime, these networks can correctly perform pattern recognition even if pattern signal is $O(1)$ and is embedded in a sea of noise $O(\sqrt{N})$, also in the large $N$ limit. To prove this statement, by extremizing the quenched free-energy of the model over its natural order-parameters (the various magnetizations and overlaps), we derived its phase diagram, at the replica symmetric level of description and in the thermodynamic limit: as a sideline, we stress that, to achieve this task, aiming at cross-fertilization among disciplines, we follow the two main routes in the statistical mechanics of spin glasses, namely the replica trick and the interpolation technique. Both the approaches reach the same conclusion: there is a non-empty region, in the noise-$T$ vs load-$\alpha$ phase diagram plane, where these networks can actually work in this challenging regime; in particular we obtained a quite high critical (linear) load in the (fast) noiseless case resulting in $\lim_{\beta\to \infty}\alpha_c(\beta)=0.65$.

Submitted 2 December, 2019; originally announced December 2019.

Comments: 18 pages, 2 figures

arXiv:1911.12707 [pdf, other]

Generalized Guerra's interpolation schemes for dense associative neural networks

Authors: Elena Agliari, Francesco Alemanno, Adriano Barra, Alberto Fachechi

Abstract: In this work we develop analytical techniques to investigate a broad class of associative neural networks set in the high-storage regime. These techniques translate the original statistical-mechanical problem into an analytical-mechanical one which implies solving a set of partial differential equations, rather than tackling the canonical probabilistic route. We test the method on the classical Hopfield model - where the cost function includes only two-body interactions (i.e., quadratic terms) - and on the "relativistic" Hopfield model - where the (expansion of the) cost function includes $p$-body (i.e., of degree $p$) contributions. Under the replica symmetric assumption, we paint the phase diagrams of these models by obtaining the explicit expression of their free energy as a function of the model parameters (i.e., noise level and memory storage). Further, since for non-pairwise models ergodicity breaking is not necessarily a critical phenomenon, we develop a fluctuation analysis and find that criticality is preserved in the relativistic model.

Submitted 16 April, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

Report number: Roma01.Math

arXiv:1911.12689 [pdf, other]

doi 10.1103/PhysRevLett.124.028301

Neural networks with redundant representation: detecting the undetectable

Authors: Elena Agliari, Francesco Alemanno, Adriano Barra, Martino Centonze, Alberto Fachechi

Abstract: We consider a three-layer Sejnowski machine and show that features learnt via contrastive divergence have a dual representation as patterns in a dense associative memory of order $P=4$. The latter is known to be able to Hebbian-store an amount of patterns scaling as $N^{P-1}$, where $N$ denotes the number of constituting binary neurons interacting $P$-wisely. We also prove that, by keeping the dense associative network far from the saturation regime (namely, allowing for a number of patterns scaling only linearly with $N$, while $P>2$) such a system is able to perform pattern recognition far below the standard signal-to-noise threshold. In particular, a network with $P=4$ is able to retrieve information whose intensity is $O(1)$ even in the presence of a noise $O(\sqrt{N})$ in the large $N$ limit. This striking skill stems from a redundancy representation of patterns -- which is afforded given the (relatively) low-load information storage -- and it helps explain the impressive abilities in pattern recognition exhibited by new-generation neural networks. The whole theory is developed rigorously, at the replica symmetric level of approximation, and corroborated by signal-to-noise analysis and Monte Carlo simulations.

Submitted 28 November, 2019; originally announced November 2019.

Report number: Roma01.Math

Journal ref: Phys. Rev. Lett. 124, 028301 (2020)

arXiv:1812.09077 [pdf, other]

doi 10.1088/1742-5468/ab371d

Dreaming neural networks: rigorous results

Authors: Elena Agliari, Francesco Alemanno, Adriano Barra, Alberto Fachechi

Abstract: Recently a daily routine for associative neural networks has been proposed: the network Hebbian-learns during the awake state (thus behaving as a standard Hopfield model), then, during its sleep state, optimizing information storage, it consolidates pure patterns and removes spurious ones: this forces the synaptic matrix to collapse to the projector one (ultimately approaching the Kanter-Sompolinsky model). This procedure keeps the learning Hebbian-based (a biological must) but, by taking advantage of a (properly stylized) sleep phase, still reaches the maximal critical capacity (for symmetric interactions). So far this emerging picture (as well as the bulk of papers on unlearning techniques) was supported solely by mathematically-challenging routes, e.g. mainly replica-trick analysis and numerical simulations: here we rely extensively on Guerra's interpolation techniques developed for neural networks and, in particular, we extend the generalized stochastic stability approach to the case. Confining our description within the replica symmetric approximation (where the previous ones lie), the picture painted regarding this generalization (and the previously existing variations on theme) is here entirely confirmed. Further, still relying on Guerra's schemes, we develop a systematic fluctuation analysis to check where ergodicity is broken (an analysis entirely absent in previous investigations). We find that, as long as the network is awake, ergodicity is bounded by the Amit-Gutfreund-Sompolinsky critical line (as it should), but, as the network sleeps, sleeping destroys spin glass states by extending both the retrieval as well as the ergodic region: after an entire sleeping session the only surviving regions are retrieval and ergodic ones and this allows the network to achieve the perfect retrieval regime (the number of storable patterns equals the number of neurons in the network).

Submitted 21 December, 2018; originally announced December 2018.

Report number: Roma01.Math

arXiv:1811.08298 [pdf, other]

A novel derivation of the Marchenko-Pastur law through analog bipartite spin-glasses

Authors: Elena Agliari, Francesco Alemanno, Adriano Barra, Alberto Fachechi

Abstract: In this work we consider the {\em analog bipartite spin-glass} (or {\em real-valued restricted Boltzmann machine} in a neural network jargon), whose variables (those quenched as well as those dynamical) share standard Gaussian distributions. First, via Guerra's interpolation technique, we express its quenched free energy in terms of the natural order parameters of the theory (namely the self- and two-replica overlaps), then, we re-obtain the same result by using the replica-trick: a mandatory tribute, given the special occasion. Next, we show that the quenched free energy of this model is the functional generator of the moments of the correlation matrix among the weights connecting the two layers of the spin-glass (i.e., the Wishart matrix in random matrix theory or the Hebbian coupling in neural networks): as weights are quenched stochastic variables, this plays as a novel tool to inspect random matrices. In particular, we find that the Stieltjes transform of the spectral density of the correlation matrix is determined by the (replica-symmetric) quenched free energy of the bipartite spin-glass model. In this setup, we re-obtain the Marchenko-Pastur law in a very simple way.

Submitted 20 November, 2018; originally announced November 2018.

Report number: Roma01.Math
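
The Marchenko-Pastur statement in the abstract above can be sanity-checked numerically through its lowest moments. The following is a minimal sketch in pure Python, not taken from the paper: it assumes a K x N matrix xi of i.i.d. standard Gaussian weights and the correlation matrix C = xi xi^T / N, for which the Marchenko-Pastur law with ratio alpha = K/N predicts a first spectral moment of 1 and a second spectral moment of 1 + alpha. The function name, sizes and seed are illustrative choices.

```python
import random


def wishart_moments(K, N, seed=0):
    """Return the first two empirical spectral moments of C = xi xi^T / N.

    xi is a K x N matrix of i.i.d. standard Gaussian entries (an assumed,
    illustrative setup). The moments are Tr(C)/K and Tr(C^2)/K.
    """
    rng = random.Random(seed)
    xi = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(K)]
    # K x K correlation matrix among the K rows of xi.
    C = [[sum(xi[a][i] * xi[b][i] for i in range(N)) / N for b in range(K)]
         for a in range(K)]
    m1 = sum(C[a][a] for a in range(K)) / K
    # C is symmetric, so Tr(C^2) equals the sum of its squared entries.
    m2 = sum(C[a][b] ** 2 for a in range(K) for b in range(K)) / K
    return m1, m2


K, N = 60, 240
alpha = K / N
m1, m2 = wishart_moments(K, N)
print("first moment :", m1, "(Marchenko-Pastur value: 1)")
print("second moment:", m2, "(Marchenko-Pastur value:", 1 + alpha, ")")
```

At finite size the second moment carries a correction of order 1/N, so only approximate agreement with 1 + alpha should be expected.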

arXiv:1810.12217 [pdf, other]

Dreaming neural networks: forgetting spurious memories and reinforcing pure ones

Authors: Alberto Fachechi, Elena Agliari, Adriano Barra

Abstract: The standard Hopfield model for associative neural networks accounts for biological Hebbian learning and acts as the harmonic oscillator for pattern recognition, however its maximal storage capacity is $\alpha\sim 0.14$, far from the theoretical bound for symmetric networks, i.e. $\alpha=1$. Inspired by sleeping and dreaming mechanisms in mammalian brains, we propose an extension of this model displaying the standard on-line (awake) learning mechanism (that allows the storage of external information in terms of patterns) and an off-line (sleep) unlearning & consolidating mechanism (that allows spurious-pattern removal and pure-pattern reinforcement): this obtained daily prescription is able to saturate the theoretical bound $\alpha=1$, remaining also extremely robust against thermal noise. Both neural and synaptic features are analyzed both analytically and numerically. In particular, beyond obtaining a phase diagram for neural dynamics, we focus on synaptic plasticity and we give explicit prescriptions on the temporal evolution of the synaptic matrix. We analytically prove that our algorithm makes the Hebbian kernel converge with high probability to the projection matrix built over the pure stored patterns. Furthermore, we obtain a sharp and explicit estimate for the "sleep rate" in order to ensure such a convergence. Finally, we run extensive numerical simulations (mainly Monte Carlo sampling) to check the approximations underlying the analytical investigations (e.g., we developed the whole theory at the so called replica-symmetric level, as standard in the Amit-Gutfreund-Sompolinsky reference framework) and possible finite-size effects, finding overall full agreement with the theory.

Submitted 29 October, 2018; originally announced October 2018.

Comments: 31 pages, 12 figures

Report number: Roma01.Math
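
For orientation, the standard (awake) Hopfield model that the abstract above takes as its starting point can be sketched in a few lines of pure Python. This is only an illustration under assumed settings (N = 200 neurons, K = 5 random binary patterns, zero-temperature asynchronous dynamics, 10% of the entries flipped in the cue); it implements the plain Hebbian rule, not the sleep/unlearning mechanism proposed by the authors, and all names are hypothetical.

```python
import random


def hebbian_couplings(patterns):
    """Hebbian rule J_ij = (1/N) * sum_mu xi_i^mu xi_j^mu, with zero diagonal."""
    N = len(patterns[0])
    J = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if i != j:
                J[i][j] = sum(p[i] * p[j] for p in patterns) / N
    return J


def retrieve(J, state, rng, sweeps=5):
    """Zero-temperature asynchronous dynamics: align each spin with its local field."""
    N = len(state)
    s = list(state)
    for _ in range(sweeps):
        order = list(range(N))
        rng.shuffle(order)
        for i in order:
            field = sum(J[i][j] * s[j] for j in range(N))
            s[i] = 1 if field >= 0 else -1
    return s


def overlap(a, b):
    """Normalized overlap (Mattis magnetization) between two +/-1 configurations."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
N, K = 200, 5  # low load, K/N well below the ~0.14 capacity quoted in the abstract
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
J = hebbian_couplings(patterns)

# Corrupt the first pattern by flipping exactly 10% of its entries.
flipped = set(rng.sample(range(N), N // 10))
noisy = [-x if i in flipped else x for i, x in enumerate(patterns[0])]

final = retrieve(J, noisy, rng)
m_initial = overlap(noisy, patterns[0])
m_final = overlap(final, patterns[0])
print("overlap before retrieval:", m_initial)
print("overlap after retrieval :", m_final)
```

At this low load the dynamics is expected to clean up the corrupted cue, so the final overlap should be close to 1; at loads near the capacity limit the same code would show retrieval degrading.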

arXiv:1801.01743 [pdf, other]

A relativistic extension of Hopfield neural networks via the mechanical analogy

Authors: Adriano Barra, Matteo Beccaria, Alberto Fachechi

Abstract: We propose a modification of the cost function of the Hopfield model whose salient features shine in its Taylor expansion and result in more than pairwise interactions with alternate signs, suggesting a unified framework for handling both deep learning and network pruning. In our analysis, we heavily rely on the Hamilton-Jacobi correspondence relating the statistical model with a mechanical system. In this picture, our model is nothing but the relativistic extension of the original Hopfield model (whose cost function is a quadratic form in the Mattis magnetization which mimics the non-relativistic Hamiltonian for a free particle). We focus on the low-storage regime and solve the model analytically by taking advantage of the mechanical analogy, thus obtaining a complete characterization of the free energy and the associated self-consistency equations in the thermodynamic limit. On the numerical side, we test the performances of our proposal with MC simulations, showing that the stability of spurious states (limiting the capabilities of the standard Hebbian construction) is appreciably reduced due to the presence of unlearning contributions in this extended framework.

Submitted 5 January, 2018; originally announced January 2018.

Showing 1–17 of 17 results for author: Fachechi, A