-
Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training
Authors:
Tony Bonnaire,
Raphaël Urfin,
Giulio Biroli,
Marc Mézard
Abstract:
Diffusion models have achieved remarkable success across a wide range of generative tasks. A key challenge is understanding the mechanisms that prevent them from memorizing their training data and allow them to generalize. In this work, we investigate the role of the training dynamics in the transition from generalization to memorization. Through extensive experiments and theoretical analysis, we identify two distinct timescales: an early time $\tau_\mathrm{gen}$ at which models begin to generate high-quality samples, and a later time $\tau_\mathrm{mem}$ beyond which memorization emerges. Crucially, we find that $\tau_\mathrm{mem}$ increases linearly with the training set size $n$, while $\tau_\mathrm{gen}$ remains constant. This creates a window of training times, growing with $n$, in which models generalize effectively, even though they show strong memorization if training continues beyond it. It is only when $n$ becomes larger than a model-dependent threshold that overfitting disappears at infinite training times. These findings reveal a form of implicit dynamical regularization in the training dynamics, which allows models to avoid memorization even in highly overparameterized settings. Our results are supported by numerical experiments with standard U-Net architectures on realistic and synthetic datasets, and by a theoretical analysis using a tractable random features model studied in the high-dimensional limit.
Submitted 23 May, 2025;
originally announced May 2025.
-
A Differentiable Rank-Based Objective For Better Feature Learning
Authors:
Krunoslav Lehman Pavasovic,
David Lopez-Paz,
Giulio Biroli,
Levent Sagun
Abstract:
In this paper, we leverage existing statistical methods to better understand feature learning from data. We tackle this by modifying the model-free variable selection method Feature Ordering by Conditional Independence (FOCI), introduced by Azadkia and Chatterjee (2021). While FOCI is based on a non-parametric coefficient of conditional dependence, we introduce its parametric, differentiable approximation. With this approximate coefficient, we present a new algorithm called difFOCI, which is applicable to a wider range of machine learning problems thanks to its differentiable nature and learnable parameters. We present difFOCI in three contexts: (1) as a variable selection method with baseline comparisons to FOCI, (2) as a trainable model parametrized with a neural network, and (3) as a generic, widely applicable neural network regularizer, one that improves feature learning with better management of spurious correlations. We evaluate difFOCI on increasingly complex problems ranging from basic variable selection in toy examples to saliency map comparisons in convolutional networks. We then show how difFOCI can be incorporated in the context of fairness to facilitate classification without relying on sensitive data.
Submitted 13 February, 2025;
originally announced February 2025.
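FOCI rests on the rank-based coefficient of conditional dependence of Azadkia and Chatterjee. As a hedged illustration of why such coefficients are not differentiable, here is a minimal sketch of Chatterjee's simpler, unconditional rank correlation $\xi_n(X, Y)$ from the same line of work. It is neither FOCI itself nor the difFOCI approximation, and it assumes no ties. The sort and rank steps (`np.argsort`) are piecewise constant in the data, which is the obstacle a parametric, differentiable surrogate has to remove.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n(X, Y), assuming no ties in x or y.

    Not FOCI and not difFOCI: an illustration of a rank-based dependence
    coefficient whose sort/rank steps have zero gradient almost everywhere.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    order = np.argsort(x)                       # rearrange the pairs so that x is increasing
    y_sorted = y[order]
    # r_i = number of j with y_j <= y_sorted[i], i.e. the 1-based rank of y_sorted[i]
    r = np.argsort(np.argsort(y_sorted)) + 1
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
print(chatterjee_xi(x, x**2))                       # Y a non-monotone function of X: close to 1
print(chatterjee_xi(x, rng.standard_normal(1000)))  # independent Y: close to 0
```

Unlike Pearson correlation, $\xi_n$ approaches 1 whenever $Y$ is a measurable function of $X$, even a non-monotone one such as $X^2$.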
-
Classifier-Free Guidance: From High-Dimensional Analysis to Generalized Guidance Forms
Authors:
Krunoslav Lehman Pavasovic,
Jakob Verbeek,
Giulio Biroli,
Marc Mezard
Abstract:
Classifier-Free Guidance (CFG) is a widely adopted technique in diffusion and flow-based generative models, enabling high-quality conditional generation. A key theoretical challenge is characterizing the distribution induced by CFG, particularly in high-dimensional settings relevant to real-world data. Previous works have shown that CFG modifies the target distribution, steering samples towards a distribution that is sharper than the target and shifted towards the boundary of the class. In this work, we provide a high-dimensional analysis of CFG, showing that these distortions vanish as the data dimension grows. We present a blessing-of-dimensionality result demonstrating that in sufficiently high (and in infinite) dimensions, CFG accurately reproduces the target distribution. Using our high-dimensional theory, we show that there is a large family of guidance schemes enjoying this property, in particular non-linear generalizations of CFG. We study a simple non-linear power-law version, for which we demonstrate improved robustness, sample fidelity and diversity. Our findings are validated with experiments on class-conditional and text-to-image generation using state-of-the-art diffusion and flow-matching models.
Submitted 22 May, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Optimizing Noise Schedules of Generative Models in High Dimensions
Authors:
Santiago Aranguri,
Giulio Biroli,
Marc Mezard,
Eric Vanden-Eijnden
Abstract:
Recent works have shown that diffusion models can undergo phase transitions, which must be resolved to generate samples accurately. This has motivated the use of different noise schedules, the two most common choices being referred to as variance preserving (VP) and variance exploding (VE). Here we revisit these schedules within the framework of stochastic interpolants. Using the Gaussian Mixture (GM) and Curie-Weiss (CW) data distributions as test-case models, we first investigate the effect of the variance of the initial noise distribution and show that VP recovers the low-level feature (the distribution of each mode) but misses the high-level feature (the asymmetry between modes), whereas VE does the opposite. We also show that this dichotomy, which arises when denoising by a constant amount at each step, can be avoided by using noise schedules specific to VP and VE that allow for the recovery of both high- and low-level features. Finally, we show that these schedules yield generative models for the GM and CW models whose probability flow ODE can be discretized using $\Theta_d(1)$ steps in dimension $d$ instead of the $\Theta_d(\sqrt{d})$ steps required by constant denoising.
Submitted 1 January, 2025;
originally announced January 2025.
-
Effective Energy, Interactions And Out Of Equilibrium Nature Of Scalar Active Matter
Authors:
Antonin Brossollet,
Etienne Lempereur,
Stéphane Mallat,
Giulio Biroli
Abstract:
Estimating the effective energy $E_\text{eff}$ of a stationary probability distribution is a challenge for non-equilibrium steady states. Its solution could offer a novel framework for describing and analyzing non-equilibrium systems. In this work, we address this issue within the context of scalar active matter, focusing on the continuum field theory of Active Model B+. We show that the Wavelet Conditional Renormalization Group (WCRG) method allows us to estimate the effective energy of Active Model B+ from samples obtained by numerical simulations. We investigate the qualitative changes of $E_\text{eff}$ as the activity level increases. Our key finding is that in the regimes corresponding to low activity and to standard phase separation, the interactions in $E_\text{eff}$ are short-ranged, whereas for strong activity the interactions become long-ranged and lead to micro-phase separation. By analyzing the violation of the fluctuation-dissipation theorem and entropy production patterns, which are directly accessible within the WCRG framework, we connect the emergence of these long-range interactions to the non-equilibrium nature of the steady state. This connection highlights the interplay between activity, the range of the interactions and the fundamental properties of non-equilibrium systems.
Submitted 19 December, 2024;
originally announced December 2024.
-
Non-reciprocal spin-glass transition and aging
Authors:
Giulia Garcia Lorenzana,
Ada Altieri,
Giulio Biroli,
Michel Fruchart,
Vincenzo Vitelli
Abstract:
Disordered systems generically exhibit aging and a glass transition. Previous studies have long suggested that non-reciprocity tends to destroy glassiness. Here, we show that this is not always the case using a bipartite spherical Sherrington-Kirkpatrick model that describes the antagonistic coupling between two identical complex agents modeled as macroscopic spin glasses. Our dynamical mean-field theory calculations reveal an exceptional-point-mediated transition from a static disordered phase to an oscillating amorphous phase, as well as non-reciprocal aging with slow dynamics and oscillations.
Submitted 30 August, 2024;
originally announced August 2024.
-
Kernel Density Estimators in Large Dimensions
Authors:
Giulio Biroli,
Marc Mézard
Abstract:
This paper studies Kernel Density Estimation for a high-dimensional distribution $\rho(x)$. Traditional approaches have focused on the limit of a large number of data points $n$ at fixed dimension $d$. We analyze instead the regime where both the number $n$ of data points $y_i$ and their dimensionality $d$ grow with a fixed ratio $\alpha=(\log n)/d$. Our study reveals three distinct statistical regimes for the kernel-based estimate of the density $\hat\rho_h^{\mathcal{D}}(x)=\frac{1}{n h^d}\sum_{i=1}^n K\left(\frac{x-y_i}{h}\right)$, depending on the bandwidth $h$. For large bandwidth there is a classical regime, akin to the one found in traditional approaches, where the Central Limit Theorem (CLT) holds. Below a certain value of the bandwidth, $h_{CLT}(\alpha)$, the CLT breaks down: the statistics of $\hat\rho_h^{\mathcal{D}}(x)$ for a fixed $x$ drawn from $\rho(x)$ are given by a heavy-tailed distribution (an alpha-stable distribution). In particular, below a value $h_G(\alpha)$, $\hat\rho_h^{\mathcal{D}}(x)$ is governed by extreme value statistics: only a few points in the dataset matter, and they give the dominant contribution to the density estimator. We provide a detailed analysis for high-dimensional multivariate Gaussian data. We show that the optimal bandwidth threshold based on the Kullback-Leibler divergence lies in the new statistical regime identified in this paper. As practitioners know, when the bandwidth is decreased, a kernel density estimate changes from a smooth curve to a collection of peaks centred on the data points. Our findings reveal that this general phenomenon is related to sharp transitions between phases characterized by different statistical properties, and offer new insights for kernel density estimation in high-dimensional settings.
Submitted 18 October, 2024; v1 submitted 11 August, 2024;
originally announced August 2024.
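The estimator $\hat\rho_h^{\mathcal{D}}(x)=\frac{1}{n h^d}\sum_i K\left(\frac{x-y_i}{h}\right)$ can be evaluated directly. Below is a minimal sketch, assuming a standard Gaussian kernel and Gaussian data $\rho = \mathcal{N}(0, I_d)$ (illustrative choices, not necessarily those of the paper). It works in log space because the kernel terms underflow at large $d$. The helper `largest_term_share` is a hypothetical diagnostic that reports how much of the sum comes from the single nearest data point. It only illustrates qualitatively the "few points dominate" behaviour at small $h$ and does not compute $h_{CLT}(\alpha)$ or $h_G(\alpha)$.

```python
import numpy as np

def log_kde(x, data, h):
    """log of rho_hat_h(x) = 1/(n h^d) * sum_i K((x - y_i)/h).

    Assumes a standard Gaussian kernel K(u) = (2 pi)^(-d/2) exp(-|u|^2 / 2).
    """
    n, d = data.shape
    u = (x - data) / h                                    # (n, d) rescaled displacements
    log_k = -0.5 * np.sum(u**2, axis=1) - 0.5 * d * np.log(2 * np.pi)
    m = log_k.max()                                       # log-sum-exp avoids underflow at large d
    return m + np.log(np.exp(log_k - m).sum()) - np.log(n) - d * np.log(h)

def largest_term_share(x, data, h):
    """Hypothetical diagnostic: fraction of the kernel sum due to the largest single term."""
    log_k = -0.5 * np.sum(((x - data) / h) ** 2, axis=1)
    w = np.exp(log_k - log_k.max())
    return w.max() / w.sum()

rng = np.random.default_rng(0)
d, alpha = 50, 0.1
n = int(np.exp(alpha * d))           # fixed ratio alpha = (log n) / d
data = rng.standard_normal((n, d))   # samples y_i from rho = N(0, I_d)
x = rng.standard_normal(d)           # query point drawn from rho
for h in (10.0, 1.0, 0.1):
    print(h, log_kde(x, data, h), largest_term_share(x, data, h))
```

For large $h$ every point contributes roughly $1/n$ of the sum, whereas for small $h$ the share of the nearest point approaches 1, consistent with the extreme-value picture described above.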
-
Quantum Thermalization via Travelling Waves
Authors:
Antonio Picano,
Giulio Biroli,
Marco Schirò
Abstract:
Isolated quantum many-body systems that thermalize under their own dynamics are expected to act as their own thermal baths, thereby bringing their local subsystems to thermal equilibrium. Here we show that the infinite-dimensional limit of a quantum lattice model, as described by Dynamical Mean-Field Theory (DMFT), provides a natural framework to understand this self-consistent thermalization process. Using the Fermi-Hubbard model as a working example, we demonstrate that the emergence of a self-consistent bath thermalizing the system is characterized by a sharp thermalization front, moving ballistically and separating the initial condition from the long-time thermal fixed point. We characterize the full DMFT dynamics through an effective temperature for which we derive a travelling-wave equation of the Fisher-Kolmogorov-Petrovsky-Piskunov (FKPP) type. This equation allows us to predict the asymptotic shape of the front and its velocity, which match the full DMFT numerics perfectly. Our results provide a new angle to understand the onset of quantum thermalization in closed isolated systems.
Submitted 16 December, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
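The effective-temperature equation derived in the paper is not reproduced here. As a hedged illustration of what a travelling front "of FKPP type" means, the sketch integrates the textbook FKPP equation $\partial_t u = D\,\partial_x^2 u + r\,u(1-u)$. For this equation, a front invading the unstable state $u=0$ from a steep initial condition is known to move asymptotically at $v = 2\sqrt{rD}$, with finite-time measurements converging only slowly to that value. The function name, grid, time step and measurement window are illustrative assumptions.

```python
import numpy as np

def fkpp_front_speed(D=1.0, r=1.0, L=400.0, dx=0.5, dt=0.01, t1=100.0, t2=150.0):
    """Integrate u_t = D u_xx + r u (1 - u) with explicit Euler finite differences
    and return the mean speed of the u = 1/2 level set between times t1 and t2."""
    if dt > dx**2 / (2 * D):
        raise ValueError("explicit scheme is unstable: need dt <= dx^2 / (2 D)")
    x = np.arange(0.0, L, dx)
    u = np.where(x < 10.0, 1.0, 0.0)      # step initial condition: stable state u = 1 on the left

    def step(u):
        lap = np.empty_like(u)
        lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
        lap[0] = 2.0 * (u[1] - u[0]) / dx**2       # zero-flux (Neumann) boundaries
        lap[-1] = 2.0 * (u[-2] - u[-1]) / dx**2
        return u + dt * (D * lap + r * u * (1.0 - u))

    def front(u):
        return x[np.argmax(u < 0.5)]               # first grid point where u falls below 1/2

    n1, n2 = round(t1 / dt), round(t2 / dt)
    for _ in range(n1):
        u = step(u)
    x1 = front(u)
    for _ in range(n2 - n1):
        u = step(u)
    return (front(u) - x1) / (t2 - t1)

v = fkpp_front_speed()
print(v, 2.0 * np.sqrt(1.0 * 1.0))   # measured speed vs. asymptotic 2*sqrt(r*D)
```

The measured speed should come out close to $2\sqrt{rD}$; the front position is resolved only to one grid spacing, so the estimate carries an error of order `dx / (t2 - t1)`.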
-
Cascade of phase transitions in the training of Energy-based models
Authors:
Dimitrios Bachtis,
Giulio Biroli,
Aurélien Decelle,
Beatriz Seoane
Abstract:
In this paper, we investigate the feature encoding process in a prototypical energy-based generative model, the Restricted Boltzmann Machine (RBM). We start with an analytical investigation using simplified architectures and data structures, and end with a numerical analysis of real training runs on real datasets. Our study tracks the evolution of the model's weight matrix through its singular value decomposition, revealing a series of phase transitions associated with the progressive learning of the principal modes of the empirical probability distribution. The model first learns the center of mass of the modes and then progressively resolves all modes through a cascade of phase transitions. We first describe this process in a controlled setup in which the training dynamics can be studied analytically. We then validate our theoretical results by training the Bernoulli-Bernoulli RBM on real datasets. By using datasets of increasing dimension, we show that learning indeed leads to sharp phase transitions in the high-dimensional limit. Moreover, we propose and test a mean-field finite-size scaling hypothesis. This shows that the first phase transition is in the same universality class as the one we studied analytically, which is reminiscent of the mean-field paramagnetic-to-ferromagnetic phase transition.
Submitted 10 February, 2025; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Normalizing flows as an enhanced sampling method for atomistic supercooled liquids
Authors:
Gerhard Jung,
Giulio Biroli,
Ludovic Berthier
Abstract:
Normalizing flows can transform a simple prior probability distribution into a more complex target distribution. Here, we evaluate the ability and efficiency of generative machine learning methods to sample the Boltzmann distribution of an atomistic model for glass-forming liquids. This is a notoriously difficult task, as it amounts to ergodically exploring the complex free energy landscape of a disordered and frustrated many-body system. We optimize a normalizing flow model to successfully transform high-temperature configurations of a dense liquid into low-temperature ones, near the glass transition. We perform a detailed comparative analysis with established enhanced sampling techniques developed in the physics literature to assess and rank the performance of normalizing flows against state-of-the-art algorithms. We demonstrate that machine learning methods are very promising, showing a large speedup over conventional molecular dynamics. Normalizing flows show performance comparable to parallel tempering and population annealing, while still falling far behind the swap Monte Carlo algorithm. Our study highlights the potential of generative machine learning models in scientific computing for complex systems, but also points to some of their current limitations and the need for further improvement.
Submitted 13 September, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
From Zero to Hero: How local curvature at artless initial conditions leads away from bad minima
Authors:
Tony Bonnaire,
Giulio Biroli,
Chiara Cammarota
Abstract:
We provide an analytical study of the evolution of the Hessian during gradient descent dynamics, and relate a transition in its spectral properties to the ability to find good minima. We focus on the phase retrieval problem as a case study for complex loss landscapes. We first characterize the high-dimensional limit where both the number $M$ and the dimension $N$ of the data go to infinity at fixed signal-to-noise ratio $\alpha = M/N$. For small $\alpha$, the Hessian is uninformative with respect to the signal. For $\alpha$ larger than a critical value, the Hessian displays at short times a downward direction pointing towards good minima. While descending, a transition in the spectrum takes place: the direction is lost and the system gets trapped in bad minima. Hence, the local landscape is benign and informative at first, before gradient descent brings the system into an uninformative maze. Through both theoretical analysis and numerical experiments, we show that this dynamical transition plays a crucial role for finite (even very large) $N$: it allows the system to recover the signal well before the algorithmic threshold corresponding to the $N\rightarrow\infty$ limit. Our analysis sheds light on this new mechanism that facilitates gradient descent dynamics in finite dimensions, and highlights the importance of a good initialization based on spectral properties for optimization in complex high-dimensional landscapes.
Submitted 23 September, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Dynamical Regimes of Diffusion Models
Authors:
Giulio Biroli,
Tony Bonnaire,
Valentin de Bortoli,
Marc Mézard
Abstract:
Using statistical physics methods, we study generative diffusion models in the regime where the dimension of space and the number of data are large, and the score function has been trained optimally. Our analysis reveals three distinct dynamical regimes during the backward generative diffusion process. The generative dynamics, starting from pure noise, first encounters a 'speciation' transition where the gross structure of data is unraveled, through a mechanism similar to symmetry breaking in phase transitions. It is followed at a later time by a 'collapse' transition where the trajectories of the dynamics become attracted to one of the memorized data points, through a mechanism similar to condensation in a glass phase. For any dataset, the speciation time can be found from a spectral analysis of the correlation matrix, and the collapse time can be found from the estimation of an 'excess entropy' in the data. The dependence of the collapse time on the dimension and number of data provides a thorough characterization of the curse of dimensionality for diffusion models. Analytical solutions for simple models like high-dimensional Gaussian mixtures substantiate these findings and provide a theoretical framework, while extensions to more complex scenarios and numerical validations with real datasets confirm the theoretical predictions.
Submitted 28 February, 2024;
originally announced February 2024.
-
Yielding and plasticity in amorphous solids
Authors:
Ludovic Berthier,
Giulio Biroli,
M. Lisa Manning,
Francesco Zamponi
Abstract:
The physics of disordered media, from metallic glasses to colloidal suspensions, granular matter and biological tissues, offers difficult challenges because it often occurs far from equilibrium, in materials lacking symmetries and evolving through complex energy landscapes. Here, we review recent theoretical efforts to provide microscopic insights into the mechanical properties of amorphous media using approaches from statistical mechanics as unifying frameworks. We cover both the initial regime corresponding to small deformations, and the yielding transition marking a change between elastic response and plastic flow. We discuss the specific features arising for systems evolving near a jamming transition, and extend our discussion to recent studies of the rheology of dense biological and active materials.
Submitted 17 January, 2024;
originally announced January 2024.
-
Dynamical Facilitation Governs the Equilibration Dynamics of Glasses
Authors:
Rahul N. Chacko,
François P. Landes,
Giulio Biroli,
Olivier Dauchot,
Andrea J. Liu,
David R. Reichman
Abstract:
Convincing evidence of domain growth in the heating of ultrastable glasses suggests that the equilibration dynamics of super-cooled liquids could be driven by a nucleation and growth mechanism. We investigate this possibility by simulating the equilibration dynamics of a model glass during both heating and cooling between poorly and well-annealed states. Though we do observe the growth of domains during heating, we find that domains are absent during cooling. This absence is inconsistent with classical nucleation theory. By comparing the equilibration dynamics of our glass with that of two models with kinetic constraints, we demonstrate that dynamical facilitation generically leads to heating driven by domain growth and cooling without domains. Our results provide strong evidence that dynamical facilitation, not nucleation and interfacial-tension-driven domain growth, is the driving mechanism for the equilibration dynamics of glass-formers.
Submitted 10 May, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Large-deviation analysis of rare resonances for the Many-Body localization transition
Authors:
Giulio Biroli,
Alexander K. Hartmann,
Marco Tarzia
Abstract:
A central theoretical issue in current research on many-body localization (MBL) is characterizing the statistics of rare long-range resonances in many-body eigenstates. This is of paramount importance to understand: (i) the critical properties of the MBL transition and the mechanism for its destabilization through quantum avalanches; (ii) the unusual transport and anomalously slow out-of-equilibrium relaxation when the transition is approached from the metallic side. In order to study and characterize such long-range rare resonances, we develop a large-deviations approach based on an analogy with the physics of directed polymers in random media, and in particular with their freezing glass transition on infinite-dimensional graphs. The basic idea is to enlarge the parameter space by adding an auxiliary parameter (which plays the role of the inverse temperature in the directed polymer formulation) that allows us to fine-tune the effect of anomalously large outliers in the far tails of the probability distributions of the transmission amplitudes between far-away many-body configurations in the Hilbert space. We first benchmark our approach on two non-interacting paradigmatic toy models, namely the single-particle Anderson model on the (loop-less) Cayley tree and the Rosenzweig-Porter random matrix ensemble, and then apply it to the study of a class of disordered quantum spin chains in a transverse field. This analysis shows the existence of a broad disorder range in which rare, long-distance resonances, which may form only for a few specific realizations of the disorder and a few specific choices of the random initial state, destabilize the MBL phase, while the genuine MBL transition is shifted to much larger values of the disorder than originally thought.
Submitted 22 December, 2023;
originally announced December 2023.
-
Ductile-to-brittle transition and yielding in soft amorphous materials: perspectives and open questions
Authors:
Thibaut Divoux,
Elisabeth Agoritsas,
Stefano Aime,
Catherine Barentin,
Jean-Louis Barrat,
Roberto Benzi,
Ludovic Berthier,
Dapeng Bi,
Giulio Biroli,
Daniel Bonn,
Philippe Bourrianne,
Mehdi Bouzid,
Emanuela Del Gado,
Hélène Delanoë-Ayari,
Kasra Farain,
Suzanne Fielding,
Matthias Fuchs,
Jasper van der Gucht,
Silke Henkes,
Maziyar Jalaal,
Yogesh M. Joshi,
Anaël Lemaître,
Robert L. Leheny,
Sébastien Manneville,
Kirsten Martens
, et al. (15 additional authors not shown)
Abstract:
Soft amorphous materials are viscoelastic solids ubiquitously found around us, from clays and cementitious pastes to emulsions and physical gels encountered in food or biomedical engineering. Under an external deformation, these materials undergo a noteworthy transition from a solid to a liquid state that reshapes the material microstructure. This yielding transition was the main theme of a workshop held from January 9 to 13, 2023 at the Lorentz Center in Leiden. The manuscript presented here offers a critical perspective on the subject, synthesizing insights from the various brainstorming sessions and informal discussions that unfolded during this week of vibrant exchange of ideas. The result of these exchanges takes the form of a series of open questions that represent outstanding experimental, numerical, and theoretical challenges to be tackled in the near future.
Submitted 21 December, 2023;
originally announced December 2023.
-
Mean-Field Analysis of the Glassy Dynamics of an Elastoplastic Model of Super-Cooled Liquids
Authors:
Joseph W. Baron,
Giulio Biroli
Abstract:
We present a mean-field theory of a coarse-grained model of a super-cooled liquid in which relaxation occurs via local plastic rearrangements. Local relaxation can be induced by thermal fluctuations or by the long-range elastic consequences of other rearrangements. We extract the temperature dependence of both the relaxation time and the lengthscale of dynamical correlations. We find two dynamical regimes. First, a regime in which the characteristic time and length scales diverge as a power law at a critical temperature $T_c$. This regime is obtained within an approximation that neglects activated relaxation channels, and can be interpreted as akin to the mode-coupling transition of glasses. In reality, only a crossover takes place at $T_c$: the residual plastic activity leads to a second regime, characterised by an Arrhenius law below $T_c$. In this regime, we show that the lengthscale governing dynamical correlations diverges as a power law as $T\to 0$ and is logarithmically related to the relaxation time.
Submitted 6 December, 2023;
originally announced December 2023.
-
Curvature-driven pathways interpolating between stationary points: the case of the pure spherical 3-spin model
Authors:
Alessandro Pacco,
Giulio Biroli,
Valentina Ros
Abstract:
This paper focuses on characterizing the energy profile along pathways connecting different regions of configuration space in the context of a prototypical glass model, the pure spherical $p$-spin model with $p=3$. The study investigates pairs of stationary points (local minima or rank-1 saddles), analyzing the energy profile along geodesic paths and comparing them with "perturbed" pathways correlated to the landscape curvature. The goal is to assess the extent to which information from the local Hessian matrices around stationary points can identify paths with lower energy barriers. Surprisingly, unlike findings in other systems, the direction of softest local curvature is not a reliable predictor of low-energy paths, except in the case in which the direction of softest curvature corresponds to an isolated mode of the Hessian. However, other information encoded in the local Hessian does allow the identification of pathways associated with lower energy barriers. We conclude by commenting on implications for the system's activated dynamics.
Submitted 30 November, 2023;
originally announced November 2023.
-
Roadmap on machine learning glassy dynamics
Authors:
Gerhard Jung,
Rinske M. Alkemade,
Victor Bapst,
Daniele Coslovich,
Laura Filion,
François P. Landes,
Andrea Liu,
Francesco Saverio Pezzicoli,
Hayato Shiba,
Giovanni Volpe,
Francesco Zamponi,
Ludovic Berthier,
Giulio Biroli
Abstract:
Unraveling the connections between microscopic structure, emergent physical properties, and slow dynamics has long been a challenge when studying the glass transition. The absence of clearly visible structural order in amorphous configurations complicates the identification of the key physical mechanisms underpinning slow dynamics. The difficulty in sampling equilibrated configurations at low temperatures hampers thorough numerical and theoretical investigations. This perspective article explores the potential of machine learning (ML) techniques to face these challenges, building on the algorithms that have revolutionized computer vision and image recognition. We present recent successful ML applications, as well as many open problems for the future, such as transferability and interpretability of ML approaches. We highlight new ideas and directions in which ML could provide breakthroughs to better understand the fundamental mechanisms at play in glass-forming liquids. To foster a collaborative community effort, this article also introduces the "GlassBench" dataset, providing simulation data and benchmarks for both two-dimensional and three-dimensional glass-formers. We propose critical metrics to compare the performance of emerging ML methodologies, in line with benchmarking practices in image and text recognition. The goal of this roadmap is to provide guidelines for the development of ML techniques in systems displaying slow dynamics, while inspiring new directions to improve our theoretical understanding of glassy liquids.
Submitted 26 September, 2024; v1 submitted 23 November, 2023;
originally announced November 2023.
-
On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions
Authors:
Simon Martin,
Francis Bach,
Giulio Biroli
Abstract:
We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost in a teacher-student setup. In line with previous works on the same neural architecture, the optimization is performed following the gradient flow on the population risk, where the average over data points is replaced by the expectation over their distribution, assumed to be Gaussian. We first derive convergence properties for the gradient flow and quantify the overparameterization that is necessary to achieve a strong signal recovery. Then, assuming that the teachers and the students at initialization form independent orthonormal families, we derive a high-dimensional limit for the flow and show that the minimal overparameterization is sufficient for strong recovery. We verify by numerical experiments that these results hold for more general initializations.
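For Gaussian inputs the population risk of such a network can be written in closed form, which is what makes gradient flow on it tractable. The sketch below is a minimal illustration under our own assumptions, not the paper's conventions: the student computes $\sum_k (w_k\cdot x)^2$ with no second-layer weights or normalization, the teacher is encoded in a matrix $T$, and the flow is discretized by forward Euler. It relies on the identity $\mathbb{E}[(x^\top D x)^2] = 2\,\mathrm{tr}(D^2) + (\mathrm{tr}\,D)^2$ for symmetric $D$ and $x\sim\mathcal{N}(0,I_d)$; the sizes and step are hypothetical.

```python
import numpy as np

def population_risk(W, T):
    """E_x[(x^T W^T W x - x^T T x)^2] for x ~ N(0, I_d), in closed form.

    For symmetric D, x^T D x has mean tr(D) and variance 2 tr(D^2).
    """
    D = W.T @ W - T
    return 2.0 * np.trace(D @ D) + np.trace(D) ** 2

def risk_gradient(W, T):
    """Gradient of population_risk with respect to the student weights W (m x d)."""
    D = W.T @ W - T
    G = 4.0 * D + 2.0 * np.trace(D) * np.eye(D.shape[0])  # dR/dS, with S = W^T W
    return 2.0 * W @ G                                     # chain rule through S = W^T W

def gradient_flow(W0, T, dt=1e-3, steps=5000):
    """Forward-Euler discretization of dW/dt = -grad R(W)."""
    W = W0.copy()
    for _ in range(steps):
        W = W - dt * risk_gradient(W, T)
    return W

rng = np.random.default_rng(0)
d, m_teacher, m_student = 10, 2, 4            # hypothetical sizes; m_student > m_teacher
V = rng.standard_normal((m_teacher, d)) / np.sqrt(d)
T = V.T @ V                                   # teacher: f*(x) = sum_k (v_k . x)^2
W0 = rng.standard_normal((m_student, d)) / np.sqrt(d)
W = gradient_flow(W0, T)
print(population_risk(W0, T), population_risk(W, T))
```

In this toy version, varying `m_student` relative to `m_teacher` plays the role of the overparameterization.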
Submitted 7 November, 2023;
originally announced November 2023.
-
Dynamic heterogeneity at the experimental glass transition predicted by transferable machine learning
Authors:
Gerhard Jung,
Giulio Biroli,
Ludovic Berthier
Abstract:
We develop a transferable machine learning model which predicts structural relaxation from amorphous supercooled liquid structures. The trained networks are able to predict dynamic heterogeneity across a broad range of temperatures and time scales with excellent accuracy and transferability. We use the network transferability to predict dynamic heterogeneity down to the experimental glass transition temperature, $T_g$, where structural relaxation cannot be analyzed using molecular dynamics simulations. The results indicate that the strength, the geometry and the characteristic length scale of the dynamic heterogeneity evolve much more slowly near $T_g$ compared to their evolution at higher temperatures. Our results show that machine learning techniques can provide physical insights on the nature of the glass transition that cannot be gained using conventional simulation techniques.
Submitted 31 October, 2023;
originally announced October 2023.
-
Shear-Induced Phase Behavior and Topological Defects in Two-Dimensional Crystals
Authors:
Federico Ghimenti,
Misaki Ozawa,
Giulio Biroli,
Gilles Tarjus
Abstract:
We investigate through numerical simulations how a two-dimensional crystal yields and flows under an applied shear. We focus on a parameter range that allows us both to address the response in the limit of an infinitesimal shear rate and to describe the phase behavior of the system at a finite shear rate. In doing so, we carefully discuss the role of the topological defects and of the finite-size effects. We map out the whole phase diagram of the flowing steady state in the plane formed by temperature and shear rate. Shear-induced melting of the two-dimensional crystal is found to proceed in two steps: first, the solid loses long-range bond-orientational order and flows, even for an infinitesimal shear rate (in the thermodynamic limit). The resulting flowing hexatic phase then melts to a flowing, rather isotropic, liquid at a finite shear rate that depends on temperature. Finally, at a high shear rate, a third regime corresponding to a strongly anisotropic string-like flowing phase appears.
Submitted 8 October, 2023;
originally announced October 2023.
-
Far-from-equilibrium criticality in the Random Field Ising Model with Eshelby Interactions
Authors:
Saverio Rossi,
Giulio Biroli,
Misaki Ozawa,
Gilles Tarjus
Abstract:
We study a quasi-statically driven random field Ising model (RFIM) at zero temperature with interactions mediated by the long-range anisotropic Eshelby kernel. Analogously to amorphous solids at their yielding transition, and differently from ferromagnetic and dipolar RFIMs, the model shows a discontinuous magnetization jump associated with the appearance of a band-like structure for weak disorder and a continuous magnetization growth, yet punctuated by avalanches, for strong disorder. Through a finite-size scaling analysis in 2 and 3 dimensions we find that the two regimes are separated by a finite-disorder critical point which we characterize. We discuss similarities and differences between the present model and models of sheared amorphous solids.
Submitted 26 September, 2023;
originally announced September 2023.
-
Interactions and migration rescuing ecological diversity
Authors:
Giulia Garcia Lorenzana,
Ada Altieri,
Giulio Biroli
Abstract:
How diversity is maintained in natural ecosystems is a long-standing question in Theoretical Ecology. By studying a system that combines ecological dynamics, heterogeneous interactions and spatial structure, we uncover a new mechanism for the survival of diversity-rich ecosystems in the presence of demographic fluctuations. For a single species, one finds a continuous phase transition between an extinction and a survival state, that falls into the universality class of Directed Percolation. Here we show that the case of many species with heterogeneous interactions is different and richer. By merging theory and simulations, we demonstrate that with sufficiently strong demographic noise, the system exhibits behavior akin to the single-species case, undergoing a continuous transition. Conversely, at low demographic noise, we observe unique features indicative of the ecosystem's complexity. The combined effects of the heterogeneity in the interaction network and migration enable the community to thrive, even in situations where demographic noise would lead to the extinction of isolated species. The emergence of mutualism induces the development of global bistability, accompanied by sudden tipping points. We present a way to predict the catastrophic shift from high diversity to extinction by probing responses to perturbations as an early warning signal.
Submitted 5 February, 2024; v1 submitted 18 September, 2023;
originally announced September 2023.
-
High-Dimensional Non-Convex Landscapes and Gradient Descent Dynamics
Authors:
Tony Bonnaire,
Davide Ghio,
Kamesh Krishnamurthy,
Francesca Mignacco,
Atsushi Yamamura,
Giulio Biroli
Abstract:
In these lecture notes we present different methods and concepts developed in statistical physics to analyze gradient descent dynamics in high-dimensional non-convex landscapes. Our aim is to show how approaches developed in physics, mainly statistical physics of disordered systems, can be used to tackle open questions on high-dimensional dynamics in Machine Learning.
Submitted 10 November, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
-
Generative diffusion in very large dimensions
Authors:
Giulio Biroli,
Marc Mézard
Abstract:
Generative models based on diffusion have become the state of the art in the last few years, notably for image generation. Here, we analyse them in the high-dimensional limit, where data are formed by a very large number of variables. We use methods from statistical physics and focus on two well-controlled high-dimensional cases: a Gaussian model and the Curie-Weiss model of ferromagnetism. In the latter case, we highlight the mechanism of symmetry breaking in the inverse diffusion, and point out that, in order to reconstruct the relative asymmetry of the two low-temperature states, and thus to obtain the correct probability weights, one needs a database with a number of points much larger than the dimension of each data point. We characterize the scaling laws in the number of data points and in the number of dimensions for an efficient generation.
Submitted 24 August, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
Scaling Description of Dynamical Heterogeneity and Avalanches of Relaxation in Glass-Forming Liquids
Authors:
Ali Tahaei,
Giulio Biroli,
Misaki Ozawa,
Marko Popović,
Matthieu Wyart
Abstract:
We provide a theoretical description of dynamical heterogeneities in glass-forming liquids, based on the premise that relaxation occurs via local rearrangements coupled by elasticity. In our framework, the growth of the dynamical correlation length $ξ$ and of the correlation volume $χ_4$ are controlled by a zero-temperature fixed point. We connect this critical behavior to the properties of the distribution of local energy barriers at zero temperature. Our description makes a direct connection between dynamical heterogeneities and avalanche-type relaxation associated with dynamic facilitation, allowing us to relate the size distribution of heterogeneities to their time evolution. Within an avalanche, a local region relaxes multiple times, and more so the larger the avalanche. This property, related to the nature of the zero-temperature fixed point, directly leads to the decoupling of particle diffusion and relaxation time (the so-called Stokes-Einstein violation). Our most salient predictions are tested and confirmed by numerical simulations of scalar and tensorial thermal elasto-plastic models.
Submitted 3 August, 2023; v1 submitted 29 April, 2023;
originally announced May 2023.
-
Quenched complexity of equilibria for asymmetric Generalized Lotka-Volterra equations
Authors:
Valentina Ros,
Felix Roy,
Giulio Biroli,
Guy Bunin
Abstract:
We consider the Generalized Lotka-Volterra system of equations with all-to-all, random asymmetric interactions describing high-dimensional, very diverse and well-mixed ecosystems. We analyze the multiple equilibria phase of the model and compute its quenched complexity, i.e., the expected value of the logarithm of the number of equilibria of the dynamical equations. We discuss the resulting distribution of equilibria as a function of their diversity, stability and average abundance. We obtain the quenched complexity by means of the replicated Kac-Rice formalism, and compare the results with the same quantity obtained within the annealed approximation, as well as with the results of the cavity calculation and, in the limit of symmetric interactions, of standard methods to compute the complexity developed in the context of glasses.
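To make the model concrete, here is a minimal numerical sketch of the Generalized Lotka-Volterra dynamics, assuming the common parameterization $dN_i/dt = N_i(1 - N_i - \sum_{j\neq i}\alpha_{ij}N_j)$ with interactions of mean $\mu/S$, standard deviation $\sigma/\sqrt{S}$ and correlation $\gamma$ between $\alpha_{ij}$ and $\alpha_{ji}$ ($\gamma<1$ means asymmetric). These symbols, the parameter values and the forward-Euler integrator are illustrative choices. The sketch only integrates the dynamics; it does not compute the complexity, which is obtained above with the replicated Kac-Rice formalism.

```python
import numpy as np

def random_interactions(S, mu, sigma, gamma, rng):
    """alpha_ij = mu/S + (sigma/sqrt(S)) a_ij, with corr(a_ij, a_ji) = gamma and alpha_ii = 0."""
    z1 = rng.standard_normal((S, S))
    z2 = rng.standard_normal((S, S))
    upper = np.triu(z1, 1)                                              # a_ij for i < j
    lower = gamma * upper.T + np.sqrt(1.0 - gamma ** 2) * np.tril(z2, -1)  # a_ji, unit variance
    alpha = mu / S + sigma / np.sqrt(S) * (upper + lower)
    np.fill_diagonal(alpha, 0.0)            # self-regulation is the explicit -N_i term
    return alpha

def integrate_glv(alpha, N0, dt=0.01, steps=20000):
    """Forward-Euler integration of dN_i/dt = N_i (1 - N_i - sum_j alpha_ij N_j)."""
    N = N0.copy()
    for _ in range(steps):
        growth = 1.0 - N - alpha @ N
        N = np.maximum(N + dt * N * growth, 0.0)   # clip so abundances stay non-negative
    return N

rng = np.random.default_rng(0)
S = 50                                             # hypothetical number of species
alpha = random_interactions(S, mu=1.0, sigma=0.5, gamma=0.0, rng=rng)
N = integrate_glv(alpha, rng.uniform(0.5, 1.5, size=S))
survivors = N > 1e-6                               # crude extinction threshold
print("fraction of surviving species:", survivors.mean())
```

Rerunning from different initial conditions at larger `sigma` is a crude way to probe whether the dynamics settles on different equilibria; it is no substitute for counting them.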
Submitted 25 June, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Generalized Lotka-Volterra equations with random, non-reciprocal interactions: the typical number of equilibria
Authors:
Valentina Ros,
Felix Roy,
Giulio Biroli,
Guy Bunin,
Ari M. Turner
Abstract:
We compute the typical number of equilibria of the Generalized Lotka-Volterra equations describing species-rich ecosystems with random, non-reciprocal interactions using the replicated Kac-Rice method. We characterize the multiple-equilibria phase by determining the average abundance and similarity between equilibria as a function of their diversity (i.e. of the number of coexisting species) and of the variability of the interactions. We show that linearly unstable equilibria are dominant, and that the typical number of equilibria differs from the average number.
Submitted 25 June, 2023; v1 submitted 4 December, 2022;
originally announced December 2022.
-
Predicting dynamic heterogeneity in glass-forming liquids by physics-inspired machine learning
Authors:
Gerhard Jung,
Giulio Biroli,
Ludovic Berthier
Abstract:
We introduce GlassMLP, a machine learning framework using physics-inspired structural input to predict the long-time dynamics in deeply supercooled liquids. We apply this deep neural network to atomistic models in 2D and 3D. Its performance is better than the state of the art while being more parsimonious in terms of training data and fitting parameters. GlassMLP quantitatively predicts four-point dynamic correlations and the geometry of dynamic heterogeneity. Transferability across system sizes allows us to efficiently probe the temperature evolution of spatial dynamic correlations, revealing a profound change with temperature in the geometry of rearranging regions.
Submitted 28 September, 2023; v1 submitted 29 October, 2022;
originally announced October 2022.
-
Elasticity, Facilitation and Dynamic Heterogeneity in Glass-Forming liquids
Authors:
Misaki Ozawa,
Giulio Biroli
Abstract:
We study the role of elasticity-induced facilitation on the dynamics of glass-forming liquids by a coarse-grained two-dimensional model in which local relaxation events, taking place by thermal activation, can trigger new relaxations by long-range elastically-mediated interactions. By simulations and an analytical theory, we show that the model reproduces the main salient facts associated with dynamic heterogeneity and offers a mechanism to explain the emergence of dynamical correlations at the glass transition. We also discuss how it can be generalized and combined with current theories.
Submitted 19 September, 2022;
originally announced September 2022.
-
Dynamical Heterogeneity in Glass-Forming Liquids
Authors:
Giulio Biroli,
Kunimasa Miyazaki,
David R. Reichman
Abstract:
We review the phenomenon of dynamical heterogeneity in glass-forming systems and its description within replica and mean-field theories of the glass transition.
Submitted 6 September, 2022;
originally announced September 2022.
-
The RFOT Theory of Glasses: Recent Progress and Open Issues
Authors:
Giulio Biroli,
Jean-Philippe Bouchaud
Abstract:
The Random First Order Transition (RFOT) theory started with the pioneering work of Kirkpatrick, Thirumalai and Wolynes. It leverages the methods and advances of the theory of disordered systems. It fares remarkably well at reproducing the salient experimental facts of super-cooled liquids. Yet, direct and indisputable experimental validations are missing. In this short survey, we will review recent investigations that broadly support all static aspects of RFOT, but also those for which the standard dynamical extension of the theory appears to be struggling, in particular in relation with facilitation effects. We discuss possible solutions and open issues.
Submitted 11 August, 2022;
originally announced August 2022.
-
Wavelet Conditional Renormalization Group
Authors:
Tanguy Marchand,
Misaki Ozawa,
Giulio Biroli,
Stéphane Mallat
Abstract:
We develop a multiscale approach to estimate high-dimensional probability distributions from a dataset of physical fields or configurations observed in experiments or simulations. In this way we can estimate energy functions (or Hamiltonians) and efficiently generate new samples of many-body systems in various domains, from statistical physics to cosmology. Our method -- the Wavelet Conditional Renormalization Group (WC-RG) -- proceeds scale by scale, estimating models for the conditional probabilities of "fast degrees of freedom" conditioned on coarse-grained fields. These probability distributions are modeled by energy functions associated with scale interactions, and are represented in an orthogonal wavelet basis. WC-RG decomposes the microscopic energy function as a sum of interaction energies at all scales and can efficiently generate new samples by going from coarse to fine scales. Near phase transitions, it avoids the "critical slowing down" of direct estimation and sampling algorithms. This is explained theoretically by combining results from RG and wavelet theories, and verified numerically for the Gaussian and $\varphi^4$ field theories. We show that multiscale WC-RG energy-based models are more general than local potential models and can capture the physics of complex many-body interacting systems at all length scales. This is demonstrated for weak-gravitational-lensing fields reflecting dark matter distributions in cosmology, which include long-range interactions with long-tail probability distributions. WC-RG has a large number of potential applications in non-equilibrium systems, where the underlying distribution is not known a priori. Finally, we discuss the connection between WC-RG and deep network architectures.
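The scale-by-scale split into coarse-grained fields and "fast degrees of freedom" can be illustrated with repeated orthogonal wavelet steps. The sketch below assumes a 1D field and the Haar wavelet (the wavelet and dimension used in WC-RG may differ) and shows only the decomposition and its exact inverse; the conditional energy models fitted at each scale are not included.

```python
import numpy as np

def haar_step(x):
    """One level of the orthogonal Haar transform of a 1D field of even length.

    Returns the coarse-grained field (scaled local sums) and the fast
    detail coefficients (scaled local differences), each half as long.
    """
    even, odd = x[0::2], x[1::2]
    coarse = (even + odd) / np.sqrt(2.0)
    fast = (even - odd) / np.sqrt(2.0)
    return coarse, fast

def haar_inverse(coarse, fast):
    """Exact inverse of haar_step."""
    x = np.empty(2 * coarse.size)
    x[0::2] = (coarse + fast) / np.sqrt(2.0)
    x[1::2] = (coarse - fast) / np.sqrt(2.0)
    return x

def decompose(x, levels):
    """Coarse-grain repeatedly; returns the coarsest field and the fast fields, fine to coarse."""
    fasts = []
    for _ in range(levels):
        x, fast = haar_step(x)
        fasts.append(fast)
    return x, fasts

def reconstruct(coarse, fasts):
    """Go from coarse to fine scales, reinserting the fast fields."""
    x = coarse
    for fast in reversed(fasts):
        x = haar_inverse(x, fast)
    return x

field = np.random.default_rng(0).standard_normal(64)
coarse, fasts = decompose(field, levels=3)
print(coarse.size, [f.size for f in fasts])  # 8 [32, 16, 8]
```

Because the basis is orthogonal, the squared norm of the field splits exactly across scales. In a generative setting, the loop in `reconstruct` would instead sample each fast field conditioned on the current coarse one.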
Submitted 11 July, 2022;
originally announced July 2022.
-
Finite-disorder critical point in the yielding transition of elasto-plastic models
Authors:
Saverio Rossi,
Giulio Biroli,
Misaki Ozawa,
Gilles Tarjus,
Francesco Zamponi
Abstract:
Upon loading, amorphous solids can exhibit brittle yielding, with the abrupt formation of macroscopic shear bands leading to fracture, or ductile yielding, with a multitude of plastic events leading to homogeneous flow. It has been recently proposed, and subsequently questioned, that the two regimes are separated by a sharp critical point, as a function of some control parameter characterizing the intrinsic disorder strength and the degree of stability of the solid. In order to resolve this issue, we have performed extensive numerical simulations of athermally driven elasto-plastic models with long-range and anisotropic realistic interaction kernels in two and three dimensions. Our results provide clear evidence for a finite-disorder critical point separating brittle and ductile yielding, and we provide an estimate of the critical exponents in 2D and 3D.
Submitted 24 November, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
Equilibrium Fluctuations in Mean-field Disordered Models
Authors:
Giampaolo Folena,
Giulio Biroli,
Patrick Charbonneau,
Yi Hu,
Francesco Zamponi
Abstract:
Mean-field models of glasses that present a random first order transition exhibit highly non-trivial fluctuations. Building on previous studies that focused on the critical scaling regime, we here obtain a fully quantitative framework for all equilibrium conditions. By means of the replica method we evaluate Gaussian fluctuations of the overlaps around the thermodynamic limit, decomposing them into thermal fluctuations inside each state and heterogeneous fluctuations between different states. We first test and compare our analytical results with numerical simulation results for the p-spin spherical model and the random orthogonal model, and then analyze the random Lorentz gas. In all cases, a strong quantitative agreement is obtained. Our analysis thus provides a robust scheme for identifying the key finite-size (or finite-dimensional) corrections to the mean-field treatment of these paradigmatic glass models.
Submitted 23 June, 2022; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Optimal learning rate schedules in high-dimensional non-convex optimization problems
Authors:
Stéphane d'Ascoli,
Maria Refinetti,
Giulio Biroli
Abstract:
Learning rate schedules are ubiquitously used to speed up and improve optimisation. Many different policies have been introduced on an empirical basis, and theoretical analyses have been developed for convex settings. However, in many realistic problems the loss-landscape is high-dimensional and non-convex -- a case for which results are scarce. In this paper we present a first analytical study of the role of learning rate scheduling in this setting, focusing on Langevin optimization with a learning rate decaying as $η(t)=t^{-β}$. We begin by considering models where the loss is a Gaussian random function on the $N$-dimensional sphere ($N\rightarrow \infty$), featuring an extensive number of critical points. We find that to speed up optimization without getting stuck in saddles, one must choose a decay rate $β<1$, contrary to convex setups where $β=1$ is generally optimal. We then add to the problem a signal to be recovered. In this setting, the dynamics decompose into two phases: an exploration phase where the dynamics navigate through rough parts of the landscape, followed by a convergence phase where the signal is detected and the dynamics enter a convex basin. In this case, it is optimal to keep a large learning rate during the exploration phase to escape the non-convex region as quickly as possible, then use the convex criterion $β=1$ to converge rapidly to the solution. Finally, we demonstrate that our conclusions hold in a common regression task involving neural networks.
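The update analysed here is Langevin dynamics with a step size $η(t)=t^{-β}$. Below is a minimal sketch on a toy landscape of independent double wells, which is our own stand-in for the Gaussian random function on the sphere; the prefactor `eta0`, the temperature and the loss are assumptions, not the paper's setup.

```python
import numpy as np

def lr_schedule(t, beta, eta0=1.0):
    """Polynomially decaying learning rate eta(t) = eta0 * t**(-beta), for t >= 1."""
    return eta0 * t ** (-beta)

def langevin(grad, theta0, beta, temperature, steps, rng, eta0=0.1):
    """Discrete Langevin dynamics with the decaying step size eta(t)."""
    theta = np.array(theta0, dtype=float)
    for t in range(1, steps + 1):
        eta = lr_schedule(t, beta, eta0)
        noise = rng.standard_normal(theta.shape)
        theta = theta - eta * grad(theta) + np.sqrt(2.0 * eta * temperature) * noise
    return theta

def double_well_grad(theta):
    """Gradient of the toy loss L(theta) = sum_i (theta_i**2 - 1)**2 / 4."""
    return theta * (theta ** 2 - 1.0)

theta0 = 0.1 * np.random.default_rng(0).standard_normal(100)  # start near the unstable point 0
for beta in (0.5, 1.0):
    theta = langevin(double_well_grad, theta0, beta, temperature=0.01,
                     steps=2000, rng=np.random.default_rng(1))
    # mean distance of each coordinate from the nearest minimum at +-1
    print(beta, np.mean(np.abs(np.abs(theta) - 1.0)))
```

One elementary reason the exponent matters: the accumulated step $\sum_{t\le n}η(t)$ grows like $n^{1-β}$ for $β<1$ but only like $\log n$ for $β=1$, so with $β=1$ the dynamics has much less effective time to leave unstable regions.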
Submitted 9 February, 2022;
originally announced February 2022.
-
Artificial selection of communities drives the emergence of structured interactions
Authors:
Jules Fraboul,
Giulio Biroli,
Silvia De Monte
Abstract:
Species-rich communities, such as the microbiota or microbial ecosystems, provide key functions for human health and climatic resilience. Increasing effort is being dedicated to designing experimental protocols for selecting community-level functions of interest. These experiments typically involve selection acting on populations of communities, each of which is composed of multiple species. Although numerical simulations have started to explore the evolutionary dynamics of this complex, multi-scale system, a comprehensive theoretical understanding of the process of artificial selection of communities is still lacking. Here, we propose a general model for the evolutionary dynamics of communities composed of a large number of interacting species, described by disordered generalised Lotka-Volterra equations. Our analytical and numerical results reveal that selection for scalar community functions leads to the emergence, along an evolutionary trajectory, of a low-dimensional structure in an initially featureless interaction matrix. Such structure reflects the combination of the properties of the ancestral community and of the selective pressure. Our analysis determines how the speed of adaptation scales with the system parameters and the abundance distribution of the evolved communities. Artificial selection for larger total abundance is thus shown to drive increased levels of mutualism and interaction diversity. Inference of the interaction matrix is proposed as a method to assess the emergence of structured interactions from experimentally accessible measures.
Submitted 21 June, 2023; v1 submitted 13 December, 2021;
originally announced December 2021.
-
Critical behavior of the Anderson model on the Bethe lattice via a large-deviation approach
Authors:
Giulio Biroli,
Alexander K. Hartmann,
Marco Tarzia
Abstract:
We present a new large-deviation approach to investigate the critical properties of the Anderson model on the Bethe lattice close to the localization transition in the thermodynamic limit. Our method allows us to study accurately the distribution of the local density of states (LDoS) down to probability tails as small as $10^{-50}$, which are completely out of reach for standard numerical techniques. We perform a thorough analysis of the functional form and of the tails of the probability distribution of the LDoS, which yields for the first time a direct, transparent, and precise estimation of the correlation volume close to the Anderson transition. This correlation volume is found to diverge exponentially when the localization transition is approached from the delocalized regime, in a singular way that agrees with the analytic predictions of the supersymmetric treatment.
Submitted 4 October, 2021;
originally announced October 2021.
-
Local dynamical heterogeneity in glass formers
Authors:
Giulio Biroli,
Patrick Charbonneau,
Giampaolo Folena,
Yi Hu,
Francesco Zamponi
Abstract:
We study the local dynamical fluctuations in glass-forming models of particles embedded in $d$-dimensional space, in the mean-field limit of $d\to\infty$. Our analytical calculation reveals that single-particle observables, such as squared particle displacements, display divergent fluctuations around the dynamical (or mode-coupling) transition, due to the emergence of nontrivial correlations between displacements along different directions. This effect notably gives rise to a divergent non-Gaussian parameter, $α_2$. The $d\to\infty$ local dynamics therefore becomes quite rich upon approaching the glass transition. The finite-$d$ remnant of this phenomenon further provides a long sought-after, first-principles explanation for the growth of $α_2$ around the glass transition that is not based on multi-particle correlations.
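For reference, the non-Gaussian parameter of $d$-dimensional displacements is commonly defined as $α_2 = d\langle Δr^4\rangle / ((d+2)\langle Δr^2\rangle^2) - 1$, which reduces to the familiar $3\langle Δr^4\rangle/(5\langle Δr^2\rangle^2) - 1$ for $d = 3$ and vanishes for Gaussian displacements. The sketch below estimates it from sampled displacement vectors; the function names and the synthetic two-population data (a caricature of slow and fast particles) are illustrative assumptions, not data or code from the paper.

```python
import random


def non_gaussian_parameter(displacements):
    """Estimate alpha_2 = d <r^4> / ((d + 2) <r^2>^2) - 1 from d-dimensional displacement vectors."""
    d = len(displacements[0])
    r2 = [sum(x * x for x in dr) for dr in displacements]
    mean_r2 = sum(r2) / len(r2)
    mean_r4 = sum(v * v for v in r2) / len(r2)
    return d * mean_r4 / ((d + 2) * mean_r2 ** 2) - 1.0


def sample_displacements(n, d, rng, slow_fraction=1.0, slow_scale=1.0, fast_scale=1.0):
    """Isotropic Gaussian displacements whose per-particle scale is drawn from a two-state mixture."""
    out = []
    for _ in range(n):
        scale = slow_scale if rng.random() < slow_fraction else fast_scale
        out.append([rng.gauss(0.0, scale) for _ in range(d)])
    return out


rng = random.Random(1)
gaussian = sample_displacements(50_000, 3, rng)
mixture = sample_displacements(50_000, 3, rng, slow_fraction=0.8, slow_scale=0.3, fast_scale=2.0)
print(f"Gaussian: alpha_2 = {non_gaussian_parameter(gaussian):+.3f}")
print(f"mixture:  alpha_2 = {non_gaussian_parameter(mixture):+.3f}")
```

For a mixture of isotropic Gaussians with per-particle scale $s$, the exact value is $\langle s^4\rangle/\langle s^2\rangle^2 - 1$, about 3.2 for the parameters above, so a positive $α_2$ signals heterogeneous mobility rather than any particular mechanism.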
Submitted 6 March, 2022; v1 submitted 24 September, 2021;
originally announced September 2021.
-
On the Dynamics of Liquids in the Large-Dimensional Limit
Authors:
Chen Liu,
Giulio Biroli,
David Reichman,
Grzegorz Szamel
Abstract:
In this work, we analytically derive the exact closed dynamical equations for a liquid with short-ranged interactions in large spatial dimensions using the same statistical mechanics tools employed to analyze Brownian motion. Our derivation greatly simplifies the original path-integral-based route to these equations and provides new insight into the physical features associated with high-dimensional liquids and glass formation. Most importantly, our construction provides a facile route to the exact dynamical analysis of important related dynamical problems, as well as a means to devise cluster generalizations of the exact solution in infinite dimensions. This latter fact opens the door to the construction of increasingly accurate theories of vitrification in three-dimensional liquids.
Submitted 5 August, 2021;
originally announced August 2021.
-
Transformed CNNs: recasting pre-trained convolutional layers with self-attention
Authors:
Stéphane d'Ascoli,
Levent Sagun,
Giulio Biroli,
Ari Morcos
Abstract:
Vision Transformers (ViT) have recently emerged as a powerful alternative to convolutional networks (CNNs). Although hybrid models attempt to bridge the gap between these two architectures, the self-attention layers they rely on induce a strong computational bottleneck, especially at large spatial resolutions. In this work, we explore the idea of reducing the time spent training these layers by initializing them as convolutional layers. This enables us to transition smoothly from any pre-trained CNN to its functionally identical hybrid model, called Transformed CNN (T-CNN). With only 50 epochs of fine-tuning, the resulting T-CNNs demonstrate significant performance gains over the CNN (+2.2% top-1 on ImageNet-1k for a ResNet50-RS) as well as substantially improved robustness (+11% top-1 on ImageNet-C). We analyze the representations learnt by the T-CNN, providing deeper insights into the fruitful interplay between convolutions and self-attention. Finally, we experiment with initializing the T-CNN from a partially trained CNN, and find that it reaches better performance than the corresponding hybrid model trained from scratch, while reducing training time.
Submitted 10 June, 2021;
originally announced June 2021.
-
Effects of intraspecific cooperative interactions in large ecosystems
Authors:
Ada Altieri,
Giulio Biroli
Abstract:
We analyze the role of the Allee effect, a positive correlation between population density and mean individual fitness, for ecological communities formed by a large number of species. Our study is performed using the generalized Lotka-Volterra model with random interactions between species. We obtain the phase diagram and analyze the nature of the multiple equilibria phase. Remarkable differences emerge with respect to the logistic growth case, thus revealing the major role played by the functional response in determining aggregate behaviours of ecosystems.
Submitted 20 May, 2021; v1 submitted 10 May, 2021;
originally announced May 2021.
-
Sifting out the features by pruning: Are convolutional networks the winning lottery ticket of fully connected ones?
Authors:
Franco Pellegrini,
Giulio Biroli
Abstract:
Pruning methods can considerably reduce the size of artificial neural networks without harming their performance. In some cases, they can even uncover sub-networks that, when trained in isolation, match or surpass the test accuracy of their dense counterparts. Here we study the inductive bias that pruning imprints in such "winning lottery tickets". Focusing on visual tasks, we analyze the architecture resulting from iterative magnitude pruning of a simple fully connected network (FCN). We show that the surviving node connectivity is local in input space, and organized in patterns reminiscent of the ones found in convolutional networks (CNN). We investigate the role played by data and tasks in shaping the architecture of pruned sub-networks. Our results show that the winning lottery tickets of FCNs display the key features of CNNs. The ability of such an automatic network-simplifying procedure to recover the key features "hand-crafted" in the design of CNNs suggests interesting applications to other datasets and tasks, in order to discover new and efficient architectural inductive biases.
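Iterative magnitude pruning alternates training with removing the smallest-magnitude surviving weights; in the lottery-ticket procedure the surviving weights are then typically rewound to their initial values before retraining. The sketch below shows only the masking step on a flat list of weights, assuming a fixed per-round pruning fraction; training and rewinding are indicated by comments, and the function name and the 20% rate are illustrative assumptions rather than the settings used in the paper.

```python
import math


def magnitude_prune(weights, mask, fraction):
    """Remove `fraction` of the currently surviving weights with the smallest |w|.

    `weights` and `mask` are flat lists of equal length; mask[i] is True if weight i survives.
    Returns a new mask; the input mask is not modified.
    """
    surviving = [i for i, keep in enumerate(mask) if keep]
    n_prune = int(math.floor(fraction * len(surviving)))
    # Sort surviving indices by magnitude and drop the n_prune smallest.
    to_prune = set(sorted(surviving, key=lambda i: abs(weights[i]))[:n_prune])
    return [keep and i not in to_prune for i, keep in enumerate(mask)]


weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.6, 0.08, -0.15]
mask = [True] * len(weights)
for round_ in range(3):
    # In a full loop one would first train the masked network here,
    # prune, and then rewind the surviving weights to their initial values.
    mask = magnitude_prune(weights, mask, fraction=0.2)
    print(round_, sum(mask), [w if keep else 0.0 for w, keep in zip(weights, mask)])
```

In a fully connected network the mask is applied per layer to the weight matrix; the locality reported above is a property of which input connections survive many such rounds, which this toy list cannot show.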
Submitted 14 May, 2021; v1 submitted 27 April, 2021;
originally announced April 2021.
-
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Authors:
Stéphane d'Ascoli,
Hugo Touvron,
Matthew Leavitt,
Ari Morcos,
Giulio Biroli,
Levent Sagun
Abstract:
Convolutional architectures have proven extremely successful for vision tasks. Their hard inductive biases enable sample-efficient learning, but come at the cost of a potentially lower performance ceiling. Vision Transformers (ViTs) rely on more flexible self-attention layers, and have recently outperformed CNNs for image classification. However, they require costly pre-training on large external datasets or distillation from pre-trained convolutional networks. In this paper, we ask the following question: is it possible to combine the strengths of these two architectures while avoiding their respective limitations? To this end, we introduce gated positional self-attention (GPSA), a form of positional self-attention which can be equipped with a "soft" convolutional inductive bias. We initialise the GPSA layers to mimic the locality of convolutional layers, then give each attention head the freedom to escape locality by adjusting a gating parameter regulating the attention paid to position versus content information. The resulting convolutional-like ViT architecture, ConViT, outperforms the DeiT on ImageNet, while offering a much improved sample efficiency. We further investigate the role of locality in learning by first quantifying how it is encouraged in vanilla self-attention layers, then analysing how it is escaped in GPSA layers. We conclude by presenting various ablations to better understand the success of the ConViT. Our code and models are released publicly at https://github.com/facebookresearch/convit.
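To make the gating idea concrete: a GPSA head mixes a content term $\mathrm{softmax}_j(Q_i K_j^T)$ and a positional term through a gate $σ(λ)$, weighting content by $1 - σ(λ)$ and position by $σ(λ)$, then renormalizes each row. The sketch below is a simplified single-query, one-dimensional version in plain Python. The positional score $-α(j - i - Δ)^2$ is an assumption chosen to mimic the locality-favoring initialization described above (a head attending around a fixed offset $Δ$, like one tap of a convolution kernel); the actual layer operates on 2D patch offsets with learned parameters, and all names here are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax of a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gpsa_row(q_i, keys, i, gate_logit, alpha, center):
    """Attention weights of query i for one gated positional self-attention head (1D sketch).

    content:    softmax_j(q_i . k_j / sqrt(d_head))
    positional: softmax_j(-alpha * (j - i - center)**2), a locality prior peaked at offset `center`
    The gate sigmoid(gate_logit) interpolates between the two; the row is then renormalized.
    """
    scale = math.sqrt(len(q_i))
    content = softmax([sum(a * b for a, b in zip(q_i, k_j)) / scale for k_j in keys])
    positional = softmax([-alpha * (j - i - center) ** 2 for j in range(len(keys))])
    g = sigmoid(gate_logit)
    mixed = [(1.0 - g) * c + g * p for c, p in zip(content, positional)]
    total = sum(mixed)
    return [m / total for m in mixed]


keys = [[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2], [0.0, 0.1], [0.2, -0.3]]
q = [0.5, -0.1]
# Large gate logit: the head behaves like a local, convolution-like filter centred at i + 1.
print([round(w, 3) for w in gpsa_row(q, keys, i=2, gate_logit=20.0, alpha=10.0, center=1)])
# Very negative gate logit: the head falls back to ordinary content-based attention.
print([round(w, 3) for w in gpsa_row(q, keys, i=2, gate_logit=-20.0, alpha=10.0, center=1)])
```

Because the gate is a per-head scalar, training can move each head continuously from the convolution-like regime toward content-based attention, which is the "escape from locality" the abstract refers to.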
Submitted 10 June, 2021; v1 submitted 19 March, 2021;
originally announced March 2021.
-
Revisiting the Concept of Activation in Supercooled Liquids
Authors:
Marco Baity-Jesi,
Giulio Biroli,
David R. Reichman
Abstract:
In this work, we revisit the description of dynamics based on the concepts of metabasins and activation in mildly supercooled liquids via the analysis of the dynamics of a paradigmatic glass former between its onset temperature $T_{o}$ and mode-coupling temperature $T_{c}$. First, we provide measures that demonstrate that the onset of glassiness is indeed connected to the landscape, and that metabasin waiting time distributions are so broad that the system can remain stuck in a metabasin for times that exceed $τ_α$ by orders of magnitude. We then reanalyze the transitions between metabasins, providing several indications that the standard picture of activated dynamics in terms of traps does not hold in this regime. Instead, we propose that here activation is principally driven by entropic rather than energetic barriers. In particular, we illustrate that activation is not controlled by hopping over high energy barriers, and should more properly be interpreted as the entropic selection of nearly barrierless but rare pathways connecting metabasins on the landscape.
Submitted 1 June, 2021; v1 submitted 12 March, 2021;
originally announced March 2021.
-
On the interplay between data structure and loss function in classification problems
Authors:
Stéphane d'Ascoli,
Marylou Gabrié,
Levent Sagun,
Giulio Biroli
Abstract:
One of the central puzzles in modern machine learning is the ability of heavily overparametrized models to generalize well. Although the low-dimensional structure of typical datasets is key to this behavior, most theoretical studies of overparametrization focus on isotropic inputs. In this work, we instead consider an analytically tractable model of structured data, where the input covariance is built from independent blocks allowing us to tune the saliency of low-dimensional structures and their alignment with respect to the target function. Using methods from statistical physics, we derive a precise asymptotic expression for the train and test error achieved by random feature models trained to classify such data, which is valid for any convex loss function. We study in detail how the data structure affects the double descent curve, and show that in the over-parametrized regime, its impact is greater for logistic loss than for mean-squared loss: the easier the task, the wider the gap in performance to the advantage of the logistic loss. Our insights are confirmed by numerical experiments on MNIST and CIFAR10.
Submitted 12 October, 2021; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Elastoplasticity Mediates Dynamical Heterogeneity Below the Mode-Coupling Temperature
Authors:
Rahul N. Chacko,
François P. Landes,
Giulio Biroli,
Olivier Dauchot,
Andrea J. Liu,
David R. Reichman
Abstract:
As liquids approach the glass transition temperature, dynamical heterogeneity emerges as a crucial universal feature of their behavior. Dynamic facilitation, where local motion triggers further motion nearby, plays a major role in this phenomenon. Here we show that long-range, elastically-mediated facilitation appears below the mode-coupling temperature, adding to the short-range component present at all temperatures. Our results suggest deep connections between the supercooled liquid and glass states, and pave the way for a deeper understanding of dynamical heterogeneity in glassy systems.
Submitted 4 March, 2021; v1 submitted 2 March, 2021;
originally announced March 2021.
-
Mean-field caging in a random Lorentz gas
Authors:
Giulio Biroli,
Patrick Charbonneau,
Yi Hu,
Harukuni Ikeda,
Grzegorz Szamel,
Francesco Zamponi
Abstract:
The random Lorentz gas (RLG) is a minimal model of both percolation and glassiness, which leads to a paradox in the infinite-dimensional, $d\rightarrow\infty$ limit: the localization transition is then expected to be continuous for the former and discontinuous for the latter. As a putative resolution, we have recently suggested that as $d$ increases, the behavior of the RLG converges to the glassy description, and that percolation physics is recovered thanks to finite-$d$ perturbative and non-perturbative (instantonic) corrections [Biroli et al. arXiv:2003.11179]. Here, we expand on the $d\rightarrow\infty$ physics by considering a simpler static solution as well as the dynamical solution of the RLG. Comparing the $1/d$ correction of this solution with numerical results reveals that even perturbative corrections fall out of reach of existing theoretical descriptions. Comparing the dynamical solution with the mode-coupling theory (MCT) results further reveals that although key quantitative features of MCT are far off the mark, it does properly capture the discontinuous nature of the $d\rightarrow\infty$ RLG. These insights help chart a path toward a complete description of finite-dimensional glasses.
Submitted 23 February, 2021;
originally announced February 2021.
-
Rare events and disorder control the brittle yielding of well-annealed amorphous solids
Authors:
Misaki Ozawa,
Ludovic Berthier,
Giulio Biroli,
Gilles Tarjus
Abstract:
We use atomistic computer simulations to provide a microscopic description of the brittle failure of amorphous materials, and we assess the role of rare events and quenched disorder. We argue that brittle yielding originates at rare soft regions, similarly to Griffiths effects in disordered systems. We numerically demonstrate how localized plastic events in such soft regions trigger macroscopic failure via the propagation of a shear band. This physical picture, which no longer holds in poorly annealed ductile materials, allows us to discuss the role of finite-size effects in brittle yielding and reinforces the similarities between yielding and other disorder-controlled nonequilibrium phase transitions.
Submitted 11 July, 2022; v1 submitted 10 February, 2021;
originally announced February 2021.