-
Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training
Authors:
Tony Bonnaire,
Raphaël Urfin,
Giulio Biroli,
Marc Mézard
Abstract:
Diffusion models have achieved remarkable success across a wide range of generative tasks. A key challenge is understanding the mechanisms that prevent their memorization of training data and allow generalization. In this work, we investigate the role of the training dynamics in the transition from generalization to memorization. Through extensive experiments and theoretical analysis, we identify two distinct timescales: an early time $τ_\mathrm{gen}$ at which models begin to generate high-quality samples, and a later time $τ_\mathrm{mem}$ beyond which memorization emerges. Crucially, we find that $τ_\mathrm{mem}$ increases linearly with the training set size $n$, while $τ_\mathrm{gen}$ remains constant. This creates a window of training times, growing with $n$, in which models generalize effectively, despite showing strong memorization if training continues beyond it. It is only when $n$ becomes larger than a model-dependent threshold that overfitting disappears at infinite training times. These findings reveal a form of implicit dynamical regularization in the training dynamics, which allows memorization to be avoided even in highly overparameterized settings. Our results are supported by numerical experiments with standard U-Net architectures on realistic and synthetic datasets, and by a theoretical analysis using a tractable random features model studied in the high-dimensional limit.
Submitted 23 May, 2025;
originally announced May 2025.
-
Classifier-Free Guidance: From High-Dimensional Analysis to Generalized Guidance Forms
Authors:
Krunoslav Lehman Pavasovic,
Jakob Verbeek,
Giulio Biroli,
Marc Mezard
Abstract:
Classifier-Free Guidance (CFG) is a widely adopted technique in diffusion and flow-based generative models, enabling high-quality conditional generation. A key theoretical challenge is characterizing the distribution induced by CFG, particularly in high-dimensional settings relevant to real-world data. Previous works have shown that CFG modifies the target distribution, steering it towards a distribution sharper than the target one, more shifted towards the boundary of the class. In this work, we provide a high-dimensional analysis of CFG, showing that these distortions vanish as the data dimension grows. We present a blessing-of-dimensionality result demonstrating that in sufficiently high and infinite dimensions, CFG accurately reproduces the target distribution. Using our high-dimensional theory, we show that there is a large family of guidances enjoying this property, in particular non-linear CFG generalizations. We study a simple non-linear power-law version, for which we demonstrate improved robustness, sample fidelity and diversity. Our findings are validated with experiments on class-conditional and text-to-image generation using state-of-the-art diffusion and flow-matching models.
Submitted 22 May, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Optimizing Noise Schedules of Generative Models in High Dimensions
Authors:
Santiago Aranguri,
Giulio Biroli,
Marc Mezard,
Eric Vanden-Eijnden
Abstract:
Recent works have shown that diffusion models can undergo phase transitions, the resolution of which is needed for accurately generating samples. This has motivated the use of different noise schedules, the two most common choices being referred to as variance preserving (VP) and variance exploding (VE). Here we revisit these schedules within the framework of stochastic interpolants. Using the Gaussian Mixture (GM) and Curie-Weiss (CW) data distributions as test case models, we first investigate the effect of the variance of the initial noise distribution and show that VP recovers the low-level feature (the distribution of each mode) but misses the high-level feature (the asymmetry between modes), whereas VE performs oppositely. We also show that this dichotomy, which happens when denoising by a constant amount in each step, can be avoided by using noise schedules specific to VP and VE that allow for the recovery of both high- and low-level features. Finally we show that these schedules yield generative models for the GM and CW model whose probability flow ODE can be discretized using $Θ_d(1)$ steps in dimension $d$ instead of the $Θ_d(\sqrt{d})$ steps required by constant denoising.
Submitted 1 January, 2025;
originally announced January 2025.
-
How transformers learn structured data: insights from hierarchical filtering
Authors:
Jerome Garnier-Brun,
Marc Mézard,
Emanuele Moscato,
Luca Saglietti
Abstract:
Understanding the learning process and the embedded computation in transformers is becoming a central goal for the development of interpretable AI. In the present study, we introduce a hierarchical filtering procedure for data models of sequences on trees, allowing us to hand-tune the range of positional correlations in the data. Leveraging this controlled setting, we provide evidence that vanilla encoder-only transformers can approximate the exact inference algorithm when trained on root classification and masked language modeling tasks, and study how this computation is discovered and implemented. We find that correlations at larger distances, corresponding to increasing layers of the hierarchy, are sequentially included by the network during training. By comparing attention maps from models trained with varying degrees of filtering and by probing the different encoder levels, we find clear evidence of a reconstruction of correlations on successive length scales corresponding to the various levels of the hierarchy, which we relate to a plausible implementation of the exact inference algorithm within the same architecture.
Submitted 10 June, 2025; v1 submitted 27 August, 2024;
originally announced August 2024.
-
Kernel Density Estimators in Large Dimensions
Authors:
Giulio Biroli,
Marc Mézard
Abstract:
This paper studies Kernel Density Estimation for a high-dimensional distribution $ρ(x)$. Traditional approaches have focused on the limit of large number of data points $n$ and fixed dimension $d$. We analyze instead the regime where both the number $n$ of data points $y_i$ and their dimensionality $d$ grow with a fixed ratio $α=(\log n)/d$. Our study reveals three distinct statistical regimes for the kernel-based estimate of the density $\hat ρ_h^{\mathcal {D}}(x)=\frac{1}{n h^d}\sum_{i=1}^n K\left(\frac{x-y_i}{h}\right)$, depending on the bandwidth $h$: a classical regime for large bandwidth where the Central Limit Theorem (CLT) holds, which is akin to the one found in traditional approaches. Below a certain value of the bandwidth, $h_{CLT}(α)$, we find that the CLT breaks down. The statistics of $\hatρ_h^{\mathcal {D}}(x)$ for a fixed $x$ drawn from $ρ(x)$ is given by a heavy-tailed distribution (an alpha-stable distribution). In particular below a value $h_G(α)$, we find that $\hatρ_h^{\mathcal {D}}(x)$ is governed by extreme value statistics: only a few points in the database matter and give the dominant contribution to the density estimator. We provide a detailed analysis for high-dimensional multivariate Gaussian data. We show that the optimal bandwidth threshold based on Kullback-Leibler divergence lies in the new statistical regime identified in this paper. As known by practitioners, when decreasing the bandwidth a kernel-estimated density changes from a smooth curve to a collection of peaks centred on the data points. Our findings reveal that this general phenomenon is related to sharp transitions between phases characterized by different statistical properties, and offer new insights for Kernel density estimation in high-dimensional settings.
Submitted 18 October, 2024; v1 submitted 11 August, 2024;
originally announced August 2024.
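The estimator written in the abstract above can be sketched numerically. The following is only an illustration under stated assumptions, not the authors' code: the kernel $K$ is taken to be a standard Gaussian, and the function names `log_kde` and `largest_term_fraction` are hypothetical. The second function measures how much of the kernel sum comes from the single closest data point, which is one simple way to see the small-bandwidth behaviour the abstract describes, where only a few points dominate.

```python
import numpy as np


def log_kde(x, data, h):
    """Log of the estimate (1 / (n h^d)) * sum_i K((x - y_i) / h).

    K is taken to be the standard Gaussian kernel, which is an assumption
    of this sketch; the abstract does not fix a kernel.
    """
    n, d = data.shape
    # Exponent of each Gaussian term: -|x - y_i|^2 / (2 h^2)
    exponents = -0.5 * np.sum((data - x) ** 2, axis=1) / h**2
    # Log-sum-exp keeps the sum finite when d is large or h is small
    shift = exponents.max()
    log_sum = shift + np.log(np.sum(np.exp(exponents - shift)))
    # Normalisation: 1/n, 1/h^d, and the Gaussian constant (2 pi)^(-d/2)
    return log_sum - np.log(n) - d * np.log(h) - 0.5 * d * np.log(2.0 * np.pi)


def largest_term_fraction(x, data, h):
    """Share of the kernel sum carried by the single closest data point."""
    exponents = -0.5 * np.sum((data - x) ** 2, axis=1) / h**2
    weights = np.exp(exponents - exponents.max())
    return weights.max() / weights.sum()


if __name__ == "__main__":
    # Synthetic Gaussian data; sizes are arbitrary illustration values
    rng = np.random.default_rng(0)
    d, n = 20, 500
    data = rng.standard_normal((n, d))
    x = rng.standard_normal(d)
    for h in (0.2, 1.0, 5.0):
        print(h, log_kde(x, data, h), largest_term_fraction(x, data, h))
```

Working with the logarithm is a design choice for this sketch: in large dimension the individual kernel terms underflow quickly, so the direct sum would often evaluate to zero.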
-
Dynamical Regimes of Diffusion Models
Authors:
Giulio Biroli,
Tony Bonnaire,
Valentin de Bortoli,
Marc Mézard
Abstract:
Using statistical physics methods, we study generative diffusion models in the regime where the dimension of space and the number of data are large, and the score function has been trained optimally. Our analysis reveals three distinct dynamical regimes during the backward generative diffusion process. The generative dynamics, starting from pure noise, encounters first a 'speciation' transition where the gross structure of data is unraveled, through a mechanism similar to symmetry breaking in phase transitions. It is followed at later time by a 'collapse' transition where the trajectories of the dynamics become attracted to one of the memorized data points, through a mechanism which is similar to the condensation in a glass phase. For any dataset, the speciation time can be found from a spectral analysis of the correlation matrix, and the collapse time can be found from the estimation of an 'excess entropy' in the data. The dependence of the collapse time on the dimension and number of data provides a thorough characterization of the curse of dimensionality for diffusion models. Analytical solutions for simple models like high-dimensional Gaussian mixtures substantiate these findings and provide a theoretical framework, while extensions to more complex scenarios and numerical validations with real datasets confirm the theoretical predictions.
Submitted 28 February, 2024;
originally announced February 2024.
-
Eigenvector Dreaming
Authors:
Marco Benedetti,
Louis Carillo,
Enzo Marinari,
Marc Mézard
Abstract:
Among the performance-enhancing procedures for Hopfield-type networks that implement associative memory, Hebbian Unlearning (or dreaming) stands out for its simplicity and its clear biological interpretation. Yet, it does not easily lend itself to a clear analytical understanding. Here we show how Hebbian Unlearning can be effectively described in terms of a simple evolution of the spectrum and the eigenvectors of the coupling matrix. We use these ideas to design new dreaming algorithms that are effective from a computational point of view, and are analytically far more transparent than the original scheme.
Submitted 9 August, 2023;
originally announced August 2023.
-
The Decimation Scheme for Symmetric Matrix Factorization
Authors:
Francesco Camilli,
Marc Mézard
Abstract:
Matrix factorization is an inference problem that has acquired importance due to its vast range of applications, which go from dictionary learning to recommendation systems and machine learning with deep networks. The study of its fundamental statistical limits represents a true challenge, and despite a decade-long history of efforts in the community, there is still no closed formula able to describe its optimal performance in the case where the rank of the matrix scales linearly with its size. In the present paper, we study this extensive rank problem, extending the alternative 'decimation' procedure that we recently introduced, and carry out a thorough study of its performance. Decimation aims at recovering one column/line of the factors at a time, by mapping the problem into a sequence of neural network models of associative memory at a tunable temperature. Although sub-optimal, decimation has the advantage of being theoretically analyzable. We extend its scope and analysis to two families of matrices. For a large class of compactly supported priors, we show that the replica symmetric free entropy of the neural network models takes a universal form in the low temperature limit. For sparse Ising prior, we show that the storage capacity of the neural network models diverges as sparsity in the patterns increases, and we introduce a simple algorithm based on a ground state search that implements decimation and performs matrix factorization, with no need of an informative initialization.
Submitted 31 July, 2023;
originally announced July 2023.
-
Sparse Representations, Inference and Learning
Authors:
Clarissa Lauditi,
Emanuele Troiani,
Marc Mézard
Abstract:
In recent years statistical physics has proven to be a valuable tool to probe into large dimensional inference problems such as the ones occurring in machine learning. Statistical physics provides analytical tools to study fundamental limitations in their solutions and proposes algorithms to solve individual instances. In these notes, based on the lectures by Marc Mézard in 2022 at the summer school in Les Houches, we will present a general framework that can be used in a large variety of problems with weak long-range interactions, including the compressed sensing problem, or the problem of learning in a perceptron. We shall see how these problems can be studied at the replica symmetric level, using developments of the cavity methods, both as a theoretical tool and as an algorithm.
Submitted 28 June, 2023;
originally announced June 2023.
-
The Exponential Capacity of Dense Associative Memories
Authors:
Carlo Lucibello,
Marc Mézard
Abstract:
Recent generalizations of the Hopfield model of associative memories are able to store a number $P$ of random patterns that grows exponentially with the number $N$ of neurons, $P=\exp(αN)$. Besides the huge storage capacity, another interesting feature of these networks is their connection to the attention mechanism which is part of the Transformer architectures widely applied in deep learning. In this work, we study a generic family of pattern ensembles using a statistical mechanics analysis which gives exact asymptotic thresholds for the retrieval of a typical pattern, $α_1$, and lower bounds for the maximum of the load $α$ for which all patterns can be retrieved, $α_c$, as well as sizes of attraction basins. We discuss in detail the cases of Gaussian and spherical patterns, and show that they display rich and qualitatively different phase diagrams.
Submitted 22 January, 2024; v1 submitted 28 April, 2023;
originally announced April 2023.
-
Matrix factorization with neural networks
Authors:
Francesco Camilli,
Marc Mézard
Abstract:
Matrix factorization is an important mathematical problem encountered in the context of dictionary learning, recommendation systems and machine learning. We introduce a new 'decimation' scheme that maps it to neural network models of associative memory and provide a detailed theoretical analysis of its performance, showing that decimation is able to factorize extensive-rank matrices and to denoise them efficiently. We introduce a decimation algorithm based on ground-state search of the neural network, whose performance matches the theoretical prediction.
Submitted 5 December, 2022;
originally announced December 2022.
-
Perturbative construction of mean-field equations in extensive-rank matrix factorization and denoising
Authors:
Antoine Maillard,
Florent Krzakala,
Marc Mézard,
Lenka Zdeborová
Abstract:
Factorization of matrices where the rank of the two factors diverges linearly with their sizes has many applications in diverse areas such as unsupervised representation learning, dictionary learning or sparse coding. We consider a setting where the two factors are generated from known component-wise independent prior distributions, and the statistician observes a (possibly noisy) component-wise function of their matrix product. In the limit where the dimensions of the matrices tend to infinity, but their ratios remain fixed, we expect to be able to derive closed form expressions for the optimal mean squared error on the estimation of the two factors. However, this remains a very involved mathematical and algorithmic problem. A related, but simpler, problem is extensive-rank matrix denoising, where one aims to reconstruct a matrix with extensive but usually small rank from noisy measurements. In this paper, we approach both these problems using high-temperature expansions at fixed order parameters. This allows us to clarify how previous attempts at solving these problems failed at finding an asymptotically exact solution. We provide a systematic way to derive the corrections to these existing approximations, taking into account the structure of correlations particular to the problem. Finally, we illustrate our approach in detail on the case of extensive-rank matrix denoising. We compare our results with known optimal rotationally-invariant estimators, and show how exact asymptotic calculations of the minimal error can be performed using extensive-rank matrix integrals.
Submitted 8 June, 2022; v1 submitted 17 October, 2021;
originally announced October 2021.
-
Learning curves of generic features maps for realistic datasets with a teacher-student model
Authors:
Bruno Loureiro,
Cédric Gerbelot,
Hugo Cui,
Sebastian Goldt,
Florent Krzakala,
Marc Mézard,
Lenka Zdeborová
Abstract:
Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumptions of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalisation of the model where the teacher and student can act on different spaces, generated with fixed, but generic feature maps. While still solvable in a closed form, this generalization is able to capture the learning curves for a broad range of realistic data sets, thus redeeming the potential of the teacher-student framework. Our contribution is then two-fold: First, we prove a rigorous formula for the asymptotic training loss and generalisation error. Second, we present a number of situations where the learning curve of the model captures that of a realistic data set learned with kernel regression and classification, with out-of-the-box feature maps such as random projections or scattering transforms, or with pre-learned ones - such as the features learned by training multi-layer neural networks. We discuss both the power and the limitations of the framework.
Submitted 14 December, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
-
Epidemic mitigation by statistical inference from contact tracing data
Authors:
Antoine Baker,
Indaco Biazzo,
Alfredo Braunstein,
Giovanni Catania,
Luca Dall'Asta,
Alessandro Ingrosso,
Florent Krzakala,
Fabio Mazza,
Marc Mézard,
Anna Paola Muntoni,
Maria Refinetti,
Stefano Sarao Mannelli,
Lenka Zdeborová
Abstract:
Contact tracing is an essential tool for mitigating the impact of pandemics such as COVID-19. In order to achieve efficient and scalable contact tracing in real time, digital devices can play an important role. While a lot of attention has been paid to analyzing the privacy and ethical risks of the associated mobile applications, so far much less research has been devoted to optimizing their performance and assessing their impact on the mitigation of the epidemic. We develop Bayesian inference methods to estimate the risk that an individual is infected. This inference is based on the list of the individual's recent contacts and their own risk levels, as well as personal information such as results of tests or presence of symptoms. We propose to use probabilistic risk estimation in order to optimize testing and quarantining strategies for the control of an epidemic. Our results show that in some range of epidemic spreading (typically when the manual tracing of all contacts of infected people becomes practically impossible, but before the fraction of infected people reaches the scale where a lock-down becomes unavoidable), this inference of individuals at risk could be an efficient way to mitigate the epidemic. Our approaches translate into fully distributed algorithms that only require communication between individuals who have recently been in contact. Such communication may be encrypted and anonymized and thus compatible with privacy-preserving standards. We conclude that probabilistic risk estimation can enhance the performance of digital contact tracing and should be considered in the currently developed mobile applications.
Submitted 20 September, 2020;
originally announced September 2020.
-
The Gaussian equivalence of generative models for learning with shallow neural networks
Authors:
Sebastian Goldt,
Bruno Loureiro,
Galen Reeves,
Florent Krzakala,
Marc Mézard,
Lenka Zdeborová
Abstract:
Understanding the impact of data structure on the computational tractability of learning is a key challenge for the theory of neural networks. Many theoretical works do not explicitly model training data, or assume that inputs are drawn component-wise independently from some simple probability distribution. Here, we go beyond this simple paradigm by studying the performance of neural networks trained on data drawn from pre-trained generative models. This is possible due to a Gaussian equivalence stating that the key metrics of interest, such as the training and test errors, can be fully captured by an appropriately chosen Gaussian model. We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence. First, we establish rigorous conditions for the Gaussian equivalence to hold in the case of single-layer generative models, as well as deterministic rates for convergence in distribution. Second, we leverage this equivalence to derive a closed set of equations describing the generalisation performance of two widely studied machine learning problems: two-layer neural networks trained using one-pass stochastic gradient descent, and full-batch pre-learned features or kernel methods. Finally, we perform experiments demonstrating how our theory applies to deep, pre-trained generative models. These results open a viable path to the theoretical study of machine learning models with realistic data.
Submitted 21 May, 2021; v1 submitted 25 June, 2020;
originally announced June 2020.
-
Generalisation error in learning with random features and the hidden manifold model
Authors:
Federica Gerace,
Bruno Loureiro,
Florent Krzakala,
Marc Mézard,
Lenka Zdeborová
Abstract:
We study generalised linear regression and classification for a synthetically generated dataset encompassing different problems of interest, such as learning with random features, neural networks in the lazy training regime, and the hidden manifold model. We consider the high-dimensional regime and using the replica method from statistical physics, we provide a closed-form expression for the asymptotic generalisation performance in these problems, valid in both the under- and over-parametrised regimes and for a broad choice of generalised linear model loss functions. In particular, we show how to obtain analytically the so-called double descent behaviour for logistic regression with a peak at the interpolation threshold, we illustrate the superiority of orthogonal against random Gaussian projections in learning with random features, and discuss the role played by correlations in the data generated by the hidden manifold model. Beyond the interest in these particular problems, the theoretical formalism introduced in this manuscript provides a path to further extensions to more complex tasks.
Submitted 20 August, 2020; v1 submitted 21 February, 2020;
originally announced February 2020.
-
Modelling the influence of data structure on learning in neural networks: the hidden manifold model
Authors:
Sebastian Goldt,
Marc Mézard,
Florent Krzakala,
Lenka Zdeborová
Abstract:
Understanding the reasons for the success of deep neural networks trained using stochastic gradient-based methods is a key open problem for the nascent theory of deep learning. The types of data where these networks are most successful, such as images or sequences of speech, are characterised by intricate correlations. Yet, most theoretical work on neural networks does not explicitly model training data, or assumes that elements of each data sample are drawn independently from some factorised probability distribution. These approaches are thus by construction blind to the correlation structure of real-world data sets and their impact on learning in neural networks. Here, we introduce a generative model for structured data sets that we call the hidden manifold model (HMM). The idea is to construct high-dimensional inputs that lie on a lower-dimensional manifold, with labels that depend only on their position within this manifold, akin to a single layer decoder or generator in a generative adversarial network. We demonstrate that learning of the hidden manifold model is amenable to an analytical treatment by proving a "Gaussian Equivalence Property" (GEP), and we use the GEP to show how the dynamics of two-layer neural networks trained using one-pass stochastic gradient descent is captured by a set of integro-differential equations that track the performance of the network at all times. This permits us to analyse in detail how a neural network learns functions of increasing complexity during training, how its performance depends on its size and how it is impacted by parameters such as the learning rate or the dimension of the hidden manifold.
Submitted 3 December, 2020; v1 submitted 25 September, 2019;
originally announced September 2019.
-
High-temperature Expansions and Message Passing Algorithms
Authors:
Antoine Maillard,
Laura Foini,
Alejandro Lage Castellanos,
Florent Krzakala,
Marc Mézard,
Lenka Zdeborová
Abstract:
Improved mean-field techniques are a central theme of statistical physics methods applied to inference and learning. We revisit here some of these methods using high-temperature expansions for disordered systems initiated by Plefka, Georges and Yedidia. We derive the Gibbs free entropy and the subsequent self-consistent equations for a generic class of statistical models with correlated matrices and show in particular that many classical approximation schemes, such as adaptive TAP, Expectation-Consistency, or the approximations behind the Vector Approximate Message Passing algorithm, all rely on the same assumptions, which are also at the heart of high-temperature expansions. We focus on the case of rotationally invariant random coupling matrices in the 'high-dimensional' limit in which the number of samples and the dimension are both large, but with a fixed ratio. This encapsulates many widely studied models, such as Restricted Boltzmann Machines or Generalized Linear Models with correlated data matrices. In this general setting, we show that all the approximation schemes described before are equivalent, and we conjecture that they are exact in the thermodynamic limit in the replica symmetric phases. We achieve this conclusion by resummation of the infinite perturbation series, which generalizes a seminal result of Parisi and Potters. A rigorous derivation of this conjecture is an interesting mathematical challenge. On the way to these conclusions, we uncover several diagrammatical results in connection with free probability and random matrix theory, that are interesting independently of the rest of our work.
Submitted 10 June, 2020; v1 submitted 20 June, 2019;
originally announced June 2019.
-
Multi-Layer Generalized Linear Estimation
Authors:
Andre Manoel,
Florent Krzakala,
Marc Mézard,
Lenka Zdeborová
Abstract:
We consider the problem of reconstructing a signal from multi-layered (possibly) non-linear measurements. Using non-rigorous but standard methods from statistical physics we present the Multi-Layer Approximate Message Passing (ML-AMP) algorithm for computing marginal probabilities of the corresponding estimation problem and derive the associated state evolution equations to analyze its performance. We also give the expression of the asymptotic free energy and the minimal information-theoretically achievable reconstruction error. Finally, we present some applications of this measurement model for compressed sensing and perceptron learning with structured matrices/patterns, and for a simple model of estimation of latent variables in an auto-encoder.
Submitted 24 January, 2017;
originally announced January 2017.
-
Dynamic message-passing equations for models with unidirectional dynamics
Authors:
Andrey Y. Lokhov,
Marc Mézard,
Lenka Zdeborová
Abstract:
Understanding and quantifying the dynamics of disordered out-of-equilibrium models is an important problem in many branches of science. Using the dynamic cavity method on time trajectories, we construct a general procedure for deriving the dynamic message-passing equations for a large class of models with unidirectional dynamics, which includes the zero-temperature random field Ising model, the susceptible-infected-recovered model, and rumor spreading models. We show that unidirectionality of the dynamics is the key ingredient that makes the problem solvable. These equations are applicable to single instances of the corresponding problems with arbitrary initial conditions, and are asymptotically exact for problems defined on locally tree-like graphs. When applied to real-world networks, they generically provide a good analytic approximation of the real dynamics.
Submitted 14 January, 2015; v1 submitted 4 July, 2014;
originally announced July 2014.
-
Phase transitions and sample complexity in Bayes-optimal matrix factorization
Authors:
Yoshiyuki Kabashima,
Florent Krzakala,
Marc Mézard,
Ayaka Sakata,
Lenka Zdeborová
Abstract:
We analyse the matrix factorization problem. Given a noisy measurement of a product of two matrices, the problem is to estimate back the original matrices. It arises in many applications such as dictionary learning, blind matrix calibration, sparse principal component analysis, blind source separation, low rank matrix completion, robust principal component analysis or factor analysis. It is also important in machine learning: unsupervised representation learning can often be studied through matrix factorization. We use the tools of statistical mechanics - the cavity and replica methods - to analyze the achievability and computational tractability of the inference problems in the setting of Bayes-optimal inference, which amounts to assuming that the two matrices have random independent elements generated from some known distribution, and this information is available to the inference algorithm. In this setting, we compute the minimal mean-squared-error achievable in principle in any computational time, and the error that can be achieved by an efficient approximate message passing algorithm. The computation is based on the asymptotic state-evolution analysis of the algorithm. The performance that our analysis predicts, both in terms of the achieved mean-squared-error, and in terms of sample complexity, is extremely promising and motivating for a further development of the algorithm.
Submitted 21 March, 2016; v1 submitted 6 February, 2014;
originally announced February 2014.
-
Inferring the origin of an epidemic with a dynamic message-passing algorithm
Authors:
Andrey Y. Lokhov,
Marc Mézard,
Hiroki Ohta,
Lenka Zdeborová
Abstract:
We study the problem of estimating the origin of an epidemic outbreak -- given a contact network and a snapshot of epidemic spread at a certain time, determine the infection source. Finding the source is important in different contexts of computer or social networks. We assume that the epidemic spread follows the most commonly used susceptible-infected-recovered model. We introduce an inference algorithm based on dynamic message-passing equations, and we show that it leads to significant improvement of performance compared to existing approaches. Importantly, this algorithm remains efficient in the case where one knows the state of only a fraction of nodes.
Submitted 2 July, 2014; v1 submitted 21 March, 2013;
originally announced March 2013.
-
Non-adaptive pooling strategies for detection of rare faulty items
Authors:
Pan Zhang,
Florent Krzakala,
Marc Mézard,
Lenka Zdeborová
Abstract:
We study non-adaptive pooling strategies for detection of rare faulty items. Given a binary sparse N-dimensional signal x, how to construct a sparse binary MxN pooling matrix F such that the signal can be reconstructed from the smallest possible number M of measurements y=Fx? We show that a very low number of measurements is possible for random spatially coupled design of pools F. Our design might find application in genetic screening or compressed genotyping. We show that our results are robust with respect to the uncertainty in the matrix F when some elements are mistaken.
Submitted 1 February, 2013;
originally announced February 2013.
-
Phase Diagram and Approximate Message Passing for Blind Calibration and Dictionary Learning
Authors:
Florent Krzakala,
Marc Mézard,
Lenka Zdeborová
Abstract:
We consider dictionary learning and blind calibration for signals and matrices created from a random ensemble. We study the mean-squared error in the limit of large signal dimension using the replica method and unveil the appearance of phase transitions delimiting impossible, possible-but-hard and possible inference regions. We also introduce an approximate message passing algorithm that asymptotically matches the theoretical performance, and show through numerical tests that it performs very well, for the calibration problem, for tractable system sizes.
Submitted 24 January, 2013;
originally announced January 2013.
-
Compressed Sensing under Matrix Uncertainty: Optimum Thresholds and Robust Approximate Message Passing
Authors:
Florent Krzakala,
Marc Mézard,
Lenka Zdeborová
Abstract:
In compressed sensing one measures sparse signals directly in a compressed form via a linear transform and then reconstructs the original signal. However, it is often the case that the linear transform itself is known only approximately, a situation called matrix uncertainty, and that the measurement process is noisy. Here we present two contributions to this problem: first, we use the replica method to determine the mean-squared error of the Bayes-optimal reconstruction of sparse signals under matrix uncertainty. Second, we consider a robust variant of the approximate message passing algorithm and demonstrate numerically that in the limit of large systems, this algorithm matches the optimal performance in a large region of parameters.
Submitted 5 January, 2013;
originally announced January 2013.
-
Belief Propagation Reconstruction for Discrete Tomography
Authors:
Emmanuelle Gouillart,
Florent Krzakala,
Marc Mezard,
Lenka Zdeborová
Abstract:
We consider the reconstruction of a two-dimensional discrete image from a set of tomographic measurements corresponding to the Radon projection. Assuming that the image has a structure where neighbouring pixels have a larger probability to take the same value, we follow a Bayesian approach and introduce a fast message-passing reconstruction algorithm based on belief propagation. For numerical results, we specialize to the case of binary tomography. We test the algorithm on binary synthetic images with different length scales and compare our results against a more usual convex optimization approach. We investigate the reconstruction error as a function of the number of tomographic measurements, corresponding to the number of projection angles. The belief propagation algorithm turns out to be more efficient than the convex-optimization algorithm, both in terms of recovery bounds for noise-free projections, and in terms of reconstruction quality when moderate Gaussian noise is added to the projections.
Submitted 3 April, 2013; v1 submitted 11 November, 2012;
originally announced November 2012.
-
Compressed Sensing of Approximately-Sparse Signals: Phase Transitions and Optimal Reconstruction
Authors:
Jean Barbier,
Florent Krzakala,
Marc Mézard,
Lenka Zdeborová
Abstract:
Compressed sensing is designed to measure sparse signals directly in a compressed form. However, most signals of interest are only "approximately sparse", i.e. even though the signal contains only a small fraction of relevant (large) components, the other components are not strictly equal to zero, but are only close to zero. In this paper we model the approximately sparse signal with a Gaussian distribution of small components, and we study its compressed sensing with dense random matrices. We use replica calculations to determine the mean-squared error of the Bayes-optimal reconstruction for such signals, as a function of the variance of the small components, the density of large components and the measurement rate. We then use the G-AMP algorithm and we quantify the region of parameters for which this algorithm achieves optimality (for large systems). Finally, we show that in the region where G-AMP with homogeneous measurement matrices is not optimal, a special "seeding" design of a spatially-coupled measurement matrix restores optimality.
Submitted 9 July, 2012;
originally announced July 2012.
-
Probabilistic Reconstruction in Compressed Sensing: Algorithms, Phase Diagrams, and Threshold Achieving Matrices
Authors:
Florent Krzakala,
Marc Mézard,
François Sausset,
Yifan Sun,
Lenka Zdeborová
Abstract:
Compressed sensing is a signal processing method that acquires data directly in a compressed form. This allows one to make fewer measurements than what was considered necessary to record a signal, enabling faster or more precise measurement protocols in a wide range of applications. Using an interdisciplinary approach, we have recently proposed in [arXiv:1109.4424] a strategy that allows compressed sensing to be performed at acquisition rates approaching the theoretical optimal limits. In this paper, we give a more thorough presentation of our approach, and introduce many new results. We present the probabilistic approach to reconstruction and discuss its optimality and robustness. We detail the derivation of the message passing algorithm for reconstruction and expectation maximization learning of signal-model parameters. We further develop the asymptotic analysis of the corresponding phase diagrams with and without measurement noise, for different distributions of signals, and discuss the best possible reconstruction performances regardless of the algorithm. We also present new efficient seeding matrices, test them on synthetic data and analyze their performance asymptotically.
Submitted 18 June, 2012;
originally announced June 2012.
-
Statistical physics-based reconstruction in compressed sensing
Authors:
Florent Krzakala,
Marc Mézard,
François Sausset,
Yifan Sun,
Lenka Zdeborová
Abstract:
Compressed sensing is triggering a major evolution in signal acquisition. It consists in sampling a sparse signal at low rate and later using computational power for its exact reconstruction, so that only the necessary information is measured. Currently used reconstruction techniques are, however, limited to acquisition rates larger than the true density of the signal. We design a new procedure which is able to reconstruct the signal exactly with a number of measurements that approaches the theoretical limit in the limit of large systems. It is based on the joint use of three essential ingredients: a probabilistic approach to signal reconstruction, a message-passing algorithm adapted from belief propagation, and a careful design of the measurement matrix inspired by the theory of crystal nucleation. The performance of this new algorithm is analyzed by statistical physics methods. The obtained improvement is confirmed by numerical studies of several cases.
Submitted 6 June, 2012; v1 submitted 20 September, 2011;
originally announced September 2011.
-
Decimation flows in constraint satisfaction problems
Authors:
Saburo Higuchi,
Marc Mézard
Abstract:
We study hard constraint satisfaction problems with a decimation approach based on message passing algorithms. Decimation induces a renormalization flow in the space of problems, and we exploit the fact that this flow transforms some of the constraints into linear constraints over GF(2). In particular, when the flow hits the subspace of linear problems, one can stop decimation and use Gaussian elimination. We introduce a new decimation algorithm which uses this linear structure and shows a strongly improved performance with respect to the usual decimation methods on some of the hardest locked occupation problems.
Submitted 11 August, 2009;
originally announced August 2009.
-
Susceptibility Propagation for Constraint Satisfaction Problems
Authors:
Saburo Higuchi,
Marc Mézard
Abstract:
We study the susceptibility propagation, a message-passing algorithm to compute correlation functions. It is applied to constraint satisfaction problems and its accuracy is examined. As a heuristic method to find a satisfying assignment, we propose susceptibility-guided decimation where correlations among the variables play an important role. We apply this novel decimation to locked occupation problems, a class of hard constraint satisfaction problems exhibited recently. It is shown that the present method performs better than the standard belief-guided decimation.
Submitted 9 March, 2009;
originally announced March 2009.
-
Constraint satisfaction problems with isolated solutions are hard
Authors:
Lenka Zdeborová,
Marc Mézard
Abstract:
We study the phase diagram and the algorithmic hardness of the random 'locked' constraint satisfaction problems, and compare them to the commonly studied 'non-locked' problems like satisfiability of boolean formulas or graph coloring. The special property of the locked problems is that clusters of solutions are isolated points. This simplifies significantly the determination of the phase diagram, which makes the locked problems particularly appealing from the mathematical point of view. On the other hand we show empirically that the clustered phase of these problems is extremely hard from the algorithmic point of view: the best known algorithms all fail to find solutions. Our results suggest that the easy/hard transition (for currently known algorithms) in the locked problems coincides with the clustering transition. These should thus be regarded as new benchmarks of really hard constraint satisfaction problems.
Submitted 4 December, 2008; v1 submitted 8 October, 2008;
originally announced October 2008.
-
Locked constraint satisfaction problems
Authors:
Lenka Zdeborová,
Marc Mézard
Abstract:
We introduce and study the random "locked" constraint satisfaction problems. When increasing the density of constraints, they display a broad "clustered" phase in which the space of solutions is divided into many isolated points. While the phase diagram can be found easily, these problems, in their clustered phase, are extremely hard from the algorithmic point of view: the best known algorithms all fail to find solutions. We thus propose new benchmarks of really hard optimization problems and provide insight into the origin of their typical hardness.
Submitted 5 September, 2008; v1 submitted 20 March, 2008;
originally announced March 2008.
-
Group Testing with Random Pools: optimal two-stage algorithms
Authors:
Marc Mezard,
Cristina Toninelli
Abstract:
We study Probabilistic Group Testing of a set of $N$ items each of which is defective with probability $p$. We focus on the double limit of small defect probability, $p\ll 1$, and large number of variables, $N\gg 1$, taking either $p\to 0$ after $N\to\infty$ or $p=1/N^β$ with $β\in(0,1/2)$. In both settings the optimal number of tests which are required to identify with certainty the defectives via a two-stage procedure, $\bar T(N,p)$, is known to scale as $Np|\log p|$. Here we determine the sharp asymptotic value of $\bar T(N,p)/(Np|\log p|)$ and construct a class of two-stage algorithms over which this optimal value is attained. This is done by choosing a proper bipartite regular graph (of tests and variable nodes) for the first stage of the detection. Furthermore we prove that this optimal value is also attained on average over a random bipartite graph where all variables have the same degree, while the tests have Poisson-distributed degrees. Finally, we improve the existing upper and lower bounds for the optimal number of tests in the case $p=1/N^β$ with $β\in[1/2,1)$.
Submitted 21 June, 2007;
originally announced June 2007.
-
Geometrical organization of solutions to random linear Boolean equations
Authors:
Thierry Mora,
Marc Mézard
Abstract:
The random XORSAT problem deals with large random linear systems of Boolean variables. The difficulty of such problems is controlled by the ratio of number of equations to number of variables. It is known that in some range of values of this parameter, the space of solutions breaks into many disconnected clusters. Here we study precisely the corresponding geometrical organization. In particular, the distribution of distances between these clusters is computed by the cavity method. This makes it possible to study the 'x-satisfiability' threshold, the critical density of equations where there exist two solutions at a given distance.
Submitted 5 September, 2006;
originally announced September 2006.
-
The number of matchings in random graphs
Authors:
Lenka Zdeborová,
Marc Mézard
Abstract:
We study matchings on sparse random graphs by means of the cavity method. We first show how the method reproduces several known results about maximum and perfect matchings in regular and Erdős-Rényi random graphs. Our main new result is the computation of the entropy, i.e. the leading order of the logarithm of the number of solutions, of matchings with a given size. We derive both an algorithm to compute this entropy for an arbitrary graph with a girth that diverges in the large size limit, and an analytic result for the entropy in regular and Erdős-Rényi random graph ensembles.
Submitted 5 May, 2006; v1 submitted 13 March, 2006;
originally announced March 2006.
-
Landscape of solutions in constraint satisfaction problems
Authors:
Marc Mezard,
Matteo Palassini,
Olivier Rivoire
Abstract:
We present a theoretical framework for characterizing the geometrical properties of the space of solutions in constraint satisfaction problems, together with practical algorithms for studying this structure on particular instances. We apply our method to the coloring problem, for which we obtain the total number of solutions and analyze in detail the distribution of distances between solutions.
Submitted 2 November, 2005; v1 submitted 19 July, 2005;
originally announced July 2005.
-
The theoretical capacity of the Parity Source Coder
Authors:
Stefano Ciliberti,
Marc Mezard
Abstract:
The Parity Source Coder is a protocol for data compression which is based on a set of parity checks organized in a sparse random network. We consider here the case of memoryless unbiased binary sources. We show that the theoretical capacity saturates the Shannon limit at large K. We also find that the first corrections to the leading behavior are exponentially small, so that the behavior at finite K is very close to the optimal one.
Submitted 14 September, 2005; v1 submitted 24 June, 2005;
originally announced June 2005.
-
Pairs of SAT Assignment in Random Boolean Formulae
Authors:
Hervé Daudé,
Marc Mezard,
Thierry Mora,
Riccardo Zecchina
Abstract:
We investigate geometrical properties of the random K-satisfiability problem using the notion of x-satisfiability: a formula is x-satisfiable if there exist two SAT assignments differing in Nx variables. We show the existence of a sharp threshold for this property as a function of the clause density. For large enough K, we prove that there exists a region of clause density, below the satisfiability threshold, where the landscape of Hamming distances between SAT assignments experiences a gap: pairs of SAT assignments exist at small x, and around x=1/2, but they do not exist at intermediate values of x. This result is consistent with the clustering scenario which is at the heart of the recent heuristic analysis of satisfiability using statistical physics analysis (the cavity method), and its algorithmic counterpart (the survey propagation algorithm). The method uses elementary probabilistic arguments (first and second moment methods), and might be useful in other problems of computational and physical interest where similar phenomena appear.
Submitted 19 September, 2007; v1 submitted 2 June, 2005;
originally announced June 2005.
-
Clustering of solutions in the random satisfiability problem
Authors:
M. Mezard,
T. Mora,
R. Zecchina
Abstract:
Using elementary rigorous methods we prove the existence of a clustered phase in the random $K$-SAT problem, for $K\geq 8$. In this phase the solutions are grouped into clusters which are far away from each other. The results are in agreement with previous predictions of the cavity method and give a rigorous confirmation to one of its main building blocks. It can be generalized to other systems of both physical and computational interest.
Submitted 4 April, 2005;
originally announced April 2005.
-
Threshold values of Random K-SAT from the cavity method
Authors:
Stephan Mertens,
Marc Mezard,
Riccardo Zecchina
Abstract:
Using the cavity equations of \cite{mezard:parisi:zecchina:02,mezard:zecchina:02}, we derive the various threshold values for the number of clauses per variable of the random $K$-satisfiability problem, generalizing the previous results to $K \ge 4$. We also give an analytic solution of the equations, and some closed expressions for these thresholds, in an expansion around large $K$. The stability of the solution is also computed. For any $K$, the satisfiability threshold is found to be in the stable region of the solution, which adds further credit to the conjecture that this computation gives the exact satisfiability threshold.
Submitted 24 February, 2005; v1 submitted 12 September, 2003;
originally announced September 2003.
-
Survey propagation: an algorithm for satisfiability
Authors:
A. Braunstein,
M. Mezard,
R. Zecchina
Abstract:
We study the satisfiability of randomly generated formulas formed by $M$ clauses of exactly $K$ literals over $N$ Boolean variables. For a given value of $N$ the problem is known to be most difficult with $α=M/N$ close to the experimental threshold $α_c$ separating the region where almost all formulas are SAT from the region where all formulas are UNSAT. Recent results from a statistical physics analysis suggest that the difficulty is related to the existence of a clustering phenomenon of the solutions when $α$ is close to (but smaller than) $α_c$. We introduce a new type of message-passing algorithm which allows one to efficiently find a satisfiable assignment of the variables in the difficult region. This algorithm is iterative and composed of two main parts. The first is a message-passing procedure which generalizes the usual methods like Sum-Product or Belief Propagation: it passes messages that are surveys over clusters of the ordinary messages. The second part uses the detailed probabilistic information obtained from the surveys in order to fix variables and simplify the problem. Eventually, the simplified problem that remains is solved by a conventional heuristic.
Submitted 4 April, 2006; v1 submitted 4 December, 2002;
originally announced December 2002.
-
Constraint Satisfaction by Survey Propagation
Authors:
A. Braunstein,
M. Mezard,
M. Weigt,
R. Zecchina
Abstract:
Survey Propagation is an algorithm designed for solving typical instances of random constraint satisfiability problems. It has been successfully tested on random 3-SAT and random $G(n,\frac{c}{n})$ graph 3-coloring, in the hard region of the parameter space. Here we provide a generic formalism which applies to a wide class of discrete Constraint Satisfaction Problems.
Submitted 27 September, 2003; v1 submitted 18 December, 2002;
originally announced December 2002.
-
Alternative solutions to diluted p-spin models and XORSAT problems
Authors:
M. Mezard,
F. Ricci-Tersenghi,
R. Zecchina
Abstract:
We derive analytical solutions for p-spin models with finite connectivity at zero temperature. These models are the statistical mechanics equivalent of p-XORSAT problems in theoretical computer science. We give a full characterization of the phase diagram: location of the phase transitions (static and dynamic), together with a description of the clustering phenomenon taking place in configurational space. We use two alternative methods: the cavity approach and a rigorous derivation.
Submitted 19 September, 2002; v1 submitted 4 July, 2002;
originally announced July 2002.