-
Statistical mechanics of extensive-width Bayesian neural networks near interpolation
Authors:
Jean Barbier,
Francesco Camilli,
Minh-Toan Nguyen,
Mauro Pastore,
Rudy Skerk
Abstract:
For three decades statistical mechanics has provided a framework to analyse neural networks. However, the theoretically tractable models, e.g., perceptrons, random features models and kernel machines, or multi-index models and committee machines with few neurons, remained simple compared to those used in applications. In this paper we help reduce the gap between practical networks and their theoretical understanding through a statistical physics analysis of the supervised learning of a two-layer fully connected network with generic weight distribution and activation function, whose hidden layer is large but remains proportional to the input dimension. This makes it more realistic than infinitely wide networks where no feature learning occurs, but also more expressive than narrow ones or those with fixed inner weights. We focus on the Bayes-optimal learning in the teacher-student scenario, i.e., with a dataset generated by another network with the same architecture. We operate around interpolation, where the number of trainable parameters and of data are comparable and feature learning emerges. Our analysis uncovers a rich phenomenology with various learning transitions as the number of data increases. In particular, the more strongly the features (i.e., hidden neurons of the target) contribute to the observed responses, the less data is needed to learn them. Moreover, when the data is scarce, the model only learns non-linear combinations of the teacher weights, rather than "specialising" by aligning its weights with the teacher's. Specialisation occurs only when enough data becomes available, but it can be hard to find for practical training algorithms, possibly due to statistical-to-computational gaps.
Submitted 30 May, 2025;
originally announced May 2025.
-
Information-theoretic reduction of deep neural networks to linear models in the overparametrized proportional regime
Authors:
Francesco Camilli,
Daria Tieplova,
Eleonora Bergamin,
Jean Barbier
Abstract:
We rigorously analyse fully-trained neural networks of arbitrary depth in the Bayesian optimal setting in the so-called proportional scaling regime where the number of training samples and width of the input and all inner layers diverge proportionally. We prove an information-theoretic equivalence between the Bayesian deep neural network model trained from data generated by a teacher with matching architecture, and a simpler model of optimal inference in a generalized linear model. This equivalence enables us to compute the optimal generalization error for deep neural networks in this regime. We thus prove the "deep Gaussian equivalence principle" conjectured in Cui et al. (2023) (arXiv:2302.00375). Our result highlights that in order to escape this "trivialisation" of deep neural networks (in the sense of reduction to a linear model) happening in the strongly overparametrized proportional regime, models trained from much more data have to be considered.
Submitted 6 May, 2025;
originally announced May 2025.
-
Optimal generalisation and learning transition in extensive-width shallow neural networks near interpolation
Authors:
Jean Barbier,
Francesco Camilli,
Minh-Toan Nguyen,
Mauro Pastore,
Rudy Skerk
Abstract:
We consider a teacher-student model of supervised learning with a fully-trained two-layer neural network whose width $k$ and input dimension $d$ are large and proportional. We provide an effective theory for approximating the Bayes-optimal generalisation error of the network for any activation function in the regime of sample size $n$ scaling quadratically with the input dimension, i.e., around the interpolation threshold where the number of trainable parameters $kd+k$ and of data $n$ are comparable. Our analysis tackles generic weight distributions. We uncover a discontinuous phase transition separating a "universal" phase from a "specialisation" phase. In the first, the generalisation error is independent of the weight distribution and decays slowly with the sampling rate $n/d^2$, with the student learning only some non-linear combinations of the teacher weights. In the latter, the error is weight distribution-dependent and decays faster due to the alignment of the student towards the teacher network. We thus unveil the existence of a highly predictive solution near interpolation, which is however potentially hard to find by practical algorithms.
Submitted 1 April, 2025; v1 submitted 30 January, 2025;
originally announced January 2025.
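The teacher-student setup described in the abstract above can be illustrated with a short data-generation sketch. This is not the authors' code: the Gaussian inputs and weights, the tanh activation, the noiseless labels, the $1/\sqrt{d}$ and $1/\sqrt{k}$ normalisations, and the names `gamma` (width ratio $k/d$) and `alpha` (sampling rate $n/d^2$) are all assumptions made for illustration. Only the scalings ($k$ proportional to $d$, $n$ quadratic in $d$, and $kd+k$ trainable parameters) come from the abstract.

```python
import numpy as np

def sample_teacher_student_data(d, gamma=0.5, alpha=1.0, seed=0):
    """Draw a dataset labelled by a random two-layer teacher network.

    Assumed for illustration only: Gaussian inputs and weights, tanh
    activation, noiseless labels, 1/sqrt(d) and 1/sqrt(k) normalisations.
    """
    rng = np.random.default_rng(seed)
    k = int(gamma * d)        # width proportional to the input dimension
    n = int(alpha * d**2)     # sample size quadratic in the input dimension

    W = rng.standard_normal((k, d))   # teacher inner weights
    a = rng.standard_normal(k)        # teacher readout weights
    X = rng.standard_normal((n, d))   # inputs

    hidden = np.tanh(X @ W.T / np.sqrt(d))   # hidden activations, shape (n, k)
    y = hidden @ a / np.sqrt(k)              # responses, shape (n,)

    n_params = k * d + k   # trainable parameters of a student with the same architecture
    return X, y, n_params

X, y, n_params = sample_teacher_student_data(d=40)
print("samples:", X.shape[0], "parameters:", n_params)
```

With these hypothetical settings the number of samples and the number of parameters are of the same order, which is the "near interpolation" regime the abstract refers to.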
-
Information-theoretic limits and approximate message-passing for high-dimensional time series
Authors:
Daria Tieplova,
Samriddha Lahiry,
Jean Barbier
Abstract:
High-dimensional time series appear in many scientific setups, demanding a nuanced approach to model and analyze the underlying dependence structure. Theoretical advancements so far often rely on stringent assumptions regarding the sparsity of the underlying signal. In non-sparse regimes, analyses have primarily focused on linear regression models with the design matrix having independent rows. In this paper, we expand the scope by investigating a high-dimensional time series model wherein the number of features grows proportionally to the number of sampling points, without assuming sparsity in the signal. Specifically, we consider the stochastic regression model and derive a single-letter formula for the normalized mutual information between observations and the signal, as well as for minimum mean-square errors. We also empirically study the vector approximate message passing (VAMP) algorithm and show that, despite the lack of theoretical guarantees, its performance for inference in our time series model is robust and often statistically optimal.
Submitted 19 March, 2025; v1 submitted 23 January, 2025;
originally announced January 2025.
-
Machine learning for cerebral blood vessels' malformations
Authors:
Irem Topal,
Alexander Cherevko,
Yuri Bugay,
Maxim Shishlenin,
Jean Barbier,
Deniz Eroglu,
Édgar Roldán,
Roman Belousov
Abstract:
Cerebral aneurysms and arteriovenous malformations are life-threatening hemodynamic pathologies of the brain. While surgical intervention is often essential to prevent fatal outcomes, it carries significant risks both during the procedure and in the postoperative period, making the management of these conditions highly challenging. Parameters of cerebral blood flow, routinely monitored during medical interventions or with modern noninvasive high-resolution imaging methods, could potentially be utilized in machine learning-assisted protocols for risk assessment and therapeutic prognosis. To this end, we developed a linear oscillatory model of blood velocity and pressure for clinical data acquired from neurosurgical operations. Using the method of Sparse Identification of Nonlinear Dynamics (SINDy), the parameters of our model can be reconstructed online within milliseconds from a short time series of the hemodynamic variables. The identified parameter values enable automated classification of the blood-flow pathologies by means of logistic regression, achieving an accuracy of 73%. Our results demonstrate the potential of this model for both diagnostic and prognostic applications, providing a robust and interpretable framework for assessing cerebral blood vessel conditions.
Submitted 27 February, 2025; v1 submitted 25 November, 2024;
originally announced November 2024.
-
On the phase diagram of extensive-rank symmetric matrix denoising beyond rotational invariance
Authors:
Jean Barbier,
Francesco Camilli,
Justin Ko,
Koki Okajima
Abstract:
Matrix denoising is central to signal processing and machine learning. Its statistical analysis when the matrix to infer has a factorised structure with a rank growing proportionally to its dimension remains a challenge, except when it is rotationally invariant. In this case the information-theoretic limits and an efficient Bayes-optimal denoising algorithm, called rotational invariant estimator [1,2], are known. Beyond this setting few results can be found. The reason is that the model is not a usual spin system because of the growing rank dimension, nor a matrix model (as appearing in high-energy physics) due to the lack of rotation symmetry, but rather a hybrid between the two. Here we make progress towards the understanding of Bayesian matrix denoising when the signal is a factored matrix $XX^\intercal$ that is not rotationally invariant. Monte Carlo simulations suggest the existence of a denoising-factorisation transition separating a phase where denoising using the rotational invariant estimator remains Bayes-optimal due to universality properties of the same nature as in random matrix theory, from one where universality breaks down and better denoising is possible, though algorithmically hard. We argue that it is only beyond the transition that factorisation, i.e., estimating $X$ itself, becomes possible up to irresolvable ambiguities. On the theory side, we combine mean-field techniques in an interpretable multiscale fashion in order to access the minimum mean-square error and mutual information. Interestingly, our alternative method yields equations reproducible by the replica approach of [3]. Using numerical insights, we delimit the portion of the phase diagram where we conjecture the mean-field theory to be exact, and correct it using universality when it is not. Our complete ansatz matches the numerics well in the whole phase diagram when considering finite size effects.
Submitted 14 March, 2025; v1 submitted 4 November, 2024;
originally announced November 2024.
-
Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise
Authors:
Jean Barbier,
Francesco Camilli,
Marco Mondelli,
Yizhou Xu
Abstract:
We consider a prototypical problem of Bayesian inference for a structured spiked model: a low-rank signal is corrupted by additive noise. While both information-theoretic and algorithmic limits are well understood when the noise is a Gaussian Wigner matrix, the more realistic case of structured noise still proves to be challenging. To capture the structure while maintaining mathematical tractability, a line of work has focused on rotationally invariant noise. However, existing studies either provide sub-optimal algorithms or are limited to special cases of noise ensembles. In this paper, using tools from statistical physics (replica method) and random matrix theory (generalized spherical integrals) we establish the first characterization of the information-theoretic limits for a noise matrix drawn from a general trace ensemble. Remarkably, our analysis unveils the asymptotic equivalence between the rotationally invariant model and a surrogate Gaussian one. Finally, we show how to saturate the predicted statistical limits using an efficient algorithm inspired by the theory of adaptive Thouless-Anderson-Palmer (TAP) equations.
Submitted 8 July, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
-
A multiscale cavity method for sublinear-rank symmetric matrix factorization
Authors:
Jean Barbier,
Justin Ko,
Anas A. Rahman
Abstract:
We consider a statistical model for symmetric matrix factorization with additive Gaussian noise in the high-dimensional regime where the rank $M$ of the signal matrix to infer scales with its size $N$ as $M={\rm o}(\sqrt{\ln N})$. Allowing for an $N$-dependent rank offers new challenges and requires new methods. Working in the Bayes-optimal setting, we show that whenever the signal has i.i.d. entries, the limiting mutual information between signal and data is given by a variational formula involving a rank-one replica symmetric potential. In other words, from the information-theoretic perspective, the case of a (slowly) growing rank is the same as when $M=1$ (namely, the standard spiked Wigner model). The proof is primarily based on a novel multiscale cavity method allowing for growing rank along with some information-theoretic identities on worst noise for the vector Gaussian channel. We believe that the cavity method developed here will play a role in the analysis of a broader class of inference and spin models where the degrees of freedom are large arrays instead of vectors.
Submitted 20 March, 2025; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Fundamental limits of overparametrized shallow neural networks for supervised learning
Authors:
Francesco Camilli,
Daria Tieplova,
Jean Barbier
Abstract:
We carry out an information-theoretical analysis of a two-layer neural network trained from input-output pairs generated by a teacher network with matching architecture, in overparametrized regimes. Our results come in the form of bounds relating i) the mutual information between training data and network weights, or ii) the Bayes-optimal generalization error, to the same quantities but for a simpler (generalized) linear model for which explicit expressions are rigorously known. Our bounds, which are expressed in terms of the number of training samples, input dimension and number of hidden units, thus yield fundamental performance limits for any neural network (and actually any learning procedure) trained from limited data generated according to our two-layer teacher neural network model. The proof relies on rigorous tools from spin glasses and is guided by "Gaussian equivalence principles" lying at the core of numerous recent analyses of neural networks. With respect to the existing literature, which is either non-rigorous or restricted to the case of the learning of the readout weights only, our results are information-theoretic (i.e., they are not specific to any learning algorithm) and, importantly, cover a setting where all the network parameters are trained.
Submitted 11 July, 2023;
originally announced July 2023.
-
Bayes-optimal limits in structured PCA, and how to reach them
Authors:
Jean Barbier,
Francesco Camilli,
Marco Mondelli,
Manuel Saenz
Abstract:
How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries, by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The resulting noise correlations make the setting relevant for applications but analytically challenging. We provide the first characterization of the Bayes-optimal limits of inference in this model. If the spike is rotation-invariant, we show that standard spectral PCA is optimal. However, for more general priors, both PCA and the existing approximate message passing algorithm (AMP) fall short of achieving the information-theoretic limits, which we compute using the replica method from statistical mechanics. We thus propose a novel AMP, inspired by the theory of Adaptive Thouless-Anderson-Palmer equations, which saturates the theoretical limit. This AMP comes with a rigorous state evolution analysis tracking its performance. Although we focus on specific noise distributions, our methodology can be generalized to a wide class of trace matrix ensembles at the cost of more involved expressions. Finally, despite the seemingly strong assumption of rotation-invariant noise, our theory empirically predicts algorithmic performance on real data, pointing at remarkable universality properties.
Submitted 2 June, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Sparse superposition codes with rotational invariant coding matrices for memoryless channels
Authors:
YuHao Liu,
Teng Fu,
Jean Barbier,
TianQi Hou
Abstract:
We recently showed in [1] the superiority of certain structured coding-matrix ensembles (such as partial row-orthogonal) for sparse superposition codes when compared with purely random matrices with i.i.d. entries, both information-theoretically and under practical vector approximate message-passing decoding. Here we generalize this result to binary input channels under generalized vector approximate message-passing decoding [2]. We focus on specific binary output channels for concreteness but our analysis based on the replica symmetric method from statistical physics applies to any memoryless channel. We confirm that the "spectral criterion" introduced in [1], a coding-matrix design principle which allows the code to be capacity-achieving in the "large section size" asymptotic limit, extends to generic memoryless channels. Moreover, we also show that the vanishing error floor property [3] of this coding scheme is universal for arbitrary spectrum of the coding matrix.
Submitted 10 July, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
-
The mighty force: statistical inference and high-dimensional statistics
Authors:
Erik Aurell,
Jean Barbier,
Aurelien Decelle,
Roberto Mulet
Abstract:
This is a review to appear as a contribution to the edited volume "Spin Glass Theory & Far Beyond - Replica Symmetry Breaking after 40 Years", World Scientific. It showcases a selection of contributions from the spin glass community at large to high-dimensional statistics, by focusing on three important graph-based models and methodologies having deeply impacted the field: inference of graphs (a.k.a. direct coupling analysis), inference from graphs (the community detection problem), and the dynamic cavity method, which in particular allows for inference from graphs encoding causal relations.
Submitted 2 May, 2022;
originally announced May 2022.
-
Sparse superposition codes under VAMP decoding with generic rotational invariant coding matrices
Authors:
TianQi Hou,
YuHao Liu,
Teng Fu,
Jean Barbier
Abstract:
Sparse superposition codes were originally proposed as a capacity-achieving communication scheme over the gaussian channel, whose coding matrices were made of i.i.d. gaussian entries. We extend this coding scheme to more generic ensembles of rotational invariant coding matrices with arbitrary spectrum, which include the gaussian ensemble as a special case. We further introduce and analyse a decoder based on vector approximate message-passing (VAMP). Our main findings, based on both a standard replica symmetric potential theory and state evolution analysis, are the superiority of certain structured ensembles of coding matrices (such as partial row-orthogonal) when compared to i.i.d. matrices, as well as a spectrum-independent upper bound on VAMP's threshold. Most importantly, we derive a simple "spectral criterion" for the scheme to be at the same time capacity-achieving while having the best possible algorithmic threshold, in the "large section size" asymptotic limit. Our results therefore provide practical design principles for the coding matrices in this promising communication scheme.
Submitted 26 May, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
-
Statistical limits of dictionary learning: random matrix theory and the spectral replica method
Authors:
Jean Barbier,
Nicolas Macris
Abstract:
We consider increasingly complex models of matrix denoising and dictionary learning in the Bayes-optimal setting, in the challenging regime where the matrices to infer have a rank growing linearly with the system size. This is in contrast with most existing literature concerned with the low-rank (i.e., constant-rank) regime. We first consider a class of rotationally invariant matrix denoising problems whose mutual information and minimum mean-square error are computable using techniques from random matrix theory. Next, we analyze the more challenging models of dictionary learning. To do so we introduce a novel combination of the replica method from statistical mechanics together with random matrix theory, coined the spectral replica method. This allows us to derive variational formulas for the mutual information between hidden representations and the noisy data of the dictionary learning problem, as well as for the overlaps quantifying the optimal reconstruction error. The proposed method reduces the number of degrees of freedom from $\Theta(N^2)$ matrix entries to $\Theta(N)$ eigenvalues (or singular values), and yields Coulomb gas representations of the mutual information which are reminiscent of matrix models in physics. The main ingredients are a combination of large deviation results for random matrices together with a new replica symmetric decoupling ansatz at the level of the probability distributions of eigenvalues (or singular values) of certain overlap matrices and the use of Harish-Chandra-Itzykson-Zuber spherical integrals.
Submitted 26 February, 2022; v1 submitted 14 September, 2021;
originally announced September 2021.
-
Performance of Bayesian linear regression in a model with mismatch
Authors:
Jean Barbier,
Wei-Kuo Chen,
Dmitry Panchenko,
Manuel Sáenz
Abstract:
In this paper we analyze, for a model of linear regression with gaussian covariates, the performance of a Bayesian estimator given by the mean of a log-concave posterior distribution with gaussian prior, in the high-dimensional limit where the number of samples and the covariates' dimension are large and proportional. Although the high-dimensional analysis of Bayesian estimators has been previously studied for Bayesian-optimal linear regression where the correct posterior is used for inference, much less is known when there is a mismatch. Here we consider a model in which the responses are corrupted by gaussian noise and are known to be generated as linear combinations of the covariates, but the distributions of the ground-truth regression coefficients and of the noise are unknown. This regression task can be rephrased as a statistical mechanics model known as the Gardner spin glass, an analogy which we exploit. Using a leave-one-out approach we characterize the mean-square error for the regression coefficients. We also derive the log-normalizing constant of the posterior. Similar models have been studied by Shcherbina and Tirozzi and by Talagrand, but our arguments are much more straightforward. An interesting consequence of our analysis is that in the quadratic loss case, the performance of the Bayesian estimator is independent of a global "temperature" hyperparameter and matches the ridge estimator: sampling and optimizing are equally good.
Submitted 10 November, 2021; v1 submitted 14 July, 2021;
originally announced July 2021.
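The closing remark of the abstract above (quadratic loss: the Bayesian estimator does not depend on the temperature and matches ridge) can be checked in the simplest Gaussian case. The sketch below is an illustration under stated assumptions, not the paper's analysis: it assumes a posterior proportional to $\exp\{-\beta(\|y - Xw\|^2/2 + \lambda\|w\|^2/2)\}$, where the inverse temperature `beta` multiplies both the loss and the Gaussian prior term, and the names `lam`, `beta`, `ridge` and `posterior_mean` are hypothetical. Such a posterior is Gaussian, so its mean equals its mode, which is the ridge solution for every `beta`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 50, 0.7                 # illustrative sizes and ridge penalty
X = rng.standard_normal((n, d))          # gaussian covariates
y = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)

def ridge(X, y, lam):
    """Minimiser of ||y - Xw||^2 / 2 + lam * ||w||^2 / 2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def posterior_mean(X, y, lam, beta):
    """Mean of the Gaussian posterior ∝ exp(-beta * (loss + penalty)).

    Its precision matrix is beta * (X^T X + lam * I) and its linear term is
    beta * X^T y, so the factor beta cancels in the mean.
    """
    precision = beta * (X.T @ X + lam * np.eye(X.shape[1]))
    return np.linalg.solve(precision, beta * (X.T @ y))

w_ridge = ridge(X, y, lam)
for beta in (0.1, 1.0, 10.0):
    gap = np.max(np.abs(posterior_mean(X, y, lam, beta) - w_ridge))
    print(f"beta={beta}: max deviation from ridge = {gap:.2e}")
```

Only the posterior covariance, which scales as $1/\beta$, depends on the temperature in this toy setting; the mean does not.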
-
High-dimensional inference: a statistical mechanics perspective
Authors:
Jean Barbier
Abstract:
Statistical inference is the science of drawing conclusions about some system from data. In modern signal processing and machine learning, inference is done in very high dimension: very many unknown characteristics about the system have to be deduced from a lot of high-dimensional noisy data. This "high-dimensional regime" is reminiscent of statistical mechanics, which aims at describing the macroscopic behavior of a complex system based on the knowledge of its microscopic interactions. It is by now clear that there are many connections between inference and statistical physics. This article aims at emphasizing some of the deep links connecting these apparently separated disciplines through the description of paradigmatic models of high-dimensional inference in the language of statistical mechanics. This article has been published in the issue on artificial intelligence of Ithaca, an Italian popularization-of-science journal. The selected topics and references are highly biased and not intended to be exhaustive in any way. Its purpose is to serve as an introduction to statistical mechanics of inference through a very specific angle that corresponds to my own tastes and limited knowledge.
Submitted 28 October, 2020;
originally announced October 2020.
-
Strong replica symmetry for high-dimensional disordered log-concave Gibbs measures
Authors:
Jean Barbier,
Dmitry Panchenko,
Manuel Sáenz
Abstract:
We consider a generic class of log-concave, possibly random, (Gibbs) measures. We prove the concentration of an infinite family of order parameters called multioverlaps. Because they completely parametrise the quenched Gibbs measure of the system, this implies a simple representation of the asymptotic Gibbs measures, as well as the decoupling of the variables in a strong sense. These results may prove themselves useful in several contexts. In particular in machine learning and high-dimensional inference, log-concave measures appear in convex empirical risk minimisation, maximum a-posteriori inference or M-estimation. We believe that they may be applicable in establishing some type of "replica symmetric formulas" for the free energy, inference or generalisation error in such settings.
Submitted 22 February, 2022; v1 submitted 27 September, 2020;
originally announced September 2020.
-
All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation
Authors:
Jean Barbier,
Nicolas Macris,
Cynthia Rush
Abstract:
We determine statistical and computational limits for estimation of a rank-one matrix (the spike) corrupted by an additive Gaussian noise matrix, in a sparse limit, where the underlying hidden vector (that constructs the rank-one matrix) has a number of non-zero components that scales sub-linearly with the total dimension of the vector, and the signal-to-noise ratio tends to infinity at an appropriate speed. We prove explicit low-dimensional variational formulas for the asymptotic mutual information between the spike and the observed noisy matrix and analyze the approximate message passing algorithm in the sparse regime. For Bernoulli and Bernoulli-Rademacher distributed vectors, and when the sparsity and signal strength satisfy an appropriate scaling relation, we find all-or-nothing phase transitions for the asymptotic minimum and algorithmic mean-square errors. These jump from their maximum possible value to zero, at well-defined signal-to-noise thresholds whose asymptotic values we determine exactly. In the asymptotic regime the statistical-to-algorithmic gap diverges, indicating that sparse recovery is hard for approximate message passing.
Submitted 30 October, 2020; v1 submitted 14 June, 2020;
originally announced June 2020.
-
Information-theoretic limits of a multiview low-rank symmetric spiked matrix model
Authors:
Jean Barbier,
Galen Reeves
Abstract:
We consider a generalization of an important class of high-dimensional inference problems, namely spiked symmetric matrix models, often used as probabilistic models for principal component analysis. Such paradigmatic models have recently attracted a lot of attention from a number of communities due to their phenomenological richness with statistical-to-computational gaps, while remaining tractable. We rigorously establish the information-theoretic limits through the proof of single-letter formulas for the mutual information and minimum mean-square error. On the technical side, we improve the recently introduced adaptive interpolation method, so that it can be used to study low-rank models (i.e., estimation problems of "tall matrices") in full generality, an important step towards the rigorous analysis of more complicated inference and learning models.
Submitted 16 May, 2020;
originally announced May 2020.
-
Strong replica symmetry in high-dimensional optimal Bayesian inference
Authors:
Jean Barbier,
Dmitry Panchenko
Abstract:
We consider generic optimal Bayesian inference, namely, models of signal reconstruction where the posterior distribution and all hyperparameters are known. Under a standard assumption on the concentration of the free energy, we show how replica symmetry in the strong sense of concentration of all multioverlaps can be established as a consequence of the Franz-de Sanctis identities; the identities themselves in the current setting are obtained via a novel perturbation coming from exponentially distributed "side-observations" of the signal. Concentration of multioverlaps means that asymptotically the posterior distribution has a particularly simple structure encoded by a random probability measure (or, in the case of binary signal, a non-random probability measure). We believe that such strong control of the model should be key in the study of inference problems with underlying sparse graphical structure (error correcting codes, block models, etc) and, in particular, in the rigorous derivation of replica symmetric formulas for the free energy and mutual information in this context.
Submitted 22 February, 2022; v1 submitted 6 May, 2020;
originally announced May 2020.
-
0-1 phase transitions in sparse spiked matrix estimation
Authors:
Jean Barbier,
Nicolas Macris
Abstract:
We consider statistical models of estimation of a rank-one matrix (the spike) corrupted by an additive Gaussian noise matrix in the sparse limit. In this limit the underlying hidden vector (that constructs the rank-one matrix) has a number of non-zero components that scales sub-linearly with the total dimension of the vector, and the signal strength tends to infinity at an appropriate speed. We prove explicit low-dimensional variational formulas for the asymptotic mutual information between the spike and the observed noisy matrix in suitable sparse limits. For Bernoulli and Bernoulli-Rademacher distributed vectors, and when the sparsity and signal strength satisfy an appropriate scaling relation, these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error. A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression (compressive sensing).
Submitted 12 November, 2019;
originally announced November 2019.
-
Blind calibration for compressed sensing: State evolution and an online algorithm
Authors:
Marylou Gabrié,
Jean Barbier,
Florent Krzakala,
Lenka Zdeborová
Abstract:
Compressed sensing allows one to acquire compressible signals with a small number of measurements. In applications, a hardware implementation often requires a calibration as the sensing process is not perfectly known. Blind calibration, that is, performing calibration and compressed sensing at the same time, is thus particularly appealing. A potential approach was suggested by Schülke and collaborators in Schülke et al. 2013 and 2015, using approximate message passing (AMP) for blind calibration (cal-AMP). Here, the algorithm is extended from the already proposed offline case to the online case, where the calibration is refined step by step as new measured samples are received. Furthermore, we show that the performance of both the offline and the online algorithms can be theoretically studied via the State Evolution (SE) formalism. Through numerical simulations, the efficiency of cal-AMP and the consistency of the theoretical predictions are confirmed.
Submitted 23 March, 2020; v1 submitted 1 October, 2019;
originally announced October 2019.
-
Mutual information for low-rank even-order symmetric tensor estimation
Authors:
Clément Luneau,
Jean Barbier,
Nicolas Macris
Abstract:
We consider a statistical model for finite-rank symmetric tensor factorization and prove a single-letter variational expression for its asymptotic mutual information when the tensor is of even order. The proof applies the adaptive interpolation method originally invented for rank-one factorization. Here we show how to extend the adaptive interpolation to finite-rank and even-order tensors. This requires new nontrivial ideas with respect to the current analysis in the literature. We also underline where the proof falls short when dealing with odd-order tensors.
Submitted 23 September, 2020; v1 submitted 9 April, 2019;
originally announced April 2019.
-
Overlap matrix concentration in optimal Bayesian inference
Authors:
Jean Barbier
Abstract:
We consider models of Bayesian inference of signals with vectorial components of finite dimensionality. We show that, under a proper perturbation, these models are replica symmetric in the sense that the overlap matrix concentrates. The overlap matrix is the order parameter in these models and is directly related to error metrics such as minimum mean-square errors. Our proof is valid in the optimal Bayesian inference setting. This means that it relies on the assumption that the model and all its hyper-parameters are known so that the posterior distribution can be written exactly. Examples of important problems in high-dimensional inference and learning to which our results apply are low-rank tensor factorization, the committee machine neural network with a finite number of hidden neurons in the teacher-student scenario, or multi-layer versions of the generalized linear model.
Submitted 24 January, 2020; v1 submitted 4 April, 2019;
originally announced April 2019.
-
Mutual Information for the Stochastic Block Model by the Adaptive Interpolation Method
Authors:
Jean Barbier,
Chun Lam Chan,
Nicolas Macris
Abstract:
We rigorously derive a single-letter variational expression for the mutual information of the asymmetric two-groups stochastic block model in the dense graph regime. Existing proofs in the literature are indirect, as they involve mapping the model to a rank-one matrix estimation problem whose mutual information is then determined by a combination of methods (e.g., interpolation, cavity, algorithmic, spatial coupling). In this contribution we provide a self-contained direct method using only the recently introduced adaptive interpolation method.
Submitted 16 July, 2019; v1 submitted 19 February, 2019;
originally announced February 2019.
-
Concentration of multi-overlaps for random ferromagnetic spin models
Authors:
Jean Barbier,
Chun Lam Chan,
Nicolas Macris
Abstract:
We consider ferromagnetic spin models on dilute random graphs and prove that, with suitable one-body infinitesimal perturbations added to the Hamiltonian, the multi-overlaps concentrate for all temperatures, both with respect to the thermal Gibbs average and the quenched randomness. Results of this nature have been known only for the lowest order overlaps, at high temperature or on the Nishimori line. Here we treat all multi-overlaps by a non-trivial application of Griffiths-Kelly-Sherman correlation inequalities. Our results apply in particular to the pure and mixed p-spin ferromagnets on random dilute Erdős-Rényi hypergraphs. On physical grounds one expects that multi-overlap concentration directly implies the correctness of the cavity (or replica symmetric) formula for the pressure. The proof of this formula for the general p-spin ferromagnet on a random dilute hypergraph remains an open problem.
Submitted 19 January, 2019;
originally announced January 2019.
-
The adaptive interpolation method for proving replica formulas. Applications to the Curie-Weiss and Wigner spike models
Authors:
Jean Barbier,
Nicolas Macris
Abstract:
In this contribution we give a pedagogic introduction to the newly introduced adaptive interpolation method to prove in a simple and unified way replica formulas for Bayesian optimal inference problems. Many aspects of this method can already be explained at the level of the simple Curie-Weiss spin system. This provides a new method of solution for this model which does not appear to be known. We then generalize this analysis to a paradigmatic inference problem, namely rank-one matrix estimation, also referred to as the Wigner spike model in statistics. We give many pointers to the recent literature where the method has been successfully applied.
Submitted 7 March, 2020; v1 submitted 19 January, 2019;
originally announced January 2019.
-
The committee machine: Computational to statistical gaps in learning a two-layers neural network
Authors:
Benjamin Aubin,
Antoine Maillard,
Jean Barbier,
Florent Krzakala,
Nicolas Macris,
Lenka Zdeborová
Abstract:
Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layers neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows one to perform optimal learning in polynomial time for a large set of parameters. We find that there are regimes in which a low generalization error is information-theoretically achievable while the AMP algorithm fails to deliver it, strongly suggesting that no efficient algorithm exists for those cases, and unveiling a large computational gap.
Submitted 29 February, 2024; v1 submitted 14 June, 2018;
originally announced June 2018.
-
Adaptive Path Interpolation for Sparse Systems: Application to a Simple Censored Block Model
Authors:
Jean Barbier,
Chun Lam Chan,
Nicolas Macris
Abstract:
Recently a new adaptive path interpolation method has been developed as a simple and versatile scheme to calculate exactly the asymptotic mutual information of Bayesian inference problems defined on dense factor graphs. These include random linear and generalized estimation, sparse superposition codes, or low-rank matrix and tensor estimation. For all these systems, the adaptive interpolation method directly proves that the replica symmetric prediction is exact, in a simple and unified manner. When the underlying factor graph of the inference problem is sparse the replica prediction is considerably more complicated, and rigorous results are often lacking or obtained by rather complicated methods. In this work we show how to extend the adaptive path interpolation method to sparse systems. We concentrate on a Censored Block Model, where hidden variables are measured through a binary erasure channel, for which we fully prove the replica prediction.
Submitted 18 July, 2019; v1 submitted 13 June, 2018;
originally announced June 2018.
-
Entropy and mutual information in models of deep neural networks
Authors:
Marylou Gabrié,
Andre Manoel,
Clément Luneau,
Jean Barbier,
Nicolas Macris,
Florent Krzakala,
Lenka Zdeborová
Abstract:
We examine a class of deep learning models with a tractable method to compute information-theoretic quantities. Our contributions are three-fold: (i) We show how entropies and mutual informations can be derived from heuristic statistical physics methods, under the assumption that weight matrices are independent and orthogonally-invariant. (ii) We extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layers networks with Gaussian random weights, using the recently introduced adaptive interpolation method. (iii) We propose an experimental framework with generative models of synthetic datasets, on which we train deep neural networks with a weight constraint designed so that the assumption in (i) is verified during learning. We study the behavior of entropies and mutual informations throughout learning and conclude that, in the proposed setting, the relationship between compression and generalization remains elusive.
Submitted 29 October, 2018; v1 submitted 24 May, 2018;
originally announced May 2018.
-
The Layered Structure of Tensor Estimation and its Mutual Information
Authors:
Jean Barbier,
Nicolas Macris,
Léo Miolane
Abstract:
We consider rank-one non-symmetric tensor estimation and derive simple formulas for the mutual information. We start with the order 2 problem, namely matrix factorization. We treat it completely in a simpler fashion than previous proofs using a new type of interpolation method developed in [1]. We then show how to harness the structure in "layers" of tensor estimation in order to obtain a formula for the mutual information for the order 3 problem from the knowledge of the formula for the order 2 problem, still using the same kind of interpolation. Our proof technique straightforwardly generalizes and allows one to rigorously obtain the mutual information at any order in a recursive way.
Submitted 27 November, 2018; v1 submitted 29 September, 2017;
originally announced September 2017.
-
Optimal Errors and Phase Transitions in High-Dimensional Generalized Linear Models
Authors:
Jean Barbier,
Florent Krzakala,
Nicolas Macris,
Léo Miolane,
Lenka Zdeborová
Abstract:
Generalized linear models (GLMs) arise in high-dimensional machine learning, statistics, communications and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes or benchmark models in neural networks. We evaluate the mutual information (or "free entropy") from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Non-rigorous predictions for the optimal errors existed for special cases of GLMs, e.g. for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance, and locate the associated sharp phase transitions separating learnable and non-learnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multi-purpose algorithms. This paper is divided into two parts that can be read independently: The first part (main part) presents the model and main results, discusses some applications and sketches the main ideas of the proof. The second part (supplementary information) is much more detailed and provides more examples as well as all the proofs.
Submitted 1 November, 2018; v1 submitted 10 August, 2017;
originally announced August 2017.
-
The adaptive interpolation method: A simple scheme to prove replica formulas in Bayesian inference
Authors:
Jean Barbier,
Nicolas Macris
Abstract:
In recent years important progress has been achieved towards proving the validity of the replica predictions for the (asymptotic) mutual information (or "free energy") in Bayesian inference problems. The proof techniques that have emerged appear to be quite general, even though they have been worked out on a case-by-case basis. Unfortunately, a common point between all these schemes is their relatively high level of technicality. We present a new proof scheme that is quite straightforward with respect to the previous ones. We call it the adaptive interpolation method because it can be seen as an extension of the interpolation method developed by Guerra and Toninelli in the context of spin glasses, with an interpolation path that is adaptive. In order to illustrate our method we show how to prove the replica formula for three non-trivial inference problems. The first one is symmetric rank-one matrix estimation (or factorisation), which is the simplest problem considered here and the one for which the method is presented in full detail. Then we generalize to symmetric tensor estimation and random linear estimation. We believe that the present method has a much wider range of applicability and also sheds new light on the reasons for the validity of replica formulas in Bayesian inference.
Submitted 27 October, 2018; v1 submitted 8 May, 2017;
originally announced May 2017.
-
I-MMSE relations in random linear estimation and a sub-extensive interpolation method
Authors:
Jean Barbier,
Nicolas Macris
Abstract:
Consider random linear estimation with Gaussian measurement matrices and noise. One can compute infinitesimal variations of the mutual information under infinitesimal variations of the signal-to-noise ratio or of the measurement rate. We discuss how each variation is related to the minimum mean-square error and deduce that the two variations are directly connected through a very simple identity. The main technical ingredient is a new interpolation method called "sub-extensive interpolation method". We use it to provide a new proof of an I-MMSE relation recently found by Reeves and Pfister [1] when the measurement rate is varied. Our proof makes it clear that this relation is intimately related to another I-MMSE relation also recently proved in [2]. One can directly verify that the identity relating the two types of variation of mutual information is indeed consistent with the one letter replica symmetric formula for the mutual information, first derived by Tanaka [3] for binary signals, and recently proved in more generality in [1,2,4,5] (by independent methods). However our proof is independent of any knowledge of Tanaka's formula.
Submitted 13 April, 2017;
originally announced April 2017.
-
Mutual Information and Optimality of Approximate Message-Passing in Random Linear Estimation
Authors:
Jean Barbier,
Nicolas Macris,
Mohamad Dia,
Florent Krzakala
Abstract:
We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections. A few examples where this problem is relevant are compressed sensing, sparse superposition codes, and code division multiple access. There have been a number of works considering the mutual information for this problem using the replica method from statistical physics. Here we put these considerations on a firm rigorous basis. First, we show, using a Guerra-Toninelli type interpolation, that the replica formula yields an upper bound to the exact mutual information. Secondly, for many relevant practical cases, we present a converse lower bound via a method that uses spatial coupling, state evolution analysis and the I-MMSE theorem. This yields a single letter formula for the mutual information and the minimal-mean-square error for random Gaussian linear estimation of all discrete bounded signals. In addition, we prove that the low complexity approximate message-passing algorithm is optimal outside of the so-called hard phase, in the sense that it asymptotically reaches the minimal-mean-square error. In this work spatial coupling is used primarily as a proof technique. However, our results also prove two important features of spatially coupled noisy linear random Gaussian estimation. First, there is no algorithmically hard phase. This means that for such systems approximate message-passing always reaches the minimal-mean-square error. Secondly, in a proper limit the mutual information associated to such systems is the same as the one of uncoupled linear random Gaussian estimation.
Submitted 28 August, 2020; v1 submitted 20 January, 2017;
originally announced January 2017.
-
Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula
Authors:
Jean Barbier,
Mohamad Dia,
Nicolas Macris,
Florent Krzakala,
Thibault Lesieur,
Lenka Zdeborova
Abstract:
Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in a few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows one to express the minimal mean-square-error and to characterize the detectability phase transitions in a large set of estimation problems ranging from community detection to sparse PCA. We also show that for a large set of parameters, an iterative algorithm called approximate message-passing is Bayes optimal. There exists, however, a gap between what currently known polynomial algorithms can do and what is expected information theoretically. Additionally, the proof technique has an interest of its own and exploits three essential ingredients: the interpolation method introduced in statistical physics by Guerra, the analysis of the approximate message-passing algorithm and the theory of spatial coupling and threshold saturation in coding. Our approach is generic and applicable to other open problems in statistical estimation where heuristic statistical physics predictions are available.
Submitted 13 June, 2016;
originally announced June 2016.
-
Threshold Saturation of Spatially Coupled Sparse Superposition Codes for All Memoryless Channels
Authors:
Jean Barbier,
Mohamad Dia,
Nicolas Macris
Abstract:
We recently proved threshold saturation for spatially coupled sparse superposition codes on the additive white Gaussian noise channel. Here we generalize our analysis to a much broader setting. We show for any memoryless channel that spatial coupling allows generalized approximate message-passing (GAMP) decoding to reach the potential (or Bayes optimal) threshold of the code ensemble. Moreover, in the large input alphabet size limit: i) the GAMP algorithmic threshold of the underlying (or uncoupled) code ensemble is simply expressed as a Fisher information; ii) the potential threshold tends to Shannon's capacity. Although we focus on coding for the sake of coherence with our previous results, the framework and methods are very general and hold for a wide class of generalized estimation problems with random linear mixing.
Submitted 15 March, 2016;
originally announced March 2016.
-
Proof of Threshold Saturation for Spatially Coupled Sparse Superposition Codes
Authors:
Jean Barbier,
Mohamad Dia,
Nicolas Macris
Abstract:
Recently, a new class of codes, called sparse superposition or sparse regression codes, has been proposed for communication over the AWGN channel. It has been proven that they achieve capacity using power allocation and various forms of iterative decoding. Empirical evidence has also strongly suggested that the codes achieve capacity when spatial coupling and approximate message passing decoding are used, without the need for power allocation. In this note we prove that state evolution (which tracks message passing) indeed saturates the potential threshold of the underlying code ensemble, which approaches the optimal threshold in a proper limit. Our proof uses ideas developed in the theory of low-density parity-check codes and compressive sensing.
Submitted 6 March, 2016;
originally announced March 2016.
-
Error correcting codes and spatial coupling
Authors:
Rafah El-Khatib,
Jean Barbier,
Ayaka Sakata,
Rüdiger Urbanke
Abstract:
These are notes from the lecture of Rüdiger Urbanke given at the autumn school "Statistical Physics, Optimization, Inference, and Message-Passing Algorithms", which took place in Les Houches, France from Monday September 30th, 2013, till Friday October 11th, 2013. The school was organized by Florent Krzakala from UPMC and ENS Paris, Federico Ricci-Tersenghi from La Sapienza Roma, Lenka Zdeborová from CEA Saclay and CNRS, and Riccardo Zecchina from Politecnico Torino. The first three sections cover the basics of polar codes and low-density parity-check codes. In the last three sections, we see how spatial coupling helps belief propagation decoding.
Submitted 25 September, 2014;
originally announced September 2014.
-
Replica Analysis and Approximate Message Passing Decoder for Superposition Codes
Authors:
Jean Barbier,
Florent Krzakala
Abstract:
Superposition codes are efficient for the Additive White Gaussian Noise channel. We provide here a replica analysis of the performance of these codes for large signals. We also consider a Bayesian Approximate Message Passing decoder based on a belief-propagation approach, and discuss its performance using the density evolution technique. Our main findings are: 1) for the sizes we can access, the message-passing decoder outperforms other decoders studied in the literature; 2) its performance is limited by a sharp phase transition; and 3) while these codes reach capacity as $B$ (a crucial parameter in the code) increases, the performance of the message-passing decoder worsens as the phase transition goes to lower rates.
Submitted 17 April, 2014; v1 submitted 31 March, 2014;
originally announced March 2014.
-
Approximate message-passing with spatially coupled structured operators, with applications to compressed sensing and sparse superposition codes
Authors:
Jean Barbier,
Christophe Schülke,
Florent Krzakala
Abstract:
We study the behavior of Approximate Message-Passing, a solver for linear sparse estimation problems such as compressed sensing, when the i.i.d. matrices (for which it has been specifically designed) are replaced by structured operators, such as Fourier and Hadamard ones. We show empirically that after proper randomization, the structure of the operators does not significantly affect the performance of the solver. Furthermore, for some specially designed spatially coupled operators, this allows a computationally fast and memory-efficient reconstruction in compressed sensing up to the information-theoretical limit. We also show how this approach can be applied to sparse superposition codes, allowing the Approximate Message-Passing decoder to perform at large rates for moderate block length.
Submitted 28 March, 2015; v1 submitted 5 December, 2013;
originally announced December 2013.
-
The hard-core model on random graphs revisited
Authors:
Jean Barbier,
Florent Krzakala,
Lenka Zdeborová,
Pan Zhang
Abstract:
We revisit the classical hard-core model, also known as independent set and dual to the vertex cover problem, where one puts particles with a first-neighbor hard-core repulsion on the vertices of a random graph. Although the cases of random graphs with small and with very large average degrees are each quite well understood, they yield qualitatively different results, and our aim here is to reconcile these two cases. We revisit results that can be obtained using the (heuristic) cavity method and show that it provides a closed-form conjecture for the exact density of the densest packing on random regular graphs with degree K>=20, and that for K>16 the nature of the phase transition is the same as for large K. This also shows that the hard-core model is the simplest mean-field lattice model for structural glasses and jamming.
Submitted 6 September, 2013; v1 submitted 18 June, 2013;
originally announced June 2013.
-
Compressed Sensing of Approximately-Sparse Signals: Phase Transitions and Optimal Reconstruction
Authors:
Jean Barbier,
Florent Krzakala,
Marc Mézard,
Lenka Zdeborová
Abstract:
Compressed sensing is designed to measure sparse signals directly in a compressed form. However, most signals of interest are only "approximately sparse", i.e., even though the signal contains only a small fraction of relevant (large) components, the other components are not strictly equal to zero, but are only close to zero. In this paper we model the approximately sparse signal with a Gaussian distribution of small components, and we study its compressed sensing with dense random matrices. We use replica calculations to determine the mean-squared error of the Bayes-optimal reconstruction for such signals, as a function of the variance of the small components, the density of large components and the measurement rate. We then use the GAMP algorithm and quantify the region of parameters for which this algorithm achieves optimality (for large systems). Finally, we show that in the region where GAMP with homogeneous measurement matrices is not optimal, a special "seeding" design of a spatially coupled measurement matrix restores optimality.
Submitted 9 July, 2012;
originally announced July 2012.