Search | arXiv e-print repository

Adaptive collaboration for online personalized distributed learning with heterogeneous clients

Authors: Constantin Philippenko, Batiste Le Bars, Kevin Scaman, Laurent Massoulié

Abstract: We study the problem of online personalized decentralized learning with $N$ statistically heterogeneous clients collaborating to accelerate local training. An important challenge in this setting is to select relevant collaborators to reduce gradient variance while mitigating the introduced bias. To tackle this, we introduce a gradient-based collaboration criterion, allowing each client to dynamically select peers with similar gradients during the optimization process. Our criterion is motivated by a refined and more general theoretical analysis of the All-for-one algorithm, proved to be optimal in Even et al. (2022) for an oracle collaboration scheme. We derive excess loss upper-bounds for smooth objective functions, being either strongly convex, non-convex, or satisfying the Polyak-Lojasiewicz condition; our analysis reveals that the algorithm acts as a variance reduction method where the speed-up depends on a sufficient variance. We put forward two collaboration methods instantiating the proposed general schema; and we show that one variant preserves the optimality of All-for-one. We validate our results with experiments on synthetic and real datasets.

Submitted 9 July, 2025; originally announced July 2025.

Comments: 18 pages
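
As a rough illustration of the idea in the abstract above, the following Python sketch lets a client average its stochastic gradient with those of peers whose gradients are close to its own. The function name `select_and_average`, the Euclidean distance test, and the `threshold` parameter are assumptions made for illustration; they are not the collaboration criterion or the All-for-one algorithm from the paper.

```python
import math


def select_and_average(own_grad, peer_grads, threshold):
    """Average own_grad with the peer gradients within `threshold` of it.

    Hypothetical rule for illustration only: a peer is kept when the
    Euclidean distance between its gradient and ours is at most `threshold`.
    Returns the averaged gradient and the indices of the selected peers.
    """
    selected = [
        idx for idx, g in enumerate(peer_grads)
        if math.dist(own_grad, g) <= threshold
    ]
    group = [own_grad] + [peer_grads[idx] for idx in selected]
    # Coordinate-wise mean over the client and its selected peers.
    averaged = [sum(coords) / len(group) for coords in zip(*group)]
    return averaged, selected


own = [1.0, 0.0]
peers = [[1.2, 0.0], [0.8, 0.0], [-5.0, 4.0]]  # the last peer is dissimilar
avg, chosen = select_and_average(own, peers, threshold=0.5)
print(chosen)  # indices of the peers that were kept
print(avg)     # averaged gradient over the client and the kept peers
```

Averaging over similar peers lowers the variance of the gradient estimate, while admitting a dissimilar peer would add bias, which is the trade-off the abstract describes.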

arXiv:2506.21440 [pdf, ps, other]

Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform

Authors: Maxime Leiber, Yosra Marnissi, Axel Barrau, Sylvain Meignen, Laurent Massoulié

Abstract: The short-time Fourier transform (STFT) is widely used for analyzing non-stationary signals. However, its performance is highly sensitive to its parameters, and manual or heuristic tuning often yields suboptimal results. To overcome this limitation, we propose a unified differentiable formulation of the STFT that enables gradient-based optimization of its parameters. This approach addresses the limitations of traditional STFT parameter tuning methods, which often rely on computationally intensive discrete searches. It enables fine-tuning of the time-frequency representation (TFR) based on any desired criterion. Moreover, our approach integrates seamlessly with neural networks, allowing joint optimization of the STFT parameters and network weights. The efficacy of the proposed differentiable STFT in enhancing TFRs and improving performance in downstream tasks is demonstrated through experiments on both simulated and real-world data.

Submitted 26 June, 2025; originally announced June 2025.

Comments: DSTFT, STFT, spectrogram, time-frequency, IEEE Transactions on Signal Processing, 10 pages

arXiv:2504.02299 [pdf, other]

Asymmetric graph alignment and the phase transition for asymmetric tree correlation testing

Authors: Jakob Maier, Laurent Massoulié

Abstract: Graph alignment - identifying node correspondences between two graphs - is a fundamental problem with applications in network analysis, biology, and privacy research. While substantial progress has been made in aligning correlated Erdős-Rényi graphs under symmetric settings, real-world networks often exhibit asymmetry in both node numbers and edge densities. In this work, we introduce a novel framework for asymmetric correlated Erdős-Rényi graphs, generalizing existing models to account for these asymmetries. We conduct a rigorous theoretical analysis of graph alignment in the sparse regime, where local neighborhoods exhibit tree-like structures. Our approach leverages tree correlation testing as the central tool in our polynomial-time algorithm, MPAlign, which achieves one-sided partial alignment under certain conditions. A key contribution of our work is characterizing these conditions under which asymmetric tree correlation testing is feasible: If two correlated graphs $G$ and $G'$ have average degrees $λs$ and $λs'$ respectively, where $λ$ is their common density and $s,s'$ are marginal correlation parameters, their tree neighborhoods can be aligned if $ss' > α$, where $α$ denotes Otter's constant and $λ$ is supposed large enough. The feasibility of this tree comparison problem undergoes a sharp phase transition since $ss' \leq α$ implies its impossibility. These new results on tree correlation testing allow us to solve a class of random subgraph isomorphism problems, resolving an open problem in the field.

Submitted 3 April, 2025; originally announced April 2025.

Comments: Submitted

arXiv:2503.05323 [pdf, ps, other]

Graph Alignment via Birkhoff Relaxation

Authors: Sushil Mahavir Varma, Irène Waldspurger, Laurent Massoulié

Abstract: We consider the graph alignment problem, wherein the objective is to find a vertex correspondence between two graphs that maximizes the edge overlap. The graph alignment problem is an instance of the quadratic assignment problem (QAP), known to be NP-hard in the worst case even to approximately solve. In this paper, we analyze Birkhoff relaxation, a tight convex relaxation of QAP, and present theoretical guarantees on its performance when the inputs follow the Gaussian Wigner Model. More specifically, the weighted adjacency matrices are correlated Gaussian Orthogonal Ensemble with correlation $1/\sqrt{1+σ^2}$. Denote the optimal solutions of the QAP and Birkhoff relaxation by $Π^\star$ and $X^\star$ respectively. We show that $\|X^\star-Π^\star\|_F^2 = o(n)$ when $σ= o(n^{-1.25})$ and $\|X^\star-Π^\star\|_F^2 = Ω(n)$ when $σ= Ω(n^{-0.5})$. Thus, the optimal solution $X^\star$ transitions from a small perturbation of $Π^\star$ for small $σ$ to being well separated from $Π^\star$ as $σ$ becomes larger than $n^{-0.5}$. This result allows us to guarantee that simple rounding procedures on $X^\star$ align $1-o(1)$ fraction of vertices correctly whenever $σ= o(n^{-1.25})$. This condition on $σ$ to ensure the success of the Birkhoff relaxation is state-of-the-art.

Submitted 7 March, 2025; originally announced March 2025.
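
For orientation, one common way to write the QAP for graph alignment and its relaxation over the Birkhoff polytope is given below. The notation is assumed here and may differ from the paper's: $A$ and $B$ denote the two weighted adjacency matrices, $\mathcal{P}_n$ the set of $n \times n$ permutation matrices, and $\mathcal{B}_n$ the set of doubly stochastic matrices.

$$
Π^\star \in \arg\min_{Π \in \mathcal{P}_n} \|AΠ - ΠB\|_F^2,
\qquad
X^\star \in \arg\min_{X \in \mathcal{B}_n} \|AX - XB\|_F^2,
\qquad
\mathcal{B}_n = \{X \in \mathbb{R}_{\geq 0}^{n \times n} : X\mathbf{1} = \mathbf{1},\ X^\top\mathbf{1} = \mathbf{1}\}.
$$

For a permutation matrix $Π$, $\|AΠ - ΠB\|_F^2 = \|A\|_F^2 + \|B\|_F^2 - 2\langle AΠ, ΠB\rangle$, so minimizing the first objective is the same as maximizing the overlap $\langle A, ΠBΠ^\top\rangle$. Since $\mathcal{B}_n$ is the convex hull of $\mathcal{P}_n$ (Birkhoff-von Neumann theorem), the second problem is a convex relaxation of the first.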

arXiv:2501.18975 [pdf, ps, other]

Meta-learning of shared linear representations beyond well-specified linear regression

Authors: Mathieu Even, Laurent Massoulié

Abstract: Motivated by multi-task and meta-learning approaches, we consider the problem of learning structure shared by tasks or users, such as shared low-rank representations or clustered structures. While all previous works focus on well-specified linear regression, we consider more general convex objectives, where the structural low-rank and cluster assumptions are expressed on the optima of each function. We show that under mild assumptions such as \textit{Hessian concentration} and \textit{noise concentration at the optimum}, rank and clustered regularized estimators recover such structure, provided the number of samples per task and the number of tasks are large enough. We then study the problem of recovering the subspace in which all the solutions lie, in the setting where there is only a single sample per task: we show that in that case, the rank-constrained estimator can recover the subspace, but that the number of tasks needs to scale exponentially large with the dimension of the subspace. Finally, we provide a polynomial-time algorithm via nuclear norm constraints for learning a shared linear representation in the context of convex learning objectives.

Submitted 13 February, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

arXiv:2409.08771 [pdf, other]

In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting

Authors: Constantin Philippenko, Kevin Scaman, Laurent Massoulié

Abstract: We analyze a distributed algorithm to compute a low-rank matrix factorization on $N$ clients, each holding a local dataset $\mathbf{S}^i \in \mathbb{R}^{n_i \times d}$; mathematically, we seek to solve $\min_{\mathbf{U}^i \in \mathbb{R}^{n_i\times r}, \mathbf{V}\in \mathbb{R}^{d \times r} } \frac{1}{2} \sum_{i=1}^N \|\mathbf{S}^i - \mathbf{U}^i \mathbf{V}^\top\|^2_{\text{F}}$. Considering a power initialization of $\mathbf{V}$, we rewrite the previous smooth non-convex problem into a smooth strongly-convex problem that we solve using a parallel Nesterov gradient descent potentially requiring a single step of communication at the initialization step. For any client $i$ in $\{1, \dots, N\}$, we obtain a global $\mathbf{V}$ in $\mathbb{R}^{d \times r}$ common to all clients and a local variable $\mathbf{U}^i$ in $\mathbb{R}^{n_i \times r}$. We provide a linear rate of convergence of the excess loss which depends on $σ_{\max} / σ_{r}$, where $σ_{r}$ is the $r^{\mathrm{th}}$ singular value of the concatenation $\mathbf{S}$ of the matrices $(\mathbf{S}^i)_{i=1}^N$. This result improves the rates of convergence given in the literature, which depend on $σ_{\max}^2 / σ_{\min}^2$. We provide an upper bound on the Frobenius-norm error of reconstruction under the power initialization strategy. We complete our analysis with experiments on both synthetic and real data.

Submitted 13 September, 2024; originally announced September 2024.

arXiv:2405.14532 [pdf, other]

Aligning Embeddings and Geometric Random Graphs: Informational Results and Computational Approaches for the Procrustes-Wasserstein Problem

Authors: Mathieu Even, Luca Ganassali, Jakob Maier, Laurent Massoulié

Abstract: The Procrustes-Wasserstein problem consists in matching two high-dimensional point clouds in an unsupervised setting, and has many applications in natural language processing and computer vision. We consider a planted model with two datasets $X,Y$ that consist of $n$ datapoints in $\mathbb{R}^d$, where $Y$ is a noisy version of $X$, up to an orthogonal transformation and a relabeling of the data points. This setting is related to the graph alignment problem in geometric models. In this work, we focus on the Euclidean transport cost between the point clouds as a measure of performance for the alignment. We first establish information-theoretic results, in the high ($d \gg \log n$) and low ($d \ll \log n$) dimensional regimes. We then study computational aspects and propose the Ping-Pong algorithm, alternately estimating the orthogonal transformation and the relabeling, initialized via a Frank-Wolfe convex relaxation. We give sufficient conditions for the method to retrieve the planted signal after one single step. We provide experimental results to compare the proposed approach with the state-of-the-art method of Grave et al. (2019).

Submitted 23 May, 2024; originally announced May 2024.

Comments: 28 pages, 1 figure. Comments are most welcome!

arXiv:2404.09536 [pdf, other]

doi 10.56553/popets-2025-0043

Noiseless Privacy-Preserving Decentralized Learning

Authors: Sayan Biswas, Mathieu Even, Anne-Marie Kermarrec, Laurent Massoulie, Rafael Pires, Rishi Sharma, Martijn de Vos

Abstract: Decentralized learning (DL) enables collaborative learning without a server and without training data leaving the users' devices. However, the models shared in DL can still be used to infer training data. Conventional defenses such as differential privacy and secure aggregation fall short in effectively safeguarding user privacy in DL, either sacrificing model utility or efficiency. We introduce Shatter, a novel DL approach in which nodes create virtual nodes (VNs) to disseminate chunks of their full model on their behalf. This enhances privacy by (i) preventing attackers from collecting full models from other nodes, and (ii) hiding the identity of the original node that produced a given model chunk. We theoretically prove the convergence of Shatter and provide a formal analysis demonstrating how Shatter reduces the efficacy of attacks compared to when exchanging full models between nodes. We evaluate the convergence and attack resilience of Shatter with existing DL algorithms, with heterogeneous datasets, and against three standard privacy attacks. Our evaluation shows that Shatter not only renders these privacy attacks infeasible when each node operates 16 VNs but also exhibits a positive impact on model utility compared to standard DL. In summary, Shatter enhances the privacy of DL while maintaining the utility and efficiency of the model.

Submitted 12 September, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted at PETS 2025

arXiv:2403.11267 [pdf, ps, other]

Barely Random Algorithms and Collective Metrical Task Systems

Authors: Romain Cosson, Laurent Massoulié

Abstract: We consider metrical task systems on general metric spaces with $n$ points, and show that any fully randomized algorithm can be turned into a randomized algorithm that uses only $2\log n$ random bits, and achieves the same competitive ratio up to a factor $2$. This provides the first order-optimal barely random algorithms for metrical task systems, i.e., which use a number of random bits that does not depend on the number of requests addressed to the system. We discuss implications on various aspects of online decision-making such as: distributed systems, advice complexity, and transaction costs, suggesting broad applicability. We put forward an equivalent view that we call collective metrical task systems where $k$ agents in a metrical task system team up, and suffer the average cost paid by each agent. Our results imply that such a team can be $O(\log^2 n)$-competitive as soon as $k\geq n^2$. In comparison, a single agent is always $Ω(n)$-competitive.

Submitted 7 November, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

arXiv:2311.01354 [pdf, ps, other]

Collective Tree Exploration via Potential Function Method

Authors: Romain Cosson, Laurent Massoulié

Abstract: We study the problem of collective tree exploration (CTE) where a team of $k$ agents is tasked to traverse all the edges of an unknown tree as fast as possible, assuming complete communication between the agents. In this paper, we present an algorithm performing collective tree exploration in only $2n/k+O(kD)$ rounds, where $n$ is the number of nodes in the tree, and $D$ is the tree depth. This leads to a competitive ratio of $O(\sqrt{k})$ for collective tree exploration, the first polynomial improvement over the initial $O(k/\log(k))$ ratio of [FGKP06]. Our analysis relies on a game with robots at the leaves of a continuously growing tree, which is presented in a similar manner as the `evolving tree game' of [BCR22], though its analysis and applications differ significantly. This game extends the `tree-mining game' (TM) of [Cos23] and leads to guarantees for an asynchronous extension of collective tree exploration (ACTE). Another surprising consequence of our results is the existence of algorithms $\{A_k\}_{k\in \mathbb{N}}$ for layered tree traversal (LTT) with cost at most $2L/k+O(kD)$, where $L$ is the sum of edge lengths and $D$ is the tree depth. For the case of layered trees of width $w$ and unit edge lengths, our guarantee is thus in $O(\sqrt{w}D)$.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2311.00465 [pdf, ps, other]

Asynchronous SGD on Graphs: a Unified Framework for Asynchronous Decentralized and Federated Optimization

Authors: Mathieu Even, Anastasia Koloskova, Laurent Massoulié

Abstract: Decentralized and asynchronous communications are two popular techniques to speed up communication complexity of distributed machine learning, by respectively removing the dependency over a central orchestrator and the need for synchronization. Yet, combining these two techniques together still remains a challenge. In this paper, we take a step in this direction and introduce Asynchronous SGD on Graphs (AGRAF SGD) -- a general algorithmic framework that covers asynchronous versions of many popular algorithms including SGD, Decentralized SGD, Local SGD, FedBuff, thanks to its relaxed communication and computation assumptions. We provide rates of convergence under much milder assumptions than previous decentralized asynchronous works, while still recovering or even improving over the best known results for all the algorithms covered.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2307.04679 [pdf, ps, other]

Minimax Excess Risk of First-Order Methods for Statistical Learning with Data-Dependent Oracles

Authors: Kevin Scaman, Mathieu Even, Batiste Le Bars, Laurent Massoulié

Abstract: In this paper, our aim is to analyse the generalization capabilities of first-order methods for statistical learning in multiple, different yet related, scenarios including supervised learning, transfer learning, robust learning and federated learning. To do so, we provide sharp upper and lower bounds for the minimax excess risk of strongly convex and smooth statistical learning when the gradient is accessed through partial observations given by a data-dependent oracle. This novel class of oracles can query the gradient with any given data distribution, and is thus well suited to scenarios in which the training data distribution does not match the target (or test) distribution. In particular, our upper and lower bounds are proportional to the smallest mean square error achievable by gradient estimators, thus allowing us to easily derive multiple sharp bounds in the aforementioned scenarios using the extensive literature on parameter estimation.

Submitted 1 July, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: 22 pages, 0 figures

arXiv:2301.13307 [pdf, other]

Breadth-First Depth-Next: Optimal Collaborative Exploration of Trees with Low Diameter

Authors: Romain Cosson, Laurent Massoulié, Laurent Viennot

Abstract: We consider the problem of collaborative tree exploration posed by Fraigniaud, Gasieniec, Kowalski, and Pelc where a team of $k$ agents is tasked to collectively go through all the edges of an unknown tree as fast as possible. Denoting by $n$ the total number of nodes and by $D$ the tree depth, the $\mathcal{O}(n/\log(k)+D)$ algorithm of Fraigniaud et al. achieves the best-known competitive ratio with respect to the cost of offline exploration which is $Θ(\max{\{2n/k,2D\}})$. Brass, Cabrera-Mora, Gasparri, and Xiao consider an alternative performance criterion, namely the additive overhead with respect to $2n/k$, and obtain a $2n/k+\mathcal{O}((D+k)^k)$ runtime guarantee. In this paper, we introduce `Breadth-First Depth-Next' (BFDN), a novel and simple algorithm that performs collaborative tree exploration in time $2n/k+\mathcal{O}(D^2\log(k))$, thus outperforming Brass et al. for all values of $(n,D)$ and being order-optimal for all trees with depth $D=o_k(\sqrt{n})$. Moreover, a recent result from Disser et al. implies that no exploration algorithm can achieve a $2n/k+\mathcal{O}(D^{2-ε})$ runtime guarantee. The dependency in $D^2$ of our bound is in this sense optimal. The proof of our result crucially relies on the analysis of an associated two-player game. We extend the guarantees of BFDN to: scenarios with limited memory and communication, adversarial setups where robots can be blocked, and exploration of classes of non-tree graphs. Finally, we provide a recursive version of BFDN with a runtime of $\mathcal{O}_\ell(n/k^{1/\ell}+\log(k) D^{1+1/\ell})$ for parameter $\ell\ge 1$, thereby improving performance for trees with large depth.

Submitted 30 January, 2023; originally announced January 2023.

arXiv:2206.05091 [pdf, other]

Muffliato: Peer-to-Peer Privacy Amplification for Decentralized Optimization and Averaging

Authors: Edwige Cyffers, Mathieu Even, Aurélien Bellet, Laurent Massoulié

Abstract: Decentralized optimization is increasingly popular in machine learning for its scalability and efficiency. Intuitively, it should also provide better privacy guarantees, as nodes only observe the messages sent by their neighbors in the network graph. But formalizing and quantifying this gain is challenging: existing results are typically limited to Local Differential Privacy (LDP) guarantees that overlook the advantages of decentralization. In this work, we introduce pairwise network differential privacy, a relaxation of LDP that captures the fact that the privacy leakage from a node $u$ to a node $v$ may depend on their relative position in the graph. We then analyze the combination of local noise injection with (simple or randomized) gossip averaging protocols on fixed and random communication graphs. We also derive a differentially private decentralized optimization algorithm that alternates between local gradient descent steps and gossip averaging. Our results show that our algorithms amplify privacy guarantees as a function of the distance between nodes in the graph, matching the privacy-utility trade-off of the trusted curator, up to factors that explicitly depend on the graph topology. Finally, we illustrate our privacy gains with experiments on synthetic and real-world datasets.

Submitted 11 June, 2024; v1 submitted 10 June, 2022; originally announced June 2022.

Comments: Fixed a mistake in the privacy analysis of Muffliato-GD
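
The combination of local noise injection and gossip averaging mentioned in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Muffliato algorithm itself: the function name `noisy_gossip_average`, the synchronous update, the fixed doubly stochastic matrix `W`, and the noise level are all hypothetical choices, and no privacy accounting is performed.

```python
import random


def noisy_gossip_average(values, weights, noise_std, rounds, seed=0):
    """Inject Gaussian noise locally once, then run synchronous gossip rounds.

    `weights` is a doubly stochastic gossip matrix given as a list of rows
    (an assumption of this sketch); weights[u][v] > 0 only for neighbors.
    Returns the final node values and the mean of the noisy initial values.
    """
    rng = random.Random(seed)
    # Local noise injection: each node perturbs its own private value.
    x = [v + rng.gauss(0.0, noise_std) for v in values]
    noisy_mean = sum(x) / len(x)
    for _ in range(rounds):
        # One gossip round: every node replaces its value by a weighted
        # average of its own value and its neighbors' values.
        x = [sum(w * xv for w, xv in zip(row, x)) for row in weights]
    return x, noisy_mean


# Ring of 4 nodes: each node averages with its two neighbors.
W = [
    [0.5, 0.25, 0.0, 0.25],
    [0.25, 0.5, 0.25, 0.0],
    [0.0, 0.25, 0.5, 0.25],
    [0.25, 0.0, 0.25, 0.5],
]
final, target = noisy_gossip_average(
    [1.0, 2.0, 3.0, 4.0], W, noise_std=0.1, rounds=50
)
print(final, target)
```

Because `W` is doubly stochastic, each round preserves the average of the noisy values, so all nodes converge to that noisy mean. Each node only ever sees its neighbors' messages, which is the setting in which the abstract studies privacy amplification with graph distance.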

arXiv:2107.07623 [pdf, other]

doi 10.1214/23-AAP2020

Correlation detection in trees for planted graph alignment

Authors: Luca Ganassali, Laurent Massoulié, Marc Lelarge

Abstract: Motivated by alignment of correlated sparse random graphs, we introduce a hypothesis testing problem of deciding whether or not two random trees are correlated. We obtain sufficient conditions under which this testing is impossible or feasible. We propose MPAlign, a message-passing algorithm for graph alignment inspired by the tree correlation detection problem. We prove MPAlign to succeed in polynomial time at partial alignment whenever tree detection is feasible. As a result our analysis of tree detection reveals new ranges of parameters for which partial alignment of sparse random graphs is feasible in polynomial time. We then conjecture that graph alignment is not feasible in polynomial time when the associated tree detection problem is impossible. If true, this conjecture together with our sufficient conditions on tree detection impossibility would imply the existence of a hard phase for graph alignment, i.e. a parameter range where alignment cannot be done in polynomial time even though it is known to be feasible in non-polynomial time.

Submitted 5 December, 2022; v1 submitted 15 July, 2021; originally announced July 2021.

Comments: 38 pages, 9 figures

Journal ref: Ann. Appl. Probab. 34 (3) 2799 - 2843, June 2024

arXiv:2106.07644 [pdf, other]

A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip

Authors: Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor

Abstract: We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; and a discretization of the continuized process can be computed exactly with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide continuized Nesterov acceleration under deterministic as well as stochastic gradients, with either additive or multiplicative noise. Finally, using our continuized framework and expressing the gossip averaging problem as the stochastic minimization of a certain energy function, we provide the first rigorous acceleration of asynchronous gossip algorithms.

Submitted 27 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2102.06035

arXiv:2106.03585 [pdf, other]

Asynchronous speedup in decentralized optimization

Authors: Mathieu Even, Hadrien Hendrikx, Laurent Massoulie

Abstract: In decentralized optimization, nodes of a communication network each possess a local objective function, and communicate using gossip-based methods in order to minimize the average of these per-node functions. While synchronous algorithms are heavily impacted by a few slow nodes or edges in the graph (the \emph{straggler problem}), their asynchronous counterparts are notoriously harder to parametrize. Indeed, their convergence properties for networks with heterogeneous communication and computation delays have defied analysis so far. In this paper, we use a \emph{continuized} framework to analyze asynchronous algorithms in networks with delays. Our approach yields a precise characterization of convergence time and of its dependency on heterogeneous delays in the network. Our continuized framework benefits from the best of both continuous and discrete worlds: the algorithms it applies to are based on event-driven updates. They are thus essentially discrete and hence readily implementable. Yet their analysis is essentially in continuous time, relying in part on the theory of delayed ODEs. Our algorithms moreover achieve an \emph{asynchronous speedup}: their rate of convergence is controlled by the eigengap of the network graph weighted by local delays, instead of the network-wide worst-case delay as in previous analyses. Our methods thus enjoy improved robustness to stragglers.

Submitted 1 September, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

arXiv:2102.04259 [pdf, ps, other]

Concentration of Non-Isotropic Random Tensors with Applications to Learning and Empirical Risk Minimization

Authors: Mathieu Even, Laurent Massoulié

Abstract: Dimension is an inherent bottleneck to some modern learning tasks, where optimization methods suffer from the size of the data. In this paper, we study non-isotropic distributions of data and develop tools that aim at reducing these dimensional costs by a dependency on an effective dimension rather than the ambient one. Based on non-asymptotic estimates of the metric entropy of ellipsoids -- that prove to generalize to infinite dimensions -- and on a chaining argument, our uniform concentration bounds involve an effective dimension instead of the global dimension, improving over existing results. We show the importance of taking advantage of non-isotropic properties in learning problems with the following applications: i) we improve state-of-the-art results in statistical preconditioning for communication-efficient distributed optimization, ii) we introduce a non-isotropic randomized smoothing for non-smooth optimization. Both applications cover a class of functions that encompasses empirical risk minimization (ERM) for linear models.

Submitted 11 February, 2025; v1 submitted 4 February, 2021; originally announced February 2021.

MSC Class: 60E15; 60B20; 60E15; 60F10

arXiv:2102.02685 [pdf, other]

Impossibility of Partial Recovery in the Graph Alignment Problem

Authors: Luca Ganassali, Laurent Massoulié, Marc Lelarge

Abstract: Random graph alignment refers to recovering the underlying vertex correspondence between two random graphs with correlated edges. This can be viewed as an average-case and noisy version of the well-known graph isomorphism problem. For the correlated Erdős-Rényi model, we prove an impossibility result for partial recovery in the sparse regime, with constant average degree and correlation, as well as a general bound on the maximal reachable overlap. Our bound is tight in the noiseless case (the graph isomorphism problem) and we conjecture that it is still tight with noise. Our proof technique relies on a careful application of the probabilistic method to build automorphisms between tree components of a subcritical Erdős-Rényi graph.

Submitted 29 June, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

Comments: 23 pages, 8 figures. Accepted for publication at COLT21

Journal ref: Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:2080-2102, 2021

arXiv:2011.02379 [pdf, other]

Asynchrony and Acceleration in Gossip Algorithms

Authors: Mathieu Even, Hadrien Hendrikx, Laurent Massoulié

Abstract: This paper considers the minimization of a sum of smooth and strongly convex functions dispatched over the nodes of a communication network. Previous works on the subject either focus on synchronous algorithms, which can be heavily slowed down by a few slow nodes (the straggler problem), or consider a model of asynchronous operation (Boyd et al., 2006) in which adjacent nodes communicate at the instants of Poisson point processes. We have two main contributions. 1) We propose CACDM (a Continuously Accelerated Coordinate Dual Method), and for the Poisson model of asynchronous operation, we prove CACDM to converge to optimality at an accelerated convergence rate in the sense of Nesterov and Stich (2017). In contrast, previously proposed asynchronous algorithms have not been proven to achieve such an accelerated rate. While CACDM is based on discrete updates, the proof of its convergence crucially depends on a continuous time analysis. 2) We introduce a new communication scheme based on Loss-Networks, that is programmable in a fully asynchronous and decentralized way, unlike the Poisson model of asynchronous operation that does not capture essential aspects of asynchrony such as non-instantaneous communications and computations. Under this Loss-Network model of asynchrony, we establish for CDM (a Coordinate Dual Method) a rate of convergence in terms of the eigengap of the Laplacian of the graph weighted by local effective delays. We believe this eigengap to be a fundamental bottleneck for convergence rates of asynchronous optimization.
Finally, we verify empirically that CACDM enjoys an accelerated convergence rate in the Loss-Network model of asynchrony.

Submitted 7 February, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

MSC Class: 68Q87; 60G55; 90-10

arXiv:2007.00533 [pdf, ps, other]

Partial Recovery in the Graph Alignment Problem

Authors: Georgina Hall, Laurent Massoulié

Abstract: In this paper, we consider the graph alignment problem, which is the problem of recovering, given two graphs, a one-to-one mapping between nodes that maximizes edge overlap. This problem can be viewed as a noisy version of the well-known graph isomorphism problem and appears in many applications, including social network deanonymization and cellular biology. Our focus here is on partial recovery, i.e., we look for a one-to-one mapping which is correct on a fraction of the nodes of the graph rather than on all of them, and we assume that the two input graphs to the problem are correlated Erdős-Rényi graphs of parameters $(n,q,s)$. Our main contribution is then to give necessary and sufficient conditions on $(n,q,s)$ under which partial recovery is possible with high probability as the number of nodes $n$ goes to infinity. In particular, we show that it is possible to achieve partial recovery in the $nqs=Θ(1)$ regime under certain additional assumptions.

Submitted 13 January, 2022; v1 submitted 1 July, 2020; originally announced July 2020.

arXiv:2006.14384 [pdf, other]

Dual-Free Stochastic Decentralized Optimization with Variance Reduction

Authors: Hadrien Hendrikx, Francis Bach, Laurent Massoulié

Abstract: We consider the problem of training machine learning models on distributed data in a decentralized way. For finite-sum problems, fast single-machine algorithms for large datasets rely on stochastic updates combined with variance reduction. Yet, existing decentralized stochastic algorithms either do not obtain the full speedup allowed by stochastic updates, or require oracles that are more expensive than regular gradients. In this work, we introduce a Decentralized stochastic algorithm with Variance Reduction called DVR. DVR only requires computing stochastic gradients of the local functions, and is computationally as fast as a standard stochastic variance-reduced algorithm run on a $1/n$ fraction of the dataset, where $n$ is the number of nodes. To derive DVR, we use Bregman coordinate descent on a well-chosen dual problem, and obtain a dual-free algorithm using a specific Bregman divergence. We give an accelerated version of DVR based on the Catalyst framework, and illustrate its effectiveness with simulations on real data.

Submitted 25 June, 2020; originally announced June 2020.

arXiv:2005.14186 [pdf, other]

doi 10.5802/crmath.99

Understanding and monitoring the evolution of the Covid-19 epidemic from medical emergency calls: the example of the Paris area

Authors: Stéphane Gaubert, Marianne Akian, Xavier Allamigeon, Marin Boyet, Baptiste Colin, Théotime Grohens, Laurent Massoulié, David P. Parsons, Frédéric Adnet, Érick Chanzy, Laurent Goix, Frédéric Lapostolle, Éric Lecarpentier, Christophe Leroy, Thomas Loeb, Jean-Sébastien Marx, Caroline Télion, Laurent Tréluyer, Pierre Carli

Abstract: We portray the evolution of the Covid-19 epidemic during the crisis of March-April 2020 in the Paris area, by analyzing the medical emergency calls received by the EMS of the four central departments of this area (Centre 15 of SAMU 75, 92, 93 and 94). Our study reveals strong dissimilarities between these departments. We show that the logarithm of each epidemic observable can be approximated by a piecewise linear function of time. This allows us to distinguish the different phases of the epidemic, and to identify the delay between sanitary measures and their influence on the load of EMS. This also leads to an algorithm, allowing one to detect epidemic resurgences. We rely on a transport PDE epidemiological model, and we use methods from Perron-Frobenius theory and tropical geometry.

Submitted 20 July, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

Comments: Changes v1->v2: Section 7 expanded. Changes v2->v3: bibliography expanded; minor improvements and corrections

Journal ref: Comptes Rendus -- Mathématique, Volume 358, issue 7 (2020), p. 843-875

arXiv:2005.10675 [pdf, other]

An Optimal Algorithm for Decentralized Finite Sum Optimization

Authors: Hadrien Hendrikx, Francis Bach, Laurent Massoulie

Abstract: Modern large-scale finite-sum optimization relies on two key aspects: distribution and stochastic updates. For smooth and strongly convex problems, existing decentralized algorithms are slower than modern accelerated variance-reduced stochastic algorithms when run on a single machine, and are therefore not efficient. Centralized algorithms are fast, but their scaling is limited by global aggregation steps that result in communication bottlenecks. In this work, we propose an efficient \textbf{A}ccelerated \textbf{D}ecentralized stochastic algorithm for \textbf{F}inite \textbf{S}ums named ADFS, which uses local stochastic proximal updates and decentralized communications between nodes. On $n$ machines, ADFS minimizes the objective function with $nm$ samples in the same time it takes optimal algorithms to optimize from $m$ samples on one machine. This scaling holds until a critical network size is reached, which depends on communication delays, on the number of samples $m$, and on the network topology. We give a lower bound of complexity to show that ADFS is optimal among decentralized algorithms. To derive ADFS, we first develop an extension of the accelerated proximal coordinate gradient algorithm to arbitrary sampling. Then, we apply this coordinate descent algorithm to a well-chosen dual problem based on an augmented graph approach, leading to the general ADFS algorithm. We illustrate the improvement of ADFS over state-of-the-art decentralized approaches with experiments.

Submitted 20 May, 2020; originally announced May 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1905.11394

arXiv:2002.10726 [pdf, other]

Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization

Authors: Hadrien Hendrikx, Lin Xiao, Sebastien Bubeck, Francis Bach, Laurent Massoulie

Abstract: We consider the setting of distributed empirical risk minimization where multiple machines compute the gradients in parallel and a centralized server updates the model parameters. In order to reduce the number of communications required to reach a given accuracy, we propose a \emph{preconditioned} accelerated gradient method where the preconditioning is done by solving a local optimization problem over a subsampled dataset at the server. The convergence rate of the method depends on the square root of the relative condition number between the global and local loss functions. We estimate the relative condition number for linear prediction models by studying \emph{uniform} concentration of the Hessians over a bounded domain, which allows us to derive improved convergence rates for existing preconditioned gradient methods and our accelerated method. Experiments on real-world datasets illustrate the benefits of acceleration in the ill-conditioned regime.

Submitted 25 February, 2020; originally announced February 2020.

arXiv:2002.01258 [pdf, other]

From tree matching to sparse graph alignment

Authors: Luca Ganassali, Laurent Massoulié

Abstract: In this paper we consider alignment of sparse graphs, for which we introduce the Neighborhood Tree Matching Algorithm (NTMA). For correlated Erdős-Rényi random graphs, we prove that the algorithm returns -- in polynomial time -- a positive fraction of correctly matched vertices, and a vanishing fraction of mismatches. This result holds with average degree of the graphs in $O(1)$ and correlation parameter $s$ that can be bounded away from 1, conditions under which random graph alignment is particularly challenging. As a byproduct of the analysis we introduce a matching metric between trees and characterize it for several models of correlated random trees. These results may be of independent interest, yielding for instance efficient tests for determining whether two random trees are correlated or independent.

Submitted 18 June, 2020; v1 submitted 4 February, 2020; originally announced February 2020.

Comments: 33 pages, 10 figures, accepted at COLT 2020. Typos corrected, some new figures, some remarks and explanations detailed, minor changes in proof of Th. 1.2

MSC Class: 68Q87

Journal ref: Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:1633-1665, 2020

arXiv:1912.00231 [pdf, other]

doi 10.1017/apr.2021.31

Spectral Alignment of Correlated Gaussian matrices

Authors: Luca Ganassali, Marc Lelarge, Laurent Massoulié

Abstract: In this paper we analyze a simple spectral method (EIG1) for the problem of matrix alignment, consisting in aligning their leading eigenvectors: given two matrices $A$ and $B$, we compute $v_1$ and $v'_1$ two corresponding leading eigenvectors. The algorithm returns the permutation $\hat{π}$ such that the rank of coordinate $\hat{π}(i)$ in $v_1$ and that of coordinate $i$ in $v'_1$ (up to the sign of $v'_1$) are the same. We consider a model of weighted graphs where the adjacency matrix $A$ belongs to the Gaussian Orthogonal Ensemble (GOE) of size $N \times N$, and $B$ is a noisy version of $A$ where all nodes have been relabeled according to some planted permutation $π$, namely $B= Π^T (A+σH) Π$, where $Π$ is the permutation matrix associated with $π$ and $H$ is an independent copy of $A$. We show the following zero-one law: with high probability, under the condition $σN^{7/6+ε} \to 0$ for some $ε>0$, EIG1 recovers all but a vanishing part of the underlying permutation $π$, whereas if $σN^{7/6-ε} \to \infty$, this method cannot recover more than $o(N)$ correct matches. This result gives an understanding of the simplest and fastest spectral method for matrix alignment (or complete weighted graph alignment), and involves proof methods and techniques which could be of independent interest.

Submitted 11 May, 2021; v1 submitted 30 November, 2019; originally announced December 2019.

Comments: 26 pages, 4 figures. Figures and paper organization updated, typos corrected. Remark 4.2. added

Journal ref: Advances in Applied Probability, Volume 54, Issue 1, March 2022 , pp. 279 - 310
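
The rank-matching step described in the abstract above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function names `leading_eigenvector` and `eig1_align` are hypothetical, the relabeling convention `B[i, j] = A[perm[i], perm[j]]` is an assumption, and resolving the sign of $v'_1$ by keeping the candidate with the larger overlap score is also an assumption made here for concreteness.

```python
import numpy as np

def leading_eigenvector(M):
    # np.linalg.eigh returns eigenvalues in ascending order,
    # so the last column is the eigenvector of the largest eigenvalue.
    _, vecs = np.linalg.eigh(M)
    return vecs[:, -1]

def eig1_align(A, B):
    """Match coordinates of the leading eigenvectors of A and B by rank.

    Returns an integer array `perm` such that coordinate perm[i] of v_1
    and coordinate i of v'_1 have the same rank. Both signs of v'_1 are
    tried; the one with the larger overlap sum(A[perm, perm] * B) is kept
    (an assumption of this sketch).
    """
    v = leading_eigenvector(A)
    w = leading_eigenvector(B)
    order_v = np.argsort(v)  # order_v[r] = coordinate of v with rank r
    best_perm, best_score = None, -np.inf
    for sign in (1.0, -1.0):
        order_w = np.argsort(sign * w)  # order_w[r] = coordinate of w with rank r
        perm = np.empty(len(v), dtype=int)
        perm[order_w] = order_v  # equal ranks are matched together
        score = np.sum(A[np.ix_(perm, perm)] * B)
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm

# Small demo on a GOE-like matrix with a planted relabeling and noise level sigma.
rng = np.random.default_rng(0)
n, sigma = 200, 1e-4
G = rng.standard_normal((n, n))
A = (G + G.T) / np.sqrt(2 * n)
G2 = rng.standard_normal((n, n))
H = (G2 + G2.T) / np.sqrt(2 * n)
planted = rng.permutation(n)
B = (A + sigma * H)[np.ix_(planted, planted)]
recovered = eig1_align(A, B)
print("fraction of correct matches:", np.mean(recovered == planted))
```

Varying `sigma` and `n` in the demo gives a rough empirical feel for the threshold behaviour stated in the abstract, although a small simulation of this kind does not by itself confirm the asymptotic result.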

arXiv:1905.11394 [pdf, other]

An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums

Authors: Hadrien Hendrikx, Francis Bach, Laurent Massoulie

Abstract: Modern large-scale finite-sum optimization relies on two key aspects: distribution and stochastic updates. For smooth and strongly convex problems, existing decentralized algorithms are slower than modern accelerated variance-reduced stochastic algorithms when run on a single machine, and are therefore not efficient. Centralized algorithms are fast, but their scaling is limited by global aggregation steps that result in communication bottlenecks. In this work, we propose an efficient \textbf{A}ccelerated \textbf{D}ecentralized stochastic algorithm for \textbf{F}inite \textbf{S}ums named ADFS, which uses local stochastic proximal updates and randomized pairwise communications between nodes. On $n$ machines, ADFS learns from $nm$ samples in the same time it takes optimal algorithms to learn from $m$ samples on one machine. This scaling holds until a critical network size is reached, which depends on communication delays, on the number of samples $m$, and on the network topology. We provide a theoretical analysis based on a novel augmented graph approach combined with a precise evaluation of synchronization times and an extension of the accelerated proximal coordinate gradient algorithm to arbitrary sampling. We illustrate the improvement of ADFS over state-of-the-art decentralized approaches with experiments.

Submitted 12 June, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

Comments: Code available in source files. arXiv admin note: substantial text overlap with arXiv:1901.09865

arXiv:1901.09865 [pdf, other]

Asynchronous Accelerated Proximal Stochastic Gradient for Strongly Convex Distributed Finite Sums

Authors: Hadrien Hendrikx, Francis Bach, Laurent Massoulié

Abstract: In this work, we study the problem of minimizing the sum of strongly convex functions split over a network of $n$ nodes. We propose the decentralized and asynchronous algorithm ADFS to tackle the case when local functions are themselves finite sums with $m$ components. ADFS converges linearly when local functions are smooth, and matches the rates of the best known finite sum algorithms when executed on a single machine. On several machines, ADFS enjoys an $O(\sqrt{n})$ or $O(n)$ speed-up depending on the leading complexity term as long as the diameter of the network is not too big with respect to $m$. This also leads to a $\sqrt{m}$ speed-up over state-of-the-art distributed batch methods, which is the expected speed-up for finite sum algorithms. In terms of communication times and network parameters, ADFS scales as well as optimal distributed batch algorithms. As a side contribution, we give a generalized version of the accelerated proximal coordinate gradient algorithm using arbitrary sampling that we apply to a well-chosen dual problem to derive ADFS. Yet, ADFS uses primal proximal updates that only require solving one-dimensional problems for many standard machine learning applications. Finally, ADFS can be formulated for non-smooth objectives with equally good scaling properties. We illustrate the improvement of ADFS over state-of-the-art approaches with simulations.

Submitted 17 July, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

arXiv:1811.05808 [pdf, ps, other]

Robustness of spectral methods for community detection

Authors: Ludovic Stephan, Laurent Massoulié

Abstract: The present work is concerned with community detection. Specifically, we consider a random graph drawn according to the stochastic block model: its vertex set is partitioned into blocks, or communities, and edges are placed randomly and independently of each other with probability depending only on the communities of their two endpoints. In this context, our aim is to recover the community labels better than by random guess, based only on the observation of the graph. In the sparse case, where edge probabilities are in $O(1/n)$, we introduce a new spectral method based on the distance matrix $D^{(\ell)}$, where $D^{(\ell)}_{ij} = 1$ iff the graph distance between $i$ and $j$, denoted $d(i, j)$, is equal to $\ell$. We show that when $\ell \sim c\log(n)$ for carefully chosen $c$, the eigenvectors associated to the largest eigenvalues of $D^{(\ell)}$ provide enough information to perform non-trivial community recovery with high probability, provided we are above the so-called Kesten-Stigum threshold. This yields an efficient algorithm for community detection, since computation of the matrix $D^{(\ell)}$ can be done in $O(n^{1+κ})$ operations for a small constant $κ$. We then study the sensitivity of the eigendecomposition of $D^{(\ell)}$ when we allow an adversarial perturbation of the edges of $G$. We show that when the considered perturbation does not affect more than $O(n^\varepsilon)$ vertices for some small $\varepsilon > 0$, the highest eigenvalues and their corresponding eigenvectors incur negligible perturbations, which allows us to still perform efficient recovery.

Submitted 25 June, 2019; v1 submitted 14 November, 2018; originally announced November 2018.
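
As a rough illustration of the distance matrix defined in the abstract above ($D^{(\ell)}_{ij} = 1$ iff $d(i, j) = \ell$), the following sketch builds it with one truncated breadth-first search per vertex. The function name `distance_matrix` and the adjacency-list input format are assumptions made for this example; it is not the authors' implementation and makes no claim about matching their stated operation count.

```python
from collections import deque

import numpy as np

def distance_matrix(adj, ell):
    """Return D with D[i, j] = 1 iff the graph distance d(i, j) equals ell.

    `adj` is a list of neighbor lists for an undirected graph on
    vertices 0..n-1 (input format assumed for this sketch).
    """
    n = len(adj)
    D = np.zeros((n, n), dtype=int)
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == ell:
                continue  # vertices beyond depth ell are never needed
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            if d == ell:
                D[source, v] = 1
    return D

# Path graph 0 - 1 - 2 - 3: the pairs at distance 2 are (0, 2) and (1, 3).
path = [[1], [0, 2], [1, 3], [2]]
D2 = distance_matrix(path, 2)
print(D2)
```

The leading eigenvectors of the resulting symmetric matrix could then be obtained with `np.linalg.eigh(D2)`; how they are turned into community labels is described in the paper and is not reproduced here.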

arXiv:1810.02660 [pdf, other]

Accelerated Decentralized Optimization with Local Updates for Smooth and Strongly Convex Objectives

Authors: Hadrien Hendrikx, Francis Bach, Laurent Massoulié

Abstract: In this paper, we study the problem of minimizing a sum of smooth and strongly convex functions split over the nodes of a network in a decentralized fashion. We propose the algorithm $ESDACD$, a decentralized accelerated algorithm that only requires local synchrony. Its rate depends on the condition number $κ$ of the local functions as well as the network topology and delays. Under mild assumptions on the topology of the graph, $ESDACD$ takes a time $O((τ_{\max} + Δ_{\max})\sqrt{κ/γ}\ln(ε^{-1}))$ to reach a precision $ε$ where $γ$ is the spectral gap of the graph, $τ_{\max}$ the maximum communication delay and $Δ_{\max}$ the maximum computation time. Therefore, it matches the rate of $SSDA$, which is optimal when $τ_{\max} = Ω\left(Δ_{\max}\right)$. Applying $ESDACD$ to quadratic local functions leads to an accelerated randomized gossip algorithm of rate $O( \sqrt{θ_{\rm gossip}/n})$ where $θ_{\rm gossip}$ is the rate of the standard randomized gossip. To the best of our knowledge, it is the first asynchronous gossip algorithm with a provably improved rate of convergence of the second moment of the error. We illustrate these results with experiments in idealized settings.

Submitted 22 February, 2019; v1 submitted 5 October, 2018; originally announced October 2018.

arXiv:1806.07562 [pdf, other]

doi 10.1109/TNSE.2019.2913949

Efficient inference in stochastic block models with vertex labels

Authors: Clara Stegehuis, Laurent Massoulié

Abstract: We study the stochastic block model with two communities where vertices contain side information in the form of a vertex label. These vertex labels may have arbitrary label distributions, depending on the community memberships. We analyze a linearized version of the popular belief propagation algorithm. We show that this algorithm achieves the highest accuracy possible whenever a certain function of the network parameters has a unique fixed point. Whenever this function has multiple fixed points, the belief propagation algorithm may not perform optimally. We show that increasing the information in the vertex labels may reduce the number of fixed points and hence lead to optimality of belief propagation.

Submitted 17 August, 2018; v1 submitted 20 June, 2018; originally announced June 2018.

arXiv:1706.08561 [pdf, ps, other]

Group Synchronization on Grids

Authors: Emmanuel Abbe, Laurent Massoulie, Andrea Montanari, Allan Sly, Nikhil Srivastava

Abstract: Group synchronization requires estimating unknown elements $(θ_v)_{v\in V}$ of a compact group ${\mathfrak G}$ associated to the vertices of a graph $G=(V,E)$, using noisy observations of the group differences associated to the edges. This model is relevant to a variety of applications ranging from structure from motion in computer vision to graph localization and positioning, to certain families of community detection problems. We focus on the case in which the graph $G$ is the $d$-dimensional grid. Since the unknowns ${\boldsymbol θ}_v$ are only determined up to a global action of the group, we consider the following weak recovery question. Can we determine the group difference $θ_u^{-1}θ_v$ between far apart vertices $u, v$ better than by random guessing? We prove that weak recovery is possible (provided the noise is small enough) for $d\ge 3$ and, for certain finite groups, for $d\ge 2$. Vice versa, for some continuous groups, we prove that weak recovery is impossible for $d=2$. Finally, for strong enough noise, weak recovery is always impossible.

Submitted 26 June, 2017; originally announced June 2017.

Comments: 21 pages

arXiv:1705.03427 [pdf, ps, other]

Rapid Mixing of Local Graph Dynamics

Authors: Laurent Massoulié, Rémi Varloot

Abstract: Graph dynamics arise naturally in many contexts. For instance in peer-to-peer networks, a participating peer may replace an existing connection with one neighbour by a new connection with a neighbour's neighbour. Several such local rewiring rules have been proposed to ensure that peer-to-peer networks achieve good connectivity properties (e.g. high expansion) in equilibrium. However it has remained an open question whether there existed such rules that also led to fast convergence to equilibrium. In this work we provide an affirmative answer: We exhibit a local rewiring rule that converges to equilibrium after each participating node has undergone only a number of rewirings that is poly-logarithmic in the system size. The proof involves consideration of the whole isoperimetric profile of the graph, and may be of independent interest.

Submitted 9 May, 2017; originally announced May 2017.

arXiv:1703.00674 [pdf, other]

Adaptive Matching for Expert Systems with Uncertain Task Types

Authors: Virag Shah, Lennart Gulikers, Laurent Massoulie, Milan Vojnovic

Abstract: A matching in a two-sided market often incurs an externality: a matched resource may become unavailable to the other side of the market, at least for a while. This is especially an issue in online platforms involving human experts as the expert resources are often scarce. The efficient utilization of experts in these platforms is made challenging by the fact that the information available about the parties involved is usually limited. To address this challenge, we develop a model of a task-expert matching system where a task is matched to an expert using not only the prior information about the task but also the feedback obtained from the past matches. In our model the tasks arrive online while the experts are fixed and constrained by a finite service capacity. For this model, we characterize the maximum task resolution throughput a platform can achieve. We show that the natural greedy approaches, where each expert is assigned a task most suitable to her skill, are suboptimal, as they do not internalize the above externality. We develop a throughput optimal backpressure algorithm which does so by accounting for the 'congestion' among different task types. Finally, we validate our model and confirm our theoretical findings with data-driven simulations via logs of Math.StackExchange, a StackOverflow forum dedicated to mathematics.

Submitted 26 October, 2018; v1 submitted 2 March, 2017; originally announced March 2017.

Comments: A part of it presented at Allerton Conference 2017, 18 pages

arXiv:1609.02487 [pdf, ps, other]

Non-Backtracking Spectrum of Degree-Corrected Stochastic Block Models

Authors: Lennart Gulikers, Marc Lelarge, Laurent Massoulié

Abstract: Motivated by community detection, we characterise the spectrum of the non-backtracking matrix $B$ in the Degree-Corrected Stochastic Block Model. Specifically, we consider a random graph on $n$ vertices partitioned into two equal-sized clusters. The vertices have i.i.d. weights $\{ φ_u \}_{u=1}^n$ with second moment $Φ^{(2)}$. The intra-cluster connection probability for vertices $u$ and $v$ is $\frac{φ_u φ_v}{n}a$ and the inter-cluster connection probability is $\frac{φ_u φ_v}{n}b$. We show that with high probability, the following holds: The leading eigenvalue of the non-backtracking matrix $B$ is asymptotic to $ρ= \frac{a+b}{2} Φ^{(2)}$. The second eigenvalue is asymptotic to $μ_2 = \frac{a-b}{2} Φ^{(2)}$ when $μ_2^2 > ρ$, but asymptotically bounded by $\sqrt{ρ}$ when $μ_2^2 \leq ρ$. All the remaining eigenvalues are asymptotically bounded by $\sqrt{ρ}$. As a result, a clustering positively correlated with the true communities can be obtained based on the second eigenvector of $B$ in the regime where $μ_2^2 > ρ$. In a previous work we obtained that detection is impossible when $μ_2^2 < ρ$, meaning that a phase transition occurs in the sparse regime of the Degree-Corrected Stochastic Block Model. As a corollary, we obtain that Degree-Corrected Erdős-Rényi graphs asymptotically satisfy the graph Riemann hypothesis, a quasi-Ramanujan property. A by-product of our proof is a weak law of large numbers for local-functionals on Degree-Corrected Stochastic Block Models, which could be of independent interest.

Submitted 18 May, 2017; v1 submitted 8 September, 2016; originally announced September 2016.
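
The eigenvalue formulas quoted in the abstract above can be evaluated directly. The following is a minimal sketch, not code from the paper: the function name `dcsbm_spectrum_prediction` and its argument names are hypothetical, and the returned values are only the asymptotic predictions stated in the abstract for the two-cluster model.

```python
def dcsbm_spectrum_prediction(a, b, phi2):
    """Asymptotic predictions quoted in the abstract (two equal-sized clusters).

    a, b  : intra- and inter-cluster connection constants
    phi2  : second moment of the i.i.d. vertex weights
    Returns (rho, mu2, detectable), where `detectable` is the
    condition mu2^2 > rho stated in the abstract.
    """
    rho = (a + b) / 2 * phi2   # predicted leading eigenvalue of B
    mu2 = (a - b) / 2 * phi2   # predicted second eigenvalue when mu2^2 > rho
    detectable = mu2 ** 2 > rho
    return rho, mu2, detectable


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    print(dcsbm_spectrum_prediction(5.0, 1.0, 1.0))
    print(dcsbm_spectrum_prediction(3.0, 2.0, 1.0))
```

With unit second moment, the first parameter choice lies above the threshold and the second lies below it.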

arXiv:1603.00544 [pdf, other]

On the capacity of information processing systems

Authors: Laurent Massoulie, Kuang Xu

Abstract: We propose and analyze a family of information processing systems, where a finite set of experts or servers are employed to extract information about a stream of incoming jobs. Each job is associated with a hidden label drawn from some prior distribution. An inspection by an expert produces a noisy outcome that depends both on the job's hidden label and the type of the expert, and occupies the expert for a finite time duration. A decision maker's task is to dynamically assign inspections so that the resulting outcomes can be used to accurately recover the labels of all jobs, while keeping the system stable. Among our chief motivations are applications in crowd-sourcing, diagnostics, and experiment designs, where one wishes to efficiently learn the nature of a large number of items, using a finite pool of computational resources or human agents. We focus on the capacity of such an information processing system. Given a level of accuracy guarantee, we ask how many experts are needed in order to stabilize the system, and through what inspection architecture. Our main result provides an adaptive inspection policy that is asymptotically optimal in the following sense: the ratio between the required number of experts under our policy and the theoretical optimal converges to one, as the probability of error in label recovery tends to zero.

Submitted 30 May, 2016; v1 submitted 1 March, 2016; originally announced March 2016.

arXiv:1601.06838 [pdf, other]

doi 10.1109/INFOCOM.2016.7524445

A Utility Optimization Approach to Network Cache Design

Authors: Mostafa Dehghan, Laurent Massoulie, Don Towsley, Daniel Menasche, Y. C. Tay

Abstract: In any caching system, the admission and eviction policies determine which contents are added and removed from a cache when a miss occurs. Usually, these policies are devised so as to mitigate staleness and increase the hit probability. Nonetheless, the utility of having a high hit probability can vary across contents. This occurs, for instance, when service level agreements must be met, or if certain contents are more difficult to obtain than others. In this paper, we propose utility-driven caching, where we associate with each content a utility, which is a function of the corresponding content hit probability. We formulate optimization problems where the objectives are to maximize the sum of utilities over all contents. These problems differ according to the stringency of the cache capacity constraint. Our framework enables us to reverse engineer classical replacement policies such as LRU and FIFO, by computing the utility functions that they maximize. We also develop online algorithms that can be used by service providers to implement various caching policies based on arbitrary utility functions.

Submitted 25 January, 2016; originally announced January 2016.

Comments: IEEE INFOCOM 2016

arXiv:1511.00546 [pdf, ps, other]

An Impossibility Result for Reconstruction in a Degree-Corrected Planted-Partition Model

Authors: Lennart Gulikers, Marc Lelarge, Laurent Massoulié

Abstract: We consider the Degree-Corrected Stochastic Block Model (DC-SBM): a random graph on $n$ nodes, having i.i.d. weights $(φ_u)_{u=1}^n$ (possibly heavy-tailed), partitioned into $q \geq 2$ asymptotically equal-sized clusters. The model parameters are two constants $a,b > 0$ and the finite second moment of the weights $Φ^{(2)}$. Vertices $u$ and $v$ are connected by an edge with probability $\frac{φ_u φ_v}{n}a$ when they are in the same class and with probability $\frac{φ_u φ_v}{n}b$ otherwise. We prove that it is information-theoretically impossible to estimate the clusters in a way positively correlated with the true community structure when $(a-b)^2 Φ^{(2)} \leq q(a+b)$. As by-products of our proof we obtain $(1)$ a precise coupling result for local neighbourhoods in DC-SBMs, which we use in a follow-up paper [Gulikers et al., 2017] to establish a law of large numbers for local-functionals and $(2)$ that long-range interactions are weak in (power-law) DC-SBMs.

Submitted 24 November, 2018; v1 submitted 2 November, 2015; originally announced November 2015.

Comments: Appeared in Annals of Applied Probability

Journal ref: Annals of Applied Probability - Volume 28, Number 5 (2018), 3002-3027

arXiv:1506.08621 [pdf, other]

A spectral method for community detection in moderately-sparse degree-corrected stochastic block models

Authors: Lennart Gulikers, Marc Lelarge, Laurent Massoulié

Abstract: We consider community detection in Degree-Corrected Stochastic Block Models (DC-SBM). We propose a spectral clustering algorithm based on a suitably normalized adjacency matrix. We show that this algorithm consistently recovers the block-membership of all but a vanishing fraction of nodes, in the regime where the lowest degree is of order $\log(n)$ or higher. Recovery succeeds even for very heterogeneous degree-distributions. The algorithm does not rely on parameters as input. In particular, it does not need to know the number of communities.

Submitted 7 February, 2017; v1 submitted 29 June, 2015; originally announced June 2015.

arXiv:1501.06087 [pdf, other]

Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs

Authors: Charles Bordenave, Marc Lelarge, Laurent Massoulié

Abstract: A non-backtracking walk on a graph is a directed path such that no edge is the inverse of its preceding edge. The non-backtracking matrix of a graph is indexed by its directed edges and can be used to count non-backtracking walks of a given length. It has been used recently in the context of community detection and has appeared previously in connection with the Ihara zeta function and in some generalizations of Ramanujan graphs. In this work, we study the largest eigenvalues of the non-backtracking matrix of the Erdős-Rényi random graph and of the Stochastic Block Model in the regime where the number of edges is proportional to the number of vertices. Our results confirm the "spectral redemption" conjecture that community detection can be made on the basis of the leading eigenvectors above the feasibility threshold.

Submitted 22 April, 2015; v1 submitted 24 January, 2015; originally announced January 2015.

Comments: 59 pages

MSC Class: 05C80; 05C50; 91D30
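
The definition of the non-backtracking matrix given in the abstract above can be illustrated with a short sketch. This is not code from the paper: the function names `non_backtracking_matrix` and `count_nb_walks` are hypothetical, the input is assumed to be an undirected simple graph given as an edge list, and the convention assumed here is that the entry for the pair of directed edges $(u \to v)$, $(x \to y)$ is 1 when $v = x$ and $y \neq u$.

```python
def non_backtracking_matrix(edges):
    """Build the non-backtracking matrix of an undirected simple graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    Returns (directed, B), where `directed` lists the directed edges
    and B[i][j] = 1 if directed edge j can follow directed edge i
    without immediately reversing it.
    """
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    m = len(directed)
    B = [[0] * m for _ in range(m)]
    for i, (u, v) in enumerate(directed):
        for j, (x, y) in enumerate(directed):
            if x == v and y != u:
                B[i][j] = 1
    return directed, B


def count_nb_walks(B, length):
    """Count non-backtracking walks made of `length` directed edges.

    This equals the sum of the entries of B raised to the power length - 1.
    """
    m = len(B)
    counts = [1] * m  # one walk of a single edge starting at each directed edge
    for _ in range(length - 1):
        counts = [sum(B[i][j] * counts[j] for j in range(m)) for i in range(m)]
    return sum(counts)


if __name__ == "__main__":
    _, B_triangle = non_backtracking_matrix([(0, 1), (1, 2), (0, 2)])
    print(count_nb_walks(B_triangle, 5))
```

On a triangle every directed edge has exactly one admissible continuation, so the walk count stays equal to the number of directed edges for every length; on a path the count drops to zero once the walks run out of room.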

arXiv:1401.1770 [pdf, ps, other]

Adaptive Replication in Distributed Content Delivery Networks

Authors: Mathieu Leconte, Marc Lelarge, Laurent Massoulié

Abstract: We address the problem of content replication in large distributed content delivery networks, composed of a data center assisted by many small servers with limited capabilities and located at the edge of the network. The objective is to optimize the placement of contents on the servers to offload the data center as much as possible. We model the system constituted by the small servers as a loss network, each loss corresponding to a request to the data center. Based on large system / storage behavior, we obtain an asymptotic formula for the optimal replication of contents and propose adaptive schemes related to those encountered in cache networks but reacting here to loss events, and faster algorithms generating virtual events at higher rate while keeping the same target replication. We show through simulations that our adaptive schemes significantly outperform standard replication strategies both in terms of loss rates and adaptation speed.

Submitted 8 January, 2014; originally announced January 2014.

Comments: 10 pages, 5 figures

arXiv:1311.3085 [pdf, ps, other]

Community detection thresholds and the weak Ramanujan property

Authors: Laurent Massoulie

Abstract: Decelle et al. conjectured the existence of a sharp threshold for community detection in sparse random graphs drawn from the stochastic block model. Mossel et al. established the negative part of the conjecture, proving impossibility of meaningful detection below the threshold. However, the positive part of the conjecture has remained elusive so far. Here we solve the positive part of the conjecture. We introduce a modified adjacency matrix $B$ that counts self-avoiding paths of a given length $\ell$ between pairs of nodes and prove that for logarithmic $\ell$, the leading eigenvectors of this modified matrix provide non-trivial detection, thereby settling the conjecture. A key step in the proof consists in establishing a weak Ramanujan property of matrix $B$. Namely, the spectrum of $B$ consists of two leading eigenvalues $ρ(B)$, $λ_2$ and $n-2$ eigenvalues of a lower order $O(n^ε\sqrt{ρ(B)})$ for all $ε>0$, $ρ(B)$ denoting $B$'s spectral radius. $d$-regular graphs are Ramanujan when their second eigenvalue verifies $|λ|\le 2 \sqrt{d-1}$. Random $d$-regular graphs have a second largest eigenvalue $λ$ of $2\sqrt{d-1}+o(1)$ (see Friedman), thus being almost Ramanujan. Erdős-Rényi graphs with average degree $d$ at least logarithmic ($d=Ω(\log n)$) have a second eigenvalue of $O(\sqrt{d})$ (see Feige and Ofek), a slightly weaker version of the Ramanujan property. However, this spectrum separation property fails for sparse ($d=O(1)$) Erdős-Rényi graphs. Our result thus shows that by constructing matrix $B$ through neighborhood expansion, we regularize the original adjacency matrix to eventually recover a weak form of the Ramanujan property.

Submitted 13 November, 2013; originally announced November 2013.

arXiv:1212.0952 [pdf, ps, other]

doi 10.1016/j.tcs.2015.02.018

Self-Organizing Flows in Social Networks

Authors: Nidhi Hegde, Laurent Massoulié, Laurent Viennot

Abstract: Social networks offer users new means of accessing information, essentially relying on "social filtering", i.e. propagation and filtering of information by social contacts. The sheer amount of data flowing in these networks, combined with the limited budget of attention of each user, makes it difficult to ensure that social filtering brings relevant content to the interested users. Our motivation in this paper is to measure to what extent self-organization of the social network results in efficient social filtering. To this end we introduce flow games, a simple abstraction that models network formation under selfish user dynamics, featuring user-specific interests and budget of attention. In the context of homogeneous user interests, we show that selfish dynamics converge to a stable network structure (namely a pure Nash equilibrium) with close-to-optimal information dissemination. We show in contrast, for the more realistic case of heterogeneous interests, that convergence, if it occurs, may lead to information dissemination that can be arbitrarily inefficient, as captured by an unbounded "price of anarchy". Nevertheless the situation differs when users' interests exhibit a particular structure, captured by a metric space with low doubling dimension. In that case, natural autonomous dynamics converge to a stable configuration. Moreover, users obtain all the information of interest to them in the corresponding dissemination, provided their budget of attention is logarithmic in the size of their interest set.

Submitted 28 February, 2015; v1 submitted 5 December, 2012; originally announced December 2012.

Journal ref: Theoretical Computer Science, Elsevier, 2015, pp.16

arXiv:1209.2910 [pdf, other]

Community Detection in the Labelled Stochastic Block Model

Authors: Simon Heimlicher, Marc Lelarge, Laurent Massoulié

Abstract: We consider the problem of community detection from observed interactions between individuals, in the context where multiple types of interaction are possible. We use labelled stochastic block models to represent the observed data, where labels correspond to interaction types. Focusing on a two-community scenario, we conjecture a threshold for the problem of reconstructing the hidden communities in a way that is correlated with the true partition. To substantiate the conjecture, we prove that the given threshold correctly identifies a transition in the behaviour of belief propagation from insensitive to sensitive. We further prove that the same threshold corresponds to the transition in a related inference problem on a tree model from infeasible to feasible. Finally, numerical results using belief propagation for community detection give further support to the conjecture.

Submitted 13 September, 2012; originally announced September 2012.

Comments: 9 pages

arXiv:1207.3269 [pdf, ps, other]

The Price of Privacy in Untrusted Recommendation Engines

Authors: Siddhartha Banerjee, Nidhi Hegde, Laurent Massoulié

Abstract: Recent increase in online privacy concerns prompts the following question: can a recommender system be accurate if users do not entrust it with their private data? To answer this, we study the problem of learning item-clusters under local differential privacy, a powerful, formal notion of data privacy. We develop bounds on the sample-complexity of learning item-clusters from privatized user inputs. Significantly, our results identify a sample-complexity separation between learning in an information-rich and an information-scarce regime, thereby highlighting the interaction between privacy and the amount of information (ratings) available to each user. In the information-rich regime, where each user rates at least a constant fraction of items, a spectral clustering approach is shown to achieve a sample-complexity lower bound derived from a simple information-theoretic argument based on Fano's inequality. However, the information-scarce regime, where each user rates only a vanishing fraction of items, is found to require a fundamentally different approach both for lower bounds and algorithms. To this end, we develop new techniques for bounding mutual information under a notion of channel-mismatch, and also propose a new algorithm, MaxSense, and show that it achieves optimal sample-complexity in this setting. The techniques we develop for bounding mutual information may be of broader interest. To illustrate this, we show their applicability to $(i)$ learning based on 1-bit sketches, and $(ii)$ adaptive learning, where queries can be adapted based on answers to past queries.

Submitted 27 October, 2014; v1 submitted 13 July, 2012; originally announced July 2012.

Comments: Preliminary version presented at the 50th Allerton Conference, 2012

arXiv:1207.1659 [pdf, ps, other]

Convergence of multivariate belief propagation, with applications to cuckoo hashing and load balancing

Authors: Mathieu Leconte, Marc Lelarge, Laurent Massoulié

Abstract: This paper is motivated by two applications, namely i) generalizations of cuckoo hashing, a computationally simple approach to assigning keys to objects, and ii) load balancing in content distribution networks, where one is interested in determining the impact of content replication on performance. These two problems admit a common abstraction: in both scenarios, performance is characterized by the maximum weight of a generalization of a matching in a bipartite graph, featuring node and edge capacities. Our main result is a law of large numbers characterizing the asymptotic maximum weight matching in the limit of large bipartite random graphs, when the graphs admit a local weak limit that is a tree. This result specializes to the two application scenarios, yielding new results in both contexts. In contrast with previous results, the key novelty is the ability to handle edge capacities with arbitrary integer values. An analysis of belief propagation algorithms (BP) with multivariate belief vectors underlies the proof. In particular, we show convergence of the corresponding BP by exploiting monotonicity of the belief vectors with respect to the so-called upshifted likelihood ratio stochastic order. This auxiliary result can be of independent interest, providing a new set of structural conditions which ensure convergence of BP.

Submitted 6 July, 2012; originally announced July 2012.

Comments: 10 pages format + proofs in the appendix: total 24 pages

arXiv:1206.4674 [pdf]

Comparison-Based Learning with Rank Nets

Authors: Amin Karbasi, Stratis Ioannidis, Laurent Massoulie

Abstract: We consider the problem of search through comparisons, where a user is presented with two candidate objects and reveals which is closer to her intended target. We study adaptive strategies for finding the target, that require knowledge of rank relationships but not actual distances between objects. We propose a new strategy based on rank nets, and show that for target distributions with a bounded doubling constant, it finds the target in a number of comparisons close to the entropy of the target distribution and, hence, of the optimum. We extend these results to the case of noisy oracles, and compare this strategy to prior art over multiple datasets.

Submitted 18 June, 2012; originally announced June 2012.

Comments: ICML2012

arXiv:1203.1891 [pdf, ps, other]

Optimal control of end-user energy storage

Authors: Peter M. van de Ven, Nidhi Hegde, Laurent Massoulie, Theodoros Salonidis

Abstract: An increasing number of retail energy markets show price fluctuations, providing users with the opportunity to buy energy at lower than average prices. We propose to temporarily store this inexpensive energy in a battery, and use it to satisfy demand when energy prices are high, thus allowing users to exploit the price variations without having to shift their demand to the low-price periods. We study the battery control policy that yields the best performance, i.e., minimizes the total discounted costs. The optimal policy is shown to have a threshold structure, and we derive these thresholds in a few special cases. The cost savings obtained from energy storage are demonstrated through extensive numerical experiments, and we offer various directions for future research.

Submitted 5 December, 2012; v1 submitted 8 March, 2012; originally announced March 2012.

arXiv:1109.3318 [pdf, other]

Distributed User Profiling via Spectral Methods

Authors: Dan-Cristian Tomozei, Laurent Massoulié

Abstract: User profiling is a useful primitive for constructing personalised services, such as content recommendation. In the present paper we investigate the feasibility of user profiling in a distributed setting, with no central authority and only local information exchanges between users. We compute a profile vector for each user (i.e., a low-dimensional vector that characterises her taste) via spectral transformation of observed user-produced ratings for items. Our two main contributions follow: i) We consider a low-rank probabilistic model of user taste. More specifically, we consider that users and items are partitioned in a constant number of classes, such that users and items within the same class are statistically identical. We prove that without prior knowledge of the compositions of the classes, based solely on few random observed ratings (namely $O(N\log N)$ such ratings for $N$ users), we can predict user preference with high probability for unrated items by running a local vote among users with similar profile vectors. In addition, we provide empirical evaluations characterising the way in which spectral profiling performance depends on the dimension of the profile space. Such evaluations are performed on a data set of real user ratings provided by Netflix. ii) We develop distributed algorithms which provably achieve an embedding of users into a low-dimensional space, based on spectral transformation. These involve simple message passing among users, and provably converge to the desired embedding. Our method essentially relies on a novel combination of gossiping and the algorithm proposed by Oja and Karhunen.

Submitted 22 April, 2013; v1 submitted 15 September, 2011; originally announced September 2011.

Comments: 31 pages

MSC Class: 60B20 ACM Class: G.3

Showing 1–50 of 53 results for author: Massoulie, l