-
Local Distance-Preserving Node Embeddings and Their Performance on Random Graphs
Authors:
My Le,
Luana Ruiz,
Souvik Dhara
Abstract:
Learning node representations is a fundamental problem in graph machine learning. While existing embedding methods effectively preserve local similarity measures, they often fail to capture global functions like graph distances. Inspired by Bourgain's seminal work on Hilbert space embeddings of metric spaces (1985), we study the performance of local distance-preserving node embeddings. Known as landmark-based algorithms, these embeddings approximate pairwise distances by computing shortest paths from a small subset of reference nodes (i.e., landmarks). Our main theoretical contribution shows that random graphs, such as Erdős-Rényi random graphs, require lower dimensions in landmark-based embeddings compared to worst-case graphs. Empirically, we demonstrate that the GNN-based approximations for the distances to landmarks generalize well to larger networks, offering a scalable alternative for graph representation learning.
Submitted 10 April, 2025;
originally announced April 2025.
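The landmark idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy path graph, and the choice of estimators (the triangle-inequality upper and lower bounds) are assumptions made here for illustration, and the abstract does not specify which estimator the paper analyzes.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def landmark_embedding(adj, landmarks):
    """Embed each node as its vector of distances to the landmarks."""
    per_landmark = [bfs_distances(adj, l) for l in landmarks]
    inf = float("inf")  # used when a node cannot reach a landmark
    return {u: [d.get(u, inf) for d in per_landmark] for u in adj}


def upper_bound(emb, u, v):
    """min over landmarks l of d(u, l) + d(l, v); never below the true distance."""
    return min(a + b for a, b in zip(emb[u], emb[v]))


def lower_bound(emb, u, v):
    """max over landmarks l of |d(u, l) - d(l, v)|; never above the true distance.

    Assumes every node reaches every landmark (connected graph).
    """
    return max(abs(a - b) for a, b in zip(emb[u], emb[v]))


# Toy example (hypothetical data): the path graph 0 - 1 - 2 - 3 - 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
emb = landmark_embedding(adj, landmarks=[2])
print(upper_bound(emb, 0, 4), lower_bound(emb, 0, 4))
```

With a single landmark in the middle of the path, the upper bound is exact for nodes on opposite sides of the landmark and loose for nodes on the same side, which is why the number of landmarks (the embedding dimension) matters.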
-
Spectral Algorithms Optimally Recover Planted Sub-structures
Authors:
Souvik Dhara,
Julia Gaudio,
Elchanan Mossel,
Colin Sandon
Abstract:
Spectral algorithms are an important building block in machine learning and graph algorithms. We are interested in studying when such algorithms can be applied directly to provide optimal solutions to inference tasks. Previous works by Abbe, Fan, Wang and Zhong (2020) and by Dhara, Gaudio, Mossel and Sandon (2022) showed the optimality for community detection in the Stochastic Block Model (SBM), as well as in a censored variant of the SBM. Here we show that this optimality is somewhat universal as it carries over to other planted substructures such as the planted dense subgraph problem and submatrix localization problem, as well as to a censored version of the planted dense subgraph problem.
Submitted 11 October, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Community detection using low-dimensional network embedding algorithms
Authors:
Aman Barot,
Shankar Bhamidi,
Souvik Dhara
Abstract:
With the increasing relevance of large networks in important areas such as the study of contact networks for spread of disease, or social networks for their impact on geopolitics, it has become necessary to study machine learning tools that are scalable to very large networks, often containing millions of nodes. One major class of such scalable algorithms is known as network representation learning or network embedding. These algorithms try to learn representations of network functionals (e.g., nodes) by first running multiple random walks and then using the number of co-occurrences of each pair of nodes in observed random walk segments to obtain a low-dimensional representation of nodes in some Euclidean space. The aim of this paper is to rigorously understand the performance of two major algorithms, DeepWalk and node2vec, in recovering communities for canonical network models with ground truth communities. Depending on the sparsity of the graph, we find the length of the random walk segments required such that the corresponding observed co-occurrence window is able to perform almost exact recovery of the underlying community assignments. We prove that, given some fixed co-occurrence window, node2vec using random walks with a low non-backtracking probability can succeed for much sparser networks compared to DeepWalk using simple random walks. Moreover, if the sparsity parameter is low, we provide evidence that these algorithms might not succeed in almost exact recovery. The analysis requires developing general tools for path counting on random networks having an underlying low-rank structure, which are of independent interest.
Submitted 4 November, 2021;
originally announced November 2021.
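The random-walk co-occurrence statistic mentioned in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the toy graph are hypothetical, the walk is a simple random walk (the DeepWalk-style case; node2vec's biased walk is not implemented), and the subsequent step that turns the counts into a low-dimensional embedding is omitted.

```python
import random
from collections import Counter


def random_walk(adj, start, length, rng):
    """Simple random walk visiting `length` nodes (fewer if it hits a dead end)."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def cooccurrence_counts(walks, window):
    """Count ordered pairs (u, v) appearing within `window` steps of each other.

    Each occurrence is recorded in both orders, so counts[(u, v)] == counts[(v, u)].
    """
    counts = Counter()
    for walk in walks:
        for i, u in enumerate(walk):
            for v in walk[i + 1 : i + 1 + window]:
                counts[(u, v)] += 1
                counts[(v, u)] += 1
    return counts


# Toy example (hypothetical data): two triangles joined by the edge 2 - 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
rng = random.Random(0)  # fixed seed so repeated runs give the same walks
walks = [random_walk(adj, start, 10, rng) for start in adj for _ in range(20)]
counts = cooccurrence_counts(walks, window=2)
print(counts[(0, 1)], counts[(0, 5)])
```

On a graph with community structure, pairs inside the same community tend to co-occur more often than pairs across communities, which is the signal the embedding step then exploits.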
-
Machine Learning Regression for Operator Dynamics
Authors:
Justin Reyes,
Sayandip Dhara,
Eduardo R. Mucciolo
Abstract:
Determining the dynamics of the expectation values for operators acting on a quantum many-body (QMB) system is a challenging task. Matrix product states (MPS) have traditionally been the "go-to" models for these systems because calculating expectation values in this representation can be done with relative simplicity and high accuracy. However, such calculations can become computationally costly when extended to long times. Here, we present a solution for efficiently extending the computation of expectation values to long time intervals. We utilize a multi-layer perceptron (MLP) model as a tool for regression on MPS expectation values calculated within the regime of short time intervals. With this model, the computational cost of generating long-time dynamics is significantly reduced while high accuracy is maintained. These results are demonstrated with operators relevant to quantum spin models in one spatial dimension.
Submitted 23 February, 2021;
originally announced February 2021.