-
Community Detection with Known, Unknown, or Partially Known Auxiliary Latent Variables
Authors:
Mohammad Esmaeili,
Aria Nosratinia
Abstract:
Empirical observations suggest that in practice, community membership does not completely explain the dependency between the edges of an observation graph. The residual dependence of the graph edges is modeled in this paper, to first order, by auxiliary node latent variables that affect the statistics of the graph edges but carry no information about the communities of interest. We then study community detection in graphs obeying the stochastic block model and censored block model with auxiliary latent variables. We analyze the conditions for exact recovery when these auxiliary latent variables are unknown, representing unknown nuisance parameters or model mismatch. We also analyze exact recovery when these secondary latent variables have been either fully or partially revealed. Finally, we propose a semidefinite programming algorithm for recovering the desired labels when the secondary labels are either known or unknown. We show that exact recovery is possible by semidefinite programming down to the respective maximum likelihood exact recovery threshold.
Submitted 8 January, 2023;
originally announced January 2023.
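As a rough illustration of the kind of model this abstract describes, the sketch below samples a two-community graph in which each node also carries a binary auxiliary latent variable that changes edge probabilities but is drawn independently of the community label. The function name, the binary alphabets, and the way the probabilities are combined (`aux_boost`) are assumptions made for illustration only, not the paper's parameterization.

```python
import random


def sample_sbm_with_auxiliary(n, p_in, p_out, aux_boost, seed=0):
    """Sample a 2-community graph with an independent auxiliary label per node.

    Hypothetical parameterization: an edge appears with probability p_in
    (same community) or p_out (different communities); that probability is
    multiplied by aux_boost (capped at 1) when both endpoints share the
    same auxiliary label.
    """
    rng = random.Random(seed)
    # Community labels of interest and auxiliary labels are drawn independently.
    community = [rng.choice([-1, 1]) for _ in range(n)]
    auxiliary = [rng.choice([0, 1]) for _ in range(n)]

    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            base = p_in if community[i] == community[j] else p_out
            if auxiliary[i] == auxiliary[j]:
                prob = min(1.0, base * aux_boost)
            else:
                prob = base
            if rng.random() < prob:
                adj[i][j] = adj[j][i] = 1  # undirected, no self-loops
    return community, auxiliary, adj
```

With `aux_boost` different from 1, two edges that share a node are no longer independent given the community labels alone, which is the residual dependence the abstract refers to.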
-
Semidefinite Programming for Community Detection with Side Information
Authors:
Mohammad Esmaeili,
Hussein Metwaly Saad,
Aria Nosratinia
Abstract:
This paper presents an efficient semidefinite programming (SDP) solution for community detection that incorporates non-graph data, which in this context is known as side information. SDP is an efficient solution for standard community detection on graphs. We formulate a semidefinite relaxation for the maximum likelihood estimation of node labels, subject to observing both graph and non-graph data. This formulation is distinct from the SDP solution of standard community detection, but maintains its desirable properties. We calculate the exact recovery threshold for three types of side information: partially revealed labels, noisy labels, and multiple observations (features) per node with arbitrary but finite cardinality. We find that SDP has the same exact recovery threshold in the presence of side information as maximum likelihood with side information. Thus, the methods developed herein are computationally efficient as well as asymptotically accurate for the solution of community detection in the presence of side information. Simulations show that the asymptotic results of this paper can also shed light on the performance of SDP for graphs of modest size.
Submitted 6 May, 2021;
originally announced May 2021.
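For orientation, the SDP relaxation commonly used for standard two-community detection with balanced communities and no side information can be written as below. The notation is an assumption made here: $A$ is the adjacency matrix, $J$ the all-ones matrix, and $Z$ stands in for $xx^{T}$ with $x \in \{\pm 1\}^{n}$. The formulation with side information that the abstract describes is distinct from this one and is not reproduced here.

```latex
\begin{aligned}
\max_{Z}\quad & \langle A, Z \rangle \\
\text{s.t.}\quad & Z \succeq 0, \\
& Z_{ii} = 1, \quad i = 1, \dots, n, \\
& \langle J, Z \rangle = 0 .
\end{aligned}
```

Replacing the rank-one constraint $Z = xx^{T}$ with $Z \succeq 0$ is what makes the problem convex; the last constraint encodes that the two communities have equal size.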
-
Community Detection: Exact Recovery in Weighted Graphs
Authors:
Mohammad Esmaeili,
Aria Nosratinia
Abstract:
In community detection, the exact recovery of communities (clusters) has been mainly investigated under the general stochastic block model with edges drawn from Bernoulli distributions. This paper considers the exact recovery of communities in a complete graph in which the graph edges are drawn from either a set of Gaussian distributions with community-dependent means and variances, or a set of exponential distributions with community-dependent means. For each case, we introduce a new semi-metric that describes necessary and sufficient conditions for exact recovery. These conditions are asymptotically tight. The analysis is also extended to incomplete, fully connected weighted graphs.
Submitted 8 February, 2021;
originally announced February 2021.
-
Semi-Supervised Node Classification by Graph Convolutional Networks and Extracted Side Information
Authors:
Mohammad Esmaeili,
Aria Nosratinia
Abstract:
Nodes of a graph that belong to the same cluster are more likely to connect to each other than to other nodes in the graph. Hence, once some information about some nodes is revealed, the structure of the graph (its edges) makes it possible to learn more about the other nodes. From this perspective, this paper revisits the node classification task in a semi-supervised scenario using graph convolutional networks (GCNs). The goal is to benefit from the flow of information that circulates around the revealed node labels. The contribution of this paper is twofold. First, this paper provides a method for extracting side information from a graph realization. Second, a new GCN architecture is presented that combines the output of a traditional GCN with the extracted side information. A further contribution concerns non-graph observations (independent side information) that exist alongside a graph realization in many applications. Indeed, the extracted side information can be replaced by a sequence of side information that is independent of the graph structure. In both cases, experiments on synthetic and real-world datasets demonstrate that the proposed model achieves higher prediction accuracy than existing state-of-the-art methods for the node classification task.
Submitted 13 November, 2020; v1 submitted 28 September, 2020;
originally announced September 2020.
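To make the "traditional GCN" building block concrete, here is a minimal sketch of one graph-convolution step, assuming the widely used propagation rule ReLU(D^{-1/2} (A + I) D^{-1/2} X W). The function name `gcn_layer` and the plain-list matrix representation are illustrative choices; the abstract's new architecture and its side-information extraction method are not reproduced here.

```python
import math


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(D^{-1/2} (A + I) D^{-1/2} X W).

    adj:      n x n 0/1 adjacency matrix (list of lists)
    features: n x d_in node feature matrix X
    weights:  d_in x d_out weight matrix W
    """
    n = len(adj)
    d_in = len(features[0])
    d_out = len(weights[0])

    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    # Degrees of the self-looped graph (always >= 1).
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

    # Aggregate neighbor features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(d_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(d_in)))
             for o in range(d_out)] for i in range(n)]
```

Stacking two such layers and feeding the last one into a softmax over classes gives the usual semi-supervised node classifier; how side information is combined with that output is specific to the paper.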