-
EC-SBM Synthetic Network Generator
Authors:
The-Anh Vu-Le,
Lahari Anne,
George Chacko,
Tandy Warnow
Abstract:
Generating high-quality synthetic networks with realistic community structure is vital to effectively evaluate community detection algorithms. In this study, we propose a new synthetic network generator called the Edge-Connected Stochastic Block Model (EC-SBM). The goal of EC-SBM is to take a given clustered real-world network and produce a synthetic network that resembles the clustered real-world…
▽ More
Generating high-quality synthetic networks with realistic community structure is vital to effectively evaluate community detection algorithms. In this study, we propose a new synthetic network generator called the Edge-Connected Stochastic Block Model (EC-SBM). The goal of EC-SBM is to take a given clustered real-world network and produce a synthetic network that resembles the clustered real-world network with respect to both network and community-specific criteria. In particular, we focus on simulating the internal edge connectivity of the clusters in the reference clustered network. Our extensive performance study on large real-world networks shows that EC-SBM has high accuracy in both network and community-specific criteria, and is generally more accurate than current alternative approaches for this problem. Furthermore, EC-SBM is fast enough to scale to real-world networks with millions of nodes.
△ Less
Submitted 5 February, 2025;
originally announced February 2025.
-
RECCS: Realistic Cluster Connectivity Simulator for Synthetic Network Generation
Authors:
Lahari Anne,
The-Anh Vu-Le,
Minhyuk Park,
Tandy Warnow,
George Chacko
Abstract:
The limited availability of useful ground-truth communities in real-world networks presents a challenge to evaluating and selecting a "best" community detection method for a given network or family of networks. The use of synthetic networks with planted ground-truths is one way to address this challenge. While several synthetic network generators can be used for this purpose, Stochastic Block Mode…
▽ More
The limited availability of useful ground-truth communities in real-world networks presents a challenge to evaluating and selecting a "best" community detection method for a given network or family of networks. The use of synthetic networks with planted ground-truths is one way to address this challenge. While several synthetic network generators can be used for this purpose, Stochastic Block Models (SBMs), when provided input parameters from real-world networks and clusterings, are well suited to producing networks that retain the properties of the network they are intended to model. We report, however, that SBMs can produce disconnected ground truth clusters; even under conditions where the input clusters are connected. In this study, we describe the REalistic Cluster Connectivity Simulator (RECCS), which, while retaining approximately the same quality for other network and cluster parameters, creates an SBM synthetic network and then modifies it to ensure an improved fit to cluster connectivity. We report results using parameters obtained from clustered real-world networks ranging up to 13.9 million nodes in size, and demonstrate an improvement over the unmodified use of SBMs for network generation.
△ Less
Submitted 4 February, 2025;
originally announced February 2025.
-
Improved Community Detection using Stochastic Block Models
Authors:
Minhyuk Park,
Daniel Wang Feng,
Siya Digra,
The-Anh Vu-Le,
Lahari Anne,
George Chacko,
Tandy Warnow
Abstract:
Identifying edge-dense communities that are also well-connected is an important aspect of understanding community structure. Prior work has shown that community detection methods can produce poorly connected communities, and some can even produce internally disconnected communities. In this study we evaluate the connectivity of communities obtained using Stochastic Block Models. We find that SBMs…
▽ More
Identifying edge-dense communities that are also well-connected is an important aspect of understanding community structure. Prior work has shown that community detection methods can produce poorly connected communities, and some can even produce internally disconnected communities. In this study we evaluate the connectivity of communities obtained using Stochastic Block Models. We find that SBMs produce internally disconnected communities from real-world networks. We present a simple technique, Well-Connected Clusters (WCC), which repeatedly removes small edge cuts until the communities meet a user-specified threshold for well-connectivity. Our study using a large collection of synthetic networks based on clustered real-world networks shows that using WCC as a post-processing tool with SBM community detection typically improves clustering accuracy. WCC is fast enough to use on networks with millions of nodes and is freely available in open source form.
△ Less
Submitted 13 February, 2025; v1 submitted 2 February, 2025;
originally announced February 2025.
-
Synthetic Networks That Preserve Edge Connectivity
Authors:
Lahari Anne,
The-Anh Vu-Le,
Minhyuk Park,
Tandy Warnow,
George Chacko
Abstract:
Since true communities within real-world networks are rarely known, synthetic networks with planted ground truths are valuable for evaluating the performance of community detection methods. Of the synthetic network generation tools available, Stochastic Block Models (SBMs) produce networks with ground truth clusters that well approximate input parameters from real-world networks and clusterings. H…
▽ More
Since true communities within real-world networks are rarely known, synthetic networks with planted ground truths are valuable for evaluating the performance of community detection methods. Of the synthetic network generation tools available, Stochastic Block Models (SBMs) produce networks with ground truth clusters that well approximate input parameters from real-world networks and clusterings. However, we show that SBMs can produce disconnected ground truth clusters, even when given parameters from clusterings where all clusters are connected. Here we describe the REalistic Cluster Connectivity Simulator (RECCS), a technique that modifies an SBM synthetic network to improve the fit to a given clustered real-world network with respect to edge connectivity within clusters, while maintaining the good fit with respect to other network and cluster statistics. Using real-world networks up to 13.9 million nodes in size, we show that RECCS, applied to stochastic block models, results in synthetic networks that have a better fit to cluster edge connectivity than unmodified SBMs, while providing roughly the same quality fit for other network and clustering parameters as unmodified SBMs.
△ Less
Submitted 24 August, 2024;
originally announced August 2024.