-
The Artificial Benchmark for Community Detection with Outliers and Overlapping Communities (ABCD+$o^2$)
Authors:
Jordan Barrett,
Ryan DeWolfe,
Bogumił Kamiński,
Paweł Prałat,
Aaron Smith,
François Théberge
Abstract:
The Artificial Benchmark for Community Detection (ABCD) graph is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs similar to the well-known LFR model but it is faster, more interpretable, and can be investigated analytically. In this paper, we use the underlying ingredients of the ABCD model, and its generalization to include outliers (ABCD+$o$), and introduce another variant that allows for overlapping communities, ABCD+$o^2$.
Submitted 5 June, 2025;
originally announced June 2025.
-
Self-similarity of Communities of the ABCD Model
Authors:
Jordan Barrett,
Bogumil Kaminski,
Pawel Pralat,
Francois Theberge
Abstract:
The Artificial Benchmark for Community Detection (ABCD) graph is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs similar to the well-known LFR model but it is faster and can be investigated analytically.
In this paper, we show that the ABCD model exhibits some interesting self-similar behaviour, namely, the degree distribution of ground-truth communities is asymptotically the same as the degree distribution of the whole graph (appropriately normalized based on their sizes). As a result, we can not only estimate the number of edges induced by each community but also the number of self-loops and multi-edges generated during the process. Understanding these quantities is important as (a) rewiring self-loops and multi-edges to keep the graph simple is an expensive part of the algorithm, and (b) every rewiring causes the underlying configuration models to deviate slightly from uniform simple graphs on their corresponding degree sequences.
Submitted 30 November, 2023;
originally announced December 2023.
-
Predicting Properties of Nodes via Community-Aware Features
Authors:
Bogumił Kamiński,
Paweł Prałat,
François Théberge,
Sebastian Zając
Abstract:
This paper shows how information about the network's community structure can be used to define node features with high predictive power for classification tasks. To do so, we define a family of community-aware node features and investigate their properties. Those features are designed to ensure that they can be efficiently computed even for large graphs. We show that community-aware node features contain information that cannot be completely recovered by classical node features or node embeddings (both classical and structural) and bring value in node classification tasks. This is verified for various classification tasks on synthetic and real-life networks.
Submitted 26 April, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.
-
Lower Bounds for Possibly Divergent Probabilistic Programs
Authors:
Shenghua Feng,
Mingshuai Chen,
Han Su,
Benjamin Lucien Kaminski,
Joost-Pieter Katoen,
Naijun Zhan
Abstract:
We present a new proof rule for verifying lower bounds on quantities of probabilistic programs. Our proof rule is not confined to almost-surely terminating programs -- as is the case for existing rules -- and can be used to establish non-trivial lower bounds on, e.g., termination probabilities and expected values, for possibly divergent probabilistic loops, e.g., the well-known three-dimensional random walk on a lattice.
Submitted 12 February, 2023;
originally announced February 2023.
-
Artificial Benchmark for Community Detection with Outliers (ABCD+o)
Authors:
Bogumił Kamiński,
Paweł Prałat,
François Théberge
Abstract:
The Artificial Benchmark for Community Detection graph (ABCD) is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs with similar properties as the well-known LFR one, and its main parameter $ξ$ can be tuned to mimic its counterpart in the LFR model, the mixing parameter $μ$. In this paper, we extend the ABCD model to include potential outliers. We perform some exploratory experiments on both the new ABCD+o model as well as a real-world network to show that outliers possess some desired, distinguishable properties.
Submitted 12 June, 2023; v1 submitted 13 January, 2023;
originally announced January 2023.
-
Hypergraph Artificial Benchmark for Community Detection (h-ABCD)
Authors:
Bogumił Kamiński,
Paweł Prałat,
François Théberge
Abstract:
The Artificial Benchmark for Community Detection (ABCD) graph is a recently introduced random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs with properties similar to those of the well-known LFR model, and its main parameter can be tuned to mimic its counterpart in the LFR model, the mixing parameter. In this paper, we introduce a hypergraph counterpart of the ABCD model, h-ABCD, which produces random hypergraphs in which the ground-truth community sizes and the degrees both follow power-law distributions. As in the original ABCD, the new model h-ABCD can produce hypergraphs with various levels of noise. More importantly, the model is flexible and can mimic any desired level of homogeneity of hyperedges that fall into one community. As a result, it can be used as a suitable synthetic playground for analyzing and tuning hypergraph community detection algorithms.
Submitted 12 June, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Properties and Performance of the ABCDe Random Graph Model with Community Structure
Authors:
Bogumił Kamiński,
Tomasz Olczak,
Bartosz Pankratz,
Paweł Prałat,
François Théberge
Abstract:
In this paper, we investigate properties and performance of synthetic random graph models with a built-in community structure. Such models are important for evaluating and tuning community detection algorithms that are unsupervised by nature. We propose ABCDe, a multi-threaded implementation of the ABCD (Artificial Benchmark for Community Detection) graph generator. We discuss the implementation details of the algorithm and compare it with both the previously available sequential version of the ABCD model and with the parallel implementation of the standard and extensively used LFR (Lancichinetti--Fortunato--Radicchi) generator. We show that ABCDe is more than ten times faster and scales better than the parallel implementation of LFR provided in NetworKit. Moreover, the algorithm is not only faster but random graphs generated by ABCD have similar properties to the ones generated by the original LFR algorithm, while the parallelized NetworKit implementation of LFR produces graphs that have noticeably different characteristics.
Submitted 16 September, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
Modularity of the ABCD Random Graph Model with Community Structure
Authors:
Bogumil Kaminski,
Bartosz Pankratz,
Pawel Pralat,
Francois Theberge
Abstract:
The Artificial Benchmark for Community Detection (ABCD) graph is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs with similar properties as the well-known LFR one, and its main parameter $ξ$ can be tuned to mimic its counterpart in the LFR model, the mixing parameter $μ$.
In this paper, we investigate various theoretical asymptotic properties of the ABCD model. In particular, we analyze the modularity function, arguably, the most important graph property of networks in the context of community detection. Indeed, the modularity function is often used to measure the presence of community structure in networks. It is also used as a quality function in many community detection algorithms, including the widely used Louvain algorithm.
Submitted 2 March, 2022;
originally announced March 2022.
-
Hamilton Cycles in the Semi-random Graph Process
Authors:
Pu Gao,
Bogumil Kaminski,
Calum MacRury,
Pawel Pralat
Abstract:
The semi-random graph process is a single player game in which the player is initially presented an empty graph on $n$ vertices. In each round, a vertex $u$ is presented to the player independently and uniformly at random. The player then adaptively selects a vertex $v$, and adds the edge $uv$ to the graph. For a fixed monotone graph property, the objective of the player is to force the graph to satisfy this property with high probability in as few rounds as possible.
We focus on the problem of constructing a Hamilton cycle in as few rounds as possible. In particular, we present a novel strategy for the player which achieves a Hamiltonian cycle in $(2+4e^{-2}+0.07+o(1)) \, n < 2.61135 \, n$ rounds, assuming that a specific non-convex optimization problem has a negative solution (a premise we numerically support). Assuming that this technical condition holds, this improves upon the previously best known upper bound of $3 \, n$ rounds. We also show that the previously best lower bound of $(\ln 2 + \ln (1+\ln 2) + o(1)) \, n$ is not tight.
Submitted 3 June, 2020;
originally announced June 2020.
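The rounds of the semi-random graph process described in this abstract can be sketched in a few lines. The sketch below is only an illustration of the game mechanics: it targets a much simpler monotone property (minimum degree at least `k`) with a greedy rule, which is an assumption made here for brevity and is not the Hamilton cycle strategy of the paper. The function name `semi_random_min_degree` and its parameters are hypothetical.

```python
import random


def semi_random_min_degree(n, k=1, seed=None):
    """Play the semi-random process until every vertex has degree >= k.

    Illustrative greedy strategy (an assumption, not the paper's strategy):
    when vertex u is offered, the player joins it to a vertex v != u that
    currently has minimum degree. Multi-edges are allowed.
    Returns (number_of_rounds, list_of_edges).
    """
    if n < 2:
        raise ValueError("need at least two vertices")
    rng = random.Random(seed)
    degree = [0] * n
    edges = []
    rounds = 0
    while min(degree) < k:
        # The vertex u is presented uniformly at random, independently per round.
        u = rng.randrange(n)
        # The player adaptively picks v; here, a minimum-degree vertex other than u.
        v = min((w for w in range(n) if w != u), key=lambda w: degree[w])
        edges.append((u, v))
        degree[u] += 1
        degree[v] += 1
        rounds += 1
    return rounds, edges


if __name__ == "__main__":
    rounds, _ = semi_random_min_degree(1000, k=1, seed=42)
    print(f"rounds needed for minimum degree 1 on 1000 vertices: {rounds}")
```

Swapping the selection rule for `v` is the only change needed to experiment with other player strategies under the same random offers.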
-
On Broadcasting Time in the Model of Travelling Agents
Authors:
Reaz Huq,
Bogumil Kaminski,
Atefeh Mashatan,
Pawel Pralat,
Przemyslaw Szufel
Abstract:
Consider the following broadcasting process run on a connected graph $G=(V,E)$. Suppose that $k \ge 2$ agents start on vertices selected from $V$ uniformly and independently at random. One of the agents has a message that she wants to communicate to the other agents. All agents perform independent random walks on $G$, with the message being passed when an agent that knows the message meets an agent that does not know the message. The broadcasting time $ξ(G,k)$ is the time it takes to spread the message to all agents.
Our ultimate goal is to gain a better understanding of the broadcasting process run on real-world networks of roads of large cities that might shed some light on the behaviour of future autonomous and connected vehicles. Due to the complexity of road networks, such phenomena have to be studied using simulation in practical applications. In this paper, we study the process on the simplest scenario, i.e., the family of complete graphs, as in this case the problem is analytically tractable. We provide tight bounds for $ξ(K_n,k)$ that hold asymptotically almost surely for the whole range of the parameter $k$. These theoretical results reveal interesting relationships and, at the same time, are also helpful to understand and explain the behaviour we observe in more realistic networks.
Submitted 18 March, 2020;
originally announced March 2020.
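The broadcasting process in this abstract is easy to simulate on the complete graph $K_n$. The sketch below is a minimal illustration under stated assumptions: "meeting" is taken to mean occupying the same vertex at the same time step, agents that start on the same vertex exchange the message at time 0, and every agent moves to a uniformly random other vertex in each step. The function name `broadcast_time_complete` and the `max_steps` safety cap are hypothetical additions.

```python
import random


def broadcast_time_complete(n, k, seed=None, max_steps=10**6):
    """Simulate one run of the broadcasting process on K_n with k agents.

    Assumptions (not taken from the paper): agents meet when they share a
    vertex at the same time step, and sharing also happens at time 0.
    Returns the number of steps until all agents know the message.
    """
    if n < 3 or k < 2:
        # On K_2 two agents on different vertices swap forever and never meet.
        raise ValueError("this sketch expects n >= 3 and k >= 2")
    rng = random.Random(seed)
    pos = [rng.randrange(n) for _ in range(k)]  # independent uniform starts
    informed = [False] * k
    informed[0] = True  # agent 0 initially holds the message

    def share():
        # Every agent standing on a vertex with an informed agent learns it.
        hot = {pos[i] for i in range(k) if informed[i]}
        for i in range(k):
            if pos[i] in hot:
                informed[i] = True

    share()
    t = 0
    while not all(informed):
        if t >= max_steps:
            raise RuntimeError("max_steps exceeded")
        for i in range(k):
            # One random-walk step on K_n: a uniform vertex other than pos[i].
            r = rng.randrange(n - 1)
            pos[i] = r if r < pos[i] else r + 1
        share()
        t += 1
    return t


if __name__ == "__main__":
    runs = [broadcast_time_complete(100, 10, seed=s) for s in range(200)]
    print("empirical mean broadcasting time:", sum(runs) / len(runs))
```

Averaging many seeded runs for a range of `k` gives an empirical curve that can be compared with analytic bounds.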
-
Clustering via Hypergraph Modularity
Authors:
Bogumil Kaminski,
Valerie Poulin,
Pawel Pralat,
Przemyslaw Szufel,
Francois Theberge
Abstract:
Despite the fact that many important problems (including clustering) can be described using hypergraphs, theoretical foundations as well as practical algorithms using hypergraphs are not well developed yet. In this paper, we propose a hypergraph modularity function that generalizes its well-established and widely used graph counterpart, a measure of how clustered a network is. In order to define it properly, we generalize the Chung-Lu model for graphs to hypergraphs. We then provide the theoretical foundations to search for an optimal solution with respect to our hypergraph modularity function. Two simple heuristic algorithms are described and applied to a few small illustrative examples. We show that using a strict version of our proposed modularity function often leads to a solution where fewer hyperedges get cut than when optimizing the modularity of the 2-section graph of a hypergraph.
Submitted 10 October, 2018;
originally announced October 2018.
-
Subtrees of a random tree
Authors:
Bogumil Kaminski,
Pawel Pralat
Abstract:
Let $T$ be a random tree taken uniformly at random from the family of labelled trees on $n$ vertices. In this note, we provide bounds for $c(n)$, the number of sub-trees of $T$ that hold asymptotically almost surely. With computer support we show that $1.41805386^n \le c(n) \le 1.41959881^n$. Moreover, there is a strong indication that, in fact, $c(n) \le 1.41806183^n$.
Submitted 14 August, 2018;
originally announced August 2018.
-
Clustering Properties of Spatial Preferential Attachment Model
Authors:
Lenar Iskhakov,
Bogumil Kaminski,
Maksim Mironov,
Liudmila Ostroumova Prokhorenkova,
Pawel Pralat
Abstract:
In this paper, we study the clustering properties of the Spatial Preferential Attachment (SPA) model introduced by Aiello et al. in 2009. This model naturally combines geometry and preferential attachment using the notion of spheres of influence. It was previously shown in several research papers that graphs generated by the SPA model are similar to real-world networks in many aspects. For example, the vertex degree distribution was shown to follow a power law. In the current paper, we study the behaviour of C(d), which is the average local clustering coefficient for the vertices of degree d. This characteristic was not previously analyzed in the SPA model. However, it was empirically shown that in real-world networks C(d) usually decreases as d^{-a} for some a>0 and it was often observed that a=1. We prove that in the SPA model C(d) decreases as 1/d. Furthermore, we are also able to prove that not only the average but the individual local clustering coefficient of a vertex v of degree d behaves as 1/d if d is large enough. The obtained results are illustrated by numerous experiments with simulated graphs.
Submitted 13 February, 2018;
originally announced February 2018.
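The quantity C(d) studied in this abstract, the average local clustering coefficient over vertices of degree d, can be computed directly from an adjacency structure. The sketch below assumes an undirected simple graph given as a dictionary mapping each vertex to the set of its neighbours; that input format and the function name `clustering_by_degree` are assumptions made for illustration, and vertices of degree below 2 are skipped because their local coefficient is undefined.

```python
from collections import defaultdict
from itertools import combinations


def clustering_by_degree(adj):
    """Return {d: C(d)} for an undirected simple graph.

    adj maps each vertex to the set of its neighbours (assumed symmetric).
    C(d) is the mean local clustering coefficient over vertices of degree d.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue  # local clustering coefficient is undefined here
        # Count edges among the neighbours of v.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        sums[d] += 2.0 * links / (d * (d - 1))
        counts[d] += 1
    return {d: sums[d] / counts[d] for d in sums}


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(clustering_by_degree(graph))
```

Plotting the returned values against d on a log-log scale is one way to check empirically whether C(d) decays roughly as 1/d in a simulated graph.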
-
Local Clustering Coefficient of Spatial Preferential Attachment Model
Authors:
Lenar Iskhakov,
Bogumil Kaminski,
Maksim Mironov,
Pawel Pralat,
Liudmila Prokhorenkova
Abstract:
In this paper, we study the clustering properties of the Spatial Preferential Attachment (SPA) model. This model naturally combines geometry and preferential attachment using the notion of spheres of influence. It was previously shown in several research papers that graphs generated by the SPA model are similar to real-world networks in many aspects. Also, this model was successfully used for several practical applications. However, the clustering properties of the SPA model were not fully analyzed. The clustering coefficient is an important characteristic of complex networks that is tightly connected with their community structure. In the current paper, we study the behaviour of C(d), which is the average local clustering coefficient for the vertices of degree d. It was empirically shown that in real-world networks C(d) usually decreases as 1/d^a for some a>0 and it was often observed that a=1. We prove that in the SPA model C(d) decreases as 1/d. Furthermore, we are also able to prove that not only the average but the individual local clustering coefficient of a vertex v of degree d behaves as 1/d if d is large enough. The obtained results further confirm the suitability of the SPA model for fitting various real-world complex networks.
Submitted 2 June, 2019; v1 submitted 18 November, 2017;
originally announced November 2017.
-
Space-time directional Lyapunov exponents for cellular automata
Authors:
Maurice Courbage,
Brunon Kaminski
Abstract:
Space-time directional Lyapunov exponents are introduced. They describe the maximal velocity of propagation to the right or to the left of fronts of perturbations in a frame moving with a given velocity. The continuity of these exponents as a function of the velocity and an inequality relating them to the directional entropy are proved.
Submitted 24 March, 2006;
originally announced March 2006.