-
Distinguishing subsampled power laws from other heavy-tailed distributions
Authors:
Silja Sormunen,
Lasse Leskelä,
Jari Saramäki
Abstract:
Distinguishing power-law distributions from other heavy-tailed distributions is challenging, and this task is often further complicated by subsampling effects. In this work, we evaluate the performance of two commonly used methods for detecting power-law distributions (the maximum likelihood method of Clauset et al. and the extreme value method of Voitalov et al.) in distinguishing subsampled power laws from two other heavy-tailed distributions, the lognormal and the stretched exponential distributions. We focus on a random subsampling method commonly applied in network science and biological sciences. In this subsampling scheme, we are ultimately interested in the frequency distribution of elements with a certain number of constituent parts, and each part is selected for the subsample with equal probability. We investigate how well the results obtained from subsamples generalize to the original distribution. Our results show that the power-law exponent of the original distribution can be estimated fairly accurately from subsamples, but classifying the distribution correctly is more challenging. The maximum likelihood method falsely rejects the power-law hypothesis for a large fraction of subsamples from power-law distributions. While the extreme value method correctly recognizes subsampled power-law distributions at all tested subsampling depths, its capacity to distinguish power laws from the heavy-tailed alternatives is limited. However, these false positives tend to result not from the subsampling itself but from the estimators' inability to classify the original sample correctly. In fact, we show that the extreme value method can sometimes be expected to perform better on subsamples than on the original samples from the lognormal and the stretched exponential distributions, while the contrary is true for the main tests included in the maximum likelihood method.
Submitted 15 April, 2024;
originally announced April 2024.
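The subsampling scheme described in this abstract can be illustrated with a minimal sketch. It assumes that each element's size is an integer count of parts, that each part is kept independently with probability `p`, and that elements left with zero parts go unobserved. The function names, the rounding used to discretize sizes, and the cutoff `x_min` are illustrative choices, not taken from the paper. The exponent estimate uses the well-known continuous approximation to the discrete maximum likelihood estimator and omits the goodness-of-fit and extreme value tests.

```python
import math
import random


def sample_sizes(n, alpha, rng):
    """Draw n integer sizes whose tail roughly follows P(X = k) ~ k^(-alpha)."""
    sizes = []
    for _ in range(n):
        # Inverse-transform sample of a continuous power law on [1, inf).
        x = (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
        sizes.append(int(math.floor(x + 0.5)))  # round to the nearest integer
    return sizes


def subsample(sizes, p, rng):
    """Keep each constituent part independently with probability p.

    Elements that lose all of their parts are dropped (assumed unobserved).
    """
    thinned = []
    for s in sizes:
        kept = sum(1 for _ in range(s) if rng.random() < p)
        if kept > 0:
            thinned.append(kept)
    return thinned


def mle_exponent(values, x_min):
    """Approximate discrete power-law MLE for the tail values >= x_min."""
    tail = [v for v in values if v >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(v / (x_min - 0.5)) for v in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    rng = random.Random(1)
    original = sample_sizes(20000, alpha=2.5, rng=rng)
    halved = subsample(original, p=0.5, rng=rng)
    print("exponent estimate, original: ", mle_exponent(original, x_min=10))
    print("exponent estimate, subsample:", mle_exponent(halved, x_min=10))
```

Comparing the two printed estimates gives a rough feel for how much of the tail exponent survives thinning; it says nothing about whether a power-law hypothesis would be accepted or rejected.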
-
A network community detection method with integration of data from multiple layers and node attributes
Authors:
Hannu Reittu,
Lasse Leskelä,
Tomi Räty
Abstract:
Multilayer networks are a focus of current complex network research. In such networks, multiple types of links may exist, as well as many attributes for nodes. To make full use of multilayer networks, and of other types of complex networks, in applications, merging various data with topological information enables a powerful analysis. First, we suggest a simple way of representing network data in a data matrix where rows correspond to the nodes, and columns correspond to the data items. The number of columns is allowed to be arbitrary, so that the data matrix can be easily expanded by adding columns. The data matrix can be chosen according to the targets of the analysis, and may vary a lot from case to case. Next, we partition the rows of the data matrix into communities using a method which allows maximal compression of the data matrix. For compressing a data matrix, we suggest extending the so-called regular decomposition method to non-square matrices. We illustrate our method for several types of data matrices, in particular, distance matrices, and matrices obtained by augmenting a distance matrix by a column of node degrees, or by concatenating several distance matrices corresponding to layers of a multilayer network. We illustrate our method with synthetic power-law graphs and two real networks: an Internet autonomous systems graph and a world airline graph. We compare the outputs of different community recovery methods on these graphs, and discuss how incorporating node degrees as a separate column in the data matrix leads our method to identify community structures well-aligned with tiered hierarchical structures commonly encountered in complex scale-free networks.
Submitted 22 May, 2023;
originally announced May 2023.
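The data matrix construction described in this abstract (one row per node, a block of distance columns, optionally one extra column of node degrees, and concatenation across layers) might be sketched as follows. The adjacency-list input format, the function names, and the use of -1 for unreachable nodes are assumptions made for illustration. The compression-based partitioning step (regular decomposition) is not implemented here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node of an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_rows(adj, nodes):
    """One row of hop distances per node; -1 marks unreachable pairs (assumption)."""
    rows = []
    for u in nodes:
        dist = bfs_distances(adj, u)
        rows.append([dist.get(v, -1) for v in nodes])
    return rows


def data_matrix(layers, with_degrees=True):
    """Concatenate the distance matrices of all layers, row by row.

    `layers` is a list of adjacency dicts over the same node set. If
    `with_degrees` is set, one degree column per layer is appended.
    """
    nodes = sorted(layers[0])
    rows = [[] for _ in nodes]
    for adj in layers:
        for row, block in zip(rows, distance_rows(adj, nodes)):
            row.extend(block)
    if with_degrees:
        for adj in layers:
            for row, u in zip(rows, nodes):
                row.append(len(adj[u]))
    return nodes, rows


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}  # single-layer path graph 0 - 1 - 2
    for row in data_matrix([path])[1]:
        print(row)
```

Because the matrix need not be square, further columns (for example, other node attributes) can be appended to each row in the same way before the rows are partitioned.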
-
Adaptive and optimized COVID-19 vaccination strategies across geographical regions and age groups
Authors:
Jeta Molla,
Alejandro Ponce de León Chávez,
Takayuki Hiraoka,
Tapio Ala-Nissila,
Mikko Kivelä,
Lasse Leskelä
Abstract:
We evaluate the efficiency of various heuristic strategies for allocating vaccines against COVID-19 and compare them to strategies found using optimal control theory. Our approach is based on a mathematical model which tracks the spread of disease among different age groups and across different geographical regions, and we introduce a method to combine age-specific contact data with geographical movement data. As a case study, we model the epidemic in the population of mainland Finland utilizing mobility data from a major telecom operator. Our approach allows us to determine which geographical regions and age groups should be targeted first in order to minimize the number of deaths. In the scenarios that we test, we find that distributing vaccines demographically and in an age-descending order is not optimal for minimizing deaths and the burden of disease. Instead, more lives could potentially be saved by using strategies which emphasize high-incidence regions and distribute vaccines in parallel to multiple age groups. The level of emphasis that high-incidence regions should be given depends on the overall transmission rate in the population. This observation highlights the importance of updating the vaccination strategy when the effective reproduction number changes as general contact patterns change and new virus variants emerge.
Submitted 3 December, 2021; v1 submitted 24 May, 2021;
originally announced May 2021.
-
Clustering and percolation on superpositions of Bernoulli random graphs
Authors:
Mindaugas Bloznelis,
Lasse Leskelä
Abstract:
A simple but powerful network model with $n$ nodes and $m$ partly overlapping layers is generated as an overlay of independent random graphs $G_1,\dots,G_m$ with variable sizes and densities. The model is parameterised by a joint distribution $P_n$ of layer sizes and densities. When $m$ grows linearly and $P_n \to P$ as $n \to \infty$, the model generates sparse random graphs with a rich statistical structure, admitting a nonvanishing clustering coefficient together with a limiting degree distribution and clustering spectrum with tunable power-law exponents. Remarkably, the model admits parameter regimes in which bond percolation exhibits two phase transitions: the first related to the emergence of a giant connected component, and the second to the appearance of gigantic single-layer components.
Submitted 2 November, 2020; v1 submitted 31 December, 2019;
originally announced December 2019.
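The overlay construction described in this abstract can be sketched in a few lines. The sketch assumes that each layer picks its node set uniformly at random among subsets of the given size and then links each pair of its nodes independently with the layer's density. The abstract does not spell out how layer node sets are chosen, so that part, the function name, and the explicit list of (size, density) pairs standing in for draws from $P_n$ are all illustrative assumptions.

```python
import itertools
import random


def overlay_graph(n, layers, rng=random):
    """Union of independent Bernoulli random graphs on random node subsets.

    `layers` is a list of (size, density) pairs. Each layer selects `size`
    of the n nodes uniformly at random and links every pair of selected
    nodes independently with probability `density`.
    """
    edges = set()
    for size, density in layers:
        members = sorted(rng.sample(range(n), size))
        for u, v in itertools.combinations(members, 2):
            if rng.random() < density:
                edges.add((u, v))  # stored with u < v; overlaps merge in the set
    return edges


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    # Hypothetical layer distribution: many small, dense layers.
    layers = [(rng.choice([3, 4, 5]), 0.8) for _ in range(n // 2)]
    edges = overlay_graph(n, layers, rng)
    print("edges:", len(edges), "mean degree:", 2 * len(edges) / n)
```

Small dense layers create many triangles while the number of layers grows linearly in $n$, which is the mechanism behind a sparse graph with nonvanishing clustering.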
-
The impact of degree variability on connectivity properties of large networks
Authors:
Lasse Leskelä,
Hoa Ngo
Abstract:
The goal of this work is to study how increased variability in the degree distribution impacts the global connectivity properties of a large network. We approach this question by modeling the network as a uniform random graph with a given degree sequence. We analyze the effect of the degree variability on the approximate size of the largest connected component using stochastic ordering techniques. A counterexample shows that a higher degree variability may lead to a larger connected component, contrary to basic intuition about branching processes. When certain extremal cases are ruled out, the higher degree variability is shown to decrease the limiting approximate size of the largest connected component.
Submitted 13 January, 2017; v1 submitted 13 August, 2015;
originally announced August 2015.
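One common way to approximate the size of the largest component of a random graph with a given degree distribution is the standard branching-process heuristic: find the smallest fixed point $u$ of $G_1(u) = \sum_k k p_k u^{k-1} / \sum_k k p_k$ and report $1 - \sum_k p_k u^k$. The sketch below applies this heuristic to two degree distributions with the same mean. It is an illustration of the general idea only, not the paper's stochastic ordering argument, and the function name and example distributions are hypothetical.

```python
def giant_fraction(degree_probs, iterations=10000, tol=1e-12):
    """Branching-process approximation of the giant component fraction.

    `degree_probs` maps each degree k to its probability p_k.
    """
    mean = sum(k * p for k, p in degree_probs.items())
    u = 0.0  # iterating upward from 0 converges to the smallest fixed point
    for _ in range(iterations):
        # G_1(u): generating function of the size-biased degree minus one.
        nxt = sum(k * p * u ** (k - 1) for k, p in degree_probs.items() if k >= 1) / mean
        if abs(nxt - u) < tol:
            u = nxt
            break
        u = nxt
    # 1 - G_0(u): probability that a random node lies in the giant component.
    return 1.0 - sum(p * u ** k for k, p in degree_probs.items())


if __name__ == "__main__":
    regular = {3: 1.0}           # every node has degree 3
    spread = {1: 0.5, 5: 0.5}    # same mean degree, higher variability
    print("constant degree:", giant_fraction(regular))
    print("variable degree:", giant_fraction(spread))
```

In this particular pair, the more variable distribution yields the smaller approximate fraction, in line with the general tendency the abstract describes; the abstract's counterexample shows that this ordering does not hold for every pair of distributions.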