-
Empirical Networks are Sparse: Enhancing Multi-Edge Models with Zero-Inflation
Authors:
Giona Casiraghi,
Georges Andres
Abstract:
Real-world networks are sparse. As we show in this article, even when a large number of interactions is observed, most node pairs remain disconnected. We demonstrate that classical multi-edge network models, such as the $G(N,p)$, configuration models, and stochastic block models, fail to accurately capture this phenomenon. To mitigate this issue, zero-inflation must be integrated into these traditional models. Through zero-inflation, we incorporate a mechanism that accounts for the excess number of zeroes (disconnected pairs) observed in empirical data. By performing an analysis on all the datasets from the Sociopatterns repository, we illustrate how zero-inflated models more accurately reflect the sparsity and heavy-tailed edge count distributions observed in empirical data. Our findings underscore that failing to account for these ubiquitous properties in real-world networks inadvertently leads to biased models that do not accurately represent complex systems and their dynamics.
Submitted 2 January, 2025; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Disentangling the Timescales of a Complex System: A Bayesian Approach to Temporal Network Analysis
Authors:
Giona Casiraghi,
Georges Andres
Abstract:
Changes in the timescales at which complex systems evolve are essential to predicting critical transitions and catastrophic failures. Disentangling the timescales of the dynamics governing complex systems remains a key challenge. With this study, we introduce an integrated Bayesian framework based on temporal network models to address this challenge. We focus on two methodologies: change point detection for identifying shifts in system dynamics, and a spectrum analysis for inferring the distribution of timescales. Applied to synthetic and empirical datasets, these methodologies robustly identify critical transitions and comprehensively map the dominant and subsidiary timescales in complex systems. This dual approach offers a powerful tool for analyzing temporal networks, significantly enhancing our understanding of dynamic behaviors in complex systems.
Submitted 8 March, 2024;
originally announced March 2024.
-
Locating Community Smells in Software Development Processes Using Higher-Order Network Centralities
Authors:
Christoph Gote,
Vincenzo Perri,
Christian Zingg,
Giona Casiraghi,
Carsten Arzig,
Alexander von Gernler,
Frank Schweitzer,
Ingo Scholtes
Abstract:
Community smells are negative patterns in software development teams' interactions that impede their ability to successfully create software. Examples are team members working in isolation, lack of communication and collaboration across departments or sub-teams, or areas of the codebase that only a few team members can work on. Current approaches aim to detect community smells by analysing static network representations of software teams' interaction structures. In doing so, they are insufficient to locate community smells within development processes. Extending beyond the capabilities of traditional social network analysis, we show that higher-order network models provide a robust means of revealing such hidden patterns and complex relationships. To this end, we develop a set of centrality measures based on the MOGen higher-order network model and show their effectiveness in predicting influential nodes using five empirical datasets. We then employ these measures for a comprehensive analysis of a product team at the German IT security company genua GmbH, showcasing our method's success in identifying and locating community smells. Specifically, we uncover critical community smells in two areas of the team's development process. Semi-structured interviews with five team members validate our findings: while the team was aware of one community smell and employed measures to address it, it was not aware of the second. This highlights the potential of our approach as a robust tool for identifying and addressing community smells in software development teams. More generally, our work contributes to the social network analysis field with a powerful set of higher-order network centralities that effectively capture community dynamics and indirect relationships.
Submitted 14 September, 2023;
originally announced September 2023.
-
Adapting to Disruptions: Flexibility as a Pillar of Supply Chain Resilience
Authors:
Ambra Amico,
Luca Verginer,
Giona Casiraghi,
Giacomo Vaccario,
Frank Schweitzer
Abstract:
Supply chain disruptions cause shortages of raw material and products. To increase resilience, i.e., the ability to cope with shocks, substituting goods in established supply chains can become an effective alternative to creating new distribution links. We demonstrate its impact on supply deficits through a detailed analysis of the US opioid distribution system. Reconstructing 40 billion empirical distribution paths, our data-driven model allows a unique inspection of policies that increase the substitution flexibility. Our approach enables policymakers to quantify the trade-off between increasing flexibility, i.e., reduced supply deficits, and increasing complexity of the supply chain, which could make it more expensive to operate.
Submitted 11 April, 2023;
originally announced April 2023.
-
Modeling social resilience: Questions, answers, open problems
Authors:
Frank Schweitzer,
Georges Andres,
Giona Casiraghi,
Christoph Gote,
Ramona Roller,
Ingo Scholtes,
Giacomo Vaccario,
Christian Zingg
Abstract:
Resilience denotes the capacity of a system to withstand shocks and its ability to recover from them. We develop a framework to quantify the resilience of highly volatile, non-equilibrium social organizations, such as collectives or collaborating teams. It consists of four steps: (i) \emph{delimitation}, i.e., narrowing down the target systems, (ii) \emph{conceptualization}, i.e., identifying how to approach social organizations, (iii) formal \emph{representation} using a combination of agent-based and network models, (iv) \emph{operationalization}, i.e., specifying measures and demonstrating how they enter the calculation of resilience. Our framework quantifies two dimensions of resilience, the \emph{robustness} of social organizations and their \emph{adaptivity}, and combines them in a novel resilience measure. It allows monitoring resilience instantaneously using longitudinal data instead of an ex-post evaluation.
Submitted 31 December, 2022;
originally announced January 2023.
-
Understanding Online Migration Decisions Following the Banning of Radical Communities
Authors:
Giuseppe Russo,
Manoel Horta Ribeiro,
Giona Casiraghi,
Luca Verginer
Abstract:
The proliferation of radical online communities and their violent offshoots has sparked great societal concern. However, the current practice of banning such communities from mainstream platforms has unintended consequences: (i) the further radicalization of their members in fringe platforms where they migrate; and (ii) the spillover of harmful content from fringe back onto mainstream platforms. Here, in a large observational study on two banned subreddits, r/The\_Donald and r/fatpeoplehate, we examine how factors associated with the RECRO radicalization framework relate to users' migration decisions. Specifically, we quantify how these factors affect users' decisions to post on fringe platforms and, for those who do, whether they continue posting on the mainstream platform. Our results show that individual-level factors, those relating to the behavior of users, are associated with the decision to post on the fringe platform, whereas social-level factors, those relating to users' connection with the radical community, only affect the propensity to be coactive on both platforms. Overall, our findings pave the way for evidence-based moderation policies, as the decisions to migrate and remain coactive amplify unintended consequences of community bans.
Submitted 9 December, 2022;
originally announced December 2022.
-
Struggling with change: The fragile resilience of collectives
Authors:
Frank Schweitzer,
Christian Zingg,
Giona Casiraghi
Abstract:
Collectives form non-equilibrium social structures characterised by volatile dynamics. Individuals join or leave. Social relations change quickly. Therefore, differently from engineered or ecological systems, a resilient reference state cannot be defined. We propose a novel resilience measure combining two dimensions: robustness and adaptivity. We demonstrate how they can be quantified using data from a software developer collective. Our analysis reveals a resilience life cycle, i.e., stages of increasing resilience are followed by stages of decreasing resilience. We explain the reasons for these observed dynamics and provide a formal model to reproduce them. The resilience life cycle allows distinguishing between short-term resilience, given by a sequence of resilient states, and long-term resilience, which requires collectives to survive through different cycles.
Submitted 15 October, 2022;
originally announced October 2022.
-
Spillover of Antisocial Behavior from Fringe Platforms: The Unintended Consequences of Community Banning
Authors:
Giuseppe Russo,
Luca Verginer,
Manoel Horta Ribeiro,
Giona Casiraghi
Abstract:
Online platforms face pressure to keep their communities civil and respectful. Thus, the bannings of problematic online communities from mainstream platforms like Reddit and Facebook are often met with enthusiastic public reactions. However, this policy can lead users to migrate to alternative fringe platforms with lower moderation standards and where antisocial behaviors like trolling and harassment are widely accepted. As users of these communities often remain co-active across mainstream and fringe platforms, antisocial behaviors may spill over onto the mainstream platform. We study this possible spillover by analyzing around 70,000 users from three banned communities that migrated to fringe platforms: r/The_Donald, r/GenderCritical, and r/Incels. Using a difference-in-differences design, we contrast co-active users with matched counterparts to estimate the causal effect of fringe platform participation on users' antisocial behavior on Reddit. Our results show that participating in the fringe communities increases users' toxicity on Reddit (as measured by Perspective API) and involvement with subreddits similar to the banned community -- which often also breach platform norms. The effect intensifies with time and exposure to the fringe platform. In short, we find evidence for a spillover of antisocial behavior from fringe platforms onto Reddit via co-participation.
Submitted 12 April, 2023; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Reconstructing signed relations from interaction data
Authors:
Georges Andres,
Giona Casiraghi,
Giacomo Vaccario,
Frank Schweitzer
Abstract:
Positive and negative relations play an essential role in human behavior and shape the communities we live in. Despite their importance, data about signed relations is rare and commonly gathered through surveys. Interaction data is more abundant, for instance, in the form of proximity or communication data. So far, though, it could not be utilized to detect signed relations. In this paper, we show how the underlying signed relations can be extracted with such data. Employing a statistical network approach, we construct networks of signed relations in four communities. We then show that these relations correspond to the ones reported in surveys. Additionally, the inferred relations allow us to study the homophily of individuals with respect to gender, religious beliefs, and financial backgrounds. We evaluate the importance of triads in the signed network to study group cohesion.
Submitted 7 September, 2022;
originally announced September 2022.
-
The downside of heterogeneity: How established relations counteract systemic adaptivity in tasks assignments
Authors:
Giona Casiraghi,
Christian Zingg,
Frank Schweitzer
Abstract:
We study the lock-in effect in a network of task assignments. Agents have a heterogeneous fitness for solving tasks and can redistribute unfinished tasks to other agents. They learn over time to whom to reassign tasks and preferably choose agents with higher fitness. A lock-in occurs if reassignments can no longer adapt. Agents overwhelmed with tasks then fail, leading to failure cascades. We find that the probability of lock-ins and systemic failures increases with the heterogeneity in fitness values. To study this dependence, we use the Shannon entropy of the network of task assignments. A detailed discussion links our findings to the problem of resilience and observations in social systems.
Submitted 20 November, 2021;
originally announced November 2021.
-
Predicting Sequences of Traversed Nodes in Graphs using Network Models with Multiple Higher Orders
Authors:
Christoph Gote,
Giona Casiraghi,
Frank Schweitzer,
Ingo Scholtes
Abstract:
We propose a novel sequence prediction method for sequential data capturing node traversals in graphs. Our method builds on a statistical modelling framework that combines multiple higher-order network models into a single multi-order model. We develop a technique to fit such multi-order models to empirical sequential data and to select the optimal maximum order. Our framework facilitates both next-element and full sequence prediction given a sequence-prefix of any length. We evaluate our model based on six empirical data sets containing sequences from website navigation as well as public transport systems. The results show that our method outperforms state-of-the-art algorithms for next-element prediction. We further demonstrate the accuracy of our method during out-of-sample sequence prediction and validate that our method can scale to data sets with millions of sequences.
Submitted 25 August, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Intervention scenarios to enhance knowledge transfer in a network of firms
Authors:
Frank Schweitzer,
Yan Zhang,
Giona Casiraghi
Abstract:
We investigate a multi-agent model of firms in an R\&D network. Each firm is characterized by its knowledge stock $x_{i}(t)$, which follows non-linear dynamics. It can grow with the input from other firms, i.e., by knowledge transfer, and decays otherwise. Maintaining interactions is costly. Firms can leave the network if their expected knowledge growth is not realized, which may cause other firms to also leave the network. The paper discusses two bottom-up intervention scenarios to prevent, reduce, or delay cascades of firms leaving. The first one is based on the formalism of network controllability, in which driver nodes are identified and subsequently incentivized by reducing their costs. The second one combines node interventions and network interventions. It proposes the controlled removal of a single firm and the random replacement of firms leaving. This generates small cascades, which prevents the occurrence of large cascades. We find that both approaches successfully mitigate cascades and thus improve the resilience of the R\&D network.
Submitted 25 June, 2020;
originally announced June 2020.
-
Probing the robustness of nested multi-layer networks
Authors:
Giona Casiraghi,
Antonios Garas,
Frank Schweitzer
Abstract:
We consider a multi-layer network with two layers, $\mathcal{L}_{1}$, $\mathcal{L}_{2}$. Their intra-layer topology shows a scale-free degree distribution and a core-periphery structure. A nested structure describes the inter-layer topology, i.e., some nodes from $\mathcal{L}_{1}$, the generalists, have many links to nodes in $\mathcal{L}_{2}$, whereas specialists have only a few. This structure is verified by analyzing two empirical networks from ecology and economics. To probe the robustness of the multi-layer network, we remove nodes from $\mathcal{L}_{1}$ with their inter- and intra-layer links and measure the impact on the size of the largest connected component, $F_{2}$, in $\mathcal{L}_{2}$, which we take as a robustness measure. We test different attack scenarios by preferably removing peripheral or core nodes. We also vary the intra-layer coupling between generalists and specialists, to study their impact on the robustness of the multi-layer network. We find that some combinations of attack scenario and intra-layer coupling lead to very low robustness values, whereas others demonstrate high robustness of the multi-layer network because of the intra-layer links. Our results shed new light on the robustness of bipartite networks, which consider only inter-layer, but no intra-layer links.
Submitted 8 November, 2019;
originally announced November 2019.
-
Improving the robustness of online social networks: A simulation approach of network interventions
Authors:
Giona Casiraghi,
Frank Schweitzer
Abstract:
Online social networks (OSN) are prime examples of socio-technical systems in which individuals interact via a technical platform. OSN are very volatile because users enter and exit and frequently change their interactions. This makes the robustness of such systems difficult to measure and to control. To quantify robustness, we propose a coreness value obtained from the directed interaction network. We study the emergence of large drop-out cascades of users leaving the OSN by means of an agent-based model. For agents, we define a utility function that depends on their relative reputation and their costs for interactions. The decision of agents to leave the OSN depends on this utility. Our aim is to prevent drop-out cascades by influencing specific agents with low utility. We identify strategies to control agents in the core and the periphery of the OSN such that drop-out cascades are significantly reduced, and the robustness of the OSN is increased.
Submitted 31 October, 2019;
originally announced October 2019.
-
HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks
Authors:
Timothy LaRock,
Vahan Nanumyan,
Ingo Scholtes,
Giona Casiraghi,
Tina Eliassi-Rad,
Frank Schweitzer
Abstract:
The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order.
Submitted 29 January, 2020; v1 submitted 25 May, 2019;
originally announced May 2019.
-
What is the Entropy of a Social Organization?
Authors:
Christian Zingg,
Giona Casiraghi,
Giacomo Vaccario,
Frank Schweitzer
Abstract:
We quantify a social organization's potentiality, that is, its ability to attain different configurations. The organization is represented as a network in which nodes correspond to individuals and (multi-)edges to their multiple interactions. Attainable configurations are treated as realizations from a network ensemble. To encode interaction preferences between individuals, we choose the generalized hypergeometric ensemble of random graphs, which is described by a closed-form probability distribution. From this distribution we calculate Shannon entropy as a measure of potentiality. This allows us to compare different organizations as well as different stages in the development of a given organization. The feasibility of the approach is demonstrated using data from 3 empirical and 2 synthetic systems.
Submitted 23 May, 2019;
originally announced May 2019.
-
Quantifying Triadic Closure in Multi-Edge Social Networks
Authors:
Laurence Brandenberger,
Giona Casiraghi,
Vahan Nanumyan,
Frank Schweitzer
Abstract:
Multi-edge networks capture repeated interactions between individuals. In social networks, such edges often form closed triangles, or triads. Standard approaches to measure this triadic closure, however, fail for multi-edge networks, because they do not consider that triads can be formed by edges of different multiplicity. We propose a novel measure of triadic closure for multi-edge networks of social interactions based on a shared partner statistic. We demonstrate that our operationalization is able to detect meaningful closure in synthetic and empirical multi-edge networks, where common approaches fail. This is a cornerstone in driving inferential network analyses from the analysis of binary networks towards the analyses of multi-edge and weighted networks, which offer a more realistic representation of social interactions and relations.
Submitted 8 May, 2019;
originally announced May 2019.
-
Analytical Formulation of the Block-Constrained Configuration Model
Authors:
Giona Casiraghi
Abstract:
We provide a novel family of generative block-models for random graphs that naturally incorporates degree distributions: the block-constrained configuration model. Block-constrained configuration models build on the generalised hypergeometric ensemble of random graphs and extend the well-known configuration model by enforcing block-constraints on the edge generation process. The resulting models are analytically tractable and practical to fit even to large networks. These models provide a new, flexible tool for the study of community structure and for network science in general, where modelling networks with heterogeneous degree distributions is of central importance.
Submitted 12 November, 2018;
originally announced November 2018.
-
Generalised hypergeometric ensembles of random graphs: the configuration model as an urn problem
Authors:
Giona Casiraghi,
Vahan Nanumyan
Abstract:
We introduce a broad class of random graph models: the generalised hypergeometric ensemble (GHypEG). This class makes it possible to solve some long-standing problems in random graph theory. First, GHypEG provides an elegant and compact formulation of the well-known configuration model in terms of an urn problem. Second, GHypEG allows incorporating arbitrary tendencies to connect different vertex pairs. Third, the closed-form expression of the associated probability distribution ensures the analytical tractability of our formulation. This is in stark contrast with the previous state-of-the-art, which is to implement the configuration model by means of computationally expensive procedures.
Submitted 15 October, 2018;
originally announced October 2018.
-
From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles
Authors:
Giona Casiraghi,
Vahan Nanumyan,
Ingo Scholtes,
Frank Schweitzer
Abstract:
The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method in two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.
Submitted 7 July, 2017; v1 submitted 14 June, 2017;
originally announced June 2017.
-
Multiplex Network Regression: How do relations drive interactions?
Authors:
Giona Casiraghi
Abstract:
We introduce a statistical regression model to investigate the impact of dyadic relations on complex networks generated from observed repeated interactions. It is based on generalised hypergeometric ensembles (gHypEG), a class of statistical network ensembles developed recently to deal with multi-edge graph and count data. We represent different types of known relations between system elements by weighted graphs, separated into the different layers of a multiplex network. With our method, we can regress the influence of each relational layer (the explanatory variables) on the interaction counts (the dependent variables). Moreover, we can quantify the statistical significance of the relations as explanatory variables for the observed interactions. To demonstrate the power of our approach, we investigate an example based on empirical data.
Submitted 20 July, 2020; v1 submitted 7 February, 2017;
originally announced February 2017.
-
Generalized Hypergeometric Ensembles: Statistical Hypothesis Testing in Complex Networks
Authors:
Giona Casiraghi,
Vahan Nanumyan,
Ingo Scholtes,
Frank Schweitzer
Abstract:
Statistical ensembles of networks, i.e., probability spaces of all networks that are consistent with given aggregate statistics, have become instrumental in the analysis of complex networks. Their numerical and analytical study provides the foundation for the inference of topological patterns, the definition of network-analytic measures, as well as for model selection and statistical hypothesis testing. Contributing to the foundation of these data analysis techniques, in this Letter we introduce generalized hypergeometric ensembles, a broad class of analytically tractable statistical ensembles of finite, directed and weighted networks. This framework can be interpreted as a generalization of the classical configuration model, which is commonly used to randomly generate networks with a given degree sequence or distribution. Our generalization rests on the introduction of dyadic link propensities, which capture the degree-corrected tendencies of pairs of nodes to form edges between each other. Studying empirical and synthetic data, we show that our approach provides broad perspectives for model selection and statistical hypothesis testing in data on complex networks.
Submitted 8 August, 2016; v1 submitted 8 July, 2016;
originally announced July 2016.