-
The microscale organization of directed hypergraphs
Authors:
Quintino Francesco Lotito,
Alberto Vendramini,
Alberto Montresor,
Federico Battiston
Abstract:
Many real-world complex systems are characterized by non-pairwise -- higher-order -- interactions among a system's units, and can be effectively modeled as hypergraphs. Directed hypergraphs distinguish between source and target sets within each hyperedge, and make it possible to account for the directional flow of information between nodes. Here, we provide a framework to characterize the structural organization of directed higher-order networks at their microscale. First, we extract the fingerprint of a directed hypergraph, capturing the frequency of hyperedges with given source and target sizes, and use this information to compute differences in higher-order connectivity patterns among real-world systems. Then, we formulate reciprocity in hypergraphs, including exact, strong, and weak definitions, to measure the extent to which hyperedges are reciprocated. Finally, we extend motif analysis to identify recurring interaction patterns and extract the building blocks of directed hypergraphs. We validate our framework on empirical datasets, including Bitcoin transactions, metabolic networks, and citation data, revealing structural principles behind the organization of real-world systems.
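The fingerprint and the simplest form of reciprocity can be illustrated with a short sketch. Several choices here are assumptions and are not taken from the paper:

- a directed hyperedge is stored as a pair of frozensets (source, target);
- the fingerprint is a plain count of hyperedges per (source size, target size);
- "exact" reciprocity is read as the presence of the reversed hyperedge (target, source).

The strong and weak definitions are not reproduced.

```python
from collections import Counter


def fingerprint(edges):
    """Count directed hyperedges by (source size, target size).

    edges: iterable of (source, target) pairs of frozensets.
    """
    return Counter((len(source), len(target)) for source, target in edges)


def exact_reciprocity(edges):
    """Fraction of distinct hyperedges (S, T) whose reverse (T, S) also occurs.

    Assumed reading of "exact" reciprocity; strong/weak variants are not covered.
    """
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    reciprocated = sum(1 for source, target in edge_set if (target, source) in edge_set)
    return reciprocated / len(edge_set)


# Toy example: {1, 2} -> {3} is reciprocated by {3} -> {1, 2}; {4} -> {5, 6} is not.
edges = [
    (frozenset({1, 2}), frozenset({3})),
    (frozenset({3}), frozenset({1, 2})),
    (frozenset({4}), frozenset({5, 6})),
]
print(fingerprint(edges))        # size pair (1, 2) occurs twice, (2, 1) once
print(exact_reciprocity(edges))  # 2 of the 3 hyperedges are reciprocated
```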
Submitted 21 October, 2024;
originally announced October 2024.
-
Comparing Personalized Relevance Algorithms for Directed Graphs
Authors:
Luca Cavalcanti,
Cristian Consonni,
Martin Brugnara,
David Laniado,
Alberto Montresor
Abstract:
We present an interactive Web platform that, given a directed graph, identifies the most relevant nodes related to a given query node. Besides well-established algorithms such as PageRank and Personalized PageRank, the demo includes CycleRank, a novel algorithm that addresses some of their limitations by leveraging cyclic paths to compute personalized relevance scores. Our demo design enables two use cases: (a) algorithm comparison, contrasting the results obtained with different algorithms, and (b) dataset comparison, for exploring and gaining insights into a dataset and comparing it with others. We provide 50 pre-loaded datasets from Wikipedia, Twitter, and Amazon, and seven algorithms. Users can upload new datasets, and new algorithms can be easily added. By showcasing efficient algorithms to compute relevance scores in directed graphs, our tool helps uncover hidden relationships within the data, which makes it a valuable addition to the repertoire of graph analysis tools.
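Personalized PageRank, one of the algorithms mentioned above, can be sketched in a few lines of Python. This is a generic power-iteration formulation on a dict-of-lists graph, not the platform's code. The damping factor of 0.85 and the convention of sending dangling-node mass back to the query node are assumptions. CycleRank itself is not reproduced here.

```python
def personalized_pagerank(graph, source, alpha=0.85, tol=1e-10, max_iter=1000):
    """Personalized PageRank by power iteration.

    graph: dict mapping every node to a list of its successors.
    With probability 1 - alpha the walk jumps back to `source`; mass sitting on
    dangling nodes (no successors) is also returned to `source`.
    """
    nodes = list(graph)
    scores = {v: 0.0 for v in nodes}
    scores[source] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        dangling = 0.0
        for u in nodes:
            successors = graph[u]
            if successors:
                share = alpha * scores[u] / len(successors)
                for v in successors:
                    nxt[v] += share
            else:
                dangling += alpha * scores[u]
        # Teleport and dangling mass both restart the walk at the query node
        # (scores always sum to 1, so the teleport mass is exactly 1 - alpha).
        nxt[source] += (1.0 - alpha) + dangling
        delta = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


graph = {"a": ["b"], "b": ["c", "a"], "c": ["a"], "d": ["a"]}
ranking = personalized_pagerank(graph, "a")
print(sorted(ranking, key=ranking.get, reverse=True))
```

Node "d" has no path from the query node, so its personalized score stays at zero. Under a global PageRank it would receive teleport mass.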
Submitted 3 May, 2024;
originally announced May 2024.
-
Multiplex measures for higher-order networks
Authors:
Quintino Francesco Lotito,
Alberto Montresor,
Federico Battiston
Abstract:
A wide variety of complex systems are characterized by interactions of different types involving varying numbers of units. Multiplex hypergraphs serve as a tool to describe such structures, capturing distinct types of higher-order interactions among a collection of units. In this work, we introduce a comprehensive set of measures to describe structural connectivity patterns in multiplex hypergraphs, considering scales from node and hyperedge levels to the system's mesoscale. We validate our measures with three real-world datasets: scientific co-authorship in physics, movie collaborations, and high school interactions. This validation reveals new collaboration patterns, identifies trends within and across movie subfields, and provides insights into daily interaction dynamics. Our framework aims to offer a more nuanced characterization of real-world systems marked by both multiplex and higher-order interactions.
Submitted 9 September, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Hypergraphx: a library for higher-order network analysis
Authors:
Quintino Francesco Lotito,
Martina Contisciani,
Caterina De Bacco,
Leonardo Di Gaetano,
Luca Gallo,
Alberto Montresor,
Federico Musciotto,
Nicolò Ruggeri,
Federico Battiston
Abstract:
From social to biological systems, many real-world systems are characterized by higher-order, non-dyadic interactions. Such systems are conveniently described by hypergraphs, where hyperedges encode interactions among an arbitrary number of units. Here, we present an open-source Python library, hypergraphx (HGX), providing a comprehensive collection of algorithms and functions for the analysis of higher-order networks. These include different ways to convert data across distinct higher-order representations, a large variety of measures of higher-order organization at the local scale and the mesoscale, statistical filters to sparsify higher-order data, a wide array of static and dynamic generative models, and an implementation of different dynamical processes with higher-order interactions. Our computational framework is general, and allows one to analyse hypergraphs with weighted, directed, signed, temporal and multiplex group interactions. We provide visual insights into higher-order data through a variety of visualization tools. We accompany our code with an extended higher-order data repository, and demonstrate the ability of HGX to analyse real-world systems through a systematic analysis of a social network with higher-order interactions. The library is conceived as an evolving, community-based effort, which will further extend its functionalities over the years. Our software is available at https://github.com/HGX-Team/hypergraphx
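To make "converting data across distinct higher-order representations" concrete, the sketch below turns a list of hyperedges into two common alternatives:

- a weighted pairwise (clique) projection;
- a node-to-hyperedge incidence map.

It is written in plain Python and deliberately does not use the hypergraphx API. The function names are hypothetical.

```python
from itertools import combinations


def clique_projection(hyperedges):
    """Pairwise projection: link every pair of nodes that share a hyperedge.

    Returns {frozenset({u, v}): number of hyperedges containing both u and v}.
    """
    weights = {}
    for edge in hyperedges:
        for u, v in combinations(set(edge), 2):
            pair = frozenset((u, v))
            weights[pair] = weights.get(pair, 0) + 1
    return weights


def incidence_map(hyperedges):
    """Bipartite view: node -> set of indices of the hyperedges it belongs to."""
    incidence = {}
    for index, edge in enumerate(hyperedges):
        for node in edge:
            incidence.setdefault(node, set()).add(index)
    return incidence


hyperedges = [[1, 2, 3], [2, 3], [3, 4]]
print(clique_projection(hyperedges))
print(incidence_map(hyperedges))
```

The projection is lossy. A single hyperedge {1, 2, 3} and the three pairwise edges {1, 2}, {1, 3}, {2, 3} yield identical projections. The incidence map keeps that distinction.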
Submitted 27 March, 2023;
originally announced March 2023.
-
Hyperlink communities in higher-order networks
Authors:
Quintino Francesco Lotito,
Federico Musciotto,
Alberto Montresor,
Federico Battiston
Abstract:
Many networks can be characterised by the presence of communities, which are groups of units that are closely linked. Identifying these communities can be crucial for understanding the system's overall function. Recently, hypergraphs have emerged as a fundamental tool for modelling systems where interactions are not limited to pairs but may involve an arbitrary number of nodes. In this study, we adopt a dual approach to community detection and extend the concept of link communities to hypergraphs. This extension allows us to extract informative clusters of highly related hyperedges. We analyze the dendrograms obtained by applying hierarchical clustering to distance matrices among hyperedges across a variety of real-world data, showing that hyperlink communities naturally highlight the hierarchical and multiscale structure of higher-order networks. Moreover, hyperlink communities enable us to extract overlapping memberships for nodes, overcoming limitations of traditional hard clustering methods. Finally, we introduce higher-order network cartography as a practical tool for categorizing nodes into different structural roles based on their interaction patterns and community participation. This approach aids in identifying different types of individuals in a variety of real-world social systems. Our work contributes to a better understanding of the structural organization of real-world higher-order systems.
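The dual idea can be shown in a minimal sketch: cluster hyperedges instead of nodes, then read off overlapping node memberships. The Jaccard distance between node sets and single linkage are assumptions made for illustration. The paper analyses full dendrograms. This sketch instead cuts a single-linkage dendrogram at one hypothetical threshold, which is equivalent to taking the connected components of the "distance <= threshold" graph among hyperedges.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - (shared nodes / all nodes) between two non-empty hyperedges."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)


def hyperlink_communities(hyperedges, threshold):
    """Single-linkage clusters of hyperedges, cut at distance `threshold`."""
    parent = list(range(len(hyperedges)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(hyperedges)), 2):
        if jaccard_distance(hyperedges[i], hyperedges[j]) <= threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(hyperedges)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def node_memberships(hyperedges, clusters):
    """Overlapping membership: node -> indices of the clusters of its hyperedges."""
    membership = {}
    for c, members in enumerate(clusters):
        for i in members:
            for node in hyperedges[i]:
                membership.setdefault(node, set()).add(c)
    return membership


H = [{1, 2, 3}, {1, 2, 3, 4}, {5, 6}, {5, 6, 7}, {3, 5}]
clusters = hyperlink_communities(H, 0.5)
print(clusters)
print(node_memberships(H, clusters))
```

Nodes 3 and 5 end up in two clusters each, which is the kind of overlapping membership a node partition cannot express.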
Submitted 19 February, 2024; v1 submitted 2 March, 2023;
originally announced March 2023.
-
Exact and sampling methods for mining higher-order motifs in large hypergraphs
Authors:
Quintino Francesco Lotito,
Federico Musciotto,
Federico Battiston,
Alberto Montresor
Abstract:
Network motifs are recurrent, small-scale patterns of interactions observed frequently in a system. They shed light on the interplay between the topology and the dynamics of complex networks across various domains. In this work, we focus on the problem of counting occurrences of small sub-hypergraph patterns in very large hypergraphs, where higher-order interactions connect arbitrary numbers of system units. We show how directly exploiting higher-order structures speeds up the counting process compared to traditional data mining techniques for exact motif discovery. Moreover, with hyperedge sampling, performance is further improved at the cost of small errors in the estimation of motif frequency. We evaluate our method on several real-world datasets describing face-to-face interactions, co-authorship and human communication. We show that our approximate algorithm allows us to extract higher-order motifs faster and on a larger scale, beyond the computational limits of an exact approach.
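The principle behind hyperedge sampling can be shown on a deliberately simple pattern: pairs of hyperedges that share at least one node. This toy pattern is an assumption. It is not the paper's motif definition or algorithm. The estimator works as follows:

1. Sample a hyperedge uniformly and count the other hyperedges it intersects.
2. Average these counts over the samples.
3. Rescale the mean by m/2, where m is the number of hyperedges.

Every intersecting pair is counted once from each side, so the rescaled mean is an unbiased estimate of the exact count.

```python
import random


def overlapping_pairs_exact(hyperedges):
    """Exact number of unordered hyperedge pairs sharing at least one node."""
    sets = [set(e) for e in hyperedges]
    return sum(
        1
        for i in range(len(sets))
        for j in range(i + 1, len(sets))
        if sets[i] & sets[j]
    )


def overlapping_pairs_sampled(hyperedges, samples, seed=0):
    """Estimate the same count from `samples` uniformly sampled hyperedges."""
    rng = random.Random(seed)
    sets = [set(e) for e in hyperedges]
    node_to_edges = {}
    for index, edge in enumerate(sets):
        for node in edge:
            node_to_edges.setdefault(node, set()).add(index)
    m = len(sets)
    total = 0
    for _ in range(samples):
        i = rng.randrange(m)
        # Hyperedges intersecting the sampled one, excluding itself.
        neighbours = set().union(*(node_to_edges[v] for v in sets[i])) - {i}
        total += len(neighbours)
    # Summing this count over all m hyperedges counts every pair twice.
    return m * total / (2 * samples)


H = [{1, 2, 3}, {3, 4}, {4, 5}, {6, 7}, {1, 7}]
print(overlapping_pairs_exact(H), overlapping_pairs_sampled(H, 20000))
```

The exact routine inspects all O(m^2) pairs. The sampled one touches only the neighbourhoods of the sampled hyperedges and trades that saving for estimation error.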
Submitted 2 October, 2023; v1 submitted 21 September, 2022;
originally announced September 2022.
-
Higher-order motif analysis in hypergraphs
Authors:
Quintino Francesco Lotito,
Federico Musciotto,
Alberto Montresor,
Federico Battiston
Abstract:
A deluge of new data on social, technological and biological networked systems suggests that a large number of interactions among system units are not limited to pairs, but rather involve a higher number of nodes. To properly encode such higher-order interactions, richer mathematical frameworks such as hypergraphs are needed, where hyperlinks describe connections among an arbitrary number of nodes. Here we introduce the concept of higher-order motifs, small connected subgraphs where vertices may be linked by interactions of any order. We provide lower and upper bounds on the number of higher-order motifs as a function of the motif size, and propose an efficient algorithm to extract complete higher-order motif profiles from empirical data. We identify different families of hypergraphs, characterized by distinct higher-order connectivity patterns at the local scale. We also find evidence of structural reinforcement, a mechanism that associates stronger higher-order interactions with nodes that interact more at the pairwise level. Our work highlights the informative power of higher-order motifs, providing a first way to extract higher-order fingerprints in hypergraphs at the network microscale.
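The definition of a higher-order motif can be illustrated with a brute-force sketch for motifs on three nodes. For each node triple, it:

1. collects the hyperedges the triple contains;
2. keeps the triple only if those hyperedges connect it;
3. groups kept triples by a relabelling-invariant key.

It enumerates all C(n, 3) node triples, so it only suits toy inputs. It is not the efficient algorithm proposed in the paper, and the canonical-key scheme is an assumption.

```python
from collections import Counter
from itertools import combinations, permutations


def induced_hyperedges(nodes, edge_set):
    """Hyperedges (size >= 2) whose nodes all lie inside `nodes`."""
    return [
        frozenset(sub)
        for size in range(2, len(nodes) + 1)
        for sub in combinations(nodes, size)
        if frozenset(sub) in edge_set
    ]


def is_connected(nodes, edges):
    """True if the hyperedges connect every node in `nodes`."""
    start = nodes[0]
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for edge in edges:
            if u in edge:
                for v in edge:
                    if v not in seen:
                        seen.add(v)
                        stack.append(v)
    return len(seen) == len(nodes)


def canonical_key(nodes, edges):
    """Smallest relabelled edge list over all node orderings (isomorphism-invariant)."""
    best = None
    for perm in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        key = tuple(sorted(tuple(sorted(relabel[v] for v in edge)) for edge in edges))
        if best is None or key < best:
            best = key
    return best


def motif_profile(hyperedges, k=3):
    """Count connected k-node motifs induced by the hyperedges (brute force)."""
    edge_set = {frozenset(e) for e in hyperedges if len(set(e)) >= 2}
    nodes = sorted({v for e in edge_set for v in e})
    counts = Counter()
    for group in combinations(nodes, k):
        induced = induced_hyperedges(group, edge_set)
        if induced and is_connected(group, induced):
            counts[canonical_key(group, induced)] += 1
    return counts


# {1, 2, 3} together with its pair {1, 2} forms one motif; two 2-paths form another.
profile = motif_profile([{1, 2, 3}, {1, 2}, {3, 4}, {2, 4}])
for key, count in profile.items():
    print(key, count)
```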
Submitted 6 August, 2021;
originally announced August 2021.
-
Dynamic Embeddings for Interaction Prediction
Authors:
Zekarias T. Kefato,
Sarunas Girdzijauskas,
Nasrullah Sheikh,
Alberto Montresor
Abstract:
In recommender systems (RSs), predicting the next item that a user interacts with is critical for user retention. While the last decade has seen an explosion of RSs aimed at identifying relevant items that match user preferences, there is still a range of aspects that could be considered to further improve their performance. For example, RSs are often centered around the user, who is modeled using her recent sequence of activities. Recent studies, however, have shown the effectiveness of modeling the mutual interactions between users and items using separate user and item embeddings. Building on the success of these studies, we propose a novel method called DeePRed that addresses some of their limitations. In particular, we avoid recursive and costly interactions between consecutive short-term embeddings by using long-term (stationary) embeddings as a proxy. This enables us to train DeePRed using simple mini-batches without the overhead of the specialized mini-batches proposed in previous studies. Moreover, DeePRed's effectiveness comes from the aforementioned design and a multi-way attention mechanism that inspects user-item compatibility. Experiments show that DeePRed outperforms the best state-of-the-art approach by at least 14% on the next-item prediction task, while gaining more than an order of magnitude speedup over the best performing baselines. Although this study is mainly concerned with temporal interaction networks, we also show the power and flexibility of DeePRed by adapting it to the case of static interaction networks, substituting the short- and long-term aspects with local and global ones.
Submitted 26 February, 2021; v1 submitted 10 November, 2020;
originally announced November 2020.
-
A general method for estimating the prevalence of Influenza-Like-Symptoms with Wikipedia data
Authors:
Giovanni De Toni,
Cristian Consonni,
Alberto Montresor
Abstract:
Influenza is an acute respiratory seasonal disease that affects millions of people worldwide and causes thousands of deaths in Europe alone. Estimating the impact of an illness on a given country quickly and reliably is essential to plan and organize effective countermeasures, and this is now possible by leveraging unconventional data sources such as web searches and page visits. In this study, we show the feasibility of exploiting information about Wikipedia page views of a selected group of articles, together with machine learning models, to obtain accurate estimates of influenza-like illness incidence in four European countries: Italy, Germany, Belgium, and the Netherlands. We propose a novel language-agnostic method, based on two algorithms, Personalized PageRank and CycleRank, to automatically select the most relevant Wikipedia pages to be monitored without the need for expert supervision. We then show that our model reaches state-of-the-art results by comparing it with previous solutions.
Submitted 28 October, 2020;
originally announced October 2020.
-
Efficient Algorithms to Mine Maximal Span-Trusses From Temporal Graphs
Authors:
Quintino Francesco Lotito,
Alberto Montresor
Abstract:
Over the last decade, there has been an increasing interest in temporal graphs, pushed by a growing availability of temporally-annotated network data coming from social, biological and financial networks. Despite the importance of analyzing complex temporal networks, there is a huge gap between the set of definitions, algorithms and tools available to study large static graphs and those available for temporal graphs. An important task in temporal graph analysis is mining dense structures, i.e., identifying high-density subgraphs together with the span in which this high density is observed. In this paper, we introduce the concept of $(k, Δ)$-truss (span-truss) in temporal graphs, a temporal generalization of the $k$-truss, in which $k$ captures the information about the density and $Δ$ captures the time span in which this density holds. We then propose novel and efficient algorithms to identify maximal span-trusses, namely those not dominated by any other span-truss in both the order $k$ and the interval $Δ$, and evaluate them on a number of publicly available datasets.
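A (k, Δ)-truss can be sketched under one key assumption, made by analogy with span-cores: the graph associated with an interval Δ = [start, end] keeps only the edges present at every timestamp of Δ. The span-truss is then the ordinary k-truss of that graph. The sketch computes it by repeatedly peeling edges that lie in fewer than k - 2 triangles. This naive peeling does not implement the paper's algorithms for enumerating maximal span-trusses, and integer timestamps are assumed.

```python
def span_graph(temporal_edges, start, end):
    """Undirected edges present at every integer timestamp t with start <= t <= end."""
    presence = {}
    for u, v, t in temporal_edges:
        if start <= t <= end and u != v:
            presence.setdefault(frozenset((u, v)), set()).add(t)
    span_length = end - start + 1
    return {edge for edge, times in presence.items() if len(times) == span_length}


def k_truss(edges, k):
    """Maximal subgraph in which every edge lies in at least k - 2 triangles."""
    adj = {}
    for edge in edges:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    alive = set(edges)
    changed = True
    while changed:
        changed = False
        for edge in list(alive):
            u, v = tuple(edge)
            support = len(adj[u] & adj[v])  # triangles through (u, v) among surviving edges
            if support < k - 2:
                alive.discard(edge)
                adj[u].discard(v)
                adj[v].discard(u)
                changed = True
    return alive


def span_truss(temporal_edges, k, start, end):
    """(k, [start, end])-truss under the assumptions stated above."""
    return k_truss(span_graph(temporal_edges, start, end), k)


# A 4-clique active at t = 1, 2, 3, a pendant edge (4, 5) at t = 1, 2, 3,
# and an edge (1, 5) that appears only at t = 1.
k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
temporal = [(u, v, t) for u, v in k4 for t in (1, 2, 3)]
temporal += [(4, 5, t) for t in (1, 2, 3)] + [(1, 5, 1)]
print(sorted(tuple(sorted(e)) for e in span_truss(temporal, 4, 1, 3)))
```

Over [1, 1] the triangle 1-4-5 exists, so the 3-truss keeps all eight edges. Over [1, 3] edge (1, 5) is missing, and the 3-truss shrinks to the clique.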
Submitted 4 October, 2020; v1 submitted 3 September, 2020;
originally announced September 2020.
-
Which way? Direction-Aware Attributed Graph Embedding
Authors:
Zekarias T. Kefato,
Nasrullah Sheikh,
Alberto Montresor
Abstract:
Graph embedding algorithms are used to efficiently represent (encode) a graph in a low-dimensional continuous vector space that preserves the most important properties of the graph. One aspect that is often overlooked is whether the graph is directed or not. Most studies ignore directionality, so as to learn high-quality representations optimized for node classification. On the other hand, studies that capture directionality are usually effective on link prediction but do not perform well on other tasks. This preliminary study presents a novel text-enriched, direction-aware algorithm called DIAGRAM, based on a carefully designed multi-objective model to learn embeddings that preserve the direction of edges, textual features and graph context of nodes. As a result, our algorithm does not have to trade one property for another and jointly learns high-quality representations for multiple network analysis tasks. We empirically show that DIAGRAM significantly outperforms six state-of-the-art baselines, both direction-aware and direction-oblivious ones, on link prediction and network reconstruction experiments using two popular datasets. It also achieves comparable performance on node classification experiments against these baselines using the same datasets.
Submitted 30 January, 2020;
originally announced January 2020.
-
Please, do not decentralize the Internet with (permissionless) blockchains!
Authors:
Pedro Garcia Lopez,
Alberto Montresor,
Anwitaman Datta
Abstract:
The old mantra of decentralizing the Internet is coming back with fanfare, this time riding the blockchain technology hype. We have already seen a technology that was supposed to change the nature of the Internet: peer-to-peer. The reality is that peer-to-peer naming systems failed, peer-to-peer social networks failed, and yes, peer-to-peer storage failed as well. In this paper, we review the research on distributed systems of the last few years to identify the limits of open peer-to-peer networks. We address issues like system complexity, security and frailty, instability and performance. We show how many of the aforementioned problems also apply to the recent breed of permissionless blockchain networks. The applicability of such systems to mature industrial applications is undermined by the same properties that make them so interesting for a libertarian audience: namely, their openness, their pseudo-anonymity and their unregulated cryptocurrencies. As such, we argue that permissionless blockchain networks are unsuitable to be the substrate for a decentralized Internet. Yet, there is still hope for more decentralization, albeit in a form somewhat limited with respect to the libertarian view of a decentralized Internet: in cooperation rather than in competition with the superpowerful datacenters that dominate the world today. This hope derives from the recent surge in interest in Byzantine fault tolerance and permissioned blockchains, which opens the door to a world where the use of trusted third parties is not the only way to arbitrate an ensemble of entities. The ability to establish trust through permissioned blockchains makes it possible to move control from the datacenters to the edge, truly realizing the promises of edge-centric computing.
Submitted 30 April, 2019;
originally announced April 2019.
-
WikiLinkGraphs: A Complete, Longitudinal and Multi-Language Dataset of the Wikipedia Link Networks
Authors:
Cristian Consonni,
David Laniado,
Alberto Montresor
Abstract:
Wikipedia articles contain multiple links connecting a subject to other pages of the encyclopedia. In Wikipedia parlance, these links are called internal links or wikilinks. We present a complete dataset of the network of internal Wikipedia links for the $9$ largest language editions. The dataset contains yearly snapshots of the network and spans $17$ years, from the creation of Wikipedia in 2001 to March 1st, 2018. While previous work has mostly focused on the complete hyperlink graph, which also includes links automatically generated by templates, we parsed each revision of each article to track links appearing in the main text. In this way we obtained a cleaner network, discarding more than half of the links and representing all and only the links intentionally added by editors. We describe in detail how the Wikipedia dumps have been processed and the challenges we have encountered, including the need to handle special pages such as redirects, i.e., alternative article titles. We present descriptive statistics of several snapshots of this network. Finally, we propose several research opportunities that can be explored using this new dataset.
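Two of the steps described above can be sketched in a simplified form: extracting wikilinks from the wikitext of a revision, and resolving redirects to their target article. The regular expression, the title normalization and the namespace list are simplifying assumptions, not the parser used to build the dataset. Real dumps also require handling templates, nested markup, comments and per-language namespace names.

```python
import re

# [[Target]], [[Target|label]] or [[Target#Section|label]]; group 1 is the target.
WIKILINK = re.compile(r"\[\[([^\[\]|#]*)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Hypothetical, incomplete list of non-article namespace prefixes to skip.
SKIPPED_PREFIXES = {"file", "image", "category", "template", "help", "portal", "wikipedia"}


def normalize_title(title):
    """MediaWiki-style title: underscores to spaces, collapse whitespace, capitalize first letter."""
    title = " ".join(title.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def extract_links(wikitext):
    """Return article titles linked from `wikitext`, in order of appearance."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        target = normalize_title(match.group(1))
        prefix = target.split(":", 1)[0].lower() if ":" in target else ""
        # Skip empty targets (same-page anchors), interwiki links and other namespaces.
        if target and not target.startswith(":") and prefix not in SKIPPED_PREFIXES:
            links.append(target)
    return links


def resolve(title, redirects):
    """Follow redirects (title -> target) until a non-redirect title, stopping on cycles."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title


text = "See [[Alan Turing|Turing]], [[computer_science#History]] and [[Category:Logicians]]."
print(extract_links(text))
print(resolve("Turing", {"Turing": "Alan Turing"}))
```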
Submitted 4 April, 2019; v1 submitted 12 February, 2019;
originally announced February 2019.
-
BLADYG: A Graph Processing Framework for Large Dynamic Graphs
Authors:
Sabeur Aridhi,
Alberto Montresor,
Yannis Velegrakis
Abstract:
Recently, distributed processing of large dynamic graphs has become very popular, especially in domains such as social network analysis, Web graph analysis and spatial network analysis. In this context, many distributed/parallel graph processing systems have been proposed, such as Pregel, GraphLab, and Trinity. These systems can be divided into two categories: (1) vertex-centric and (2) block-centric approaches. In vertex-centric approaches, each vertex corresponds to a process, and messages are exchanged among vertices. In block-centric approaches, the unit of computation is a block, a connected subgraph of the graph, and message exchanges occur among blocks. In this paper, we consider the issues of scale and dynamism in the case of block-centric approaches. We present BLADYG, a block-centric framework that addresses the issue of dynamism in large-scale graphs. We present an implementation of BLADYG on top of the Akka framework, and we experimentally evaluate the performance of the proposed framework.
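The difference between the two models can be illustrated with connected components computed by minimum-label propagation, organised block-centrically. Inside a block, labels are propagated to a local fixpoint without any messaging. Only edges that cross block boundaries generate communication. This is a single-process simulation written for illustration. It is not BLADYG's API, which the abstract describes as built on Akka, and the block layout is assumed to be given.

```python
def block_centric_components(blocks, cross_edges):
    """Connected components by min-label propagation, organised by blocks.

    blocks: {block_id: {vertex: set of neighbours inside the same block}}
            (adjacency assumed symmetric).
    cross_edges: list of (u, v) edges whose endpoints lie in different blocks.
    Returns (labels, supersteps); each component ends up labelled by its smallest vertex.
    """
    label = {v: v for adj in blocks.values() for v in adj}
    supersteps = 0
    while True:
        supersteps += 1
        # Local phase: each block iterates internally until its labels stabilise.
        for adj in blocks.values():
            changed = True
            while changed:
                changed = False
                for v, nbrs in adj.items():
                    best = min([label[v]] + [label[u] for u in nbrs])
                    if best < label[v]:
                        label[v] = best
                        changed = True
        # Communication phase: labels travel only along inter-block edges.
        updated = False
        for u, v in cross_edges:
            smallest = min(label[u], label[v])
            if label[u] != smallest or label[v] != smallest:
                label[u] = label[v] = smallest
                updated = True
        if not updated:
            return label, supersteps


blocks = {"A": {1: {2}, 2: {1, 3}, 3: {2}}, "B": {4: {5}, 5: {4}, 6: set()}}
labels, supersteps = block_centric_components(blocks, [(3, 4)])
print(labels, supersteps)
```

In a vertex-centric formulation, the label of vertex 1 would need one message per hop to reach vertex 5. Here, most of that propagation happens inside the blocks, and only two global rounds are needed.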
Submitted 2 January, 2017;
originally announced January 2017.
-
Distributed Edge Partitioning for Graph Processing
Authors:
Alessio Guerrieri,
Alberto Montresor
Abstract:
The availability of larger and larger graph datasets, growing exponentially over the years, has created several new algorithmic challenges to be addressed. Sequential approaches have become infeasible, while interest in parallel and distributed algorithms has greatly increased.
Appropriately partitioning the graph as a preprocessing step can improve the degree of parallelism of its analysis. A number of heuristic algorithms have been developed to solve this problem, but many of them subdivide the graph on its vertex set, thus obtaining a vertex-partitioned graph.
The aim of this paper is to explore a completely different approach based on edge partitioning, in which edges, rather than vertices, are partitioned into disjoint subsets. The contribution of this paper is twofold: first, we introduce a graph processing framework based on edge partitioning that is flexible enough to be applied to several different graph problems. Second, we show the feasibility of these ideas by presenting a distributed edge partitioning algorithm called d-fep.
Our framework is thoroughly evaluated, using both simulations and a Hadoop implementation running on the Amazon EC2 cloud. The experiments show that d-fep is efficient and scalable, and that it consistently obtains good partitions. The resulting edge-partitioned graph can be exploited to obtain more efficient implementations of graph analysis algorithms.
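Edge partitioning has a consequence worth making concrete: a vertex whose edges fall into several parts must be replicated in each of them. The replication factor (average number of parts per vertex) is a common way to measure partition quality. The sketch uses a simple sequential greedy heuristic with a per-part capacity, chosen only for illustration. It is not d-fep, which is distributed. The slack parameter and function names are hypothetical.

```python
import math


def greedy_edge_partition(edges, k, slack=0.1):
    """Assign each edge to one of k parts (a simple sequential greedy heuristic).

    Prefers a part already holding both endpoints, then one holding either
    endpoint, then any part; parts at capacity are skipped and ties go to the
    least-loaded part. Edges are assumed to be distinct (u, v) tuples.
    """
    edges = list(edges)
    capacity = math.ceil(len(edges) / k * (1 + slack))
    load = [0] * k
    parts_of = {}  # vertex -> parts holding a replica of it
    assignment = {}
    for u, v in edges:
        open_parts = {q for q in range(k) if load[q] < capacity}
        pu, pv = parts_of.get(u, set()), parts_of.get(v, set())
        candidates = (pu & pv & open_parts) or ((pu | pv) & open_parts) or open_parts
        part = min(candidates, key=lambda q: (load[q], q))
        assignment[(u, v)] = part
        load[part] += 1
        parts_of.setdefault(u, set()).add(part)
        parts_of.setdefault(v, set()).add(part)
    return assignment


def replication_factor(assignment):
    """Average number of parts in which each vertex appears (1.0 = no replication)."""
    replicas = {}
    for (u, v), part in assignment.items():
        replicas.setdefault(u, set()).add(part)
        replicas.setdefault(v, set()).add(part)
    return sum(len(parts) for parts in replicas.values()) / len(replicas)


path = [(1, 2), (2, 3), (3, 4)]
print(replication_factor(greedy_edge_partition(path, k=2, slack=0.0)))
```

The capacity check matters. Without it, a greedy rule that follows already-placed endpoints tends to put a whole connected graph into a single part.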
Submitted 25 March, 2014;
originally announced March 2014.
-
Distributed k-Core Decomposition
Authors:
Alberto Montresor,
Francesco De Pellegrini,
Daniele Miorandi
Abstract:
Among the novel metrics used to study the relative importance of nodes in complex networks, k-core decomposition has found a number of applications in areas as diverse as sociology, proteomics, graph visualization, and distributed system analysis and design. This paper proposes new distributed algorithms for the computation of the k-core decomposition of a network, with the purpose of (i) enabling the run-time computation of k-cores in "live" distributed systems and (ii) allowing the decomposition, over a set of connected machines, of very large graphs that cannot be hosted in a single machine. Lower bounds on the algorithms' complexity are given, and an exhaustive experimental analysis on real-world graphs is provided.
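The locality principle behind such distributed algorithms can be sketched as follows. Every node starts from its degree, an upper bound on its coreness. It then repeatedly lowers its estimate to the largest i such that at least i neighbours currently report an estimate of at least i, until no estimate changes. The sketch simulates synchronous rounds in a single process. Message passing, multiple nodes per host and the termination detection of a real deployment are left out, and the function names are hypothetical.

```python
def compute_index(neighbour_estimates, current):
    """Largest i <= current such that at least i neighbours report an estimate >= i."""
    count = [0] * (current + 1)
    for estimate in neighbour_estimates:
        count[min(estimate, current)] += 1
    total = 0
    for i in range(current, 0, -1):
        total += count[i]  # neighbours whose (capped) estimate is at least i
        if total >= i:
            return i
    return 0


def kcore_by_local_estimates(adj):
    """Synchronous round-based simulation; adj maps each node to its neighbour set."""
    core = {u: len(nbrs) for u, nbrs in adj.items()}  # degree is an upper bound
    rounds, changed = 0, True
    while changed:
        rounds += 1
        snapshot = dict(core)  # every node reads the values of the previous round
        changed = False
        for u, nbrs in adj.items():
            new = compute_index((snapshot[v] for v in nbrs), snapshot[u])
            if new < core[u]:
                core[u] = new
                changed = True
    return core, rounds


# A 4-clique {1, 2, 3, 4} with a two-node tail 4 - 5 - 6.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5}, 5: {4, 6}, 6: {5}}
core, rounds = kcore_by_local_estimates(adj)
print(core, rounds)
```

Each node needs only its neighbours' current estimates, which is what makes the scheme suitable for live distributed systems and for graphs split across machines.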
Submitted 29 March, 2011; v1 submitted 28 March, 2011;
originally announced March 2011.