-
Cryptocurrency Network Analysis
Authors:
Natkamon Tovanich,
Célestin Coquidé,
Rémy Cazabet
Abstract:
Cryptocurrency network analysis consists of applying the tools and methods of social network analysis to transactional data from cryptocurrencies. The main difference from most online social networks is that users do not exchange textual content but instead value -- in systems designed mainly as cryptocurrency, such as Bitcoin -- or digital items and services in more permissive systems based on smart contracts such as Ethereum.
Submitted 5 February, 2025;
originally announced February 2025.
-
SEANN: A Domain-Informed Neural Network for Epidemiological Insights
Authors:
Jean-Baptiste Guimbaud,
Marc Plantevit,
Léa Maître,
Rémy Cazabet
Abstract:
In epidemiology, traditional statistical methods such as logistic regression, linear regression, and other parametric models are commonly employed to investigate associations between predictors and health outcomes. However, non-parametric machine learning techniques, such as deep neural networks (DNNs), coupled with explainable AI (XAI) tools, offer new opportunities for this task. Despite their potential, these methods face challenges due to the limited availability of high-quality, high-quantity data in this field. To address these challenges, we introduce SEANN, a novel approach for informed DNNs that leverages a prevalent form of domain-specific knowledge: Pooled Effect Sizes (PES). PESs are commonly found in published Meta-Analysis studies, in different forms, and represent a quantitative form of a scientific consensus. By direct integration within the learning procedure using a custom loss, we experimentally demonstrate significant improvements in the generalizability of predictive performances and the scientific plausibility of extracted relationships compared to a domain-knowledge agnostic neural network in a scarce and noisy data setting.
Submitted 17 January, 2025;
originally announced January 2025.
-
Inside Alameda Research: A Multi-Token Network Analysis
Authors:
Célestin Coquidé,
Rémy Cazabet,
Natkamon Tovanich
Abstract:
We analyze the token transfer network on Ethereum, focusing on accounts associated with Alameda Research, a cryptocurrency trading firm implicated in the misuse of FTX customer funds. Using a multi-token network representation, we examine node centralities and the network backbone to identify critical accounts, tokens, and activity groups. The temporal evolution of Alameda accounts reveals shifts in token accumulation and distribution patterns leading up to its bankruptcy in November 2022. Through network analysis, our work offers insights into the activities and dynamics that shape the DeFi ecosystem.
Submitted 17 September, 2024;
originally announced September 2024.
-
iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models
Authors:
Yassir Lairgi,
Ludovic Moncla,
Rémy Cazabet,
Khalid Benabdeslem,
Pierre Cléau
Abstract:
Most available data is unstructured, making it challenging to access valuable information. Automatically building Knowledge Graphs (KGs) is crucial for structuring data and making it accessible, allowing users to search for information effectively. KGs also facilitate insights, inference, and reasoning. Traditional NLP methods, such as named entity recognition and relation extraction, are key in information retrieval but face limitations, including the use of predefined entity types and the need for supervised learning. Current research leverages large language models' capabilities, such as zero- or few-shot learning. However, unresolved and semantically duplicated entities and relations still pose challenges, leading to inconsistent graphs and requiring extensive post-processing. Additionally, most approaches are topic-dependent. In this paper, we propose iText2KG, a method for incremental, topic-independent KG construction without post-processing. This plug-and-play, zero-shot method is applicable across a wide range of KG construction scenarios and comprises four modules: Document Distiller, Incremental Entity Extractor, Incremental Relation Extractor, and Graph Integrator and Visualization. Our method demonstrates superior performance compared to baseline methods across three scenarios: converting scientific papers to graphs, websites to graphs, and CVs to graphs.
Submitted 5 September, 2024;
originally announced September 2024.
-
Longitudinal Modularity, a Modularity for Link Streams
Authors:
Victor Brabant,
Yasaman Asgari,
Pierre Borgnat,
Angela Bonifati,
Remy Cazabet
Abstract:
Temporal networks are commonly used to model real-life phenomena. When these phenomena represent interactions and are captured at a fine-grained temporal resolution, they are modeled as link streams. Community detection is an essential network analysis task. Although many methods exist for static networks, and some methods have been developed for temporal networks represented as sequences of snapshots, few works can handle link streams. This article introduces the first adaptation of the well-known Modularity quality function to link streams. Unlike existing methods, it is independent of the time scale of analysis. After introducing the quality function, and its relation to existing static and dynamic definitions of Modularity, we show experimentally its relevance for dynamic community evaluation.
Submitted 29 August, 2024;
originally announced August 2024.
-
ORBITAAL: A Temporal Graph Dataset of Bitcoin Entity-Entity Transactions
Authors:
Célestin Coquidé,
Rémy Cazabet
Abstract:
Research on Bitcoin (BTC) transactions is a matter of interest for both economic and network science fields. Although this cryptocurrency is based on a decentralized system, making transaction details freely accessible, making raw blockchain data analyzable is not straightforward due to the Bitcoin protocol specificity and data richness. To address the need for an accessible dataset, we present ORBITAAL, the first comprehensive dataset based on temporal graph formalism. The dataset covers all Bitcoin transactions from January 2009 to January 2021. ORBITAAL provides temporal graph representations of entity-entity transaction networks, both as snapshots and as a stream graph. Each transaction value is given in Bitcoin and in US dollars, using a daily conversion rate. This dataset also provides details on entities such as their global BTC balance and associated public addresses.
Submitted 26 August, 2024;
originally announced August 2024.
-
Decoding Decentralized Finance Transactions through Ego Network Motif Mining
Authors:
Natkamon Tovanich,
Célestin Coquidé,
Rémy Cazabet
Abstract:
Decentralized Finance (DeFi) is increasingly studied and adopted for its potential to provide accessible and transparent financial services. Analyzing how investors use DeFi is important for reaching a better understanding of their usage and for regulation purposes. However, analyzing DeFi transactions is challenging due to often incomplete or inaccurate labeled data. This paper presents a method to extract ego network motifs from the token transfer network, capturing the transfer of tokens between users and smart contracts. Our results demonstrate that smart contract methods performing specific DeFi operations can be efficiently identified by analyzing these motifs while providing insights into account activities.
Submitted 22 August, 2024;
originally announced August 2024.
-
Y Social: an LLM-powered Social Media Digital Twin
Authors:
Giulio Rossetti,
Massimo Stella,
Rémy Cazabet,
Katherine Abramski,
Erica Cau,
Salvatore Citraro,
Andrea Failla,
Riccardo Improta,
Virginia Morini,
Valentina Pansanella
Abstract:
In this paper we introduce Y, a new-generation digital twin designed to replicate an online social media platform. Digital twins are virtual replicas of physical systems that allow for advanced analyses and experimentation. In the case of social media, a digital twin such as Y provides a powerful tool for researchers to simulate and understand complex online interactions. Y leverages state-of-the-art Large Language Models (LLMs) to replicate sophisticated agent behaviors, enabling accurate simulations of user interactions, content dissemination, and network dynamics. By integrating these aspects, Y offers valuable insights into user engagement, information spread, and the impact of platform policies. Moreover, the integration of LLMs allows Y to generate nuanced textual content and predict user responses, facilitating the study of emergent phenomena in online environments.
To better characterize the proposed digital twin, in this paper we describe the rationale behind its implementation, provide examples of the analyses that can be performed on the data it enables to be generated, and discuss its relevance for multidisciplinary research.
Submitted 1 August, 2024;
originally announced August 2024.
-
Redefining Event Types and Group Evolution in Temporal Data
Authors:
Andrea Failla,
Rémy Cazabet,
Giulio Rossetti,
Salvatore Citraro
Abstract:
Groups -- such as clusters of points or communities of nodes -- are fundamental when addressing various data mining tasks. In temporal data, the predominant approach for characterizing group evolution has been through the identification of "events". However, the events usually described in the literature, e.g., shrinks/growths, splits/merges, are often arbitrarily defined, creating a gap between such theoretical/predefined types and real-data group observations. Moving beyond existing taxonomies, we think of events as "archetypes" characterized by a unique combination of quantitative dimensions that we call "facets". Group dynamics are defined by their position within the facet space, where archetypal events occupy extremities. Thus, rather than enforcing strict event types, our approach can allow for hybrid descriptions of dynamics involving group proximity to multiple archetypes. We apply our framework to evolving groups from several face-to-face interaction datasets, showing it enables richer, more reliable characterization of group dynamics with respect to state-of-the-art methods, especially when the groups are subject to complex relationships. Our approach also offers intuitive solutions to common tasks related to dynamic group analysis, such as choosing an appropriate aggregation scale, quantifying partition stability, and evaluating event quality.
Submitted 11 March, 2024;
originally announced March 2024.
-
A toy model for approaching volcanic plumbing systems as complex systems
Authors:
Remy Cazabet,
Catherine Annen,
Jean-Francois Moyen,
Roberto Weinberg
Abstract:
Magmas form at depth, move upwards and evolve chemically through a combination of processes. Magmatic processes are investigated by means of fieldwork combined with geophysics, geochemistry, analog and numerical models, and many other approaches. However, scientists in the field still struggle to understand how the variety of magmatic products arises, and there is no consensus yet on models of volcanic plumbing systems. This is because eruptions result from the integration of multiple processes, rooted in the magma source either in the mantle or lower crust that feeds a complex network of magma bodies linking magma source and volcano. In this work, we investigate the potential of the network approach through a prototype of magma pool interaction and magma transfer across the crust. In network terms, it describes a diffusion process on a dynamic spatial network, in which diffusion and network evolution are intertwined: the diffusion affects the network structure, and reciprocally. The diffusion process and network evolution mechanisms come from rules of behaviour derived from rock mechanics and melting processes. Nodes represent magma pools and edges physical connections between them, e.g., dykes or veinlets.
Submitted 19 July, 2023;
originally announced January 2024.
-
Mosaic benchmark networks: Modular link streams for testing dynamic community detection algorithms
Authors:
Yasaman Asgari,
Remy Cazabet,
Pierre Borgnat
Abstract:
Community structure is a critical feature of real networks, providing insights into nodes' internal organization. Nowadays, with the availability of highly detailed temporal networks such as link streams, studying community structures becomes more complex due to increased data precision and time sensitivity. Despite numerous algorithms developed in the past decade for dynamic community discovery, assessing their performance on link streams remains a challenge. Synthetic benchmark graphs are a well-accepted approach for evaluating static community detection algorithms. Additionally, there have been some proposals for slowly evolving communities in low-resolution temporal networks like snapshots. Nevertheless, this approach is not yet suitable for link streams. To bridge this gap, we introduce a novel framework that generates synthetic modular link streams with predefined communities. Subsequently, we evaluate established dynamic community detection methods to uncover limitations that may not be evident in snapshots with slowly evolving communities. While no method emerges as a clear winner, we observe notable differences among them.
Submitted 4 October, 2023;
originally announced October 2023.
-
Temporal and Geographical Analysis of Real Economic Activities in the Bitcoin Blockchain
Authors:
Rafael Ramos Tubino,
Remy Cazabet,
Natkamon Tovanich,
Celine Robardet
Abstract:
We study the real economic activity in the Bitcoin blockchain that involves transactions from/to retail users rather than between organizations such as marketplaces, exchanges, or other services. We first introduce a heuristic method to classify Bitcoin players into three main categories: Frequent Receivers (FR), Neighbors of FR, and Others. We show that most real transactions involve Frequent Receivers, representing a small fraction of the total value exchanged according to the blockchain, but a significant fraction of all payments, raising concerns about the centralization of the Bitcoin ecosystem. We also conduct a weekly pattern analysis of activity, providing insights into the geographical location of Bitcoin users and allowing us to quantify the bias of a well-known dataset for actor identification.
Submitted 17 July, 2023;
originally announced July 2023.
-
Structify-Net: Random Graph generation with controlled size and customized structure
Authors:
Remy Cazabet,
Salvatore Citraro,
Giulio Rossetti
Abstract:
Network structure is often considered one of the most important features of a network, and various models exist to generate graphs having one of the most studied types of structures, such as blocks/communities or spatial structures. In this article, we introduce a framework for the generation of random graphs with a controlled size -- number of nodes, edges -- and a customizable structure, beyond blocks and spatial ones, based on node-pair rank and a tunable probability function that controls the amount of randomness. We introduce a structure zoo -- a collection of original network structures -- and conduct experiments on the small-world properties of networks generated by those structures. Finally, we introduce an implementation as a Python library named Structify-net.
Submitted 29 September, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
-
Pattern Analysis of Money Flow in the Bitcoin Blockchain
Authors:
Natkamon Tovanich,
Rémy Cazabet
Abstract:
Bitcoin is the first and highest valued cryptocurrency that stores transactions in a publicly distributed ledger called the blockchain. Understanding the activity and behavior of Bitcoin actors is a crucial research topic as they are pseudonymous in the transaction network. In this article, we propose a method based on taint analysis to extract taint flows -- dynamic networks representing the sequence of Bitcoins transferred from an initial source to other actors until dissolution. Then, we apply graph embedding methods to characterize taint flows. We evaluate our embedding method with taint flows from top mining pools and show that it can classify mining pools with high accuracy. We also found that taint flows from the same period show high similarity. Our work proves that tracing the money flows can be a promising approach to classifying source actors and characterizing different money flow patterns.
Submitted 15 July, 2022;
originally announced July 2022.
-
Δ-Conformity: Multi-scale Node Assortativity in Feature-rich Stream Graphs
Authors:
Salvatore Citraro,
Letizia Milli,
Rémy Cazabet,
Giulio Rossetti
Abstract:
Heterogeneity is a key aspect of complex networks, often emerging by looking at the distribution of node properties, from the milestone observations on the degree to the recent developments in mixing pattern estimation. Mixing patterns, in particular, refer to nodes' connectivity preferences with respect to an attribute label. Social networks are mostly characterized by assortative/homophilic behaviour, where nodes are more likely to be connected with similar ones. Recently, assortative mixing is increasingly measured in a multi-scale fashion to overcome well known limitations of classic scores. Such multi-scale strategies can capture heterogeneous behaviors among node homophily, but they ignore an important, often available, addendum in real-world systems: the time when edges are present and the time-varying paths they form accordingly. Hence, temporal homophily is still little understood in complex networks. In this work we aim to cover this gap by introducing the Δ-Conformity measure, a multiscale, path-aware, node homophily estimator within the new framework of feature-rich stream graphs. A rich experimental section analyzes Δ-Conformity trends over time, spanning the analysis from real-life social interaction networks to a specific case-study about the Bitcoin Transaction Network.
Submitted 30 November, 2021;
originally announced November 2021.
-
Quantitative Evaluation of Snapshot Graphs for the Analysis of Temporal Networks
Authors:
Alessandro Chiappori,
Rémy Cazabet
Abstract:
One of the most common approaches to the analysis of dynamic networks is through time-window aggregation. The resulting representation is a sequence of static networks, i.e. the snapshot graph. Despite this representation being widely used in the literature, a general framework to evaluate the soundness of snapshot graphs is still missing. In this article, we propose two scores to quantify conflicting objectives: Stability measures how stable the sequence of snapshots is, while Fidelity measures the loss of information compared to the original data. We also develop a technique of targeted filtering of the links, to simplify the original temporal network. Our framework is tested on datasets of proximity and face-to-face interactions.
Submitted 5 December, 2021; v1 submitted 26 October, 2021;
originally announced October 2021.
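The time-window aggregation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes interactions are given as `(u, v, t)` tuples with numeric timestamps and undirected links; the function name `aggregate_snapshots` and the input format are hypothetical, and the sketch does not compute the Stability or Fidelity scores, whose definitions are not given here.

```python
from collections import defaultdict


def aggregate_snapshots(interactions, window):
    """Group timestamped links into consecutive time windows.

    interactions: iterable of (u, v, t) tuples (assumed format).
    window: length of each aggregation window, in the same unit as t.
    Returns a dict mapping window index -> set of undirected edges.
    """
    snapshots = defaultdict(set)
    for u, v, t in interactions:
        index = int(t // window)  # window that contains timestamp t
        snapshots[index].add(frozenset((u, v)))  # ignore direction and repeats
    return dict(sorted(snapshots.items()))


# Hypothetical toy data: three contacts, aggregated over windows of length 10.
contacts = [("a", "b", 0), ("b", "c", 5), ("a", "b", 12)]
snapshots = aggregate_snapshots(contacts, window=10)
```

A larger `window` yields fewer, denser snapshots, and a smaller one yields more, sparser snapshots; this is the trade-off the two scores are meant to quantify.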
-
Graph space: using both geometric and probabilistic structure to evaluate statistical graph models
Authors:
Louis Duvivier,
Rémy Cazabet,
Céline Robardet
Abstract:
Statistical graph models aim at modeling graphs as random realizations among a set of possible graphs. One issue is to evaluate whether or not a graph is likely to have been generated by one particular model. In this paper we introduce the edit distance expected value (EDEV) and compare it with other methods such as entropy and distance to the barycenter. We show that contrary to them, EDEV is able to distinguish between graphs that have a typical structure with respect to a model, and those that do not. Finally we introduce a statistical hypothesis testing methodology based on this distance to evaluate the relevance of a candidate model with respect to an observed graph.
Submitted 28 March, 2022; v1 submitted 25 June, 2021;
originally announced June 2021.
-
Graph model selection by edge probability sequential inference
Authors:
Louis Duvivier,
Rémy Cazabet,
Céline Robardet
Abstract:
Graphs are widely used for describing systems made up of many interacting components and for understanding the structure of their interactions. Various statistical models exist, which describe this structure as the result of a combination of constraints and randomness. Model selection techniques need to automatically identify the best model, and the best set of parameters for a given graph. To do so, most authors rely on the minimum description length paradigm, and apply it to graphs by considering the entropy of probability distributions defined on graph ensembles. In this paper, we introduce edge probability sequential inference, a new approach to perform model selection, which relies on probability distributions on edge ensembles. From a theoretical point of view, we show that this methodology provides a more consistent ground for statistical inference with respect to existing techniques, due to the fact that it relies on multiple realizations of the random variable. It also provides better guarantees against overfitting, by making it possible to lower the number of parameters of the model below the number of observations. Experimentally, we illustrate the benefits of this methodology in two situations: to infer the partition of a stochastic blockmodel, and to identify the most relevant model for a given graph between the stochastic blockmodel and the configuration model.
Submitted 25 June, 2021;
originally announced June 2021.
-
Edge based stochastic block model statistical inference
Authors:
Louis Duvivier,
Rémy Cazabet,
Céline Robardet
Abstract:
Community detection in graphs often relies on ad hoc algorithms with no clear specification about the node partition they define as the best, which leads to uninterpretable communities. Stochastic block models (SBM) offer a framework to rigorously define communities, and to detect them using statistical inference methods to distinguish structure from random fluctuations. In this paper, we introduce an alternative definition of SBM based on edge sampling. We derive from this definition a quality function to statistically infer the node partition used to generate a given graph. We then test it on synthetic graphs, and on the Zachary karate club network.
Submitted 25 June, 2021;
originally announced June 2021.
-
Data compression to choose a proper dynamic network representation
Authors:
Remy Cazabet
Abstract:
Dynamic network data are now available in a wide range of contexts and domains. Several representation formalisms exist to represent dynamic networks, but there is no well-known method to choose one representation over another for a given dataset. In this article, we propose a method based on data compression to choose between three of the most important representations: snapshots, link streams and interval graphs. We apply the method on synthetic and real datasets to show the relevance of the method and its possible applications, such as choosing an appropriate representation when confronted with a new dataset, and storing dynamic networks in an efficient manner.
Submitted 14 October, 2020;
originally announced October 2020.
-
Evaluating Community Detection Algorithms for Progressively Evolving Graphs
Authors:
Remy Cazabet,
Souaad Boudebza,
Giulio Rossetti
Abstract:
Many algorithms have been proposed in the last ten years for the discovery of dynamic communities. However, these methods are seldom compared with one another. In this article, we propose a generator of dynamic graphs with planted evolving community structure, as a benchmark to compare and evaluate such algorithms. Unlike previously proposed benchmarks, it can specify any desired evolving community structure through a descriptive language, and then generate the corresponding progressively evolving network. We empirically evaluate six existing algorithms for dynamic community detection in terms of instantaneous and longitudinal similarity with the planted ground truth, smoothness of dynamic partitions, and scalability. We notably observe different types of weaknesses depending on their approach to ensuring smoothness, namely Glitches, Oversimplification and Identity loss. Although no method emerges as a clear winner, we observe clear differences between methods, and we identify the fastest ones as well as those yielding the smoothest or the most accurate solutions at each step.
Submitted 16 July, 2020;
originally announced July 2020.
-
Minimum entropy stochastic block models neglect edge distribution heterogeneity
Authors:
Louis Duvivier,
Rémy Cazabet,
Céline Robardet
Abstract:
The statistical inference of stochastic block models has emerged as a mathematically principled method for identifying communities inside networks. Its objective is to find the node partition and the block-to-block adjacency matrix of maximum likelihood, i.e., the one which has most probably generated the observed network. In practice, in the so-called microcanonical ensemble, it is frequently assumed that when comparing two models which have the same number and sizes of communities, the best one is the one of minimum entropy, i.e., the one which can generate the fewest distinct networks. In this paper, we show that there are situations in which the minimum entropy model does not identify the most significant communities in terms of edge distribution, even though it generates the observed graph with a higher probability.
Submitted 17 October, 2019;
originally announced October 2019.
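As a hedged illustration of the entropy comparison described in the abstract above, the sketch below counts the simple undirected graphs compatible with a given partition and its block-to-block edge counts, and returns the logarithm of that count. The function name and the toy graph are assumptions; the counting formula is the standard one for the non-degree-corrected microcanonical ensemble on simple graphs, and the example does not reproduce the paper's counterexamples.

```python
from itertools import combinations
from math import comb, log


def sbm_microcanonical_entropy(edges, partition):
    """Log of the number of simple undirected graphs with the same block sizes and block edge counts."""
    blocks = sorted(set(partition.values()))
    size = {b: 0 for b in blocks}
    for b in partition.values():
        size[b] += 1

    # Count edges between each unordered pair of blocks (including within a block).
    count = {}
    for u, v in edges:
        key = tuple(sorted((partition[u], partition[v])))
        count[key] = count.get(key, 0) + 1

    entropy = 0.0
    for b in blocks:
        possible = size[b] * (size[b] - 1) // 2  # node pairs inside block b
        entropy += log(comb(possible, count.get((b, b), 0)))
    for r, s in combinations(blocks, 2):
        possible = size[r] * size[s]  # node pairs across blocks r and s
        entropy += log(comb(possible, count.get((r, s), 0)))
    return entropy


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
planted = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
mixed = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}  # same block sizes, different assignment

entropy_planted = sbm_microcanonical_entropy(edges, planted)
entropy_mixed = sbm_microcanonical_entropy(edges, mixed)
print(entropy_planted, entropy_mixed)
```

Both partitions have two blocks of three nodes, so they can be compared directly under the minimum-entropy criterion; on this toy graph the planted partition has the lower entropy.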
-
Challenges in Community Discovery on Temporal Networks
Authors:
Remy Cazabet,
Giulio Rossetti
Abstract:
Community discovery is one of the most studied problems in network science. In recent years, many works have focused on discovering communities in temporal networks, thus identifying dynamic communities. Interestingly, dynamic communities are not mere sequences of static ones; new challenges arise from their dynamic nature. In this chapter, we discuss some of these challenges and recent proposals to tackle them. Among other topics, we discuss the question of community events in gradually evolving networks, the notion of identity through change, dynamic communities in link streams, the smoothness of dynamic communities, and the different types of complexity of algorithms for their discovery.
Submitted 26 July, 2019;
originally announced July 2019.
-
Detecting Stable Communities in Link Streams at Multiple Temporal Scales
Authors:
Souaad Boudebza,
Remy Cazabet,
Omar Nouali,
Faical Azouaou
Abstract:
Link streams model interactions over time in a wide range of fields. Under this model, the challenge is to efficiently mine both temporal and topological structures. Community detection and change point detection are among the most powerful tools to analyze such evolving interactions. In this paper, we build on both to detect stable community structures by identifying change points within meaningful communities. Unlike existing dynamic community detection algorithms, the proposed method is able to discover stable communities efficiently at multiple temporal scales. We test the effectiveness of our method on synthetic networks, and on high-resolution time-varying networks of contacts drawn from real social networks.
Submitted 24 July, 2019;
originally announced July 2019.
-
Systematic Biases in Link Prediction: comparing heuristic and graph embedding based methods
Authors:
Aakash Sinha,
Rémy Cazabet,
Rémi Vaudaine
Abstract:
Link prediction is a popular research topic in network analysis. In the last few years, new techniques based on graph embedding have emerged as a powerful alternative to heuristics. In this article, we study the problem of systematic biases in the prediction, and show that some methods based on graph embedding offer less biased results than those based on heuristics, despite scoring lower on the usual quality measures. We discuss the relevance of this finding in the context of the filter bubble problem and the algorithmic fairness of recommender systems.
Submitted 11 October, 2018;
originally announced November 2018.
-
OLCPM: An Online Framework for Detecting Overlapping Communities in Dynamic Social Networks
Authors:
Souâad Boudebza,
Rémy Cazabet,
Faiçal Azouaou,
Omar Nouali
Abstract:
Community structure is one of the most prominent features of complex networks. Detecting it is of great importance for gaining insight into a network's structure and functionalities. Most proposals focus on static networks. However, finding communities in a dynamic network is even more challenging, especially when communities overlap with each other. In this article, we present an online algorithm, called OLCPM, based on clique percolation and label propagation methods. OLCPM can detect overlapping communities and works on temporal networks with a fine granularity. By locally updating the community structure, OLCPM delivers a significant improvement in running time compared with previous clique percolation techniques. The experimental results on both synthetic and real-world networks illustrate the effectiveness of the method.
Submitted 11 April, 2018;
originally announced April 2018.
-
Tracking bitcoin users activity using community detection on a network of weak signals
Authors:
Remy Cazabet,
Rym Baccour,
Matthieu Latapy
Abstract:
Bitcoin is a cryptocurrency attracting a lot of interest from both the general public and researchers. There is an ongoing debate on the question of users' anonymity: while the Bitcoin protocol has been designed to ensure that the activity of individual users cannot be tracked, some methods have been proposed to partially bypass this limitation. In this article, we show how the Bitcoin transaction network can be studied using complex network analysis techniques, and in particular how community detection can be efficiently used to re-identify multiple addresses belonging to the same user.
Submitted 23 October, 2017;
originally announced October 2017.
-
Community Discovery in Dynamic Networks: a Survey
Authors:
Giulio Rossetti,
Rémy Cazabet
Abstract:
Networks built to model real-world phenomena are characterised by some properties that have attracted the attention of the scientific community: (i) they are organised according to community structure and (ii) their structure evolves with time. Many researchers have worked on methods that can efficiently unveil substructures in complex networks, giving birth to the field of community discovery. A novel and challenging problem has recently started capturing researchers' interest: the identification of evolving communities. To model the evolution of a system, dynamic networks can be used: nodes and edges are mutable and their presence, or absence, deeply impacts the community structure that composes them. The aim of this survey is to present the distinctive features and challenges of dynamic community discovery, and to propose a classification of published approaches. As a "user manual", this work organizes state-of-the-art methodologies into a taxonomy, based on their rationale and their specific instantiation. Given a desired definition of network dynamics, community characteristics and analytical needs, this survey will help researchers identify the set of approaches that best fit their needs. The proposed classification could also help researchers choose the direction in which future research should be oriented.
Submitted 3 September, 2019; v1 submitted 11 July, 2017;
originally announced July 2017.
-
Using multiple-criteria methods to evaluate community partitions
Authors:
Remy Cazabet,
Rathachai Chawuthai,
Hideaki Takeda
Abstract:
Community detection is one of the most studied problems on complex networks. Although hundreds of methods have been proposed so far, there is still no universally accepted formal definition of what a good community is. As a consequence, evaluating and comparing the quality of the solutions produced by these algorithms is still an open question, despite constant progress on the topic. In this article, we investigate how using a multi-criteria evaluation can solve some of the existing problems of community evaluation, in particular the question of multiple equally relevant solutions of different granularity. After exploring several approaches, we introduce a new quality function, called MDensity, and propose a method that can be related both to a widely used community detection metric, the Modularity, and to the Precision/Recall approach, ubiquitous in information retrieval.
Submitted 18 February, 2015;
originally announced February 2015.
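The abstract above relates its proposal to the Modularity. As a point of reference, the sketch below computes the standard Newman modularity of a partition on a toy graph, at two granularities. The function name and the toy graph are assumptions; MDensity itself is not implemented here, since the listing does not define it.

```python
def modularity(edges, partition):
    """Newman modularity of an undirected, unweighted graph: sum over communities of L_c/m - (d_c/2m)^2."""
    m = len(edges)
    internal = {}  # L_c: number of edges with both endpoints in community c
    degree = {}    # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_communities = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_community = {node: "A" for node in range(6)}

q_fine = modularity(edges, two_communities)
q_coarse = modularity(edges, one_community)
print(q_fine, q_coarse)
```

A single score of this kind ranks the two partitions but says nothing about why they differ, which is the limitation a multi-criteria evaluation is meant to address.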