-
Density-aware Walks for Coordinated Campaign Detection
Authors:
Atul Anand Gopalakrishnan,
Jakir Hossain,
Tuğrulcan Elmas,
Ahmet Erdem Sarıyüce
Abstract:
Coordinated campaigns frequently exploit social media platforms by artificially amplifying topics, making inauthentic trends appear organic, and misleading users into engagement. Distinguishing these coordinated efforts from genuine public discourse remains a significant challenge due to the sophisticated nature of such attacks. Our work focuses on detecting coordinated campaigns by modeling the problem as a graph classification task. We leverage the recently introduced Large Engagement Networks (LEN) dataset, which contains over 300 networks capturing engagement patterns from both fake and authentic trends on Twitter prior to the 2023 Turkish elections. The graphs in LEN were constructed by collecting interactions related to campaigns that stemmed from ephemeral astroturfing. Established graph neural networks (GNNs) struggle to accurately classify campaign graphs, highlighting the challenges posed by LEN due to the large size of its networks. To address this, we introduce a new graph classification method that leverages the density of local network structures. We propose a random weighted walk (RWW) approach in which node transitions are biased by local density measures such as degree, core number, or truss number. These RWWs are encoded using the Skip-gram model, producing density-aware structural embeddings for the nodes. Training message-passing neural networks (MPNNs) on these density-aware embeddings yields superior results compared to the simpler node features available in the dataset, with nearly a 12% and 5% improvement in accuracy for binary and multiclass classification, respectively. Our findings demonstrate that incorporating density-aware structural encoding with MPNNs provides a robust framework for identifying coordinated inauthentic behavior on social media networks such as Twitter.
Submitted 16 June, 2025;
originally announced June 2025.
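The abstract above describes walks whose transitions are biased by a local density measure. A minimal sketch of that idea follows, under stated assumptions: the abstract does not give the exact transition rule, so this sketch assumes the probability of stepping to a neighbor is proportional to that neighbor's density score, and it uses degree as the score. The function names `build_adjacency` and `density_biased_walk` are hypothetical, and the Skip-gram encoding step is omitted.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map with sorted neighbor lists."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return {u: sorted(nbrs) for u, nbrs in adj.items()}


def density_biased_walk(adj, start, length, density, rng):
    """Walk of up to `length` nodes; each step picks a neighbor with
    probability proportional to its density score (an assumed rule)."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj[walk[-1]]
        if not nbrs:
            break  # dead end: no neighbor to move to
        weights = [density[v] for v in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Toy graph: a triangle (0, 1, 2) with a path 2-3-4 attached.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = build_adjacency(edges)
degree = {u: len(nbrs) for u, nbrs in adj.items()}  # degree as density score
rng = random.Random(0)
walk = density_biased_walk(adj, start=0, length=6, density=degree, rng=rng)
print(walk)
```

Swapping `degree` for a dictionary of core numbers or per-node truss scores would change the bias without changing the walk routine; how the authors aggregate edge-level truss numbers into node transitions is not stated in the abstract.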
-
Large Engagement Networks for Classifying Coordinated Campaigns and Organic Twitter Trends
Authors:
Atul Anand Gopalakrishnan,
Jakir Hossain,
Tugrulcan Elmas,
Ahmet Erdem Sariyuce
Abstract:
Social media users and inauthentic accounts, such as bots, may coordinate in promoting their topics. Such topics may give the impression that they are organically popular among the public, even though they are astroturfing campaigns that are centrally managed. It is challenging to predict if a topic is organic or a coordinated campaign due to the lack of reliable ground truth. In this paper, we create such ground truth by detecting the campaigns promoted by ephemeral astroturfing attacks. These attacks push any topic to Twitter's (X) trends list by employing bots that tweet in a coordinated manner in a short period and then immediately delete their tweets. We also manually curate a dataset of organic Twitter trends. We then create engagement networks out of these datasets, which can serve as a challenging testbed for the graph classification task of distinguishing campaigns from organic trends. Engagement networks consist of users as nodes and engagements (retweets, replies, and quotes) between users as edges. We release the engagement networks for 179 campaigns and 135 non-campaigns, and also provide finer-grained labels to characterize the types of campaigns and non-campaigns. Our dataset, LEN (Large Engagement Networks), is available at the URL below. In comparison to traditional graph classification datasets, which are small with tens of nodes and hundreds of edges at most, graphs in LEN are larger. The average graph in LEN has ~11K nodes and ~23K edges. We show that state-of-the-art GNN methods give only mediocre results for campaign vs. non-campaign and campaign type classification on LEN. LEN offers a unique and challenging playfield for the graph classification problem. We believe that LEN will help advance the frontiers of graph classification techniques on large networks and also provide an interesting use case in terms of distinguishing coordinated campaigns and organic trends.
Submitted 28 March, 2025; v1 submitted 1 March, 2025;
originally announced March 2025.
-
Retrieving Top-k Hyperedge Triplets: Models and Applications
Authors:
Jason Niu,
Ilya D. Amburg,
Sinan G. Aksoy,
Ahmet Erdem Sarıyüce
Abstract:
Complex systems frequently exhibit multi-way, rather than pairwise, interactions. These group interactions cannot be faithfully modeled as collections of pairwise interactions using graphs and instead require hypergraphs. However, methods that analyze hypergraphs directly, rather than via lossy graph reductions, remain limited. Hypergraph motifs hold promise in this regard, as motif patterns serve as building blocks for larger group interactions which are inexpressible by graphs. Recent work has focused on categorizing and counting hypergraph motifs based on the existence of nodes in hyperedge intersection regions. Here, we argue that the relative sizes of hyperedge intersections within motifs contain varied and valuable information. We propose a suite of efficient algorithms for finding top-k triplets of hyperedges based on optimizing the sizes of these intersection patterns. This formulation uncovers interesting local patterns of interaction, finding hyperedge triplets that either (1) are the least similar with each other, (2) have the highest pairwise but not groupwise correlation, or (3) are the most similar with each other. We formalize this as a combinatorial optimization problem and design efficient algorithms based on filtering hyperedges. Our comprehensive experimental evaluation shows that the resulting hyperedge triplets yield insightful information on real-world hypergraphs. Our approach is also orders of magnitude faster than a naive baseline implementation.
Submitted 21 November, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Quantifying Node-based Core Resilience
Authors:
Jakir Hossain,
Sucheta Soundarajan,
Ahmet Erdem Sarıyüce
Abstract:
Core decomposition is an efficient building block for various graph analysis tasks such as dense subgraph discovery and identifying influential nodes. One crucial weakness of the core decomposition is its sensitivity to changes in the graph: inserting or removing a few edges can drastically change the core structure of a graph. Hence, it is essential to characterize, quantify, and, if possible, improve the resilience of the core structure of a given graph at global and local levels. Previous works mostly considered the core resilience of the entire graph or important subgraphs in it. In this work, we study node-based core resilience measures upon edge removals and insertions. We first show that a previously proposed measure, Core Strength, does not correctly capture the core resilience of a node upon edge removals. Next, we introduce the concept of a dependency graph to capture the impact of neighbor nodes (for edge removal) and probable future neighbor nodes (for edge insertion) on the core number of a given node. Accordingly, we define Removal Strength and Insertion Strength measures to capture the resilience of an individual node upon removing and inserting an edge, respectively. As naive computation of those measures is costly, we provide efficient heuristics built on key observations about the core structure. We consider two key applications, finding critical edges and identifying influential spreaders, to demonstrate the usefulness of our new measures on various real-world networks and against several baselines. We also show that our heuristic algorithms are more efficient than the naive approaches.
Submitted 21 June, 2023;
originally announced June 2023.
-
Using Motif Transitions for Temporal Graph Generation
Authors:
Penghang Liu,
A. Erdem Sarıyüce
Abstract:
Graph generative models are highly important for sharing surrogate data and benchmarking purposes. Real-world complex systems often exhibit a dynamic nature, where the interactions among nodes change over time in the form of a temporal network. Most temporal network generation models extend the static graph generation models by incorporating temporality in the generation process. More recently, temporal motifs have been used to generate temporal networks with better success. However, existing models are often restricted to a small set of predefined motif patterns due to the high computational cost of counting temporal motifs. In this work, we develop a practical temporal graph generator, Motif Transition Model (MTM), to generate synthetic temporal networks with realistic global and local features. Our key idea is modeling the arrival of new events as temporal motif transition processes. We first calculate the transition properties from the input graph and then simulate the motif transition processes based on the transition probabilities and transition rates. We demonstrate that our model consistently outperforms the baselines with respect to preserving various global and local temporal graph statistics and runtime performance.
Submitted 19 June, 2023;
originally announced June 2023.
-
Temporal Motifs for Financial Networks: A Study on Mercari, JPMC, and Venmo Platforms
Authors:
Penghang Liu,
Bahadir Altun,
Rupam Acharyya,
Robert E. Tillman,
Shunya Kimura,
Naoki Masuda,
Ahmet Erdem Sarıyüce
Abstract:
Understanding the dynamics of financial transactions among people is critical for various applications such as fraud detection. One important aspect of financial transaction networks is temporality. The order and repetition of transactions can offer new insights when considered within the graph structure. Temporal motifs, defined as a set of nodes that interact with each other in a short time period, are a promising tool in this context. In this work, we study three unique temporal financial networks: transactions in Mercari, an online marketplace, payments in a synthetic network generated by J.P. Morgan Chase, and payments and friendships among Venmo users. We consider the fraud detection problem on the Mercari and J.P. Morgan Chase networks, for which the ground truth is available. We show that temporal motifs offer superior performance to several baselines, including a previous method that considers simple graph features and two node embedding techniques (LINE and node2vec), while being practical in terms of runtime performance. For the Venmo network, we investigate the interplay between financial and social relations on three tasks: friendship prediction, vendor identification, and analysis of temporal cycles. For friendship prediction, temporal motifs yield better results than general heuristics, such as Jaccard and Adamic-Adar measures. We are also able to identify vendors with high accuracy and observe interesting patterns in rare motifs, such as temporal cycles. We believe that the analysis, datasets, and lessons from this work will be beneficial for future research on financial transaction networks.
Submitted 10 July, 2025; v1 submitted 18 January, 2023;
originally announced January 2023.
-
Temporal Motifs in Patent Opposition and Collaboration Networks
Authors:
Penghang Liu,
Naoki Masuda,
Tomomi Kito,
A. Erdem Sarıyüce
Abstract:
Patents are intellectual properties that reflect innovative activities of companies and organizations. The literature is rich with studies that analyze the citations among patents and the collaboration relations among the companies that own them. However, the adversarial relations between patent owners are not as well investigated. One proxy to model such relations is the patent opposition, which is a legal activity in which a company challenges the validity of a patent. Characterizing the patent oppositions, collaborations, and the interplay between them can help better understand the companies' business strategies. Temporality matters in this context as the order and frequency of oppositions and collaborations characterize their interplay. In this study, we construct a two-layer temporal network to model the patent oppositions and collaborations among the companies. We utilize temporal motifs to analyze the oppositions and collaborations from structural and temporal perspectives. We first characterize the frequent motifs in patent oppositions and investigate how often companies of different sizes attack other companies. We show that large companies tend to engage in opposition with multiple companies. Then we analyze the temporal interplay between collaborations and oppositions. We find that two adversarial companies are more likely to collaborate in the future than two collaborating companies are to oppose each other in the future.
Submitted 4 February, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Motif-driven Dense Subgraph Discovery in Directed and Labeled Networks
Authors:
Ahmet Erdem Sariyuce
Abstract:
Dense regions in networks are an indicator of interesting and unusual information. However, most existing methods only consider simple, undirected, unweighted networks. Complex networks in the real world, however, often carry rich information: edges are asymmetrical and nodes/edges have categorical and numerical attributes. Finding dense subgraphs in such networks in accordance with this rich information is an important problem with many applications. Furthermore, most existing algorithms ignore the higher-order relationships (i.e., motifs) among the nodes. Motifs are shown to be helpful for dense subgraph discovery but their wide spectrum in heterogeneous networks makes it challenging to utilize them effectively. In this work, we propose the quark decomposition framework to locate dense subgraphs that are rich with a given motif. We focus on networks with directed edges and categorical attributes on nodes/edges. For a given motif, our framework builds subgraphs, called quarks, in varying quality and with hierarchical relations. Our framework is versatile, efficient, and extensible. We discuss the limitations and practical instantiations of our framework as well as the role confusion problem that needs to be considered in directed networks. We give an extensive evaluation of our framework in directed, signed-directed, and node-labeled networks. We consider various motifs and evaluate the quark decomposition using several real-world networks. Results show that quark decomposition performs better than the state-of-the-art techniques. Our framework is also practical and scalable to networks with up to 101M edges.
Submitted 4 March, 2021;
originally announced March 2021.
-
Characterizing and Utilizing the Interplay Between Core and Truss Decompositions
Authors:
Penghang Liu,
A. Erdem Sarıyüce
Abstract:
Finding the dense regions in a graph is an important problem in network analysis. Core decomposition and truss decomposition address this problem from two different perspectives. The former is a vertex-driven approach that assigns density indicators to vertices whereas the latter is an edge-driven technique that puts density quantifiers on edges. Despite the algorithmic similarity between these two approaches, it is not clear how core and truss decompositions in a network are related. In this work, we introduce the vertex interplay (VI) and edge interplay (EI) plots to characterize the interplay between core and truss decompositions. Based on our observations, we devise CORE-TRUSSDD, an anomaly detection algorithm to identify the discrepancies between core and truss decompositions. We analyze a large and diverse set of real-world networks, and demonstrate how our approaches can be effective tools to characterize the patterns and anomalies in the networks. Through VI and EI plots, we observe distinct behaviors for graphs from different domains, and identify two anomalous behaviors driven by specific real-world structures. Our algorithm provides an efficient solution to retrieve the outliers in the networks, which correspond to the two anomalous behaviors. We believe that investigating the interplay between core and truss decompositions is important and can yield surprising insights regarding the dense subgraph structure of real-world networks.
Submitted 2 November, 2020;
originally announced November 2020.
-
Temporal Network Motifs: Models, Limitations, Evaluation
Authors:
Penghang Liu,
Valerio Guarrasi,
A. Erdem Sarıyüce
Abstract:
Investigating the frequency and distribution of small subgraphs with a few nodes/edges, i.e., motifs, is an effective analysis method for static networks. Motif-driven analysis is also useful for temporal networks where the spectrum of motifs is significantly larger due to the additional temporal information on edges. This variety makes it challenging to design a temporal motif model that can consider all aspects of temporality. In the literature, previous works have introduced various models that handle different characteristics. In this work, we compare the existing temporal motif models and evaluate the facets of temporal networks that are overlooked in the literature. We first survey four temporal motif models and highlight their differences. Then, we evaluate the advantages and limitations of these models with respect to the temporal inducedness and timing constraints. In addition, we suggest a new lens, event pairs, to investigate temporal correlations. We believe that our comparative survey and extensive evaluation will catalyze the research on temporal network motif models.
Submitted 2 May, 2021; v1 submitted 24 May, 2020;
originally announced May 2020.
-
FLEET: Butterfly Estimation from a Bipartite Graph Stream
Authors:
Seyed-Vahid Sanei-Mehri,
Yu Zhang,
Ahmet Erdem Sariyuce,
Srikanta Tirthapura
Abstract:
We consider space-efficient single-pass estimation of the number of butterflies, a fundamental bipartite graph motif, from a massive bipartite graph stream where each edge represents a connection between entities in two different partitions. We present a space lower bound for any streaming algorithm that can estimate the number of butterflies accurately, as well as FLEET, a suite of algorithms for accurately estimating the number of butterflies in the graph stream. Estimates returned by the algorithms come with provable guarantees on the approximation error, and experiments show good tradeoffs between the space used and the accuracy of approximation. We also present space-efficient algorithms for estimating the number of butterflies within a sliding window of the most recent elements in the stream. While there is a significant body of work on counting subgraphs such as triangles in a unipartite graph stream, our work seems to be one of the few to tackle the case of bipartite graph streams.
Submitted 28 August, 2019; v1 submitted 8 December, 2018;
originally announced December 2018.
-
Butterfly Counting in Bipartite Networks
Authors:
Seyed-Vahid Sanei-Mehri,
Ahmet Erdem Sariyuce,
Srikanta Tirthapura
Abstract:
We consider the problem of counting motifs in bipartite affiliation networks, such as author-paper, user-product, and actor-movie relations. We focus on counting the number of occurrences of a "butterfly", a complete $2 \times 2$ biclique, the simplest cohesive higher-order structure in a bipartite graph. Our main contribution is a suite of randomized algorithms that can quickly approximate the number of butterflies in a graph with a provable guarantee on accuracy. An experimental evaluation on large real-world networks shows that our algorithms return accurate estimates within a few seconds, even for networks with trillions of butterflies and hundreds of millions of edges.
Submitted 15 March, 2018; v1 submitted 31 December, 2017;
originally announced January 2018.
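To make the butterfly motif concrete, here is a small exact counter. It is only an illustrative baseline and not the randomized approximation algorithms the abstract refers to. It relies on a standard identity: two left-side vertices with c common right-side neighbors take part in c(c-1)/2 butterflies together. The function name `count_butterflies` and the (left, right) edge layout are assumptions made for this sketch.

```python
from collections import Counter
from itertools import combinations


def count_butterflies(edges):
    """Exactly count 2x2 bicliques in a bipartite graph.

    `edges` is an iterable of (left, right) pairs; duplicates are ignored.
    """
    right_to_left = {}
    for u, v in set(edges):
        right_to_left.setdefault(v, set()).add(u)

    # For every pair of left vertices, count their common right neighbors.
    common = Counter()
    for lefts in right_to_left.values():
        for a, b in combinations(sorted(lefts), 2):
            common[(a, b)] += 1

    # Each pair with c common neighbors closes c choose 2 butterflies.
    return sum(c * (c - 1) // 2 for c in common.values())


# Author-paper toy example: a and b co-wrote papers x and y.
square = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(count_butterflies(square))
```

Enumerating all left-vertex pairs per right vertex is quadratic in the degree of that vertex, which is the kind of cost that motivates sampling-based estimators on large graphs.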
-
Active Betweenness Cardinality: Algorithms and Applications
Authors:
Yusuf Ozkaya,
A. Erdem Sariyuce,
Umit V. Catalyurek,
Ali Pinar
Abstract:
Centrality rankings such as degree, closeness, betweenness, Katz, PageRank, etc. are commonly used to identify critical nodes in a graph. These methods are based on two assumptions that restrict their wider applicability. First, they assume the exact topology of the network is available. Secondly, they do not take into account the activity over the network and only rely on its topology. However, in many applications, the network is autonomous, vast, and distributed, and it is hard to collect the exact topology. At the same time, the underlying pairwise activity between node pairs is not uniform and node criticality strongly depends on the activity on the underlying network.
In this paper, we propose active betweenness cardinality as a new measure, where the node criticalities are based not on the static structure but on the activity of the network. We show how this metric can be computed efficiently by using only local information for a given node and how we can find the most critical nodes starting from only a few nodes. We also show how this metric can be used to monitor a network and identify failed nodes. We present experimental results to show effectiveness by demonstrating how the failed nodes can be identified by measuring active betweenness cardinality of a few nodes in the system.
Submitted 28 November, 2017;
originally announced November 2017.
-
Local Algorithms for Hierarchical Dense Subgraph Discovery
Authors:
Ahmet Erdem Sariyuce,
C. Seshadhri,
Ali Pinar
Abstract:
Finding the dense regions of a graph and relations among them is a fundamental problem in network analysis. Core and truss decompositions reveal dense subgraphs with hierarchical relations. The incremental nature of algorithms for computing these decompositions and the need for global information at each step of the algorithm hinder scalable parallelization and approximations since the densest regions are not revealed until the end. In a previous work, Lu et al. proposed to iteratively compute the $h$-indices of neighbor vertex degrees to obtain the core numbers and proved that convergence is obtained after a finite number of iterations. This work generalizes the iterative $h$-index computation for truss decomposition as well as nucleus decomposition, which leverages higher-order structures to generalize core and truss decompositions. In addition, we prove convergence bounds on the number of iterations. We present a framework of local algorithms to obtain the core, truss, and nucleus decompositions. Our algorithms are local, parallel, offer high scalability, and enable approximations to explore time and quality trade-offs. Our shared-memory implementation verifies the efficiency, scalability, and effectiveness of our local algorithms on real-world networks.
Submitted 14 September, 2018; v1 submitted 2 April, 2017;
originally announced April 2017.
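The iterative $h$-index idea attributed to Lu et al. in the abstract above can be sketched for the core-number case as follows. This is a minimal synchronous version for illustration only: every vertex starts at its degree and repeatedly replaces its value with the $h$-index of its neighbors' current values until nothing changes. The function names are hypothetical, and the truss and nucleus generalizations, parallelism, and convergence bounds from the paper are not shown.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for i, v in enumerate(sorted(values, reverse=True), start=1):
        if v >= i:
            h = i
        else:
            break
    return h


def core_numbers_by_h_index(adj):
    """Iterate the h-index update from degrees until a fixed point.

    `adj` maps each vertex to a list of its neighbors (undirected graph).
    """
    tau = {u: len(nbrs) for u, nbrs in adj.items()}  # start from degrees
    changed = True
    while changed:
        changed = False
        new_tau = {}
        for u, nbrs in adj.items():
            # Each update reads only the neighbors' current values.
            new_tau[u] = h_index([tau[v] for v in nbrs])
            if new_tau[u] != tau[u]:
                changed = True
        tau = new_tau
    return tau


# Triangle (0, 1, 2) with a pendant vertex 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(core_numbers_by_h_index(adj))
```

Because each update uses only a vertex's immediate neighborhood, the inner loop can in principle be run in parallel or stopped early for an approximate answer, which is the trade-off the abstract highlights.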
-
Peeling Bipartite Networks for Dense Subgraph Discovery
Authors:
A. Erdem Sariyuce,
Ali Pinar
Abstract:
Finding dense bipartite subgraphs and detecting the relations among them is an important problem for affiliation networks that arise in a range of domains, such as social network analysis, word-document clustering, the science of science, internet advertising, and bioinformatics. However, most dense subgraph discovery algorithms are designed for classic, unipartite graphs. Subsequently, studies on affiliation networks are conducted on the co-occurrence graphs (e.g., co-author and co-purchase) that project the bipartite structure to a unipartite structure by connecting two entities if they share an affiliation. Despite their convenience, co-occurrence networks come at a cost of loss of information and an explosion in graph sizes, which limit the quality and the efficiency of solutions. We study the dense subgraph discovery problem on bipartite graphs. We define a framework of bipartite subgraphs based on the butterfly motif (2,2-biclique) to model the dense regions in a hierarchical structure. We introduce efficient peeling algorithms to find the dense subgraphs and build relations among them. We can identify denser structures compared to the state-of-the-art algorithms on co-occurrence graphs in real-world data. Our analyses on an author-paper network and a user-product network yield interesting subgraphs and hierarchical relations such as the groups of collaborators in the same institution and spammers that give fake ratings.
Submitted 27 November, 2017; v1 submitted 8 November, 2016;
originally announced November 2016.
-
Fast Hierarchy Construction for Dense Subgraphs
Authors:
A. Erdem Sariyuce,
Ali Pinar
Abstract:
Discovering dense subgraphs and understanding the relations among them is a fundamental problem in graph mining. We want to not only identify dense subgraphs, but also build a hierarchy among them (e.g., larger but sparser subgraphs formed by two smaller dense subgraphs). Peeling algorithms (k-core, k-truss, and nucleus decomposition) have been effective in locating many dense subgraphs. However, constructing a hierarchical representation of the density structure, and even correctly computing the connected k-cores and k-trusses, has been mostly overlooked. Keeping track of connected components during peeling requires an additional traversal operation, which is as expensive as the peeling process. In this paper, we start with a thorough survey and point to nuances in problem formulations that lead to significant differences in runtimes. We then propose efficient and generic algorithms to construct the hierarchy of dense subgraphs for k-core, k-truss, or any nucleus decomposition. Our algorithms leverage the disjoint-set forest data structure to efficiently construct the hierarchy during traversal. Furthermore, we introduce a new idea to avoid traversal. We construct the subgraphs while visiting neighborhoods in the peeling process, and build the relations to previously constructed subgraphs. We also consider an existing idea to find the k-core hierarchy and adapt it efficiently for our objectives. Experiments on different types of large-scale real-world networks show significant speedups over naive algorithms and existing alternatives. Our algorithms also outperform the hypothetical limits of any possible traversal-based solution.
Submitted 17 October, 2016; v1 submitted 6 October, 2016;
originally announced October 2016.
-
Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions
Authors:
Ahmet Erdem Sariyuce,
C. Seshadhri,
Ali Pinar,
Umit V. Catalyurek
Abstract:
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization, to name a few. Yet most standard formulations of this problem (like clique, quasiclique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the "true optimum", but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. Current dense subgraph finding algorithms usually optimize some objective, and only find a few such subgraphs without providing any structural relations. We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which enables discovering overlapping dense subgraphs. With the right parameters, the nucleus decomposition generalizes the classic notions of k-core and k-truss decompositions. We give provably efficient algorithms for nucleus decompositions, and empirically evaluate their behavior in a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures, and outputs dense subgraphs of higher quality than other state-of-the-art solutions. Our algorithm can process graphs with tens of millions of edges in less than an hour.
Submitted 9 March, 2015; v1 submitted 12 November, 2014;
originally announced November 2014.
-
On Distributed Graph Coloring with Iterative Recoloring
Authors:
Ahmet Erdem Sarıyüce,
Erik Saule,
Ümit V. Çatalyürek
Abstract:
Identifying the sets of operations that can be executed simultaneously is an important problem appearing in many parallel applications. By modeling the operations and their interactions as a graph, one can identify the independent operations by solving a graph coloring problem. Many efficient sequential algorithms are known for this NP-Complete problem, but they are typically unsuitable when the operations and their interactions are distributed in the memory of large parallel computers. On top of an existing distributed-memory graph coloring algorithm, we investigate two compatible techniques in this paper for fast and scalable distributed-memory graph coloring. First, we introduce an improvement for the distributed post-processing operation, called recoloring, which drastically improves the number of colors. We propose a novel and efficient communication scheme for recoloring which enables it to scale gracefully. Recoloring must be seeded with an existing coloring of the graph. Our second contribution is to introduce a randomized color selection strategy for initial coloring which quickly produces solutions of modest quality. We extensively evaluate the impact of our new techniques on existing distributed algorithms and show the time-quality tradeoffs. We show that combining an initial randomized coloring with multiple recoloring iterations yields better quality solutions with a smaller runtime at large scale.
Submitted 24 July, 2014;
originally announced July 2014.
-
Incremental Algorithms for Network Management and Analysis based on Closeness Centrality
Authors:
Ahmet Erdem Sariyuce,
Kamer Kaya,
Erik Saule,
Umit V. Catalyurek
Abstract:
Analyzing networks requires complex algorithms to extract meaningful information. Centrality metrics have been shown to be correlated with the importance and loads of the nodes in network traffic. Here, we are interested in the problem of centrality-based network management. The problem has many applications such as verifying the robustness of the networks and controlling or improving entity dissemination. It can be defined as finding a small set of topological network modifications which yield a desired closeness centrality configuration. As a fundamental building block to tackle that problem, we propose incremental algorithms which efficiently update the closeness centrality values upon changes in network topology, i.e., edge insertions and deletions. Our algorithms are proven to be efficient on many real-life networks, especially on small-world networks, which have a small diameter and a spike-shaped shortest distance distribution. In addition to closeness centrality, they can also be valuable tools for the shortest-path-based management and analysis of the networks. We experimentally validate the efficiency of our algorithms on large networks and show that they update the closeness centrality values of the temporal DBLP-coauthorship network of 1.2 million users 460 times faster than it would take to compute them from scratch. To the best of our knowledge, this is the first work which can yield practical large-scale network management based on closeness centrality values.
Submitted 2 March, 2013;
originally announced March 2013.
-
Shattering and Compressing Networks for Centrality Analysis
Authors:
Ahmet Erdem Sarıyüce,
Erik Saule,
Kamer Kaya,
Ümit V. Çatalyürek
Abstract:
Who is more important in a network? Who controls the flow between the nodes or whose contribution is significant for connections? Centrality metrics play an important role in answering these questions. The betweenness metric is useful for network analysis and is implemented in various tools. Since it is one of the most computationally expensive kernels in graph mining, several techniques have been proposed for fast computation of betweenness centrality. In this work, we propose and investigate techniques which compress a network and shatter it into pieces so that the rest of the computation can be handled independently for each piece. Although we designed and tuned the shattering process for betweenness, it can be adapted for other centrality metrics in a straightforward manner. Experimental results show that the proposed techniques can be valuable tools for reducing the centrality computation time for various types of networks.
Submitted 26 September, 2012;
originally announced September 2012.