Algorithms for Large-scale Network Analysis and the NetworKit Toolkit

Authors: Eugenio Angriman, Alexander van der Grinten, Michael Hamann, Henning Meyerhenke, Manuel Penschuck

Abstract: The abundance of massive network data in a plethora of applications makes scalable analysis algorithms and software tools necessary to generate knowledge from such data in reasonable time. Addressing scalability as well as other requirements such as good usability and a rich feature set, the open-source software NetworKit has established itself as a popular tool for large-scale network analysis. This chapter provides a brief overview of the contributions to NetworKit made by the DFG Priority Programme SPP 1736 Algorithms for Big Data. Algorithmic contributions in the areas of centrality computations, community detection, and sparsification are in the focus, but we also mention several other aspects -- such as current software engineering principles of the project and ways to visualize network data within a NetworKit-based workflow.

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2112.12704 [pdf, other]

Deterministic Parallel Hypergraph Partitioning

Authors: Lars Gottesbüren, Michael Hamann

Abstract: Balanced hypergraph partitioning is a classical NP-hard optimization problem with applications in various domains such as VLSI design, simulating quantum circuits, optimizing data placement in distributed databases or minimizing communication volume in high-performance computing. Engineering parallel heuristics for this problem is a topic of recent research. Most of them are non-deterministic though. In this work, we design and implement a highly scalable deterministic algorithm in the state-of-the-art parallel partitioning framework Mt-KaHyPar. On our extensive set of benchmark instances, it achieves similar partition quality and performance as a comparable but non-deterministic configuration of Mt-KaHyPar and outperforms the only other existing parallel deterministic algorithm BiPart regarding partition quality, running time and parallel speedups.

Submitted 23 December, 2021; originally announced December 2021.

arXiv:2008.10538 [pdf, other]

Towards Flexible Security Testing of OT Devices

Authors: Florian Wilkens, Samuel Botzler, Julia Curts, Skadi Dinter, Malte Hamann, Vincent Hubbe, Aleksandra Kornivetc, Nurefsan Sertbas, Mathias Fischer

Abstract: In the factory of the future traditional and formerly isolated Operational Technology (OT) hardware will become connected with all kinds of networks. This leads to more complex security challenges during design, deployment and use of industrial control systems. As it is infeasible to perform security tests on production hardware and it is expensive to build hardware setups dedicated to security testing, virtualised testbeds are gaining interest. We create a testbed based on a virtualised factory which can be controlled by real and virtualised hardware. This allows for a flexible evaluation of security strategies.

Submitted 24 August, 2020; originally announced August 2020.

arXiv:2003.14317 [pdf, other]

Engineering Exact Quasi-Threshold Editing

Authors: Lars Gottesbüren, Michael Hamann, Philipp Schoch, Ben Strasser, Dorothea Wagner, Sven Zühlsdorf

Abstract: Quasi-threshold graphs are $\{C_4, P_4\}$-free graphs, i.e., they do not contain any cycle or path of four nodes as an induced subgraph. We study the $\{C_4, P_4\}$-free editing problem, which is the problem of finding a minimum number of edge insertions or deletions to transform an input graph into a quasi-threshold graph. This problem is NP-hard but fixed-parameter tractable (FPT) in the number of edits by using a branch-and-bound algorithm and admits a simple integer linear programming formulation (ILP). Both methods are also applicable to the general $F$-free editing problem for any finite set of graphs $F$. For the FPT algorithm, we introduce a fast heuristic for computing high-quality lower bounds and an improved branching strategy. For the ILP, we engineer several variants of row generation. We evaluate both methods for quasi-threshold editing on a large set of protein similarity graphs. For most instances, our optimizations speed up the FPT algorithm by one to three orders of magnitude. The running time of the ILP, that we solve using Gurobi, becomes only slightly faster. With all optimizations, the FPT algorithm is slightly faster than the ILP, even when listing all solutions. Additionally, we show that for almost all graphs, solutions of the previously proposed quasi-threshold editing heuristic QTM are close to optimal.

Submitted 31 March, 2020; originally announced March 2020.

Comments: 22 pages, 8 figures, to appear at SEA 2020
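
The definition in the abstract above (no induced cycle or path on four nodes) can be checked directly on small graphs. The following is a minimal brute-force sketch for illustration only; it is not the FPT or ILP method of the paper, and the function name `is_quasi_threshold` is hypothetical. It relies on the fact that, among graphs on four nodes, only P4 has the degree sequence 1, 1, 2, 2 and only C4 has 2, 2, 2, 2.

```python
from itertools import combinations


def is_quasi_threshold(nodes, edges):
    """Brute-force check that no 4-node subset induces a P4 or a C4.

    Runs in O(n^4) time, so it is only meant for tiny example graphs.
    """
    nodes = list(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for quad in combinations(nodes, 4):
        members = set(quad)
        # Degrees of the four nodes inside the induced subgraph.
        degs = sorted(len(adj[v] & members) for v in quad)
        if degs == [1, 1, 2, 2] or degs == [2, 2, 2, 2]:
            return False  # found an induced P4 or C4
    return True
```

For example, a star is quasi-threshold, while a path on four nodes is not; deleting one edge of that path is enough to remove the obstruction.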

arXiv:2003.12110 [pdf, other]

Advanced Flow-Based Multilevel Hypergraph Partitioning

Authors: Lars Gottesbüren, Michael Hamann, Sebastian Schlag, Dorothea Wagner

Abstract: The balanced hypergraph partitioning problem is to partition a hypergraph into $k$ disjoint blocks of bounded size such that the sum of the number of blocks connected by each hyperedge is minimized. We present an improvement to the flow-based refinement framework of KaHyPar-MF, the current state-of-the-art multilevel $k$-way hypergraph partitioning algorithm for high-quality solutions. Our improvement is based on the recently proposed HyperFlowCutter algorithm for computing bipartitions of unweighted hypergraphs by solving a sequence of incremental maximum flow problems. Since vertices and hyperedges are aggregated during the coarsening phase, refinement algorithms employed in the multilevel setting must be able to handle both weighted hyperedges and weighted vertices -- even if the initial input hypergraph is unweighted. We therefore enhance HyperFlowCutter to handle weighted instances and propose a technique for computing maximum flows directly on weighted hypergraphs. We compare the performance of two configurations of our new algorithm with KaHyPar-MF and seven other partitioning algorithms on a comprehensive benchmark set with instances from application areas such as VLSI design, scientific computing, and SAT solving. Our first configuration, KaHyPar-HFC, computes slightly better solutions than KaHyPar-MF using significantly less running time. The second configuration, KaHyPar-HFC*, computes solutions of significantly better quality and is still slightly faster than KaHyPar-MF. Furthermore, in terms of solution quality, both configurations also outperform all other competing partitioners.

Submitted 26 March, 2020; originally announced March 2020.

Comments: To appear at SEA'20
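
The objective stated in the abstract above, the sum over all hyperedges of the number of blocks each hyperedge connects, is easy to evaluate for a given partition. The sketch below is an illustration under assumptions: the function name, the `weights` argument and the `minus_one` flag are hypothetical, and the flag only switches to the variant that counts blocks beyond the first, which differs from the plain sum by a constant on unweighted inputs.

```python
def connectivity_objective(hyperedges, block_of, weights=None, minus_one=False):
    """Sum, over all hyperedges, of the number of blocks the hyperedge touches.

    hyperedges: list of vertex lists (the pins of each hyperedge)
    block_of:   mapping vertex -> block id
    weights:    optional list of hyperedge weights (defaults to 1 each)
    minus_one:  if True, count (blocks touched - 1) per hyperedge instead
    """
    total = 0
    for i, pins in enumerate(hyperedges):
        blocks_touched = len({block_of[v] for v in pins})
        w = 1 if weights is None else weights[i]
        total += w * (blocks_touched - 1 if minus_one else blocks_touched)
    return total
```

With hyperedges `[[0, 1, 2], [2, 3], [0, 3]]` and blocks `{0: 0, 1: 0, 2: 1, 3: 1}`, the hyperedges touch 2, 1 and 2 blocks respectively, so the plain sum is 5.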

arXiv:2003.00736 [pdf, other]

Recent Advances in Scalable Network Generation

Authors: Manuel Penschuck, Ulrik Brandes, Michael Hamann, Sebastian Lamm, Ulrich Meyer, Ilya Safro, Peter Sanders, Christian Schulz

Abstract: Random graph models are frequently used as a controllable and versatile data source for experimental campaigns in various research fields. Generating such data-sets at scale is a non-trivial task as it requires design decisions typically spanning multiple areas of expertise. Challenges begin with the identification of relevant domain-specific network features, continue with the question of how to compile such features into a tractable model, and culminate in algorithmic details arising while implementing the pertaining model. In the present survey, we explore crucial aspects of random graph models with known scalable generators. We begin by briefly introducing network features considered by such models, and then discuss random graphs alongside with generation algorithms. Our focus lies on modelling techniques and algorithmic primitives that have proven successful in obtaining massive graphs. We consider concepts and graph models for various domains (such as social network, infrastructure, ecology, and numerical simulations), and discuss generators for different models of computation (including shared-memory parallelism, massive-parallel GPUs, and distributed systems).

Submitted 2 March, 2020; originally announced March 2020.

arXiv:1907.02053 [pdf, other]

Evaluation of a Flow-Based Hypergraph Bipartitioning Algorithm

Authors: Lars Gottesbüren, Michael Hamann, Dorothea Wagner

Abstract: In this paper, we propose HyperFlowCutter, an algorithm for balanced hypergraph bipartitioning. It is based on minimum S-T hyperedge cuts and maximum flows. It computes a sequence of bipartitions that optimize cut size and balance in the Pareto sense, being able to trade one for the other. HyperFlowCutter builds on the FlowCutter algorithm for partitioning graphs. We propose additional features, such as handling disconnected hypergraphs, novel methods for obtaining starting S,T pairs as well as an approach to refine a given partition with HyperFlowCutter. Our main contribution is ReBaHFC, a new algorithm which obtains an initial partition with the fast multilevel hypergraph partitioner PaToH and then improves it using HyperFlowCutter as a refinement algorithm. ReBaHFC is able to significantly improve the solution quality of PaToH at little additional running time. The solution quality is only marginally worse than that of the best-performing hypergraph partitioners KaHyPar and hMETIS, while being one order of magnitude faster. Thus ReBaHFC offers a new time-quality trade-off in the current spectrum of hypergraph partitioners. For the special case of perfectly balanced bipartitioning, only the much slower plain HyperFlowCutter yields slightly better solutions than ReBaHFC, while only PaToH is faster than ReBaHFC.

Submitted 3 July, 2019; originally announced July 2019.

Comments: 22 pages, 7 figures, 2 tables. Full version of the paper appearing in the Proceedings of the 27th Annual European Symposium on Algorithms (ESA 2019)

arXiv:1906.11811 [pdf, other]

doi 10.3390/a12090196

Faster and Better Nested Dissection Orders for Customizable Contraction Hierarchies

Authors: Lars Gottesbüren, Michael Hamann, Tim Niklas Uhl, Dorothea Wagner

Abstract: Graph partitioning has many applications. We consider the acceleration of shortest path queries in road networks using Customizable Contraction Hierarchies (CCH). It is based on computing a nested dissection order by recursively dividing the road network into parts. Recently, with FlowCutter and Inertial Flow, two flow-based graph bipartitioning algorithms have been proposed for road networks. While FlowCutter achieves high-quality results and thus fast query times, it is rather slow. Inertial Flow is particularly fast due to the use of geographical information while still achieving decent query times. We combine the techniques of both algorithms to achieve more than six times faster preprocessing times than FlowCutter and even faster queries on the Europe road network. We show that using 16 cores of a shared-memory machine, this preprocessing needs four minutes.

Submitted 2 July, 2019; v1 submitted 27 June, 2019; originally announced June 2019.

Comments: 18 pages, 8 tables; v2: re-run experiments of competing algorithms

arXiv:1905.09282 [pdf, other]

Spatio-Temporal Deep Learning Models for Tip Force Estimation During Needle Insertion

Authors: Nils Gessert, Torben Priegnitz, Thore Saathoff, Sven-Thomas Antoni, David Meyer, Moritz Franz Hamann, Klaus-Peter Jünemann, Christoph Otte, Alexander Schlaefer

Abstract: Purpose. Precise placement of needles is a challenge in a number of clinical applications such as brachytherapy or biopsy. Forces acting at the needle cause tissue deformation and needle deflection which in turn may lead to misplacement or injury. Hence, a number of approaches to estimate the forces at the needle have been proposed. Yet, integrating sensors into the needle tip is challenging and a careful calibration is required to obtain good force estimates. Methods. We describe a fiber-optical needle tip force sensor design using a single OCT fiber for measurement. The fiber images the deformation of an epoxy layer placed below the needle tip which results in a stream of 1D depth profiles. We study different deep learning approaches to facilitate calibration between this spatio-temporal image data and the related forces. In particular, we propose a novel convGRU-CNN architecture for simultaneous spatial and temporal data processing. Results. The needle can be adapted to different operating ranges by changing the stiffness of the epoxy layer. Likewise, calibration can be adapted by training the deep learning models. Our novel convGRU-CNN architecture results in the lowest mean absolute error of 1.59 +- 1.3 mN and a cross-correlation coefficient of 0.9997, and clearly outperforms the other methods. Ex vivo experiments in human prostate tissue demonstrate the needle's application. Conclusions. Our OCT-based fiber-optical sensor presents a viable alternative for needle tip force estimation. The results indicate that the rich spatio-temporal information included in the stream of images showing the deformation throughout the epoxy layer can be effectively used by deep learning models. Particularly, we demonstrate that the convGRU-CNN architecture performs favorably, making it a promising approach for other spatio-temporal learning problems.

Submitted 22 May, 2019; originally announced May 2019.

Comments: Accepted for publication in the International Journal of Computer Assisted Radiology and Surgery

arXiv:1805.11911 [pdf, other]

doi 10.1007/978-3-030-00937-3_26

Needle Tip Force Estimation using an OCT Fiber and a Fused convGRU-CNN Architecture

Authors: Nils Gessert, Torben Priegnitz, Thore Saathoff, Sven-Thomas Antoni, David Meyer, Moritz Franz Hamann, Klaus-Peter Jünemann, Christoph Otte, Alexander Schlaefer

Abstract: Needle insertion is common during minimally invasive interventions such as biopsy or brachytherapy. During soft tissue needle insertion, forces acting at the needle tip cause tissue deformation and needle deflection. Accurate needle tip force measurement provides information on needle-tissue interaction and helps detecting and compensating potential misplacement. For this purpose we introduce an image-based needle tip force estimation method using an optical fiber imaging the deformation of an epoxy layer below the needle tip over time. For calibration and force estimation, we introduce a novel deep learning-based fused convolutional GRU-CNN model which effectively exploits the spatio-temporal data structure. The needle is easy to manufacture and our model achieves a mean absolute error of 1.76 +- 1.5 mN with a cross-correlation coefficient of 0.9996, clearly outperforming other methods. We test needles with different materials to demonstrate that the approach can be adapted for different sensitivities and force ranges. Furthermore, we validate our approach in an ex-vivo prostate needle insertion scenario.

Submitted 30 May, 2018; originally announced May 2018.

Comments: Accepted for Publication at MICCAI 2018

arXiv:1804.08487 [pdf, other]

Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades

Authors: Corrie Jacobien Carstens, Michael Hamann, Ulrich Meyer, Manuel Penschuck, Hung Tran, Dorothea Wagner

Abstract: Graph randomisation is a crucial task in the analysis and synthesis of networks. It is typically implemented as an edge switching process (ESMC) repeatedly swapping the nodes of random edge pairs while maintaining the degrees involved. Curveball is a novel approach that instead considers the whole neighbourhoods of randomly drawn node pairs. Its Markov chain converges to a uniform distribution, and experiments suggest that it requires less steps than the established ESMC. Since trades however are more expensive, we study Curveball's practical runtime by introducing the first efficient Curveball algorithms: the I/O-efficient EM-CB for simple undirected graphs and its internal memory pendant IM-CB. Further, we investigate global trades processing every node in a graph during a single super step, and show that undirected global trades converge to a uniform distribution and perform superior in practice. We then discuss EM-GCB and EM-PGCB for global trades and give experimental evidence that EM-PGCB achieves the quality of the state-of-the-art ESMC algorithm EM-ES nearly one order of magnitude faster.

Submitted 17 August, 2018; v1 submitted 23 April, 2018; originally announced April 2018.
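
To make the notion of a trade on whole neighbourhoods more concrete, here is a minimal in-memory sketch of a single Curveball-style trade on a simple undirected graph. It is an illustration under assumptions, not the EM-CB or IM-CB implementation from the paper: the function name `curveball_trade` and the adjacency-set representation are hypothetical choices. Neighbours shared by both nodes, and the two nodes themselves, are left untouched so that the graph stays simple.

```python
import random


def curveball_trade(adj, u, v, rng=random):
    """Perform one trade between nodes u and v, modifying adj in place.

    adj maps each node to the set of its neighbours (undirected, simple).
    All node degrees are preserved.
    """
    # Neighbours exclusive to one side; these are the only tradable ones.
    only_u = adj[u] - adj[v] - {v}
    only_v = adj[v] - adj[u] - {u}
    pool = list(only_u | only_v)
    rng.shuffle(pool)
    # u receives as many exclusive neighbours as it had before.
    new_u = set(pool[:len(only_u)])
    new_v = set(pool[len(only_u):])
    for w in only_u - new_u:  # w moves from u to v
        adj[u].discard(w)
        adj[w].discard(u)
        adj[v].add(w)
        adj[w].add(v)
    for w in only_v - new_v:  # w moves from v to u
        adj[v].discard(w)
        adj[w].discard(v)
        adj[u].add(w)
        adj[w].add(u)
```

Because every moved neighbour loses one edge and gains one, and u and v each keep the size of their exclusive neighbourhood, repeated trades randomise the graph while keeping the degree sequence fixed.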

arXiv:1710.09605 [pdf, other]

doi 10.1007/978-3-319-96983-1_49

Distributed Graph Clustering using Modularity and Map Equation

Authors: Michael Hamann, Ben Strasser, Dorothea Wagner, Tim Zeitz

Abstract: We study large-scale, distributed graph clustering. Given an undirected graph, our objective is to partition the nodes into disjoint sets called clusters. A cluster should contain many internal edges while being sparsely connected to other clusters. In the context of a social network, a cluster could be a group of friends. Modularity and map equation are established formalizations of this internally-dense-externally-sparse principle. We present two versions of a simple distributed algorithm to optimize both measures. They are based on Thrill, a distributed big data processing framework that implements an extended MapReduce model. The algorithms for the two measures, DSLM-Mod and DSLM-Map, differ only slightly. Adapting them for similar quality measures is straight-forward. We conduct an extensive experimental study on real-world graphs and on synthetic benchmark graphs with up to 68 billion edges. Our algorithms are fast while detecting clusterings similar to those detected by other sequential, parallel and distributed clustering algorithms. Compared to the distributed GossipMap algorithm, DSLM-Map needs less memory, is up to an order of magnitude faster and achieves better quality.

Submitted 7 June, 2018; v1 submitted 26 October, 2017; originally announced October 2017.

Comments: 14 pages, 3 figures; v3: Camera ready for Euro-Par 2018, more details, more results; v2: extended experiments to include comparison with competing algorithms, shortened for submission to Euro-Par 2018
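
As a small illustration of the internally-dense-externally-sparse principle, the sketch below evaluates the standard modularity score of a given clustering on an unweighted, undirected graph: for each cluster, the fraction of edges inside it minus the squared fraction of edge endpoints it holds. This is a sequential sketch with a hypothetical function name, not the distributed DSLM-Mod algorithm, and it only scores a clustering rather than optimising one.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Modularity of a clustering of an unweighted, undirected graph.

    edges:      list of (u, v) pairs, each undirected edge listed once
    cluster_of: mapping node -> cluster id
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the cluster
    volume = defaultdict(int)    # sum of node degrees in the cluster
    for u, v in edges:
        volume[cluster_of[u]] += 1
        volume[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            internal[cluster_of[u]] += 1
    return sum(internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)
```

For two triangles joined by a single edge, clustering each triangle separately gives 2 * (3/7 - 1/4) = 5/14, whereas putting all nodes into one cluster gives 0.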

arXiv:1609.02121 [pdf, other]

Generating realistic scaled complex networks

Authors: Christian L. Staudt, Michael Hamann, Alexander Gutfraind, Ilya Safro, Henning Meyerhenke

Abstract: Research on generative models is a central project in the emerging field of network science, and it studies how statistical patterns found in real networks could be generated by formal rules. Output from these generative models is then the basis for designing and evaluating computational methods on networks, and for verification and simulation studies. During the last two decades, a variety of models has been proposed with an ultimate goal of achieving comprehensive realism for the generated networks. In this study, we (a) introduce a new generator, termed ReCoN; (b) explore how ReCoN and some existing models can be fitted to an original network to produce a structurally similar replica, (c) use ReCoN to produce networks much larger than the original exemplar, and finally (d) discuss open problems and promising research directions. In a comparative experimental study, we find that ReCoN is often superior to many other state-of-the-art network generation methods. We argue that ReCoN is a scalable and effective tool for modeling a given network while preserving important properties at both micro- and macroscopic scales, and for scaling the exemplar data by orders of magnitude in size.

Submitted 23 March, 2017; v1 submitted 7 September, 2016; originally announced September 2016.

Comments: 26 pages, 13 figures, extended version, a preliminary version of the paper was presented at the 5th International Workshop on Complex Networks and their Applications

arXiv:1604.08738 [pdf, other]

I/O-Efficient Generation of Massive Graphs Following the LFR Benchmark

Authors: Michael Hamann, Ulrich Meyer, Manuel Penschuck, Hung Tran, Dorothea Wagner

Abstract: LFR is a popular benchmark graph generator used to evaluate community detection algorithms. We present EM-LFR, the first external memory algorithm able to generate massive complex networks following the LFR benchmark. Its most expensive component is the generation of random graphs with prescribed degree sequences which can be divided into two steps: the graphs are first materialized deterministically using the Havel-Hakimi algorithm, and then randomized. Our main contributions are EM-HH and EM-ES, two I/O-efficient external memory algorithms for these two steps. We also propose EM-CM/ES, an alternative sampling scheme using the Configuration Model and rewiring steps to obtain a random simple graph. In an experimental evaluation we demonstrate their performance; our implementation is able to handle graphs with more than 37 billion edges on a single machine, is competitive with a massive parallel distributed algorithm, and is faster than a state-of-the-art internal memory implementation even on instances fitting in main memory. EM-LFR's implementation is capable of generating large graph instances orders of magnitude faster than the original implementation. We give evidence that both implementations yield graphs with matching properties by applying clustering algorithms to generated instances. Similarly, we analyse the evolution of graph properties as EM-ES is executed on networks obtained with EM-CM/ES and find that the alternative approach can accelerate the sampling process.

Submitted 14 June, 2017; v1 submitted 29 April, 2016; originally announced April 2016.

Comments: 25 pages, 11 figures. We add the sampling of simple graphs using the Configuration Model followed by rewiring steps and experimental results regarding the mixing time of the sampling schemes

ACM Class: I.5.3
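
The first of the two steps named in the abstract above, deterministic materialisation with the Havel-Hakimi algorithm, can be sketched in a few lines for graphs that fit in memory. This is a textbook in-memory version using a heap, given only as an illustration; it is not the I/O-efficient EM-HH algorithm, and the function name is hypothetical.

```python
import heapq


def havel_hakimi(degrees):
    """Return an edge list of a simple graph with the given degree sequence.

    Node i gets degree degrees[i]. Returns None if the sequence is not
    graphical (no simple graph realises it).
    """
    if any(d < 0 for d in degrees):
        raise ValueError("degrees must be non-negative")
    # Max-heap of (negated remaining degree, node) for nodes still needing edges.
    heap = [(-d, v) for v, d in enumerate(degrees) if d > 0]
    heapq.heapify(heap)
    edges = []
    while heap:
        neg_d, v = heapq.heappop(heap)
        d = -neg_d
        if d > len(heap):
            return None  # not enough other nodes with remaining degree
        # Connect v to the d nodes with the largest remaining degrees.
        picked = [heapq.heappop(heap) for _ in range(d)]
        for neg_dw, w in picked:
            edges.append((v, w))
            if neg_dw + 1 < 0:
                heapq.heappush(heap, (neg_dw + 1, w))
    return edges
```

The resulting graph is fully determined by the input sequence, which is why a separate randomisation step, such as edge switching, follows in the pipeline the abstract describes.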

arXiv:1601.00286 [pdf, other]

Structure-Preserving Sparsification Methods for Social Networks

Authors: Michael Hamann, Gerd Lindner, Henning Meyerhenke, Christian L. Staudt, Dorothea Wagner

Abstract: Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of \textit{edge sparsification} methods on a diverse set of network properties. It is shown that they can be understood as methods for rating edges by importance and then filtering globally or locally by these scores. We show that applying a local filtering technique improves the preservation of all kinds of properties. In addition, we propose a new sparsification method (\textit{Local Degree}) which preserves edges leading to local hub nodes. All methods are evaluated on a set of social networks from Facebook, Google+, Twitter and LiveJournal with respect to network properties including diameter, connected components, community structure, multiple node centrality measures and the behavior of epidemic simulations. In order to assess the preservation of the community structure, we also include experiments on synthetically generated networks with ground truth communities. Experiments with our implementations of the sparsification methods (included in the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20\% of the original set of edges for sparse graphs with a reasonable density. The experimental results allow us to differentiate the behavior of different methods and show which method is suitable with respect to which property. While our Local Degree method is best for preserving connectivity and short distances, other newly introduced local variants are best for preserving the community structure.

Submitted 3 January, 2016; originally announced January 2016.
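
The abstract above describes Local Degree only as preserving edges that lead to local hub nodes. One plausible reading, sketched below purely as an assumption, is that each node keeps the edges to its highest-degree neighbours, with the number kept growing sublinearly in the node's own degree. The exponent parameter `alpha`, the tie-breaking rule and the function name are all hypothetical and are not taken from the listing; this is not the NetworKit implementation.

```python
import math


def local_degree_sparsify(adj, alpha=0.5):
    """Keep, for every node, the edges to its top floor(deg^alpha) neighbours.

    adj maps each node to the set of its neighbours (undirected graph).
    Neighbours are ranked by their own degree, ties broken by node id.
    An edge survives if at least one endpoint keeps it.
    Returns a set of (smaller id, larger id) pairs.
    """
    kept = set()
    for v, nbrs in adj.items():
        k = math.floor(len(nbrs) ** alpha)
        ranked = sorted(nbrs, key=lambda w: (-len(adj[w]), w))
        for w in ranked[:k]:
            kept.add((min(v, w), max(v, w)))
    return kept
```

Under this reading every non-isolated node keeps at least one edge, pointing at its best-connected neighbour, which fits the abstract's remark that the method is good at preserving connectivity.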

arXiv:1505.00564 [pdf, other]

Structure-Preserving Sparsification of Social Networks

Authors: Gerd Lindner, Christian L. Staudt, Michael Hamann, Henning Meyerhenke, Dorothea Wagner

Abstract: Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of \textit{edge sparsification} methods on a diverse set of network properties. It is shown that they can be understood as methods for rating edges by importance and then filtering globally by these scores. In addition, we propose a new sparsification method (\textit{Local Degree}) which preserves edges leading to local hub nodes. All methods are evaluated on a set of 100 Facebook social networks with respect to network properties including diameter, connected components, community structure, and multiple node centrality measures. Experiments with our implementations of the sparsification methods (using the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20\% of the original set of edges. Furthermore, the experimental results allow us to differentiate the behavior of different methods and show which method is suitable with respect to which property. Our Local Degree method is fast enough for large-scale networks and performs well across a wider range of properties than previously proposed methods.

Submitted 4 May, 2015; originally announced May 2015.

Comments: 8 pages

arXiv:1504.07379 [pdf, other]

doi 10.1007/978-3-662-48350-3_22

Fast Quasi-Threshold Editing

Authors: Ulrik Brandes, Michael Hamann, Ben Strasser, Dorothea Wagner

Abstract: We introduce Quasi-Threshold Mover (QTM), an algorithm to solve the quasi-threshold (also called trivially perfect) graph editing problem with edge insertion and deletion. Given a graph it computes a quasi-threshold graph which is close in terms of edit count. This edit problem is NP-hard. We present an extensive experimental study, in which we show that QTM is the first algorithm that is able to scale to large real-world graphs in practice. As a side result we further present a simple linear-time algorithm for the quasi-threshold recognition problem.

Submitted 28 April, 2015; originally announced April 2015.

Comments: 26 pages, 4 figures, submitted to ESA 2015

arXiv:1504.03812 [pdf, other]

Graph Bisection with Pareto-Optimization

Authors: Michael Hamann, Ben Strasser

Abstract: We introduce FlowCutter, a novel algorithm to compute a set of edge cuts or node separators that optimize cut size and balance in the Pareto-sense. Our core algorithm solves the balanced connected st-edge-cut problem, where two given nodes s and t must be separated by removing edges to obtain two connected parts. Using the core algorithm we build variants that compute node separators and are independent of s and t. Using the Pareto-set we can identify cuts with a particularly good trade-off between cut size and balance that can be used to compute contraction and minimum fill-in orders, which can be used in Customizable Contraction Hierarchies (CCH), a speed-up technique for shortest path computations. Our core algorithm runs in O(cm) time where m is the number of edges and c the cut size. This makes it well-suited for large graphs with small cuts, such as road graphs, which are our primary application. For road graphs we present an extensive experimental study demonstrating that FlowCutter outperforms the current state of the art both in terms of cut sizes as well as CCH performance.

Submitted 22 November, 2017; v1 submitted 15 April, 2015; originally announced April 2015.

arXiv:1503.01592 [pdf, ps, other]

Bounding connected tree-width

Authors: Matthias Hamann, Daniel Weißauer

Abstract: Diestel and Müller showed that the connected tree-width of a graph $G$, i.e., the minimum width of any tree-decomposition with connected parts, can be bounded in terms of the tree-width of $G$ and the largest length of a geodesic cycle in $G$. We improve their bound to one that is of correct order of magnitude. Finally, we construct a graph whose connected tree-width exceeds the connected order of any of its brambles. This disproves a conjecture by Diestel and Müller asserting an analogue of tree-width duality.

Submitted 8 June, 2015; v1 submitted 5 March, 2015; originally announced March 2015.

Comments: 12 pages

Journal ref: SIAM J. Discrete Math., 30(3):1391-1400, 2016

arXiv:1305.0757 [pdf, other]

Hierarchies of Predominantly Connected Communities

Authors: Michael Hamann, Tanja Hartmann, Dorothea Wagner

Abstract: We consider communities whose vertices are predominantly connected, i.e., the vertices in each community are stronger connected to other community members of the same community than to vertices outside the community. Flake et al. introduced a hierarchical clustering algorithm that finds such predominantly connected communities of different coarseness depending on an input parameter. We present a simple and efficient method for constructing a clustering hierarchy according to Flake et al. that supersedes the necessity of choosing feasible parameter values and guarantees the completeness of the resulting hierarchy, i.e., the hierarchy contains all clusterings that can be constructed by the original algorithm for any parameter value. However, predominantly connected communities are not organized in a single hierarchy. Thus, we develop a framework that, after precomputing at most $2(n-1)$ maximum flows, admits a linear time construction of a clustering $\C(S)$ of predominantly connected communities that contains a given community $S$ and is maximum in the sense that any further clustering of predominantly connected communities that also contains $S$ is hierarchically nested in $\C(S)$. We further generalize this construction yielding a clustering with similar properties for $k$ given communities in $O(kn)$ time. This admits the analysis of a network's structure with respect to various communities in different hierarchies.

Submitted 3 May, 2013; originally announced May 2013.

Comments: to appear (WADS 2013)

Showing 1–20 of 20 results for author: Hamann, M