-
Percolation Thresholds for Robust Network Connectivity
Authors:
Arman Mohseni-Kabir,
Mihir Pant,
Don Towsley,
Saikat Guha,
Ananthram Swami
Abstract:
Communication networks, power grids, and transportation networks are all examples of networks whose performance depends on reliable connectivity of their underlying network components even in the presence of usual network dynamics due to mobility, node or edge failures, and varying traffic loads. Percolation theory quantifies the threshold value of a local control parameter such as a node occupation (resp., deletion) probability or an edge activation (resp., removal) probability above (resp., below) which there exists a giant connected component (GCC), a connected component comprising a number of occupied nodes and active edges whose size is proportional to the size of the network itself. Any pair of occupied nodes in the GCC is connected via at least one path composed of active edges and occupied nodes. The mere existence of the GCC itself does not guarantee that the long-range connectivity would be robust, e.g., to random link or node failures due to network dynamics. In this paper, we explore new percolation thresholds that guarantee not only spanning network connectivity, but also robustness. We define and analyze four measures of robust network connectivity, explore their interrelationships, and numerically evaluate the respective robust percolation thresholds for the 2D square lattice.
Submitted 25 June, 2020;
originally announced June 2020.
-
Entanglement generation in a quantum network at distance-independent rate
Authors:
Ashlesha Patil,
Mihir Pant,
Dirk Englund,
Don Towsley,
Saikat Guha
Abstract:
We develop a protocol for entanglement generation in the quantum internet that allows a repeater node to use $n$-qubit Greenberger-Horne-Zeilinger (GHZ) projective measurements that can fuse $n$ successfully-entangled {\em links}, i.e., two-qubit entangled Bell pairs shared across $n$ network edges, incident at that node. Implementing $n$-fusion, for $n \ge 3$, is in principle not much harder than $2$-fusions (Bell-basis measurements) in solid-state qubit memories. If we allow even $3$-fusions at the nodes, we find---by developing a connection to a modified version of the site-bond percolation problem---that despite lossy (hence probabilistic) link-level entanglement generation, and probabilistic success of the fusion measurements at nodes, one can generate entanglement between end parties Alice and Bob at a rate that stays constant as the distance between them increases. We prove that this powerful network property is not possible to attain with any quantum networking protocol built with Bell measurements and multiplexing alone. We also design a two-party quantum key distribution protocol that converts the entangled states shared between two nodes into a shared secret, at a key generation rate that is independent of the distance between the two parties.
Submitted 10 August, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Quantum Networks For Open Science
Authors:
Thomas Ndousse-Fetter,
Nicholas Peters,
Warren Grice,
Prem Kumar,
Tom Chapuran,
Saikat Guha,
Scott Hamilton,
Inder Monga,
Ray Newell,
Andrei Nomerotski,
Don Towsley,
Ben Yoo
Abstract:
The United States Department of Energy convened the Quantum Networks for Open Science (QNOS) Workshop in September 2018. The workshop was primarily focused on quantum networks optimized for scientific applications with the expectation that the resulting quantum networks could be extended to lay the groundwork for a generalized network that will evolve into a quantum internet.
Submitted 27 March, 2019;
originally announced October 2019.
-
On a Class of Stochastic Multilayer Networks
Authors:
Bo Jiang,
Philippe Nain,
Don Towsley,
Saikat Guha
Abstract:
In this paper, we introduce a new class of stochastic multilayer networks. A stochastic multilayer network is the aggregation of $M$ networks (one per layer) where each is a subgraph of a foundational network $G$. Each layer network is the result of probabilistically removing links and nodes from $G$. The resulting network includes any link that appears in at least $K$ layers. This model is an instance of a non-standard site-bond percolation model. Two sets of results are obtained: first, we derive the probability distribution that the $M$-layer network is in a given configuration for some particular graph structures (explicit results are provided for a line and an algorithm is provided for a tree), where a configuration is the collective state of all links (each either active or inactive). Next, we show that for appropriate scalings of the node and link selection processes in a layer, links are asymptotically independent as the number of layers goes to infinity, and follow Poisson distributions. Numerical results are provided to highlight the impact of having several layers on some metrics of interest (including expected size of the cluster a node belongs to in the case of the line). This model finds applications in wireless communication networks with multichannel radios, multiple social networks with overlapping memberships, transportation networks, and, more generally, in any scenario where a common set of nodes can be linked via co-existing means of connectivity.
Submitted 10 July, 2018;
originally announced July 2018.
-
Sampling Online Social Networks by Random Walk with Indirect Jumps
Authors:
Junzhou Zhao,
Pinghui Wang,
John C. S. Lui,
Don Towsley,
Xiaohong Guan
Abstract:
Random walk-based sampling methods are gaining popularity and importance in characterizing large networks. While powerful, they suffer from the slow mixing problem when the graph is loosely connected, which results in poor estimation accuracy. Random walk with jumps (RWwJ) can address the slow mixing problem but it is inapplicable if the graph does not support uniform vertex sampling (UNI). In this work, we develop methods that can efficiently sample a graph without the necessity of UNI but still enjoy the similar benefits as RWwJ. We observe that many graphs under study, called target graphs, do not exist in isolation. In many situations, a target graph is related to an auxiliary graph and a bipartite graph, and they together form a better connected {\em two-layered network structure}. This new viewpoint brings extra benefits to graph sampling: if directly sampling a target graph is difficult, we can sample it indirectly with the assistance of the other two graphs. We propose a series of new graph sampling techniques by exploiting such a two-layered network structure to estimate target graph characteristics. Experiments conducted on both synthetic and real-world networks demonstrate the effectiveness and usefulness of these new techniques.
Submitted 29 August, 2017;
originally announced August 2017.
-
Characterizing Directed and Undirected Networks via Multidimensional Walks with Jumps
Authors:
Fabricio Murai,
Bruno Ribeiro,
Don Towsley,
Pinghui Wang
Abstract:
Estimating distributions of node characteristics (labels) such as number of connections or citizenship of users in a social network via edge and node sampling is a vital part of the study of complex networks. Due to its low cost, sampling via a random walk (RW) has been proposed as an attractive solution to this task. Most RW methods assume either that the network is undirected or that walkers can traverse edges regardless of their direction. Some RW methods have been designed for directed networks where edges coming into a node are not directly observable. In this work, we propose Directed Unbiased Frontier Sampling (DUFS), a sampling method based on a large number of coordinated walkers, each starting from a node chosen uniformly at random. It is applicable to directed networks with invisible incoming edges because it constructs, in real-time, an undirected graph consistent with the walkers trajectories, and due to the use of random jumps which prevent walkers from being trapped. DUFS generalizes previous RW methods and is suited for undirected networks and to directed networks regardless of in-edges visibility. We also propose an improved estimator of node label distributions that combines information from the initial walker locations with subsequent RW observations. We evaluate DUFS, compare it to other RW methods, investigate the impact of its parameters on estimation accuracy and provide practical guidelines for choosing them. In estimating out-degree distributions, DUFS yields significantly better estimates of the head of the distribution than other methods, while matching or exceeding estimation accuracy of the tail. Last, we show that DUFS outperforms uniform node sampling when estimating distributions of node labels of the top 10% largest degree nodes, even when sampling a node uniformly has the same cost as RW steps.
Submitted 13 July, 2018; v1 submitted 23 March, 2017;
originally announced March 2017.
-
Minfer: Inferring Motif Statistics From Sampled Edges
Authors:
Pinghui Wang,
John C. S. Lui,
Don Towsley
Abstract:
Characterizing motif (i.e., locally connected subgraph patterns) statistics is important for understanding complex networks such as online social networks and communication networks. Previous work made the strong assumption that the graph topology of interest is known, and that the dataset either fits into main memory or is stored on disk such that it is not expensive to obtain all neighbors of any given node. In practice, researchers have to deal with the situation where the graph topology is unknown, either because the graph is dynamic, or because it is expensive to collect and store all topological and meta information on disk. Hence, what is available to researchers is only a snapshot of the graph generated by sampling edges from the graph at random, which we call a "RESampled graph". Clearly, a RESampled graph's motif statistics may be quite different from those of the underlying original graph. To solve this challenge, we propose a framework and implement a system called Minfer, which can take the given RESampled graph and accurately infer the underlying graph's motif statistics. We also use Fisher information to bound the error of our estimates. Experiments using large scale datasets show our method to be accurate.
Submitted 23 February, 2015;
originally announced February 2015.
-
Reciprocity in Social Networks with Capacity Constraints
Authors:
Bo Jiang,
Zhi-Li Zhang,
Don Towsley
Abstract:
Directed links -- representing asymmetric social ties or interactions (e.g., "follower-followee") -- arise naturally in many social networks and other complex networks, giving rise to directed graphs (or digraphs) as basic topological models for these networks. Reciprocity, defined for a digraph as the percentage of edges with a reciprocal edge, is a key metric that has been used in the literature to compare different directed networks and provide "hints" about their structural properties: for example, are reciprocal edges generated randomly by chance or are there other processes driving their generation? In this paper we study the problem of maximizing achievable reciprocity for an ensemble of digraphs with the same prescribed in- and out-degree sequences. We show that the maximum reciprocity hinges crucially on the in- and out-degree sequences, which may be intuitively interpreted as constraints on some "social capacities" of nodes and impose fundamental limits on achievable reciprocity. We show that it is NP-complete to decide the achievability of a simple upper bound on maximum reciprocity, and provide conditions for achieving it. We demonstrate that many real networks exhibit reciprocities surprisingly close to the upper bound, which implies that users in these social networks are in a sense more "social" than suggested by the empirical reciprocity alone in that they are more willing to reciprocate, subject to their "social capacity" constraints. We find some surprising linear relationships between empirical reciprocity and the bound. We also show that a particular type of small network motifs that we call 3-paths are the major source of loss in reciprocity for real networks.
Submitted 18 June, 2015; v1 submitted 13 December, 2014;
originally announced December 2014.
-
Tracking Triadic Cardinality Distributions for Burst Detection in Social Activity Streams
Authors:
Junzhou Zhao,
John C. S. Lui,
Don Towsley,
Pinghui Wang,
Xiaohong Guan
Abstract:
In everyday life, we often observe unusually frequent interactions among people before or during important events, e.g., we receive/send more greetings from/to our friends on Christmas Day, than usual. We also observe that some videos suddenly go viral through people's sharing in online social networks (OSNs). Do these seemingly different phenomena share a common structure?
All these phenomena are associated with sudden surges of user activities in networks, which we call "bursts" in this work. We find that the emergence of a burst is accompanied with the formation of triangles in networks. This finding motivates us to propose a new method to detect bursts in OSNs. We first introduce a new measure, "triadic cardinality distribution", corresponding to the fractions of nodes with different numbers of triangles, i.e., triadic cardinalities, within a network. We demonstrate that this distribution changes when a burst occurs, and is naturally immunized against spamming social-bot attacks. Hence, by tracking triadic cardinality distributions, we can reliably detect bursts in OSNs. To avoid handling massive activity data generated by OSN users, we design an efficient sample-estimate solution to estimate the triadic cardinality distribution from sampled data. Extensive experiments conducted on real data demonstrate the usefulness of this triadic cardinality distribution and the effectiveness of our sample-estimate solution.
Submitted 4 June, 2015; v1 submitted 14 November, 2014;
originally announced November 2014.
-
Learning to Generate Networks
Authors:
James Atwood,
Don Towsley,
Krista Gile,
David Jensen
Abstract:
We investigate the problem of learning to generate complex networks from data. Specifically, we consider whether deep belief networks, dependency networks, and members of the exponential random graph family can learn to generate networks whose complex behavior is consistent with a set of input examples. We find that the deep model is able to capture the complex behavior of small networks, but that no model is able to capture this behavior for networks with more than a handful of nodes.
Submitted 10 November, 2014; v1 submitted 22 May, 2014;
originally announced May 2014.
-
Design of Efficient Sampling Methods on Hybrid Social-Affiliation Networks
Authors:
Junzhou Zhao,
John C. S. Lui,
Don Towsley,
Pinghui Wang,
Xiaohong Guan
Abstract:
Graph sampling via crawling has become increasingly popular and important in the study of measuring various characteristics of large scale complex networks. While powerful, it is known to be challenging when the graph is loosely connected or disconnected which slows down the convergence of random walks and can cause poor estimation accuracy.
In this work, we observe that the graph under study, or called target graph, usually does not exist in isolation. In many situations, the target graph is related to an auxiliary graph and an affiliation graph, and the target graph becomes well connected when we view it from the perspective of these three graphs together, or called a hybrid social-affiliation graph in this paper. When directly sampling the target graph is difficult or inefficient, we can indirectly sample it efficiently with the assistances of the other two graphs. We design three sampling methods on such a hybrid social-affiliation network. Experiments conducted on both synthetic and real datasets demonstrate the effectiveness of our proposed methods.
Submitted 20 May, 2014;
originally announced May 2014.
-
Efficient Network Generation Under General Preferential Attachment
Authors:
James Atwood,
Bruno Ribeiro,
Don Towsley
Abstract:
Preferential attachment (PA) models of network structure are widely used due to their explanatory power and conceptual simplicity. PA models are able to account for the scale-free degree distributions observed in many real-world large networks through the remarkably simple mechanism of sequentially introducing nodes that attach preferentially to high-degree nodes. The ability to efficiently generate instances from PA models is a key asset in understanding both the models themselves and the real networks that they represent. Surprisingly, little attention has been paid to the problem of efficient instance generation. In this paper, we show that the complexity of generating network instances from a PA model depends on the preference function of the model, provide efficient data structures that work under any preference function, and present empirical results from an implementation based on these data structures. We demonstrate that, by indexing growing networks with a simple augmented heap, we can implement a network generator which scales many orders of magnitude beyond existing capabilities ($10^6$ -- $10^8$ nodes). We show the utility of an efficient and general PA network generator by investigating the consequences of varying the preference functions of an existing model. We also provide "quicknet", a freely-available open-source implementation of the methods described in this work.
Submitted 20 May, 2014; v1 submitted 18 March, 2014;
originally announced March 2014.
-
On the duration and intensity of cumulative advantage competitions
Authors:
Bo Jiang,
Liyuan Sun,
Daniel R. Figueiredo,
Bruno Ribeiro,
Don Towsley
Abstract:
The role of skill (fitness) and luck (randomness) as driving forces on the dynamics of resource accumulation in a myriad of systems has long puzzled scientists. Fueled by undisputed inequalities that emerge from actual competitions, there is a pressing need for better understanding the effects of skill and luck in resource accumulation. When such competitions are driven by externalities such as cumulative advantage (CA), the rich-get-richer effect, little is known with respect to fundamental properties such as their duration and intensity. In this work we provide a mathematical understanding of how CA exacerbates the role of luck to the detriment of skill in simple and well-studied competition models. We show, for instance, that if two agents are competing for resources that arrive sequentially at each time unit, an early stroke of luck can place the less skilled in the lead for an extremely long period of time, a phenomenon we call "struggle of the fittest". In the absence of CA, the more skilled quickly prevails despite any early stroke of luck that the less skilled may have. We prove that the duration of a simple skill and luck competition model exhibits power law tails when CA is present, regardless of skill difference, which is in sharp contrast to exponential tails when CA is absent. Our findings have important implications for competitions not only in complex social systems but also in contexts that leverage such models.
Submitted 13 December, 2014; v1 submitted 18 February, 2014;
originally announced February 2014.
-
Classifying Latent Infection States in Complex Networks
Authors:
Yeon-sup Lim,
Bruno Ribeiro,
Don Towsley
Abstract:
Algorithms for identifying the infection states of nodes in a network are crucial for understanding and containing infections. Often, however, only a relatively small set of nodes have a known infection state. Moreover, the length of time that each node has been infected is also unknown. This missing data -- infection state of most nodes and infection time of the unobserved infected nodes -- poses a challenge to the study of real-world cascades.
In this work, we develop techniques to identify the latent infected nodes in the presence of missing infection time-and-state data. Based on the likely epidemic paths predicted by the simple susceptible-infected epidemic model, we propose a measure (Infection Betweenness) for uncovering these unknown infection states. Our experimental results using machine learning algorithms show that Infection Betweenness is the most effective feature for identifying latent infected nodes.
Submitted 31 January, 2014;
originally announced February 2014.
-
Online Dating Recommendations: Matching Markets and Learning Preferences
Authors:
Kun Tu,
Bruno Ribeiro,
Hua Jiang,
Xiaodong Wang,
David Jensen,
Benyuan Liu,
Don Towsley
Abstract:
Recommendation systems for online dating have recently attracted much attention from the research community. In this paper we propose a two-sided matching framework for online dating recommendations and design an LDA model to learn the user preferences from the observed user messaging behavior and user profile features. Experimental results using data from a large online dating website show that two-sided matching significantly improves the rate of successful matches by as much as 45%. Finally, using simulated matchings we show that the LDA model can correctly capture user preferences.
Submitted 30 January, 2014;
originally announced January 2014.
-
Who is Dating Whom: Characterizing User Behaviors of a Large Online Dating Site
Authors:
Peng Xia,
Kun Tu,
Bruno Ribeiro,
Hua Jiang,
Xiaodong Wang,
Cindy Chen,
Benyuan Liu,
Don Towsley
Abstract:
Online dating sites have become popular platforms for people to look for potential romantic partners. It is important to understand users' dating preferences in order to make better recommendations on potential dates. The message sending and replying actions of a user are strong indicators for what he/she is looking for in a potential date and reflect the user's actual dating preferences. We study how users' online dating behaviors correlate with various user attributes using a large real-world dataset from a major online dating site in China. Many of our results on user messaging behavior align with notions in social and evolutionary psychology: males tend to look for younger females while females put more emphasis on the socioeconomic status (e.g., income, education level) of a potential date. In addition, we observe that the geographic distance between two users and the photo count of users play an important role in their dating behaviors. Our results show that it is important to differentiate between users' true preferences and random selection. Some user behaviors in choosing attributes in a potential date may largely be a result of random selection. We also find that both males and females are more likely to reply to users whose attributes come closest to the stated preferences of the receivers, and there is significant discrepancy between a user's stated dating preference and his/her actual online dating behavior. These results can provide valuable guidelines to the design of a recommendation engine for potential dates.
Submitted 22 January, 2014;
originally announced January 2014.
-
Sampling Content Distributed Over Graphs
Authors:
Pinghui Wang,
Junzhou Zhao,
John C. S. Lui,
Don Towsley,
Xiaohong Guan
Abstract:
Despite recent effort to estimate topology characteristics of large graphs (i.e., online social networks and peer-to-peer networks), little attention has been given to develop a formal methodology to characterize the vast amount of content distributed over these networks. Due to the large scale nature of these networks, exhaustive enumeration of this content is computationally prohibitive. In this paper, we show how one can obtain content properties by sampling only a small fraction of vertices. We first show that when sampling is naively applied, this can produce a huge bias in content statistics (i.e., average number of content duplications). To remove this bias, one may use maximum likelihood estimation to estimate content characteristics. However our experimental results show that one needs to sample most vertices in the graph to obtain accurate statistics using such a method. To address this challenge, we propose two efficient estimators: special copy estimator (SCE) and weighted copy estimator (WCE) to measure content characteristics using available information in sampled contents. SCE uses the special content copy indicator to compute the estimate, while WCE derives the estimate based on meta-information in sampled vertices. We perform experiments to show WCE and SCE are cost effective and also ``{\em asymptotically unbiased}''. Our methodology provides a new tool for researchers to efficiently query content distributed in large scale networks.
Submitted 13 November, 2013;
originally announced November 2013.
-
Practical Characterization of Large Networks Using Neighborhood Information
Authors:
Pinghui Wang,
Bruno Ribeiro,
Junzhou Zhao,
John C. S. Lui,
Don Towsley,
Xiaohong Guan
Abstract:
Characterizing large online social networks (OSNs) through node querying is a challenging task. OSNs often impose severe constraints on the query rate, hence limiting the sample size to a small fraction of the total network. Various ad hoc subgraph sampling methods have been proposed, but many of them give biased estimates and offer no theoretical guarantees on accuracy. In this work, we focus on developing sampling methods for OSNs where querying a node also reveals partial structural information about its neighbors. Our methods are optimized for NoSQL graph databases (if the database can be accessed directly), or use the Web APIs available on most major OSNs for graph sampling. We show that our sampling method has provable convergence guarantees on being an unbiased estimator, and that it is more accurate than current state-of-the-art methods. We characterize metrics such as node label density estimation and edge label density estimation, two of the most fundamental network characteristics from which other network characteristics can be derived. We evaluate our methods on-the-fly over several live networks using their native APIs. Our simulation studies over a variety of offline datasets show that by including neighborhood information, our method drastically (4-fold) reduces the number of samples required to achieve the same estimation accuracy as state-of-the-art methods.
Submitted 13 November, 2013;
originally announced November 2013.
-
Efficiently Estimating Motif Statistics of Large Networks
Authors:
Pinghui Wang,
John C. S. Lui,
Bruno Ribeiro,
Don Towsley,
Junzhou Zhao,
Xiaohong Guan
Abstract:
Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological and online social networks (OSNs). Nowadays the massive size of some critical networks -- often stored in already overloaded relational databases -- effectively limits the rate at which nodes and edges can be explored, making it a challenge to accurately discover subgraph statistics. In this work, we propose sampling algorithms that efficiently and accurately estimate subgraph statistics of massive networks from as few queried nodes as possible. Our algorithms require no pre-computation or complete network topology information. At the same time, we provide theoretical guarantees of convergence. We perform experiments using widely known data sets, and show that for the same accuracy, our algorithms require an order of magnitude fewer queries (samples) than the current state-of-the-art algorithms.
Submitted 27 March, 2014; v1 submitted 22 June, 2013;
originally announced June 2013.
-
Social Sensor Placement in Large Scale Networks: A Graph Sampling Perspective
Authors:
Junzhou Zhao,
John C. S. Lui,
Don Towsley,
Xiaohong Guan,
Pinghui Wang
Abstract:
Sensor placement for the purpose of detecting and tracking news outbreaks and preventing rumor spreading is a challenging problem in a large-scale online social network (OSN). This problem is a kind of subset selection problem: choosing a small set of items from a large population so as to maximize some prespecified set function. However, it is known to be NP-complete. Existing heuristics are very costly, especially for modern OSNs, which usually contain hundreds of millions of users. This paper aims to design methods to find good solutions that trade off efficiency and accuracy well. We first show that it is possible to obtain a high-quality solution with a probabilistic guarantee from a "candidate set" of the underlying social network. By exploring this candidate set, one can increase the efficiency of placing social sensors. We also present how this candidate set can be obtained using "graph sampling", which has the advantage over previous methods of not requiring prior knowledge of the complete network topology. Experiments carried out on two real datasets demonstrate not only the accuracy and efficiency of our approach, but also its effectiveness in detecting and predicting news outbreaks.
Submitted 6 December, 2013; v1 submitted 28 May, 2013;
originally announced May 2013.
-
Online Myopic Network Covering
Authors:
Konstantin Avrachenkov,
Prithwish Basu,
Giovanni Neglia,
Bruno Ribeiro,
Don Towsley
Abstract:
Efficient marketing or awareness-raising campaigns seek to recruit $n$ influential individuals -- where $n$ is the campaign budget -- that are able to cover a large target audience through their social connections. So far most of the related literature on maximizing this network cover assumes that the social network topology is known. Even in that case, finding the optimal solution is NP-hard. In practice, however, the network topology is generally unknown and needs to be discovered on-the-fly. In this work we consider an unknown topology where recruited individuals disclose their social connections (a feature known as one-hop lookahead). The goal of this work is to provide an efficient greedy online algorithm that recruits individuals so as to maximize the size of the target audience covered by the campaign.
We propose a new greedy online algorithm, Maximum Expected $d$-Excess Degree (MEED), and provide, to the best of our knowledge, the first detailed theoretical analysis of the cover size of a variety of well-known network sampling algorithms on finite networks. Our proposed algorithm greedily maximizes the expected size of the cover. For a class of random power law networks we show that MEED simplifies into a straightforward procedure, which we denote MOD (Maximum Observed Degree). We substantiate our analytical results with extensive simulations and show that MOD significantly outperforms all analyzed myopic algorithms. We note that performance may be further improved if the node degree distribution is known or can be estimated online during the campaign.
Submitted 20 December, 2012;
originally announced December 2012.
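As a rough illustration of the setting in the abstract above, the following Python sketch recruits nodes under one-hop lookahead by always picking the candidate with the largest observed degree. It is one plausible reading of the name MOD, not the authors' implementation: the function name `mod_cover`, the definition of observed degree as the number of recruited neighbors that have disclosed a candidate, the tie-breaking rule, and the toy graph are all assumptions made for this example.

```python
def mod_cover(adj, budget, start):
    """Greedy cover sketch: repeatedly recruit the candidate seen most often.

    adj    -- dict mapping node -> list of neighbors (hidden until recruited)
    budget -- maximum number of nodes to recruit
    start  -- first node to recruit (assumed to be given)
    Returns (recruited order, set of covered nodes).
    """
    recruited, recruited_set = [], set()
    covered = set()
    observed = {}  # candidate -> number of recruited neighbors that disclosed it
    current = start
    while current is not None and len(recruited) < budget:
        recruited.append(current)
        recruited_set.add(current)
        observed.pop(current, None)
        covered.add(current)
        # One-hop lookahead: a recruited node discloses its neighbors.
        for nb in adj[current]:
            covered.add(nb)
            if nb not in recruited_set:
                observed[nb] = observed.get(nb, 0) + 1
        # Assumed tie-break: highest observed degree, then smallest label.
        current = (min(observed, key=lambda v: (-observed[v], v))
                   if observed else None)
    return recruited, covered


# Toy graph, purely illustrative.
toy = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4, 5], 3: [1], 4: [2], 5: [2]}
order, cover = mod_cover(toy, budget=3, start=0)
```

The sketch only uses information a campaign would actually have at each step: the neighbor lists of nodes already recruited.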
-
Multiple Random Walks to Uncover Short Paths in Power Law Networks
Authors:
Bruno Ribeiro,
Prithwish Basu,
Don Towsley
Abstract:
Consider the following routing problem in the context of a large-scale network $G$, with particular interest paid to power law networks, although our results do not assume a particular degree distribution. A small number of nodes want to exchange messages and are looking for short paths on $G$. These nodes do not have access to the topology of $G$ but are allowed to crawl the network within a limited budget. Only crawlers whose sample paths cross are allowed to exchange topological information. In this work we study the use of random walks (RWs) to crawl $G$. We show that the ability of RWs to find short paths bears no relation to the paths that they take. Instead, it relies on two properties of RWs on power law networks: 1) the ability of RWs to observe a sizable fraction of the network edges; and 2) the near certainty that two distinct RW sample paths cross after a small percentage of the nodes have been visited. We show promising simulation results on several real-world networks.
Submitted 26 May, 2012;
originally announced May 2012.
-
Quick Detection of Nodes with Large Degrees
Authors:
Konstantin Avrachenkov,
Nelly Litvak,
Marina Sokol,
Don Towsley
Abstract:
Our goal is to quickly find top-$k$ lists of nodes with the largest degrees in large complex networks. If the adjacency list of the network is known (not often the case in complex networks), a deterministic algorithm to find a node with the largest degree has an average complexity of $O(n)$, where $n$ is the number of nodes in the network. Even this modest complexity can be very high for large complex networks. We propose to use a random-walk-based method. We show theoretically and by numerical experiments that for large networks the random walk method finds good-quality top lists of nodes with high probability and with computational savings of orders of magnitude. We also propose stopping criteria for the random walk method that require very little knowledge about the structure of the network.
Submitted 15 February, 2012;
originally announced February 2012.
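The idea in the abstract above can be sketched in Python. A simple random walk on an undirected graph visits nodes with probability proportional to their degree, so high-degree nodes tend to appear early among the visited nodes. This is a generic illustration and not the authors' exact method: the function name `random_walk_top_k`, the fixed step budget used in place of the paper's stopping criteria, the jump out of isolated nodes, and the star-graph example are assumptions made here.

```python
import random


def random_walk_top_k(adj, k, steps, seed=0):
    """Return the k highest-degree nodes seen by a simple random walk.

    adj   -- dict mapping node -> list of neighbors (undirected graph)
    k     -- length of the top list to return
    steps -- number of walk steps (a stand-in for a real stopping rule)
    """
    rng = random.Random(seed)
    nodes = list(adj)
    current = rng.choice(nodes)
    seen_degree = {}
    for _ in range(steps):
        # Visiting a node reveals its degree.
        seen_degree[current] = len(adj[current])
        if adj[current]:
            current = rng.choice(adj[current])
        else:
            # Assumed fallback: jump uniformly if the node is isolated.
            current = rng.choice(nodes)
    # Rank visited nodes by degree, breaking ties by node label.
    return sorted(seen_degree, key=lambda v: (-seen_degree[v], v))[:k]


# Star graph: node 0 is the hub, nodes 1..20 are leaves.
star = {0: list(range(1, 21))}
star.update({leaf: [0] for leaf in range(1, 21)})
top = random_walk_top_k(star, k=1, steps=10)
```

On the star graph the walk reaches the hub within two steps from any starting node, so only a handful of the 21 nodes need to be queried to find the largest degree.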
-
Characterizing Continuous Time Random Walks on Time Varying Graphs
Authors:
Daniel Figueiredo,
Philippe Nain,
Bruno Ribeiro,
Edmundo de Souza e Silva,
Don Towsley
Abstract:
In this paper we study the behavior of a continuous time random walk (CTRW) on a stationary and ergodic time-varying dynamic graph. We establish conditions under which the CTRW is a stationary and ergodic process. In general, the stationary distribution of the walker depends on the walker rate and is difficult to characterize. However, we characterize the stationary distribution in the following cases: i) the walker rate is significantly larger or smaller than the rate at which the graph changes (time-scale separation), ii) the walker rate is proportional to the degree of the node that it resides on (coupled dynamics), and iii) the degrees of nodes belonging to the same connected component are identical (structural constraints). We provide examples that illustrate our theoretical findings.
Submitted 2 December, 2012; v1 submitted 24 December, 2011;
originally announced December 2011.