Search | arXiv e-print repository

Towards Optimal Strategy for Adaptive Probing in Incomplete Networks

Authors: Tri P. Nguyen, Hung T. Nguyen, Thang N. Dinh

Abstract: We investigate a graph probing problem in which an agent has only an incomplete view $G' \subsetneq G$ of the network and wishes to explore the network with the least effort. In each step, the agent selects a node $u$ in $G'$ to probe. After probing $u$, the agent gains information about $u$ and its neighbors. All the neighbors of $u$ become \emph{observed} and are \emph{probable} in subsequent steps (if they have not been probed). What is the best probing strategy to maximize the number of nodes explored in $k$ probes? This problem serves as a fundamental component of other decision-making problems in incomplete networks, such as information harvesting in social networks, network crawling, network security, and viral marketing with incomplete information. While a few methods have been proposed for the problem, none performs consistently well across different network types. In this paper, we establish a strong (in)approximability result for the problem, proving that no algorithm can guarantee a finite approximation ratio unless P=NP. On the bright side, we design learning frameworks to capture the best probing strategies for individual networks. Our extensive experiments suggest that our framework can learn efficient probing strategies that \emph{consistently} outperform previous heuristics and metric-based approaches.
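To make the probing model concrete, here is a minimal simulation sketch. It assumes the hidden network $G$ is an adjacency dict, the initial view $G'$ is a list of known edges, and the agent follows a simple hypothetical baseline (probe the unprobed node with the largest known degree). That rule is only an illustrative heuristic of the kind the abstract compares against, not the paper's learned strategy; `simulate_probing` is a made-up name.

```python
from collections import defaultdict


def simulate_probing(graph, initial_edges, k):
    """Simulate k probes on a hidden network.

    graph: dict mapping each node of the full network G to its set of neighbours
           (hidden from the agent; consulted only when a node is probed).
    initial_edges: iterable of (u, v) pairs forming the agent's initial view G'.
    Returns the set of observed nodes after at most k probes.
    """
    known = defaultdict(set)  # the agent's current partial adjacency
    for u, v in initial_edges:
        known[u].add(v)
        known[v].add(u)
    observed = set(known)
    probed = set()
    for _ in range(k):
        candidates = observed - probed
        if not candidates:
            break
        # Hypothetical baseline: probe the unprobed node with the largest known degree,
        # breaking ties by the smallest node id (max keeps the first maximum it sees).
        u = max(sorted(candidates), key=lambda x: len(known[x]))
        probed.add(u)
        # Probing u reveals all of its neighbours in G; they become observed.
        for v in graph[u]:
            known[u].add(v)
            known[v].add(u)
            observed.add(v)
    return observed


# Toy network: hub 0 with leaves 1, 2, 3, plus a hidden node 4 attached to 3.
G = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}}
print(len(simulate_probing(G, [(0, 1), (0, 2)], 1)))  # explored nodes after one probe
```

In this toy example the degree rule spends probes 2 and 3 on nodes 1 and 2, which reveal nothing new, and only reaches node 4 with the fourth probe; a strategy adapted to the network could avoid that waste.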

Submitted 5 February, 2017; originally announced February 2017.

arXiv:1702.01451

Transitivity Demolition and the Falls of Social Networks

Authors: Hung T. Nguyen, Nam P. Nguyen, Tam Vu, Huan X. Hoang, Thang N. Dinh

Abstract: In this paper, we study crucial elements of a complex network, namely its nodes and connections, which play a key role in maintaining the network's structure and function under unexpected structural perturbations such as node and edge removal. Specifically, we want to identify vital nodes and edges whose failure (either random or intentional) will break the largest number of connected triples (or triangles) in the network. This problem is extremely important because connected triples form the foundation of strong connections in many real-world systems, such as mutual relationships in social networks, reliable data transmission in communication networks, and stable routing strategies in mobile networks. Disconnected triples, analogous to broken mutual connections, can greatly affect the network's structure and disrupt its normal function, which can further lead to the corruption of the entire system. The analysis of such crucial elements will shed light on key factors behind the resilience and robustness of many complex systems in practice. We formulate the analysis as multiple optimization problems and show their intractability. We next propose efficient approximation algorithms, namely DAK-n and DAK-e, which guarantee a $(1-1/e)$ approximation ratio (compared to the overall optimal solutions) while having the same time complexity as the best triangle counting and listing algorithms on power-law networks. This advantage makes our algorithms scale extremely well even for very large networks.
From an application perspective, we perform comprehensive experiments on real social traces with millions of nodes and billions of edges. These empirical experiments indicate that our approaches achieve comparable or better results while being up to 100x faster than current state-of-the-art methods.
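The $(1-1/e)$ guarantee is the classic bound for greedy maximum coverage: each edge "covers" the triangles it belongs to, and a triangle is broken once any one of its edges is removed. The sketch below is a plain greedy for the edge version under that assumption. It is not DAK-e (which is designed to match the complexity of fast triangle listing); it simply lists all triangles up front, and `list_triangles` and `greedy_edge_removal` are hypothetical names.

```python
from itertools import combinations


def list_triangles(adj):
    """adj: dict node -> set of neighbours (undirected, comparable node ids).
    Returns each triangle once as a sorted tuple (u, v, w) with u < v < w."""
    triangles = []
    for u in adj:
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                triangles.append((u, v, w))
    return triangles


def greedy_edge_removal(adj, k):
    """Greedily remove up to k edges, each time the one breaking the most
    still-intact triangles. Returns (removed_edges, number_of_broken_triangles)."""
    triangles = list_triangles(adj)
    covers = {}  # edge (sorted pair) -> indices of triangles containing it
    for i, (u, v, w) in enumerate(triangles):
        for edge in ((u, v), (u, w), (v, w)):
            covers.setdefault(edge, set()).add(i)
    broken, removed = set(), []
    for _ in range(k):
        best = max(covers, key=lambda e: len(covers[e] - broken), default=None)
        if best is None or not covers[best] - broken:
            break  # no remaining edge breaks a new triangle
        removed.append(best)
        broken |= covers.pop(best)
    return removed, len(broken)


# K4 has 4 triangles, and every edge lies in 2 of them.
K4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(greedy_edge_removal(K4, 2))
```

On $K_4$, removing two disjoint edges breaks all four triangles, and greedy finds such a pair here. The node version is analogous, with a node covering every triangle it belongs to.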

Submitted 5 February, 2017; originally announced February 2017.

arXiv:1605.07990

Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks

Authors: Hung T. Nguyen, My T. Thai, Thang N. Dinh

Abstract: Influence Maximization (IM), which seeks a small set of key users who spread influence widely through the network, is a core problem in multiple domains. It finds applications in viral marketing, epidemic control, and assessing cascading failures within complex systems. Despite a huge amount of effort, IM in billion-scale networks such as Facebook, Twitter, and the World Wide Web has not been satisfactorily solved. Even state-of-the-art methods such as TIM+ and IMM may take days on those networks. In this paper, we propose SSA and D-SSA, two novel sampling frameworks for IM-based viral marketing problems. SSA and D-SSA are up to 1200 times faster than the SIGMOD'15 best method, IMM, while providing the same $(1-1/e-ε)$ approximation guarantee. Underlying our frameworks is an innovative Stop-and-Stare strategy in which they stop at exponential checkpoints to verify (stare) whether there is adequate statistical evidence on the solution quality. Theoretically, we prove that SSA and D-SSA are the first approximation algorithms that use (asymptotically) minimum numbers of samples, meeting strict theoretical thresholds characterized for IM. The absolute superiority of SSA and D-SSA is confirmed through extensive experiments on real network data for IM and another topic-aware viral marketing problem, named TVM. The source code is available at https://github.com/hungnt55/Stop-and-Stare
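The general shape of a stop-and-stare loop can be sketched with reverse-reachable (RR) set sampling, the sampling style used by RIS-based methods such as TIM+ and IMM. The sketch assumes the independent cascade model with one uniform edge probability `p`. The "stare" test here is a simplified, hypothetical rule: re-estimate the chosen seeds on a fresh, independent batch and stop if the two estimates agree within `tol`. It does not reproduce the paper's statistical stopping conditions or its $(1-1/e-ε)$ guarantee, and all function and parameter names are made up. The authors' actual implementation is in the repository linked above.

```python
import random


def random_rr_set(in_adj, p, rng):
    """One RR set under the IC model with uniform edge probability p.
    in_adj[v] lists the nodes u that have an edge u -> v."""
    root = rng.randrange(len(in_adj))
    visited, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for u in in_adj[v]:
            # Each incoming edge is tested at most once and is "live" with probability p.
            if u not in visited and rng.random() < p:
                visited.add(u)
                stack.append(u)
    return visited


def greedy_cover(rr_sets, n, k):
    """Greedy maximum coverage: pick up to k nodes hitting the most RR sets."""
    index = [[] for _ in range(n)]  # node -> ids of RR sets containing it
    for i, s in enumerate(rr_sets):
        for v in s:
            index[v].append(i)
    covered = [False] * len(rr_sets)
    seeds, count = [], 0
    for _ in range(k):
        gains = [sum(1 for i in index[v] if not covered[i]) for v in range(n)]
        best = max(range(n), key=gains.__getitem__)
        if gains[best] == 0:
            break  # every RR set is already covered
        seeds.append(best)
        for i in index[best]:
            if not covered[i]:
                covered[i] = True
                count += 1
    return seeds, count


def stop_and_stare_sketch(in_adj, k, p=0.1, start=64, max_samples=1 << 16,
                          tol=0.05, seed=0):
    n = len(in_adj)
    rng = random.Random(seed)
    rr_sets = [random_rr_set(in_adj, p, rng) for _ in range(start)]
    while True:
        # Select seeds on the current pool; n * coverage fraction estimates their spread.
        seeds, count = greedy_cover(rr_sets, n, k)
        estimate = n * count / len(rr_sets)
        # "Stare" (simplified, hypothetical rule): re-estimate the same seeds on a
        # fresh, independent batch and stop if the two estimates agree within tol.
        fresh = [random_rr_set(in_adj, p, rng) for _ in range(len(rr_sets))]
        chosen = set(seeds)
        check = n * sum(1 for s in fresh if s & chosen) / len(fresh)
        if abs(check - estimate) <= tol * max(estimate, 1.0) or len(rr_sets) >= max_samples:
            return seeds, check
        # "Stop" at the next exponential checkpoint: the pool doubles in size.
        rr_sets += fresh


# Out-star: node 0 points to nodes 1..4; with p = 1 node 0 reaches everyone.
print(stop_and_stare_sketch([[], [0], [0], [0], [0]], k=1, p=1.0))
```

Doubling the pool at each checkpoint means the number of checks grows only logarithmically with the final sample size, which is what makes checking at exponential points cheap.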

Submitted 22 February, 2017; v1 submitted 25 May, 2016; originally announced May 2016.

Comments: Correct the errors in the proofs for SSA/D-SSA. Update D-SSA to estimate ε(s) instead of δ(s)

arXiv:1602.01016

doi 10.1109/ICDM.2015.139

Network Clustering via Maximizing Modularity: Approximation Algorithms and Theoretical Limits

Authors: Thang N. Dinh, Xiang Li, My T. Thai

Abstract: Many social networks and complex systems are found to be naturally divided into clusters of densely connected nodes, known as community structure (CS). Finding CS is one of the fundamental yet challenging topics in network science. One of the most popular classes of methods for this problem is to maximize Newman's modularity. However, little is understood about how well we can approximate the maximum modularity, or about the implications of finding community structure with provable guarantees. In this paper, we definitively settle the approximability of modularity clustering, proving that approximating the problem within any (multiplicative) positive factor is intractable, unless P = NP. Yet we propose the first additive approximation algorithm for modularity clustering with a constant factor. Moreover, we provide a rigorous proof that a CS with modularity arbitrarily close to the maximum modularity $Q_{OPT}$ might bear no similarity to the optimal CS of maximum modularity. Thus, even when CS with near-optimal modularity are found, other verification methods are needed to confirm the significance of the structure.
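For reference, Newman's modularity of a partition can be computed per community as $Q = \sum_c \left[ \frac{L_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]$, where $m$ is the number of edges, $L_c$ the number of edges inside community $c$, and $d_c$ the total degree of its nodes. The sketch below assumes a simple undirected graph (no self-loops or duplicate edges) given as an edge list; `modularity` is a hypothetical helper, not code from the paper.

```python
from collections import defaultdict


def modularity(edges, community):
    """edges: list of undirected (u, v) pairs, no duplicates or self-loops.
    community: dict node -> community label. Returns Newman's modularity Q."""
    m = len(edges)
    intra = defaultdict(int)   # L_c: edges with both endpoints in community c
    degree = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by the bridge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}))
```

Splitting at the bridge gives $Q = 2\,(3/7 - 1/4) = 5/14 \approx 0.357$, while putting every node in one community always gives $Q = 0$.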

Submitted 2 February, 2016; originally announced February 2016.

Comments: Appeared in IEEE ICDM 2015

arXiv:1108.4034

Finding Community Structure with Performance Guarantees in Complex Networks

Authors: Thang N. Dinh, My T. Thai

Abstract: Many networks, including social networks, computer networks, and biological networks, are found to divide naturally into communities of densely connected individuals. Finding community structure is one of the fundamental problems in network science. Since Newman's suggestion of using \emph{modularity} as a measure to quantify the goodness of community structures, many efficient methods to maximize modularity have been proposed, but without a guarantee of optimality. In this paper, we propose two polynomial-time algorithms for the modularity maximization problem with theoretical performance guarantees. The first algorithm comes with an \emph{a priori guarantee} that the modularity of the found community structure is within a constant factor of the optimal modularity when the network has a power-law degree distribution. Despite being mainly of theoretical interest, to the best of our knowledge, this is the first approximation algorithm for finding community structure in networks. In our second algorithm, we propose a \emph{sparse metric}, a substantially faster linear programming method for maximizing modularity, and apply a rounding technique based on this sparse metric with an \emph{a posteriori approximation guarantee}. Our experiments show that the rounding algorithm returns the optimal solutions in most cases and is very scalable; that is, it can run on a network of a few thousand nodes, whereas the LP solution in the literature only ran on a network of at most 235 nodes.
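As background for the LP approach, a commonly used formulation from the earlier literature treats $d_{ij}$ as a distance that is $0$ when $i$ and $j$ share a community and $1$ otherwise, with $A$ the adjacency matrix, $k_i$ the degree of node $i$, and $m$ the number of edges. This is shown as an assumption about the baseline LP, not as the paper's sparse-metric formulation, whose exact constraint set is not given in the abstract:

```latex
\begin{aligned}
\max_{d}\quad & \frac{1}{2m} \sum_{i,j} B_{ij}\,(1 - d_{ij}),
  \qquad B_{ij} = A_{ij} - \frac{k_i k_j}{2m} \\
\text{s.t.}\quad & d_{ik} \le d_{ij} + d_{jk} \qquad \forall\, i, j, k \\
& d_{ij} = d_{ji}, \quad d_{ii} = 0, \quad 0 \le d_{ij} \le 1
\end{aligned}
```

With $d_{ij} \in \{0, 1\}$ the triangle inequalities force "same community" to be transitive, so the objective equals the modularity $Q$ of a partition. Relaxing to $[0, 1]$ gives an LP whose optimum upper-bounds $Q_{OPT}$, which is what an a posteriori guarantee can be measured against. The full relaxation carries $\Theta(n^3)$ triangle constraints, which helps explain why it has only been solved on small networks.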

Submitted 19 August, 2011; originally announced August 2011.

Showing 1–5 of 5 results for author: Dinh, T N