-
Experience versus Talent Shapes the Structure of the Web
Authors:
Joseph S. Kong,
Nima Sarshar,
Vwani P. Roychowdhury
Abstract:
We use sequential large-scale crawl data to empirically investigate and validate the dynamics that underlie the evolution of the structure of the web. We find that the overall structure of the web is defined by an intricate interplay between experience or entitlement of the pages (as measured by the number of inbound hyperlinks a page already has), inherent talent or fitness of the pages (as measured by the likelihood that someone visiting the page would give a hyperlink to it), and the continual high rates of birth and death of pages on the web. We find that the web is conservative in judging talent and the overall fitness distribution is exponential, showing low variability. The small variance in talent, however, is enough to lead to experience distributions with high variance: The preferential attachment mechanism amplifies these small biases and leads to heavy-tailed power-law (PL) inbound degree distributions over all pages, as well as over pages that are of the same age. The balancing act between experience and talent on the web allows newly introduced pages with novel and interesting content to grow quickly and surpass older pages. In this regard, it is much like what we observe in high-mobility and meritocratic societies: People with entitlement continue to have access to the best resources, but there is just enough screening for fitness that allows for talented winners to emerge and join the ranks of the leaders. Finally, we show that the fitness estimates have potential practical applications in ranking query results.
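The interplay between experience and talent can be illustrated with a toy growth model. Each new page links to existing pages with probability proportional to fitness times (in-degree + 1), and fitness is drawn from an exponential distribution. This is a sketch under stated assumptions, not the authors' crawl-based estimation procedure. The function name `grow_fitness_network`, the +1 offset (which gives brand-new pages a nonzero chance of being linked), the unit mean fitness, and the omission of page deaths are all choices made here for illustration.

```python
import random


def grow_fitness_network(n_pages, m=2, mean_fitness=1.0, seed=0):
    """Toy fitness-weighted preferential attachment (illustrative assumption).

    Each arriving page links to up to m earlier pages, choosing targets with
    probability proportional to fitness * (in_degree + 1). Page deaths are
    not modelled. Returns parallel lists (fitness, in_degree).
    """
    rng = random.Random(seed)
    fitness, in_degree = [], []
    for t in range(n_pages):
        if t > 0:
            # "Talent" (fitness) multiplies "experience" (current in-degree + 1).
            weights = [f * (k + 1) for f, k in zip(fitness, in_degree)]
            # Sampling is with replacement, so duplicate picks collapse and a
            # page may end up with fewer than m distinct out-links.
            picks = rng.choices(range(t), weights=weights, k=min(m, t))
            for target in set(picks):
                in_degree[target] += 1
        # The new page gets an exponentially distributed fitness (low variability).
        fitness.append(rng.expovariate(1.0 / mean_fitness))
        in_degree.append(0)
    return fitness, in_degree
```

Comparing the mean in-degree of high- and low-fitness pages of similar age is one way to see a modest spread in fitness being amplified by preferential attachment.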
Submitted 2 January, 2009;
originally announced January 2009.
-
A Symphony Conducted by Brunet
Authors:
P. Oscar Boykin,
Jesse S. A. Bridgewater,
Joseph S. Kong,
Kamen M. Lozev,
Behnam A. Rezaei,
Vwani P. Roychowdhury
Abstract:
We introduce BruNet, a general P2P software framework, which we use to produce the first implementation of Symphony, a 1-D Kleinberg small-world architecture. Our framework is designed to make it easy to implement and measure different P2P protocols over different transport layers such as TCP or UDP. This paper discusses our implementation of the Symphony network, which allows each node to keep $k \le \log N$ shortcut connections and to route to any other node with a short average delay of $O(\frac{1}{k}\log^2 N)$. This provides a continuous trade-off between node degree and routing latency. We present experimental results from several PlanetLab deployments of up to 1060 nodes. These successful deployments represent some of the largest PlanetLab deployments of P2P overlays found in the literature, and they demonstrate our implementation's robustness to massive node dynamics in a WAN environment.
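The $O(\frac{1}{k}\log^2 N)$ delay comes from greedy routing over shortcuts with harmonically distributed lengths. The sketch below is not BruNet code and makes simplifying assumptions:

- N nodes sit at evenly spaced ring positions, whereas real Symphony nodes pick random IDs.
- Each node keeps its ring successor plus k shortcuts whose clockwise length d has density proportional to 1/d (the harmonic distribution of the original Symphony design).
- Messages are routed greedily clockwise without overshooting.

All function names are hypothetical.

```python
import random


def build_symphony_ring(n, k, rng):
    """links[i] = sorted clockwise offsets of node i (successor + k shortcuts)."""
    links = []
    for _ in range(n):
        offsets = {1}  # ring successor, guarantees progress
        for _ in range(k):
            # d = n**u with u ~ U[0, 1) gives density proportional to 1/d on [1, n).
            d = int(n ** rng.random())
            offsets.add(min(max(d, 1), n - 1))
        links.append(sorted(offsets))
    return links


def greedy_hops(links, src, dst):
    """Hops taken by greedy clockwise routing that never overshoots dst."""
    n = len(links)
    cur, hops = src, 0
    while cur != dst:
        remaining = (dst - cur) % n
        # Longest available link that does not pass the destination.
        step = max(o for o in links[cur] if o <= remaining)
        cur = (cur + step) % n
        hops += 1
    return hops


def mean_hops(n, k, trials=2000, seed=0):
    """Average greedy hop count over random (src, dst) pairs on one random ring."""
    rng = random.Random(seed)
    links = build_symphony_ring(n, k, rng)
    total = 0
    for _ in range(trials):
        total += greedy_hops(links, rng.randrange(n), rng.randrange(n))
    return total / trials
```

Running `mean_hops` for increasing k at a fixed N should show the average hop count falling as nodes keep more shortcuts, which is the degree/latency trade-off mentioned above.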
Submitted 25 September, 2007;
originally announced September 2007.
-
A General Framework for Scalability and Performance Analysis of DHT Routing Systems
Authors:
Joseph S. Kong,
Jesse S. A. Bridgewater,
Vwani P. Roychowdhury
Abstract:
In recent years, many DHT-based P2P systems have been proposed and analyzed, and certain deployments have reached a global scale with nearly one million nodes. One is thus faced with the question of which particular DHT system to choose, and whether some are inherently more robust and scalable than others.
Toward developing such a comparative framework, we present the reachable component method (RCM) for analyzing the performance of different DHT routing systems subject to random failures. We apply RCM to five DHT systems and obtain analytical expressions that characterize their routability as a continuous function of system size and node failure probability. An important consequence is that, in the large-network limit, the routability of certain DHT systems goes to zero for any non-zero probability of node failure. These DHT routing algorithms are therefore unscalable, while some others, including Kademlia, which powers the popular eDonkey P2P system, are found to be scalable.
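A deliberately simple case shows how routability can vanish in the large-network limit. It is chosen here for illustration and is not one of the paper's five RCM derivations. Assume the following:

- The ID space of $N = 2^d$ nodes is fully populated.
- A message corrects differing ID bits in a fixed order (most significant first), so every route is unique.
- An h-hop route succeeds only if its h-1 intermediate nodes are alive.
- Each node fails independently with probability q.

Averaging over destinations gives

$$r(d, q) = \frac{1}{2^d - 1}\sum_{h=1}^{d}\binom{d}{h}(1-q)^{h-1} = \frac{(2-q)^d - 1}{(1-q)(2^d - 1)} \quad (q < 1),$$

which tends to zero as d grows for any q > 0, because $((2-q)/2)^d$ does. The code evaluates the sum and checks it with a Monte Carlo simulation of the same model. The function names are hypothetical.

```python
import math
import random


def tree_routability_exact(d, q):
    """Fraction of routable (source, destination) pairs in the toy model above."""
    n, p = 2 ** d, 1.0 - q
    total = sum(math.comb(d, h) * p ** (h - 1) for h in range(1, d + 1))
    return total / (n - 1)


def tree_routability_mc(d, q, trials=20000, seed=1):
    """Monte Carlo estimate: route with fixed-order bit fixing, fail on dead hops."""
    rng = random.Random(seed)
    n = 2 ** d
    successes = 0
    for _ in range(trials):
        src = rng.randrange(n)
        dst = rng.randrange(n - 1)
        if dst >= src:
            dst += 1  # uniform destination distinct from the source
        cur, routed = src, True
        for bit in reversed(range(d)):  # most significant bit first
            mask = 1 << bit
            if (cur ^ dst) & mask:
                cur ^= mask
                # Only intermediate nodes can fail; the destination is live.
                if cur != dst and rng.random() < q:
                    routed = False
                    break
        successes += routed
    return successes / trials
```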
Submitted 28 March, 2006;
originally announced March 2006.
-
Let Your CyberAlter Ego Share Information and Manage Spam
Authors:
Joseph S. Kong,
P. Oscar Boykin,
Behnam A. Rezaei,
Nima Sarshar,
Vwani P. Roychowdhury
Abstract:
Almost all of us have multiple cyberspace identities, and these cyberalter egos are networked together to form a vast cyberspace social network. This network is distinct from the World Wide Web (WWW), which is queried and mined to the tune of billions of dollars every day, and until recently it had gone largely unexplored. Empirically, cyberspace social networks have been found to possess many of the same complex features that characterize their real-world counterparts, including scale-free degree distributions, low diameter, and extensive connectivity. We show that these topological features make the latent networks particularly suitable for exploration and management via local-only messaging protocols. Cyberalter egos can communicate via their direct links (i.e., using only their own address books) and set up a highly decentralized and scalable message-passing network that can allow large-scale sharing of information and data. As one particular example of such collaborative systems, we provide a design of a spam filtering system. Our large-scale simulations show that the system achieves a spam detection rate close to 100% while keeping the false positive rate around zero. This system has several advantages over other recent proposals: (i) it uses an already existing network, created by the same social dynamics that govern our daily lives, so no dedicated peer-to-peer (P2P) or centralized server-based systems need be constructed; (ii) it uses a percolation search algorithm that makes the query-generated traffic scalable; (iii) the network has a built-in trust system (just as in social networks) that can be used to thwart malicious attacks; (iv) it can be implemented right now as a plugin to popular email programs, such as MS Outlook, Eudora, and Sendmail.
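Percolation search is named as the mechanism that keeps query traffic scalable. What follows is a much-simplified sketch of one ingredient of that approach: a bond-percolation broadcast in which each social link forwards a query independently with probability p. It is applied to a hypothetical spam check. Each node keeps a set of digests of messages its owner has flagged, and a query asks whether any reached node has flagged the same digest.

The digest function (SHA-256 of lowercased, whitespace-normalized text), the dictionary-based graph layout, and all names are assumptions. As we understand it, the full percolation search algorithm also uses random-walk steps to cache content and implant queries, and those steps are omitted here.

```python
import hashlib
import random
from collections import deque


def digest(message: str) -> str:
    """Hypothetical message fingerprint: hash of normalized text."""
    normalized = " ".join(message.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def percolation_query(graph, start, p, rng):
    """Nodes reached when each link is crossed independently with probability p.

    A link is only ever tried from the first of its endpoints to be reached,
    so the reached set is a sample of start's bond-percolation cluster.
    """
    reached = {start}
    frontier = deque([start])
    while frontier:
        u = frontier.popleft()
        for v in graph[u]:
            if v not in reached and rng.random() < p:
                reached.add(v)
                frontier.append(v)
    return reached


def is_flagged(graph, flagged_by, start, message, p, seed=0):
    """True if any node reached by the query has flagged this message's digest."""
    rng = random.Random(seed)
    d = digest(message)
    return any(d in flagged_by.get(node, set())
               for node in percolation_query(graph, start, p, rng))
```

Only the contacts' address-book links (`graph[u]`) are used, which mirrors the local-only messaging idea. In this simplified sketch, p trades query traffic against the chance of reaching a node that has already flagged the message.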
Submitted 7 May, 2005; v1 submitted 4 April, 2005;
originally announced April 2005.