-
A Reconfigurable Relay for Polarization Encoded QKD Networks
Authors:
Jing Wang,
Bernardo A. Huberman
Abstract:
We propose a method for reconfiguring a relay node for polarization encoded quantum key distribution (QKD) networks. The relay can be switched between trusted and untrusted modes to adapt to different network conditions, relay distances, and security requirements. This not only extends the distance over which a QKD network operates but also enables point-to-multipoint (P2MP) network topologies. The proposed architecture centralizes the expensive and delicate single-photon detectors (SPDs) at the relay node with eased maintenance and cooling while simplifying each user node so that it only needs commercially available devices for low-cost qubit preparation.
Submitted 2 June, 2021;
originally announced June 2021.
-
A Bayesian Approach to the Partitioning of Workflows
Authors:
Freddy C. Chua,
Bernardo A. Huberman
Abstract:
When partitioning workflows in realistic scenarios, the knowledge of the processing units is often vague or unknown. A naive approach to addressing this issue is to perform many controlled experiments for different workloads, each consisting of multiple trials in order to estimate the mean and variance of the specific workload. Since this controlled experimental approach can be quite costly in terms of time and resources, we propose a variant of the Gibbs Sampling algorithm that uses a sequence of Bayesian inference updates to estimate the processing characteristics of the processing units. Using the inferred characteristics of the processing units, we are able to determine the best way to split a workflow for processing it in parallel with the lowest expected completion time and least variance.
Submitted 2 November, 2015;
originally announced November 2015.
-
Partitioning Uncertain Workflows
Authors:
Bernardo A. Huberman,
Freddy C. Chua
Abstract:
It is common practice to partition complex workflows into separate channels in order to speed up their completion times. When this is done within a distributed environment, unavoidable fluctuations make individual realizations depart from the expected average gains. We present a method for breaking any complex workflow into several workloads in such a way that once their outputs are joined, their full completion takes less time and exhibits smaller variance than when running in only one channel. We demonstrate the effectiveness of this method in two different scenarios: the optimization of a convex function and the transmission of a large computer file over the Internet.
Submitted 1 July, 2015;
originally announced July 2015.
-
Attention decay in science
Authors:
Pietro Della Briotta Parolo,
Raj Kumar Pan,
Rumi Ghosh,
Bernardo A. Huberman,
Kimmo Kaski,
Santo Fortunato
Abstract:
The exponential growth in the number of scientific papers makes it increasingly difficult for researchers to keep track of all the publications relevant to their work. Consequently, the attention that can be devoted to individual papers, measured by their citation counts, is bound to decay rapidly. In this work we make a thorough study of the life-cycle of papers in different disciplines. Typically, the citation rate of a paper increases up to a few years after its publication, reaches a peak and then decreases rapidly. This decay can be described by an exponential or a power law behavior, as in ultradiffusive processes, with exponential fitting better than power law for the majority of cases. The decay is also becoming faster over the years, signaling that nowadays papers are forgotten more quickly. However, when time is counted in terms of the number of published papers, the rate of decay of citations is fairly independent of the period considered. This indicates that the attention of scholars depends on the number of published items, and not on real time.
Submitted 23 November, 2015; v1 submitted 6 March, 2015;
originally announced March 2015.
-
Deciding what to display: maximizing the information value of social media
Authors:
Sandra Servia-Rodríguez,
Bernardo A. Huberman,
Sitaram Asur
Abstract:
In information-rich environments, the competition for users' attention leads to a flood of content from which people often find it hard to sort out the most relevant and useful pieces. Using Twitter as a case study, we applied an attention economy solution to generate the most informative tweets for its users. By considering the novelty and popularity of tweets as objective measures of their relevance and utility, we used the Huberman-Wu algorithm to automatically select the ones that will receive the most attention in the next time interval. Their predicted popularity was confirmed by using Twitter data collected for a period of 2 months.
Submitted 12 November, 2014;
originally announced November 2014.
-
Detecting Flow Anomalies in Distributed Systems
Authors:
Freddy Chong Tat Chua,
Ee-Peng Lim,
Bernardo A. Huberman
Abstract:
Deep within the networks of distributed systems, one often finds anomalies that affect their efficiency and performance. These anomalies are difficult to detect because the distributed systems may not have sufficient sensors to monitor the flow of traffic within the interconnected nodes of the networks. Without early detection and correction, these anomalies may worsen over time and could possibly cause disastrous outcomes in the system in the unforeseeable future. Using only coarse-grained information from the two end points of network flows, we propose a network transmission model and a localization algorithm to detect the location of anomalies and rank them using a proposed metric within distributed systems. We evaluate our approach on passengers' records of an urbanized city's public transportation system and correlate our findings with passengers' postings on social media microblogs. Our experiments show that the metric derived using our localization algorithm gives a better ranking of anomalies as compared to standard deviation measures from statistical models. Our case studies also demonstrate that transportation events reported in social media microblogs match the locations of our detected anomalies, suggesting that our algorithm performs well in locating the anomalies within distributed systems.
Submitted 8 December, 2014; v1 submitted 22 July, 2014;
originally announced July 2014.
-
Dynamics of Trends and Attention in Chinese Social Media
Authors:
Louis Lei Yu,
Sitaram Asur,
Bernardo A. Huberman
Abstract:
There has been a tremendous rise in the growth of online social networks all over the world in recent years. It has facilitated users to generate a large amount of real-time content at an incessant rate, all competing with each other to attract enough attention and become popular trends. While Western online social networks such as Twitter have been well studied, the popular Chinese microblogging network Sina Weibo has had relatively lower exposure. In this paper, we analyze in detail the temporal aspect of trends and trend-setters in Sina Weibo, contrasting it with earlier observations in Twitter. We find that there is a vast difference in the content shared in China when compared to a global social network such as Twitter. In China, the trends are created almost entirely due to the retweets of media content such as jokes, images and videos, unlike Twitter where it has been shown that the trends tend to have more to do with current global events and news stories. We take a detailed look at the formation, persistence and decay of trends and examine the key topics that trend in Sina Weibo. One of our key findings is that retweets are much more common in Sina Weibo and contribute a lot to creating trends. When we look closer, we observe that most trends in Sina Weibo are due to the continuous retweets of a small percentage of fraudulent accounts. These fake accounts are set up to artificially inflate certain posts, causing them to shoot up into Sina Weibo's trending list, which are in turn displayed as the most popular topics to users.
Submitted 2 December, 2013;
originally announced December 2013.
-
Semantic Stability in Social Tagging Streams
Authors:
Claudia Wagner,
Philipp Singer,
Markus Strohmaier,
Bernardo A. Huberman
Abstract:
One potential disadvantage of social tagging systems is that due to the lack of a centralized vocabulary, a crowd of users may never manage to reach a consensus on the description of resources (e.g., books, users or songs) on the Web. Yet, previous research has provided interesting evidence that the tag distributions of resources may become semantically stable over time as more and more users tag them. At the same time, previous work has raised an array of new questions such as: (i) How can we assess the semantic stability of social tagging systems in a robust and methodical way? (ii) Does semantic stabilization of tags vary across different social tagging systems and ultimately, (iii) what are the factors that can explain semantic stabilization in such systems? In this work we tackle these questions by (i) presenting a novel and robust method which overcomes a number of limitations in existing methods, (ii) empirically investigating semantic stabilization processes in a wide range of social tagging systems with distinct domains and properties and (iii) detecting potential causes for semantic stabilization, specifically imitation behavior, shared background knowledge and intrinsic properties of natural language. Our results show that tagging streams which are generated by a combination of imitation dynamics and shared background knowledge exhibit faster and higher semantic stability than tagging streams which are generated via imitation dynamics or natural language streams alone.
Submitted 5 November, 2013;
originally announced November 2013.
-
Information Relaxation is Ultradiffusive
Authors:
Rumi Ghosh,
Bernardo A. Huberman
Abstract:
We investigate how the overall response to a piece of information (a story or an article) evolves and relaxes as a function of time in social networks like Reddit, Digg and Youtube. This response or popularity is measured in terms of the number of votes/comments that the story (or article) accrued over time. We find that the temporal evolution of popularity can be described by a universal function whose parameters depend upon the system under consideration. Unlike most previous studies, which empirically investigated the dynamics of voting behavior, we also give a theoretical interpretation of the observed behavior using ultradiffusion.
Whether it is the inter-arrival time between two consecutive votes on a story on Reddit or the comments on a video shared on Youtube, there is always a hierarchy of time scales in information propagation. One vote/comment might occur almost simultaneously with the previous, whereas another vote/comment might occur hours after the preceding one. This hierarchy of time scales leads us to believe that the dynamical response of users to information is ultradiffusive in nature. We show that an ultradiffusion-based stochastic process can be used to rationalize the observed temporal evolution.
Submitted 19 March, 2014; v1 submitted 9 October, 2013;
originally announced October 2013.
-
How Random are Online Social Interactions?
Authors:
Chunyan Wang,
Bernardo A. Huberman
Abstract:
The massive amounts of data that social media generates have facilitated the study of online human behavior on a scale unimaginable a few years ago. At the same time, the much discussed apparent randomness with which people interact online makes it appear as if these studies cannot reveal predictive social behaviors that could be used for developing better platforms and services. We use two large social databases to measure the mutual information entropy that both individual and group actions generate as they evolve over time. We show that users' interaction sequences have strong deterministic components, in contrast with existing assumptions and models. In addition, we show that individual interactions are more predictable when users act on their own rather than when attending group activities.
Submitted 19 July, 2012; v1 submitted 16 July, 2012;
originally announced July 2012.
-
A Market for Unbiased Private Data: Paying Individuals According to their Privacy Attitudes
Authors:
Christina Aperjis,
Bernardo A. Huberman
Abstract:
Since there is, in principle, no reason why third parties should not pay individuals for the use of their data, we introduce a realistic market that would allow these payments to be made while taking into account the privacy attitude of the participants. And since it is usually important to use unbiased samples to obtain credible statistical results, we examine the properties that such a market should have and suggest a mechanism that compensates those individuals that participate according to their risk attitudes. Equally important, we show that this mechanism also benefits buyers, as they pay less for the data than they would if they compensated all individuals with the same maximum fee that the most concerned ones expect.
Submitted 30 April, 2012;
originally announced May 2012.
-
From User Comments to On-line Conversations
Authors:
Chunyan Wang,
Mao Ye,
Bernardo A. Huberman
Abstract:
We present an analysis of user conversations in on-line social media and their evolution over time. We propose a dynamic model that accurately predicts the growth dynamics and structural properties of conversation threads. The model successfully reconciles the differing observations that have been reported in existing studies. By separating artificial factors from user behaviors, we show that there are actually underlying rules in common for on-line conversations in different social media websites. Results of our model are supported by empirical measurements throughout a number of different social media websites.
Submitted 31 March, 2012;
originally announced April 2012.
-
The Pulse of News in Social Media: Forecasting Popularity
Authors:
Roja Bandari,
Sitaram Asur,
Bernardo A. Huberman
Abstract:
News articles are extremely time sensitive by nature. There is also intense competition among news items to propagate as widely as possible. Hence, the task of predicting the popularity of news items on the social web is both interesting and challenging. Prior research has dealt with predicting eventual online popularity based on early popularity. It is most desirable, however, to predict the popularity of items prior to their release, fostering the possibility of appropriate decision making to modify an article and the manner of its publication. In this paper, we construct a multi-dimensional feature space derived from properties of an article and evaluate the efficacy of these features to serve as predictors of online popularity. We examine both regression and classification algorithms and demonstrate that despite randomness in human behavior, it is possible to predict ranges of popularity on Twitter with an overall 84% accuracy. Our study also serves to illustrate the differences between traditionally prominent sources and those immensely popular on the social web.
Submitted 1 February, 2012;
originally announced February 2012.
-
Artificial Inflation: The True Story of Trends in Sina Weibo
Authors:
Louis Yu,
Sitaram Asur,
Bernardo A. Huberman
Abstract:
There has been a tremendous rise in the growth of online social networks all over the world in recent years. This has facilitated users to generate a large amount of real-time content at an incessant rate, all competing with each other to attract enough attention and become trends. While Western online social networks such as Twitter have been well studied, characteristics of the popular Chinese microblogging network Sina Weibo have not been. In this paper, we analyze in detail the temporal aspect of trends and trend-setters in Sina Weibo, contrasting it with earlier observations on Twitter. First, we look at the formation, persistence and decay of trends and examine the key topics that trend in Sina Weibo. One of our key findings is that retweets are much more common in Sina Weibo and contribute a lot to creating trends. When we look closer, we observe that a large percentage of trends in Sina Weibo are due to the continuous retweets of a small number of fraudulent accounts. These fake accounts are set up to artificially inflate certain posts, causing them to shoot up into Sina Weibo's trending list, which are in turn displayed as the most popular topics to users.
Submitted 1 February, 2012;
originally announced February 2012.
-
Swayed by Friends or by the Crowd?
Authors:
Zeinab Abbassi,
Christina Aperjis,
Bernardo A. Huberman
Abstract:
We have conducted three empirical studies of the effects of friend recommendations and general ratings on how online users make choices. These two components of social influence were investigated through user studies on Mechanical Turk. We find that for a user deciding between two choices an additional rating star has a much larger effect than an additional friend's recommendation on the probability of selecting an item. Equally important, negative opinions from friends are more influential than positive opinions, and people exhibit more random behavior in their choices when the decision involves less cost and risk. Our results can be generalized across different demographics, implying that individuals trade off recommendations from friends and ratings in a similar fashion.
Submitted 4 November, 2011; v1 submitted 1 November, 2011;
originally announced November 2011.
-
Long Trend Dynamics in Social Media
Authors:
Chunyan Wang,
Bernardo A. Huberman
Abstract:
A main characteristic of social media is that its diverse content, copiously generated by both standard outlets and general users, constantly competes for the scarce attention of large audiences. Out of this flood of information some topics manage to get enough attention to become the most popular ones and thus to be prominently displayed as trends. Equally important, some of these trends persist long enough so as to shape part of the social agenda. How this happens is the focus of this paper. By introducing a stochastic dynamical model that takes into account the user's repeated involvement with given topics, we can predict the distribution of trend durations as well as the thresholds in popularity that lead to their emergence within social media. Detailed measurements of datasets from Twitter confirm the validity of the model and its predictions.
Submitted 20 December, 2011; v1 submitted 8 September, 2011;
originally announced September 2011.
-
To Switch or Not To Switch: Understanding Social Influence in Recommender Systems
Authors:
Haiyi Zhu,
Bernardo A. Huberman,
Yarun Luon
Abstract:
We designed and ran an experiment to test how often people's choices are reversed by others' recommendations when facing different levels of confirmation and conformity pressures. In our experiment participants were first asked to provide their preferences between pairs of items. They were then asked to make second choices about the same pairs with knowledge of others' preferences. Our results show that other people's opinions significantly sway people's own choices. The influence is stronger when people are required to make their second decision sometime later (22.4%) than immediately (14.1%). Moreover, people are most likely to reverse their choices when facing a moderate number of opposing opinions. Finally, the time people spend making the first decision significantly predicts whether they will reverse their decisions later on, while demographics such as age and gender do not. These results have implications for consumer behavior research as well as online marketing strategies.
Submitted 29 August, 2011; v1 submitted 25 August, 2011;
originally announced August 2011.
-
Collective Attention and the Dynamics of Group Deals
Authors:
Mao Ye,
Chunyan Wang,
Christina Aperjis,
Bernardo A. Huberman,
Thomas Sandholm
Abstract:
We present a study of the group purchasing behavior of daily deals in Groupon and LivingSocial and introduce a predictive dynamic model of collective attention for group buying behavior. In our model, the aggregate number of purchases at a given time comprises two types of processes: random discovery and social propagation. We find that these processes are very clearly separated by an inflection point. Using large data sets from both Groupon and LivingSocial we show how the model is able to predict the success of group deals as a function of time. We find that Groupon deals are easier to predict accurately earlier in the deal lifecycle than LivingSocial deals due to the final number of deal purchases saturating quicker. One possible explanation for this is that the incentive to socially propagate a deal is based on an individual threshold in LivingSocial, whereas in Groupon it is based on a collective threshold, which is reached very early. Furthermore, the personal benefit of propagating a deal is also greater in LivingSocial.
Submitted 28 November, 2011; v1 submitted 22 July, 2011;
originally announced July 2011.
-
What Trends in Chinese Social Media
Authors:
Louis Yu,
Sitaram Asur,
Bernardo A. Huberman
Abstract:
There has been a tremendous rise in the growth of online social networks all over the world in recent times. While some networks like Twitter and Facebook have been well documented, the popular Chinese microblogging social network Sina Weibo has not been studied. In this work, we examine the key topics that trend on Sina Weibo and contrast them with our observations on Twitter. We find that there is a vast difference in the content shared in China, when compared to a global social network such as Twitter. In China, the trends are created almost entirely due to retweets of media content such as jokes, images and videos, whereas on Twitter, the trends tend to have more to do with current global events and news stories.
Submitted 18 July, 2011;
originally announced July 2011.
-
Trends in Social Media: Persistence and Decay
Authors:
Sitaram Asur,
Bernardo A. Huberman,
Gabor Szabo,
Chunyan Wang
Abstract:
Social media generates a prodigious wealth of real-time content at an incessant rate. From all the content that people create and share, only a few topics manage to attract enough attention to rise to the top and become temporal trends which are displayed to users. The question of what factors cause the formation and persistence of trends is an important one that has not been answered yet. In this paper, we conduct an intensive study of trending topics on Twitter and provide a theoretical basis for the formation, persistence and decay of trends. We also demonstrate empirically how factors such as user activity and number of followers do not contribute strongly to trend creation and its propagation. In fact, we find that the resonance of the content with the users of the social network plays a major role in causing trends.
Submitted 7 February, 2011;
originally announced February 2011.
-
Social Attention and the Provider's Dilemma
Authors:
Christina Aperjis,
Bernardo A. Huberman
Abstract:
While attracting attention is one of the prime goals of content providers, the conversion of that attention into revenue is by no means obvious. Given that most users expect to consume web content for free, a provider with an established audience faces a dilemma. Since the introduction of advertisements or subscription fees will be construed by users as an inconvenience which may lead them to stop using the site, what should the provider do in order to maximize revenues? We address this question through the lens of adaptation theory, which states that even though a change affects a person's utility initially, as time goes on people tend to adapt and become less aware of past changes. We establish that if the likelihood of continuing to attend to the provider after an increase in inconvenience is log-concave in the magnitude of the increase, then the provider faces a tradeoff between achieving a higher revenue per user sooner and maximizing the number of users in the long term. On the other hand, if the likelihood of continuing to attend to the provider after an increase in inconvenience is log-convex, then it is always optimal for the provider to perform the increase in one step.
Submitted 27 September, 2010;
originally announced September 2010.
-
Human Speed-Accuracy Tradeoffs in Search
Authors:
Christina Aperjis,
Bernardo A. Huberman,
Fang Wu
Abstract:
When foraging for information, users face a tradeoff between the accuracy and value of the acquired information and the time spent collecting it, a problem which also surfaces when seeking answers to a question posed to a large community. We empirically study how people behave when facing these conflicting objectives using data from Yahoo Answers, a community driven question-and-answer site. We first study how users behave when trying to maximize the amount of acquired information while minimizing the waiting time. We find that users are willing to wait longer for an additional answer if they have received a small number of answers. We then assume that users make a sequence of decisions, deciding to wait for an additional answer as long as the quality of the current answer exceeds some threshold. The resulting probability distribution for the number of answers that a question gets is an inverse Gaussian, a fact that is validated by our data.
Submitted 27 August, 2010;
originally announced August 2010.
-
Influence and Passivity in Social Media
Authors:
Daniel M. Romero,
Wojciech Galuba,
Sitaram Asur,
Bernardo A. Huberman
Abstract:
The ever-increasing amount of information flowing through Social Media forces the members of these networks to compete for attention and influence by relying on other people to spread their message. A large study of information propagation within Twitter reveals that the majority of users act as passive information consumers and do not forward the content to the network. Therefore, in order for individuals to become influential they must not only obtain attention and thus be popular, but also overcome user passivity. We propose an algorithm that determines the influence and passivity of users based on their information forwarding activity. An evaluation performed with a 2.5 million user dataset shows that our influence measure is a good predictor of URL clicks, outperforming several other measures that do not explicitly take user passivity into account. We also explicitly demonstrate that high popularity does not necessarily imply high influence and vice-versa.
Submitted 6 August, 2010;
originally announced August 2010.
-
Predicting the Future with Social Media
Authors:
Sitaram Asur,
Bernardo A. Huberman
Abstract:
In recent years, social media has become ubiquitous and important for social networking and content sharing. And yet, the content that is generated from these websites remains largely untapped. In this paper, we demonstrate how social media content can be used to predict real-world outcomes. In particular, we use the chatter from Twitter.com to forecast box-office revenues for movies. We show that a simple model built from the rate at which tweets are created about particular topics can outperform market-based predictors. We also demonstrate how sentiments extracted from Twitter can be used to improve the forecasting power of social media.
Submitted 29 March, 2010;
originally announced March 2010.
-
Harvesting Collective Intelligence: Temporal Behavior in Yahoo Answers
Authors:
Christina Aperjis,
Bernardo A. Huberman,
Fang Wu
Abstract:
When harvesting collective intelligence, a user wishes to maximize the accuracy and value of the acquired information without spending too much time collecting it. We empirically study how people behave when facing these conflicting objectives using data from Yahoo Answers, a community driven question-and-answer site. We take two complementary approaches. We first study how users behave when trying to maximize the amount of the acquired information, while minimizing the waiting time. We identify and quantify how question authors at Yahoo Answers trade off the number of answers they receive and the cost of waiting. We find that users are willing to wait more to obtain an additional answer when they have only received a small number of answers; this implies decreasing marginal returns in the amount of collected information. We also estimate the user's utility function from the data. Our second approach focuses on how users assess the qualities of the individual answers without explicitly considering the cost of waiting. We assume that users make a sequence of decisions, deciding to wait for an additional answer as long as the quality of the current answer exceeds some threshold. Under this model, the probability distribution for the number of answers that a question gets is an inverse Gaussian, which is a Zipf-like distribution. We use the data to validate this conclusion.
Submitted 13 January, 2010;
originally announced January 2010.
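
For readers unfamiliar with the inverse Gaussian distribution named in the abstract above, the following is a minimal sketch of its standard density. The function name and the parameter values are illustrative assumptions only; nothing here is taken from the paper's fitted model or data.

```python
import math


def inverse_gaussian_pdf(x, mu, lam):
    """Standard inverse Gaussian density with mean mu > 0 and shape lam > 0.

    Illustrative helper only; the parameter values used with it are
    hypothetical and are not estimates from the paper.
    """
    if mu <= 0 or lam <= 0:
        raise ValueError("mu and lam must be positive")
    if x <= 0:
        return 0.0  # the distribution is supported on x > 0
    coeff = math.sqrt(lam / (2.0 * math.pi * x ** 3))
    exponent = -lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x)
    return coeff * math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the shape of the curve.
    for x in (0.5, 1.0, 2.0, 5.0, 10.0):
        print(x, inverse_gaussian_pdf(x, mu=3.0, lam=2.0))
```

The density is right-skewed with a long tail, so most of the mass sits at small values while large values remain possible.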
-
Feedback loops of attention in peer production
Authors:
Fang Wu,
Dennis M. Wilkinson,
Bernardo A. Huberman
Abstract:
A significant percentage of online content is now published and consumed via the mechanism of crowdsourcing. While any user can contribute to these forums, a disproportionately large percentage of the content is submitted by very active and devoted users, whose continuing participation is key to the sites' success. As we show, people's propensity to keep participating increases the more they contribute, suggesting motivating factors which increase over time. This paper demonstrates that submitters who stop receiving attention tend to stop contributing, while prolific contributors attract an ever increasing number of followers and their attention in a feedback loop. We demonstrate that this mechanism leads to the observed power law in the number of contributions per user and support our assertions by an analysis of hundreds of millions of contributions to top content sharing websites Digg.com and Youtube.com.
Submitted 11 May, 2009;
originally announced May 2009.
-
Persistence and Success in the Attention Economy
Authors:
Fang Wu,
Bernardo A. Huberman
Abstract:
A hallmark of the attention economy is the competition for the attention of others. Thus people persistently upload content to social media sites, hoping for the highly unlikely outcome of topping the charts and reaching a wide audience. And yet, an analysis of the production histories and success dynamics of 10 million videos from YouTube revealed that the more frequently an individual uploads content the less likely it is that it will reach a success threshold. This paradoxical result is further compounded by the fact that the average quality of submissions does increase with the number of uploads, with the likelihood of success less than that of playing a lottery.
Submitted 2 April, 2009;
originally announced April 2009.
-
Social networks that matter: Twitter under the microscope
Authors:
Bernardo A. Huberman,
Daniel M. Romero,
Fang Wu
Abstract:
Scholars, advertisers and political activists see massive online social networks as a representation of social interactions that can be used to study the propagation of ideas, social bond dynamics and viral marketing, among others. But the linked structures of social networks do not reveal actual interactions among people. Scarcity of attention and the daily rhythms of life and work make people default to interacting with those few that matter and that reciprocate their attention. A study of social interactions within Twitter reveals that the driver of usage is a sparse and hidden network of connections underlying the declared set of friends and followers.
Submitted 4 December, 2008;
originally announced December 2008.
-
Predicting the popularity of online content
Authors:
Gabor Szabo,
Bernardo A. Huberman
Abstract:
We present a method for accurately predicting the long time popularity of online content from early measurements of user access. Using two content sharing portals, Youtube and Digg, we show that by modeling the accrual of views and votes on content offered by these services we can predict the long-term dynamics of individual submissions from initial data. In the case of Digg, measuring access to given stories during the first two hours allows us to forecast their popularity 30 days ahead with remarkable accuracy, while downloads of Youtube videos need to be followed for 10 days to attain the same performance. The differing time scales of the predictions are shown to be due to differences in how content is consumed on the two portals: Digg stories quickly become outdated, while Youtube videos are still found long after they are initially submitted to the portal. We show that predictions are more accurate for submissions for which attention decays quickly, whereas predictions for evergreen content will be prone to larger errors.
Submitted 4 November, 2008;
originally announced November 2008.
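
The abstract above does not state which model is used to extrapolate from early measurements. As one plausible illustration, and purely as an assumption, the sketch below fits a straight line between the logarithm of early counts and the logarithm of later counts and uses it to extrapolate. The function names and the data are made up for the example and are not the authors' method or measurements.

```python
import math


def fit_log_linear(early_counts, late_counts):
    """Least-squares fit of log(late) = intercept + slope * log(early).

    Hypothetical helper: the log-linear form is an assumption of this
    sketch, not something stated in the abstract.
    """
    xs = [math.log(e) for e in early_counts]
    ys = [math.log(l) for l in late_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_late(early_count, slope, intercept):
    """Extrapolate a later count from an early count using the fitted line."""
    return math.exp(intercept + slope * math.log(early_count))


if __name__ == "__main__":
    # Synthetic training data (invented): later counts are 10x early counts.
    early = [5, 20, 80, 320]
    late = [50, 200, 800, 3200]
    slope, intercept = fit_log_linear(early, late)
    print(predict_late(50, slope, intercept))
```

Working in log space keeps items with very different popularity on a comparable scale, which is why a log transform is a common choice for heavy-tailed count data.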
-
Crowdsourcing, Attention and Productivity
Authors:
Bernardo A. Huberman,
Daniel M. Romero,
Fang Wu
Abstract:
The tragedy of the digital commons does not prevent the copious voluntary production of content that one witnesses on the web. We show through an analysis of a massive data set from YouTube that the productivity observed in crowdsourcing exhibits a strong positive dependence on attention, measured by the number of downloads. Conversely, a lack of attention leads to a decrease in the number of videos uploaded and the consequent drop in productivity, which in many cases asymptotes to no uploads whatsoever. Moreover, uploaders compare themselves to others when having low productivity and to themselves when exceeding a threshold.
Submitted 17 September, 2008;
originally announced September 2008.
-
Novelty and Collective Attention
Authors:
Fang Wu,
Bernardo A. Huberman
Abstract:
The subject of collective attention is central to an information age where millions of people are inundated with daily messages. It is thus of interest to understand how attention to novel items propagates and eventually fades among large populations. We have analyzed the dynamics of collective attention among one million users of an interactive website -- digg.com -- devoted to thousands of novel news stories. The observations can be described by a dynamical model characterized by a single novelty factor. Our measurements indicate that novelty within groups decays with a stretched-exponential law, suggesting the existence of a natural time scale over which attention fades.
Submitted 9 April, 2007;
originally announced April 2007.
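
To make the stretched-exponential law mentioned in the abstract above concrete, here is a minimal sketch of the general functional form exp(-(t/tau)^beta). The function name and the values of `tau` and `beta` are hypothetical and chosen only for illustration; they are not the fitted values from the paper.

```python
import math


def stretched_exponential(t, tau, beta):
    """General stretched-exponential decay exp(-(t / tau) ** beta).

    tau sets the time scale and 0 < beta < 1 "stretches" the decay
    relative to a plain exponential. Parameter values used with this
    helper are illustrative assumptions, not the paper's estimates.
    """
    if t < 0 or tau <= 0 or beta <= 0:
        raise ValueError("require t >= 0, tau > 0, beta > 0")
    return math.exp(-((t / tau) ** beta))


if __name__ == "__main__":
    # Compare a stretched decay (beta = 0.5) with a plain exponential (beta = 1).
    for t in (0.0, 1.0, 4.0, 16.0):
        print(t, stretched_exponential(t, 1.0, 0.5), stretched_exponential(t, 1.0, 1.0))
```

With beta below 1 the curve drops faster than a plain exponential at very short times and more slowly at long times, while tau marks the point at which the value has fallen to 1/e.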
-
Assessing the Value of Cooperation in Wikipedia
Authors:
Dennis M. Wilkinson,
Bernardo A. Huberman
Abstract:
Since its inception six years ago, the online encyclopedia Wikipedia has accumulated 6.40 million articles and 250 million edits, contributed in a predominantly undirected and haphazard fashion by 5.77 million unvetted volunteers. Despite the apparent lack of order, the 50 million edits by 4.8 million contributors to the 1.5 million articles in the English-language Wikipedia follow certain strong overall regularities. We show that the accretion of edits to an article is described by a simple stochastic mechanism, resulting in a heavy tail of highly visible articles with a large number of edits. We also demonstrate a crucial correlation between article quality and number of edits, which validates Wikipedia as a successful collaborative effort.
Submitted 23 February, 2007;
originally announced February 2007.
-
Rhythms of social interaction: messaging within a massive online network
Authors:
Scott Golder,
Dennis M. Wilkinson,
Bernardo A. Huberman
Abstract:
We have analyzed the fully-anonymized headers of 362 million messages exchanged by 4.2 million users of Facebook, an online social network of college students, during a 26 month interval. The data reveal a number of strong daily and weekly regularities which provide insights into the time use of college students and their social lives, including seasonal variations. We also examined how factors such as school affiliation and informal online friend lists affect the observed behavior and temporal patterns. Finally, we show that Facebook users appear to be clustered by school with respect to their temporal messaging patterns.
Submitted 27 November, 2006;
originally announced November 2006.
-
Ensuring Trust in One Time Exchanges: Solving the QoS Problem
Authors:
Bernardo A. Huberman,
Fang Wu,
Li Zhang
Abstract:
We describe a pricing structure for the provision of IT services that ensures trust without requiring repeated interactions between service providers and users. It does so by offering a pricing structure that elicits truthful reporting of quality of service (QoS) by providers while making them profitable. This mechanism also induces truth-telling on the part of users reserving the service.
Submitted 8 December, 2005;
originally announced December 2005.
-
Bootstrapping the Long Tail in Peer to Peer Systems
Authors:
Bernardo A. Huberman,
Fang Wu
Abstract:
We describe an efficient incentive mechanism for P2P systems that generates a wide diversity of content offerings while responding adaptively to customer demand. Files are served and paid for through a parimutuel market similar to that commonly used for betting in horse races. An analysis of the performance of such a system shows that there exists an equilibrium with a long tail in the distribution of content offerings, which guarantees the real time provision of any content regardless of its popularity.
Submitted 8 December, 2005;
originally announced December 2005.
-
Management Fads, Pedagogies and Soft Technologies
Authors:
Jonathan Bendor,
Bernardo A. Huberman,
Fang Wu
Abstract:
We present a model for the diffusion of management fads and other technologies which lack clear objective evidence about their merits. The choices made by non-Bayesian adopters reflect both their own evaluations and the social influence of their peers. We show, both analytically and computationally, that the dynamics lead to outcomes that appear to be deterministic in spite of being governed by a stochastic process. In other words, when the objective evidence about a technology is weak, the evolution of this process quickly settles down to a fraction of adopters that is not predetermined. When the objective evidence is strong, the proportion of adopters is determined by the quality of the evidence and the adopters' competence.
Submitted 26 September, 2005;
originally announced September 2005.
-
The Dynamics of Viral Marketing
Authors:
Jure Leskovec,
Lada A. Adamic,
Bernardo A. Huberman
Abstract:
We present an analysis of a person-to-person recommendation network, consisting of 4 million people who made 16 million recommendations on half a million products. We observe the propagation of recommendations and the cascade sizes, which we explain by a simple stochastic model. We analyze how user behavior varies within user communities defined by a recommendation network. Product purchases follow a 'long tail' where a significant share of purchases belongs to rarely sold items. We establish how the recommendation network grows over time and how effective it is from the viewpoint of the sender and receiver of the recommendations. While on average recommendations are not very effective at inducing purchases and do not spread very far, we present a model that successfully identifies communities, product and pricing categories for which viral marketing seems to be very effective.
Submitted 20 April, 2007; v1 submitted 5 September, 2005;
originally announced September 2005.
-
Finding Communities of Related Genes
Authors:
Dennis Wilkinson,
Bernardo A. Huberman
Abstract:
We present an automated method of identifying communities of functionally related genes from the biomedical literature. These communities encapsulate human gene and protein interactions and identify groups of genes that are complementary in their function. We use graphs to represent the network of gene co-occurrences in articles mentioning particular keywords, and find that these graphs consist of one giant connected component and many small ones. In addition, the vertex degree distribution of the graphs follows a power law, whose exponent we determine. We then use an algorithm based on betweenness centrality to identify community structures within the giant component. The different structures are then aggregated into a final list of communities, whose members are weighted according to how strongly they belong to them. Our method is efficient enough to be applicable to the entire Medline database, and yet the information it extracts is significantly detailed, applicable to a particular problem, and interesting in and of itself. We illustrate the method in the case of colon cancer and demonstrate important features of the resulting communities.
Submitted 7 October, 2002;
originally announced October 2002.
-
Intentional Walks on Scale Free Small Worlds
Authors:
Amit R Puniyani,
Rajan M Lukose,
Bernardo A Huberman
Abstract:
We present a novel algorithm that generates scale-free small-world graphs such as those found in the World Wide Web, social and metabolic networks. We use the generated graphs to study the dynamics of a realistic search strategy on the graphs, and find that they can be navigated in a very small number of steps.
Submitted 11 July, 2001;
originally announced July 2001.