-
Breaking the Code: Multi-level Learning in the Eurovision Song Contest
Authors:
Luís A. Nunes Amaral,
Arthur Capozzi,
Dirk Helbing
Abstract:
Organizations learn from the market, political, and societal responses to their actions. While in some cases both the actions and responses take place in an open manner, in many others, some aspects may be hidden from external observers. The Eurovision Song Contest offers an interesting example to study organizational learning at two levels: organizers and participants. We find evidence for changes in the rules of the Contest in response to undesired outcomes such as runaway winners. We also find strong evidence of participant learning in the characteristics of competing songs over the 70 years of the Contest. English has been adopted as the lingua franca of the competing songs and pop has become the standard genre. The number of words in the lyrics has also grown in response to this collective learning. Remarkably, we find evidence that four participating countries have chosen to ignore the "lesson" that English lyrics increase winning probability. This choice is consistent with utility functions that award greater value to featuring national language than to winning the Contest. Indeed, we find evidence that some countries -- but not Germany -- appear to be less susceptible to "peer" pressure. These observations appear to be valid beyond Eurovision.
Submitted 15 May, 2025;
originally announced May 2025.
-
Widespread misidentification of SEM instruments in the peer-reviewed materials science and engineering literature
Authors:
Reese AK Richardson,
Jeonghyun Moon,
Spencer S Hong,
Luís A Nunes Amaral
Abstract:
Removed per arXiv policy. Please see version at https://doi.org/10.31219/osf.io/4wqcr
Submitted 27 August, 2024;
originally announced September 2024.
-
A new approach for extracting information from protein dynamics
Authors:
Jenny Liu,
Sinan Keten,
Luis A. N. Amaral
Abstract:
Increased ability to predict protein structures is moving research focus towards understanding protein dynamics. A promising approach is to represent protein dynamics through networks and take advantage of well-developed methods from network science. Most studies build protein dynamics networks from correlation measures, an approach that only works under very specific conditions, instead of the more robust inverse approach. Thus, we apply the inverse approach to the dynamics of protein dihedral angles, a system of internal coordinates, to avoid structural alignment. Using the well-characterized adhesion protein, FimH, we show that our method identifies networks that are physically interpretable, robust, and relevant to the allosteric pathway sites. We further use our approach to detect dynamical differences, despite structural similarity, for Siglec-8 in the immune system, and the SARS-CoV-2 spike protein. Our study demonstrates that using the inverse approach to extract a network from protein dynamics yields important biophysical insights.
Submitted 16 March, 2022;
originally announced March 2022.
-
Spreader events and the limitations of projected networks for capturing dynamics on multipartite networks
Authors:
Hyojun A. Lee,
Luiz G. A. Alves,
Luís A. Nunes Amaral
Abstract:
Many systems of scientific interest can be conceptualized as multipartite networks. Examples include the spread of sexually transmitted infections, scientific collaborations, human friendships, product recommendation systems, and metabolic networks. In practice, these systems are often studied after projection onto a single class of nodes, losing crucial information. Here, we address a significant knowledge gap by comparing transmission dynamics on temporal multipartite networks and on their time-aggregated unipartite projections to determine the impact of the lost information on our ability to predict the systems' dynamics. We show that the dynamics of transmission models can be dramatically dissimilar on multipartite networks and on their projections at three levels: final outcome, the magnitude of the variability from realization to realization, and overall shape of the temporal trajectory. We find that the ratio of the number of nodes to the number of active edges over the time aggregation scale determines the ability of projected networks to capture the dynamics on the multipartite network. Finally, we explore which properties of a multipartite network are crucial in generating synthetic networks that better reproduce the dynamical behavior observed in real multipartite networks.
Submitted 26 February, 2021;
originally announced March 2021.
-
Centrality anomalies in complex networks as a result of model over-simplification
Authors:
Luiz G. A. Alves,
Alberto Aleta,
Francisco A. Rodrigues,
Yamir Moreno,
Luis A. Nunes Amaral
Abstract:
Tremendous advances have been made in our understanding of the properties and evolution of complex networks. These advances were initially driven by information-poor empirical networks and theoretical analysis of unweighted and undirected graphs. Recently, information-rich empirical data on complex networks have supported the development of more sophisticated models that include edge directionality and weight properties, and multiple layers. Many studies still focus on unweighted undirected descriptions of networks, prompting an essential question: how to identify when a model is simpler than it must be? Here, we argue that the presence of centrality anomalies in complex networks is a result of model over-simplification. Specifically, we investigate the well-known anomaly in betweenness centrality for transportation networks, according to which highly connected nodes are not necessarily the most central. Using a broad class of network models with weights and spatial constraints and four large data sets of transportation networks, we show that the unweighted projection of the structure of these networks can exhibit a significant fraction of anomalous nodes compared to a random null model. However, the weighted projection of these networks, compared with an appropriate null model, significantly reduces the fraction of anomalies observed, suggesting that centrality anomalies are a symptom of model over-simplification. Because lack of information-rich data is a common challenge when dealing with complex networks and can cause anomalies that misestimate the role of nodes in the system, we argue that sufficiently sophisticated models should be used when anomalies are detected.
Submitted 13 March, 2020; v1 submitted 2 February, 2019;
originally announced February 2019.
-
A new evaluation framework for topic modeling algorithms based on synthetic corpora
Authors:
Hanyu Shi,
Martin Gerlach,
Isabel Diersen,
Doug Downey,
Luis A. N. Amaral
Abstract:
Topic models are in widespread use in natural language processing and beyond. Here, we propose a new framework for the evaluation of probabilistic topic modeling algorithms based on synthetic corpora containing an unambiguously defined ground truth topic structure. The major innovation of our approach is the ability to quantify the agreement between the planted and inferred topic structures by comparing the assigned topic labels at the level of the tokens. In experiments, our approach yields novel insights about the relative strengths of topic models as corpus characteristics vary, and the first evidence of an "undetectable phase" for topic models when the planted structure is weak. We also establish the practical relevance of the insights gained for synthetic corpora by predicting the performance of topic modeling algorithms in classification tasks in real-world corpora.
Submitted 28 January, 2019;
originally announced January 2019.
-
Long-range correlations and fractal dynamics in C. elegans: changes with aging and stress
Authors:
Luiz G. A. Alves,
Peter B. Winter,
Leonardo N. Ferreira,
Renée M. Brielmann,
Richard I. Morimoto,
Luís A. N. Amaral
Abstract:
Reduced motor control is one of the most frequent features associated with aging and disease. Nonlinear and fractal analyses have proved to be useful in investigating human physiological alterations with age and disease. Similar findings have not been established for any of the model organisms typically studied by biologists, though. If the physiology of a simpler model organism displays the same characteristics, this fact would open a new research window on the control mechanisms that organisms use to regulate physiological processes during aging and stress. Here, we use a recently introduced animal tracking technology to simultaneously follow tens of Caenorhabditis elegans for several hours and use tools from fractal physiology to quantitatively evaluate the effects of aging and temperature stress on nematode motility. Similarly to human physiological signals, scaling analysis reveals long-range correlations in numerous motility variables, fractal properties in behavioral shifts, and fluctuation dynamics over a wide range of timescales. These properties change as a result of a superposition of age and stress-related adaptive mechanisms that regulate motility.
Submitted 15 August, 2017; v1 submitted 3 May, 2017;
originally announced May 2017.
-
The Distribution of the Asymptotic Number of Citations to Sets of Publications by a Researcher or From an Academic Department Are Consistent With a Discrete Lognormal Model
Authors:
João A. G. Moreira,
Xiao Han T. Zeng,
Luís A. Nunes Amaral
Abstract:
How to quantify the impact of a researcher's or an institution's body of work is a matter of increasing importance to scientists, funding agencies, and hiring committees. The use of bibliometric indicators, such as the h-index or the Journal Impact Factor, has become widespread despite their known limitations. We argue that most existing bibliometric indicators are inconsistent, biased, and, worst of all, susceptible to manipulation. Here, we pursue a principled approach to the development of an indicator to quantify the scientific impact of both individual researchers and research institutions grounded in the functional form of the distribution of the asymptotic number of citations. We validate our approach using the publication records of 1,283 researchers from seven scientific and engineering disciplines and the chemistry departments at the 106 U.S. research institutions classified as "very high research activity". Our approach has three distinct advantages. First, it accurately captures the overall scientific impact of researchers at all career stages, as measured by asymptotic citation counts. Second, unlike other measures, our indicator is resistant to manipulation and rewards publication quality over quantity. Third, our approach captures the time-evolution of the scientific impact of research institutions.
Submitted 2 November, 2015;
originally announced November 2015.
-
Scaling and optimal synergy: Two principles determining microbial growth in complex media
Authors:
Francesco Alessandro Massucci,
Roger Guimerà,
Luís A. Nunes Amaral,
Marta Sales-Pardo
Abstract:
High-throughput experimental techniques and bioinformatics tools make it possible to obtain reconstructions of the metabolism of microbial species. Combined with mathematical frameworks such as flux balance analysis, which assumes that nutrients are used so as to maximize growth, these reconstructions enable us to predict microbial growth.
Although such predictions are generally accurate, these approaches do not give insights on how different nutrients are used to produce growth, and thus are difficult to generalize to new media or to different organisms.
Here, we propose a systems-level phenomenological model of metabolism inspired by the virial expansion. Our model predicts biomass production given the nutrient uptakes and a reduced set of parameters, which can be easily determined experimentally. To validate our model, we test it against in silico simulations and experimental measurements of growth, and find good agreement. From a biological point of view, our model uncovers the impact that individual nutrients and the synergistic interaction between nutrient pairs have on growth, and suggests that we can understand the growth maximization principle as the optimization of nutrient synergies.
Submitted 6 July, 2015;
originally announced July 2015.
-
A high-reproducibility and high-accuracy method for automated topic classification
Authors:
Andrea Lancichinetti,
M. Irmak Sirer,
Jane X. Wang,
Daniel Acuna,
Konrad Körding,
Luís A. Nunes Amaral
Abstract:
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent search, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state of the art in topic classification. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results that are not accurate in inferring the most suitable model parameters. Adapting approaches for community detection in networks, we propose a new algorithm which displays high reproducibility and high accuracy, and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure. Our algorithm promises to make "big data" text analysis systems more reliable.
Submitted 3 February, 2014;
originally announced February 2014.
-
Correlations between user voting data, budget, and box office for films in the Internet Movie Database
Authors:
Max Wasserman,
Satyam Mukherjee,
Konner Scott,
Xiao Han T. Zeng,
Filippo Radicchi,
Luís A. N. Amaral
Abstract:
The Internet Movie Database (IMDb) is one of the most-visited websites in the world and the premier source for information on films. Like Wikipedia, much of IMDb's information is user contributed. IMDb also allows users to voice their opinion on the quality of films through voting. We investigate whether there is a connection between this user voting data and certain economic film characteristics. To this end, we perform distribution and correlation analysis on a set of films chosen to mitigate effects of bias due to the language and country of origin of films. We show that production budget, box office gross, and total number of user votes for films are consistent with double-log normal distributions for certain time periods. Both total gross and user votes are consistent with a double-log normal distribution from the late 1980s onward, while for budget, it extends from 1935 to 1979. In addition, we find a strong correlation between number of user votes and the economic statistics, particularly budget. Remarkably, we find no evidence for a correlation between number of votes and average user rating. As previous studies have found a strong correlation between production budget and marketing expenses, our results suggest that total user votes is an indicator of a film's prominence or notability, which can be quantified by its promotional costs.
Submitted 16 January, 2014; v1 submitted 13 December, 2013;
originally announced December 2013.
-
The Possible Role of Resource Requirements and Academic Career-Choice Risk on Gender Differences in Publication Rate and Impact
Authors:
Jordi Duch,
Xiao Han T. Zeng,
Marta Sales-Pardo,
Filippo Radicchi,
Shayna Otis,
Teresa K. Woodruff,
Luis A. Nunes Amaral
Abstract:
Many studies demonstrate that there is still a significant gender bias, especially at higher career levels, in many areas including science, technology, engineering, and mathematics (STEM). We investigated field-dependent, gender-specific effects of the selective pressures individuals experience as they pursue a career in academia within seven STEM disciplines. We built a unique database that comprises 437,787 publications authored by 4,292 faculty members at top United States research universities. Our analyses reveal that gender differences in publication rate and impact are discipline-specific. Our results also support two hypotheses. First, the widely-reported lower publication rates of female faculty are correlated with the amount of research resources typically needed in the discipline considered, and thus may be explained by the lower level of institutional support historically received by females. Second, in disciplines where pursuing an academic position incurs greater career risk, female faculty tend to have a greater fraction of higher impact publications than males. Our findings have significant, field-specific, policy implications for achieving diversity at the faculty level within the STEM disciplines.
Submitted 13 December, 2012;
originally announced December 2012.
-
Move-by-move dynamics of the advantage in chess matches reveals population-level learning of the game
Authors:
H. V. Ribeiro,
R. S. Mendes,
E. K. Lenzi,
M. del Castillo-Mussot,
L. A. N. Amaral
Abstract:
The complexity of chess matches has attracted broad interest since the game's invention. This complexity and the availability of a large number of recorded matches make chess an ideal model system for the study of population-level learning of a complex system. We systematically investigate the move-by-move dynamics of the white player's advantage from over seventy thousand high level chess matches spanning over 150 years. We find that the average advantage of the white player is positive and that it has been increasing over time. Currently, the average advantage of the white player is ~0.17 pawns but it is exponentially approaching a value of 0.23 pawns with a characteristic time scale of 67 years. We also study the diffusion of the move dependence of the white player's advantage and find that it is non-Gaussian, has long-ranged anti-correlations and that after an initial period with no diffusion it becomes super-diffusive. We find that the duration of the non-diffusive period, corresponding to the opening stage of a match, is increasing in length and exponentially approaching a value of 15.6 moves with a characteristic time scale of 130 years. We interpret these two trends as resulting from learning of the features of the game. Additionally, we find that the exponent α characterizing the super-diffusive regime is increasing toward a value of 1.9, close to the ballistic regime. We suggest that this trend is due to the increased broadening of the range of abilities of chess players participating in major tournaments.
Submitted 12 December, 2012;
originally announced December 2012.
-
Rationality, irrationality and escalating behavior in lowest unique bid auctions
Authors:
Filippo Radicchi,
Andrea Baronchelli,
Luis A. N. Amaral
Abstract:
Information technology has revolutionized the traditional structure of markets. The removal of geographical and time constraints has fostered the growth of online auction markets, which now include millions of economic agents worldwide and annual transaction volumes in the billions of dollars. Here, we analyze bid histories of a little-studied type of online auction --- lowest unique bid auctions. Similarly to what has been reported for foraging animals searching for scarce food, we find that agents adopt Levy flight search strategies in their exploration of "bid space". The Levy regime, which is characterized by a power-law decaying probability distribution of step lengths, holds over nearly three orders of magnitude. We develop a quantitative model for lowest unique bid online auctions that reveals that agents use nearly optimal bidding strategies. However, agents participating in these auctions do not optimize their financial gain. Indeed, as long as there are many auction participants, a rational profit-optimizing agent would choose not to participate in these auction markets.
Submitted 18 January, 2012; v1 submitted 2 May, 2011;
originally announced May 2011.
-
The role of mentorship in protege performance
Authors:
R. Dean Malmgren,
Julio M. Ottino,
Luis A. N. Amaral
Abstract:
The role of mentorship on protege performance is a matter of importance to academic, business, and governmental organizations. While the benefits of mentorship for proteges, mentors and their organizations are apparent, the extent to which proteges mimic their mentors' career choices and acquire their mentorship skills is unclear. Here, we investigate one aspect of mentor emulation by studying mentorship fecundity---the number of proteges a mentor trains---with data from the Mathematics Genealogy Project, which tracks the mentorship record of thousands of mathematicians over several centuries. We demonstrate that fecundity among academic mathematicians is correlated with other measures of academic success. We also find that the average fecundity of mentors remains stable over 60 years of recorded mentorship. We further uncover three significant correlations in mentorship fecundity. First, mentors with small mentorship fecundity train proteges that go on to have a 37% larger than expected mentorship fecundity. Second, in the first third of their career, mentors with large fecundity train proteges that go on to have a 29% larger than expected fecundity. Finally, in the last third of their career, mentors with large fecundity train proteges that go on to have a 31% smaller than expected fecundity.
Submitted 11 June, 2010;
originally announced June 2010.
-
On Universality in Human Correspondence Activity
Authors:
R. Dean Malmgren,
Daniel B. Stouffer,
Andriana S. L. O. Campanharo,
Luis A. Nunes Amaral
Abstract:
Identifying and modeling patterns of human activity has important ramifications in applications ranging from predicting disease spread to optimizing resource allocation. Because of its relevance and availability, written correspondence provides a powerful proxy for studying human activity. One school of thought is that human correspondence is driven by responses to received correspondence, a view that requires distinct response mechanisms to explain e-mail and letter correspondence observations. Here, we demonstrate that, like e-mail correspondence, the letter correspondence patterns of 16 writers, performers, politicians, and scientists are well-described by the circadian cycle, task repetition and changing communication needs. We confirm the universality of these mechanisms by properly rescaling letter and e-mail correspondence statistics to reveal their underlying similarity.
Submitted 25 September, 2009;
originally announced September 2009.
-
Micro-bias and macro-performance
Authors:
S. M. D. Seaver,
A. A. Moreira,
M. Sales-Pardo,
R. D. Malmgren,
D. Diermeier,
L. A. N. Amaral
Abstract:
We use agent-based modeling to investigate the effect of conservatism and partisanship on the efficiency with which large populations solve the density classification task--a paradigmatic problem for information aggregation and consensus building. We find that conservative agents enhance the populations' ability to efficiently solve the density classification task despite large levels of noise in the system. In contrast, we find that the presence of even a small fraction of partisans holding the minority position will result in deadlock or a consensus on an incorrect answer. Our results provide a possible explanation for the emergence of conservatism and suggest that even low levels of partisanship can lead to significant social costs.
Submitted 28 August, 2009;
originally announced August 2009.
-
Characterizing Individual Communication Patterns
Authors:
R. Dean Malmgren,
Jake M. Hofman,
Luis A. N. Amaral,
Duncan J. Watts
Abstract:
The increasing availability of electronic communication data, such as that arising from e-mail exchange, presents social and information scientists with new possibilities for characterizing individual behavior and, by extension, identifying latent structure in human populations. Here, we propose a model of individual e-mail communication that is sufficiently rich to capture meaningful variability across individuals, while remaining simple enough to be interpretable. We show that the model, a cascading non-homogeneous Poisson process, can be formulated as a double-chain hidden Markov model, allowing us to use an efficient inference algorithm to estimate the model parameters from observed data. We then apply this model to two e-mail data sets consisting of 404 and 6,164 users, respectively, that were collected from two universities in different countries and years. We find that the resulting best-estimate parameter distributions for both data sets are surprisingly similar, indicating that at least some features of communication dynamics generalize beyond specific contexts. We also find that variability of individual behavior over time is significantly less than variability across the population, suggesting that individuals can be classified into persistent "types". We conclude that communication patterns may prove useful as an additional class of attribute data, complementing demographic and network data, for user classification and outlier detection--a point that we illustrate with an interpretable clustering of users based on their inferred model parameters.
Submitted 1 May, 2009;
originally announced May 2009.
-
A Poissonian explanation for heavy-tails in e-mail communication
Authors:
R. Dean Malmgren,
Daniel B. Stouffer,
Adilson E. Motter,
Luis A. N. Amaral
Abstract:
Patterns of deliberate human activity and behavior are of utmost importance in areas as diverse as disease spread, resource allocation, and emergency response. Because of its widespread availability and use, e-mail correspondence provides an attractive proxy for studying human activity. Recently, it was reported that the probability density for the inter-event time $τ$ between consecutively sent e-mails decays asymptotically as $τ^{-α}$, with $α\approx 1$. The slower than exponential decay of the inter-event time distribution suggests that deliberate human activity is inherently non-Poissonian. Here, we demonstrate that the approximate power-law scaling of the inter-event time distribution is a consequence of circadian and weekly cycles of human activity. We propose a cascading non-homogeneous Poisson process which explicitly integrates these periodic patterns in activity with an individual's tendency to continue participating in an activity. Using standard statistical techniques, we show that our model is consistent with the empirical data. Our findings may also provide insight into the origins of heavy-tailed distributions in other complex systems.
Submitted 5 January, 2009;
originally announced January 2009.
-
Detection of node group membership in networks with group overlap
Authors:
Erin N. Sawardecker,
Marta Sales-Pardo,
Luís A. Nunes Amaral
Abstract:
Most networks found in social and biochemical systems have modular structures. An important question prompted by the modularity of these networks is whether nodes can be said to belong to a single group. If they cannot, we would need to consider the role of "overlapping communities." Despite some efforts in this direction, the problem of detecting overlapping groups remains unsolved because there is neither a formal definition of overlapping community, nor an ensemble of networks with which to test the performance of group detection algorithms when nodes can belong to more than one group. Here, we introduce an ensemble of networks with overlapping groups. We then apply three group identification methods--modularity maximization, k-clique percolation, and modularity-landscape surveying--to these networks. We find that the modularity-landscape surveying method is the only one able to detect heterogeneities in node memberships, and that those heterogeneities are only detectable when the overlap is small. Surprisingly, we find that the k-clique percolation method is unable to detect node membership for the overlapping case.
Submitted 5 December, 2008;
originally announced December 2008.
-
Module identification in bipartite and directed networks
Authors:
R. Guimera,
M. Sales-Pardo,
L. A. N. Amaral
Abstract:
Modularity is one of the most prominent properties of real-world complex networks. Here, we address the issue of module identification in two important classes of networks: bipartite networks and directed unipartite networks. Nodes in bipartite networks are divided into two non-overlapping sets, and the links must have one end node from each set. Directed unipartite networks only have one type of node, but links have an origin and an end. We show that directed unipartite networks can be conveniently represented as bipartite networks for module identification purposes. We report a novel approach especially suited for module detection in bipartite networks, and define a set of random networks that enable us to validate the new approach.
Submitted 6 September, 2007; v1 submitted 12 January, 2007;
originally announced January 2007.
-
Classes of complex networks defined by role-to-role connectivity profiles
Authors:
R. Guimera,
M. Sales-Pardo,
L. A. N. Amaral
Abstract:
Interactions between units in physical, biological, technological, and social systems usually give rise to intricate networks with non-trivial structure, which critically affects the dynamics and properties of the system. The focus of most current research on complex networks is on global network properties. A caveat of this approach is that the relevance of global properties hinges on the premise that networks are homogeneous, whereas most real-world networks have a markedly modular structure. Here, we report that networks with different functions, including the Internet, metabolic, air transportation, and protein interaction networks, have distinct patterns of connections among nodes with different roles, and that, as a consequence, complex networks can be classified into two distinct functional classes based on their link type frequency. Importantly, we demonstrate that the above structural features cannot be captured by means of often studied global properties.
Submitted 12 January, 2007;
originally announced January 2007.
-
Log-normal statistics in e-mail communication patterns
Authors:
Daniel B. Stouffer,
R. Dean Malmgren,
Luis A. N. Amaral
Abstract:
Following up on Barabasi's recent letter to Nature [435, 207--211 (2005)], we systematically investigate the time series of e-mail usage for 3,188 users at a university. We focus on two quantities for each user: the time interval between consecutively sent e-mails (interevent time), and the time interval between when a user sends an e-mail and when a recipient sends an e-mail back to the original sender (waiting time). We perform a standard Bayesian model selection analysis that demonstrates that the interevent times are well-described by a single log-normal while the waiting times are better described by the superposition of two log-normals. Our analysis rejects the possibility that either measure could be described by truncated power-law distributions with exponent $α\simeq 1$. We also critically evaluate the priority queuing model proposed by Barabási to describe the distribution of the waiting times. We show that neither the assumptions nor the predictions of the model are plausible, and conclude that a theoretical description of human e-mail communication patterns remains an open problem.
Submitted 2 May, 2006;
originally announced May 2006.
-
Comment on Barabasi, Nature 435, 207 (2005)
Authors:
Daniel B. Stouffer,
R. Dean Malmgren,
Luis A. N. Amaral
Abstract:
In a recent letter, Barabasi claims that the dynamics of a number of human activities are scale-free [1]. He specifically reports that the probability distribution of time intervals tau between consecutive e-mails sent by a single user and time delays for e-mail replies follow a power-law with an exponent -1, and proposes a priority-queuing process as an explanation of the bursty nature of human activity. Here, we quantitatively demonstrate that the reported power-law distributions are solely an artifact of the analysis of the empirical data and that the proposed model is not representative of e-mail communication patterns.
Submitted 25 October, 2005;
originally announced October 2005.
-
Scaling Phenomena in the Growth Dynamics of Scientific Output
Authors:
Kaushik Matia,
Luis A. Nunes Amaral,
Marc Luwel,
Henk F. Moed,
H. Eugene Stanley
Abstract:
We analyze a set of three databases at different levels of aggregation: (i) a database of approximately $10^6$ publications of 247 countries in the period 1980--2001; (ii) a database of 508 academic institutions from the European Union (EU) and 408 institutes from the USA in the 11-year period 1991--2001; and (iii) a database comprising 2330 Flemish authors in the period 1980--2000. At all levels of aggregation we find that the mean annual growth rate of publications is independent of the number of publications of the various units involved. We also find that the standard deviation of the distribution of annual growth rates decays with the number of publications as a power law with exponent $\approx 0.3$. These findings are consistent with those of recent studies of systems such as the size of R&D funding budgets of countries, the research publication volumes of US universities, and the size of business firms.
Submitted 15 February, 2005;
originally announced February 2005.
-
Emergence of Complex Dynamics in a Simple Model of Signaling Networks
Authors:
Luis A. N. Amaral,
Albert Diaz-Guilera,
Andre A. Moreira,
Ary L. Goldberger,
Lewis A. Lipsitz
Abstract:
A variety of physical, social and biological systems generate complex fluctuations with correlations across multiple time scales. In physiologic systems, these long-range correlations are altered with disease and aging. Such correlated fluctuations in living systems have been attributed to the interaction of multiple control systems; however, the mechanisms underlying this behavior remain unknown. Here, we show that a number of distinct classes of dynamical behaviors, including correlated fluctuations characterized by $1/f$-scaling of their power spectra, can emerge in networks of simple signaling units. We find that under general conditions, complex dynamics can be generated by systems fulfilling two requirements: i) a "small-world" topology and ii) the presence of noise. Our findings support two notable conclusions: first, complex physiologic-like signals can be modeled with a minimal set of components; and second, systems fulfilling conditions (i) and (ii) are robust to some degree of degradation, i.e., they will still be able to generate $1/f$-dynamics.
Submitted 19 November, 2004;
originally announced November 2004.
-
Scale Invariance in the Nonstationarity of Physiological Signals
Authors:
Pedro Bernaola-Galvan,
Plamen Ch. Ivanov,
Luis A. Nunes Amaral,
Ary L. Goldberger,
H. Eugene Stanley
Abstract:
We introduce a segmentation algorithm to probe temporal organization of heterogeneities in human heartbeat interval time series. We find that the lengths of segments with different local values of heart rates follow a power-law distribution. This scale-invariant structure is not a simple consequence of the long-range correlations present in the data. We also find that the differences in mean heart rates between consecutive segments display a common functional form, but with different parameters for healthy individuals and for patients with heart failure. This finding may provide insight into the way heart rate variability is reduced in cardiac disease.
Submitted 21 May, 2000; v1 submitted 17 May, 2000;
originally announced May 2000.