-
What we should learn from pandemic publishing
Authors:
Satyaki Sikdar,
Sara Venturini,
Marie-Laure Charpignon,
Sagar Kumar,
Francesco Rinaldi,
Francesco Tudisco,
Santo Fortunato,
Maimuna S. Majumder
Abstract:
Authors of COVID-19 papers produced during the pandemic were overwhelmingly not subject matter experts. Such a massive inflow of scholars from different expertise areas is both an asset and a potential problem. Domain-informed scientific collaboration is the key to preparing for future crises.
Submitted 24 September, 2024;
originally announced October 2024.
-
Modeling the amplification of epidemic spread by individuals exposed to misinformation on social media
Authors:
Matthew R. DeVerna,
Francesco Pierri,
Yong-Yeol Ahn,
Santo Fortunato,
Alessandro Flammini,
Filippo Menczer
Abstract:
Understanding how misinformation affects the spread of disease is crucial for public health, especially given recent research indicating that misinformation can increase vaccine hesitancy and discourage vaccine uptake. However, it is difficult to investigate the interaction between misinformation and epidemic outcomes due to the dearth of data-informed holistic epidemic models. Here, we employ an epidemic model that incorporates a large, mobility-informed physical contact network as well as the distribution of misinformed individuals across counties derived from social media data. The model allows us to simulate various scenarios to understand how epidemic spreading can be affected by misinformation spreading through one particular social media platform. Using this model, we compare a worst-case scenario, in which individuals become misinformed after a single exposure to low-credibility content, to a best-case scenario where the population is highly resilient to misinformation. We estimate the additional portion of the U.S. population that would become infected over the course of the COVID-19 epidemic in the worst-case scenario. This work can provide policymakers with insights about the potential harms of exposure to online vaccine misinformation.
Submitted 29 January, 2025; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Iterative embedding and reweighting of complex networks reveals community structure
Authors:
Bianka Kovács,
Sadamori Kojaku,
Gergely Palla,
Santo Fortunato
Abstract:
Graph embeddings learn the structure of networks and represent it in low-dimensional vector spaces. Community structure is one of the features that are recognized and reproduced by embeddings. We show that an iterative procedure, in which a graph is repeatedly embedded and its links are reweighted based on the geometric proximity between the nodes, reinforces intra-community links and weakens inter-community links, making the clusters of the initial network more visible and more easily detectable. The geometric separation between the communities can become so strong that even a very simple parsing of the links may recover the communities as isolated components with surprisingly high precision. Furthermore, when used as a pre-processing step, our embedding and reweighting procedure can improve the performance of traditional community detection algorithms.
Submitted 16 February, 2024;
originally announced February 2024.
-
Symmetry breaking in optimal transport networks
Authors:
Siddharth Patwardhan,
Marc Barthelemy,
Sirag Erkol,
Santo Fortunato,
Filippo Radicchi
Abstract:
Despite its importance for practical applications, little is known about the optimal shape of a network that efficiently connects a set of points. This problem can be formulated in terms of a multiplex network with a fast layer embedded in a slow one. To connect a pair of points, one can then use either the fast or the slow layer, or both, with a switching cost when going from one layer to the other. We consider here distributions of points in spaces of arbitrary dimension d and search for the fast-layer network of given size that minimizes the average time to reach a central node. We discuss the d = 1 case analytically and the d > 1 case numerically, and show the existence of transitions when we vary the network size, the switching cost and/or the relative speed of the two layers. Surprisingly, there is a transition characterized by a symmetry breaking, indicating that it is sometimes better to avoid serving a whole area in order to save on switching costs, at the expense of making greater use of the slow layer. Our findings underscore the importance of considering switching costs when studying optimal network structures, as small variations of the cost can lead to strikingly dissimilar results. Finally, we discuss real-world subways and their efficiency for the cities of Atlanta, Boston, and Toronto. We find that real subways are farther away from the optimal shapes as traffic congestion increases.
Submitted 8 November, 2023;
originally announced November 2023.
-
Representing the Disciplinary Structure of Physics: A Comparative Evaluation of Graph and Text Embedding Methods
Authors:
Isabel Constantino,
Sadamori Kojaku,
Santo Fortunato,
Yong-Yeol Ahn
Abstract:
Recent advances in machine learning offer new ways to represent and study scholarly works and the space of knowledge. Graph and text embeddings provide a convenient vector representation of scholarly works based on citations and text. Yet, it is unclear whether their representations are consistent or provide different views of the structure of science. Here, we compare graph and text embeddings by testing their ability to capture the hierarchical structure of the Physics and Astronomy Classification Scheme (PACS) of papers published by the American Physical Society (APS). We also provide a qualitative comparison of the overall structure of the graph and text embeddings for reference. We find that neural network-based methods outperform traditional methods, and that graph embedding methods such as node2vec are better than other methods at capturing the PACS structure. Our results call for further investigations into how different contexts of scientific papers are captured by different methods, and how we can combine and leverage such information in an interpretable manner.
Submitted 12 February, 2025; v1 submitted 29 August, 2023;
originally announced August 2023.
-
Network community detection via neural embeddings
Authors:
Sadamori Kojaku,
Filippo Radicchi,
Yong-Yeol Ahn,
Santo Fortunato
Abstract:
Recent advances in machine learning research have produced powerful neural graph embedding methods, which learn useful, low-dimensional vector representations of network data. These neural methods for graph embedding excel in graph machine learning tasks and are now widely adopted. However, how and why these methods work, particularly how network structure gets encoded in the embedding, remains largely unexplained. Here, we show that node2vec, a shallow, linear neural network, encodes communities into separable clusters better than random partitioning down to the information-theoretic detectability limit for the stochastic block model. We show that this is due to the equivalence between the embedding learned by node2vec and the spectral embedding via the eigenvectors of the symmetric normalized Laplacian matrix. Numerical simulations demonstrate that node2vec is capable of learning communities on sparse graphs generated by the stochastic block model, as well as on sparse degree-heterogeneous networks. Our results highlight the features of graph neural networks that enable them to separate communities in embedding space.
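For reference, the symmetric normalized Laplacian named above has the standard form below, with A the adjacency matrix and D the diagonal matrix of node degrees k_i (notation chosen here, not taken from the paper). A d-dimensional spectral embedding places each node at its entries in the eigenvectors of L_sym associated with the smallest eigenvalues, usually skipping the trivial one; these are equivalently the leading eigenvectors of D^{-1/2} A D^{-1/2}.

```latex
L_{\mathrm{sym}} = I - D^{-1/2} A D^{-1/2},
\qquad D = \mathrm{diag}(k_1, \dots, k_n),
\qquad k_i = \sum_{j} A_{ij}
```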
Submitted 1 November, 2024; v1 submitted 23 June, 2023;
originally announced June 2023.
-
Epidemic spreading in group-structured populations
Authors:
Siddharth Patwardhan,
Varun K. Rao,
Santo Fortunato,
Filippo Radicchi
Abstract:
Individuals involved in common group activities/settings -- e.g., college students that are enrolled in the same class and/or live in the same dorm -- are exposed to recurrent contacts of physical proximity. These contacts are known to mediate the spread of an infectious disease; however, it is not obvious how the properties of the spreading process are determined by the structure of and the interrelation among the group settings that are at the root of those recurrent interactions. Here, we show that reshaping the organization of groups within a population can be used as an effective strategy to decrease the severity of an epidemic. Specifically, we show that when group structures are sufficiently correlated -- e.g., the likelihood for two students living in the same dorm to attend the same class is sufficiently high -- outbreaks are longer but milder than for uncorrelated group structures. Also, we show that the effectiveness of interventions for disease containment increases as the correlation among group structures increases. We demonstrate the practical relevance of our findings by taking advantage of data about housing and attendance of students at the Indiana University campus in Bloomington. By appropriately optimizing the assignment of students to dorms based on their enrollment, we are able to observe a two- to five-fold reduction in the severity of simulated epidemic processes.
Submitted 21 October, 2024; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Collaboration and topic switches in science
Authors:
Sara Venturini,
Satyaki Sikdar,
Francesco Rinaldi,
Francesco Tudisco,
Santo Fortunato
Abstract:
Collaboration is a key driver of science and innovation. Mainly motivated by the need to leverage different capacities and expertise to solve a scientific problem, collaboration is also an excellent source of information about the future behavior of scholars. In particular, it allows us to infer the likelihood that scientists choose future research directions via the intertwined mechanisms of selection and social influence. Here we thoroughly investigate the interplay between collaboration and topic switches. We find that the probability for a scholar to start working on a new topic increases with the number of previous collaborators, with a pattern showing that the effects of individual collaborators are not independent. The higher the productivity and the impact of authors, the more likely their coworkers will start working on new topics. The average number of coauthors per paper is also inversely related to the topic switch probability, suggesting a dilution of this effect as the number of collaborators increases.
Submitted 13 April, 2023;
originally announced April 2023.
-
Consistency pays off in science
Authors:
Sirag Erkol,
Satyaki Sikdar,
Filippo Radicchi,
Santo Fortunato
Abstract:
The exponentially growing number of scientific papers stimulates a discussion on the interplay between quantity and quality in science. In particular, one may wonder which publication strategy may offer more chances of success: publishing lots of papers, producing a few hit papers, or something in between. Here we tackle this question by studying the scientific portfolios of Nobel Prize laureates. A comparative analysis of different citation-based indicators of individual impact suggests that the best path to success may rely on consistently producing high-quality work. Such a pattern is especially rewarded by a new metric, the $E$-index, which identifies excellence better than state-of-the-art measures.
Submitted 11 May, 2023; v1 submitted 16 October, 2022;
originally announced October 2022.
-
Influence Maximization: Divide and Conquer
Authors:
Siddharth Patwardhan,
Filippo Radicchi,
Santo Fortunato
Abstract:
The problem of influence maximization, i.e., finding the set of nodes having maximal influence on a network, is of great importance for several applications. In the past two decades, many heuristic metrics to spot influencers have been proposed. Here, we introduce a framework to boost the performance of any such metric. The framework consists of dividing the network into sectors of influence, and then selecting the most influential nodes within these sectors. We explore three different methodologies to find sectors in a network: graph partitioning, graph hyperbolic embedding, and community structure. The framework is validated with a systematic analysis of real and synthetic networks. We show that the gain in performance generated by dividing a network into sectors before selecting the influential spreaders increases as the modularity and heterogeneity of the network increase. Also, we show that the division of the network into sectors can be efficiently performed in a time that scales linearly with the network size, thus making the framework applicable to large-scale influence maximization problems.
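The divide-and-conquer idea can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes the sectors are already given as a node-to-label mapping and that seeds are allocated to sectors in proportion to their size via the largest-remainder method. Both the allocation rule and the function name `select_seeds` are hypothetical choices made here; any heuristic score (degree, for instance) can be plugged in.

```python
from collections import defaultdict


def select_seeds(scores, sectors, k):
    """Pick k seed nodes by ranking nodes within sectors rather than globally.

    scores:  dict node -> heuristic influence score (e.g. degree).
    sectors: dict node -> sector label (from partitioning, embedding or communities).
    Seeds are allocated to sectors proportionally to sector size using the
    largest-remainder method (an illustrative choice, not taken from the paper).
    """
    if not 0 <= k <= len(sectors):
        raise ValueError("k must be between 0 and the number of nodes")
    groups = defaultdict(list)
    for node, label in sectors.items():
        groups[label].append(node)
    n = len(sectors)
    quotas = {label: k * len(nodes) / n for label, nodes in groups.items()}
    alloc = {label: int(q) for label, q in quotas.items()}
    # Hand out the remaining slots to sectors with the largest fractional quotas.
    leftover = k - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda lab: quotas[lab] - alloc[lab], reverse=True)
    for label in by_remainder[:leftover]:
        alloc[label] += 1
    # Within each sector, take the top-scoring nodes.
    seeds = []
    for label, nodes in groups.items():
        ranked = sorted(nodes, key=lambda v: scores[v], reverse=True)
        seeds.extend(ranked[:alloc[label]])
    return seeds


if __name__ == "__main__":
    scores = {"a1": 10, "a2": 9, "a3": 8, "a4": 1, "b1": 3, "b2": 2, "b3": 1, "b4": 0}
    sectors = {v: v[0] for v in scores}  # sector "a" or "b"
    # Prints ['a1', 'b1']; a global top-2 ranking would pick a1 and a2 instead.
    print(select_seeds(scores, sectors, 2))
```

The only design choice here is to rank locally rather than globally, so that seeds are spread across sectors instead of piling up in the sector that happens to contain the highest scores.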
Submitted 6 October, 2022; v1 submitted 3 October, 2022;
originally announced October 2022.
-
20 years of network community detection
Authors:
Santo Fortunato,
M. E. J. Newman
Abstract:
A fundamental technical challenge in the analysis of network data is the automated discovery of communities - groups of nodes that are strongly connected or that share similar features or roles. In this commentary we review progress in the field over the last 20 years.
Submitted 2 August, 2022; v1 submitted 29 July, 2022;
originally announced August 2022.
-
Robustness modularity in complex networks
Authors:
Filipi N. Silva,
Aiiad Albeshri,
Vijey Thayananthan,
Wadee Alhalabi,
Santo Fortunato
Abstract:
A basic question in network community detection is how modular a given network is. This is usually addressed by evaluating the quality of partitions detected in the network. The Girvan-Newman (GN) modularity function is the standard way to make this assessment, but it has a number of drawbacks. Most importantly, it is not clearly interpretable, given that the measure can take relatively large values on partitions of random networks without communities. Here we propose a new measure based on the concept of robustness: modularity is the probability of finding trivial partitions when the structure of the network is randomly perturbed. This concept can be implemented for any clustering algorithm capable of telling when a group structure is absent. Tests on artificial and real graphs reveal that robustness modularity can be used to assess and compare the strength of the community structure of different networks. We also introduce two other quality functions: modularity difference, a suitably normalized version of the GN modularity, and information modularity, a measure of distance based on information compression. Both measures are strongly correlated with robustness modularity, and are promising options as well.
Submitted 5 October, 2021;
originally announced October 2021.
-
Detecting climate teleconnections with Granger causality
Authors:
Filipi N Silva,
Didier A. Vega-Oliveros,
Xiaoran Yan,
Alessandro Flammini,
Filippo Menczer,
Filippo Radicchi,
Ben Kravitz,
Santo Fortunato
Abstract:
Climate system teleconnections are crucial for improving climate predictability, but difficult to quantify. Standard approaches to identify teleconnections are often based on correlations between time series. Here we present a novel method leveraging Granger causality, which can infer/detect relationships between any two fields. We compare teleconnections identified by correlation and Granger causality at different timescales. We find that both Granger causality and correlation consistently recover known seasonal precipitation responses to the sea surface temperature pattern associated with the El Niño Southern Oscillation. Such findings are robust across multiple time resolutions. In addition, we identify candidates for unexplored teleconnection responses.
Submitted 28 September, 2021; v1 submitted 16 November, 2020;
originally announced December 2020.
-
Community detection in networks using graph embeddings
Authors:
Aditya Tandon,
Aiiad Albeshri,
Vijey Thayananthan,
Wadee Alhalabi,
Filippo Radicchi,
Santo Fortunato
Abstract:
Graph embedding methods are becoming increasingly popular in the machine learning community, where they are widely used for tasks such as node classification and link prediction. Embedding graphs in geometric spaces should aid the identification of network communities as well, because nodes in the same community should be projected close to each other in the geometric space, where they can be detected via standard data clustering algorithms. In this paper, we test the ability of several graph embedding techniques to detect communities on benchmark graphs. We compare their performance against that of traditional community detection algorithms. We find that the performance is comparable, if the parameters of the embedding techniques are suitably chosen. However, the optimal parameter set varies with the specific features of the benchmark graphs, like their size, whereas popular community detection algorithms do not require any parameter. So it is not possible to indicate beforehand good parameter sets for the analysis of real networks. This finding, along with the high computational cost of embedding a network and grouping the points, suggests that, for community detection, current embedding techniques do not represent an improvement over network clustering algorithms.
Submitted 5 March, 2021; v1 submitted 11 September, 2020;
originally announced September 2020.
-
Scientific elite revisited: Patterns of productivity, collaboration, authorship and impact
Authors:
Jichao Li,
Yian Yin,
Santo Fortunato,
Dashun Wang
Abstract:
Throughout history, a relatively small number of individuals have made a profound and lasting impact on science and society. Despite long-standing, multi-disciplinary interests in understanding careers of elite scientists, there have been limited attempts at a quantitative, career-level analysis. Here, we leverage a comprehensive dataset we assembled, allowing us to trace the entire career histories of nearly all Nobel laureates in physics, chemistry, and physiology or medicine over the past century. We find that, although Nobel laureates were energetic producers from the outset, producing works that garner unusually high impact, their careers before winning the prize follow patterns relatively similar to those of ordinary scientists, being characterized by hot streaks and increasing reliance on collaborations. We also uncover notable variations along their careers, often associated with the Nobel Prize, including a shifting coauthorship structure in the prize-winning work and a significant but temporary dip in the impact of the work they produce after winning the Nobel. Together, these results document quantitative patterns governing the careers of scientific elites, offering an empirical basis for a deeper understanding of the hallmarks of exceptional careers in science.
Submitted 27 March, 2020;
originally announced March 2020.
-
Recency predicts bursts in the evolution of author citations
Authors:
Filipi Nascimento Silva,
Aditya Tandon,
Diego Raphael Amancio,
Alessandro Flammini,
Filippo Menczer,
Staša Milojević,
Santo Fortunato
Abstract:
The citation process for scientific papers has been studied extensively. But while the citations accrued by authors are the sum of the citations of their papers, translating the dynamics of citation accumulation from the paper to the author level is not trivial. Here we conduct a systematic study of the evolution of author citations, and in particular their bursty dynamics. We find empirical evidence of a correlation between the number of citations most recently accrued by an author and the number of citations they receive in the future. Using a simple model where the probability for an author to receive new citations depends only on the number of citations collected in the previous 12-24 months, we are able to reproduce both the citation and burst size distributions of authors across multiple decades.
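A toy version of the recency mechanism can be simulated in a few lines. This sketch is only loosely modelled on the description above: time advances in discrete steps (think months), each step distributes a fixed number of citations, and an author is drawn with probability proportional to `baseline` plus the citations received in the last `window` steps. The function name, the linear weighting and the `baseline` term (which lets uncited authors be picked at all) are assumptions made here, not the paper's specification.

```python
import random
from collections import deque


def simulate_citations(n_authors, n_steps, cites_per_step, window=12, baseline=1.0, seed=0):
    """Toy recency-driven citation model (illustrative only).

    Each step, cites_per_step citations are drawn with weights fixed at the start
    of the step: baseline + citations received in the previous `window` steps.
    baseline must be positive so that every author has a nonzero weight.
    Returns the total citations per author.
    """
    rng = random.Random(seed)
    history = deque(maxlen=window)  # per-step citation counts; old steps fall off
    totals = [0] * n_authors
    for _ in range(n_steps):
        recent = [sum(step[i] for step in history) for i in range(n_authors)]
        weights = [baseline + r for r in recent]
        step = [0] * n_authors
        for author in rng.choices(range(n_authors), weights=weights, k=cites_per_step):
            step[author] += 1
            totals[author] += 1
        history.append(step)
    return totals


if __name__ == "__main__":
    totals = simulate_citations(n_authors=50, n_steps=120, cites_per_step=40, window=12)
    print(sorted(totals, reverse=True)[:5])
```

Because only the last `window` steps count, an author's attractiveness in this toy model decays once citations stop arriving; setting `window` between 12 and 24 steps is one way to explore the 12-24 month range mentioned in the abstract.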
Submitted 26 November, 2019;
originally announced November 2019.
-
Fast consensus clustering in complex networks
Authors:
Aditya Tandon,
Aiiad Albeshri,
Vijey Thayananthan,
Wadee Alhalabi,
Santo Fortunato
Abstract:
Algorithms for community detection are usually stochastic, leading to different partitions for different choices of random seeds. Consensus clustering has proven to be an effective technique to derive more stable and accurate partitions than the ones obtained by the direct application of the algorithm. However, the procedure requires the calculation of the consensus matrix, which can be quite dense if (some of) the clusters of the input partitions are large. Consequently, the complexity can get dangerously close to quadratic, which makes the technique inapplicable on large graphs. Here we present a fast variant of consensus clustering, which calculates the consensus matrix only on the links of the original graph and on a comparable number of additional node pairs, suitably chosen. This brings the complexity down to linear, while the performance remains comparable to that of the full technique. Therefore, our fast consensus clustering procedure can be applied to networks with millions of nodes and links.
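The central saving, scoring only node pairs that are already linked, can be sketched as follows. This is a minimal sketch, not the full procedure: it assumes the input partitions are given as node-to-label dicts, and it omits both the additional node pairs mentioned in the abstract and the repeated clustering of the consensus graph. `edge_consensus`, `prune` and the threshold value are hypothetical names and parameters.

```python
def edge_consensus(edges, partitions):
    """Consensus weight of each edge: fraction of partitions placing both ends together.

    edges: iterable of (u, v) pairs from the original graph.
    partitions: list of dicts mapping node -> community label.
    Only existing edges are scored, so the cost is O(len(edges) * len(partitions))
    instead of the O(n^2) needed for a full consensus matrix.
    """
    n_parts = len(partitions)
    consensus = {}
    for u, v in edges:
        together = sum(1 for p in partitions if p[u] == p[v])
        consensus[(u, v)] = together / n_parts
    return consensus


def prune(consensus, threshold):
    """Drop edges whose consensus weight falls below a (hypothetical) threshold."""
    return {e: w for e, w in consensus.items() if w >= threshold}


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 4)]
    parts = [{1: 0, 2: 0, 3: 1, 4: 1}, {1: 0, 2: 0, 3: 0, 4: 1}, {1: 0, 2: 0, 3: 1, 4: 1}]
    weights = edge_consensus(edges, parts)
    print(weights)
    print(prune(weights, 0.5))
```

The pruned, reweighted graph would then be clustered again, and the loop repeated until the input partitions agree; that outer loop is left out here to keep the sketch short.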
Submitted 19 April, 2019; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Weight Thresholding on Complex Networks
Authors:
Xiaoran Yan,
Lucas G. S. Jeub,
Alessandro Flammini,
Filippo Radicchi,
Santo Fortunato
Abstract:
Weight thresholding is a simple technique that aims at reducing the number of edges in weighted networks that are otherwise too dense for the application of standard graph theoretical methods. We show that the group structure of real weighted networks is very robust under weight thresholding, as it is maintained even when most of the edges are removed. This appears to be related to the correlation between topology and weight that characterizes real networks. On the other hand, the behavior of other properties is generally system dependent.
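Weight thresholding itself amounts to a one-line filter. A minimal Python sketch, assuming an undirected weighted graph stored as a dict from node pairs to weights and a single global threshold `tau` (both illustrative choices, not the paper's code), keeps only edges at or above the threshold and reports the fraction retained:

```python
def threshold_edges(weights, tau):
    """Keep only edges whose weight is at least tau.

    weights: dict mapping (u, v) node pairs to edge weights (hypothetical format).
    Returns the filtered edge dict and the fraction of edges retained.
    """
    kept = {edge: w for edge, w in weights.items() if w >= tau}
    fraction = len(kept) / len(weights) if weights else 0.0
    return kept, fraction


if __name__ == "__main__":
    # Toy network: two strongly weighted triangles joined by one weak bridge (0.1).
    toy = {
        ("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.7,
        ("d", "e"): 0.9, ("e", "f"): 0.6, ("d", "f"): 0.8,
        ("c", "d"): 0.1,
    }
    kept, frac = threshold_edges(toy, tau=0.5)
    print(sorted(kept), round(frac, 3))
```

In this toy example the weak bridge is the only edge removed, so both triangles survive intact: a small-scale picture of how group structure can persist under thresholding when strong weights sit inside groups.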
Submitted 5 October, 2018; v1 submitted 19 June, 2018;
originally announced June 2018.
-
Multiresolution Consensus Clustering in Networks
Authors:
Lucas G. S. Jeub,
Olaf Sporns,
Santo Fortunato
Abstract:
Networks often exhibit structure at disparate scales. We propose a method for identifying community structure at different scales based on multiresolution modularity and consensus clustering. Our contribution consists of two parts. First, we propose a strategy for sampling the entire range of possible resolutions for the multiresolution modularity quality function. Our approach is directly based on the properties of modularity and, in particular, provides a natural way of avoiding the need to increase the resolution parameter by several orders of magnitude to break a few remaining small communities, necessitating the introduction of ad-hoc limits to the resolution range with standard sampling approaches. Second, we propose a hierarchical consensus clustering procedure, based on a modified modularity, that allows one to construct a hierarchical consensus structure given a set of input partitions. While here we are interested in its application to partitions sampled using multiresolution modularity, this consensus clustering procedure can be applied to the output of any clustering algorithm. As such, we see many potential applications of the individual parts of our multiresolution consensus clustering procedure in addition to using the procedure itself to identify hierarchical structure in networks.
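The multiresolution modularity quality function is commonly written as Q(γ) = (1/2m) Σ_ij [A_ij - γ k_i k_j / 2m] δ(c_i, c_j), with γ the resolution parameter. The sketch below evaluates it for a given partition using the equivalent per-community form, assuming an undirected weighted edge list without self-loops. It covers only the quality function, not the resolution-sampling strategy or the hierarchical consensus step proposed in the paper; the function name is illustrative.

```python
from collections import defaultdict


def modularity(edges, membership, gamma=1.0):
    """Multiresolution modularity Q(gamma) of a partition of an undirected graph.

    edges: list of (u, v, w) tuples, each undirected edge listed once, no self-loops.
    membership: dict node -> community label.
    gamma: resolution parameter; gamma = 1 gives standard modularity.
    Uses Q = sum_c [ W_c / m - gamma * (D_c / (2m))^2 ], where W_c is the total
    weight inside community c, D_c its total strength and m the total edge weight.
    """
    m = sum(w for _, _, w in edges)
    internal = defaultdict(float)  # W_c
    strength = defaultdict(float)  # D_c
    for u, v, w in edges:
        strength[membership[u]] += w
        strength[membership[v]] += w
        if membership[u] == membership[v]:
            internal[membership[u]] += w
    return sum(internal[c] / m - gamma * (strength[c] / (2 * m)) ** 2 for c in strength)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    tri = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
    split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    for g in (0.5, 1.0, 2.0):
        print(g, round(modularity(tri, split, gamma=g), 4))
```

For the two-triangle split, Q(1) = 5/14 ≈ 0.357, while lumping all nodes into one community gives Q(1) = 0. Raising γ weights the null-model term more heavily, and that term grows with community strength, so larger γ favours smaller communities; this is why scanning γ probes structure at different scales.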
Submitted 30 January, 2018; v1 submitted 5 October, 2017;
originally announced October 2017.
-
Psychological and Personality Profiles of Political Extremists
Authors:
Meysam Alizadeh,
Ingmar Weber,
Claudio Cioffi-Revilla,
Santo Fortunato,
Michael Macy
Abstract:
Global recruitment into radical Islamic movements has spurred renewed interest in the appeal of political extremism. Is the appeal a rational response to material conditions or is it the expression of psychological and personality disorders associated with aggressive behavior, intolerance, conspiratorial imagination, and paranoia? Empirical answers using surveys have been limited by lack of access to extremist groups, while field studies have lacked psychological measures and failed to compare extremists with contrast groups. We revisit the debate over the appeal of extremism in the U.S. context by comparing publicly available Twitter messages written by over 355,000 political extremist followers with messages written by non-extremist U.S. users. Analysis of text-based psychological indicators supports the moral foundation theory which identifies emotion as a critical factor in determining political orientation of individuals. Extremist followers also differ from others in four of the Big Five personality traits.
Submitted 1 April, 2017;
originally announced April 2017.
-
Community detection in networks: A user guide
Authors:
Santo Fortunato,
Darko Hric
Abstract:
Community detection in networks is one of the most popular topics of modern network science. Communities, or clusters, are usually groups of vertices having a higher probability of being connected to each other than to members of other groups, though other patterns are possible. Identifying communities is an ill-defined problem. There are no universal protocols for the fundamental ingredients, such as the definition of community itself, or for other crucial issues, such as the validation of algorithms and the comparison of their performance. This has generated a good deal of confusion and a number of misconceptions, which undermine progress in the field. We offer a guided tour through the main aspects of the problem. We also point out the strengths and weaknesses of popular methods and give directions for their use.
Submitted 3 November, 2016; v1 submitted 30 July, 2016;
originally announced August 2016.
-
The Memory of Science: Inflation, Myopia, and the Knowledge Network
Authors:
Raj K. Pan,
Alexander M. Petersen,
Fabio Pammolli,
Santo Fortunato
Abstract:
Science is a growing system, exhibiting ~4% annual growth in publications and ~1.8% annual growth in the number of references per publication. Combined, these trends correspond to a 12-year doubling period in the total supply of references, thereby challenging traditional methods of evaluating scientific production, from researchers to institutions. Against this background, we analyzed a citation network comprising 837 million references produced by 32.6 million publications over the period 1965-2012, allowing for a temporal analysis of the 'attention economy' in science. Unlike previous studies, we analyzed the entire probability distribution of reference ages (the time difference between a citing and a cited paper), thereby capturing previously overlooked trends. Over this half-century period we observe a narrowing range of attention: both classic and recent literature are being cited increasingly less, pointing to the important role of socio-technical processes. To better understand the impact of exponential growth on the underlying knowledge network, we develop a network-based model featuring the redirection of scientific attention via publications' reference lists, and validate the model against several empirical benchmarks. We then use the model to test the causal impact of real paradigm shifts, thereby providing guidance for science policy analysis. In particular, we show how perturbations to the growth rate of scientific output affect the reference age distribution and the functionality of the vast science citation network as an aid for the search and retrieval of knowledge. To account for the inflation of science, our study points to the need for a systemic overhaul of the counting methods used to evaluate citation impact, especially in the case of evaluating science careers, which can span several decades and thus several doubling periods.
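As a rough check of the doubling-period arithmetic, assume that both growth rates stay constant and compound multiplicatively (an assumption made here for illustration, not a detail given in the abstract). The total supply of references then grows by a factor of about 1.059 per year, and its doubling time $T_{2}$ is

$$
1.04 \times 1.018 \approx 1.0587, \qquad T_{2} = \frac{\ln 2}{\ln 1.0587} \approx \frac{0.693}{0.0571} \approx 12.1\ \text{years},
$$

which is consistent with the quoted 12-year doubling period.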
Submitted 19 July, 2016;
originally announced July 2016.
-
Detection of timescales in evolving complex systems
Authors:
Richard K. Darst,
Clara Granell,
Alex Arenas,
Sergio Gómez,
Jari Saramäki,
Santo Fortunato
Abstract:
Most complex systems are intrinsically dynamic. The evolution of a dynamic complex system is typically represented as a sequence of snapshots, where each snapshot describes the configuration of the system at a particular instant of time. One may then directly follow how the snapshots evolve in time, or aggregate the snapshots within some time intervals to form representative "slices" of the evolution of the system configuration. This is often done with constant intervals, whose duration is based on arguments about the nature of the system and its dynamics. A more refined approach would be to consider the rate of activity in the system to perform a separation of timescales. However, an even better alternative would be to define dynamic intervals that match the evolution of the system's configuration. To this end, we propose a method that aims at detecting evolutionary changes in the configuration of a complex system and generates intervals accordingly. We show that evolutionary timescales can be identified by looking for peaks in the similarity between the sets of events in consecutive time intervals of data. Tests on simple toy models reveal that the technique is able to detect evolutionary timescales of time-varying data both when the evolution is smooth and when it changes sharply. This is further corroborated by analyses of several real datasets. Our method scales to extremely large datasets and is computationally efficient, allowing a quick, parameter-free detection of multiple timescales in the evolution of a complex system.
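A simplified sketch of the core idea, comparing the sets of events in consecutive time intervals and looking for a similarity peak, is shown below. It assumes fixed-width, non-overlapping windows and the Jaccard index as the similarity measure; the function names and the fixed-width scan are illustrative choices, not the authors' procedure, which generates intervals adaptively.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set, Tuple

Event = Tuple[float, Hashable]  # (timestamp, event label such as an edge)


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def window_similarity(events: Sequence[Event], width: float) -> float:
    """Mean Jaccard similarity between consecutive, non-overlapping windows of `width`."""
    if not events:
        return 0.0
    start = min(t for t, _ in events)
    buckets: Dict[int, Set[Hashable]] = {}
    for t, item in events:
        # Assign each event to the window that contains its timestamp.
        buckets.setdefault(int((t - start) // width), set()).add(item)
    # Include empty windows so that gaps in activity count as dissimilarity.
    windows = [buckets.get(k, set()) for k in range(max(buckets) + 1)]
    if len(windows) < 2:
        return 0.0
    sims = [jaccard(a, b) for a, b in zip(windows, windows[1:])]
    return sum(sims) / len(sims)


def best_width(events: Sequence[Event], widths: Iterable[float]) -> float:
    """Candidate width at which the consecutive-window similarity peaks."""
    return max(widths, key=lambda w: window_similarity(events, w))
```

For example, with `events = [(t, (t // 10, t % 5)) for t in range(40)]`, i.e. five edges per configuration and a configuration switch every 10 time steps, `best_width(events, [2, 5, 10, 20])` returns 5: at that width, consecutive windows falling inside the same configuration contain identical edge sets.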
Submitted 4 April, 2016;
originally announced April 2016.
-
Network structure, metadata and the prediction of missing nodes and annotations
Authors:
Darko Hric,
Tiago P. Peixoto,
Santo Fortunato
Abstract:
The empirical validation of community detection methods is often based on available annotations on the nodes that serve as putative indicators of the large-scale network structure. Most often, the suitability of the annotations as topological descriptors is not itself assessed, and without this it is not possible to ultimately distinguish between actual shortcomings of the community detection algorithms on the one hand, and the incompleteness, inaccuracy or structured nature of the data annotations themselves on the other. In this work we present a principled method to assess both aspects simultaneously. We construct a joint generative model for the data and metadata, and a nonparametric Bayesian framework to infer its parameters from annotated datasets. We assess the quality of the metadata not according to its direct alignment with the network communities, but rather according to its capacity to predict the placement of edges in the network. We also show how this feature can be used to predict the connections to missing nodes when only the metadata is available, as well as to predict missing metadata. By investigating a wide range of datasets, we show that while there is seldom exact agreement between metadata tokens and the inferred data groups, the metadata is nevertheless often informative of the network structure and can improve the prediction of missing nodes. This shows that the method uncovers meaningful patterns in both the data and metadata, without requiring or expecting a perfect agreement between the two.
Submitted 29 September, 2016; v1 submitted 1 April, 2016;
originally announced April 2016.
-
Eigenvector dynamics under perturbation of modular networks
Authors:
Somwrita Sarkar,
Sanjay Chawla,
Peter A. Robinson,
Santo Fortunato
Abstract:
Rotation dynamics of eigenvectors of modular network adjacency matrices under random perturbations are presented. In the presence of $q$ communities, the eigenvectors corresponding to the $q$ largest eigenvalues form a "community" eigenspace and rotate together, but separately from the "bulk" eigenspace spanned by all the other eigenvectors. Using this property, the number of modules or clusters in a network can be estimated in an algorithm-independent way. A general argument and derivation for the theoretical detectability limit for sparse modular networks with $q$ communities is presented, beyond which modularity persists in the system but cannot be detected. It is shown that, when the adjacency matrix is used to detect the clusters or modules, there is a "band" in which the clusters are hard to detect even before the theoretical detectability limit is reached, and for which the theoretically predicted detectability limit forms the sufficient upper bound. Analytic estimates of these bounds are presented and demonstrated empirically.
Submitted 6 July, 2016; v1 submitted 23 October, 2015;
originally announced October 2015.
-
Detection of gene communities in multi-networks reveals cancer drivers
Authors:
Laura Cantini,
Enzo Medico,
Santo Fortunato,
Michele Caselle
Abstract:
We propose a new multi-network-based strategy to integrate different layers of genomic information and use them in a coordinated way to identify cancer driver genes. The multi-networks that we consider combine transcription factor co-targeting, microRNA co-targeting, protein-protein interaction and gene co-expression networks. The rationale behind this choice is that gene co-expression and protein-protein interactions require a tight coregulation of the partners, and that such fine-tuned regulation can be obtained only by combining the transcriptional and post-transcriptional layers of regulation. To extract the relevant biological information from the multi-network, we studied its partition into communities. To this end we applied a consensus clustering algorithm based on state-of-the-art community detection methods. Although our procedure is in principle valid for any pathology, in this work we concentrate on gastric, lung, pancreas and colorectal cancer, and from the enrichment analysis of the multi-network communities we identified a set of candidate cancer driver genes. Some of them were already known oncogenes, while a few are new. The combination of the different layers of information allowed us to extract from the multi-network indications of the regulatory pattern and functional role of both the already known and the new candidate driver genes.
Submitted 9 December, 2015; v1 submitted 30 July, 2015;
originally announced July 2015.
-
Network-based model of the growth of termite nests
Authors:
Young-Ho Eom,
Andrea Perna,
Santo Fortunato,
Eric Darrouzet,
Guy Theraulaz,
Christian Jost
Abstract:
We present a model for the growth of the transportation network inside nests of the social insect subfamily Termitinae (Isoptera, Termitidae). These nests consist of large chambers (nodes) connected by tunnels (edges). The model, based on the empirical analysis of real nest networks and combined with pruning (edge removal, either random or weighted by betweenness centrality) and a memory effect (preferential growth from the most recently added chambers), successfully predicts emergent nest properties (degree distribution, size of the largest connected component, average path lengths, backbone link ratios, and local graph redundancy). The two pruning alternatives can be associated with different genera in the subfamily. A sensitivity analysis on the pruning and memory parameters indicates that Termitinae networks favor fast internal transportation over efficient defense strategies against ant predators. Our results provide an example of how complex network organization and efficient network properties can be generated from simple building rules based on local interactions, and contribute to our understanding of the mechanisms that come into play in the formation of termite networks and of biological transportation networks in general.
Submitted 10 December, 2015; v1 submitted 16 June, 2015;
originally announced June 2015.
-
Quantifying randomness in real networks
Authors:
Chiara Orsini,
Marija Mitrović Dankulov,
Almerima Jamakovic,
Priya Mahadevan,
Pol Colomer-de-Simón,
Amin Vahdat,
Kevin E. Bassler,
Zoltán Toroczkai,
Marián Boguñá,
Guido Caldarelli,
Santo Fortunato,
Dmitri Krioukov
Abstract:
Represented as graphs, real networks are intricate combinations of order and disorder. When some of the structural properties of network models are fixed to their values observed in real networks, many other properties appear as statistical consequences of these fixed observables, plus randomness in other respects. Here we employ the $dk$-series, a complete set of basic characteristics of the network structure, to study the statistical dependencies between different network properties. We consider six real networks (the Internet, the US airport network, human protein interactions, a technosocial web of trust, an English word network, and an fMRI map of the human brain) and find that many important local and global structural properties of these networks are closely reproduced by $dk$-random graphs whose degree distributions, degree correlations, and clustering are as in the corresponding real network. We discuss important conceptual, methodological, and practical implications of this evaluation of network randomness, and release software to generate $dk$-random graphs.
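The simplest $dk$-random graph beyond fixing the average degree, a 1k-random graph with the same degree sequence as a given network, can be sketched with the classic degree-preserving double-edge swap. This is only an illustration under the assumption of a simple undirected graph with integer node labels; it is not the software released with the paper, and reproducing degree correlations or clustering as well would require additional constraints on which swaps are accepted.

```python
import random
from typing import List, Optional, Tuple

Edge = Tuple[int, int]


def rewire_1k(edges: List[Edge], n_swaps: int, seed: Optional[int] = None) -> List[Edge]:
    """Randomize a simple undirected graph while preserving every node's degree.

    Each attempted double-edge swap picks two edges (a, b) and (c, d) and
    replaces them with (a, d) and (c, b). Swaps that would create a self-loop
    or a duplicate edge are rejected, so the graph stays simple.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings at random
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2 or new1 in present or new2 in present:
            continue  # reject: the swap would create a self-loop or a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges
```

Every accepted swap removes one edge endpoint from and adds one to each of the four nodes involved, so the degree sequence is exactly preserved while everything else is randomized.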
Submitted 2 December, 2015; v1 submitted 27 May, 2015;
originally announced May 2015.
-
Attention decay in science
Authors:
Pietro Della Briotta Parolo,
Raj Kumar Pan,
Rumi Ghosh,
Bernardo A. Huberman,
Kimmo Kaski,
Santo Fortunato
Abstract:
The exponential growth in the number of scientific papers makes it increasingly difficult for researchers to keep track of all the publications relevant to their work. Consequently, the attention that can be devoted to individual papers, measured by their citation counts, is bound to decay rapidly. In this work we make a thorough study of the life cycle of papers in different disciplines. Typically, the citation rate of a paper increases up to a few years after its publication, reaches a peak and then decreases rapidly. This decay can be described by exponential or power-law behavior, as in ultradiffusive processes, with the exponential fitting better than the power law in the majority of cases. The decay is also becoming faster over the years, signaling that papers are nowadays forgotten more quickly. However, when time is counted in terms of the number of published papers, the rate of decay of citations is fairly independent of the period considered. This indicates that the attention of scholars depends on the number of published items, and not on real time.
Submitted 23 November, 2015; v1 submitted 6 March, 2015;
originally announced March 2015.
-
Benchmark model to assess community structure in evolving networks
Authors:
Clara Granell,
Richard K. Darst,
Alex Arenas,
Santo Fortunato,
Sergio Gómez
Abstract:
Detecting the time evolution of the community structure of networks is crucial to identify major changes in the internal organization of many complex systems, which may undergo important endogenous or exogenous events. This analysis can be done in two ways: considering each snapshot as an independent community detection problem, or taking into account the whole evolution of the network. In the first case, one can apply static methods to the temporal snapshots, which correspond to configurations of the system in short time windows, and afterwards match the communities across layers. Alternatively, one can develop dedicated dynamic procedures, in which multiple snapshots are taken into account simultaneously while detecting communities, which allows one to keep memory of the flow. To check how well a method of any kind captures the evolution of communities, suitable benchmarks are needed. Here we propose a model for generating simple dynamic benchmark graphs, based on stochastic block models. In them, the time evolution consists of a periodic oscillation of the system's structure between configurations with built-in community structure. We also propose the extension of quality comparison indices to the dynamic scenario.
Submitted 19 July, 2015; v1 submitted 23 January, 2015;
originally announced January 2015.
-
Triadic closure as a basic generating mechanism of communities in complex networks
Authors:
Ginestra Bianconi,
Richard K. Darst,
Jacopo Iacovacci,
Santo Fortunato
Abstract:
Most complex social, technological and biological networks have a significant community structure. Therefore the community structure of complex networks has to be considered a universal property, together with the much-explored small-world and scale-free properties of these networks. Despite the large interest in characterizing the community structures of real networks, not enough attention has been devoted to the detection of universal mechanisms able to spontaneously generate networks with communities. Triadic closure is a natural mechanism to make new connections, especially in social networks. Here we show that models of network growth based on simple triadic closure naturally lead to the emergence of community structure, together with fat-tailed distributions of node degree and high clustering coefficients. Communities emerge from the initial stochastic heterogeneity in the concentration of links, followed by a cycle of growth and fragmentation. Communities are more pronounced the sparser the graph, and they disappear for high values of link density and randomness in the attachment procedure. By introducing a fitness-based link attractivity for the nodes, we find a novel phase transition, where communities disappear for high heterogeneity of the fitness distribution, but a new mesoscopic organization of the nodes emerges, with groups of nodes being shared between just a few superhubs, which attract most of the links of the system.
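A minimal sketch of a growth process driven by triadic closure follows. The specific rules (each new node first links to a uniformly random node $j$, then draws further targets until it has $m$ links, taking a random neighbor of $j$ with probability $p$ per draw and a uniformly random node otherwise) and the parameter names `m` and `p` are assumptions chosen for illustration; the abstract does not spell out the model, and this sketch omits the fitness-based attractivity.

```python
import random
from typing import Dict, Optional, Set


def triadic_closure_growth(n: int, m: int = 2, p: float = 0.9,
                           seed: Optional[int] = None) -> Dict[int, Set[int]]:
    """Grow an undirected graph of n nodes in which new links mostly close triangles.

    Hypothetical rules: each new node links to a uniformly random existing node j,
    then keeps drawing targets until it has m distinct ones; each draw picks a
    random neighbor of j with probability p (closing a triangle through j), or a
    uniformly random existing node otherwise. Returns an adjacency dict.
    """
    if n < m + 1:
        raise ValueError("n must be at least m + 1")
    rng = random.Random(seed)
    # Start from a clique of m + 1 nodes so that every node has at least m neighbors.
    adj: Dict[int, Set[int]] = {u: {v for v in range(m + 1) if v != u} for u in range(m + 1)}
    for new in range(m + 1, n):
        existing = list(adj)  # snapshot taken before the new node is added
        j = rng.choice(existing)
        targets = {j}
        while len(targets) < m:
            neighbors = [x for x in adj[j] if x not in targets]
            if neighbors and rng.random() < p:
                targets.add(rng.choice(neighbors))  # triadic closure step
            else:
                targets.add(rng.choice(existing))  # random attachment; a repeat is redrawn
        adj[new] = set(targets)
        for x in targets:
            adj[x].add(new)
    return adj
```

Following the abstract, communities should be more pronounced the sparser the graph and the lower the randomness in attachment, which in this sketch would correspond to small `m` and `p` close to 1; detecting them would still require running a community detection method on the output.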
Submitted 1 December, 2014; v1 submitted 7 July, 2014;
originally announced July 2014.
-
Community detection in networks: Structural communities versus ground truth
Authors:
Darko Hric,
Richard K. Darst,
Santo Fortunato
Abstract:
Algorithms to find communities in networks rely only on structural information and search for cohesive subsets of nodes. On the other hand, most scholars implicitly or explicitly assume that structural communities represent groups of nodes with similar (non-topological) properties or functions. So far, this hypothesis could not be verified because of the lack of network datasets with information on the classification of the nodes. We show that traditional community detection methods fail to find the metadata groups in many large networks. Our results show that there is a marked separation between structural communities and metadata groups, in line with recent findings. This means either that our current modeling of community structure has to be substantially modified, or that metadata groups may not be recoverable from topology alone.
Submitted 11 December, 2014; v1 submitted 1 June, 2014;
originally announced June 2014.
-
The Nobel Prize delay
Authors:
Francesco Becattini,
Arnab Chatterjee,
Santo Fortunato,
Marija Mitrović,
Raj Kumar Pan,
Pietro Della Briotta Parolo
Abstract:
The time lag between the publication of a Nobel discovery and the conferment of the prize has been rapidly increasing for all disciplines, especially for Physics. Does this mean that fundamental science is running out of groundbreaking discoveries?
Submitted 28 May, 2014;
originally announced May 2014.
-
Author Impact Factor: tracking the dynamics of individual scientific impact
Authors:
Raj Kumar Pan,
Santo Fortunato
Abstract:
The impact factor (IF) of scientific journals has acquired a major role in evaluating the output of scholars, departments and whole institutions. Typically, papers appearing in journals with large values of the IF receive a high weight in such evaluations. However, at the end of the day one is interested in assessing the impact of individuals, rather than papers. Here we introduce the Author Impact Factor (AIF), which is the extension of the IF to authors. The AIF of an author A in year $t$ is the average number of citations given by papers published in year $t$ to papers published by A in a period of $Δt$ years before year $t$. Due to its intrinsically dynamic character, the AIF is able to capture trends and variations in the impact of the scientific output of scholars over time, unlike the $h$-index, which is a growing measure taking into account the whole career path.
Submitted 12 May, 2014; v1 submitted 9 December, 2013;
originally announced December 2013.
-
Improving the performance of algorithms to find communities in networks
Authors:
Richard K. Darst,
Zohar Nussinov,
Santo Fortunato
Abstract:
Many algorithms to detect communities in networks work without any information on the cluster structure to be found since, in general, one has no a priori knowledge of it. Not surprisingly, knowing some features of the unknown partition could help its identification, improving the performance of the method. Here we show that, if the number of clusters were known beforehand, standard methods, like modularity optimization, would considerably gain in accuracy, mitigating the severe resolution bias that undermines the reliability of the results of the original unconstrained version. The number of clusters can be inferred from the spectra of the recently introduced non-backtracking and flow matrices, even in benchmark graphs with realistic community structure. The limitation of such a two-step procedure is the overhead of computing the spectra.
Submitted 1 December, 2014; v1 submitted 15 November, 2013;
originally announced November 2013.
-
On the Predictability of Future Impact in Science
Authors:
Orion Penner,
Raj Kumar Pan,
Alexander M. Petersen,
Kimmo Kaski,
Santo Fortunato
Abstract:
Correctly assessing a scientist's past research impact and potential for future impact is key in recruitment decisions and other evaluation processes. While a candidate's future impact is the main concern for these decisions, most measures only quantify the impact of previous work. Recently, it has been argued that linear regression models are capable of predicting a scientist's future impact. By applying that future impact model to 762 careers drawn from three disciplines (physics, biology, and mathematics), we identify a number of subtle but critical flaws in current models. Specifically, cumulative non-decreasing measures like the h-index contain intrinsic autocorrelation, resulting in significant overestimation of their "predictive power". Moreover, the predictive power of these models depends heavily upon scientists' career age, producing the least accurate estimates for young researchers. Our results place in doubt the suitability of such models, and indicate that further investigation is required before they can be used in recruiting decisions.
Submitted 29 October, 2013; v1 submitted 1 June, 2013;
originally announced June 2013.
-
Editorial: Statistical Mechanics and Social Sciences
Authors:
Santo Fortunato,
Michael Macy,
Sidney Redner
Abstract:
This editorial opens the special issues that the Journal of Statistical Physics has dedicated to the growing field of statistical physics modeling of social dynamics. The issues include contributions from physicists and social scientists, with the goal of fostering better communication between these two communities.
Submitted 20 April, 2013; v1 submitted 3 April, 2013;
originally announced April 2013.
-
The case for caution in predicting scientists' future impact
Authors:
Orion Penner,
Raj K. Pan,
Alexander M. Petersen,
Santo Fortunato
Abstract:
We stress-test the career predictability model proposed by Acuna et al. [Nature 489, 201-202 (2012)] by applying their model to a longitudinal career data set of 100 assistant professors in physics, two from each of the top 50 physics departments in the US. The Acuna model claims to predict h(t+Δt), a scientist's h-index Δt years into the future, using a linear combination of 5 cumulative career measures taken at career age t. Here we investigate how the "predictability" depends on the aggregation of career data across multiple age cohorts. We confirm that the Acuna model does a respectable job of predicting h(t+Δt) up to roughly 6 years into the future when aggregating all age cohorts together. However, when calculated using subsets of specific age cohorts (e.g. using data for only t=3), we find that the model's predictive power significantly decreases, especially when applied to early career years. For young careers, the model does a much worse job of predicting future impact, which exposes a serious limitation. The limitation is particularly concerning because early career decisions make up a significant portion, if not the majority, of cases where quantitative approaches are likely to be applied.
Submitted 2 April, 2013;
originally announced April 2013.
-
Reputation and Impact in Academic Careers
Authors:
Alexander M. Petersen,
Santo Fortunato,
Raj K. Pan,
Kimmo Kaski,
Orion Penner,
Armando Rungi,
Massimo Riccaboni,
H. Eugene Stanley,
Fabio Pammolli
Abstract:
Reputation is an important social construct in science, which enables informed quality assessments of both publications and careers of scientists in the absence of complete systemic information. However, the relation between reputation and career growth of an individual remains poorly understood, despite the recent proliferation of quantitative research evaluation methods. Here we develop an original framework for measuring how a publication's citation rate $Δc$ depends on the reputation of its central author $i$, in addition to its net citation count $c$. To estimate the strength of the reputation effect, we perform a longitudinal analysis on the careers of 450 highly cited scientists, using the total citations $C_{i}$ of each scientist as his/her reputation measure. We find a citation crossover $c_{\times}$ which distinguishes the strength of the reputation effect. For publications with $c < c_{\times}$, the author's reputation is found to dominate the annual citation rate. Hence, a new publication may gain a significant early advantage corresponding to roughly a 66% increase in the citation rate for each tenfold increase in $C_{i}$. However, the reputation effect becomes negligible for highly cited publications, meaning that for $c\geq c_{\times}$ the citation rate measures scientific impact more transparently. In addition, we have developed a stochastic reputation model, which is found to reproduce numerous statistical observations for real careers, thus providing insight into the microscopic mechanisms underlying cumulative advantage in science.
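The figure of roughly 66% per tenfold increase in $C_{i}$ can be restated as an exponent. A fixed fractional gain per tenfold increase is equivalent to a power law $\Delta c \propto C_{i}^{\beta}$ at fixed $c < c_{\times}$; this power-law reading, and the symbol $\beta$, are an interpretation for illustration rather than a formula quoted from the paper:

$$
\frac{\Delta c(10\,C_{i})}{\Delta c(C_{i})} = 10^{\beta} \approx 1.66
\quad\Longrightarrow\quad
\beta \approx \log_{10} 1.66 \approx 0.22
$$

Under this reading, doubling $C_{i}$ would raise the citation rate of a low-cited publication by a factor of about $2^{0.22} \approx 1.16$.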
Submitted 7 October, 2014; v1 submitted 28 March, 2013;
originally announced March 2013.
-
Universality in voting behavior: an empirical analysis
Authors:
Arnab Chatterjee,
Marija Mitrović,
Santo Fortunato
Abstract:
Election data represent a precious source of information to study human behavior at a large scale. In proportional elections with open lists, the number of votes received by a candidate, rescaled by the average performance of all competitors in the same party list, has the same distribution regardless of the country and the year of the election. Here we provide the first thorough assessment of this claim. We analyzed election datasets of 15 countries with proportional systems. We confirm that a class of nations with similar election rules fulfills the universality claim. Discrepancies from this trend in other countries with open-list elections are always associated with peculiar differences in the election rules, which matter more than differences between countries and historical periods. Our analysis shows that the role of parties in the electoral performance of candidates is crucial: alternative scalings not taking into account party affiliations lead to poor results.
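The rescaling described above can be sketched in a few lines. This is a minimal illustration, assuming the input is a flat list of `(candidate_id, list_id, votes)` tuples (a hypothetical layout) and that the list average includes the candidate, i.e. the list's total votes divided by its number of candidates. The universality claim then concerns the distribution of these rescaled values.

```python
from collections import defaultdict


def rescaled_votes(candidates):
    """Rescale each candidate's votes by the mean votes of their party list.

    candidates: iterable of (candidate_id, list_id, votes) tuples
    (hypothetical input layout). Returns {candidate_id: votes / list mean}.
    """
    rows = list(candidates)
    total = defaultdict(float)
    count = defaultdict(int)
    # First pass: total votes and number of candidates per party list.
    for _, list_id, votes in rows:
        total[list_id] += votes
        count[list_id] += 1
    rescaled = {}
    # Second pass: divide by the list average (candidate included, by assumption).
    for cand, list_id, votes in rows:
        mean = total[list_id] / count[list_id]
        rescaled[cand] = votes / mean if mean > 0 else 0.0
    return rescaled
```

For example, two candidates with 30 and 10 votes on the same list get rescaled values 1.5 and 0.5, independently of how large the list's electorate is.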
Submitted 24 January, 2013; v1 submitted 10 December, 2012;
originally announced December 2012.
-
Physics peeks into the ballot box
Authors:
Santo Fortunato,
Claudio Castellano
Abstract:
Electoral results show universal features, such as statistics of candidates' performance and turnout rates, in different countries and over time. Are voters as predictable as atoms?
Submitted 8 October, 2012;
originally announced October 2012.
-
World citation and collaboration networks: uncovering the role of geography in science
Authors:
Raj Kumar Pan,
Kimmo Kaski,
Santo Fortunato
Abstract:
Modern information and communication technologies, especially the Internet, have diminished the role of spatial distances and territorial boundaries in the access to and transmissibility of information. This has enabled closer collaboration and internationalization among scientists. Nevertheless, geography remains an important factor affecting the dynamics of science. Here we present a systematic analysis of citation and collaboration networks between cities and countries, obtained by assigning papers to the geographic locations of their authors' affiliations. The citation flows as well as the collaboration strengths between cities decrease with the distance between them and follow gravity laws. In addition, the total research impact of a country grows linearly with the amount of national funding for research & development. However, the average impact reveals a peculiar threshold effect: the scientific output of a country may reach an impact larger than the world average only if the country invests more than about 100,000 USD per researcher annually.
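For readers unfamiliar with gravity laws, a common generic form is shown below. The symbols are our own notation and the exact expression and exponents used in the paper are not given in the abstract: $W_{ij}$ is the flow (citations or collaborations) between cities $i$ and $j$, $N_i$ and $N_j$ are their "masses" (e.g. numbers of papers), and $r_{ij}$ is their distance.

```latex
W_{ij} \propto \frac{N_i\,N_j}{r_{ij}^{\alpha}}
\quad\Longleftrightarrow\quad
\log W_{ij} = \text{const} + \log N_i + \log N_j - \alpha \log r_{ij}
```

Taking logarithms turns the law into a linear relation, so under this assumed form the distance exponent $\alpha$ can be estimated by linear regression of $\log W_{ij} - \log(N_i N_j)$ on $\log r_{ij}$.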
Submitted 17 December, 2012; v1 submitted 4 September, 2012;
originally announced September 2012.
-
Consensus clustering in complex networks
Authors:
Andrea Lancichinetti,
Santo Fortunato
Abstract:
The community structure of complex networks reveals both their organization and hidden relationships among their constituents. Most community detection methods currently available are not deterministic, and their results typically depend on the specific random seeds, initial conditions and tie-break rules adopted for their execution. Consensus clustering is used in data analysis to generate stable results out of a set of partitions delivered by stochastic methods. Here we show that consensus clustering can be combined with any existing method in a self-consistent way, enhancing considerably both the stability and the accuracy of the resulting partitions. This framework is also particularly suitable to monitor the evolution of community structure in temporal networks. An application of consensus clustering to a large citation network of physics papers demonstrates its capability to keep track of the birth, death and diversification of topics.
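A minimal sketch of the core of consensus clustering, under stated assumptions: each input partition is a dict mapping node to community label (a hypothetical layout), the consensus matrix entry for a pair of nodes is the fraction of partitions placing them together, and entries below a threshold `tau` are discarded. In one common form of the procedure, the thresholded matrix is then re-clustered with the same stochastic method until the partitions agree; to keep the sketch self-contained, that step is replaced here by taking connected components of the thresholded consensus graph, which is a simplification.

```python
from collections import defaultdict
from itertools import combinations


def consensus_matrix(partitions, tau=0.5):
    """Fraction of partitions in which each node pair is co-clustered.

    partitions: list of dicts node -> label over the same (sortable) nodes.
    Returns {(i, j): weight} keeping only pairs with weight >= tau.
    """
    nodes = sorted(partitions[0])
    counts = defaultdict(int)
    for part in partitions:
        for i, j in combinations(nodes, 2):
            if part[i] == part[j]:
                counts[(i, j)] += 1
    n_p = len(partitions)
    return {pair: c / n_p for pair, c in counts.items() if c / n_p >= tau}


def components(nodes, edges):
    """Connected components of the thresholded consensus graph (simplified final step)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for v in nodes:
        groups[find(v)].append(v)
    return sorted(sorted(g) for g in groups.values())
```

Pairs that only occasionally fall together (e.g. through an unlucky random seed) drop below `tau`, which is how the consensus step filters out the run-to-run noise the abstract refers to.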
Submitted 27 March, 2012;
originally announced March 2012.
-
Characterizing and modeling citation dynamics
Authors:
Young-Ho Eom,
Santo Fortunato
Abstract:
Citation distributions are crucial for the analysis and modeling of the activity of scientists. We investigated bibliometric data of papers published in journals of the American Physical Society, searching for the type of function which best describes the observed citation distributions. We used the Kolmogorov-Smirnov statistic to assess the goodness of fit of three classes of functions: log-normal, simple power law and shifted power law. The shifted power law turns out to be the most reliable hypothesis for all citation networks we derived, which correspond to different time spans. We find that citation dynamics is characterized by bursts, usually occurring within a few years of a paper's publication, and that burst sizes span several orders of magnitude. We also investigated the microscopic mechanisms for the evolution of citation networks by proposing a linear preferential attachment model with time-dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and accounts for the presence of citation bursts as well.
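To make the model class concrete, here is a minimal simulation sketch of linear preferential attachment with initial attractiveness: paper `t` cites earlier paper `i` with probability proportional to `c_i + A(age)`, where `c_i` is its current citation count. The abstract does not give the time dependence of the attractiveness, so `attractiveness` is a hypothetical user-supplied function of the cited paper's age (in units of papers published since), and the fixed number of references per paper is also an assumption.

```python
import random


def simulate_citations(n_papers, refs_per_paper, attractiveness, seed=0):
    """Grow a citation network one paper at a time.

    Paper t cites min(refs_per_paper, t) distinct earlier papers, choosing
    paper i with probability proportional to cites[i] + attractiveness(t - i).
    attractiveness must return positive values (hypothetical form).
    Returns the final citation count of every paper.
    """
    rng = random.Random(seed)
    cites = [0] * n_papers
    for t in range(1, n_papers):
        k = min(refs_per_paper, t)
        # Linear preferential attachment plus an age-dependent initial attractiveness.
        weights = [cites[i] + attractiveness(t - i) for i in range(t)]
        chosen = set()
        while len(chosen) < k:  # draw distinct references
            chosen.add(rng.choices(range(t), weights=weights)[0])
        for i in chosen:
            cites[i] += 1
    return cites


# Example with a hypothetical attractiveness that decays with age.
counts = simulate_citations(200, 3, lambda age: 1.0 / age)
```

A decaying attractiveness lets young papers collect citations before they have any, which is one way such a model can produce early bursts; the quadratic loop is fine for illustration but not for network sizes like the APS data.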
Submitted 10 October, 2011;
originally announced October 2011.
-
Limits of modularity maximization in community detection
Authors:
Andrea Lancichinetti,
Santo Fortunato
Abstract:
Modularity maximization is the most popular technique for the detection of community structure in graphs. The resolution limit of the method is supposedly solvable with the introduction of modified versions of the measure, with tunable resolution parameters. We show that multiresolution modularity suffers from two opposite coexisting problems: the tendency to merge small subgraphs, which dominates when the resolution is low, and the tendency to split large subgraphs, which dominates when the resolution is high. In benchmark networks with heterogeneous distributions of cluster sizes, the simultaneous elimination of both biases is not possible, and, for any value of the resolution parameter, multiresolution modularity is not capable of recovering the planted community structure, not even when it is pronounced and easily detectable by other methods. This holds for other multiresolution techniques, and it is likely to be a general problem of methods based on global optimization.
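The tunable measure discussed above can be sketched as the standard multiresolution modularity with a resolution parameter $\gamma$, $Q_\gamma = \sum_c \left[ l_c/m - \gamma \left( d_c/2m \right)^2 \right]$, where $m$ is the number of edges, $l_c$ the number of edges inside community $c$ and $d_c$ its total degree. This is a minimal sketch assuming an undirected, unweighted graph without self-loops, given as an edge list, and a partition given as a node-to-label dict (hypothetical layout).

```python
from collections import defaultdict


def modularity(edges, membership, gamma=1.0):
    """Multiresolution modularity of a partition.

    Q = sum_c [ l_c / m - gamma * (d_c / 2m)^2 ], where l_c counts edges inside
    community c and d_c is the total degree of its nodes.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - gamma * (degree[c] / (2 * m)) ** 2 for c in degree)
```

Raising `gamma` penalizes large communities more heavily (favoring splits), while lowering it favors merges, which is the trade-off the abstract shows cannot be tuned away when cluster sizes are heterogeneous. On two triangles joined by a single edge, the natural two-community split scores $5/14$ at $\gamma = 1$ but becomes negative at $\gamma = 2$.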
Submitted 12 February, 2012; v1 submitted 6 July, 2011;
originally announced July 2011.
-
How citation boosts promote scientific paradigm shifts and Nobel Prizes
Authors:
Amin Mazloumian,
Young-Ho Eom,
Dirk Helbing,
Sergi Lozano,
Santo Fortunato
Abstract:
Nobel Prizes are commonly seen to be among the most prestigious achievements of our times. Based on mining several million citations, we quantitatively analyze the processes driving paradigm shifts in science. We find that groundbreaking discoveries of Nobel Prize Laureates and other famous scientists are not only acknowledged by many citations of their landmark papers. Surprisingly, they also boost the citation rates of their previous publications. Given that innovations must outcompete the rich-gets-richer effect for scientific citations, it turns out that they can make their way only through citation cascades. A quantitative analysis reveals how and why they happen. Science appears to behave like a self-organized critical system, in which citation cascades of all sizes occur, from continuous scientific progress all the way up to scientific revolutions, which change the way we see our world. Measuring the "boosting effect" of landmark papers, our analysis reveals how new ideas and new players can make their way and finally triumph in a world dominated by established paradigms. The underlying "boost factor" is also useful to discover scientific breakthroughs and talents much earlier than through classical citation analysis, which by now has become a widespread method to measure scientific excellence, influencing scientific careers and the distribution of research funds. Our findings reveal patterns of collective social behavior, which are also interesting from an attention economics perspective. Understanding the origin of scientific authority may therefore ultimately help to explain how social influence comes about and why the value of goods depends so strongly on the attention they attract.
Submitted 10 May, 2011;
originally announced May 2011.
-
Explosive percolation in graphs
Authors:
Santo Fortunato,
Filippo Radicchi
Abstract:
Percolation is perhaps the simplest example of a process exhibiting a phase transition and one of the most studied phenomena in statistical physics. The percolation transition is continuous if sites/bonds are occupied independently with the same probability. However, alternative rules for the occupation of sites/bonds might affect the order of the transition. A recent set of rules proposed by Achlioptas et al. [Science 323, 1453 (2009)], characterized by competitive link addition, was claimed to lead to a discontinuous connectedness transition, named "explosive percolation". In this work we survey a numerical study of the explosive percolation transition on various types of graphs, from lattices to scale-free networks, and show the consistency of these results with recent analytical work showing that the transition is actually continuous.
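The competitive link addition mentioned above can be illustrated with the "product rule" variant, as we understand it: at each step two candidate edges are drawn at random, and only the one minimizing the product of the sizes of the clusters it would join is added. This minimal sketch assumes candidate edges are drawn uniformly among pairs of distinct nodes (possibly repeating an existing link) and tracks the largest cluster with a union-find structure; it is an illustration, not the simulation setup of the paper.

```python
import random


def product_rule(n, n_edges, seed=0):
    """Achlioptas-style product rule on n initially isolated nodes.

    Returns the size of the largest cluster after each added edge.
    """
    rng = random.Random(seed)
    parent = list(range(n))  # union-find forest
    size = [1] * n  # cluster sizes, valid at root nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def product(edge):
        # Product of the sizes of the clusters containing the two endpoints.
        return size[find(edge[0])] * size[find(edge[1])]

    largest = 1
    history = []
    for _ in range(n_edges):
        e1 = tuple(rng.sample(range(n), 2))
        e2 = tuple(rng.sample(range(n), 2))
        u, v = min(e1, e2, key=product)  # keep the less "merging" candidate
        a, b = find(u), find(v)
        if a != b:
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a  # attach the smaller cluster to the larger one
            size[a] += size[b]
            largest = max(largest, size[a])
        history.append(largest)
    return history
```

Plotting `history[k] / n` against `(k + 1) / n` shows the abrupt-looking rise of the giant cluster; as the abstract notes, such finite-size curves can look discontinuous even though the transition is actually continuous.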
Submitted 18 January, 2011;
originally announced January 2011.
-
Finding statistically significant communities in networks
Authors:
Andrea Lancichinetti,
Filippo Radicchi,
José Javier Ramasco,
Santo Fortunato
Abstract:
Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a great need for multi-purpose techniques able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure of partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method performs comparably to the best existing algorithms on artificial benchmark graphs. Several applications on real networks are shown as well. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks.
Submitted 4 May, 2011; v1 submitted 10 December, 2010;
originally announced December 2010.
-
Information filtering in complex weighted networks
Authors:
Filippo Radicchi,
José J. Ramasco,
Santo Fortunato
Abstract:
Many systems in nature, society and technology can be described as networks, where the vertices are the system's elements and edges between vertices indicate the interactions between the corresponding elements. Edges may be weighted if the interaction strength is measurable. However, the full network information is often redundant: tools and techniques from network analysis do not work, or become very inefficient, if the network is too dense, and some weights may just reflect measurement errors and should be discarded. Moreover, since weight distributions in many complex weighted networks are broad, most of the weight is concentrated among a small fraction of all edges. It is then crucial to properly detect relevant edges. Simple thresholding would leave only the largest weights, disrupting the multiscale structure of the system, which is at the basis of the structure of complex networks and ought to be kept. In this paper we propose a weight filtering technique based on a global null model (GloSS filter), keeping both the weight distribution and the full topological structure of the network. The method correctly quantifies the statistical significance of weights assigned independently to the edges from a given distribution. Applications to real networks reveal that the GloSS filter is indeed able to identify relevant connections between vertices.
Submitted 15 April, 2011; v1 submitted 15 September, 2010;
originally announced September 2010.
-
Characterizing the community structure of complex networks
Authors:
Andrea Lancichinetti,
Mikko Kivela,
Jari Saramaki,
Santo Fortunato
Abstract:
Community structure is one of the key properties of complex networks and plays a crucial role in their topology and function. While an impressive amount of work has been done on the issue of community detection, very little attention has so far been devoted to the investigation of communities in real networks. We present a systematic empirical analysis of the statistical properties of communities in large information, communication, technological, biological, and social networks. We find that the mesoscopic organization of networks of the same category is remarkably similar. This is reflected in several characteristics of community structure, which can be used as "fingerprints" of specific network categories. While community size distributions are always broad, certain categories of networks consist mainly of tree-like communities, while others have denser modules. Average path lengths within communities initially grow logarithmically with community size, but the growth saturates or slows down for communities larger than a characteristic size. This behaviour is related to the presence of hubs within communities, whose roles differ across categories. The community embeddedness of nodes, measured in terms of the fraction of links within their communities, also has a characteristic distribution for each category. Our findings are verified using two fundamentally different community detection methods.
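The embeddedness measure defined above, the fraction of a node's links that fall inside its own community, is simple to compute once a partition is available. This minimal sketch assumes an undirected edge list without self-loops and a node-to-community dict (hypothetical input layout); isolated nodes are left out because their embeddedness is undefined.

```python
from collections import defaultdict


def embeddedness(edges, membership):
    """Fraction of each node's links that lie inside its own community."""
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if membership[u] == membership[v]:
            # An internal link counts toward both endpoints.
            inside[u] += 1
            inside[v] += 1
    return {v: inside[v] / degree[v] for v in degree}
```

Collecting these values over all nodes gives the per-network distribution that the abstract uses as a category fingerprint. In two triangles joined by a single edge, for instance, the two bridge endpoints have embeddedness 2/3 and all other nodes have 1.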
Submitted 24 May, 2010;
originally announced May 2010.