-
The Software Complexity of Nations
Authors:
Sándor Juhász,
Johannes Wachs,
Jermain Kaminski,
César A. Hidalgo
Abstract:
Despite the growing importance of the digital sector, research on economic complexity and its implications continues to rely mostly on administrative records, e.g. data on exports, patents, and employment, that fail to capture the nuances of the digital economy. In this paper we use data on the geography of programming languages used in open-source software projects to extend economic complexity ideas to the digital economy. We estimate a country's software economic complexity and show that it complements the ability of measures of complexity based on trade, patents, and research papers to account for international differences in GDP per capita, income inequality, and emissions. We also show that open-source software follows the principle of relatedness, meaning that a country's software entries and exits are explained by specialization in related programming languages. We conclude by exploring the diversification and development of countries in open-source software in the context of large language models. Together, these findings help extend economic complexity methods and their policy considerations to the digital sector.
Submitted 18 July, 2024;
originally announced July 2024.
-
Large Language Models (LLMs) as Agents for Augmented Democracy
Authors:
Jairo Gudiño-Rosero,
Umberto Grandi,
César A. Hidalgo
Abstract:
We explore an augmented democracy system built on off-the-shelf LLMs fine-tuned to augment data on citizens' preferences elicited over policies extracted from the government programs of the two main candidates of Brazil's 2022 presidential election. We use a train-test cross-validation setup to estimate the accuracy with which the LLMs predict both a subject's individual political choices and the aggregate preferences of the full sample of participants. At the individual level, we find that LLMs predict out-of-sample preferences more accurately than a "bundle rule", which would assume that citizens always vote for the proposals of the candidate aligned with their self-reported political orientation. At the population level, we show that a probabilistic sample augmented by an LLM provides a more accurate estimate of the aggregate preferences of a population than the non-augmented probabilistic sample alone. Together, these results indicate that policy preference data augmented using LLMs can capture nuances that transcend party lines and represent a promising avenue of research for data augmentation.
Submitted 30 July, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Mapping Global Value Chains at the Product Level
Authors:
Lea Karbevska,
César A. Hidalgo
Abstract:
Value chain data is crucial to navigate economic disruptions, such as those caused by the COVID-19 pandemic and the war in Ukraine. Yet, despite its importance, publicly available value chain datasets, such as the "World Input-Output Database", "Inter-Country Input-Output Tables", "EXIOBASE", or "EORA", lack detailed information about products (e.g. Radio Receivers, Telephones, Electrical Capacitors, LCDs, etc.) and rely instead on more aggregate industrial sectors (e.g. Electrical Equipment, Telecommunications). Here, we introduce a method based on machine learning and trade theory to infer product-level value chain relationships from fine-grained international trade data. We apply our method to data summarizing the exports and imports of 300+ world regions (e.g. states in the U.S., prefectures in Japan, etc.) and 1200+ products to infer value chain information implicit in their trade patterns. Furthermore, we use proportional allocation to assign the trade flow between regions and countries. This work provides an approximate method to map value chain data at the product level with a relevant trade flow, which should be of interest to people working in logistics, trade, and sustainable development.
Submitted 12 June, 2023;
originally announced August 2023.
-
Measuring and Controlling Divisiveness in Rank Aggregation
Authors:
Rachael Colley,
Umberto Grandi,
César Hidalgo,
Mariana Macedo,
Carlos Navarrete
Abstract:
In rank aggregation, members of a population rank issues to decide which are collectively preferred. We focus instead on identifying divisive issues that express disagreements among the preferences of individuals. We analyse the properties of our divisiveness measures and their relation to existing notions of polarisation. We also study their robustness under incomplete preferences and algorithms for control and manipulation of divisiveness. Our results advance our understanding of how to quantify disagreements in collective decision-making.
Submitted 14 June, 2023;
originally announced June 2023.
-
Understanding Political Divisiveness using Online Participation data from the 2022 French and Brazilian Presidential Elections
Authors:
Carlos Navarrete,
Mariana Macedo,
Rachael Colley,
Jingling Zhang,
Nicole Ferrada,
Maria Eduarda Mello,
Rodrigo Lira,
Carmelo Bastos-Filho,
Umberto Grandi,
Jerome Lang,
César A. Hidalgo
Abstract:
Digital technologies can augment civic participation by facilitating the expression of detailed political preferences. Yet, digital participation efforts often rely on methods optimized for elections involving a few candidates. Here we present data collected in an online experiment where participants built personalized government programs by combining policies proposed by the candidates of the 2022 French and Brazilian presidential elections. We use this data to explore aggregates complementing those used in social choice theory, finding that a metric of divisiveness, which is uncorrelated with traditional aggregation functions, can identify polarizing proposals. These metrics provide a score for the divisiveness of each proposal that can be estimated in the absence of data on the demographic characteristics of participants and that explains the issues that divide a population. These findings suggest divisiveness metrics can be useful complements to traditional aggregation functions in direct forms of digital participation.
Submitted 25 October, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Why people judge humans differently from machines: The role of perceived agency and experience
Authors:
Jingling Zhang,
Jane Conway,
César A. Hidalgo
Abstract:
People are known to judge artificial intelligence using a utilitarian moral philosophy and humans using a moral philosophy emphasizing perceived intentions. But why do people judge humans and machines differently? Psychology suggests that people may have different mind perception models of humans and machines, and thus, will treat human-like robots more similarly to the way they treat humans. Here we present a randomized experiment where we manipulated people's perception of machine agency (e.g., ability to plan, act) and experience (e.g., ability to feel) to explore whether people judge machines that are perceived to be more similar to humans along these two dimensions more similarly to the way they judge humans. We find that people's judgments of machines become more similar to those of humans when they perceive machines as having more agency but not more experience. Our findings indicate that people's use of different moral philosophies to judge humans and machines can be explained by a progression of mind perception models where the perception of agency plays a prominent role. These findings add to the body of evidence suggesting that people's judgment of machines becomes more similar to that of humans, motivating further work on dimensions modulating people's judgment of human and machine actions.
Submitted 19 September, 2023; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Multidimensional Economic Complexity and Inclusive Green Growth
Authors:
Viktor Stojkoski,
Philipp Koch,
César A. Hidalgo
Abstract:
To achieve inclusive green growth, countries need to consider a multiplicity of economic, social, and environmental factors. These are often captured by metrics of economic complexity derived from the geography of trade, thus missing key information on innovative activities. To bridge this gap, we combine trade data with data on patent applications and research publications to build models that significantly and robustly improve the ability of economic complexity metrics to explain international variations in inclusive green growth. We show that measures of complexity built on trade and patent data combine to explain future economic growth and income inequality and that countries that score high in all three metrics tend to exhibit lower emission intensities. These findings illustrate how the geography of trade, technology, and research combine to explain inclusive green growth.
Submitted 21 April, 2023; v1 submitted 17 September, 2022;
originally announced September 2022.
-
The Policy Implications of Economic Complexity
Authors:
César A. Hidalgo
Abstract:
In recent years economic complexity has grown into an active field of fundamental and applied research. Yet, despite important advances, the policy implications of economic complexity remain unclear or misunderstood. Here I organize the policy implications of economic complexity in a framework grounded on 4 Ws: what approaches, focused on identifying target activities and/or locations; when approaches, focused on timing support for related and unrelated activities; where approaches, focused on the geographic diffusion of knowledge; and who approaches, focused on the role played by agents of structural change. The goal of this paper is to provide a framework that groups, organizes, and clarifies the policy implications of economic complexity to facilitate its continued use in regional and international development.
Submitted 7 August, 2023; v1 submitted 4 May, 2022;
originally announced May 2022.
-
Assessing dengue fever risk in Costa Rica by using climate variables and machine learning techniques
Authors:
Luis A. Barboza,
Shu-Wei Chou,
Paola Vásquez,
Yury E. García,
Juan G. Calvo,
Hugo C. Hidalgo,
Fabio Sanchez
Abstract:
Dengue fever is a vector-borne disease mostly endemic to tropical and subtropical countries that affects millions every year and is considered a significant burden for public health. Its geographic distribution makes it highly sensitive to climate conditions. Here, we explore the effect of climate variables using the Generalized Additive Model for location, scale, and shape (GAMLSS) and Random Forest (RF) machine learning algorithms. Using the reported number of dengue cases, we obtained reliable predictions. The uncertainty of the predictions was also measured. These predictions will serve as input to health officials to further improve and optimize the allocation of resources prior to dengue outbreaks.
Submitted 23 March, 2022;
originally announced April 2022.
-
Strategic reciprocity improves academic performance in public elementary school children
Authors:
Cristian Candia,
Víctor Landaeta-Torres,
César A. Hidalgo,
Carlos Rodriguez-Sickert
Abstract:
Social networks are pivotal for learning. Yet, we still lack a full understanding of the mechanisms connecting networks with learning outcomes. Here, we present the results of a large-scale study (946 elementary school children from 45 different classrooms) designed to understand the social strategies used by elementary school children. We mapped the social networks of students using both a non-anonymous version of a prisoner's dilemma and a survey of nominated friendships, and compared the strategies played by students with their GPAs. We found that higher-GPA students invest more strategically in their relationships, cooperating more generously with friends and less generously with non-friends than lower-GPA students. Our findings suggest that the higher selectivity of social capital investments by high-performing students may be one of the mechanisms helping them reap the learning benefits of their social networks.
Submitted 29 September, 2019; v1 submitted 25 September, 2019;
originally announced September 2019.
-
Sherlock: A Deep Learning Approach to Semantic Data Type Detection
Authors:
Madelon Hulsebos,
Kevin Hu,
Michiel Bakker,
Emanuel Zgraggen,
Arvind Satyanarayan,
Tim Kraska,
Çağatay Demiralp,
César Hidalgo
Abstract:
Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery. Existing data preparation and analysis systems rely on dictionary lookups and regular expression matching to detect semantic types. However, these matching-based approaches often are not robust to dirty data and only detect a limited number of types. We introduce Sherlock, a multi-input deep neural network for detecting semantic types. We train Sherlock on 686,765 data columns retrieved from the VizNet corpus by matching 78 semantic types from DBpedia to column headers. We characterize each matched column with 1,588 features describing the statistical properties, character distributions, word embeddings, and paragraph vectors of column values. Sherlock achieves a support-weighted F1 score of 0.89, exceeding that of machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations.
Submitted 25 May, 2019;
originally announced May 2019.
-
VizNet: Towards A Large-Scale Visualization Learning and Benchmarking Repository
Authors:
Kevin Hu,
Neil Gaikwad,
Michiel Bakker,
Madelon Hulsebos,
Emanuel Zgraggen,
César Hidalgo,
Tim Kraska,
Guoliang Li,
Arvind Satyanarayan,
Çağatay Demiralp
Abstract:
Researchers currently rely on ad hoc datasets to train automated visualization tools and evaluate the effectiveness of visualization designs. These exemplars often lack the characteristics of real-world datasets, and their one-off nature makes it difficult to compare different techniques. In this paper, we present VizNet: a large-scale corpus of over 31 million datasets compiled from open data repositories and online visualization galleries. On average, these datasets comprise 17 records over 3 dimensions, and across the corpus we find that 51% of the dimensions record categorical data, 44% quantitative, and only 5% temporal. VizNet provides the necessary common baseline for comparing visualization design techniques, and developing benchmark models and algorithms for automating visual analysis. To demonstrate VizNet's utility as a platform for conducting online crowdsourced experiments at scale, we replicate a prior study assessing the influence of user task and data distribution on visual encoding effectiveness, and extend it by considering an additional task: outlier detection. To contend with running such studies at scale, we demonstrate how a metric of perceptual effectiveness can be learned from experimental results, and show its predictive power across test datasets.
Submitted 11 May, 2019;
originally announced May 2019.
-
Computational Aspects of Optimal Strategic Network Diffusion
Authors:
Marcin Waniek,
Khaled Elbassioni,
Flavio L. Pinheiro,
Cesar A. Hidalgo,
Aamena Alshamsi
Abstract:
Diffusion on complex networks is often modeled as a stochastic process. Yet, recent work on strategic diffusion emphasizes the decision power of agents and treats diffusion as a strategic problem. Here we study the computational aspects of strategic diffusion, i.e., finding the optimal sequence of nodes to activate a network in the minimum time. We prove that finding an optimal solution to this problem is NP-complete in the general case. To overcome this computational difficulty, we present an algorithm to compute an optimal solution based on a dynamic programming technique. We also show that the problem is fixed-parameter tractable when parametrized by the product of the treewidth and maximum degree. We analyze the possibility of developing an efficient approximation algorithm and show that two heuristic algorithms proposed so far cannot have better than a logarithmic approximation guarantee. Finally, we prove that the problem does not admit better than a logarithmic approximation, unless P=NP.
Submitted 30 January, 2020; v1 submitted 10 September, 2018;
originally announced September 2018.
-
VizML: A Machine Learning Approach to Visualization Recommendation
Authors:
Kevin Z. Hu,
Michiel A. Bakker,
Stephen Li,
Tim Kraska,
César A. Hidalgo
Abstract:
Data visualization should be accessible for all analysts with data, not just the few with technical expertise. Visualization recommender systems aim to lower the barrier to exploring basic visualizations by automatically generating results for analysts to search and select, rather than manually specify. Here, we demonstrate a novel machine learning-based approach to visualization recommendation that learns visualization design choices from a large corpus of datasets and associated visualizations. First, we identify five key design choices made by analysts while creating visualizations, such as selecting a visualization type and choosing to encode a column along the X- or Y-axis. We train models to predict these design choices using one million dataset-visualization pairs collected from a popular online visualization platform. Neural networks predict these design choices with high accuracy compared to baseline models. We report and interpret feature importances from one of these baseline models. To evaluate the generalizability and uncertainty of our approach, we benchmark with a crowdsourced test set, and show that the performance of our model is comparable to human performance when predicting consensus visualization type, and exceeds that of other ML-based systems.
Submitted 14 August, 2018;
originally announced August 2018.
-
Complex Economic Activities Concentrate in Large Cities
Authors:
Pierre-Alexandre Balland,
Cristian Jara-Figueroa,
Sergio Petralia,
Mathieu Steijn,
David Rigby,
Cesar A. Hidalgo
Abstract:
Why do some economic activities agglomerate more than others? And, why does the agglomeration of some economic activities continue to increase despite recent developments in communication and transportation technologies? In this paper, we present evidence that complex economic activities concentrate more in large cities. We find this to be true for technologies, scientific publications, industries, and occupations. Using historical patent data, we show that the urban concentration of complex economic activities has been continuously increasing since 1850. These findings suggest that the increasing urban concentration of jobs and innovation might be a consequence of the growing complexity of the economy.
Submitted 20 July, 2018;
originally announced July 2018.
-
Optimal diversification strategies in the networks of related products and of related research areas
Authors:
Aamena Alshamsi,
Flavio L. Pinheiro,
Cesar A. Hidalgo
Abstract:
Countries and cities are likely to enter economic activities that are related to those that are already present in them. Yet, while these path dependencies are universally acknowledged, we lack an understanding of the diversification strategies that can optimally balance the development of related and unrelated activities. Here, we develop algorithms to identify the activities that are optimal to target at each time step. We find that the strategies that minimize the total time needed to diversify an economy target highly connected activities during a narrow and specific time window. We compare the strategies suggested by our model with the strategies followed by countries in the diversification of their exports and research activities, finding that countries follow strategies that are close to the ones suggested by the model. These findings add to our understanding of economic diversification and also to our general understanding of diffusion in networks.
Submitted 9 March, 2018; v1 submitted 29 April, 2017;
originally announced May 2017.
-
Deep Learning the City: Quantifying Urban Perception At A Global Scale
Authors:
Abhimanyu Dubey,
Nikhil Naik,
Devi Parikh,
Ramesh Raskar,
César A. Hidalgo
Abstract:
Computer vision methods that quantify the perception of the urban environment are increasingly being used to study the relationship between a city's physical appearance and the behavior and health of its residents. Yet, the throughput of current methods is too limited to quantify the perception of cities across the world. To tackle this challenge, we introduce a new crowdsourced dataset containing 110,988 images from 56 cities, and 1,170,000 pairwise comparisons provided by 81,630 online volunteers along six perceptual attributes: safe, lively, boring, wealthy, depressing, and beautiful. Using this data, we train a Siamese-like convolutional neural architecture, which learns from a joint classification and ranking loss, to predict human judgments of pairwise image comparisons. Our results show that crowdsourcing combined with neural networks can produce urban perception data at the global scale.
Submitted 12 September, 2016; v1 submitted 5 August, 2016;
originally announced August 2016.
-
Are Safer Looking Neighborhoods More Lively? A Multimodal Investigation into Urban Life
Authors:
Marco De Nadai,
Radu L. Vieriu,
Gloria Zen,
Stefan Dragicevic,
Nikhil Naik,
Michele Caraviello,
Cesar A. Hidalgo,
Nicu Sebe,
Bruno Lepri
Abstract:
Policy makers, urban planners, architects, sociologists, and economists are interested in creating urban areas that are both lively and safe. But are the safety and liveliness of neighborhoods independent characteristics? Or are they just two sides of the same coin? In a world where people avoid unsafe-looking places, neighborhoods that look unsafe will be less lively, and will fail to harness the natural surveillance of human activity. But in a world where the preference for safe-looking neighborhoods is small, the connection between the perception of safety and liveliness will be either weak or nonexistent. In this paper we explore the connection between the levels of activity and the perception of safety of neighborhoods in two major Italian cities by combining mobile phone data (as a proxy for activity or liveliness) with scores of perceived safety estimated using a Convolutional Neural Network trained on a dataset of Google Street View images scored using a crowdsourced visual perception survey. We find that: (i) safer-looking neighborhoods are more active than what is expected from their population density, employee density, and distance to the city centre; and (ii) the correlation between appearance of safety and activity is positive, strong, and significant for females and people over 50, but negative for people under 30, suggesting that the behavioral impact of perception depends on the demographic of the population. Finally, we use occlusion techniques to identify the urban features that contribute to the appearance of safety, finding that greenery and street-facing windows contribute to a positive appearance of safety (in agreement with Oscar Newman's defensible space theory). These results suggest that urban appearance modulates levels of human activity and, consequently, a neighborhood's rate of natural surveillance.
Submitted 1 August, 2016;
originally announced August 2016.
-
The Research Space: using the career paths of scholars to predict the evolution of the research output of individuals, institutions, and nations
Authors:
Miguel R. Guevara,
Dominik Hartmann,
Manuel Aristarán,
Marcelo Mendoza,
César A. Hidalgo
Abstract:
In recent years scholars have built maps of science by connecting the academic fields that cite each other, are cited together, or that cite a similar literature. But since scholars cannot always publish in the fields they cite, or that cite them, these science maps are only rough proxies for the potential of a scholar, organization, or country, to enter a new academic field. Here we use a large dataset of scholarly publications disambiguated at the individual level to create a map of science (or research space) where links connect pairs of fields based on the probability that an individual has published in both of them. We find that the research space is a significantly more accurate predictor of the fields that individuals and organizations will enter in the future than citation-based science maps. At the country level, however, the research space and citation-based science maps are equally accurate. These findings show that data on career trajectories (the set of fields that individuals have previously published in) provide more accurate predictors of future research output for more focalized units, such as individuals or organizations, than citation-based science maps.
Submitted 14 April, 2016; v1 submitted 26 February, 2016;
originally announced February 2016.
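The link definition in the abstract above (fields connected by the probability that one individual has published in both) can be illustrated with a small sketch. This is not the authors' method: the function name `co_publication_links`, the toy author data, and the choice to estimate the probability as the plain share of all authors who published in both fields are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def co_publication_links(author_fields):
    """Return {(field_a, field_b): share of authors who published in both}.

    Assumption: the link weight is the simple joint share over all authors,
    which may differ from the normalization used in the paper.
    """
    n_authors = len(author_fields)
    pair_counts = Counter()
    for fields in author_fields.values():
        # Sort so each unordered pair of fields maps to a single key.
        for a, b in combinations(sorted(set(fields)), 2):
            pair_counts[(a, b)] += 1
    return {pair: count / n_authors for pair, count in pair_counts.items()}


# Hypothetical toy data: author id -> set of fields published in.
authors = {
    "author_1": {"physics", "economics"},
    "author_2": {"physics", "economics", "sociology"},
    "author_3": {"sociology"},
    "author_4": {"physics"},
}
links = co_publication_links(authors)
```

In this toy example the pair (economics, physics) gets the strongest link because two of the four authors published in both fields.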
-
How the medium shapes the message: Printing and the rise of the arts and sciences
Authors:
C. Jara-Figueroa,
Amy Z. Yu,
Cesar A. Hidalgo
Abstract:
Communication technologies, from printing to social media, affect our historical records by changing the way ideas are spread and recorded. Yet, finding statistical instruments to address the endogeneity of this relationship has been problematic. Here we use a city's distance to Mainz as an instrument for the introduction of the printing press in European cities, together with data on nearly 50 thousand biographies, to show that cities that adopted printing earlier were more likely to be the birthplace of a famous scientist or artist in the years after the introduction of printing. At the global scale, we find that the introduction of printing is associated with a significant and discontinuous increase in the number of biographies available from people born after the introduction of printing. We bring these findings to more recent communication technologies by showing that the number of radios and televisions in a country correlates with the number of performing artists and sports players from that country that reached global fame, even after controlling for GDP, population, and including country and year fixed effects. These findings support the hypothesis that the introduction of communication technologies shifts historical records in the direction of the content that is best suited for each technology.
Submitted 9 August, 2017; v1 submitted 15 December, 2015;
originally announced December 2015.
-
Disconnected, fragmented, or united? A trans-disciplinary review of network science
Authors:
Cesar A. Hidalgo
Abstract:
For decades the study of networks has been divided between the efforts of social scientists and natural scientists, two groups of scholars who often do not see eye to eye. In this review I present an effort to mutually translate the work conducted by scholars from both of these academic fronts, hoping to continue to unify what has become a diverging body of literature. I argue that social and natural scientists fail to see eye to eye because they have diverging academic goals. Social scientists focus on explaining how context specific social and economic mechanisms drive the structure of networks and on how networks shape social and economic outcomes. By contrast, natural scientists focus primarily on modeling network characteristics that are independent of context, since their focus is to identify universal characteristics of systems instead of context specific mechanisms. In the following pages I discuss the differences between both of these literatures by summarizing the parallel theories advanced to explain link formation and the applications used by scholars in each field to justify their approach to network science. I conclude by providing an outlook on how these literatures can be further unified.
Submitted 15 July, 2016; v1 submitted 12 November, 2015;
originally announced November 2015.
-
Pantheon 1.0, a manually verified dataset of globally famous biographies
Authors:
Amy Zhao Yu,
Shahar Ronen,
Kevin Hu,
Tiffany Lu,
César A. Hidalgo
Abstract:
We present the Pantheon 1.0 dataset: a manually verified dataset of individuals who have transcended linguistic, temporal, and geographic boundaries. The Pantheon 1.0 dataset includes the 11,341 biographies present in more than 25 languages in Wikipedia and is enriched with: (i) manually verified demographic information (place and date of birth, gender); (ii) a taxonomy of occupations classifying each biography at three levels of aggregation; and (iii) two measures of global popularity: the number of languages in which a biography is present in Wikipedia (L), and the Historical Popularity Index (HPI), a metric that combines information on L, time since birth, and page-views (2008-2013). We compare the Pantheon 1.0 dataset to data from the 2003 book Human Accomplishment, and also to external measures of accomplishment in individual games and sports: Tennis, Swimming, Car Racing, and Chess. In all of these cases we find that measures of popularity (L and HPI) correlate highly with individual accomplishment, suggesting that measures of global popularity proxy the historical impact of individuals.
Submitted 5 January, 2016; v1 submitted 25 February, 2015;
originally announced February 2015.
-
Beyond network structure: How heterogenous susceptibility modulates the spread of epidemics
Authors:
Daniel Smilkov,
Cesar A. Hidalgo,
Ljupco Kocarev
Abstract:
The compartmental models used to study epidemic spreading often assume the same susceptibility for all individuals, and are therefore agnostic about the effects that differences in susceptibility can have on epidemic spreading. Here we show that, for the SIS model, differential susceptibility can make networks more vulnerable to the spread of diseases when the correlation between a node's degree and susceptibility is positive, and less vulnerable when this correlation is negative. Moreover, we show that networks become more likely to contain a pocket of infection when individuals are more likely to connect with others that have similar susceptibility (the network is segregated). These results show that the failure to include differential susceptibility in epidemic models can lead to a systematic over- or underestimation of fundamental epidemic parameters when the structure of the networks is not independent from the susceptibility of the nodes or when there are correlations between the susceptibility of connected individuals.
Submitted 10 March, 2014;
originally announced March 2014.
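To make the idea of node-level susceptibility in an SIS process concrete, here is a minimal discrete-time simulation sketch. It is not the model or code from the paper: the function `simulate_sis`, the update rule (each infected neighbor independently transmits with probability `beta * susceptibility[node]` per step), and the toy path network are all assumptions chosen for illustration.

```python
import random


def simulate_sis(neighbors, susceptibility, beta, mu, infected, steps, seed=0):
    """Run a synchronous discrete-time SIS process and return the infected set.

    neighbors: dict node -> list of adjacent nodes
    susceptibility: dict node -> multiplier in [0, 1] on the base rate beta
    beta: base per-contact infection probability per step
    mu: recovery probability per step
    """
    rng = random.Random(seed)
    infected = set(infected)
    for _ in range(steps):
        next_infected = set()
        for node in neighbors:
            if node in infected:
                # An infected node stays infected unless it recovers.
                if rng.random() >= mu:
                    next_infected.add(node)
            else:
                k = sum(1 for nb in neighbors[node] if nb in infected)
                # Probability that at least one infected neighbor transmits.
                p = 1 - (1 - beta * susceptibility[node]) ** k
                if rng.random() < p:
                    next_infected.add(node)
        infected = next_infected
    return infected


# Hypothetical toy network: a path 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
fully_susceptible = {0: 1.0, 1: 1.0, 2: 1.0}
immune = {0: 0.0, 1: 0.0, 2: 0.0}

spread = simulate_sis(path, fully_susceptible, beta=1.0, mu=0.0, infected={0}, steps=2)
died_out = simulate_sis(path, immune, beta=1.0, mu=1.0, infected={0}, steps=1)
```

Assigning higher susceptibility to high-degree nodes, or to clusters of connected nodes, is one way to explore the degree correlation and segregation effects the abstract describes.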
-
Understanding the spreading patterns of mobile phone viruses
Authors:
P. Wang,
M. Gonzalez,
C. A. Hidalgo,
A. -L. Barabasi
Abstract:
We model the mobility of mobile phone users to study the fundamental spreading patterns characterizing a mobile virus outbreak. We find that while Bluetooth viruses can reach all susceptible handsets with time, they spread slowly due to human mobility, offering ample opportunities to deploy antiviral software. In contrast, viruses utilizing multimedia messaging services could infect all users in hours, but currently a phase transition on the underlying call graph limits them to only a small fraction of the susceptible users. These results explain the lack of a major mobile virus breakout so far and predict that once a mobile operating system's market share reaches the phase transition point, viruses will pose a serious threat to mobile communications.
Submitted 24 June, 2009;
originally announced June 2009.
-
Understanding individual human mobility patterns
Authors:
M. C. Gonzalez,
C. A. Hidalgo,
A. -L. Barabasi
Abstract:
Despite their importance for urban planning, traffic forecasting, and the spread of biological and mobile viruses, our understanding of the basic laws governing human motion remains limited owing to the lack of tools to monitor the time-resolved location of individuals. Here we study the trajectory of 100,000 anonymized mobile phone users whose position is tracked for a six-month period. We find that, in contrast with the random trajectories predicted by the prevailing Lévy flight and random walk models, human trajectories show a high degree of temporal and spatial regularity, each individual being characterized by a time-independent characteristic length scale and a significant probability to return to a few highly frequented locations. After correcting for differences in travel distances and the inherent anisotropy of each trajectory, the individual travel patterns collapse into a single spatial probability distribution, indicating that despite the diversity of their travel history, humans follow simple reproducible patterns. This inherent similarity in travel patterns could impact all phenomena driven by human mobility, from epidemic prevention to emergency response, urban planning and agent based modeling.
Submitted 6 June, 2008;
originally announced June 2008.