-
Metrics and Mechanisms: Measuring the Unmeasurable in the Science of Science
Authors:
Lingfei Wu,
Aniket Kittur,
Hyejin Youn,
Staša Milojević,
Erin Leahey,
Stephen M. Fiore,
Yong Yeol Ahn
Abstract:
What does science do, what could science do, and how can we make science work? To answer these questions, we need to uncover the mechanisms of science, going beyond metrics that are easily collectible and quantifiable. In this perspective piece, we link metrics to mechanisms by demonstrating how emerging metrics of science not only complement existing ones, but also shed light on the hidden structure and mechanisms of science. Based on fundamental properties of science, we classify existing theories and findings into: hot and cold science, referring to attention shifts between scientific fields; fast and slow science, reflecting the productivity of scientists and teams; and soft and hard science, revealing the reproducibility of scientific research. We suggest that the interest in mechanisms of science shown since Derek J. de Solla Price, Robert K. Merton, Eugene Garfield, and many others complements the zeitgeist of pursuing new, complex metrics without understanding the underlying processes. We propose that understanding and modeling the mechanisms of science is a precondition for the effective development and application of metrics.
Submitted 9 April, 2022; v1 submitted 14 November, 2021;
originally announced November 2021.
-
Unsupervised embedding of trajectories captures the latent structure of scientific migration
Authors:
Dakota Murray,
Jisung Yoon,
Sadamori Kojaku,
Rodrigo Costas,
Woo-Sung Jung,
Staša Milojević,
Yong-Yeol Ahn
Abstract:
Human migration and mobility drive major societal phenomena including epidemics, economies, innovation, and the diffusion of ideas. Although human mobility and migration have been heavily constrained by geographic distance throughout history, advances and globalization are making other factors such as language and culture increasingly important. Advances in neural embedding models, originally designed for natural language, provide an opportunity to tame this complexity and open new avenues for the study of migration. Here, we demonstrate the ability of the model word2vec to encode nuanced relationships between discrete locations from migration trajectories, producing an accurate, dense, continuous, and meaningful vector-space representation. The resulting representation provides a functional distance between locations, as well as a digital double that can be distributed, re-used, and itself interrogated to understand the many dimensions of migration. We show that the unique power of word2vec to encode migration patterns stems from its mathematical equivalence with the gravity model of mobility. Focusing on the case of scientific migration, we apply word2vec to a database of three million migration trajectories of scientists derived from the affiliations listed on their publication records. Using techniques that leverage its semantic structure, we demonstrate that embeddings can learn the rich structure that underpins scientific migration, such as cultural, linguistic, and prestige relationships at multiple levels of granularity. Our results provide a theoretical foundation and methodological framework for using neural embeddings to represent and understand migration both within and beyond science.
Submitted 17 November, 2023; v1 submitted 4 December, 2020;
originally announced December 2020.
-
Nature, Science, and PNAS -- Disciplinary profiles and impact
Authors:
Staša Milojević
Abstract:
Nature, Science, and PNAS are the three most prestigious general-science journals, and Nature and Science are among the most influential journals overall, based on the journal Impact Factor (IF). In this paper we perform automatic classification of ~50,000 articles in these journals (published in the period 2005-2015) into 14 broad areas, to explore disciplinary profiles and to determine their field-specific IFs. We find that in all three journals the articles from Bioscience, Astronomy, and Geosciences are over-represented, with other areas being under-represented, some of them severely. Discipline-specific IFs in these journals vary greatly, for example, between 18 and 46 for Nature. We find that the areas that have the highest disciplinary IFs are not the ones that contribute the most articles. We also find that publishing articles in these three journals brings prestige for articles in all areas, but at different levels, the least being for Astronomy. When the field-specific IFs of Nature, Science, and PNAS are compared to those of other top journals in the six largest areas (Bioscience, Medicine, Geosciences, Physics, Astronomy, and Chemistry), these three journals are always among the top seven journals, with Nature being at the very top for all fields except Medicine.
Submitted 11 August, 2020;
originally announced August 2020.
-
Towards a more realistic citation model: The key role of research team sizes
Authors:
Staša Milojević
Abstract:
We propose a new citation model which builds on the existing models that explicitly or implicitly include "direct" and "indirect" (learning about a cited paper's existence from references in another paper) citation mechanisms. Our model departs from the usual, unrealistic assumption of uniform probability of direct citation, in which initial differences in citation arise purely randomly. Instead, we demonstrate that a two-mechanism model in which the probability of direct citation is proportional to the number of authors on a paper (team size) is able to reproduce the empirical citation distributions of articles published in the field of astronomy remarkably well, and at different points in time. The interpretation of our model is that the intrinsic citation capacity, and hence the initial visibility of a paper, will be enhanced when more people are intimately familiar with some work, favoring papers from larger teams. While the intrinsic citation capacity cannot depend only on the team size, our model demonstrates that it must be to some degree correlated with it, and distributed in a similar way, i.e., having a power-law tail. Consequently, our team-size model qualitatively explains the existence of a correlation between the number of citations and the number of authors on a paper.
Submitted 11 August, 2020;
originally announced August 2020.
-
Practical method to reclassify Web of Science articles into unique subject categories and broad disciplines
Authors:
Staša Milojević
Abstract:
Classification of bibliographic items into subjects and disciplines in large databases is essential for many quantitative science studies. The Web of Science classification of journals into ~250 subject categories, which has served as a basis for many studies, is known to have some fundamental problems and several practical limitations that may affect the results from such studies. Here we present an easily reproducible method to perform reclassification of the Web of Science into existing subject categories and into 14 broad areas. Our reclassification is at a level of articles, so it preserves disciplinary differences that may exist among individual articles published in the same journal. Reclassification also eliminates ambiguous (multiple) categories that are found for 50% of items, and assigns a discipline/field category to all articles that come from broad-coverage journals such as Nature and Science. The correctness of the assigned subject categories is evaluated manually and is found to be ~95%.
Submitted 8 January, 2020;
originally announced January 2020.
-
Recency predicts bursts in the evolution of author citations
Authors:
Filipi Nascimento Silva,
Aditya Tandon,
Diego Raphael Amancio,
Alessandro Flammini,
Filippo Menczer,
Staša Milojević,
Santo Fortunato
Abstract:
The citation process for scientific papers has been studied extensively. But while the citations accrued by authors are the sum of the citations of their papers, translating the dynamics of citation accumulation from the paper to the author level is not trivial. Here we conduct a systematic study of the evolution of author citations, and in particular their bursty dynamics. We find empirical evidence of a correlation between the number of citations most recently accrued by an author and the number of citations they receive in the future. Using a simple model where the probability for an author to receive new citations depends only on the number of citations collected in the previous 12-24 months, we are able to reproduce both the citation and burst size distributions of authors across multiple decades.
Submitted 26 November, 2019;
originally announced November 2019.
-
Network dynamics of innovation processes
Authors:
Iacopo Iacopini,
Staša Milojević,
Vito Latora
Abstract:
We introduce a model for the emergence of innovations, in which cognitive processes are described as random walks on the network of links among ideas or concepts, and an innovation corresponds to the first visit of a node. The transition matrix of the random walk depends on the network weights, while in turn the weight of an edge is reinforced by the passage of a walker. The presence of the network naturally accounts for the mechanism of the adjacent possible, and the model reproduces both the rate at which novelties emerge and the correlations among them observed empirically. We show this by using synthetic networks and by studying real data sets on the growth of knowledge in different scientific disciplines. Edge-reinforced random walks on complex topologies offer a new modeling framework for the dynamics of correlated novelties and are another example of coevolution of processes and networks.
Submitted 24 January, 2018; v1 submitted 13 July, 2017;
originally announced July 2017.
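The edge-reinforced random walk described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `edge_reinforced_walk`, the initial weight of 1.0 per edge, and the additive `reinforcement` increment are all hypothetical choices. The walker moves to a neighbor with probability proportional to the edge weight, the traversed edge is reinforced, and a "novelty" is counted at the first visit of a node.

```python
import random


def edge_reinforced_walk(adjacency, start, steps, reinforcement=1.0, seed=0):
    """Simulate an edge-reinforced random walk on an undirected graph.

    adjacency: dict mapping each node to a list of its neighbors (symmetric).
    Returns (novelties, weights), where novelties[t] is the number of
    distinct nodes visited after t steps.
    """
    rng = random.Random(seed)
    # Assumption: every edge starts with the same weight of 1.0.
    weights = {
        frozenset((u, v)): 1.0 for u in adjacency for v in adjacency[u]
    }
    current = start
    visited = {start}
    novelties = [1]
    for _ in range(steps):
        neighbors = adjacency[current]
        edge_weights = [weights[frozenset((current, n))] for n in neighbors]
        # Transition probability is proportional to the current edge weight.
        nxt = rng.choices(neighbors, weights=edge_weights, k=1)[0]
        # Assumption: reinforcement adds a constant to the traversed edge.
        weights[frozenset((current, nxt))] += reinforcement
        current = nxt
        visited.add(current)  # a first visit corresponds to an innovation
        novelties.append(len(visited))
    return novelties, weights


if __name__ == "__main__":
    # Toy ring network of six concepts.
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    novelties, _ = edge_reinforced_walk(ring, start=0, steps=50)
    print(novelties[-1], "distinct concepts visited")
```

Plotting `novelties` against the step index gives the growth curve of novelties for the chosen network; how closely it resembles empirical curves depends on the topology and reinforcement, which are free parameters in this sketch.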
-
Citation success index - An intuitive pair-wise journal comparison metric
Authors:
Staša Milojević,
Filippo Radicchi,
Judit Bar-Ilan
Abstract:
In this paper we present the "citation success index", a metric for comparing the citation capacity of pairs of journals. The citation success index is the probability that a random paper in one journal has more citations than a random paper in another journal (50% means the two journals do equally well). Unlike the journal impact factor (IF), the citation success index depends on the broadness and the shape of citation distributions. Also, it is insensitive to sporadic highly-cited papers that skew the IF. Nevertheless, we show, based on 16,000 journals containing ~2.4 million articles, that the citation success index is a relatively tight function of the ratio of IFs of journals being compared, due to the fact that journals with the same IF have quite similar citation distributions. The citation success index grows slowly as a function of IF ratio. It is substantial (>90%) only when the ratio of IFs exceeds ~6, whereas a factor of two difference in IF values translates into a modest advantage for the journal with higher IF (index of ~70%). We facilitate the wider adoption of this metric by providing an online calculator that takes as input parameters only the IFs of the pair of journals.
Submitted 21 December, 2016; v1 submitted 11 July, 2016;
originally announced July 2016.
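The definition of the citation success index given in the abstract above (the probability that a random paper in one journal has more citations than a random paper in another) can be computed directly from two lists of per-paper citation counts. The sketch below is an illustration, not the authors' calculator. The function name `citation_success_index` is hypothetical, and counting ties as half a win is an assumption made here so that two identical journals score exactly 50%; the abstract does not say how ties are treated.

```python
from bisect import bisect_left, bisect_right


def citation_success_index(cites_a, cites_b):
    """Probability that a random paper from journal A out-cites one from B.

    cites_a, cites_b: non-empty lists of per-paper citation counts.
    Assumption: a tie counts as half a win, so identical journals give 0.5.
    """
    if not cites_a or not cites_b:
        raise ValueError("both journals need at least one paper")
    sorted_b = sorted(cites_b)
    wins = 0.0
    for c in cites_a:
        below = bisect_left(sorted_b, c)          # papers in B with fewer citations
        ties = bisect_right(sorted_b, c) - below  # papers in B with equal citations
        wins += below + 0.5 * ties
    return wins / (len(cites_a) * len(cites_b))


if __name__ == "__main__":
    journal_a = [0, 2, 5, 9, 40]
    journal_b = [0, 1, 1, 3, 4]
    print(f"{citation_success_index(journal_a, journal_b):.0%}")
```

Because only the rank order of citation counts matters, a single extremely cited paper shifts this index far less than it shifts a mean-based measure such as the IF.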
-
Citations: Indicators of Quality? The Impact Fallacy
Authors:
Loet Leydesdorff,
Lutz Bornmann,
Jordan Comins,
Staša Milojević
Abstract:
We argue that citation is a composed indicator: short-term citations can be considered as currency at the research front, whereas long-term citations can contribute to the codification of knowledge claims into concept symbols. Knowledge claims at the research front are more likely to be transitory and are therefore problematic as indicators of quality. Citation impact studies focus on short-term citation, and therefore tend to measure not epistemic quality, but involvement in current discourses in which contributions are positioned by referencing. We explore this argument using three case studies: (1) citations of the journal Soziale Welt as an example of a venue that tends not to publish papers at a research front, unlike, for example, JACS; (2) Robert Merton as a concept symbol across theories of citation; and (3) the Multi-RPYS ("Multi-Referenced Publication Year Spectroscopy") of the journals Scientometrics, Gene, and Soziale Welt. We show empirically that the measurement of "quality" in terms of citations can further be qualified: short-term citation currency at the research front can be distinguished from longer-term processes of incorporation and codification of knowledge claims into bodies of knowledge. The recently introduced Multi-RPYS can be used to distinguish between short-term and long-term impacts.
Submitted 21 July, 2016; v1 submitted 28 March, 2016;
originally announced March 2016.
-
Quantifying the Cognitive Extent of Science
Authors:
Staša Milojević
Abstract:
While modern science is characterized by an exponential growth in scientific literature, the increase in publication volume clearly does not reflect the expansion of the cognitive boundaries of science. Nevertheless, most of the metrics for assessing the vitality of science or for making funding and policy decisions are based on productivity. Similarly, the increasing level of knowledge production by large science teams, whose results often enjoy greater visibility, does not necessarily mean that "big science" leads to cognitive expansion. Here we present a novel, big-data method to quantify the extents of cognitive domains of different bodies of scientific literature independently from publication volume, and apply it to 20 million articles published over 60-130 years in physics, astronomy, and biomedicine. The method is based on the lexical diversity of titles of fixed quotas of research articles. Owing to the large size of the quotas, the method overcomes the inherent stochasticity of article titles to achieve <1% precision. We show that the periods of cognitive growth do not necessarily coincide with the trends in publication volume. Furthermore, we show that the articles produced by larger teams cover significantly smaller cognitive territory than (the same quota of) articles from smaller teams. Our findings provide a new perspective on the role of small teams and individual researchers in expanding the cognitive boundaries of science. The proposed method of quantifying the extent of the cognitive territory can also be applied to study many other aspects of "science of science."
Submitted 3 November, 2015; v1 submitted 30 October, 2015;
originally announced November 2015.
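The idea in the abstract above of measuring lexical diversity over a fixed quota of titles can be illustrated with a very rough sketch. The abstract does not specify the unit of diversity or the text processing, so everything here is an assumption: the function name `cognitive_extent` is hypothetical, diversity is approximated as the number of distinct lowercase word tokens, and the quota is simply the first `quota` titles supplied. The only point taken from the source is that the quota is held fixed, so that the measure does not grow merely because more articles were published.

```python
import re


def cognitive_extent(titles, quota):
    """Count distinct word tokens in a fixed quota of article titles.

    Assumption: lexical diversity is approximated by unique lowercase
    alphanumeric tokens; the source does not specify the actual unit.
    """
    if len(titles) < quota:
        raise ValueError("not enough titles to fill the quota")
    vocabulary = set()
    for title in titles[:quota]:
        vocabulary.update(re.findall(r"[a-z0-9]+", title.lower()))
    return len(vocabulary)


if __name__ == "__main__":
    titles = [
        "Dark matter halos",
        "Dark energy surveys",
        "Galaxy halos",
    ]
    # Compare bodies of literature using the same quota for each.
    print(cognitive_extent(titles, quota=2))
```

Comparing two bodies of literature with this sketch is only meaningful when both are evaluated at the same `quota`.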
-
The role of handbooks in knowledge creation and diffusion: A case of science and technology studies
Authors:
Staša Milojević,
Cassidy R. Sugimoto,
Vincent Larivière,
Mike Thelwall,
Ying Ding
Abstract:
Genre is considered to be an important element in scholarly communication and in the practice of scientific disciplines. However, scientometric studies have typically focused on a single genre, the journal article. The goal of this study is to understand the role that handbooks play in knowledge creation and diffusion and their relationship with the genre of journal articles, particularly in highly interdisciplinary and emergent social science and humanities disciplines. To shed light on these questions we focused on handbooks and journal articles published over the last four decades belonging to the research area of Science and Technology Studies (STS), broadly defined. To get a detailed picture we used the full-text of five handbooks (500,000 words) and a well-defined set of 11,700 STS articles. We confirmed the methodological split of STS into qualitative and quantitative (scientometric) approaches. Even when the two traditions explore similar topics (e.g., science and gender) they approach them from different starting points. The change in cognitive foci in both handbooks and articles partially reflects the changing trends in STS research, often driven by technology. Using text similarity measures we found that, in the case of STS, handbooks play no special role in either focusing the research efforts or marking their decline. In general, they do not represent the summaries of research directions that have emerged since the previous edition of the handbook.
Submitted 11 June, 2014;
originally announced June 2014.
-
Principles of scientific research team formation and evolution
Authors:
Staša Milojević
Abstract:
Research teams are the fundamental social unit of science, and yet there is currently no model that describes their basic property: size. In most fields teams have grown significantly in recent decades. We show that this is partly due to the change in the character of team-size distribution. We explain these changes with a comprehensive yet straightforward model of how teams of different sizes emerge and grow. This model accurately reproduces the evolution of empirical team-size distribution over the period of 50 years. The modeling reveals that there are two modes of knowledge production. The first and more fundamental mode employs relatively small, core teams. Core teams form by a Poisson process and produce a Poisson distribution of team sizes in which larger teams are exceedingly rare. The second mode employs extended teams, which started as core teams, but subsequently accumulated new members proportional to the past productivity of their members. Given time, this mode gives rise to a power-law tail of large teams (10-1000 members), which features in many fields today. Based on this model we construct an analytical functional form that allows the contribution of different modes of authorship to be determined directly from the data and is applicable to any field. The model also offers a solid foundation for studying other social aspects of science, such as productivity and collaboration.
Submitted 11 March, 2014;
originally announced March 2014.
-
Referenced Publication Years Spectroscopy applied to iMetrics: Scientometrics, Journal of Informetrics, and a relevant subset of JASIST
Authors:
Loet Leydesdorff,
Lutz Bornmann,
Werner Marx,
Staša Milojević
Abstract:
We have developed a (freeware) routine for "referenced publication years spectroscopy" (RPYS) and apply this method to the historiography of "iMetrics," that is, the junction of the journals Scientometrics, Informetrics, and the relevant subset of JASIST (approx. 20%) that shapes the intellectual space for the development of information metrics (bibliometrics, scientometrics, informetrics, and webometrics). The application to information metrics (our own field of research) provides us with the opportunity to validate this methodology, and to add a reflection about using citations for the historical reconstruction. The results show that the field is rooted in individual contributions of the 1920s-1950s (e.g., Alfred J. Lotka), and was then shaped intellectually in the early 1960s by a confluence of the history of science (Derek de Solla Price), documentation (e.g., Michael M. Kessler's "bibliographic coupling"), and "citation indexing" (Eugene Garfield). Institutional development at the interfaces between science studies and information science has been reinforced by the new journal Informetrics since 2007. In a concluding reflection, we return to the question of how the historiography of science using algorithmic means--in terms of citation practices--can be different from an intellectual history of the field based, for example, on reading source materials.
Submitted 21 November, 2013; v1 submitted 23 September, 2013;
originally announced September 2013.
-
Accuracy of simple, initials-based methods for author name disambiguation
Authors:
Staša Milojević
Abstract:
There are a number of solutions that perform unsupervised name disambiguation based on the similarity of bibliographic records or common co-authorship patterns. Whether the use of these advanced methods, which are often difficult to implement, is warranted depends on whether the accuracy of the most basic disambiguation methods, which only use the author's last name and initials, is sufficient for a particular purpose. We derive realistic estimates for the accuracy of simple, initials-based methods using simulated bibliographic datasets in which the true identities of authors are known. Based on the simulations in five diverse disciplines we find that the first initial method already correctly identifies 97% of authors. An alternative simple method, which takes all initials into account, is typically two times less accurate, except in certain datasets that can be identified by applying a simple criterion. Finally, we introduce a new name-based method that combines the features of first initial and all initials methods by implicitly taking into account the last name frequency and the size of the dataset. This hybrid method reduces the fraction of incorrectly identified authors by 10-30% over the first initial method.
Submitted 3 August, 2013;
originally announced August 2013.
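The two simple methods named in the abstract above, first-initial and all-initials disambiguation, both reduce an author name to a key built from the last name and initials, and treat records with the same key as one author. A minimal sketch follows. The input format "Last, Given names", the function names `name_key` and `group_authors`, and the normalization rules are assumptions for illustration; the hybrid method from the paper is not reproduced here because the abstract does not give its details.

```python
from collections import defaultdict


def name_key(name, method="first"):
    """Build a disambiguation key from a name written as 'Last, Given names'.

    method="first": last name plus first initial only.
    method="all":   last name plus all initials.
    """
    if method not in ("first", "all"):
        raise ValueError("method must be 'first' or 'all'")
    last, _, given = name.partition(",")
    parts = given.replace(".", " ").replace("-", " ").split()
    initials = "".join(part[0].upper() for part in parts)
    if method == "first":
        initials = initials[:1]
    return (last.strip().lower(), initials)


def group_authors(names, method="first"):
    """Group name strings that the chosen method treats as one author."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name, method)].append(name)
    return dict(groups)


if __name__ == "__main__":
    records = ["Smith, John A.", "Smith, J.", "Smith, Jane B.", "Smith, J. A."]
    print(len(group_authors(records, "first")), "author(s) by first initial")
    print(len(group_authors(records, "all")), "author(s) by all initials")
```

The toy records show the trade-off: the first-initial key merges everything that shares a last name and first letter, while the all-initials key splits records of the same person whenever the middle initial is sometimes omitted.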
-
arXiv e-prints and the journal of record: An analysis of roles and relationships
Authors:
Vincent Lariviere,
Cassidy R. Sugimoto,
Benoit Macaluso,
Stasa Milojevic,
Blaise Cronin,
Mike Thelwall
Abstract:
Since its creation in 1991, arXiv has become central to the diffusion of research in a number of fields. Combining data from the entirety of arXiv and the Web of Science (WoS), this paper investigates (a) the proportion of papers across all disciplines that are on arXiv and the proportion of arXiv papers that are in the WoS, (b) elapsed time between arXiv submission and journal publication, and (c) the aging characteristics and scientific impact of arXiv e-prints and their published version. It shows that the proportion of WoS papers found on arXiv varies across the specialties of physics and mathematics, and that only a few specialties make extensive use of the repository. Elapsed time between arXiv submission and journal publication has shortened but remains longer in mathematics than in physics. In physics, mathematics, as well as in astronomy and astrophysics, arXiv versions are cited more promptly and decay faster than WoS papers. The arXiv versions of papers - both published and unpublished - have lower citation rates than published papers, although there is almost no difference in the impact of the arXiv versions of both published and unpublished papers.
Submitted 13 June, 2013;
originally announced June 2013.
-
Citation content analysis (CCA): A framework for syntactic and semantic analysis of citation content
Authors:
Guo Zhang,
Ying Ding,
Staša Milojević
Abstract:
This paper proposes a new framework for Citation Content Analysis (CCA), for syntactic and semantic analysis of citation content that can be used to better analyze the rich sociocultural context of research behavior. The framework could be considered the next generation of citation analysis. This paper briefly reviews the history and features of content analysis in traditional social sciences, and its previous application in Library and Information Science. Based on critical discussion of the theoretical necessity of a new method as well as the limits of citation analysis, the nature and purposes of CCA are discussed, and potential procedures to conduct CCA, including principles to identify the reference scope, a two-dimensional (citing and cited) and two-modular (syntactic and semantic modules) codebook, are provided and described. Future works and implications are also suggested.
Submitted 27 November, 2012;
originally announced November 2012.
-
How are academic age, productivity and collaboration related to citing behavior of researchers?
Authors:
Staša Milojević
Abstract:
References are an essential component of research articles and therefore of scientific communication. In this study we investigate referencing (citing) behavior in five diverse fields (astronomy, mathematics, robotics, ecology and economics) based on 213,756 core journal articles. At the macro level we find: (a) a steady increase in the number of references per article over the period studied (50 years), which in some fields is due to a higher rate of usage, while in others it reflects longer articles, and (b) an increase in all fields in the fraction of older, foundational references since the 1980s, with no obvious change in citing patterns associated with the introduction of the Internet. At the meso level we explore current (2006-2010) referencing behavior of different categories of authors (21,562 total) within each field, based on their academic age, productivity and collaborative practices. Contrary to some previous findings and expectations, we find that senior researchers use references at the same rate as their junior colleagues, with similar rates of re-citation (use of the same references in multiple papers). The high Modified Price Index (MPI, which measures the speed of the research front more accurately than the traditional Price Index) of senior authors indicates that their research has a similar cutting-edge aspect to that of their younger colleagues. In all fields both the productive researchers and especially those who collaborate more use a significantly lower fraction of foundational references and have much higher MPI and lower re-citation rates, i.e., they are the ones pushing the research front regardless of researcher age. This paper introduces improved bibliometric methods to measure the speed of the research front, disambiguate lead authors in co-authored papers and decouple measures of productivity and collaboration.
Submitted 21 November, 2012; v1 submitted 13 October, 2012;
originally announced October 2012.
-
Social Dynamics of Science
Authors:
Xiaoling Sun,
Jasleen Kaur,
Staša Milojević,
Alessandro Flammini,
Filippo Menczer
Abstract:
The birth and decline of disciplines are critical to science and society. However, no quantitative model to date allows us to validate competing theories of whether the emergence of scientific disciplines drives or follows the formation of social communities of scholars. Here we propose an agent-based model based on a social dynamics of science, in which the evolution of disciplines is guided mainly by the social interactions among scientists. We find that such a social theory can account for a number of stylized facts about the relationships between disciplines, authors, and publications. These results provide strong quantitative support for the key role of social interactions in shaping the dynamics of science. A "science of science" must gauge the role of exogenous events, such as scientific discoveries and technological advances, against this purely social baseline.
Submitted 21 September, 2012;
originally announced September 2012.
-
Information Metrics (iMetrics): A Research Specialty with a Socio-Cognitive Identity?
Authors:
Staša Milojević,
Loet Leydesdorff
Abstract:
"Bibliometrics", "scientometrics", "informetrics", and "webometrics" can all be considered as manifestations of a single research area with similar objectives and methods, which we call "information metrics" or iMetrics. This study explores the cognitive and social distinctness of iMetrics with respect to general information science (IS), focusing on a core of researchers, a shared vocabulary and a literature/knowledge base. Our analysis investigates the similarities and differences between four document sets. The document sets are drawn from three core journals for iMetrics research (Scientometrics, Journal of the American Society for Information Science and Technology, and Journal of Informetrics). We split JASIST into document sets containing iMetrics and general IS articles. The volume of publications in this representation of the specialty has increased rapidly during the last decade. A core of researchers who predominantly focus on iMetrics topics can thus be identified. This core group has developed a shared vocabulary, as exhibited in the high similarity of title words, and shares a knowledge base. The research front of this field moves faster than the research front of information science in general, bringing it closer to Price's dream.
Submitted 23 September, 2012; v1 submitted 15 September, 2012;
originally announced September 2012.
-
Scientometrics
Authors:
Loet Leydesdorff,
Staša Milojević
Abstract:
The paper provides an overview of the field of scientometrics, that is: the study of science, technology, and innovation from a quantitative perspective. We cover major historical milestones in the development of this specialism from the 1960s to today and discuss its relationship with the sociology of scientific knowledge, the library and information sciences, and science policy issues such as indicator development. The disciplinary organization of scientometrics is analyzed both conceptually and empirically, using a map of journals cited in the core journal of the field, entitled Scientometrics. A state-of-the-art review of five major research threads is provided: (1) the measurement of impact; (2) the delineation of reference sets; (3) theories of citation; (4) mapping science; and (5) the policy and management contexts of indicator developments.
Submitted 23 September, 2013; v1 submitted 22 August, 2012;
originally announced August 2012.
-
Multidisciplinary Cognitive Content of Nanoscience and Nanotechnology
Authors:
Staša Milojević
Abstract:
This article examines the cognitive evolution and disciplinary diversity of nanotechnology as expressed through the terminology used in titles of nano journal articles. The analysis is based on the NanoBank bibliographic database of 287,106 nano articles published between 1981 and 2004. We perform multifaceted analyses of title words, focusing on the 100 most frequent terms. Hierarchical clustering of title terms reveals three distinct time periods of cognitive development of nano research: formative (1981-1990), early (1991-1998), and current (after 1998). The early period is characterized by the introduction of thin film deposition techniques, while the current period is characterized by an increased focus on carbon nanotube and nanoparticle research. We introduce a method to identify disciplinary components of nanotechnology. It shows that nano research is being carried out in a number of diverse parent disciplines. Currently only 5% of articles are published in dedicated nano-only journals. We find that some 85% of nano research today is multidisciplinary. Hierarchical clustering of disciplinary components reveals that the cognitive content of current nanoscience can be divided into nine clusters. Some clusters account for a large fraction of nano research and are identified with such parent disciplines as condensed matter and applied physics, materials science, and analytical chemistry. Other clusters represent much smaller parts of nano research, but are just as cognitively distinct. In decreasing order of size, these fields are: polymer science, biotechnology, general chemistry, surface science, and pharmacology. The cognitive content of research published in nano-only journals is closest to nano research published in condensed matter and applied physics journals.
Submitted 31 March, 2012;
originally announced April 2012.
-
Power-law Distributions in Information Science - Making the Case for Logarithmic Binning
Authors:
Staša Milojević
Abstract:
We suggest partial logarithmic binning as the method of choice for uncovering the nature of many distributions encountered in information science (IS). Logarithmic binning retrieves information and trends "not visible" in noisy power-law tails. We also argue that obtaining the exponent from logarithmically binned data using a simple least squares method is in some cases warranted in addition to methods such as maximum likelihood. We also show why often-used cumulative distributions can make it difficult to distinguish noise from genuine features, and make it difficult to obtain an accurate power-law exponent of the underlying distribution. The treatment is non-technical, aimed at IS researchers with little or no background in mathematics.
Submitted 5 November, 2010;
originally announced November 2010.
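To make the idea in the abstract above concrete, here is a minimal Python sketch of logarithmic binning followed by a least squares fit of the exponent. It is an illustration under stated assumptions, not the paper's procedure: the function names (`log_binned_density`, `least_squares_slope`, `synthetic_power_law_sample`), the choice of five bins per decade, the use of the geometric mean of a bin's end points as its center, and the synthetic test data are all hypothetical. Bins narrower than one integer are simply skipped, so small values end up in unit-width bins, which is one possible reading of "partial" logarithmic binning.

```python
import math
from collections import Counter


def log_binned_density(values, bins_per_decade=5):
    """Bin positive integers into log-spaced bins; return (center, density) pairs.

    Density is the fraction of observations in a bin divided by the number
    of integers the bin covers, so bins of different widths are comparable.
    """
    counts = Counter(values)
    n = len(values)
    v_max = max(counts)
    ratio = 10 ** (1.0 / bins_per_decade)  # multiplicative bin width
    points = []
    lo = 1.0
    while lo <= v_max:
        hi = lo * ratio
        # Integers k with lo <= k < hi, clipped so the last bin stops at v_max.
        ks = range(math.ceil(lo), min(math.ceil(hi), v_max + 1))
        if len(ks) > 0:
            total = sum(counts.get(k, 0) for k in ks)
            if total > 0:
                center = math.sqrt(ks[0] * ks[-1])  # geometric mean of end points
                points.append((center, total / (n * len(ks))))
        lo = hi
    return points


def least_squares_slope(points):
    """Ordinary least squares slope of log10(density) against log10(center)."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def synthetic_power_law_sample(exponent=2.0, k_max=500, scale=1_000_000):
    """Hypothetical test data: value k appears about scale * k**(-exponent) times."""
    return [
        k
        for k in range(1, k_max + 1)
        for _ in range(round(scale * k ** -exponent))
    ]
```

Calling `least_squares_slope(log_binned_density(sample))` on the synthetic sample returns a slope close to the negative of the exponent used to generate it. On real, noisy data the fitted value depends on the bin width and on which bins are included, so it is best compared against a maximum likelihood estimate rather than used alone.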
-
Modes of Collaboration in Modern Science - Beyond Power Laws and Preferential Attachment
Authors:
Staša Milojević
Abstract:
The goal of the study is to determine the underlying processes leading to the observed collaborator distribution in modern scientific fields, with special attention to non-power law behavior. Nanoscience is used as a case study of a modern interdisciplinary field, and its coauthorship network for the 2000-04 period is constructed from the NanoBank database. We find three collaboration modes that correspond to three distinct ranges in the distribution of collaborators: (1) for authors with fewer than 20 collaborators (the majority) preferential attachment does not hold and they form a log-normal "hook" instead of a power law, (2) authors with more than 20 collaborators benefit from preferential attachment and form a power law tail, and (3) authors with between 250 and 800 collaborators are more frequent than expected because of the hyperauthorship practices in certain subfields.
Submitted 28 April, 2010;
originally announced April 2010.
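The collaborator distribution discussed in the abstract above can be computed from nothing more than per-paper author lists. The Python sketch below shows one way to do that. It is illustrative only: the function names and the toy input are hypothetical, author names are assumed to be already disambiguated, and nothing here reproduces the NanoBank data or the paper's own processing.

```python
from collections import Counter, defaultdict
from itertools import combinations


def collaborator_counts(papers):
    """Map each author to the number of distinct coauthors across all papers.

    `papers` is an iterable of author-name lists, one list per paper.
    """
    neighbors = defaultdict(set)
    for authors in papers:
        unique = set(authors)
        for author in unique:
            neighbors[author]  # make sure solo authors appear with zero collaborators
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {author: len(others) for author, others in neighbors.items()}


def collaborator_distribution(counts):
    """Map each collaborator count to the number of authors who have it."""
    return dict(sorted(Counter(counts.values()).items()))
```

For example, the hypothetical input `[["A", "B", "C"], ["A", "D"], ["E"]]` gives author A three collaborators, B and C two each, D one and E none. The resulting distribution is what would then be binned and inspected for the ranges named in the abstract.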