Large coverage fluctuations in Google Scholar: a case study

Authors: Alberto Martín-Martín, Emilio Delgado López-Cózar

Abstract: Unlike other academic bibliographic databases, Google Scholar intentionally operates in a way that does not maintain coverage stability: documents that stop being available to Google Scholar's crawlers are removed from the system. This can also affect Google Scholar's citation graph (citation counts can decrease). Furthermore, because Google Scholar is not transparent about its coverage, the only way to directly observe coverage loss is through regular monitorization of Google Scholar data. Because of this, few studies have empirically documented this phenomenon. This study analyses a large decrease in coverage of documents in the field of Astronomy and Astrophysics that took place in 2019 and its subsequent recovery, using longitudinal data from previous analyses and a new dataset extracted in 2020. Documents from most of the larger publishers in the field disappeared from Google Scholar despite continuing to be available on the Web, which suggests an error on Google Scholar's side. Disappeared documents did not reappear until the following index-wide update, many months after the problem was discovered. The slowness with which Google Scholar is currently able to resolve indexing errors is a clear limitation of the platform both for literature search and bibliometric use cases.

Submitted 15 February, 2021; originally announced February 2021.

arXiv:2004.14329 [pdf]

doi 10.1007/s11192-020-03690-4

Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations' COCI: a multidisciplinary comparison of coverage via citations

Authors: Alberto Martín-Martín, Mike Thelwall, Enrique Orduna-Malea, Emilio Delgado López-Cózar

Abstract: New sources of citation data have recently become available, such as Microsoft Academic, Dimensions, and the OpenCitations Index of CrossRef open DOI-to-DOI citations (COCI). Although these have been compared to the Web of Science (WoS), Scopus, or Google Scholar, there is no systematic evidence of their differences across subject categories. In response, this paper investigates 3,073,351 citations found by these six data sources to 2,515 English-language highly-cited documents published in 2006 from 252 subject categories, expanding and updating the largest previous study. Google Scholar found 88% of all citations, many of which were not found by the other sources, and nearly all citations found by the remaining sources (89%-94%). A similar pattern held within most subject categories. Microsoft Academic is the second largest overall (60% of all citations), including 82% of Scopus citations and 86% of Web of Science citations. In most categories, Microsoft Academic found more citations than Scopus and WoS (182 and 223 subject categories, respectively), but had coverage gaps in some areas, such as Physics and some Humanities categories. After Scopus, Dimensions is fourth largest (54% of all citations), including 84% of Scopus citations and 88% of WoS citations. It found more citations than Scopus in 36 categories, more than WoS in 185, and displays some coverage gaps, especially in the Humanities. Following WoS, COCI is the smallest, with 28% of all citations. Google Scholar is still the most comprehensive source. In many subject categories Microsoft Academic and Dimensions are good alternatives to Scopus and WoS in terms of coverage.

Submitted 30 January, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

Comments: 40 pages, 10 figures, 4 tables

Journal ref: 2021. Scientometrics, 126(1), 871-906

arXiv:1808.05053 [pdf]

doi 10.1016/j.joi.2018.09.002

Google Scholar, Web of Science, and Scopus: a systematic comparison of citations in 252 subject categories

Authors: Alberto Martín-Martín, Enrique Orduna-Malea, Mike Thelwall, Emilio Delgado López-Cózar

Abstract: Despite citation counts from Google Scholar (GS), Web of Science (WoS), and Scopus being widely consulted by researchers and sometimes used in research evaluations, there is no recent or systematic evidence about the differences between them. In response, this paper investigates 2,448,055 citations to 2,299 English-language highly-cited documents from 252 GS subject categories published in 2006, comparing GS, the WoS Core Collection, and Scopus. GS consistently found the largest percentage of citations across all areas (93%-96%), far ahead of Scopus (35%-77%) and WoS (27%-73%). GS found nearly all the WoS (95%) and Scopus (92%) citations. Most citations found only by GS were from non-journal sources (48%-65%), including theses, books, conference papers, and unpublished materials. Many were non-English (19%-38%), and they tended to be much less cited than citing sources that were also in Scopus or WoS. Despite the many unique GS citing sources, Spearman correlations between citation counts in GS and WoS or Scopus are high (0.78-0.99). They are lower in the Humanities, and lower between GS and WoS than between GS and Scopus. The results suggest that in all areas GS citation data is essentially a superset of WoS and Scopus, with substantial extra coverage.

Submitted 12 March, 2019; v1 submitted 15 August, 2018; originally announced August 2018.

Journal ref: Journal of Informetrics, 12(4), 1160-1177, 2018

arXiv:1806.06351 [pdf]

Google Scholar: the 'big data' bibliographic tool

Authors: Emilio Delgado Lopez-Cozar, Enrique Orduna-Malea, Alberto Martin-Martin, Juan M. Ayllon

Abstract: The launch of Google Scholar back in 2004 meant a revolution not only in the scientific information search market but also in research evaluation processes. Its dynamism, unparalleled coverage, and uncontrolled indexing make of Google Scholar an unusual product, especially when compared to traditional bibliographic databases. Conceived primarily as a discovery tool for academic information, it presents a number of limitations as a bibliometric tool. The main objective of this chapter is to show how Google Scholar operates and how its core database may be used for bibliometric purposes. To do this, the general features of the search engine (in terms of document typologies, disciplines, and coverage) are analysed. Lastly, several bibliometric tools based on Google Scholar data, both official (Google Scholar Metrics, Google Scholar Citations), and some developed by third parties (H Index Scholar, Publishers Scholar Metrics, Proceedings Scholar Metrics, Journal Scholar Metrics, Scholar Mirrors), as well as software to collect and process data from this source (Publish or Perish, Scholarometer) are introduced, aiming to illustrate the potential bibliometric uses of this source.

Submitted 17 June, 2018; originally announced June 2018.

Comments: 31 pages with 6 figures and 1 table

arXiv:1806.05029 [pdf]

doi 10.17605/OSF.IO/7B4AJ

Unbundling Open Access dimensions: a conceptual discussion to reduce terminology inconsistencies

Authors: Alberto Martín-Martín, Rodrigo Costas, Thed N. van Leeuwen, Emilio Delgado López-Cózar

Abstract: The current ways in which documents are made freely accessible in the Web no longer adhere to the models established Budapest/Bethesda/Berlin (BBB) definitions of Open Access (OA). Since those definitions were established, OA-related terminology has expanded, trying to keep up with all the variants of OA publishing that are out there. However, the inconsistent and arbitrary terminology that is being used to refer to these variants are complicating communication about OA-related issues. This study intends to initiate a discussion on this issue, by proposing a conceptual model of OA. Our model features six different dimensions (prestige, user rights, stability, immediacy, peer-review, and cost). Each dimension allows for a range of different options. We believe that by combining the options in these six dimensions, we can arrive at all the current variants of OA, while avoiding ambiguous and/or arbitrary terminology. This model can be an useful tool for funders and policy makers who need to decide exactly which aspects of OA are necessary for each specific scenario.

Submitted 21 August, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

Comments: 8 pages, 1 figure. Accepted as oral presentation in 23rd STI conference (2018)

arXiv:1806.04435 [pdf]

Google Scholar as a data source for research assessment

Authors: Emilio Delgado López-Cózar, Enrique Orduna-Malea, Alberto Martín-Martín

Abstract: The launch of Google Scholar (GS) marked the beginning of a revolution in the scientific information market. This search engine, unlike traditional databases, automatically indexes information from the academic web. Its ease of use, together with its wide coverage and fast indexing speed, have made it the first tool most scientists currently turn to when they need to carry out a literature search. Additionally, the fact that its search results were accompanied from the beginning by citation counts, as well as the later development of secondary products which leverage this citation data (such as Google Scholar Metrics and Google Scholar Citations), made many scientists wonder about its potential as a source of data for bibliometric analyses. The goal of this chapter is to lay the foundations for the use of GS as a supplementary source (and in some disciplines, arguably the best alternative) for scientific evaluation. First, we present a general overview of how GS works. Second, we present empirical evidences about its main characteristics (size, coverage, and growth rate). Third, we carry out a systematic analysis of the main limitations this search engine presents as a tool for the evaluation of scientific performance. Lastly, we discuss the main differences between GS and other more traditional bibliographic databases in light of the correlations found between their citation data. We conclude that Google Scholar presents a broader view of the academic world because it has brought to light a great amount of sources that were not previously visible.

Submitted 18 June, 2018; v1 submitted 12 June, 2018; originally announced June 2018.

Comments: 42 pages. Forthcoming in: Springer Handbook of Science and Technology Indicators (Editors: Wolfgang Glaenzel, Henk Moed, Ulrich Schmoch, Michael Thelwall)

arXiv:1804.11209 [pdf]

doi 10.1007/s11192-017-2587-4

A novel method for depicting academic disciplines through Google Scholar Citations: The case of Bibliometrics

Authors: Alberto Martín-Martín, Enrique Orduna-Malea, Emilio Delgado López-Cózar

Abstract: This article describes a procedure to generate a snapshot of the structure of a specific scientific community and their outputs based on the information available in Google Scholar Citations (GSC). We call this method MADAP (Multifaceted Analysis of Disciplines through Academic Profiles). The international community of researchers working in Bibliometrics, Scientometrics, Informetrics, Webometrics, and Altmetrics was selected as a case study. The records of the top 1,000 most cited documents by these authors according to GSC were manually processed to fill any missing information and deduplicate fields like the journal titles and book publishers. The results suggest that it is feasible to use GSC and the MADAP method to produce an accurate depiction of the community of researchers working in Bibliometrics (both specialists and occasional researchers) and their publication habits (main publication venues such as journals and book publishers). Additionally, the wide document coverage of Google Scholar (specially books and book chapters) enables more comprehensive analyses of the documents published in a specific discipline than were previously possible with other citation indexes, finally shedding light on what until now had been a blind spot in most citation analyses.

Submitted 27 April, 2018; originally announced April 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1602.02412

Journal ref: Martín-Martín, A., Orduna-Malea, E., & Delgado López-Cózar, E. (2018). A novel method for depicting academic disciplines through Google Scholar Citations: The case of Bibliometrics. Scientometrics, 114(3), 1251-1273

arXiv:1804.10439 [pdf]

doi 10.1016/j.joi.2016.11.008

Can we use Google Scholar to identify highly-cited documents?

Authors: Alberto Martín-Martín, Enrique Orduna-Malea, Anne-Wil Harzing, Emilio Delgado López-Cózar

Abstract: The main objective of this paper is to empirically test whether the identification of highly-cited documents through Google Scholar is feasible and reliable. To this end, we carried out a longitudinal analysis (1950 to 2013), running a generic query (filtered only by year of publication) to minimise the effects of academic search engine optimisation. This gave us a final sample of 64,000 documents (1,000 per year). The strong correlation between a document's citations and its position in the search results (r= -0.67) led us to conclude that Google Scholar is able to identify highly-cited papers effectively. This, combined with Google Scholar's unique coverage (no restrictions on document type and source), makes the academic search engine an invaluable tool for bibliometric research relating to the identification of the most influential scientific documents. We find evidence, however, that Google Scholar ranks those documents whose language (or geographical web domain) matches with the user's interface language higher than could be expected based on citations. Nonetheless, this language effect and other factors related to the Google Scholar's operation, i.e. the proper identification of versions and the date of publication, only have an incidental impact. They do not compromise the ability of Google Scholar to identify the highly-cited papers.

Submitted 27 April, 2018; originally announced April 2018.

Journal ref: Martín-Martín, A., Orduna-Malea, E., Harzing, A.-W., & Delgado López-Cózar, E. (2017). Can we use Google Scholar to identify highly-cited documents? Journal of Informetrics, 11(1), 152-163

arXiv:1804.09479 [pdf]

doi 10.1007/s11192-018-2820-9

Coverage of highly-cited documents in Google Scholar, Web of Science, and Scopus: a multidisciplinary comparison

Authors: Alberto Martín-Martín, Enrique Orduna-Malea, Emilio Delgado López-Cózar

Abstract: This study explores the extent to which bibliometric indicators based on counts of highly-cited documents could be affected by the choice of data source. The initial hypothesis is that databases that rely on journal selection criteria for their document coverage may not necessarily provide an accurate representation of highly-cited documents across all subject areas, while inclusive databases, which give each document the chance to stand on its own merits, might be better suited to identify highly-cited documents. To test this hypothesis, an analysis of 2,515 highly-cited documents published in 2006 that Google Scholar displays in its Classic Papers product is carried out at the level of broad subject categories, checking whether these documents are also covered in Web of Science and Scopus, and whether the citation counts offered by the different sources are similar. The results show that a large fraction of highly-cited documents in the Social Sciences and Humanities (8.6%-28.2%) are invisible to Web of Science and Scopus. In the Natural, Life, and Health Sciences the proportion of missing highly-cited documents in Web of Science and Scopus is much lower. Furthermore, in all areas, Spearman correlation coefficients of citation counts in Google Scholar, as compared to Web of Science and Scopus citation counts, are remarkably strong (.83-.99). The main conclusion is that the data about highly-cited documents available in the inclusive database Google Scholar does indeed reveal significant coverage deficiencies in Web of Science and Scopus in several areas of research. Therefore, using these selective databases to compute bibliometric indicators based on counts of highly-cited documents might produce biased assessments in poorly covered areas.

Submitted 26 June, 2018; v1 submitted 25 April, 2018; originally announced April 2018.

Comments: 11 pages, 3 tables, 1 figure. Accepted for publication in Scientometrics

Journal ref: Scientometrics, 116(3), 2175-2188, 2018

arXiv:1804.07268 [pdf]

doi 10.1108/OIR-10-2016-0302

The lost academic home: institutional affiliation links in Google Scholar Citations

Authors: Enrique Orduña-Malea, Juan M. Ayllón, Alberto Martín-Martín, Emilio Delgado López-Cózar

Abstract: This paper analyzes the new affiliation feature available in Google Scholar Citations, revealing that although the affiliation tool works well for most institutions, it is unable to detect all existing institutions in the database, and it is not always able to create a unique standardized entry for each institution.

Submitted 19 April, 2018; originally announced April 2018.

Journal ref: 2017, Online Information Review, 41(6), 762-781

arXiv:1804.05599 [pdf]

doi 10.1016/j.joi.2018.04.001

Author-level metrics in the new academic profile platforms: The online behaviour of the Bibliometrics community

Authors: Alberto Martín-Martín, Enrique Orduna-Malea, Emilio Delgado López-Cózar

Abstract: The new web-based academic communication platforms do not only enable researchers to better advertise their academic outputs, making them more visible than ever before, but they also provide a wide supply of metrics to help authors better understand the impact their work is making. This study has three objectives: a) to analyse the uptake of some of the most popular platforms (Google Scholar Citations, ResearcherID, ResearchGate, Mendeley and Twitter) by a specific scientific community (bibliometrics, scientometrics, informetrics, webometrics, and altmetrics); b) to compare the metrics available from each platform; and c) to determine the meaning of all these new metrics. To do this, the data available in these platforms about a sample of 811 authors (researchers in bibliometrics for whom a public profile Google Scholar Citations was found) were extracted. A total of 31 metrics were analysed. The results show that a high number of the analysed researchers only had a profile in Google Scholar Citations (159), or only in Google Scholar Citations and ResearchGate (142). Lastly, we find two kinds of metrics of online impact. First, metrics related to connectivity (followers), and second, all metrics associated to academic impact. This second group can further be divided into usage metrics (reads, views), and citation metrics. The results suggest that Google Scholar Citations is the source that provides more comprehensive citation-related data, whereas Twitter stands out in connectivity-related metrics.

Submitted 16 April, 2018; originally announced April 2018.

Comments: 26 pages, 6 figures, 7 tables

Journal ref: 2018, Journal of Informetrics, 12(2), 494-509

arXiv:1803.06161 [pdf]

doi 10.1016/J.JOI.2018.06.012

Evidence of Open Access of scientific publications in Google Scholar: a large-scale analysis

Authors: Alberto Martín-Martín, Rodrigo Costas, Thed van Leeuwen, Emilio Delgado López-Cózar

Abstract: This article uses Google Scholar (GS) as a source of data to analyse Open Access (OA) levels across all countries and fields of research. All articles and reviews with a DOI and published in 2009 or 2014 and covered by the three main citation indexes in the Web of Science (2,269,022 documents) were selected for study. The links to freely available versions of these documents displayed in GS were collected. To differentiate between more reliable (sustainable and legal) forms of access and less reliable ones, the data extracted from GS was combined with information available in DOAJ, CrossRef, OpenDOAR, and ROAR. This allowed us to distinguish the percentage of documents in our sample that are made OA by the publisher (23.1%, including Gold, Hybrid, Delayed, and Bronze OA) from those available as Green OA (17.6%), and those available from other sources (40.6%, mainly due to ResearchGate). The data shows an overall free availability of 54.6%, with important differences at the country and subject category levels. The data extracted from GS yielded very similar results to those found by other studies that analysed similar samples of documents, but employed different methods to find evidence of OA, thus suggesting a relative consistency among methods.

Submitted 24 July, 2018; v1 submitted 16 March, 2018; originally announced March 2018.

Comments: 38 pages, 8 figures, 3 tables. Complementary materials available at https://osf.io/fsujy/

Journal ref: Journal of Informetrics, 12(3), 819-841, 2018

arXiv:1706.09258 [pdf]

doi 10.13140/RG.2.2.35729.22880/1

Classic papers: déjà vu, a step further in the bibliometric exploitation of Google Scholar

Authors: Emilio Delgado Lopez-Cozar, Alberto Martin-Martin, Enrique Oduna-Malea

Abstract: After giving a brief overview of Eugene Garfield contributions to the issue of identifying and studying the most cited scientific articles, manifested in the creation of his Citation Classics, the main characteristics and features of Google Scholar new service Classic Papers, as well as its main strengths and weaknesses, are addressed. This product currently displays the most cited English-language original research articles by fields and published in 2006.

Submitted 28 June, 2017; originally announced June 2017.

Comments: 14 pages, 6 tables, 6 figures

Report number: EC3 Working Papers 24

arXiv:1705.03339 [pdf]

doi 10.1007/s11192-017-2396-9

Do ResearchGate Scores create ghost academic reputations?

Authors: Enrique Orduna-Malea, Alberto Martin-Martin, Mike Thelwall, Emilio Delgado Lopez-Cozar

Abstract: The academic social network site ResearchGate (RG) has its own indicator, RG Score, for its members. The high profile nature of the site means that the RG score may be used for recruitment, promotion and other tasks for which researchers are evaluated. In response, this study investigates whether it is reasonable to employ the RG Score as evidence of scholarly reputation. For this, three different author samples were investigated. An outlier sample includes 104 authors with high values. A Nobel sample comprises 73 Nobel winners from Medicine & Physiology, Chemistry, Physics and Economics (from 1975 to 2015). A longitudinal sample includes weekly data on 4 authors with different RG Scores. The results suggest that high RG Scores are built primarily from activity related to asking and answering questions in the site. In particular, it seems impossible to get a high RG Score solely through publications. Within RG it is possible to distinguish between (passive) academics that interact little in the site and active platform users, who can get high RG Scores through engaging with others inside the site (questions, answers, social networks with influential researchers). Thus, RG Scores should not be mistaken for academic reputation indicators.

Submitted 9 May, 2017; originally announced May 2017.

Comments: 19 pages, 7 tables, 4 figures

arXiv:1702.03991 [pdf]

Google Scholar and the gray literature: A reply to Bonato's review

Authors: Enrique Orduna-Malea, Alberto Martin-Martin, Emilio Delgado Lopez-Cozar

Abstract: Recently, a review concluded that Google Scholar (GS) is not a suitable source of information "for identifying recent conference papers or other gray literature publications". The goal of this letter is to demonstrate that GS can be an effective tool to search and find gray literature, as long as appropriate search strategies are used. To do this, we took as examples the same two case studies used by the original review, describing first how GS processes original's search strategies, then proposing alternative search strategies, and finally generalizing each case study to compose a general search procedure aimed at finding gray literature in Google Scholar for two wide selected case studies: a) all contributions belonging to a congress (the ASCO Annual Meeting); and b) indexed guidelines as well as gray literature within medical institutions (National Institutes of Health) and governmental agencies (U.S. Department of Health & Human Services). The results confirm that original search strategies were undertrained offering misleading results and erroneous conclusions. Google Scholar lacks many of the advanced search features available in other bibliographic databases (such as Pubmed), however, it is one thing to have a friendly search experience, and quite another to find gray literature. We finally conclude that Google Scholar is a powerful tool for searching gray literature, as long as the users are familiar with all the possibilities it offers as a search engine. Poorly formulated searches will undoubtedly return misleading results.

Submitted 13 February, 2017; originally announced February 2017.

Comments: 17 pages, 14 figures, 1 table

Report number: EC3 Working Papers, 23

arXiv:1607.06260 [pdf]

doi 10.13140/RG.2.1.1496.4724

2016 Google Scholar Metrics released: a matter of languages... and something else

Authors: Alberto Martín-Martín, Juan Manuel Ayllón, Enrique Orduña-Malea, Emilio Delgado López-Cózar

Abstract: The 2016 edition of Google Scholar Metrics was released on July 15th 2016. There haven't been any structural changes respect to previous versions, which means that most of its limitations still persist. The biggest changes are the addition of five new language rankings (Russian, Korean, Polish, Ukrainian, and Indonesian) and elimination of two other language rankings (Italian and Dutch). In addition, for reasons still unknown, this new edition doesn't include as many working paper and discussion paper series as previous editions.

Submitted 21 July, 2016; originally announced July 2016.

Comments: EC3 Working Papers 22. arXiv admin note: substantial text overlap with arXiv:1407.2827

arXiv:1607.02861 [pdf]

doi 10.3989/redc.2016.4.1405

A two-sided academic landscape: portrait of highly-cited documents in Google Scholar (1950-2013)

Authors: Alberto Martin-Martin, Enrique Orduna-Malea, Juan M. Ayllon, Emilio Delgado Lopez-Cozar

Abstract: The main objective of this paper is to identify the set of highly-cited documents in Google Scholar and to define their core characteristics (document types, language, free availability, source providers, and number of versions), under the hypothesis that the wide coverage of this search engine may provide a different portrait of this document set with respect to that offered by the traditional bibliographic databases. To do this, a query per year was carried out from 1950 to 2013 identifying the top 1,000 documents retrieved from Google Scholar and obtaining a final sample of 64,000 documents, of which 40% provided a free full-text link. The results obtained show that the average highly-cited document is a journal article or a book (62% of the top 1% most cited documents of the sample), written in English (92.5% of all documents) and available online in PDF format (86.0% of all documents). Yet, the existence of errors, especially when detecting duplicates and linking cites properly, must be pointed out. Working with highly cited papers, however, minimizes the effects of these limitations. Given the high presence of books, and to a lesser extent of other document types (such as proceedings or reports), the research concludes that Google Scholar data offer an original and different vision of the most influential academic documents (measured from the perspective of their citation count), a set composed not only of strictly scientific material (journal articles) but of academic material in its broad sense.

Submitted 11 July, 2016; originally announced July 2016.

Comments: 26 pages, 5 tables, 10 figures

Journal ref: Revista española de Documentación Científica, v. 39, n. 4, p. e149, dec. 2016. ISSN 1988-4621

arXiv:1606.05341 [pdf]

doi 10.13140/RG.2.1.4504.9681

Proceedings Scholar Metrics: H Index of proceedings on Computer Science, Electrical & Electronic Engineering, and Communications according to Google Scholar Metrics (2010-2014)

Authors: Alberto Martín-Martín, Juan Manuel Ayllón, Enrique Orduña-Malea, Emilio Delgado López-Cózar

Abstract: The objective of this report is to present a list of proceedings (conferences, workshops, symposia, meetings) in the areas of Computer Science, Electrical & Electronic Engineering, and Communications covered by Google Scholar Metrics and ranked according to their h-index. Google Scholar Metrics only displays publications that have published at least 100 papers and have received at least one citation in the last five years (2010-2014). The searches were conducted between the 8th and 10th of December, 2015. A total of 1501 proceedings have been identified.

Submitted 17 June, 2016; originally announced June 2016.

Comments: 36 pages, EC3 reports 15

arXiv:1603.09111 [pdf]

doi 10.1007/s11192-016-1917-2

Back to the past: on the shoulders of an academic search engine giant

Authors: Alberto Martin-Martin, Enrique Orduna-Malea, Juan M. Ayllon, Emilio Delgado Lopez-Cozar

Abstract: A study released by the Google Scholar team found an apparently increasing fraction of citations to old articles from studies published in the last 24 years (1990-2013). To test this finding we conducted a complementary study using a different data source (Journal Citation Reports), metric (aggregate cited half-life), time span (2003-2013), and set of categories (53 Social Science subject categories and 167 Science subject categories). Although the results obtained confirm and reinforce the previous findings, the possible causes of this phenomenon remain unclear. We finally hypothesize that the first-page results syndrome, in conjunction with the fact that Google Scholar favours the most cited documents, suggests that the growing trend of citing old documents is partly caused by Google Scholar.

Submitted 30 March, 2016; originally announced March 2016.

Comments: 12 pages, 2 tables

arXiv:1602.02412 [pdf]

The counting house: measuring those who count. Presence of Bibliometrics, Scientometrics, Informetrics, Webometrics and Altmetrics in the Google Scholar Citations, ResearcherID, ResearchGate, Mendeley & Twitter

Authors: Alberto Martin-Martin, Enrique Orduna-Malea, Juan M. Ayllon, Emilio Delgado Lopez-Cozar

Abstract: Following in the footsteps of the model of scientific communication, which has recently gone through a metamorphosis (from the Gutenberg galaxy to the Web galaxy), a change in the model and methods of scientific evaluation is also taking place. A set of new scientific tools are now providing a variety of indicators which measure all actions and interactions among scientists in the digital space, making new aspects of scientific communication emerge. In this work we present a method for capturing the structure of an entire scientific community (the Bibliometrics, Scientometrics, Informetrics, Webometrics, and Altmetrics community) and the main agents that are part of it (scientists, documents, and sources) through the lens of Google Scholar Citations. Additionally, we compare these author portraits to the ones offered by other profile or social platforms currently used by academics (ResearcherID, ResearchGate, Mendeley, and Twitter), in order to test their degree of use, completeness, reliability, and the validity of the information they provide. A sample of 814 authors (researchers in Bibliometrics with a public profile created in Google Scholar Citations) was subsequently searched in the other platforms, collecting the main indicators computed by each of them. The data collection was carried out in September 2015. The Spearman correlation was applied to these indicators (a total of 31), and a Principal Component Analysis was carried out in order to reveal the relationships among metrics and platforms as well as the possible existence of metric clusters.

Submitted 7 February, 2016; originally announced February 2016.

Comments: 60 pages, 12 tables, 35 figures

Report number: EC3 Working Papers 21

arXiv:1509.04515 [pdf]

Improvements in Google Scholar Citations are for the summer: creating an institutional affiliation link feature

Authors: Enrique Orduna-Malea, Juan Manuel Ayllón, Alberto Martín-Martín, Emilio Delgado López-Cózar

Abstract: This report describes the feature introduced by Google to provide standardized access to institutional affiliations within Google Scholar Citations. First, this new tool is described, pointing out its main characteristics and functioning. Next, the coverage and precision of the tool are evaluated. Two special cases (Google Inc. and Spanish Universities) are briefly treated with the purpose of illustrating some aspects of the accuracy of the tool for the task of gathering authors within their appropriate institution. Finally, some inconsistencies, errors and malfunctions are identified, categorized and described. The report finishes by providing some suggestions to improve the feature. The general conclusion is that the standardized institutional affiliation link provided by Google Scholar Citations, despite working pretty well for a large number of institutions (especially Anglo-Saxon universities), still has a number of shortcomings and pitfalls which need to be addressed in order to make this authority control tool fully useful worldwide, both for searching purposes and for metric tasks.

Submitted 15 September, 2015; originally announced September 2015.

Comments: 20 pages, 21 figures

Report number: 14

arXiv:1506.03009 [pdf]

doi 10.1007/s11192-015-1614-6

Methods for estimating the size of Google Scholar

Authors: Enrique Orduna-Malea, Juan M. Ayllon, Alberto Martin-Martin, Emilio Delgado Lopez-Cozar

Abstract: The emergence of academic search engines (mainly Google Scholar and Microsoft Academic Search) that aspire to index the entirety of current academic knowledge has revived and increased interest in the size of the academic web. The main objective of this paper is to propose various methods to estimate the current size (number of indexed documents) of Google Scholar (May 2014) and to determine its validity, precision and reliability. To do this, we present, apply and discuss three empirical methods: an external estimate based on empirical studies of Google Scholar coverage, and two internal estimate methods based on direct, empty and absurd queries, respectively. The results, despite providing disparate values, place the estimated size of Google Scholar at around 160 to 165 million documents. However, all the methods show considerable limitations and uncertainties due to inconsistencies in the Google Scholar search functionalities.

Submitted 9 June, 2015; originally announced June 2015.

Comments: 22 pages, 4 figures and 6 tables. arXiv admin note: text overlap with arXiv:1407.6239

arXiv:1501.02084 [pdf]

Reviving the past: the growth of citations to old documents

Authors: Alberto Martín-Martín, Enrique Orduña-Malea, Juan Manuel Ayllón, Emilio Delgado López-Cózar

Abstract: In this Digest we review a recent study released by the Google Scholar team on the apparently increasing fraction of citations to old articles from studies published in the last 24 years (1990-2013). First, we describe the main findings of their article. Secondly, we conduct an analogous study, using a different data source as well as different measures, which yields very similar results, thus confirming the phenomenon. Lastly, we discuss the possible causes of this phenomenon.

Submitted 9 January, 2015; originally announced January 2015.

Comments: 13 pages, 4 tables, 3 figures

Report number: EC3 Google Scholar's Digest Reviews 04

arXiv:1412.7633 [pdf]

Proceedings Scholar Metrics: H Index of proceedings on Computer Science, Electrical & Electronic Engineering, and Communications according to Google Scholar Metrics (2009-2013)

Authors: Alberto Martin-Martin, Enrique Ordunna-Malea, Juan Manuel Ayllon, Emilio Delgado Lopez-Cozar

Abstract: The objective of this report is to present a list of proceedings (conferences, workshops, symposia, meetings) in the areas of Computer Science, Electrical & Electronic Engineering, and Communications covered by Google Scholar Metrics and ranked according to their h-index. Google Scholar Metrics only displays publications that have published at least 100 papers and have received at least one citation in the last five years (2009-2013). The searches were conducted between the 15th and 22nd of December, 2014. A total of 1208 proceedings have been identified.

Submitted 7 January, 2015; v1 submitted 24 December, 2014; originally announced December 2014.

Comments: 29 pages

Report number: EC3 Reports 12

arXiv:1410.8464 [pdf]

Does Google Scholar contain all highly cited documents (1950-2013)?

Authors: Alberto Martín-Martín, Enrique Orduña-Malea, Juan Manuel Ayllón, Emilio Delgado López-Cózar

Abstract: The study of highly cited documents on Google Scholar (GS) has never been addressed to date in a comprehensive manner. The objective of this work is to identify the set of highly cited documents in Google Scholar and define their core characteristics: their languages, their file format, or how many of them can be accessed free of charge. We will also try to answer some additional questions that hopefully shed some light on the use of GS as a tool for assessing scientific impact through citations. The decalogue of research questions is shown below: 1. Which are the most cited documents in GS? 2. Which are the most cited document types in GS? 3. What languages are the most cited documents in GS written in? 4. How many highly cited documents are freely accessible? 4.1 What file types are the most commonly used to store these highly cited documents? 4.2 Which are the main providers of these documents? 5. How many of the highly cited documents indexed by GS are also indexed by WoS? 6. Is there a correlation between the number of citations that these highly cited documents have received in GS and the number of citations they have received in WoS? 7. How many versions of these highly cited documents has GS detected? 8. Is there a correlation between the number of versions GS has detected for these documents, and the number of citations they have received? 9. Is there a correlation between the number of versions GS has detected for these documents, and their position in the search engine result pages? 10. Is there some relation between the positions these documents occupy in the search engine result pages, and the number of citations they have received?

Submitted 25 March, 2015; v1 submitted 30 October, 2014; originally announced October 2014.

Comments: Full raw data available at: http://dx.doi.org/10.6084/m9.figshare.1224314

Report number: EC3's Working Papers 19

arXiv:1407.6239 [pdf]

About the size of Google Scholar: playing the numbers

Authors: Enrique Orduña-Malea, Juan Manuel Ayllón, Alberto Martín-Martín, Emilio Delgado López-Cózar

Abstract: The emergence of academic search engines (Google Scholar and Microsoft Academic Search essentially) has revived and increased the interest in the size of the academic web, since their aspiration is to index the entirety of current academic knowledge. The search engine functionality and human search patterns lead us to believe, sometimes, that what you see in the search engine's results page is all that really exists. And, even when this is not true, we wonder which information is missing and why. The main objective of this working paper is to calculate the size of Google Scholar at present (May 2014). To do this, we present, apply and discuss up to 4 empirical methods: Khabsa & Giles's method, an estimate based on empirical data, and estimates based on direct queries and absurd queries. The results, despite providing disparate values, place the estimated size of Google Scholar at about 160 million documents. However, the fact that all methods show great inconsistencies, limitations and uncertainties makes us wonder why Google does not simply provide this information to the scientific community if the company really knows this figure.

Submitted 5 September, 2014; v1 submitted 23 July, 2014; originally announced July 2014.

Comments: 42 pages, 18 figures, 8 tables, 3 appendixes

Report number: EC3 Working Papers 18

arXiv:1407.2827 [pdf]

Google Scholar Metrics 2014: a low cost bibliometric tool

Authors: Alberto Martín-Martín, Juan Manuel Ayllón, Enrique Orduña-Malea, Emilio Delgado López-Cózar

Abstract: We analyse the main features of the third edition of Google Scholar Metrics (GSM), released in June 2014, focusing on its more important changes, strengths, and weaknesses. Additionally, we present some figures that outline the dimensions of this new edition, and we compare them to those of previous editions. Principal among these figures are the number of visualized publications, publication types, languages, and the maximum and minimum h5-index and h5-median values by language, subject area, and subcategory. This new edition is marked by continuity. There is nothing new other than the updating of the time frame (2009-2013) and the removal of some redundant subcategories (from 268 to 261) for English written publications. Google has just updated the data, which means that some of the errors discussed in previous studies still persist. To sum up, GSM is a minimalist information product with few features, closed (it cannot be customized by the user), and simple (navigating it only takes a few clicks). For these reasons, we consider it a 'low cost' bibliometric tool, and propose a list of features it should incorporate in order to stop being labeled as such. Notwithstanding the above, this product presents a stability in its bibliometric indicators that supports its ability to measure and track the impact of scientific publications.

Submitted 10 July, 2014; originally announced July 2014.

Comments: 37 pages, 10 tables, 2 figures, 5 appendixes

Report number: EC3 Working Paper 17

arXiv:1404.7045 [pdf]

doi 10.1108/OIR-07-2014-0169

Empirical Evidences in Citation-Based Search Engines: Is Microsoft Academic Search dead?

Authors: Enrique Orduna-Malea, Juan Manuel Ayllon, Alberto Martin-Martin, Emilio Delgado Lopez-Cozar

Abstract: The goal of this working paper is to summarize the main empirical evidence provided by the scientific community as regards the comparison between the two main citation-based academic search engines: Google Scholar and Microsoft Academic Search, paying special attention to the following issues: coverage, correlations between journal rankings, and usage of these academic search engines. Additionally, self-elaborated data are offered, which are intended to provide current evidence about the popularity of these tools on the Web, by measuring the number of rich files (PDF, PPT and DOC) in which these tools are mentioned, the number of external links that both products receive, and the search query frequency from Google Trends. The poor results obtained by MAS led us to an unexpected and unnoticed discovery: Microsoft Academic Search has been outdated since 2013. Therefore, the second part of the working paper aims at advancing some data demonstrating this lack of updates. For this purpose we gathered the number of total records indexed by Microsoft Academic Search since 2000. The data show an abrupt drop in the number of documents indexed from 2,346,228 in 2010 to 8,147 in 2013 and 802 in 2014. This decrease is broken down by 15 thematic areas as well. In view of these problems it seems logical not only that Microsoft Academic Search was poorly used to search for articles by academics and students, who mostly use Google or Google Scholar, but virtually ignored by bibliometricians.

Submitted 23 May, 2014; v1 submitted 28 April, 2014; originally announced April 2014.

Comments: 14 pages, 7 figures, 6 tables

Report number: EC3 16

Journal ref: Online Information Review, Vol. 38 Iss: 7, pp.936 - 953 (2014)

arXiv:1306.6584 [pdf]

doi 10.3989/redc.2014.1.1114

An introduction to the coverage of the Data Citation Index (Thomson-Reuters): disciplines, document types and repositories

Authors: Daniel Torres-Salinas, Alberto Martín-Martín, Enrique Fuente-Gutiérrez

Abstract: In the past years, the movement of data sharing has been enjoying great popularity. Within this context, Thomson Reuters launched at the end of 2012 a new product inside the Web of Knowledge family: the Data Citation Index. The aim of this tool is to enable discovery and access, from a single place, to data from a variety of data repositories from different subject areas and from around the world. In this short note we present some preliminary results from the analysis of the Data Citation Index. Specifically, we address the following issues: discipline coverage, data types present in the database, and repositories that were included at the time of the study.

Submitted 26 February, 2014; v1 submitted 27 June, 2013; originally announced June 2013.

Journal ref: Rev. Esp. Doc. Cient., 37(1), enero-marzo 2014, e036. ISSN-L: 0210-0614

Showing 1–29 of 29 results for author: Martín-Martín, A