-
Implicit and Explicit Research Quality Score Probabilities from ChatGPT
Authors:
Mike Thelwall,
Yunhan Yang
Abstract:
The large language model (LLM) ChatGPT's quality scores for journal articles correlate more strongly with human judgements than some citation-based indicators in most fields. Averaging multiple ChatGPT scores improves the results, apparently leveraging its internal probability model. To leverage these probabilities, this article tests two novel strategies: requesting percentage likelihoods for scores and extracting the probabilities of alternative tokens in the responses. The probability estimates were then used to calculate weighted average scores. Both strategies were evaluated with five iterations of ChatGPT 4o-mini on 96,800 articles submitted to the UK Research Excellence Framework (REF) 2021, using departmental average REF2021 quality scores as a proxy for article quality. The data was analysed separately for each of the 34 field-based REF Units of Assessment. For the first strategy, explicit requests for tables of score percentage likelihoods substantially decreased the value of the scores (lower correlation with the proxy quality indicator). In contrast, weighted averages of score token probabilities slightly increased the correlation with the quality proxy indicator, and these probabilities reasonably accurately reflected ChatGPT's outputs. The token probability approach is therefore the most accurate method for ranking articles by research quality as well as being cheaper than comparable ChatGPT strategies.
Submitted 16 June, 2025;
originally announced June 2025.
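The token-probability strategy above can be illustrated with a minimal Python sketch. It assumes that the alternative tokens offered at the position of the score, with their log probabilities, have already been retrieved into a dictionary; the function name `weighted_score` and this input format are hypothetical and not taken from the article or from any particular API. Only the tokens `1` to `4` are kept, their probabilities are renormalised, and the probability-weighted mean score is returned.

```python
import math


def weighted_score(top_logprobs: dict[str, float]) -> float:
    """Probability-weighted average of REF-style scores 1-4.

    top_logprobs maps candidate tokens at the score position (e.g. "3")
    to their log probabilities (hypothetical input format). Tokens that
    are not scores are ignored and the remaining probabilities are
    renormalised so that they sum to 1.
    """
    probs: dict[int, float] = {}
    for token, logprob in top_logprobs.items():
        t = token.strip()  # tokens may carry leading spaces
        if t in {"1", "2", "3", "4"}:
            probs[int(t)] = probs.get(int(t), 0.0) + math.exp(logprob)
    total = sum(probs.values())
    if total == 0:
        raise ValueError("no score tokens among the alternatives")
    return sum(score * p for score, p in probs.items()) / total
```

For example, probabilities of 0.6, 0.3 and 0.1 for scores 3, 4 and 2 give a weighted score of 3.2, whereas the single most likely output would simply be 3. Per-article values of this kind could then be averaged over repeated iterations, as with ordinary scores.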
-
Research quality evaluation by AI in the era of Large Language Models: Advantages, disadvantages, and systemic effects
Authors:
Mike Thelwall
Abstract:
Artificial Intelligence (AI) technologies like ChatGPT now threaten bibliometrics as the primary generators of research quality indicators. They are already used in at least one research quality evaluation system and evidence suggests that they are used informally by many peer reviewers. Since using bibliometrics to support research evaluation continues to be controversial, this article reviews the corresponding advantages and disadvantages of AI-generated quality scores. From a technical perspective, generative AI based on Large Language Models (LLMs) equals or surpasses bibliometrics in most important dimensions, including accuracy (mostly higher correlations with human scores) and coverage (more fields, more recent years), and may reflect more research quality dimensions. Like bibliometrics, current LLMs do not "measure" research quality, however. On the clearly negative side, LLM biases are currently unknown for research evaluation, and LLM scores are less transparent than citation counts. From a systemic perspective, the key issue is how introducing LLM-based indicators into research evaluation will change the behaviour of researchers. Whilst bibliometrics encourage some authors to target journals with high impact factors or to try to write highly cited work, LLM-based indicators may push them towards writing misleading abstracts and overselling their work in the hope of impressing the AI. Moreover, if AI-generated journal indicators replace impact factors, then this would encourage journals to allow authors to oversell their work in abstracts, threatening the integrity of the academic record.
Submitted 9 June, 2025;
originally announced June 2025.
-
In which fields do ChatGPT 4o scores align better than citations with research quality?
Authors:
Mike Thelwall
Abstract:
Although citation-based indicators are widely used for research evaluation, they are not useful for recently published research, reflect only one of the three common dimensions of research quality, and have little value in some social sciences, arts and humanities. Large Language Models (LLMs) have been shown to address some of these weaknesses, with ChatGPT 4o-mini showing the most promising results, although on incomplete data. This article reports by far the largest scale evaluation of ChatGPT 4o-mini yet, and also evaluates its larger sibling ChatGPT 4o. Based on comparisons between LLM scores, averaged over 5 repetitions, and departmental average quality scores for 107,212 UK-based refereed journal articles, ChatGPT 4o is marginally better than ChatGPT 4o-mini in most of the 34 field-based Units of Assessment (UoAs) tested, although combining both gives better results than either one. ChatGPT 4o scores have a positive correlation with research quality in 33 of the 34 UoAs, with the results being statistically significant in 31. ChatGPT 4o scores had a higher correlation with research quality than long term citation rates in 21 out of 34 UoAs and a higher correlation than short term citation rates in 26 out of 34 UoAs. The main limitation is that it is not clear whether ChatGPT leverages public information about departmental research quality to cheat with its scores. In summary, the results give the first large scale evidence that ChatGPT 4o is competitive with citations as a new research quality indicator, but ChatGPT 4o-mini is more cost-effective.
Submitted 6 April, 2025;
originally announced April 2025.
-
Can news and social media attention reduce the influence of problematic research?
Authors:
Er-Te Zheng,
Hui-Zhen Fu,
Xiaorui Jiang,
Zhichao Fang,
Mike Thelwall
Abstract:
News and social media are widely used to disseminate science, but do they also help raise awareness of problems in research? This study investigates whether high levels of news and social media attention might accelerate the retraction process and increase the visibility of retracted articles. To explore this, we analyzed 15,642 news mentions, 6,588 blog mentions, and 404,082 X mentions related to 15,461 retracted articles. Articles receiving high levels of news and X mentions were retracted more quickly than non-mentioned articles in the same broad field and with comparable publication years, author impact, and journal impact. However, this effect was not statistically significant for articles with high levels of blog mentions. Notably, articles frequently mentioned in the news experienced a significant increase in annual citation rates after their retraction, possibly because media exposure enhances the visibility of retracted articles, making them more likely to be cited. These findings suggest that increased public scrutiny can improve the efficiency of scientific self-correction, although mitigating the influence of retracted articles remains a gradual process.
Submitted 23 March, 2025;
originally announced March 2025.
-
Is OpenAlex Suitable for Research Quality Evaluation and Which Citation Indicator is Best?
Authors:
Mike Thelwall,
Xiaorui Jiang
Abstract:
This article compares (1) citation analysis with OpenAlex and Scopus, testing their citation counts, document type/coverage and subject classifications and (2) three citation-based indicators: raw counts, (field and year) Normalised Citation Scores (NCS) and Normalised Log-transformed Citation Scores (NLCS). Methods (1&2): The indicators calculated from 28.6 million articles were compared through 8,704 correlations on two gold standards for 97,816 UK Research Excellence Framework (REF) 2021 articles. The primary gold standard is ChatGPT scores, and the secondary is the average REF2021 expert review score for the department submitting the article. Results: (1) OpenAlex provides better citation counts than Scopus and its inclusive document classification/scope does not seem to cause substantial field normalisation problems. The broadest OpenAlex classification scheme provides the best indicators. (2) Counterintuitively, raw citation counts are at least as good as nearly all field normalised indicators, and better for single years, and NCS is better than NLCS. (1&2) There are substantial field differences. Thus, (1) OpenAlex is suitable for citation analysis in most fields and (2) the major citation-based indicators seem to work counterintuitively compared to quality judgements. Field normalisation seems ineffective because more cited fields tend to produce higher quality work, affecting interdisciplinary research or within-field topic differences.
Submitted 25 February, 2025;
originally announced February 2025.
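The two normalised indicators compared above are usually defined as follows, and this sketch assumes those standard definitions rather than reproducing the article's exact procedure: NCS divides an article's citation count by the mean count for articles in the same field and year, and NLCS does the same after applying the transformation ln(1 + c). The helper `normalised_scores` and its inputs are hypothetical; the choice of field classification (for example, the broadest OpenAlex scheme) is left to the caller through the `groups` keys.

```python
import math
from collections import defaultdict


def normalised_scores(citations, groups, log_transform=False):
    """Field- and year-normalised citation scores (assumed standard definitions).

    citations: citation count per article.
    groups: (field, year) key per article, aligned with citations.
    log_transform=False gives NCS:  c / mean(c) within the group.
    log_transform=True  gives NLCS: ln(1 + c) / mean(ln(1 + c)) within the group.
    """
    values = [math.log1p(c) if log_transform else float(c) for c in citations]
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    means = {g: totals[g] / counts[g] for g in totals}
    # A group whose articles are all uncited has mean 0; leave its scores undefined.
    return [v / means[g] if means[g] > 0 else float("nan") for v, g in zip(values, groups)]
```

For instance, citation counts of 0, 2 and 4 in one field-year group have a mean of 2, so their NCS values are 0, 1 and 2.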
-
Estimating the quality of academic books from their descriptions with ChatGPT
Authors:
Mike Thelwall,
Andrew Cox
Abstract:
Although indicators based on scholarly citations are widely used to support the evaluation of academic journals, alternatives are needed for scholarly book acquisitions. This article assesses the value of research quality scores from ChatGPT 4o-mini for 9,830 social sciences, arts, and humanities books from 2019 indexed in Scopus, based on their titles and descriptions but not their full texts. Although most books scored the same (3* on a 1* to 4* scale), the citation rates correlate positively but weakly with ChatGPT 4o-mini research quality scores in both the social sciences and the arts and humanities. Part of the reason for the differences was the inclusion of textbooks, short books, and edited collections, all of which tended to be less cited and lower scoring. Some topics also tend to attract many/few citations and/or high/low ChatGPT scores. Descriptions explicitly mentioning theory and/or some methods were also associated with higher scores and more citations. Overall, the results provide some evidence that both ChatGPT scores and citation counts are weak indicators of the research quality of books. Whilst not strong enough to support individual book quality judgements, they may help academic librarians seeking to evaluate new book collections, series, or publishers for potential acquisition.
Submitted 12 February, 2025;
originally announced February 2025.
-
Journal Quality Factors from ChatGPT: More meaningful than Impact Factors?
Authors:
Mike Thelwall,
Kayvan Kousha
Abstract:
Purpose: Journal Impact Factors and other citation-based indicators are widely used and abused to help select journals to publish in or to estimate the value of a published article. Nevertheless, citation rates primarily reflect scholarly impact rather than other quality dimensions, including societal impact, originality, and rigour. In contrast, Journal Quality Factors (JQFs) are average quality score estimates given to a journal's articles by ChatGPT. Design: JQFs were compared with Polish, Norwegian and Finnish journal ranks and with journal citation rates for 1,300 journals with 130,000 articles from 2021 in large monodisciplinary journals in the 25 out of 27 Scopus broad fields of research for which it was possible. Outliers were also examined. Findings: JQFs correlated positively and mostly strongly (median correlation: 0.641) with journal ranks in 24 out of the 25 broad fields examined, indicating a nearly science-wide ability for ChatGPT to estimate journal quality. Journal citation rates had similarly high correlations with national journal ranks, however, so JQFs are not a universally better indicator. An examination of journals with JQFs not matching their journal ranks suggested that abstract styles may affect the result, such as whether the societal contexts of research are mentioned. Limitations: Different journal rankings may have given different findings because there is no agreed meaning for journal quality. Implications: The results suggest that JQFs are plausible as journal quality indicators in all fields and may be useful for the (few) research and evaluation contexts where journal quality is an acceptable proxy for article quality, and especially for fields like mathematics for which citations are not strong indicators of quality. Originality: This is the first attempt to estimate academic journal value with a Large Language Model.
Submitted 6 December, 2024; v1 submitted 15 November, 2024;
originally announced November 2024.
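Since a JQF is described above as the average of ChatGPT's quality score estimates for a journal's articles, computing it is a grouping-and-averaging step. The following Python sketch assumes that per-article scores are already available as (journal, score) pairs; the function name is hypothetical, and each article score might itself be an average over repeated ChatGPT queries.

```python
from collections import defaultdict
from statistics import mean


def journal_quality_factors(article_scores):
    """Average the ChatGPT quality score of each journal's articles.

    article_scores: iterable of (journal, score) pairs, one per article.
    Returns a dict mapping journal -> mean article score (a JQF-style value).
    """
    by_journal = defaultdict(list)
    for journal, score in article_scores:
        by_journal[journal].append(score)
    return {journal: mean(scores) for journal, scores in by_journal.items()}
```

The resulting values could then be compared with national journal ranks using a rank correlation.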
-
Research evaluation with ChatGPT: Is it age, country, length, or field biased?
Authors:
Mike Thelwall,
Zeyneb Kurt
Abstract:
Some research now suggests that ChatGPT can estimate the quality of journal articles from their titles and abstracts. This has created the possibility to use ChatGPT quality scores, perhaps alongside citation-based formulae, to support peer review for research evaluation. Nevertheless, ChatGPT's internal processes are effectively opaque, despite it writing a report to support its scores, and its biases are unknown. This article investigates whether publication date and field are biasing factors. Based on submitting a monodisciplinary journal-balanced set of 117,650 articles from 26 fields published in the years 2003, 2008, 2013, 2018 and 2023 to ChatGPT 4o-mini, the results show that average scores increased over time, and this was not due to author nationality or title and abstract length changes. The results also varied substantially between fields and between first-author countries. In addition, articles with longer abstracts tended to receive higher scores, but plausibly due to such articles tending to be better rather than due to ChatGPT analysing more text. Thus, for the most accurate research quality evaluation results from ChatGPT, it is important to normalise ChatGPT scores for field and year and check for anomalies caused by sets of articles with short abstracts.
Submitted 14 November, 2024;
originally announced November 2024.
-
Evaluating the Predictive Capacity of ChatGPT for Academic Peer Review Outcomes Across Multiple Platforms
Authors:
Mike Thelwall,
Abdullah Yaghi
Abstract:
While previous studies have demonstrated that Large Language Models (LLMs) can predict peer review outcomes to some extent, this paper builds on that by introducing two new contexts and employing a more robust method: averaging multiple ChatGPT scores. The findings show that averaging 30 ChatGPT predictions, based on reviewer guidelines and using only the submitted titles and abstracts, failed to predict peer review outcomes for F1000Research (Spearman's rho=0.00). However, it produced mostly weak positive correlations with the quality dimensions of SciPost Physics (rho=0.25 for validity, rho=0.25 for originality, rho=0.20 for significance, and rho=0.08 for clarity) and a moderate positive correlation for papers from the International Conference on Learning Representations (ICLR) (rho=0.38). Including the full text of articles significantly increased the correlation for ICLR (rho=0.46) and slightly improved it for F1000Research (rho=0.09), while it had variable effects on the four quality dimension correlations for SciPost LaTeX files. The use of chain-of-thought system prompts slightly increased the correlation for F1000Research (rho=0.10), marginally reduced it for ICLR (rho=0.37), and further decreased it for SciPost Physics (rho=0.16 for validity, rho=0.18 for originality, rho=0.18 for significance, and rho=0.05 for clarity). Overall, the results suggest that in some contexts, ChatGPT can produce weak pre-publication quality assessments. However, the effectiveness of these assessments and the optimal strategies for employing them vary considerably across different platforms, journals, and conferences. Additionally, the most suitable inputs for ChatGPT appear to differ depending on the platform.
Submitted 14 November, 2024;
originally announced November 2024.
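The correlations above are reported as Spearman's rho, the Pearson correlation between the ranks of two variables. The sketch below computes it in plain Python, giving tied values the mean of their ranks (the usual convention, which matters here because averaged ChatGPT scores often repeat the same value); the function names are hypothetical, and in practice a library routine such as SciPy's `scipy.stats.spearmanr` would normally be used instead.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```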
-
Evaluating the quality of published medical research with ChatGPT
Authors:
Mike Thelwall,
Xiaorui Jiang,
Peter A. Bath
Abstract:
Estimating the quality of published research is important for evaluations of departments, researchers, and job candidates. Citation-based indicators sometimes support these tasks, but do not work for new articles and have low or moderate accuracy. Previous research has shown that ChatGPT can estimate the quality of research articles, with its scores correlating positively with a proxy for expert scores in all fields, and often more strongly than citation-based indicators, except for clinical medicine. ChatGPT scores may therefore replace citation-based indicators for some applications. This article investigates the clinical medicine anomaly with the largest dataset yet and a more detailed analysis. The results showed that ChatGPT 4o-mini scores for articles submitted to the UK's Research Excellence Framework (REF) 2021 Unit of Assessment (UoA) 1 Clinical Medicine correlated positively (r=0.134, n=9872) with departmental mean REF scores, against a theoretical maximum correlation of r=0.226. ChatGPT 4o and 3.5 turbo also gave positive correlations. At the departmental level, mean ChatGPT scores correlated more strongly with departmental mean REF scores (r=0.395, n=31). For the 100 journals with the most articles in UoA 1, their mean ChatGPT score correlated strongly with their REF score (r=0.495) but negatively with their citation rate (r=-0.148). Journal and departmental anomalies in these results point to ChatGPT being ineffective at assessing the quality of research in prestigious medical journals or research directly affecting human health, or both. Nevertheless, the results give evidence of ChatGPT's ability to assess research quality overall for Clinical Medicine, where it might replace citation-based indicators for new research.
Submitted 3 March, 2025; v1 submitted 4 November, 2024;
originally announced November 2024.
-
Assessing the societal influence of academic research with ChatGPT: Impact case study evaluations
Authors:
Kayvan Kousha,
Mike Thelwall
Abstract:
Academics and departments are sometimes judged by how their research has benefitted society. For example, the UK Research Excellence Framework (REF) assesses Impact Case Studies (ICS), which are five-page evidence-based claims of societal impacts. This study investigates whether ChatGPT can evaluate societal impact claims and therefore potentially support expert human assessors. For this, various parts of 6,220 public ICS from REF2021 were fed to ChatGPT 4o-mini along with the REF2021 evaluation guidelines, comparing the results with published departmental average ICS scores. The results suggest that the optimal strategy for high correlations with expert scores is to input the title and summary of an ICS but not the remaining text, and to modify the original REF guidelines to encourage a stricter evaluation. The scores generated by this approach correlated positively with departmental average scores in all 34 Units of Assessment (UoAs), with values between 0.18 (Economics and Econometrics) and 0.56 (Psychology, Psychiatry and Neuroscience). At the departmental level, the corresponding correlations were higher, reaching 0.71 for Sport and Exercise Sciences, Leisure and Tourism. Thus, ChatGPT-based ICS evaluations are simple and viable to support or cross-check expert judgments, although their value varies substantially between fields.
Submitted 25 October, 2024;
originally announced October 2024.
-
In which fields can ChatGPT detect journal article quality? An evaluation of REF2021 results
Authors:
Mike Thelwall,
Abdallah Yaghi
Abstract:
Time spent by academics on research quality assessment might be reduced if automated approaches can help. Whilst citation-based indicators have been extensively developed and evaluated for this, they have substantial limitations and Large Language Models (LLMs) like ChatGPT provide an alternative approach. This article assesses whether ChatGPT 4o-mini can be used to estimate the quality of journal articles across academia. It samples up to 200 articles from all 34 Units of Assessment (UoAs) in the UK's Research Excellence Framework (REF) 2021, comparing ChatGPT scores with departmental average scores. There was an almost universally positive Spearman correlation between ChatGPT scores and departmental averages, varying between 0.08 (Philosophy) and 0.78 (Psychology, Psychiatry and Neuroscience), except for Clinical Medicine (rho=-0.12). Although other explanations are possible, especially because REF score profiles are public, the results suggest that LLMs can provide reasonable research quality estimates in most areas of science, and particularly the physical and health sciences and engineering, even before citation data is available. Nevertheless, ChatGPT assessments seem to be more positive for most health and physical sciences than for other fields, a concern for multidisciplinary assessments, and the ChatGPT scores are only based on titles and abstracts, so cannot be research evaluations.
Submitted 25 September, 2024;
originally announced September 2024.
-
Evaluating Research Quality with Large Language Models: An Analysis of ChatGPT's Effectiveness with Different Settings and Inputs
Authors:
Mike Thelwall
Abstract:
Evaluating the quality of academic journal articles is a time consuming but critical task for national research evaluation exercises, appointments and promotion. It is therefore important to investigate whether Large Language Models (LLMs) can play a role in this process. This article assesses which ChatGPT inputs (full text without tables, figures and references; title and abstract; title only) produce better quality score estimates, and the extent to which scores are affected by ChatGPT models and system prompts. The results show that the optimal input is the article title and abstract, with average ChatGPT scores based on these (30 iterations on a dataset of 51 papers) correlating at 0.67 with human scores, the highest ever reported. ChatGPT 4o is slightly better than 3.5-turbo (0.66) and 4o-mini (0.66). The results suggest that article full texts might confuse LLM research quality evaluations, even though complex system instructions for the task are more effective than simple ones. Thus, whilst abstracts contain insufficient information for a thorough assessment of rigour, they may contain strong pointers about originality and significance. Finally, linear regression can be used to convert the model scores into the human scale scores, which is 31% more accurate than guessing.
Submitted 29 November, 2024; v1 submitted 13 August, 2024;
originally announced August 2024.
-
Quantitative Methods in Research Evaluation: Citation Indicators, Altmetrics, and Artificial Intelligence
Authors:
Mike Thelwall
Abstract:
This book critically analyses the value of citation data, altmetrics, and artificial intelligence to support the research evaluation of articles, scholars, departments, universities, countries, and funders. It introduces and discusses indicators that can support research evaluation and analyses their strengths and weaknesses as well as the generic strengths and weaknesses of the use of indicators for research assessment. The book includes evidence of the comparative value of citations and altmetrics in all broad academic fields primarily through comparisons against article level human expert judgements from the UK Research Excellence Framework 2021. It also discusses the potential applications of traditional artificial intelligence and large language models for research evaluation, with large scale evidence for the former. The book concludes that citation data can be informative and helpful in some research fields for some research evaluation purposes but that indicators are never accurate enough to be described as research quality measures. It also argues that AI may be helpful in limited circumstances for some types of research evaluation.
Submitted 10 April, 2025; v1 submitted 28 June, 2024;
originally announced July 2024.
-
Can tweets predict article retractions? A comparison between human and LLM labelling
Authors:
Er-Te Zheng,
Hui-Zhen Fu,
Mike Thelwall,
Zhichao Fang
Abstract:
Quickly detecting problematic research articles is crucial to safeguarding the integrity of scientific research. This study explores whether Twitter mentions of retracted articles can signal potential problems with the articles prior to their retraction, potentially serving as an early warning system for scholars. To investigate this, we analysed a dataset of 4,354 Twitter mentions associated with 504 retracted articles. The effectiveness of Twitter mentions in predicting article retractions was evaluated by both manual and Large Language Model (LLM) labelling. Manual labelling results indicated that 25.7% of tweets signalled problems before retraction. Using the manual labelling results as the baseline, we found that LLMs (GPT-4o-mini, Gemini 1.5 Flash, and Claude-3.5-Haiku) outperformed lexicon-based sentiment analysis tools (e.g., TextBlob) in detecting potential problems, suggesting that automatic detection of problematic articles from social media using LLMs is technically feasible. Nevertheless, since only a small proportion of retracted articles (11.1%) were criticised on Twitter prior to retraction, such automatic systems would detect only a minority of problematic articles. Overall, this study offers insights into how social media data, coupled with emerging generative AI techniques, can support research integrity.
Submitted 9 December, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
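Comparing LLM labels against the manual baseline, as described above, can be expressed with standard binary classification measures. The abstract does not name the measures used, so this Python sketch assumes precision, recall and F1, with True meaning that a tweet signals a problem with the article before retraction; the function name is hypothetical.

```python
def agreement_with_baseline(manual, predicted):
    """Precision, recall and F1 of predicted labels against manual labels.

    Both inputs are aligned lists of booleans: True means the tweet was
    judged to signal a problem with the article before retraction.
    """
    tp = sum(1 for m, p in zip(manual, predicted) if m and p)
    fp = sum(1 for m, p in zip(manual, predicted) if not m and p)
    fn = sum(1 for m, p in zip(manual, predicted) if m and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```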
-
Can ChatGPT evaluate research quality?
Authors:
Mike Thelwall
Abstract:
Purpose: Assess whether ChatGPT 4.0 is accurate enough to perform research evaluations on journal articles to automate this time-consuming task. Design/methodology/approach: Test the extent to which ChatGPT-4 can assess the quality of journal articles using a case study of the published scoring guidelines of the UK Research Excellence Framework (REF) 2021 to create a research evaluation ChatGPT. This was applied to 51 of my own articles and compared against my own quality judgements. Findings: ChatGPT-4 can produce plausible document summaries and quality evaluation rationales that match the REF criteria. Its overall scores have weak correlations with my self-evaluation scores of the same documents (averaging r=0.281 over 15 iterations, with 8 being statistically significantly different from 0). In contrast, the average scores from the 15 iterations produced a statistically significant positive correlation of 0.509. Thus, averaging scores from multiple ChatGPT-4 rounds seems more effective than individual scores. The positive correlation may be due to ChatGPT being able to extract the author's significance, rigour, and originality claims from inside each paper. If my weakest articles are removed, then the correlation with average scores (r=0.200) falls below statistical significance, suggesting that ChatGPT struggles to make fine-grained evaluations. Research limitations: The data is self-evaluations of a convenience sample of articles from one academic in one field. Practical implications: Overall, ChatGPT does not yet seem to be accurate enough to be trusted for any formal or informal research quality evaluation tasks. Research evaluators, including journal editors, should therefore take steps to control its use. Originality/value: This is the first published attempt at post-publication expert review accuracy testing for ChatGPT.
Submitted 8 February, 2024;
originally announced February 2024.
-
Are Scopus journal field classifications ever misleading?
Authors:
Mike Thelwall,
Stephen Pinfield
Abstract:
Journal field classifications in Scopus are used for citation-based indicators and by authors choosing appropriate journals to submit to. Whilst prior research has found that Scopus categories are occasionally misleading, it is not known how this varies for different journal types. In response, we assessed whether specialist, cross-field and general academic journals sometimes have publication practices that do not match their Scopus classifications. For this, we compared the Scopus narrow fields of journals with the fields that best fit their articles' titles and abstracts. We also conducted qualitative follow-up to distinguish between Scopus classification errors and misleading journal aims. The results show sharp field differences in the extent to which both cross-field and apparently specialist journals publish articles that match their Scopus narrow fields, and the same for general journals. The results also suggest that a few journals have titles and aims that do not match their contents well, and that some large topics spread themselves across many relevant disciplines. Thus, the likelihood that a journal's Scopus narrow fields reflect its contents varies substantially by field (although without systematic field trends) and some cross-field topics seem to cause difficulties in appropriately classifying relevant journals. These issues undermine citation-based indicators that rely on journal-level classification and may confuse scholars seeking publishing venues.
Submitted 28 July, 2023;
originally announced July 2023.
-
Can REF output quality scores be assigned by AI? Experimental evidence
Authors:
Mike Thelwall,
Kayvan Kousha,
Mahshid Abdoli,
Emma Stuart,
Meiko Makita,
Paul Wilson,
Jonathan Levitt
Abstract:
This document describes strategies for using Artificial Intelligence (AI) to predict some journal article scores in future research assessment exercises. Five strategies have been assessed.
Submitted 11 December, 2022;
originally announced December 2022.
-
Do bibliometrics introduce gender, institutional or interdisciplinary biases into research evaluations?
Authors:
Mike Thelwall,
Kayvan Kousha,
Mahshid Abdoli,
Emma Stuart,
Meiko Makita,
Paul Wilson,
Jonathan Levitt
Abstract:
Systematic evaluations of publicly funded research typically employ a combination of bibliometrics and peer review, but it is not known whether the bibliometric component introduces biases. This article compares three alternative mechanisms for scoring 73,612 UK Research Excellence Framework (REF) journal articles from all 34 field-based Units of Assessment (UoAs) 2014-17: peer review, field normalised citations, and journal average field normalised citation impact. All three were standardised onto a four-point scale. The results suggest that in almost all academic fields, bibliometric scoring can disadvantage departments publishing high quality research, with the main exception of article citation rates in chemistry. Thus, introducing journal or article level citation information into peer review exercises may have a regression to the mean effect. Bibliometric scoring slightly advantaged women compared to men, but this varied between UoAs and was most evident in the physical sciences, engineering, and social sciences. In contrast, interdisciplinary research gained from bibliometric scoring in about half of the UoAs, but relatively substantially in two. In conclusion, out of the three potential sources of bias examined, the most serious seems to be the tendency for bibliometric scores to work against high quality departments, assuming that the peer review scores are correct. This is almost a paradox: although high quality departments tend to get the highest bibliometric scores, bibliometrics conceal the full extent of departmental quality advantages. This should be considered when using bibliometrics or bibliometric informed peer review.
Submitted 11 December, 2022;
originally announced December 2022.
-
Do altmetric scores reflect article quality? Evidence from the UK Research Excellence Framework 2021
Authors:
Mike Thelwall,
Kayvan Kousha,
Mahshid Abdoli,
Emma Stuart,
Meiko Makita,
Paul Wilson,
Jonathan Levitt
Abstract:
Altmetrics are web-based quantitative impact or attention indicators for academic articles that have been proposed to supplement citation counts. This article reports the first assessment of the extent to which mature altmetrics from Altmetric.com and Mendeley associate with journal article quality. It exploits expert norm-referenced peer review scores from the UK Research Excellence Framework 2021 for 67,030+ journal articles in all fields 2014-17/18, split into 34 Units of Assessment (UoAs). The results show that altmetrics are better indicators of research quality than previously thought, although not as good as raw and field normalised Scopus citation counts. Surprisingly, field normalising citation counts can reduce their strength as a quality indicator for articles in a single field. For most UoAs, Mendeley reader counts are the best, tweet counts are also a relatively strong indicator in many fields, and Facebook, blogs and news citations are moderately strong indicators in some UoAs, at least in the UK. In general, altmetrics are the strongest indicators of research quality in the health and physical sciences and weakest in the arts and humanities. The Altmetric Attention Score, although hybrid, is almost as good as Mendeley reader counts as a quality indicator and reflects more non-scholarly impacts.
Submitted 11 December, 2022;
originally announced December 2022.
-
Artificial intelligence technologies to support research assessment: A review
Authors:
Kayvan Kousha,
Mike Thelwall
Abstract:
This literature review identifies indicators that associate with higher impact or higher quality research from article text (e.g., titles, abstracts, lengths, cited references and readability) or metadata (e.g., the number of authors, international or domestic collaborations, journal impact factors and authors' h-index). This includes studies that used machine learning techniques to predict citation counts or quality scores for journal articles or conference papers. The literature review also includes evidence about the strength of association between bibliometric indicators and quality score rankings from previous UK Research Assessment Exercises (RAEs) and REFs in different subjects and years and similar evidence from other countries (e.g., Australia and Italy). In support of this, the document also surveys studies that used public datasets of citations, social media indicators or open review texts (e.g., Dimensions, OpenCitations, Altmetric.com and Publons) to help predict the scholarly impact of articles. The results of this part of the literature review were used to inform the experiments using machine learning to predict REF journal article quality scores, as reported in the AI experiments report for this project. The literature review also covers technology to automate editorial processes, to provide quality control for papers and reviewers' suggestions, to match reviewers with articles, and to automatically categorise journal articles into fields. Bias and transparency in technology assisted assessment are also discussed.
Submitted 11 December, 2022;
originally announced December 2022.
-
Is big team research fair in national research assessments? The case of the UK Research Excellence Framework 2021
Authors:
Mike Thelwall,
Kayvan Kousha,
Mahshid Abdoli,
Emma Stuart,
Meiko Makita,
Paul Wilson,
Jonathan Levitt
Abstract:
Collaborative research causes problems for research assessments because of the difficulty in fairly crediting its authors. Whilst splitting the rewards for an article amongst its authors has the greatest surface-level fairness, many important evaluations assign full credit to each author, irrespective of team size. The underlying rationales for this are labour reduction and the need to incentivise collaborative work because it is necessary to solve many important societal problems. This article assesses whether full counting changes results compared to fractional counting in the case of the UK's Research Excellence Framework (REF) 2021. For this assessment, fractional counting reduces the number of journal articles to as little as 10% of the full counting value, depending on the Unit of Assessment (UoA). Despite this large difference, allocating an overall grade point average (GPA) based on full counting or fractional counting gives results with a median Pearson correlation within UoAs of 0.98. The largest changes are for Archaeology (r=0.84) and Physics (r=0.88). There is a weak tendency for higher scoring institutions to lose from fractional counting, with the loss being statistically significant in 5 of the 34 UoAs. Thus, whilst the apparent over-weighting of contributions to collaboratively authored outputs does not seem too problematic from a fairness perspective overall, it may be worth examining in the few UoAs in which it makes the most difference.
Submitted 11 December, 2022;
originally announced December 2022.
-
Why are co-authored academic articles more cited: Higher quality or larger audience?
Authors:
Mike Thelwall,
Kayvan Kousha,
Mahshid Abdoli,
Emma Stuart,
Meiko Makita,
Paul Wilson,
Jonathan Levitt
Abstract:
Co-authored articles tend to be more cited in many academic fields, but is this because they tend to be higher quality or is it an audience effect: increased awareness through multiple author networks? We address this unknown with the largest investigation yet into whether author numbers associate with research quality, using expert peer quality judgements for 122,331 non-review journal articles submitted by UK academics for the 2014-20 national assessment process. Spearman correlations between the number of authors and the quality scores show moderately strong positive associations (0.2-0.4) in the health, life, and physical sciences, but weak or no positive associations in engineering and the social sciences. In contrast, we found little or no association in the arts and humanities, and a possible negative association for decision sciences. This gives reasonably conclusive evidence that greater numbers of authors associate with higher quality journal articles in the majority of academia outside the arts and humanities, at least for the UK. Positive associations between team size and citation counts in areas with little association between team size and quality also show that audience effects or other non-quality factors account for the higher citation rates of co-authored articles in some fields.
Submitted 11 December, 2022;
originally announced December 2022.
-
Terms in journal articles associating with high quality: Can qualitative research be world-leading?
Authors:
Mike Thelwall,
Kayvan Kousha,
Mahshid Abdoli,
Emma Stuart,
Meiko Makita,
Paul Wilson,
Jonathan Levitt
Abstract:
Purpose: Scholars often aim to conduct high quality research and their success is judged primarily by peer reviewers. Research quality is difficult for either group to identify, however, and misunderstandings can reduce the efficiency of the scientific enterprise. In response, we use a novel term association strategy to seek quantitative evidence of aspects of research that associate with high or low quality. Design/methodology/approach: We extracted the words and 2-5-word phrases most strongly associating with different quality scores in each of 34 Units of Assessment (UoAs) in the Research Excellence Framework (REF) 2021. We extracted the terms from 122,331 journal articles 2014-2020 with individual REF2021 quality scores. Findings: The terms associating with high- or low-quality scores vary between fields but relate to writing styles, methods, and topics. We show that the first-person writing style strongly associates with higher quality research in many areas because it is the norm for a set of large prestigious journals. We found methods and topics that associate with both high- and low-quality scores. Worryingly, terms associating with educational and qualitative research attract lower quality scores in multiple areas. REF experts may rarely give high scores to qualitative or educational research because the authors tend to be less competent, because it is harder to produce world-leading research with these themes, or because they do not value them. Originality: This is the first investigation of journal article terms associating with research quality.
Submitted 11 December, 2022;
originally announced December 2022.
-
In which fields do higher impact journals publish higher quality articles?
Authors:
Mike Thelwall,
Kayvan Kousha,
Mahshid Abdoli,
Emma Stuart,
Meiko Makita,
Paul Wilson,
Jonathan Levitt
Abstract:
The Journal Impact Factor and other indicators that assess the average citation rate of articles in a journal are consulted by many academics and research evaluators, despite initiatives against overreliance on them. Nevertheless, there is limited evidence about the extent to which journal impact indicators in any field relate to human judgements about the journals or their articles. In response, we compared average citation rates of journals against expert judgements of their articles in all fields of science. We used preliminary quality scores for 96,031 articles published 2014-18 from the UK Research Excellence Framework (REF) 2021. We show that whilst there is a positive correlation between expert judgements of article quality and average journal impact in all fields of science, it is very weak in many fields and is never strong. The strength of the correlation varies from 0.11 to 0.43 for the 27 broad fields of Scopus. The highest correlation for the 94 Scopus narrow fields with at least 750 articles was only 0.54, for Infectious Diseases, and there was only one negative correlation, for the mixed category Computer Science (all). The results suggest that the average citation impact of a Scopus-indexed journal is never completely irrelevant to the quality of an article, even though it is never a strong indicator of article quality.
Submitted 11 December, 2022;
originally announced December 2022.
-
Is Research Funding Always Beneficial? A Cross-Disciplinary Analysis of UK Research 2014-20
Authors:
Mike Thelwall,
Kayvan Kousha,
Mahshid Abdoli,
Emma Stuart,
Meiko Makita,
Cristina Font-Julián,
Paul Wilson,
Jonathan Levitt
Abstract:
The search for and management of external funding now occupies much valuable researcher time. Whilst funding is essential for some types of research and beneficial for others, it may also constrain academic choice and creativity. Thus, it is important to assess whether it is ever detrimental or unnecessary. Here we investigate whether funded research tends to be higher quality in all fields and for all major research funders. Based on peer review quality scores for 113,877 articles from all fields in the UK's Research Excellence Framework (REF) 2021, we estimate that there are substantial disciplinary differences in the proportion of funded journal articles, from Theology and Religious Studies (16%+) to Biological Sciences (91%+). The results suggest that funded research is likely to be higher quality overall, for all the largest research funders, and for all fields, even after factoring out research team size. There are differences between funders in the average quality of the research they support, however. Funding seems particularly beneficial in health-related fields. The results do not show cause and effect and do not take into account the amount of funding received but are consistent with funding either improving research quality or being won by high quality researchers or projects. In summary, there are no broad fields of research in which funding is irrelevant, so no fields can afford to ignore it. The results also show that citations are not effective proxies for research quality in the arts and humanities and most social sciences for evaluating research funding.
Submitted 11 December, 2022;
originally announced December 2022.
-
Which international co-authorships produce higher quality journal articles?
Authors:
Mike Thelwall,
Kayvan Kousha,
Mahshid Abdoli,
Emma Stuart,
Meiko Makita,
Paul Wilson,
Jonathan Levitt
Abstract:
International collaboration is sometimes encouraged in the belief that it generates higher quality research or is more capable of addressing societal problems. Nevertheless, while there is evidence that the journal articles of international teams tend to be more cited than average, perhaps from increased international audiences, there is no science-wide direct academic evidence of a connection between international collaboration and research quality. This article empirically investigates the connection between international collaboration and research quality for the first time, with 148,977 UK-based journal articles with post publication expert review scores from the 2021 Research Excellence Framework (REF). Using an ordinal regression model controlling for collaboration, international partners increased the odds of higher quality scores in 27 out of 34 Units of Assessment (UoAs) and all Main Panels. The results therefore give the first large scale evidence of the fields in which international co-authorship for articles is usually apparently beneficial. At the country level, the results suggest that UK collaboration with other high research-expenditure economies generates higher quality research, even when the countries produce lower citation impact journal articles than the United Kingdom. Worryingly, collaborations with lower research-expenditure economies tend to be judged lower quality, possibly through misunderstanding Global South research goals.
Submitted 18 March, 2024; v1 submitted 11 December, 2022;
originally announced December 2022.
-
In which fields are citations indicators of research quality?
Authors:
Mike Thelwall,
Kayvan Kousha,
Mahshid Abdoli,
Emma Stuart,
Meiko Makita,
Paul Wilson,
Jonathan Levitt
Abstract:
Citation counts are widely used as indicators of research quality to support or replace human peer review and for lists of top cited papers, researchers, and institutions. Nevertheless, the relationship between citations and research quality is poorly evidenced. We report the first large-scale science-wide academic evaluation of the relationship between research quality and citations (field normalised citation counts), correlating them for 87,739 journal articles in 34 field-based UK Units of Assessment (UoAs). The two correlate positively in all academic fields, from very weak (0.1) to strong (0.5), reflecting broadly linear relationships in all fields. We give the first evidence that the correlations are positive even across the arts and humanities. The patterns are similar for the field classification schemes of Scopus and Dimensions.ai, although varying for some individual subjects and therefore more uncertain for these. We also show for the first time that no field has a citation threshold beyond which all articles are excellent quality, so lists of top cited articles are not pure collections of excellence, and neither is any top citation percentile indicator. Thus, whilst appropriately field normalised citations associate positively with research quality in all fields, they never perfectly reflect it, even at high values.
Submitted 24 April, 2023; v1 submitted 11 December, 2022;
originally announced December 2022.
-
Predicting article quality scores with machine learning: The UK Research Excellence Framework
Authors:
Mike Thelwall,
Kayvan Kousha,
Mahshid Abdoli,
Emma Stuart,
Meiko Makita,
Paul Wilson,
Jonathan Levitt,
Petr Knoth,
Matteo Cancellieri
Abstract:
National research evaluation initiatives and incentive schemes have previously chosen between simplistic quantitative indicators and time-consuming peer review, sometimes supported by bibliometrics. Here we assess whether artificial intelligence (AI) could provide a third alternative, estimating article quality using multiple bibliometric and metadata inputs. We investigated this using provisional three-level REF2021 peer review scores for 84,966 articles submitted to the UK Research Excellence Framework 2021, matching a Scopus record 2014-18 and with a substantial abstract. We found that accuracy is highest in the medical and physical sciences Units of Assessment (UoAs) and economics, reaching 42% above the baseline (72% overall) in the best case. This is based on 1000 bibliometric inputs and half of the articles used for training in each UoA. Prediction accuracies above the baseline for the social science, mathematics, engineering, arts, and humanities UoAs were much lower or close to zero. The Random Forest Classifier (standard or ordinal) and Extreme Gradient Boosting Classifier algorithms performed best from the 32 tested. Accuracy was lower if UoAs were merged or replaced by Scopus broad categories. We increased accuracy with an active learning strategy and by selecting articles with higher prediction probabilities, as estimated by the algorithms, but this substantially reduced the number of scores predicted.
Submitted 11 December, 2022;
originally announced December 2022.
-
Is research with qualitative data more prevalent and impactful now? Interviews, case studies, focus groups and ethnographies
Authors:
Mike Thelwall,
Tamara Nevill
Abstract:
Researchers, editors, educators and publishers need to understand the mix of research methods used in their field to guide decision making, with a current concern being that qualitative research is threatened by big data. Although there have been many studies of the prevalence of different methods within individual narrow fields, there have been no systematic studies across academia. In response, this article assesses the prevalence and citation impact of academic research 1996-2019 that reports one of four common methods to gather qualitative data: interviews; focus groups; case studies; ethnography. The results show that, with minor exceptions, the prevalence of qualitative data has increased, often substantially, since 1996. In addition, all 27 broad fields (as classified by Scopus) now publish some qualitative research, with interviewing being by far the most common approach. This suggests that qualitative methods teaching should increase, and researchers, editors and publishers should be increasingly open to the value that qualitative data can bring.
Submitted 24 April, 2021;
originally announced April 2021.
-
Cures, Treatments and Vaccines for Covid-19: International differences in interest on Twitter
Authors:
Mike Thelwall
Abstract:
Since the Covid-19 pandemic is a global threat to health that few can fully escape, it has given a unique opportunity to study international reactions to a common problem. Such reactions can be partly obtained from public posts to Twitter, allowing investigations of changes in interest over time. This study analysed English-language Covid-19 tweets mentioning cures, treatments, or vaccines from 1 January 2020 to 8 April 2021, seeking trends and international differences. The results have methodological limitations but show a tendency for countries with a lower human development index score to tweet more about cures, although they were a minor topic for all countries. Vaccines were discussed about as much as treatments until July 2020, when they generated more interest because of developments in Russia. The November 2020 Pfizer-BioNTech preliminary Phase 3 trial results generated an immediate and sustained sharp increase, however, followed by a continuing roughly linear increase in interest for vaccines until at least April 2021. Against this background, national deviations from the average were triggered by country-specific news about cures, treatments or vaccines. Nevertheless, interest in vaccines in all countries increased in parallel to some extent, despite substantial international differences in national regulatory approval and availability. The results also highlight that unsubstantiated claims about alternative medicine remedies gained traction in several countries, apparently posing a threat to public health.
Submitted 19 April, 2021;
originally announced April 2021.
-
Can Twitter Give Insights into International Differences in Covid-19 Vaccination? Eight countries' English tweets to 21 March 2021
Authors:
Mike Thelwall
Abstract:
Vaccination programs may help the world to reduce or eliminate Covid-19. Information about them may help countries to design theirs more effectively, with important benefits for public health. This article investigates whether it is possible to get insights into national vaccination programmes from a quick international comparison of public comments on Twitter. For this, word association thematic analysis (WATA) was applied to English-language vaccine-related tweets from eight countries gathered between 5 December 2020 and 21 March 2021. The method was able to quickly identify multiple international differences. Whilst some were irrelevant, potentially non-trivial differences include differing extents to which non-government scientific experts are important to national vaccination discussions. For example, Ireland seemed to be the only country in which university presidents were widely tweeted about in vaccine discussions. India's vaccine kindness term #VaccineMaitri was another interesting difference, highlighting the need for international sharing.
Submitted 25 March, 2021;
originally announced March 2021.
-
A Bayesian Hurdle Quantile Regression Model for Citation Analysis with Mass Points at Lower Values
Authors:
Marzieh Shahmandi,
Paul Wilson,
Mike Thelwall
Abstract:
Quantile regression presents a complete picture of the effects on the location, scale, and shape of the dependent variable at all points, not just the mean. We focus on two challenges for citation count analysis by quantile regression: discontinuity and substantial mass points at lower counts. A Bayesian hurdle quantile regression model for count data with a substantial mass point at zero was proposed by King and Song (2019). It uses quantile regression for modeling the nonzero data and logistic regression for modeling the probability of zeros versus nonzeros. We show that substantial mass points for low citation counts will nearly certainly also affect parameter estimation in the quantile regression part of the model, similar to a mass point at zero. We update the King and Song model by shifting the hurdle point past the main mass points. This model delivers more accurate quantile regression for moderately to highly cited articles, especially at quantiles corresponding to values just beyond the mass points, and enables estimates of the extent to which factors influence the chances that an article will be low cited. To illustrate the potential of this method, it is applied to simulated citation counts and data from Scopus.
Submitted 10 July, 2021; v1 submitted 8 February, 2021;
originally announced February 2021.
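The hurdle-shifting idea in the quantile regression abstract can be illustrated descriptively. The sketch below is a minimal illustration, not the authors' Bayesian model. It splits counts at a hurdle point into the share at or below the hurdle, which a logistic part would model, and the counts above it, which a quantile part would model. It then compares a simple empirical quantile for two hurdle choices. The helper names `hurdle_split` and `empirical_quantile` and the toy counts are hypothetical.

```python
import math


def hurdle_split(counts, hurdle):
    """Hypothetical helper: split counts at a hurdle point.

    Returns (share of counts <= hurdle, list of counts > hurdle).
    In a hurdle model the first part is described by a logistic
    regression and the second by a quantile regression.
    """
    low = [c for c in counts if c <= hurdle]
    high = [c for c in counts if c > hurdle]
    return len(low) / len(counts), high


def empirical_quantile(values, q):
    """Nearest-rank empirical quantile for 0 < q <= 1."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Toy citation counts with mass points at 0 and 1 (illustrative only).
counts = [0, 0, 0, 1, 1, 1, 2, 5, 8, 13, 21]
share_low, above = hurdle_split(counts, hurdle=1)
print(share_low, above, empirical_quantile(above, 0.5))
```

With the hurdle at zero, as in King and Song's original formulation, the median of the remaining toy counts is 2, pulled down by the mass point at 1. Shifting the hurdle to 1 gives 8. This shows descriptively why mass points just above zero can distort the quantile part.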
-
Exploring WorldCat Identities as an altmetric information source: A library catalog analysis experiment in the field of Scientometrics
Authors:
Daniel Torres-Salinas,
Wenceslao Arroyo-Machado,
Mike Thelwall
Abstract:
Assessing the impact of scholarly books is a difficult research evaluation problem. Library Catalog Analysis facilitates the quantitative study, at different levels, of the impact and diffusion of academic books based on data about their availability in libraries. The WorldCat global catalog collates data on library holdings, offering a range of tools including the novel WorldCat Identities. This is based on author profiles and provides indicators relating to the availability of their books in library catalogs. Here, we investigate this new tool to identify its strengths and weaknesses based on a sample of Bibliometrics and Scientometrics authors. We review the problems that this entails and compare Library Catalog Analysis indicators with Google Scholar and Web of Science citations. The results show that WorldCat Identities can be a useful tool for book impact assessment but the value of its data is undermined by the provision of massive collections of ebooks to academic libraries.
Submitted 25 November, 2020;
originally announced November 2020.
-
Pot, kettle: Nonliteral titles aren't (natural) science
Authors:
Mike Thelwall
Abstract:
Researchers may be tempted to attract attention through poetic titles for their publications, but would this be mistaken in some fields? Whilst poetic titles are known to be common in medicine, it is not clear whether the practice is widespread elsewhere. This article investigates the prevalence of poetic expressions in journal article titles 1996-2019 in 3.3 million articles from all 27 Scopus broad fields. Expressions were identified by manually checking all phrases with at least 5 words that occurred at least 25 times, finding 149 stock phrases, idioms, sayings, literary allusions, film names and song titles or lyrics. The expressions found are most common in the social sciences and the humanities. They are also relatively common in medicine, but almost absent from engineering and the natural and formal sciences. The differences may reflect the less hierarchical and more varied nature of the social sciences and humanities, where interesting titles may attract an audience. In engineering, natural science and formal science fields, authors should take extra care with poetic expressions, in case their choice is judged inappropriate. This includes interdisciplinary research overlapping these areas. Conversely, reviewers of interdisciplinary research involving the social sciences should be more tolerant of poetic license.
Submitted 14 June, 2020;
originally announced June 2020.
-
Coronavirus research before 2020 is more relevant than ever, especially when interpreted for COVID-19
Authors:
Mike Thelwall
Abstract:
The speed with which biomedical researchers were able to identify and characterise COVID-19 was clearly due to prior research with other coronaviruses. Early epidemiological comparisons with two previous coronaviruses, Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS), also made it easier to predict COVID-19's likely spread and lethality. This article assesses whether academic interest in prior coronavirus research has translated into interest in the primary source material, using Mendeley reader counts for early academic impact evidence. The results confirm that SARS and MERS research 2008-2017 experienced anomalously high increases in Mendeley readers in April-May 2020. Nevertheless, studies learning COVID-19 lessons from SARS and MERS or using them as a benchmark for COVID-19 have generated much more academic interest than primary studies of SARS or MERS. Thus, research that interprets prior relevant research for new diseases when they are discovered seems to be particularly important to help researchers to understand its implications in the new context.
Submitted 6 June, 2020;
originally announced June 2020.
-
All downhill from the PhD? The typical impact trajectory of US academic careers
Authors:
Mike Thelwall,
Ruth Fairclough
Abstract:
Within academia, mature researchers tend to be more senior, but do they also tend to write higher impact articles? This article assesses long-term publishing (16+ years) United States (US) researchers, contrasting them with shorter-term publishing researchers (1, 6 or 10 years). A long-term US researcher is operationalised as having a first Scopus-indexed journal article in exactly 2001 and one in 2016-2019, with US main affiliations in their first and last articles. Researchers publishing in large teams (11+ authors) were excluded. The average field and year normalised citation impact of long- and shorter-term US researchers' journal articles decreases over time relative to the national average, with especially large falls for the last articles published, which may be at least partly due to a decline in self-citations. In many cases researchers start by publishing above US average citation impact research and end by publishing below US average citation impact research. Thus, research managers should not assume that senior researchers will usually write the highest impact papers.
Submitted 23 May, 2020;
originally announced May 2020.
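The career trajectory abstract does not give its normalisation formula. The sketch below therefore assumes one log-based field and year normalisation, the Mean Normalised Log Citation Score (MNLCS), which is named in the large consortia abstract in this listing. Each article's ln(1 + citations) is divided by the mean ln(1 + citations) of a reference set of articles from the same field and year, and the ratios are averaged. A value of 1 means parity with the reference set. The reference set could be all articles or, following the comparison with the US average, all US articles. The function name and tuple layout are hypothetical.

```python
import math
from collections import defaultdict


def mnlcs(group, reference):
    """Mean Normalised Log Citation Score (assumed formula, see lead-in).

    group, reference: lists of (field, year, citations) tuples.
    Every (field, year) in group must also occur in reference, and
    the reference mean of ln(1 + c) must be non-zero.
    """
    log_sums = defaultdict(float)
    sizes = defaultdict(int)
    for field, year, cites in reference:
        log_sums[(field, year)] += math.log1p(cites)
        sizes[(field, year)] += 1

    ratios = []
    for field, year, cites in group:
        expected = log_sums[(field, year)] / sizes[(field, year)]
        ratios.append(math.log1p(cites) / expected)
    return sum(ratios) / len(ratios)


# Toy reference set: one field-year with articles cited 1 and 3 times.
reference = [("Physics", 2001, 1), ("Physics", 2001, 3)]
print(mnlcs([("Physics", 2001, 3)], reference))  # above 1: above reference
```

Averaging logged citations rather than raw counts reduces the influence of a few very highly cited articles, which is one reason log-based indicators are used for skewed citation data.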
-
Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations' COCI: a multidisciplinary comparison of coverage via citations
Authors:
Alberto Martín-Martín,
Mike Thelwall,
Enrique Orduna-Malea,
Emilio Delgado López-Cózar
Abstract:
New sources of citation data have recently become available, such as Microsoft Academic, Dimensions, and the OpenCitations Index of CrossRef open DOI-to-DOI citations (COCI). Although these have been compared to the Web of Science (WoS), Scopus, or Google Scholar, there is no systematic evidence of their differences across subject categories. In response, this paper investigates 3,073,351 citations found by these six data sources to 2,515 English-language highly-cited documents published in 2006 from 252 subject categories, expanding and updating the largest previous study. Google Scholar found 88% of all citations, many of which were not found by the other sources, and nearly all citations found by the remaining sources (89%-94%). A similar pattern held within most subject categories. Microsoft Academic is the second largest overall (60% of all citations), including 82% of Scopus citations and 86% of Web of Science citations. In most categories, Microsoft Academic found more citations than Scopus and WoS (182 and 223 subject categories, respectively), but had coverage gaps in some areas, such as Physics and some Humanities categories. After Scopus, Dimensions is fourth largest (54% of all citations), including 84% of Scopus citations and 88% of WoS citations. It found more citations than Scopus in 36 categories, more than WoS in 185, and displays some coverage gaps, especially in the Humanities. Following WoS, COCI is the smallest, with 28% of all citations. Google Scholar is still the most comprehensive source. In many subject categories Microsoft Academic and Dimensions are good alternatives to Scopus and WoS in terms of coverage.
Submitted 30 January, 2021; v1 submitted 29 April, 2020;
originally announced April 2020.
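The coverage figures in the citation source comparison are of two kinds. One is the share of all citations found by each source, where the denominator is the union across sources. The other is the share of one source's citations also found by another. Both can be sketched with set operations. The sketch assumes that each citation has already been matched to a shared identifier across sources, which is the hard part in practice. The integer IDs and function names are hypothetical.

```python
def coverage(sources):
    """Share of the union of all citations found by each source.

    sources: dict mapping a source name to a set of citation IDs.
    """
    all_citations = set().union(*sources.values())
    return {name: len(found) / len(all_citations)
            for name, found in sources.items()}


def overlap(sources, finder, target):
    """Share of target's citations that finder also found."""
    return len(sources[finder] & sources[target]) / len(sources[target])


# Toy data with hypothetical citation IDs.
sources = {
    "Google Scholar": {1, 2, 3, 4, 5, 6, 7},
    "Scopus": {1, 2, 3, 8},
    "WoS": {1, 2, 9},
}
print(coverage(sources))
print(overlap(sources, "Google Scholar", "Scopus"))  # 0.75
```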
-
A gender equality paradox in academic publishing: Countries with a higher proportion of female first-authored journal articles have larger first author gender disparities between fields
Authors:
Mike Thelwall,
Amalia Mas-Bleda
Abstract:
Current attempts to address the shortfall of female researchers in Science, Technology, Engineering and Mathematics (STEM) have not yet succeeded despite other academic subjects having female majorities. This article investigates the extent to which gender disparities are subject-wide or nation-specific by a first author gender comparison of 30 million articles from all 27 Scopus broad fields within the 31 countries with the most Scopus-indexed articles 2014-18. The results show overall and geocultural patterns as well as individual national differences. Almost half of the subjects were always more male (7; e.g., Mathematics) or always more female (6; e.g., Immunology & Microbiology) than the national average. A strong overall trend (Spearman correlation 0.546) is for countries with a higher proportion of female first-authored research to also have larger differences in gender disparities between fields (correlation 0.314 for gender ratios). This confirms the international gender equality paradox previously found for degree subject choices: increased gender equality overall associates with moderately greater gender differentiation between subjects. This is consistent with previous USA-based claims that gender differences in academic careers are partly due to (socially constrained) gender differences in personal preferences. Radical solutions may therefore be needed for some STEM subjects to overcome gender disparities.
Submitted 25 April, 2020;
originally announced April 2020.
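The gender paradox abstract reports Spearman correlations across countries. Spearman's correlation is Pearson's correlation computed on ranks, with tied values given their average rank. The dependency-free sketch below shows that definition. The per-country inputs, such as female first-author share and a between-field disparity measure, are not given in the abstract, so the example values are hypothetical. In practice, `scipy.stats.spearmanr` computes the same statistic.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-country values: female share vs. field disparity.
female_share = [0.30, 0.35, 0.41, 0.45, 0.50]
disparity = [0.08, 0.07, 0.10, 0.12, 0.11]
print(spearman(female_share, disparity))
```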
-
COVID-19 publications: Database coverage, citations, readers, tweets, news, Facebook walls, Reddit posts
Authors:
Kayvan Kousha,
Mike Thelwall
Abstract:
The COVID-19 pandemic requires a fast response from researchers to help address biological, medical and public health issues to minimize its impact. In this rapidly evolving context, scholars, professionals and the public may need to quickly identify important new studies. In response, this paper assesses the coverage of scholarly databases and impact indicators from 21 March to 18 April 2020. The results confirm a rapid increase in the volume of research, which is particularly accessible through Google Scholar and Dimensions, and less so through Scopus, the Web of Science and PubMed. A few COVID-19 papers from the 21,395 in Dimensions were already highly cited, with substantial news and social media attention. For this topic, in contrast to previous studies, there seems to be a high degree of convergence between articles shared in the social web and citation counts, at least in the short term. In particular, articles that are extensively tweeted on the day first indexed are likely to be highly read and relatively highly cited three weeks later. Researchers needing wide scope literature searches (rather than health focused PubMed or medRxiv searches) should start with Google Scholar or Dimensions and can use tweet and Mendeley reader counts as indicators of likely importance.
Submitted 22 April, 2020;
originally announced April 2020.
-
A thematic analysis of highly retweeted early COVID-19 tweets: Consensus, information, dissent, and lockdown life
Authors:
Mike Thelwall,
Saheeda Thelwall
Abstract:
Purpose: Public attitudes towards COVID-19 and social distancing are critical in reducing its spread. It is therefore important to understand public reactions and information dissemination in all major forms, including on social media. This article investigates important issues reflected on Twitter in the early stages of the public reaction to COVID-19. Design/methodology/approach: A thematic analysis of the most retweeted English-language tweets mentioning COVID-19 during March 10-29, 2020. Findings: The main themes identified for the 87 qualifying tweets accounting for 14 million retweets were: lockdown life; attitude towards social restrictions; politics; safety messages; people with COVID-19; support for key workers; work; and COVID-19 facts/news. Research limitations/implications: Twitter played many positive roles, mainly through unofficial tweets. Users shared social distancing information, helped build support for social distancing, criticised government responses, expressed support for key workers, and helped each other cope with social isolation. A few popular tweets not supporting social distancing show that government messages sometimes failed. Practical implications: Public health campaigns in future may consider encouraging grass roots social web activity to support campaign goals. At a methodological level, analysing retweet counts emphasised politics and ignored practical implementation issues. Originality/value: This is the first qualitative analysis of general COVID-19-related retweeting.
Submitted 2 October, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
-
Covid-19 Tweeting in English: Gender Differences
Authors:
Mike Thelwall,
Saheeda Thelwall
Abstract:
At the start of 2020, COVID-19 became the most urgent threat to global public health. Uniquely in recent times, governments have imposed partly voluntary, partly compulsory restrictions on the population to slow the spread of the virus. In this context, public attitudes and behaviors are vitally important for reducing the death rate. Analyzing tweets about the disease may therefore give insights into public reactions that may help guide public information campaigns. This article analyses 3,038,026 English tweets about COVID-19 from March 10 to 23, 2020. It focuses on one relevant aspect of public reaction: gender differences. The results show that females are more likely to tweet about the virus in the context of family, social distancing and healthcare whereas males are more likely to tweet about sports cancellations, the global spread of the virus and political reactions. Thus, women seem to be taking a disproportionate share of the responsibility for directly keeping the population safe. The detailed results may be useful to inform public information announcements and to help understand the spread of the virus. For example, failure to impose sporting bans whilst encouraging social distancing may send mixed messages to males.
Submitted 24 March, 2020;
originally announced March 2020.
-
Does the use of open, non-anonymous peer review in scholarly publishing introduce bias? Evidence from the F1000 post-publication open peer review publishing model
Authors:
Mike Thelwall,
Verena Weigert,
Liz Allen,
Zena Nyakoojo,
Eleanor-Rose Papas
Abstract:
This study examines whether there is any evidence of bias in two areas of common critique of open, non-anonymous peer review, as used in the post-publication peer review system operated by the open-access scholarly publishing platform F1000Research. First, is there evidence of bias where a reviewer based in a specific country assesses the work of an author also based in the same country? Second, are reviewers influenced by being able to see the comments and know the origins of previous reviewers? Methods: Scrutinising the open peer review comments published on F1000Research, we assess the extent of two frequently cited potential influences on reviewers that may be the result of the transparency offered by a fully attributable, open peer review publishing model: the national affiliations of authors and reviewers, and the ability of reviewers to view previously published reviewer reports before submitting their own. The effects of these potential influences were investigated for all first versions of articles published by 8 July 2019 to F1000Research. In 16 out of the 20 countries with the most articles, there was a tendency for reviewers based in the same country to give a more positive review. The difference was statistically significant in one. Only 3 countries had the reverse tendency. Second, there is no evidence of a conformity bias. When reviewers mentioned a previous review in their peer review report, they were not more likely to give the same overall judgement. Although reviewers who had longer to potentially read previously published reviewer reports were slightly less likely to agree with previous reviewer judgements, this could be due to these articles being difficult to judge rather than deliberate non-conformity.
Submitted 8 November, 2019;
originally announced November 2019.
-
Academic collaboration rates and citation associations vary substantially between countries and fields
Authors:
Mike Thelwall,
Nabeil Maflahi
Abstract:
Research collaboration is promoted by governments and research funders, but if the relative prevalence and merits of collaboration vary internationally, different national and disciplinary strategies may be needed to promote it. This study compares the team size and field normalised citation impact of research across all 27 Scopus broad fields in the ten countries with the most journal articles indexed in Scopus 2008-2012. The results show that team size varies substantially by discipline and country, with Japan (4.2) having two thirds more authors per article than the UK (2.5). Solo authorship is rare in China (4%) but common in the UK (27%). Whilst increasing team size associates with higher citation impact in almost all countries and fields, this association is much weaker in China than elsewhere. There are also field differences in the association between citation impact and collaboration. For example, larger team sizes in the Business, Management & Accounting category do not seem to associate with greater research impact, and for China and India, solo authorship associates with higher citation impact. Overall, there are substantial international and field differences in the extent to which researchers collaborate and the extent to which collaboration associates with higher citation impact.
Submitted 2 October, 2019;
originally announced October 2019.
-
Mendeley Reader Counts for US Computer Science Conference Papers and Journal articles
Authors:
Mike Thelwall
Abstract:
Although bibliometrics are normally applied to journal articles when used to support research evaluations, conference papers are at least as important in fast-moving computing-related fields. It is therefore important to assess the relative advantages of citations and altmetrics for computing conference papers to make an informed decision about which, if any, to use. This paper compares Scopus citations with Mendeley reader counts for conference papers and journal articles that were published between 1996 and 2018 in 11 computing fields and had at least one US author. The data showed high correlations between Scopus citation counts and Mendeley reader counts in all fields and most years, but with few Mendeley readers for older conference papers and few Scopus citations for new conference papers and journal articles. The results therefore suggest that Mendeley reader counts have a substantial advantage over citation counts for recently-published conference papers due to their greater speed, but are unsuitable for older conference papers.
Submitted 6 September, 2019;
originally announced September 2019.
-
Large publishing consortia produce higher citation impact research but co-author contributions are hard to evaluate
Authors:
Mike Thelwall
Abstract:
This paper introduces a simple agglomerative clustering method to identify large publishing consortia with at least 20 authors and 80% shared authorship between articles. Based on Scopus journal articles 1996-2018, under these criteria, nearly all (88%) of the large consortia published research with citation impact above the world average, with the exceptions being mainly the newer consortia for which average citation counts are unreliable. On average, consortium research had almost double (1.95) the world average citation impact on the log scale used (Mean Normalised Log Citation Score). At least partial alphabetical author ordering was the norm in most consortia. The 250 largest consortia were for nuclear physics and astronomy around expensive equipment, and for predominantly health-related issues in genomics, medicine, public health, microbiology and neuropsychology. For the health-related issues, except for the first and last few authors, authorship seems to primarily indicate contributions to the shared project infrastructure necessary to gather the raw data. It is impossible for research evaluators to identify the contributions of individual authors in the huge alphabetical consortia of physics and astronomy, and problematic for the middle and end authors of health-related consortia. For small scale evaluations, authorship contribution statements could be used, when available.
Submitted 5 June, 2019;
originally announced June 2019.
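The consortium detection in the large consortia abstract uses two thresholds: at least 20 authors and 80% shared authorship between articles. The abstract does not define "shared authorship" or the linkage rule. This sketch therefore assumes that overlap is measured relative to the smaller author list and that articles are merged transitively (single linkage) with a union-find structure. The paper's exact method may differ. The function names and article IDs are hypothetical.

```python
from itertools import combinations

MIN_AUTHORS = 20   # threshold from the abstract
MIN_SHARED = 0.8   # threshold from the abstract


def shared_fraction(a, b):
    # Assumption: overlap relative to the smaller author list.
    return len(a & b) / min(len(a), len(b))


def find_consortia(articles):
    """articles: dict article_id -> set of author names.

    Returns a list of clusters (sets of article IDs) built by
    single-link agglomerative merging of large-team articles.
    """
    big = {k: v for k, v in articles.items() if len(v) >= MIN_AUTHORS}
    parent = {k: k for k in big}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(big, 2):
        if shared_fraction(big[a], big[b]) >= MIN_SHARED:
            parent[find(a)] = find(b)

    clusters = {}
    for k in big:
        clusters.setdefault(find(k), set()).add(k)
    return list(clusters.values())


articles = {
    "p1": {f"A{i}" for i in range(25)},
    "p2": {f"A{i}" for i in range(2, 27)},     # 23 of 25 shared with p1
    "p3": {f"A{i}" for i in range(100, 125)},  # unrelated team
    "p4": {f"A{i}" for i in range(5)},         # too few authors
}
print(find_consortia(articles))
```

Checking all pairs is quadratic in the number of large-team articles, which is fine for a sketch. A real run over Scopus would probably index articles by author first, so that only pairs sharing at least one author are compared.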
-
Female citation impact superiority 1996-2018 in six out of seven English-speaking nations
Authors:
Mike Thelwall
Abstract:
Efforts to combat continuing gender inequalities in academia need to be informed by evidence about where differences occur. Citations are relevant as potential evidence in appointment and promotion decisions, but it is unclear whether there have been historical gender differences in average citation impact that might explain the current shortfall of senior female academics. This study investigates the evolution of gender differences in citation impact 1996-2018 for six million articles from seven large English-speaking nations: Australia, Canada, Ireland, Jamaica, New Zealand, UK, and the USA. The results show that a small female citation advantage has been the norm over time for all these countries except the USA, where there has been no practical difference. The female citation advantage is largest, and statistically significant in most years, for Australia and the UK. This suggests that any academic bias against citing female-authored research cannot explain current employment inequalities. Nevertheless, comparisons using recent citation data, or avoiding it altogether, during appointments or promotion may disadvantage females in some countries by underestimating the likely impact of their work, especially in the long term.
Submitted 2 October, 2019; v1 submitted 29 April, 2019;
originally announced April 2019.
-
Should Citations be Counted Separately from Each Originating Section
Authors:
Mike Thelwall
Abstract:
Articles are cited for different purposes and differentiating between reasons when counting citations may therefore give finer-grained citation count information. Although identifying and aggregating the individual reasons for each citation may be impractical, recording the number of citations that originate from different article sections might illuminate the general reasons behind a citation count (e.g., 110 citations = 10 Introduction citations + 100 Methods citations). To help investigate whether this could be a practical and universal solution, this article compares 19 million citations with DOIs from six different standard sections in 799,055 PubMed Central open access articles across 21 out of 22 fields. There are apparently non-systematic differences between fields in the most citing sections and the extent to which citations from one section overlap with citations from another, with some degree of overlap in most cases. Thus, at a science-wide level, section headings are partly unreliable indicators of citation context, even if they are more standard within individual fields. They may still be used within fields to help identify individual highly cited articles that have had one type of impact, especially methodological (Methods) or context setting (Introduction), but expert judgement is needed to validate the results.
Submitted 18 March, 2019;
originally announced March 2019.
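The section-based counting proposed in the citing-section abstract, for example "110 citations = 10 Introduction citations + 100 Methods citations", can be sketched as a per-section tally of distinct citing articles. The abstract notes that citations from different sections overlap. When one article cites the same work from two sections, the per-section counts can therefore sum to more than the total citation count. The sketch assumes that in-text mentions have already been extracted as (citing article, cited DOI, section) tuples. The IDs, the DOI and the function names are hypothetical.

```python
from collections import defaultdict


def section_profile(mentions):
    """Count distinct citing articles per section for each cited DOI.

    mentions: iterable of (citing_id, cited_doi, section) tuples.
    Returns {cited_doi: {section: number of distinct citing articles}}.
    """
    citers = defaultdict(lambda: defaultdict(set))
    for citing, cited, section in mentions:
        citers[cited][section].add(citing)
    return {doi: {sec: len(ids) for sec, ids in secs.items()}
            for doi, secs in citers.items()}


def total_citations(mentions, doi):
    """Distinct citing articles for a DOI, ignoring sections."""
    return len({citing for citing, cited, _ in mentions if cited == doi})


mentions = [
    ("a", "10.1234/x", "Introduction"),
    ("a", "10.1234/x", "Methods"),      # same citer, second section
    ("b", "10.1234/x", "Methods"),
    ("c", "10.1234/x", "Discussion"),
]
print(section_profile(mentions)["10.1234/x"])
print(total_citations(mentions, "10.1234/x"))  # 3, not 4
```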
-
The rhetorical structure of science? A multidisciplinary analysis of article headings
Authors:
Mike Thelwall
Abstract:
An effective structure helps an article to convey its core message. The optimal structure depends on the information to be conveyed and the expectations of the audience. In the current increasingly interdisciplinary era, structural norms can be confusing to the authors, reviewers and audiences of scientific articles. Despite this, no prior study has attempted to assess variations in the structure of academic papers across all disciplines. This article reports on the headings commonly used by over 1 million research articles from the PubMed Central Open Access collection, spanning 22 broad categories covering all academia and 172 out of 176 narrow categories. The results suggest that no headings are close to ubiquitous in any broad field and that there are substantial differences in the extent to which most headings are used. In the humanities, headings may be avoided altogether. Researchers should therefore be aware of unfamiliar structures that are nevertheless legitimate when reading, writing and reviewing articles.
Submitted 11 March, 2019;
originally announced March 2019.
-
Can Google Scholar and Mendeley help to assess the scholarly impacts of dissertations?
Authors:
Kayvan Kousha,
Mike Thelwall
Abstract:
Dissertations can be the single most important scholarly outputs of junior researchers. Whilst sets of journal articles are often evaluated with the help of citation counts from the Web of Science or Scopus, these do not index dissertations and so their impact is hard to assess. In response, this article introduces a new multistage method to extract Google Scholar citation counts for large collections of dissertations from repositories indexed by Google. The method was used to extract Google Scholar citation counts for 77,884 American doctoral dissertations from 2013-2017 via ProQuest, with a precision of over 95%. Some ProQuest dissertations that were dual indexed with other repositories could not be retrieved with ProQuest-specific searches but could be found with Google Scholar searches of the other repositories. The Google Scholar citation counts were then compared with Mendeley reader counts, a known source of scholarly-like impact data. A fifth of the dissertations had at least one citation recorded in Google Scholar and slightly fewer had at least one Mendeley reader. Based on numerical comparisons, the Mendeley reader counts seem to be more useful for impact assessment purposes for dissertations that are less than two years old, whilst Google Scholar citations are more useful for older dissertations, especially in social sciences, arts and humanities. Google Scholar citation counts may reflect a more scholarly type of impact than that of Mendeley reader counts because dissertations attract a substantial minority of their citations from other dissertations. In summary, the new method now makes it possible for research funders, institutions and others to systematically evaluate the impact of dissertations, although additional Google Scholar queries for other online repositories are needed to ensure comprehensive coverage.
Submitted 23 February, 2019;
originally announced February 2019.