-
Detection of metadata manipulations: Finding sneaked references in the scholarly literature
Authors:
Lonni Besançon,
Guillaume Cabanac,
Cyril Labbé,
Alexander Magazinov,
Jules di Scala,
Dominika Tkaczyk,
Kathryn Weber-Boer
Abstract:
We report evidence of a new set of sneaked references discovered in the scientific literature. Sneaked references are references registered in the metadata of publications without being listed in the reference section or in the full text of the actual publications where they ought to be found. We document here 80,205 references sneaked into the metadata of the International Journal of Innovative Science and Research Technology (IJISRT). These sneaked references are registered with Crossref and all cite, and thus benefit, this same journal. Using this dataset, we evaluate three different methods to automatically identify sneaked references. These methods compare reference lists registered with Crossref against the full text or the reference lists extracted from PDF files. In addition, we report attempts to scale the search for sneaked references to the scholarly literature.
Submitted 7 January, 2025;
originally announced January 2025.
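The comparison described in this abstract (references registered with Crossref versus references extracted from the publication itself) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes both reference lists have already been reduced to DOI strings, and the names `normalize_doi` and `find_sneaked` are hypothetical helpers introduced only for illustration.

```python
# Hypothetical sketch: a reference is a "sneaked" candidate when it is
# registered in the metadata but absent from the list extracted from the PDF.
# Assumption: both inputs are plain lists of DOI strings.

DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a leading resolver prefix, if any."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def find_sneaked(registered: list[str], extracted: list[str]) -> list[str]:
    """Return normalized DOIs present in `registered` but not in `extracted`."""
    extracted_set = {normalize_doi(d) for d in extracted}
    registered_set = {normalize_doi(d) for d in registered}
    return sorted(registered_set - extracted_set)


if __name__ == "__main__":
    # Made-up DOIs, for illustration only.
    registered = ["10.1000/A", "https://doi.org/10.1000/b", "10.1000/c"]
    extracted = ["doi:10.1000/a", "10.1000/B"]
    print(find_sneaked(registered, extracted))
```

A set difference on DOIs only covers references that carry a DOI on both sides; references without one would need some form of fuzzy matching on the citation text, which this sketch does not attempt.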
-
Sneaked references: Cooked reference metadata inflate citation counts
Authors:
Lonni Besançon,
Guillaume Cabanac,
Cyril Labbé,
Alexander Magazinov
Abstract:
We report evidence of an undocumented method to manipulate citation counts involving 'sneaked' references. Sneaked references are registered as metadata for scientific articles in which they do not appear. This manipulation exploits trusted relationships between various actors: publishers, the Crossref metadata registration agency, digital libraries, and bibliometric platforms. By collecting metadata from various sources, we show that extra undue references are actually sneaked in at Digital Object Identifier (DOI) registration time, resulting in artificially inflated citation counts. As a case study focusing on three journals from a given publisher, we found that at least 9% of references (5,978/65,836) were sneaked, mainly benefiting two authors. Despite not existing in the articles, these sneaked references exist in metadata registries and inappropriately propagate to bibliometric dashboards. Furthermore, we discovered 'lost' references: the studied bibliometric platform failed to index at least 56% (36,939/65,836) of the references listed in the HTML version of the publications. The extent of sneaked and lost references in the global literature remains unknown and requires further investigation. Bibliometric platforms producing citation counts should identify, quantify, and correct these flaws to provide accurate data to their patrons and prevent further citation gaming.
Submitted 3 October, 2023;
originally announced October 2023.
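The two shares quoted in this abstract can be recomputed from the counts it gives. The short sketch below only redoes that arithmetic; the helper name `share_percent` is hypothetical.

```python
# Recompute the percentages from the counts stated in the abstract.

def share_percent(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


TOTAL_REFERENCES = 65_836   # denominator used for both shares in the abstract
SNEAKED_REFERENCES = 5_978  # references present in metadata only
LOST_REFERENCES = 36_939    # references the platform failed to index

sneaked_share = share_percent(SNEAKED_REFERENCES, TOTAL_REFERENCES)
lost_share = share_percent(LOST_REFERENCES, TOTAL_REFERENCES)

if __name__ == "__main__":
    print(f"sneaked: {sneaked_share:.2f}%")
    print(f"lost:    {lost_share:.2f}%")
```

Both values land slightly above the rounded figures in the abstract, which is consistent with its "at least 9%" and "at least 56%" wording.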
-
The 'Problematic Paper Screener' automatically selects suspect publications for post-publication (re)assessment
Authors:
Guillaume Cabanac,
Cyril Labbé,
Alexander Magazinov
Abstract:
Post-publication assessment remains necessary to check for erroneous or fraudulent scientific publications. We present an online platform, the 'Problematic Paper Screener' (https://www.irit.fr/~Guillaume.Cabanac/problematic-paper-screener), that leverages both automatic machine detection and human assessment to identify and flag already published problematic articles. We provide a new, effective tool to curate the scientific literature.
Submitted 7 October, 2022;
originally announced October 2022.
-
Improper legitimization of hijacked journals through citations
Authors:
Anna Abalkina,
Guillaume Cabanac,
Cyril Labbé,
Alexander Magazinov
Abstract:
The goal is to study the prevalence in the academic literature of citejacked papers: papers in authentic scientific journals that cite hijacked journals. A Citejacked detector was designed as a part of the Problematic Paper Screener (https://www.irit.fr/~Guillaume.Cabanac/problematic-paper-screener/citejacked) to trace whether references to articles originating from hijacked journals infiltrate scientific communication. A full-text search was performed between November 2021 and January 2022 in the Dimensions database using the name of 1 of the 12 hijacked journals. The analysis of the bibliographies of these articles revealed that 828 of them cite unreliable articles from hijacked journals. Between 1 January 2021 and 31 January 2022, an average of 2 citejacked articles were published daily in established journals. Given the limited number of titles included in this study, the phenomenon may be more widespread and has not yet been systematically studied.
Submitted 10 September, 2022;
originally announced September 2022.
-
Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals
Authors:
Guillaume Cabanac,
Cyril Labbé,
Alexander Magazinov
Abstract:
Probabilistic text generators have been used to produce fake scientific papers for more than a decade. Such nonsensical papers are easily detected by both humans and machines. Now more complex AI-powered generation techniques produce texts indistinguishable from those written by humans, and the generation of scientific texts from a few keywords has been documented. Our study introduces the concept of tortured phrases: unexpected weird phrases in lieu of established ones, such as 'counterfeit consciousness' instead of 'artificial intelligence.' We combed the literature for tortured phrases and studied one reputable journal where these were concentrated en masse. Hypothesising the use of advanced language models, we ran a detector on the abstracts of recent articles of this journal and on several control sets. The pairwise comparisons reveal a concentration of abstracts flagged as 'synthetic' in the journal. We also highlight irregularities in its operation, such as abrupt changes in editorial timelines. We substantiate our call for investigation by analysing several individual dubious articles, stressing questionable features: tortured writing style, citation of non-existent literature, and unacknowledged image reuse. Surprisingly, some websites offer to rewrite texts for free, generating gobbledegook full of tortured phrases. We believe some authors used rewritten texts to pad their manuscripts. We wish to raise awareness of publications containing such questionable AI-generated or rewritten texts that passed (poor) peer review. Deception with synthetic texts threatens the integrity of the scientific literature.
Submitted 12 July, 2021;
originally announced July 2021.
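Combing text for tortured phrases can be illustrated with a simple lookup. The sketch below is an assumption-laden illustration, not the detector used in the study: the function `flag_tortured_phrases` is hypothetical, and the phrase table contains only the single pair given in the abstract; a real list would have to come from a curated source.

```python
import re

# Only pair stated in the abstract: tortured phrase -> established phrase.
# Any further entries would be assumptions and are deliberately left out.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
}


def flag_tortured_phrases(text: str, phrases: dict[str, str] = TORTURED_PHRASES):
    """Return (tortured, expected) pairs whose tortured form occurs in `text`.

    Matching is case-insensitive and tolerates any whitespace (including
    line breaks) between the words of a phrase.
    """
    hits = []
    for tortured, expected in phrases.items():
        words = (re.escape(word) for word in tortured.split())
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append((tortured, expected))
    return hits


if __name__ == "__main__":
    sample = "Recent advances in Counterfeit\nconsciousness have been reported."
    print(flag_tortured_phrases(sample))
```

A plain lookup like this can only find phrases already on the list; it says nothing about whether a text was machine-generated, which the abstract addresses with a separate detector.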