-
Do Authors Deposit on Time? Tracking Open Access Policy Compliance
Authors:
Drahomira Herrmannova,
Nancy Pontika,
Petr Knoth
Abstract:
Recent years have seen fast growth in the number of policies mandating Open Access (OA) to research outputs. We conduct a large-scale analysis of over 800 thousand papers from repositories around the world published over a period of 5 years to investigate: a) if the time lag between the date of publication and date of deposit in a repository can be effectively tracked across thousands of repositories globally, and b) if introducing deposit deadlines is associated with a reduction of time from acceptance to public availability of research outputs. We show that after the introduction of the UK REF 2021 OA policy, this time lag has decreased significantly in the UK and that the policy introduction might have accelerated the UK's move towards immediate OA compared to other countries. This supports the argument for the inclusion of a time-limited deposit requirement in OA policies.
Submitted 7 June, 2019;
originally announced June 2019.
-
Modelling student online behaviour in a virtual learning environment
Authors:
Martin Hlosta,
Drahomira Herrmannova,
Lucie Vachova,
Jakub Kuzilek,
Zdenek Zdrahal,
Annika Wolff
Abstract:
In recent years, distance education has enjoyed a major boom. Much work at The Open University (OU) has focused on improving retention rates in these modules by providing timely support to students who are at risk of failing the module. In this paper we explore methods for analysing student activity in an online virtual learning environment (VLE) -- General Unary Hypotheses Automaton (GUHA) and Markov chain-based analysis -- and we explain how this analysis can be relevant for module tutors and other student support staff. We show that both methods are valid approaches to modelling student activities. An advantage of the Markov chain-based approach lies in its graphical output and in the possibility of modelling time dependencies of the student activities.
Submitted 9 November, 2018;
originally announced November 2018.
-
Unsupervised Identification of Study Descriptors in Toxicology Research: An Experimental Study
Authors:
Drahomira Herrmannova,
Steven R. Young,
Robert M. Patton,
Christopher G. Stahl,
Nicole C. Kleinstreuer,
Mary S. Wolfe
Abstract:
Identifying and extracting data elements such as study descriptors in publication full texts is a critical yet manual and labor-intensive step required in a number of tasks. In this paper we address the question of identifying data elements in an unsupervised manner. Specifically, provided a set of criteria describing specific study parameters, such as species, route of administration, and dosing regimen, we develop an unsupervised approach to identify text segments (sentences) relevant to the criteria. A binary classifier trained to identify publications that met the criteria performs better when trained on the candidate sentences than when trained on sentences randomly picked from the text, supporting the intuition that our method is able to accurately identify study descriptors.
Submitted 3 November, 2018;
originally announced November 2018.
-
Do Citations and Readership Identify Seminal Publications?
Authors:
Drahomira Herrmannova,
Robert M. Patton,
Petr Knoth,
Christopher G. Stahl
Abstract:
In this paper, we show that citation counts work better than a random baseline (by a margin of 10%) in distinguishing excellent research, while Mendeley reader counts don't work better than the baseline. Specifically, we study the potential of these metrics for distinguishing publications that caused a change in a research field from those that have not. The experiment has been conducted on a new dataset for bibliometric research called TrueImpactDataset. TrueImpactDataset is a collection of research publications of two types -- research papers which are considered seminal works in their area and papers which provide a literature review of a research area. We provide overview statistics of the dataset and propose to use it for validating research evaluation metrics. Using the dataset, we conduct a set of experiments to study how citation and reader counts perform in distinguishing these publication types, following the intuition that causing a change in a field signifies research contribution. We show that citation counts help in distinguishing research that strongly influenced later developments from works that predominantly discuss the current state of the art with a degree of accuracy (63%, i.e. 10% over the random baseline). In all setups, Mendeley reader counts perform worse than a random baseline.
Submitted 13 February, 2018;
originally announced February 2018.
-
Simple Yet Effective Methods for Large-Scale Scholarly Publication Ranking
Authors:
Drahomira Herrmannova,
Petr Knoth
Abstract:
With the growing amount of published research, automatic evaluation of scholarly publications is becoming an important task. In this paper we address this problem and present a simple and transparent approach for evaluating the importance of scholarly publications. Our method has been ranked among the top performers in the WSDM Cup 2016 Challenge. The first part of this paper describes our method. In the second part we present potential improvements to the method and analyse the evaluation setup which was provided during the challenge. Finally, we discuss future challenges in automatic evaluation of papers, including the use of full-text-based evaluation methods.
Submitted 16 November, 2016;
originally announced November 2016.
-
Semantometrics: Towards Fulltext-based Research Evaluation
Authors:
Drahomira Herrmannova,
Petr Knoth
Abstract:
Over the recent years, there has been a growing interest in developing new research evaluation methods that could go beyond the traditional citation-based metrics. This interest is motivated on one side by the wider availability or even emergence of new information evidencing research performance, such as article downloads, views and Twitter mentions, and on the other side by the continued frustrations and problems surrounding the application of purely citation-based metrics to evaluate research performance in practice. Semantometrics are a new class of research evaluation metrics which build on the premise that full-text is needed to assess the value of a publication. This paper reports on the analysis carried out with the aim to investigate the properties of the semantometric contribution measure, which uses semantic similarity of publications to estimate research contribution, and provides a comparative study of the contribution measure with traditional bibliometric measures based on citation counting.
Submitted 16 November, 2016; v1 submitted 13 May, 2016;
originally announced May 2016.