A Fresh Look at FAIR for Research Software

Authors: Daniel S. Katz, Morane Gruenpeter, Tom Honeyman, Lorraine Hwang, Mark D. Wilkinson, Vanessa Sochat, Hartwig Anzt, Carole Goble, for FAIR4RS Subgroup 1

Abstract: This document captures the discussion and deliberation of the FAIR for Research Software (FAIR4RS) subgroup that took a fresh look at the applicability of the FAIR Guiding Principles for scientific data management and stewardship for research software. We discuss the vision of research software as ideally reproducible, open, usable, recognized, sustained and robust, and then review both the characteristic and practiced differences of research software and data. This vision and understanding of initial conditions serves as a backdrop for an attempt at translating and interpreting the guiding principles to more fully align with research software. We have found that many of the principles remained relatively intact as written, as long as considerable interpretation was provided. This was particularly the case for the "Findable" and "Accessible" foundational principles. We found that "Interoperability" and "Reusability" are particularly prone to a broad and sometimes opposing set of interpretations as written. We propose two new principles modeled on existing ones, and provide modified guiding text for these principles to help clarify our final interpretation. A series of gaps in translation were captured during this process, and these remain to be addressed. We finish with a consideration of where these translated principles fall short of the vision laid out in the opening.

Submitted 9 February, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

arXiv:1902.11162 [pdf]

The FAIR Funder pilot programme to make it easy for funders to require and for grantees to produce FAIR Data

Authors: P. Wittenburg, H. Pergl Sustkova, A. Montesanti, S. M. Bloemers, S. H. de Waard, M. A. Musen, J. B. Graybeal, K. M. Hettne, A. Jacobsen, R. Pergl, R. W. W. Hooft, C. Staiger, C. W. G. van Gelder, S. L. Knijnenburg, A. C. van Arkel, B. Meerman, M. D. Wilkinson, S-A Sansone, P. Rocca-Serra, P. McQuilton, A. N. Gonzalez-Beltran, G. J. C. Aben, P. Henning, S. Alencar, C. Ribeiro, et al. (35 additional authors not shown)

Abstract: There is a growing acknowledgement in the scientific community of the importance of making experimental data machine findable, accessible, interoperable, and reusable (FAIR). Recognizing that high quality metadata are essential to make datasets FAIR, members of the GO FAIR Initiative and the Research Data Alliance (RDA) have initiated a series of workshops to encourage the creation of Metadata for Machines (M4M), enabling any self-identified stakeholder to define and promote the reuse of standardized, comprehensive machine-actionable metadata. The funders of scientific research recognize that they have an important role to play in ensuring that experimental results are FAIR, and that high quality metadata and careful planning for FAIR data stewardship are central to these goals. We describe the outcome of a recent M4M workshop that has led to a pilot programme involving two national science funders, the Health Research Board of Ireland (HRB) and the Netherlands Organisation for Health Research and Development (ZonMW). These funding organizations will explore new technologies to define at the time that a request for proposals is issued the minimal set of machine-actionable metadata that they would like investigators to use to annotate their datasets, to enable investigators to create such metadata to help make their data FAIR, and to develop data-stewardship plans that ensure that experimental data will be managed appropriately abiding by the FAIR principles. The FAIR Funders design envisions a data-management workflow having seven essential stages, where solution providers are openly invited to participate. The initial pilot programme will launch using existing computer-based tools of those who attended the M4M Workshop.

Submitted 6 March, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

Comments: This is a pre-print of the FAIR Funders pilot, an outcome of the first Metadata for Machines workshop, see: https://www.go-fair.org/resources/go-fair-workshop-series/metadata-for-machines-workshops/. Corresponding author: E. A Schultes, ORCID 0000-0001-8888-635X

arXiv:1502.06025 [pdf]

OntoLoki: an automatic, instance-based method for the evaluation of biological ontologies on the Semantic Web

Authors: Benjamin M. Good, Gavin Ha, Chi K. Ho, Mark D. Wilkinson

Abstract: The delineation of logical definitions for each class in an ontology and the consistent application of these definitions to the assignment of instances to classes are important criteria for ontology evaluation. If ontologies are specified with property-based restrictions on class membership, then such consistency can be checked automatically. If no such logical restrictions are applied, as is the case with many biological ontologies, there are currently no automated methods for measuring the semantic consistency of instance assignment on an ontology-wide scale, nor for inferring the patterns of properties that might define a particular class. We constructed a program that takes as its input an OWL/RDF knowledge base containing an ontology, instances associated with each of the classes in the ontology, and properties of those instances. For each class, it outputs: 1) a rule for determining class membership based on the properties of the instances and 2) a quantitative score for the class that reflects the ability of the identified rule to correctly predict class membership for the instances in the knowledge base. We evaluated this program using both artificial knowledge bases of known quality and real, widely used ontologies. The results indicate that the suggested method can be used to conduct objective, automatic, data-driven evaluations of biological ontologies without formal class definitions in regards to the property-based consistency of instance-assignment. This inductive method complements existing, purely deductive approaches to automatic consistency checking, offering not just the potential to help in the ontology engineering process but also in the knowledge discovery process.

Submitted 20 February, 2015; originally announced February 2015.

ACM Class: I.2.4

arXiv:1407.0165 [pdf, other]

Automatic annotation of bioinformatics workflows with biomedical ontologies

Authors: Beatriz García-Jiménez, Mark D. Wilkinson

Abstract: Legacy scientific workflows, and the services within them, often present scarce and unstructured (i.e. textual) descriptions. This makes it difficult to find, share and reuse them, thus dramatically reducing their value to the community. This paper presents an approach to annotating workflows and their subcomponents with ontology terms, in an attempt to describe these artifacts in a structured way. Despite a dearth of even textual descriptions, we automatically annotated 530 myExperiment bioinformatics-related workflows, including more than 2600 workflow-associated services, with relevant ontological terms. Quantitative evaluation of the Information Content of these terms suggests that, in cases where annotation was possible at all, the annotation quality was comparable to manually curated bioinformatics resources.

Submitted 1 July, 2014; originally announced July 2014.

Comments: 6th International Symposium on Leveraging Applications (ISoLA 2014 conference), 15 pages, 4 figures

arXiv:1305.4455 [pdf, other]

SHARE: A Web Service Based Framework for Distributed Querying and Reasoning on the Semantic Web

Authors: Ben P Vandervalk, E Luke McCarthy, Mark D Wilkinson

Abstract: Here we describe the SHARE system, a web service based framework for distributed querying and reasoning on the semantic web. The main innovations of SHARE are: (1) the extension of a SPARQL query engine to perform on-demand data retrieval from web services, and (2) the extension of an OWL reasoner to test property restrictions by means of web service invocations. In addition to enabling queries across distributed datasets, the system allows for a target dataset that is significantly larger than is possible under current, centralized approaches. Although the architecture is equally applicable to all types of data, the SHARE system targets bioinformatics, due to the large number of interoperable web services that are already available in this area. SHARE is built entirely on semantic web standards, and is the successor of the BioMOBY project.

Submitted 20 May, 2013; originally announced May 2013.

Comments: Third Asian Semantic Web Conference, ASWC2008 Bangkok, Thailand December 2008, Workshops Proceedings (NEFORS2008), pp69-78

Showing 1–5 of 5 results for author: Wilkinson, M D