-
Open and Sustainable AI: challenges, opportunities and the road ahead in the life sciences
Authors:
Gavin Farrell,
Eleni Adamidi,
Rafael Andrade Buono,
Mihail Anton,
Omar Abdelghani Attafi,
Salvador Capella Gutierrez,
Emidio Capriotti,
Leyla Jael Castro,
Davide Cirillo,
Lisa Crossman,
Christophe Dessimoz,
Alexandros Dimopoulos,
Raul Fernandez-Diaz,
Styliani-Christina Fragkouli,
Carole Goble,
Wei Gu,
John M. Hancock,
Alireza Khanteymoori,
Tom Lenaerts,
Fabio G. Liberante,
Peter Maccallum,
Alexander Miguel Monzon,
Magnus Palmblad,
Lucy Poveda,
Ovidiu Radulescu, et al. (5 additional authors not shown)
Abstract:
Artificial intelligence (AI) has recently seen transformative breakthroughs in the life sciences, expanding researchers' ability to interpret biological information at unprecedented scale, with novel applications and advances being made almost daily. In order to maximise return on the growing investments in AI-based life science research and accelerate this progress, it has become urgent to address the exacerbation of long-standing research challenges arising from the rapid adoption of AI methods. We review the growing erosion of trust in AI research outputs, driven by poor reusability and reproducibility, and highlight the consequent impact on environmental sustainability. Furthermore, we discuss the fragmented components of the AI ecosystem and the lack of guiding pathways to best support Open and Sustainable AI (OSAI) model development. In response, this perspective introduces a practical set of OSAI recommendations directly mapped to over 300 components of the AI ecosystem. Our work connects researchers with relevant AI resources, facilitating the implementation of sustainable, reusable and transparent AI. Built upon life science community consensus and aligned to existing efforts, the outputs of this perspective are designed to aid the future development of policy and structured pathways for guiding AI implementation.
Submitted 22 May, 2025;
originally announced May 2025.
-
DOME Registry: Implementing community-wide recommendations for reporting supervised machine learning in biology
Authors:
Omar Abdelghani Attafi,
Damiano Clementel,
Konstantinos Kyritsis,
Emidio Capriotti,
Gavin Farrell,
Styliani-Christina Fragkouli,
Leyla Jael Castro,
András Hatos,
Tom Lenaerts,
Stanislav Mazurenko,
Soroush Mozaffari,
Franco Pradelli,
Patrick Ruch,
Castrense Savojardo,
Paola Turina,
Federico Zambelli,
Damiano Piovesan,
Alexander Miguel Monzon,
Fotis Psomopoulos,
Silvio C. E. Tosatto
Abstract:
Supervised machine learning (ML) is used extensively in biology and deserves closer scrutiny. The DOME recommendations aim to enhance the validation and reproducibility of ML research by establishing standards for key aspects such as data handling and processing, optimization, evaluation, and model interpretability. By providing a structured set of questions, the recommendations help to ensure that key details are reported transparently. Here, we introduce the DOME Registry (URL: registry.dome-ml.org), a database that allows scientists to manage and access comprehensive DOME-related information on published ML studies. The registry uses external resources such as ORCID, APICURON and the Data Stewardship Wizard to streamline the annotation process and ensure comprehensive documentation. By assigning unique identifiers and DOME scores to publications, the registry fosters a standardized evaluation of ML methods. Future plans include growing the registry through community curation, improving the DOME score definition and encouraging publishers to adopt DOME standards, thereby promoting transparency and reproducibility of ML in the life sciences.
Submitted 16 August, 2024; v1 submitted 14 August, 2024;
originally announced August 2024.
-
Synthetic data: How could it be used for infectious disease research?
Authors:
Styliani-Christina Fragkouli,
Dhwani Solanki,
Leyla J Castro,
Fotis E Psomopoulos,
Núria Queralt-Rosinach,
Davide Cirillo,
Lisa C Crossman
Abstract:
Over the last three to five years, it has become possible to generate synthetic data with machine learning for healthcare-related uses. However, concerns have been raised about the potential harms of artificial dataset generation. These include the potential misuse of generative artificial intelligence (AI) in fields such as cybercrime, the use of deepfakes and fake news to deceive or manipulate, and the displacement of human jobs across various market sectors.
Here, we consider both current and future positive advances and possibilities with synthetic datasets. Synthetic data offers significant benefits, particularly for data privacy, for research, and for balancing datasets and reducing bias in machine learning models. Generative AI (GenAI) is a class of artificial intelligence capable of creating text, images, video or other data using generative models. The recent explosion of interest in GenAI was heralded by the invention and rapid adoption of large language models (LLMs). These computational models achieve general-purpose language generation and other natural language processing tasks, and are based on transformer architectures, which marked an evolutionary leap from previous neural network architectures.
With the advent of improved GenAI techniques and their widespread use, this is surely the time to consider how synthetic data can be used to advance infectious disease research. In this commentary we aim to provide an overview of the current and future position of synthetic data in infectious disease research.
Submitted 3 July, 2024;
originally announced July 2024.
-
A mathematical model for fibrous dysplasia: The role of the flow of mutant cells
Authors:
Mariia Soloviova,
Juan Carlos Beltran Vargas,
Luis Fernandez de Castro,
Juan Belmonte-Beitia,
Víctor M. Pérez-García,
Magdalena Caballero
Abstract:
Fibrous dysplasia (FD) is a mosaic non-inheritable genetic disorder of the skeleton in which normal bone is replaced by structurally unsound fibro-osseous tissue. There is no curative treatment for FD, partly because its pathophysiology is not yet fully known. We present a simple mathematical model of the disease incorporating its basic known biology, to gain insight into the dynamics of the bone-cell populations involved and shed light on its pathophysiology. Our mathematical models describe the time evolution of several interacting populations of bone cells, averaged over a bone volume large enough to yield consistent results. We study the basic properties of the model analytically: we examine the existence and stability of steady states and perform a sensitivity analysis on the model parameters. Numerical simulations provide findings in agreement with the analytical results. We discuss how the model dynamics match known facts about the disease, and how some open questions could be addressed using the model.
Submitted 12 February, 2024;
originally announced February 2024.
-
Congruity of genomic and epidemiological data in modeling of local cholera outbreaks
Authors:
Mateusz Wilinski,
Lauren Castro,
Jeffrey Keithley,
Carrie Manore,
Josefina Campos,
Ethan Romero-Severson,
Daryl Domman,
Andrey Y. Lokhov
Abstract:
Cholera continues to be a global health threat. Understanding how cholera spreads between locations is fundamental to the rational, evidence-based design of intervention and control efforts. Traditionally, cholera transmission models have used case count data. More recently, whole genome sequence data have been used to describe cholera transmission qualitatively. Integrating these data streams may provide much more accurate models of cholera spread; however, no systematic analyses have so far compared traditional case-count models with phylodynamic models based on genomic data for cholera transmission. Here, we use high-fidelity case count and whole genome sequencing data from the 1991-1998 cholera epidemic in Argentina to directly compare the epidemiological model parameters estimated from these two data sources. We find that phylodynamic methods applied to cholera genomic data provide estimates comparable to, and in line with, those of established methods. Our methodology represents a critical step in building a framework for integrating case-count and genomic data sources in the epidemiology of cholera and other bacterial pathogens.
Submitted 30 March, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.