-
Auto-ADMET: An Effective and Interpretable AutoML Method for Chemical ADMET Property Prediction
Authors:
Alex G. C. de Sá,
David B. Ascher
Abstract:
Machine learning (ML) has played an important role in drug discovery in recent years by providing (pre-)screening tools for prioritising chemical compounds to pass through wet lab experiments. One of the main ML tasks in drug discovery is to build quantitative structure-activity relationship (QSAR) models, associating the molecular structure of chemical compounds with an activity or property. These properties -- including absorption, distribution, metabolism, excretion and toxicity (ADMET) -- are essential to model compound behaviour, activity and interactions in the organism. Although several methods exist, the majority of them do not provide appropriate model personalisation, leading to bias and a lack of generalisation to new data, since the chemical space usually shifts from application to application. This leads to low predictive performance when the model is tested on completely new data. The area of Automated Machine Learning (AutoML) emerged aiming to solve this issue, outputting ML algorithms tailored to the data at hand. Although it addresses an important task, AutoML has rarely been used in practice to assist cheminformatics and computational chemistry researchers, with just a few works related to the field. To address these challenges, this work introduces Auto-ADMET, an interpretable evolutionary-based AutoML method for chemical ADMET property prediction. Auto-ADMET employs a Grammar-based Genetic Programming (GGP) method with a Bayesian Network model to achieve comparable or better predictive performance against three alternative methods -- a standard GGP method, pkCSM and an XGBoost model -- on 12 benchmark chemical ADMET property prediction datasets. The use of a Bayesian Network model in Auto-ADMET's evolutionary process assisted in both shaping the search procedure and interpreting the causes of its AutoML performance.
Submitted 22 February, 2025;
originally announced February 2025.
-
Towards Evolutionary-based Automated Machine Learning for Small Molecule Pharmacokinetic Prediction
Authors:
Alex G. C. de Sá,
David B. Ascher
Abstract:
Machine learning (ML) is revolutionising drug discovery by expediting the prediction of small molecule properties essential for developing new drugs. These properties -- including absorption, distribution, metabolism and excretion (ADME) -- are crucial in the early stages of drug development since they provide an understanding of the course of the drug in the organism, i.e., the drug's pharmacokinetics. However, existing methods lack personalisation and rely on manually crafted ML algorithms or pipelines, which can introduce inefficiencies and biases into the process. To address these challenges, we propose a novel evolutionary-based automated ML method (AutoML) specifically designed for predicting small molecule properties, with a particular focus on pharmacokinetics. Leveraging the advantages of grammar-based genetic programming, our AutoML method streamlines the process by automatically selecting algorithms and designing predictive pipelines tailored to the particular characteristics of input molecular data. Results demonstrate AutoML's effectiveness in selecting diverse ML algorithms, resulting in comparable or even improved predictive performances compared to conventional approaches. By offering personalised ML-driven pipelines, our method promises to enhance small molecule research in drug discovery, providing researchers with a valuable tool for accelerating the development of novel therapeutic drugs.
Submitted 1 August, 2024;
originally announced August 2024.
-
Explainable Machine Learning for ICU Readmission Prediction
Authors:
Alex G. C. de Sá,
Daniel Gould,
Anna Fedyukova,
Mitchell Nicholas,
Lucy Dockrell,
Calvin Fletcher,
David Pilcher,
Daniel Capurro,
David B. Ascher,
Khaled El-Khawas,
Douglas E. V. Pires
Abstract:
The intensive care unit (ICU) comprises a complex hospital environment, where decisions made by clinicians have a high level of risk for the patients' lives. A comprehensive care pathway must then be followed to reduce p complications. Uncertain, competing and unplanned aspects within this environment increase the difficulty in uniformly implementing the care pathway. Readmission contributes to this pathway's difficulty, occurring when patients are admitted again to the ICU in a short timeframe, resulting in high mortality rates and high resource utilisation. Several works have tried to predict readmission through patients' medical information. Although they have some success in predicting readmission, those works do not properly assess, characterise and understand readmission prediction. This work proposes a standardised and explainable machine learning pipeline to model patient readmission on a multicentric database (i.e., the eICU cohort with 166,355 patients, 200,859 admissions and 6,021 readmissions) while validating it on monocentric (i.e., the MIMIC IV cohort with 382,278 patients, 523,740 admissions and 5,984 readmissions) and multicentric settings. Our machine learning pipeline achieved a predictive performance, in terms of the area under the receiver operating characteristic curve (AUC), of up to 0.7 with a Random Forest classification model, yielding overall good calibration and consistency on validation sets. From explanations provided by the constructed models, we could also derive a set of insightful conclusions, primarily on variables related to vital signs and blood tests (e.g., albumin, blood urea nitrogen and hemoglobin levels), demographics (e.g., age, and admission height and weight), and ICU-associated variables (e.g., unit type). These insights provide an invaluable source of information during clinicians' decision-making while discharging ICU patients.
Submitted 13 September, 2024; v1 submitted 24 September, 2023;
originally announced September 2023.
-
AI driven B-cell Immunotherapy Design
Authors:
Bruna Moreira da Silva,
David B. Ascher,
Nicholas Geard,
Douglas E. V. Pires
Abstract:
Antibodies, a prominent class of approved biologics, play a crucial role in detecting foreign antigens. The effectiveness of antigen neutralisation and elimination hinges upon the strength, sensitivity, and specificity of the paratope-epitope interaction, which demands resource-intensive experimental techniques for characterisation. In recent years, artificial intelligence and machine learning methods have made significant strides, revolutionising the prediction of protein structures and their complexes. The past decade has also witnessed the evolution of computational approaches aiming to support immunotherapy design. This review focuses on the progress of machine learning-based tools and their frameworks in the domain of B-cell immunotherapy design, encompassing linear and conformational epitope prediction, paratope prediction, and antibody design. We mapped the most commonly used data sources, evaluation metrics, and method availability and thoroughly assessed their significance and limitations, discussing the main challenges ahead.
Submitted 3 September, 2023;
originally announced September 2023.
-
Modeling Adaptive Self-healing Systems
Authors:
Habtom Kahsay Gidey,
Diego Marmsoler,
Dominik Ascher
Abstract:
Motivation: Smart grid design requires energy distribution operations to be adaptable to abnormality. This requirement obliges distribution system operators (DSOs) to optimize restoration to normal operational states dynamically. However, these design challenges demand collaborative research efforts on sophisticated modeling and simulation approaches. Approach: In the ESOSEG research project, analyzing the smart grid domain as a software-intensive system, we employed a dynamic architecture approach, particularly the FOCUS theory, to model and assure the domain's self-healing requirements. Although some works specify various self-healing systems, to the best of our knowledge, this is the first work to use the approach in smart grids to enable a formal specification and verification of self-healing properties. Results: To support the modeling and verification process, we developed tool support with Eclipse Modeling Framework (EMF), Xtext, and other languages in the EMF ecosystem. The tool includes a grammar or a meta-model of the DSL, an interface to enable textual and graphical modeling of architectural patterns, and a code transformer engine for verification. Furthermore, we evaluated the modeling and verification features of the tool support with an e-Car charging scenario for modeling adaptive self-healing properties. Future work: As an outlook, future work could include investigation of comprehensive case studies. These, for instance, could be further particular adaptability scenarios addressing challenges in DSOs. Another interesting aspect could be the evaluation of the modeling approach by investigating its use with engineers involved in a smart grid design. Next, the evaluation could be followed with abstractions of the verification process to make it usable by system architects with no knowledge of the proof language, Isabelle/HOL.
Submitted 25 April, 2023;
originally announced April 2023.
-
Methodology for Holistic Reference Modeling in Systems Engineering
Authors:
Dominik Ascher,
Erik Heiland,
Diana Schnell,
Peter Hillmann,
Andreas Karcher
Abstract:
In the face of increasing complexity, models support the development of new systems and enterprises. For an efficient procedure, reference models are adapted in order to reach a solution with less overhead which covers all necessary aspects. Here, a key challenge is applying a consistent methodology for the descriptions of such reference designs. This paper presents a holistic approach to describe reference models across different views and levels. Modeling stretches from the requirements and capabilities over their subdivision to services and components up to the realization in processes and data structures. Benefits include an end-to-end traceability of the capability coverage with performance parameters considered already at the starting point of the reference design. This enables focused development while considering design constraints and potential bottlenecks. We demonstrate the approach on the example of the development of a smart robot. Here, our methodology strongly supports transferability of designs for the development of further systems.
Submitted 21 November, 2022;
originally announced November 2022.
-
Deep Learning in Diabetic Foot Ulcers Detection: A Comprehensive Evaluation
Authors:
Moi Hoon Yap,
Ryo Hachiuma,
Azadeh Alavi,
Raphael Brungel,
Bill Cassidy,
Manu Goyal,
Hongtao Zhu,
Johannes Ruckert,
Moshe Olshansky,
Xiao Huang,
Hideo Saito,
Saeed Hassanpour,
Christoph M. Friedrich,
David Ascher,
Anping Song,
Hiroki Kajita,
David Gillespie,
Neil D. Reeves,
Joseph Pappachan,
Claire O'Shea,
Eibe Frank
Abstract:
There has been a substantial amount of research involving computer methods and technology for the detection and recognition of diabetic foot ulcers (DFUs), but there is a lack of systematic comparisons of state-of-the-art deep learning object detection frameworks applied to this problem. DFUC2020 provided participants with a comprehensive dataset consisting of 2,000 images for training and 2,000 images for testing. This paper summarises the results of DFUC2020 by comparing the deep learning-based algorithms proposed by the winning teams: Faster R-CNN, three variants of Faster R-CNN and an ensemble method; YOLOv3; YOLOv5; EfficientDet; and a new Cascade Attention Network. For each deep learning method, we provide a detailed description of model architecture, parameter settings for training and additional stages including pre-processing, data augmentation and post-processing. We provide a comprehensive evaluation for each method. All the methods required a data augmentation stage to increase the number of images available for training and a post-processing stage to remove false positives. The best performance was obtained from Deformable Convolution, a variant of Faster R-CNN, with a mean average precision (mAP) of 0.6940 and an F1-Score of 0.7434. Finally, we demonstrate that the ensemble method based on different deep learning methods can enhance the F1-Score but not the mAP.
Submitted 24 May, 2021; v1 submitted 7 October, 2020;
originally announced October 2020.