-
PyRelationAL: a python library for active learning research and development
Authors:
Paul Scherer,
Alison Pouplin,
Alice Del Vecchio,
Suraj M S,
Oliver Bolton,
Jyothish Soman,
Jake P. Taylor-King,
Lindsay Edwards,
Thomas Gaudelet
Abstract:
Active learning (AL) is a sub-field of ML focused on the development of methods to iteratively and economically acquire data by strategically querying new data points that are the most useful for a particular task. Here, we introduce PyRelationAL, an open source library for AL research. We describe a modular toolkit based around a two-step design methodology for composing pool-based active learning strategies applicable to both single-acquisition and batch-acquisition strategies. This framework allows for the mathematical and practical specification of a broad range of existing and novel strategies under a consistent programming model and abstraction. Furthermore, we incorporate datasets and active learning tasks applicable to them to simplify comparative evaluation and benchmarking, along with an initial group of benchmarks across datasets included in this library. The toolkit is compatible with existing ML frameworks. PyRelationAL is maintained using modern software engineering practices -- with an inclusive contributor code of conduct -- to promote long-term library quality and utilisation. PyRelationAL is available under a permissive Apache licence on PyPI and at https://github.com/RelationRx/pyrelational.
Submitted 11 November, 2024; v1 submitted 23 May, 2022;
originally announced May 2022.
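The pool-based setting described in the abstract above can be illustrated with a minimal sketch. It does not use the PyRelationAL API. It assumes that a strategy can be split into a scoring step and a selection step (the abstract does not spell out what its two steps are), and every name below (`entropy`, `score_pool`, `select_top_k`, `toy_predict_proba`) is hypothetical.

```python
import math
from typing import Callable, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def score_pool(
    pool: Sequence[float],
    predict_proba: Callable[[float], Sequence[float]],
) -> List[float]:
    """Assumed step 1: score every unlabelled point by predictive entropy."""
    return [entropy(predict_proba(x)) for x in pool]


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Assumed step 2: return the indices of the k highest-scoring points."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def toy_predict_proba(x: float) -> List[float]:
    """Hypothetical binary classifier: logistic in x, boundary at x = 0."""
    p = 1.0 / (1.0 + math.exp(-x))
    return [1.0 - p, p]


# Unlabelled pool of one-dimensional inputs (made-up values).
pool = [-3.0, -0.2, 0.1, 2.5, 4.0]
scores = score_pool(pool, toy_predict_proba)

single = select_top_k(scores, k=1)  # single-acquisition: query one point
batch = select_top_k(scores, k=2)   # batch-acquisition: query two points
print(single, batch)
```

Under these assumptions, single-acquisition and batch-acquisition differ only in `k`; the points closest to the toy decision boundary are queried first because their predictive entropy is highest.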
-
Utilising Graph Machine Learning within Drug Discovery and Development
Authors:
Thomas Gaudelet,
Ben Day,
Arian R. Jamasb,
Jyothish Soman,
Cristian Regep,
Gertrude Liu,
Jeremy B. R. Hayter,
Richard Vickers,
Charles Roberts,
Jian Tang,
David Roblin,
Tom L. Blundell,
Michael M. Bronstein,
Jake P. Taylor-King
Abstract:
Graph Machine Learning (GML) is receiving growing interest within the pharmaceutical and biotechnology industries for its ability to model biomolecular structures and the functional relationships between them, and to integrate multi-omic datasets, amongst other data types. Herein, we present a multidisciplinary academic-industrial review of the topic within the context of drug discovery and development. After introducing key terms and modelling approaches, we move chronologically through the drug development pipeline to identify and summarise work incorporating: target identification, design of small molecules and biologics, and drug repurposing. Whilst the field is still emerging, key milestones, including repurposed drugs entering in vivo studies, suggest that graph machine learning will become a modelling framework of choice within biomedical machine learning.
Submitted 10 February, 2021; v1 submitted 9 December, 2020;
originally announced December 2020.
-
Sparse Dynamic Distribution Decomposition: Efficient Integration of Trajectory and Snapshot Time Series Data
Authors:
Jake P. Taylor-King,
Cristian Regep,
Jyothish Soman,
Flawnson Tong,
Catalina Cangea,
Charlie Roberts
Abstract:
Dynamic Distribution Decomposition (DDD) was introduced in Taylor-King et al. (PLOS Comp Biol, 2020) as a variation on Dynamic Mode Decomposition. In brief, by using basis functions over a continuous state space, DDD allows for the fitting of continuous-time Markov chains over these basis functions and as a result continuously maps between distributions. The number of parameters in DDD scales with the square of the number of basis functions; we reformulate the problem and restrict the method to compact basis functions, which leads to the inference of sparse matrices only -- hence reducing the number of parameters. Finally, we demonstrate how DDD is suitable to integrate both trajectory time series (paired between subsequent time points) and snapshot time series (unpaired time points). Methods capable of integrating both scenarios are particularly relevant for the analysis of biomedical data, whereby studies observe populations at fixed time points (snapshots) and individual patient journeys with repeated follow-ups (trajectories).
Submitted 11 June, 2020; v1 submitted 9 June, 2020;
originally announced June 2020.
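The parameter count mentioned in the DDD abstract above can be made concrete with a rough sketch. The notation is assumed here and is not taken from the paper: $\psi_j$ are the $N$ basis functions, $\mathbf{c}(t)$ is a row vector of coefficients, and $Q$ is the generator of the continuous-time Markov chain over the basis functions.

```latex
\rho(x,t) \approx \sum_{j=1}^{N} c_j(t)\,\psi_j(x), \qquad
\frac{\mathrm{d}\mathbf{c}}{\mathrm{d}t} = \mathbf{c}(t)\,Q, \qquad
\mathbf{c}(t) = \mathbf{c}(0)\,e^{tQ}, \qquad
Q \in \mathbb{R}^{N \times N}.
```

Under this reading, a dense $Q$ carries $N^2$ entries, which matches the quadratic scaling stated in the abstract. If compactly supported basis functions only interact with their neighbours, most entries of $Q$ would be zero, which is one way to see why the inferred matrices can be sparse.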