Search | arXiv e-print repository

ACEGEN: Reinforcement learning of generative chemical agents for drug discovery

Authors: Albert Bou, Morgan Thomas, Sebastian Dittert, Carles Navarro Ramírez, Maciej Majewski, Ye Wang, Shivam Patel, Gary Tresadern, Mazen Ahmad, Vincent Moens, Woody Sherman, Simone Sciabola, Gianni De Fabritiis

Abstract: In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capabilities, flexibility, reliability, and efficiency remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we introduce ACEGEN, a comprehensive and streamlined toolkit tailored for generative drug design, built using TorchRL, a modern RL library that offers thoroughly tested reusable components. We validate ACEGEN by benchmarking against other published generative modeling algorithms and show comparable or improved performance. We also show examples of ACEGEN applied in multiple drug discovery case studies. ACEGEN is accessible at https://github.com/acellera/acegen-open and available for use under the MIT license.

Submitted 22 July, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2306.15859

Evaluation of dynamic causal modelling and Bayesian model selection using simulations of networks of spiking neurons

Authors: Matthew G. Thomas

Abstract: Inferring the mechanisms underlying physiological and pathological processes in the brain from recorded electrical activity is challenging. Bayesian model selection and dynamic causal modelling aim to identify likely biophysical models to explain data and to fit the model parameters. Here, we use data generated by simulations to investigate the effectiveness of Bayesian model selection and dynamic causal modelling when applied at steady state in the frequency domain to identify and fit Jansen-Rit models. We first investigate the impact of the necessary assumption of linearity on the dynamics of the Jansen-Rit model. We then apply dynamic causal modelling and Bayesian model selection to data generated from simulations of linear neural mass models, non-linear neural mass models, and networks of discrete spiking neurons. Action potentials are a characteristic feature of neuronal dynamics but have not previously been explicitly included in simulations used to test Bayesian model selection or dynamic causal modelling. We find that the assumption of linearity abolishes the qualitative transitions seen as a function of the connectivity parameter in the original Jansen-Rit model. As with previous work, we find that the recovery procedures are effective when applied to data from linear Jansen-Rit neural mass models; however, when they are applied to non-linear neural mass models and networks of discrete spiking neurons, their effectiveness is significantly reduced, suggesting that caution is required when applying these methods.

Submitted 27 June, 2023; originally announced June 2023.

Comments: 18 pages, 15 figures

arXiv:2212.01385

Re-evaluating sample efficiency in de novo molecule generation

Authors: Morgan Thomas, Noel M. O'Boyle, Andreas Bender, Chris De Graaf

Abstract: De novo molecule generation can suffer from data inefficiency, requiring large amounts of training data or many sampled data points to conduct objective optimization. The latter is a particular disadvantage when combining deep generative models with computationally expensive molecule scoring functions (a.k.a. oracles) commonly used in computer-aided drug design. Recent works have therefore focused on methods to improve sample efficiency in the context of de novo molecule drug design, or to benchmark it. In this work, we discuss and adapt a recent sample efficiency benchmark to better reflect realistic goals, also with respect to the quality of chemistry generated, which must always be considered in the context of small-molecule drug design; we then re-evaluate all benchmarked generative models. We find that accounting for molecular weight and LogP with respect to the training data, and the diversity of chemistry proposed, re-orders the ranking of generative models. In addition, we benchmark a recently proposed method to improve sample efficiency (Augmented Hill-Climb) and find that it ranks top when considering both the sample efficiency and chemistry of molecules generated. Continual improvements in sample efficiency and chemical desirability enable more routine integration of computationally expensive scoring functions on a more realistic timescale.

Submitted 1 December, 2022; originally announced December 2022.

Comments: Submission to ELLIS ML4Molecules Workshop 2022

arXiv:2109.12517

doi 10.1007/978-3-030-87586-2_13

Dynamic Adaptive Spatio-temporal Graph Convolution for fMRI Modelling

Authors: Ahmed El-Gazzar, Rajat Mani Thomas, Guido van Wingen

Abstract: The characterisation of the brain as a functional network in which the connections between brain regions are represented by correlation values across time series has been very popular in recent years. Although this representation has advanced our understanding of brain function, it is a simplified model of brain connectivity, which has a complex, dynamic spatio-temporal nature. Oversimplification of the data may hinder the merits of applying advanced non-linear feature extraction algorithms. To this end, we propose a dynamic adaptive spatio-temporal graph convolution (DAST-GCN) model to overcome the shortcomings of pre-defined static correlation-based graph structures. The proposed approach allows end-to-end inference of dynamic connections between brain regions via a layer-wise graph structure learning module while mapping brain connectivity to a phenotype in a supervised learning framework. This leverages the computational power of the model, data and targets to represent brain connectivity, and could enable the identification of potential biomarkers for the supervised target in question. We evaluate our pipeline on the UKBiobank dataset for age and gender classification tasks from resting-state functional scans and show that it outperforms currently adopted linear and non-linear methods in neuroimaging. Further, we assess the generalizability of the inferred graph structure by transferring the pre-trained graph to an independent dataset for the same task. Our results demonstrate the task-robustness of the graph against different scanning parameters and demographics.

Submitted 26 September, 2021; originally announced September 2021.

Comments: Accepted at International Workshop on Machine Learning in Clinical Neuroimaging (MLCN2021)

Journal ref: Abdulkadir A. et al. (eds) Machine Learning in Clinical Neuroimaging. MLCN 2021. Lecture Notes in Computer Science, vol 13001. Springer, Cham

arXiv:2109.11352

Proteomics Standards Initiative's ProForma 2.0: Unifying the encoding of Proteoforms and Peptidoforms

Authors: Richard D. LeDuc, Eric W. Deutsch, Pierre-Alain Binz, Ryan T. Fellers, Anthony J. Cesnik, Joshua A. Klein, Tim Van Den Bossche, Ralf Gabriels, Arshika Yalavarthi, Yasset Perez-Riverol, Jeremy Carver, Wout Bittremieux, Shin Kawano, Benjamin Pullman, Nuno Bandeira, Neil L. Kelleher, Paul M. Thomas, Juan Antonio Vizcaíno

Abstract: There is a need to represent in a standard manner all the possible variations of a protein or peptide primary sequence, including both artefactual and post-translational modifications of peptides and proteins. With that overall aim, the Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has developed a notation, called ProForma 2.0, which is a substantial extension of the original ProForma notation developed by the Consortium for Top-Down Proteomics (CTDP). ProForma 2.0 aims to unify the representation of proteoforms and peptidoforms. Therefore, this notation supports use cases needed for bottom-up and middle-/top-down proteomics approaches and allows the encoding of highly modified proteins and peptides using a human- and machine-readable string. ProForma 2.0 covers the encoding of protein modification names and accessions, cross-linking reagents including disulfides, glycans, modifications encoded using mass shifts and/or chemical formulas, labile and C- or N-terminal modifications, ambiguity in the modification position, and the representation of atomic isotopes, among other use cases. Notational conventions are based on public controlled vocabularies and ontologies. Detailed information about the notation and existing implementations is available at http://www.psidev.info/proforma and at the corresponding GitHub repository (https://github.com/HUPO-PSI/proforma).

Submitted 21 March, 2022; v1 submitted 23 September, 2021; originally announced September 2021.
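To illustrate what a "human- and machine-readable string" means in practice, here is a minimal sketch that parses only a tiny subset of the notation, assuming the bracket conventions seen in ProForma examples such as EM[Oxidation]EVEES[Phospho]PEK: one-letter residues, each optionally followed by a single bracketed modification, plus an optional "[mod]-" N-terminal and "-[mod]" C-terminal modification. The function name parse_proforma_subset is hypothetical; this is not the reference implementation, and glycans, cross-links, ambiguity, labile modifications, and isotopes from the full specification are not handled.

```python
import re

# Hypothetical minimal subset of ProForma 2.0 (NOT the full spec):
# one-letter residues, each optionally followed by ONE bracketed modification
# (a name like "Oxidation" or a mass shift like "+15.9949"), plus optional
# "[mod]-" N-terminal and "-[mod]" C-terminal modifications.
_RESIDUE = re.compile(r"([A-Z])(?:\[([^\]]+)\])?")
_N_TERM = re.compile(r"^\[([^\]]+)\]-")
_C_TERM = re.compile(r"-\[([^\]]+)\]$")


def parse_proforma_subset(text):
    """Return (n_term_mod, [(residue, mod_or_None), ...], c_term_mod)."""
    n_term = c_term = None
    m = _N_TERM.match(text)
    if m:
        n_term = m.group(1)
        text = text[m.end():]
    m = _C_TERM.search(text)
    if m:
        c_term = m.group(1)
        text = text[:m.start()]
    residues = []
    pos = 0
    while pos < len(text):
        tok = _RESIDUE.match(text, pos)
        if tok is None:
            # Anything outside the supported subset is rejected explicitly.
            raise ValueError(f"unsupported syntax at position {pos}: {text[pos:]!r}")
        residues.append((tok.group(1), tok.group(2)))
        pos = tok.end()
    return n_term, residues, c_term


n_term, residues, c_term = parse_proforma_subset("EM[Oxidation]EVEES[Phospho]PEK")
print(residues[1], residues[6])
```

Rejecting unknown syntax with an error, rather than skipping it, keeps the sketch honest about how little of the notation it covers.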

arXiv:2004.11460

Development of a Machine Learning Model and Mobile Application to Aid in Predicting Dosage of Vitamin K Antagonists Among Indian Patients

Authors: Amruthlal M, Devika S, Ameer Suhail P A, Aravind K Menon, Vignesh Krishnan, Alan Thomas, Manu Thomas, Sanjay G, Lakshmi Kanth L R, Jimmy Jose, Harikrishnan S

Abstract: Patients who undergo mechanical heart valve replacement or have conditions like atrial fibrillation have to take Vitamin K Antagonist (VKA) drugs to prevent blood coagulation. These drugs have a narrow therapeutic range and need to be very closely monitored due to life-threatening side effects. The dosage of a VKA drug is determined and revised by a physician based on the Prothrombin Time - International Normalised Ratio (PT-INR) value obtained through a blood test. Our work aimed at predicting the maintenance dosage of warfarin, currently the most widely recommended anticoagulant drug, using de-identified medical data collected from 109 patients from Kerala. A Support Vector Machine (SVM) regression model was built to predict the maintenance dosage of warfarin for patients who have been undergoing treatment from a physician and have reached stable INR values between 2.0 and 4.0.

Submitted 19 April, 2020; originally announced April 2020.

arXiv:2004.11355

Excess registered deaths in England and Wales during the COVID-19 pandemic, March 2020 to May 2020

Authors: Drew M Thomas

Abstract: Official counts of COVID-19 deaths have been criticized for potentially including people who did not die of COVID-19 but merely died with COVID-19. I address that critique by fitting a generalized additive model to weekly counts of all deaths registered in England and Wales during the 2010s. The model produces baseline rates of death registrations expected without the COVID-19 pandemic, and comparing those baselines to recent counts of registered deaths exposes the emergence of excess deaths late in March 2020. By April's end, England and Wales registered 45,300 ± 3200 excess deaths of adults aged 45+. Through 22 May, the last day of available all-deaths data, 56,600 ± 4400 excess deaths were registered (about 53% of which were of men). Both the ONS's corresponding count of 43,205 death certificates which mention COVID-19, and the Department of Health and Social Care's count of 33,671 deaths, are appreciably lower, implying that their counting methods have underestimated, not overestimated, the pandemic's true death toll. If underreporting rates have held steady during May, about 59,000 direct and indirect COVID-19 deaths might have been registered through the end of May but not yet publicly reported in full.

Submitted 2 June, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

Comments: 27 A4 pages, 4 tables, 14 figures, 2 appendices

arXiv:2004.01810

doi 10.1016/j.bpj.2019.11.1710

A Diffusion-Based Embedding of the Stochastic Simulation Algorithm in Continuous Space

Authors: Marcus Thomas, Russell Schwartz

Abstract: A variety of simulation methodologies have been used for modeling reaction-diffusion dynamics, including approaches based on Differential Equations (DE), the Stochastic Simulation Algorithm (SSA), Brownian Dynamics (BD), Green's Function Reaction Dynamics (GFRD), and variations thereon, each offering trade-offs with respect to the ranges of phenomena they can model, their computational tractability, and the difficulty of fitting them to experimental measurements. Here, we develop a multiscale approach combining efficient SSA-like sampling suitable for well-mixed systems with aspects of the slower but space-aware GFRD model, assuming, as with GFRD, that reactions occur in a spatially heterogeneous environment that must be explicitly modeled. Our method extends the SSA approach in two major ways. First, we sample bimolecular association reactions following diffusive motion with a time-dependent reaction propensity. Second, reaction locations are sampled from within overlapping diffusion spheres describing the spatial probability densities of individual reactants. We show that the approach provides efficient simulation of spatially heterogeneous biochemistry in comparison to alternative methods via application to a Michaelis-Menten model.

Submitted 20 May, 2021; v1 submitted 3 April, 2020; originally announced April 2020.
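For readers unfamiliar with the "SSA-like sampling suitable for well-mixed systems" that this work builds on, the sketch below is the standard Gillespie direct method applied to a well-mixed Michaelis-Menten scheme E + S ⇌ ES → E + P. It is not the authors' spatial extension: there are no diffusion spheres and propensities are constant between events. The function name gillespie_mm, the stochastic rate constants c1, c2, c3, and the initial counts are illustrative assumptions.

```python
import math
import random


def gillespie_mm(e0, s0, c1, c2, c3, t_max, seed=0):
    """Gillespie direct-method SSA for a well-mixed Michaelis-Menten system.

    Species: E (enzyme), S (substrate), ES (complex), P (product).
    Reactions and propensities (c1..c3 are assumed stochastic rate constants):
      R1: E + S -> ES   a1 = c1 * E * S
      R2: ES -> E + S   a2 = c2 * ES
      R3: ES -> E + P   a3 = c3 * ES
    Returns a list of (time, E, S, ES, P) snapshots, one per reaction event.
    """
    rng = random.Random(seed)
    e, s, es, p = e0, s0, 0, 0
    t = 0.0
    trajectory = [(t, e, s, es, p)]
    while t < t_max:
        a1 = c1 * e * s
        a2 = c2 * es
        a3 = c3 * es
        a0 = a1 + a2 + a3
        if a0 == 0:
            break  # no reaction can fire: all substrate has been converted
        # Waiting time to the next event is exponential with rate a0.
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_max:
            break
        # Pick which reaction fires, with probability proportional to its propensity.
        r = rng.random() * a0
        if r < a1:
            e, s, es = e - 1, s - 1, es + 1
        elif r < a1 + a2:
            e, s, es = e + 1, s + 1, es - 1
        else:
            e, es, p = e + 1, es - 1, p + 1
        trajectory.append((t, e, s, es, p))
    return trajectory


traj = gillespie_mm(e0=10, s0=50, c1=0.01, c2=0.1, c3=0.1, t_max=1e5)
print(traj[-1])
```

The key well-mixed assumption is visible in a1 = c1 * E * S: every enzyme can meet every substrate molecule with equal probability, which is exactly what the space-aware GFRD-style extension described above relaxes.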

arXiv:1907.01288

Simple 1-D Convolutional Networks for Resting-State fMRI Based Classification in Autism

Authors: Ahmed El Gazzar, Leonardo Cerliani, Guido van Wingen, Rajat Mani Thomas

Abstract: Deep learning methods are increasingly being used with neuroimaging data such as structural and functional magnetic resonance imaging (MRI) to predict the diagnosis of neuropsychiatric and neurological disorders. For psychiatric disorders in particular, it is believed that one of the most promising modalities is resting-state functional MRI (rsfMRI), which captures the intrinsic connectivity between regions in the brain. Because rsfMRI data points are inherently high-dimensional (~1M), it is impossible to process the entire input in its raw form. In this paper, we propose a very simple transformation of the rsfMRI images that captures all of the temporal dynamics of the signal but sub-samples its spatial extent. As a result, we can use a very simple 1-D convolutional network which is fast to train, requires minimal preprocessing and performs on par with the state-of-the-art on the classification of Autism spectrum disorders.

Submitted 2 July, 2019; originally announced July 2019.

Comments: accepted for publication in IJCNN 2019

arXiv:1906.07837

doi 10.1109/VISUAL.2019.8933544

TempoCave: Visualizing Dynamic Connectome Datasets to Support Cognitive Behavioral Therapy

Authors: Ran Xu, Manu Mathew Thomas, Alex Leow, Olusola Ajilore, Angus G. Forbes

Abstract: We introduce TempoCave, a novel visualization application for analyzing dynamic brain networks, or connectomes. TempoCave provides a range of functionality to explore metrics related to the activity patterns and modular affiliations of different regions in the brain. These patterns are calculated by processing raw data retrieved from functional magnetic resonance imaging (fMRI) scans, which creates a network of weighted edges between each pair of brain regions, where the weight indicates how likely these regions are to activate synchronously. In particular, we support the analysis needs of clinical psychologists, who examine these modular affiliations and weighted edges and their temporal dynamics, utilizing them to understand relationships between neurological disorders and brain activity, which could have a significant impact on the way in which patients are diagnosed and treated. We summarize the core functionality of TempoCave, which supports a range of comparative tasks and runs both in a desktop mode and in an immersive mode. Furthermore, we present a real-world use case that analyzes pre- and post-treatment connectome datasets from 27 subjects in a clinical study investigating the use of cognitive behavioral therapy to treat major depressive disorder, indicating that TempoCave can provide new insight into the dynamic behavior of the human brain.

Submitted 6 August, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

arXiv:1404.4287

doi 10.1016/j.jtbi.2014.10.032

Network impact on persistence in a finite population dynamic diffusion model: application to an emergent seed exchange network

Authors: Pierre Barbillon, Mathieu Thomas, Isabelle Goldringer, Frédéric Hospital, Stéphane Robin

Abstract: Dynamic extinction-colonisation models (also called contact processes) are widely studied in epidemiology and in metapopulation theory. Contacts are usually assumed to be possible only through a network of connected patches. This network accounts for a spatial landscape or a social organisation of interactions. Thanks to the social network literature, heterogeneous networks of contacts can be considered. A major issue is to assess the influence of the network in the dynamic model. Most work with this common purpose uses deterministic models or an approximation of a stochastic Extinction-Colonisation model (sEC), which are relevant only for large networks. When working with a limited-size network, the induced stochasticity is essential and has to be taken into account in the conclusions. Here, a rigorous framework is proposed for limited-size networks and the limitations of the deterministic approximation are exhibited. This framework allows exact computations when the number of patches is small. Otherwise, simulations are used and enhanced by adapted simulation techniques when necessary. A sensitivity analysis was conducted to compare four main network topologies in contrasting settings to determine the role of the network. A challenging case was studied in this context: seed exchange of crop species in the Réseau Semences Paysannes (RSP), an emergent French farmers' organisation. A stochastic Extinction-Colonisation model was used to characterize the consequences of substantial changes in RSP's social organisation on the ability of the system to maintain crop varieties.

Submitted 16 April, 2014; originally announced April 2014.

Journal ref: Journal of Theoretical Biology Volume 365, 21 January 2015, Pages 365-376
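The remark that exact computations are possible when the number of patches is small can be illustrated by enumerating all 2^n occupancy states of a network and iterating the resulting Markov chain. The sketch below assumes one common discrete-time sEC formulation, which may differ from the paper's: patches update synchronously; an occupied patch goes extinct with probability e; an empty patch with k occupied neighbours is colonised with probability 1 - (1 - c)^k. The function names and parameters e and c are hypothetical. The all-empty state is absorbing, so persistence is one minus its probability mass.

```python
from itertools import product


def sec_transition_matrix(adj, e, c):
    """Exact transition matrix over all 2**n occupancy states (assumed sEC model).

    adj[i] lists the neighbours of patch i; e is the per-step extinction
    probability of an occupied patch; c is the per-neighbour colonisation
    probability of an empty patch.
    """
    n = len(adj)
    states = list(product((0, 1), repeat=n))
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        # Probability that each patch is occupied at the next step, given state s.
        p_occ = []
        for i in range(n):
            if s[i]:
                p_occ.append(1.0 - e)
            else:
                k = sum(s[j] for j in adj[i])
                p_occ.append(1.0 - (1.0 - c) ** k)
        # Patches update independently, so transition probabilities factorise.
        for t in states:
            prob = 1.0
            for i in range(n):
                prob *= p_occ[i] if t[i] else 1.0 - p_occ[i]
            P[index[s]][index[t]] = prob
    return states, P


def persistence_probability(adj, e, c, steps):
    """P(at least one patch still occupied after `steps`), starting fully occupied."""
    states, P = sec_transition_matrix(adj, e, c)
    n = len(adj)
    dist = [0.0] * len(states)
    dist[states.index((1,) * n)] = 1.0
    m = len(states)
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(m)) for j in range(m)]
    return 1.0 - dist[states.index((0,) * n)]


triangle = [[1, 2], [0, 2], [0, 1]]  # hypothetical 3-patch complete network
print(persistence_probability(triangle, e=0.2, c=0.5, steps=10))
```

Because the state space grows as 2^n, this exact approach is only practical for a handful of patches, consistent with the abstract's switch to simulation for larger networks.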

arXiv:1312.6639

doi 10.1038/nature13673

Ancient human genomes suggest three ancestral populations for present-day Europeans

Authors: Iosif Lazaridis, Nick Patterson, Alissa Mittnik, Gabriel Renaud, Swapan Mallick, Karola Kirsanow, Peter H. Sudmant, Joshua G. Schraiber, Sergi Castellano, Mark Lipson, Bonnie Berger, Christos Economou, Ruth Bollongino, Qiaomei Fu, Kirsten I. Bos, Susanne Nordenfelt, Heng Li, Cesare de Filippo, Kay Prüfer, Susanna Sawyer, Cosimo Posth, Wolfgang Haak, Fredrik Hallgren, Elin Fornander, Nadin Rohland, et al. (95 additional authors not shown)

Abstract: We sequenced genomes from a ~7,000-year-old early farmer from Stuttgart in Germany, an ~8,000-year-old hunter-gatherer from Luxembourg, and seven ~8,000-year-old hunter-gatherers from southern Sweden. We analyzed these data together with other ancient genomes and 2,345 contemporary humans to show that the great majority of present-day Europeans derive from at least three highly differentiated populations: West European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near Easterners; Ancient North Eurasians (ANE), who were most closely related to Upper Paleolithic Siberians and contributed to both Europeans and Near Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model these populations' deep relationships and show that EEF had ~44% ancestry from a "Basal Eurasian" lineage that split prior to the diversification of all other non-African lineages.

Submitted 1 April, 2014; v1 submitted 23 December, 2013; originally announced December 2013.

arXiv:1308.3673

Substance Abuse via Legally Prescribed Drugs: The Case of Vicodin in the United States

Authors: Wendy K. Caldwell, Benjamin Freedman, Luke Settles, Michael M. Thomas, Anarina Murillo, Erika Camacho, Stephen Wirkus

Abstract: Vicodin is the most commonly prescribed pain reliever in the United States. Research indicates that two million people are currently abusing Vicodin, and the majority of those who abuse Vicodin were initially exposed to it via prescription. Our goal is to determine the most effective strategies for reducing the overall population of Vicodin abusers. More specifically, we focus on whether prevention methods aimed at educating doctors and patients on the potential for drug abuse, or treatment methods implemented after a person abuses Vicodin, will have a greater overall impact. We consider one linear and two non-linear compartmental models in which medical users of Vicodin can transition into the abuser compartment or leave the population by no longer taking the drug. Once they become Vicodin abusers, people can transition into a treatment compartment, with the possibility of leaving the population through successful completion of treatment or of relapsing and re-entering the abuser compartment. The linear model assumes no social interaction, while both non-linear models consider interaction. One considers interaction with abusers affecting the relapse rate, while the other assumes both this and an additional interaction between the number of abusers and the number of new prescriptions. Sensitivity analyses are conducted by varying the parameters that measure the success rates of these intervention methods, to determine which strategy has the greatest impact on controlling the population of Vicodin abusers. From these models and analyses, we determine that manipulating parameters tied to prevention measures has a greater impact on reducing the population of abusers than manipulating parameters associated with treatment. We also note that increasing the rate at which abusers seek treatment affects the population of abusers more than the success rate of treatment itself.

Submitted 26 July, 2013; originally announced August 2013.
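The linear compartmental model described above (medical users, abusers, treatment) can be sketched as three coupled ODEs integrated with forward Euler. Everything quantitative here is an assumption for illustration: a constant inflow lam of new prescriptions, a rate alpha from medical use to abuse, eps for stopping the drug, beta for abusers entering treatment, rho for relapse, and sigma for successful completion. These are not the paper's parameter names or values, and the sketch does not reproduce its results.

```python
def simulate_linear_model(lam, alpha, eps, beta, rho, sigma,
                          p0=0.0, a0=0.0, tr0=0.0, dt=0.01, t_end=200.0):
    """Forward-Euler integration of an assumed linear P -> A <-> T model.

    dP/dt = lam - (alpha + eps) * P          medical users
    dA/dt = alpha * P + rho * T - beta * A   abusers (relapse returns from T)
    dT/dt = beta * A - (rho + sigma) * T     in treatment (sigma: success)
    Returns (P, A, T) at t_end.
    """
    p, a, tr = p0, a0, tr0
    for _ in range(int(round(t_end / dt))):
        dp = lam - (alpha + eps) * p
        da = alpha * p + rho * tr - beta * a
        dtr = beta * a - (rho + sigma) * tr
        p, a, tr = p + dt * dp, a + dt * da, tr + dt * dtr
    return p, a, tr


def steady_state(lam, alpha, eps, beta, rho, sigma):
    """Closed-form equilibrium of the same assumed linear model."""
    p = lam / (alpha + eps)
    a = alpha * p * (rho + sigma) / (beta * sigma)
    tr = beta * a / (rho + sigma)
    return p, a, tr


params = dict(lam=100.0, alpha=0.1, eps=0.4, beta=0.2, rho=0.3, sigma=0.5)
print(simulate_linear_model(**params))
print(steady_state(**params))
```

In this toy version the equilibrium abuser population A* = alpha P* (rho + sigma) / (beta sigma) falls as alpha (a prevention lever) decreases or beta (treatment seeking) increases, which is the kind of parameter sensitivity the abstract analyses.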

Showing 1–13 of 13 results for author: Thomas, M