
arXiv:2009.00707 [pdf, ps, other]

doi 10.1016/j.trechm.2020.10.012

Pursuing a Prospective Perspective

Abstract: Retrospective testing of predictive models does not consider the real-world context in which models are deployed. Prospective validation, on the other hand, enables meaningful comparisons between data generation processes by incorporating trained models and considering the subjective decisions that affect reproducibility. Prospective experiments are essential for consistent progress in modeling.

Submitted 28 October, 2020; v1 submitted 26 August, 2020; originally announced September 2020.

Comments: Trends in Chemistry (2020)

arXiv:2007.13437 [pdf, other]

Energy-based View of Retrosynthesis

Authors: Ruoxi Sun, Hanjun Dai, Li Li, Steven Kearnes, Bo Dai

Abstract: Retrosynthesis -- the process of identifying a set of reactants to synthesize a target molecule -- is of vital importance to material design and drug discovery. Existing machine learning approaches based on language models and graph neural networks have achieved encouraging results. In this paper, we propose a framework that unifies sequence- and graph-based methods as energy-based models (EBMs) with different energy functions. This unified perspective provides critical insights about EBM variants through a comprehensive assessment of performance. Additionally, we present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction by constraining the agreement between the two directions. This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown.

Submitted 8 December, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

arXiv:2002.02530 [pdf, other]

doi 10.1021/acs.jmedchem.0c00452

Machine learning on DNA-encoded libraries: A new paradigm for hit-finding

Authors: Kevin McCloskey, Eric A. Sigel, Steven Kearnes, Ling Xue, Xia Tian, Dennis Moccia, Diana Gikunju, Sana Bazzaz, Betty Chan, Matthew A. Clark, John W. Cuozzo, Marie-Aude Guié, John P. Guilinger, Christelle Huguet, Christopher D. Hupp, Anthony D. Keefe, Christopher J. Mulhern, Ying Zhang, Patrick Riley

Abstract: DNA-encoded small molecule libraries (DELs) have enabled discovery of novel inhibitors for many distinct protein targets of therapeutic value through screening of libraries with up to billions of unique small molecules. We demonstrate a new approach applying machine learning to DEL selection data by identifying active molecules from a large commercial collection and a virtual library of easily synthesizable compounds. We train models using only DEL selection data and apply automated or automatable filters with chemist review restricted to the removal of molecules with potential for instability or reactivity. We validate this approach with a large prospective study (nearly 2000 compounds tested) across three diverse protein targets: sEH (a hydrolase), ERα (a nuclear receptor), and c-KIT (a kinase). The approach is effective, with an overall hit rate of ~30% at 30 µM and discovery of potent compounds (IC50 <10 nM) for every target. The model makes useful predictions even for molecules dissimilar to the original DEL and the compounds identified are diverse, predominantly drug-like, and different from known ligands. Collectively, the quality and quantity of DEL selection data; the power of modern machine learning methods; and access to large, inexpensive, commercially-available libraries creates a powerful new approach for hit finding.

Submitted 31 January, 2020; originally announced February 2020.

arXiv:1904.08915 [pdf, other]

Decoding Molecular Graph Embeddings with Reinforcement Learning

Authors: Steven Kearnes, Li Li, Patrick Riley

Abstract: We present RL-VAE, a graph-to-graph variational autoencoder that uses reinforcement learning to decode molecular graphs from latent embeddings. Methods have been described previously for graph-to-graph autoencoding, but these approaches require sophisticated decoders that increase the complexity of training and evaluation (such as requiring parallel encoders and decoders or non-trivial graph matching). Here, we repurpose a simple graph generator to enable efficient decoding and generation of molecular graphs.

Submitted 4 June, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

arXiv:1810.08678 [pdf, other]

doi 10.1038/s41598-019-47148-x

Optimization of Molecules via Deep Reinforcement Learning

Authors: Zhenpeng Zhou, Steven Kearnes, Li Li, Richard N. Zare, Patrick Riley

Abstract: We present a framework, which we call Molecule Deep $Q$-Networks (MolDQN), for molecule optimization by combining domain knowledge of chemistry and state-of-the-art reinforcement learning techniques (double $Q$-learning and randomized value functions). We directly define modifications on molecules, thereby ensuring 100% chemical validity. Further, we operate without pre-training on any dataset to avoid possible bias from the choice of that set. Inspired by problems faced during medicinal chemistry lead optimization, we extend our model with multi-objective reinforcement learning, which maximizes drug-likeness while maintaining similarity to the original molecule. We further show the path through chemical space to achieve optimization for a molecule to understand how the model works.

Submitted 28 February, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

arXiv:1802.08219 [pdf, other]

Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

Authors: Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, Patrick Riley

Abstract: We introduce tensor field neural networks, which are locally equivariant to 3D rotations, translations, and permutations of points at every layer. 3D rotation equivariance removes the need for data augmentation to identify features in arbitrary orientations. Our network uses filters built from spherical harmonics; due to the mathematical consequences of this filter choice, each layer accepts as input (and guarantees as output) scalars, vectors, and higher-order tensors, in the geometric sense of these terms. We demonstrate the capabilities of tensor field networks with tasks in geometry, physics, and chemistry.

Submitted 18 May, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

Comments: changes for NIPS submission

arXiv:1603.00856 [pdf, other]

doi 10.1007/s10822-016-9938-8

Molecular Graph Convolutions: Moving Beyond Fingerprints

Authors: Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, Patrick Riley

Abstract: Molecular "fingerprints" encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular "graph convolutions", a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph---atoms, bonds, distances, etc.---which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.

Submitted 18 August, 2016; v1 submitted 2 March, 2016; originally announced March 2016.

Comments: See "Version information" section

Journal ref: J Comput Aided Mol Des (2016)

arXiv:1502.02072 [pdf, other]

Massively Multitask Networks for Drug Discovery

Authors: Bharath Ramsundar, Steven Kearnes, Patrick Riley, Dale Webster, David Konerding, Vijay Pande

Abstract: Massively multitask neural architectures provide a learning framework for drug discovery that synthesizes information from many distinct biological sources. To train these architectures at scale, we gather large amounts of data from public sources to create a dataset of nearly 40 million measurements across more than 200 biological targets. We investigate several aspects of the multitask framework by performing a series of empirical studies and obtain some interesting results: (1) massively multitask networks obtain predictive accuracies significantly better than single-task methods, (2) the predictive power of multitask networks improves as additional tasks and data are added, (3) the total amount of data and the total number of tasks both contribute significantly to multitask improvement, and (4) multitask networks afford limited transferability to tasks not in the training set. Our results underscore the need for greater data sharing and further algorithmic innovation to accelerate the drug discovery process.

Submitted 6 February, 2015; originally announced February 2015.

Comments: Preliminary work. Under review by the International Conference on Machine Learning (ICML)

Showing 1–8 of 8 results for author: Kearnes, S