-
An efficient graph generative model for navigating ultra-large combinatorial synthesis libraries
Authors:
Aryan Pedawi,
Pawel Gniewek,
Chaoyi Chang,
Brandon M. Anderson,
Henry van den Bedem
Abstract:
Virtual, make-on-demand chemical libraries have transformed early-stage drug discovery by unlocking vast, synthetically accessible regions of chemical space. Recent years have witnessed rapid growth in these libraries from millions to trillions of compounds, hiding undiscovered, potent hits for a variety of therapeutic targets. However, they are quickly approaching a size beyond that which permits explicit enumeration, presenting new challenges for virtual screening. To overcome these challenges, we propose the Combinatorial Synthesis Library Variational Auto-Encoder (CSLVAE). The proposed generative model represents such libraries as a differentiable, hierarchically-organized database. Given a compound from the library, the molecular encoder constructs a query for retrieval, which is utilized by the molecular decoder to reconstruct the compound by first decoding its chemical reaction and subsequently decoding its reactants. Our design minimizes autoregression in the decoder, facilitating the generation of large, valid molecular graphs. Our method performs fast and parallel batch inference for ultra-large synthesis libraries, enabling a number of important applications in early-stage drug discovery. Compounds proposed by our method are guaranteed to be in the library, and thus synthetically and cost-effectively accessible. Importantly, CSLVAE can encode out-of-library compounds and search for in-library analogues. In experiments, we demonstrate the capabilities of the proposed method in the navigation of massive combinatorial synthesis libraries.
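The two-stage retrieval described in the abstract (decode the reaction first, then its reactants) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the library contents, the two-dimensional embeddings, the dot-product scoring, and the reuse of a single query vector at both levels are hypothetical and are not taken from the CSLVAE paper.

```python
# Toy sketch of hierarchical retrieval: pick a reaction, then one reactant per slot.
# All names, embeddings, and the scoring rule are hypothetical illustrations.

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def argmax_key(query, keyed_vectors):
    """Return the key whose vector scores highest against the query."""
    return max(keyed_vectors, key=lambda k: dot(query, keyed_vectors[k]))

# Hypothetical library: reaction -> embedding plus one candidate table per reactant slot.
LIBRARY = {
    "amide_coupling": {
        "key": [1.0, 0.0],
        "slots": [
            {"acid_A": [1.0, 0.0], "acid_B": [0.0, 1.0]},
            {"amine_A": [0.5, 0.5], "amine_B": [-1.0, 0.0]},
        ],
    },
    "suzuki": {
        "key": [0.0, 1.0],
        "slots": [
            {"boronic_A": [1.0, 0.0], "boronic_B": [0.0, 1.0]},
            {"halide_A": [1.0, 1.0], "halide_B": [0.0, -1.0]},
        ],
    },
}

def decode(query):
    """Decode a query into (reaction, reactants), both drawn from LIBRARY."""
    reaction = argmax_key(query, {r: v["key"] for r, v in LIBRARY.items()})
    reactants = [argmax_key(query, slot) for slot in LIBRARY[reaction]["slots"]]
    return reaction, reactants

print(decode([0.9, 0.1]))
```

Because every output is looked up in the library tables, a decoded product is in the library by construction, which mirrors the guarantee stated in the abstract.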
Submitted 19 October, 2022;
originally announced November 2022.
-
Learning physics confers pose-sensitivity in structure-based virtual screening
Authors:
Pawel Gniewek,
Bradley Worley,
Kate Stafford,
Henry van den Bedem,
Brandon Anderson
Abstract:
In drug discovery, structure-based virtual high-throughput screening (vHTS) campaigns aim to identify bioactive ligands or "hits" for therapeutic protein targets from docked poses at specific binding sites. However, while generally successful at this task, many deep learning methods are known to be insensitive to protein-ligand interactions, decreasing the reliability of hit detection and hindering discovery at novel binding sites. Here, we overcome this limitation by introducing a class of models with two key features: 1) we condition bioactivity on pose quality score, and 2) we present poor poses of true binders to the model as negative examples. The conditioning forces the model to learn details of physical interactions. We evaluate these models on a new benchmark designed to detect pose-sensitivity.
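A minimal sketch of the second feature, treating poor poses of true binders as negative examples, might look like the following. The `DockedPose` record, the `pose_quality` scale, and the `quality_cutoff` value are all hypothetical; the abstract does not specify how pose quality is scored or thresholded.

```python
from dataclasses import dataclass

@dataclass
class DockedPose:
    ligand_id: str
    is_binder: bool       # experimental activity of the ligand
    pose_quality: float   # hypothetical score in [0, 1]; higher means a better pose

def training_label(pose, quality_cutoff=0.5):
    """Label a pose positive only if the ligand binds AND the pose is good.

    A true binder in a poor pose gets label 0, so the model cannot score
    well from ligand identity alone. The cutoff is an illustrative assumption.
    """
    return int(pose.is_binder and pose.pose_quality >= quality_cutoff)

poses = [
    DockedPose("lig1", True, 0.9),   # binder, good pose
    DockedPose("lig1", True, 0.1),   # same binder, poor pose
    DockedPose("lig2", False, 0.8),  # non-binder, good pose
]
labels = [training_label(p) for p in poses]
print(labels)
```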
Submitted 1 December, 2021; v1 submitted 28 October, 2021;
originally announced October 2021.
-
Sequence-guided protein structure determination using graph convolutional and recurrent networks
Authors:
Po-Nan Li,
Saulo H. P. de Oliveira,
Soichi Wakatsuki,
Henry van den Bedem
Abstract:
Single particle, cryogenic electron microscopy (cryo-EM) experiments now routinely produce high-resolution data for large proteins and their complexes. Building an atomic model into a cryo-EM density map is challenging, particularly when no structure for the target protein is known a priori. Existing protocols for this type of task often rely on significant human intervention and can take hours to many days to produce an output. Here, we present a fully automated, template-free model building approach that is based entirely on neural networks. We use a graph convolutional network (GCN) to generate an embedding from a set of rotamer-based amino acid identities and candidate 3-dimensional Cα locations. Starting from this embedding, we use a bidirectional long short-term memory (LSTM) module to order and label the candidate identities and atomic locations consistent with the input protein sequence to obtain a structural model. Our approach paves the way for determining protein structures from cryo-EM densities at a fraction of the time of existing approaches and without the need for human intervention.
Submitted 2 September, 2020; v1 submitted 14 July, 2020;
originally announced July 2020.
-
Kinematic Flexibility Analysis: Hydrogen Bonding Patterns Impart a Spatial Hierarchy of Protein Motion
Authors:
Dominik Budday,
Sigrid Leyendecker,
Henry van den Bedem
Abstract:
Elastic network models (ENM) and constraint-based, topological rigidity analysis are two distinct, coarse-grained approaches to study conformational flexibility of macromolecules. In the two decades since their introduction, both have contributed significantly to insights into protein molecular mechanisms and function. However, despite a shared purpose of these approaches, the topological nature of rigidity analysis, and thereby the absence of motion modes, has impeded a direct comparison. Here, we present an alternative, kinematic approach to rigidity analysis, which circumvents these drawbacks. We introduce a novel protein hydrogen bond network spectral decomposition, which provides an orthonormal basis for collective motions modulated by non-covalent interactions, analogous to the eigenspectrum of normal modes, and decomposes proteins into rigid clusters identical to those from topological rigidity. Our kinematic flexibility analysis bridges topological rigidity theory and ENM, and enables a detailed analysis of motion modes obtained from both approaches. Our analysis reveals that collectivity of protein motions, reported by the Shannon entropy, is significantly lower for rigidity theory versus normal mode approaches. Strikingly, kinematic flexibility analysis suggests that the hydrogen bonding network encodes a protein-fold specific, spatial hierarchy of motions, which goes nearly undetected in ENM. This hierarchy reveals distinct motion regimes that rationalize protein stiffness changes observed from experiment and molecular dynamics simulations. A formal expression for changes in free energy derived from the spectral decomposition indicates that motions across nearly 40% of modes obey enthalpy-entropy compensation. Taken together, our analysis suggests that hydrogen bond networks have evolved to modulate protein structure and dynamics.
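For readers unfamiliar with entropy-based collectivity, the sketch below uses one common definition: normalize the squared per-atom displacements of a mode into a probability distribution, take its Shannon entropy, and report the exponential of the entropy divided by the number of atoms. The abstract does not give the exact formula used in the paper, so this definition and the function name `collectivity` are assumptions.

```python
import math

def collectivity(displacements):
    """Entropy-based collectivity of one motion mode, in the range (0, 1].

    displacements: per-atom displacement magnitudes for the mode.
    A value of 1 means all atoms move equally; 1/N means one atom moves alone.
    """
    squared = [d * d for d in displacements]
    total = sum(squared)
    if total == 0:
        raise ValueError("mode has no displacement")
    probs = [s / total for s in squared]
    # Shannon entropy; terms with zero probability contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(entropy) / len(probs)

print(collectivity([1.0, 1.0, 1.0, 1.0]))  # fully collective mode
print(collectivity([1.0, 0.0, 0.0, 0.0]))  # fully localized mode
```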
Submitted 23 February, 2018;
originally announced February 2018.