OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials

Authors: Peter Eastman, Raimondas Galvelis, Raúl P. Peláez, Charlles R. A. Abreu, Stephen E. Farr, Emilio Gallicchio, Anton Gorenko, Michael M. Henry, Frank Hu, Jing Huang, Andreas Krämer, Julien Michel, Joshua A. Mitchell, Vijay S. Pande, João PGLM Rodrigues, Jaime Rodriguez-Guerra, Andrew C. Simmonett, Sukrit Singh, Jason Swails, Philip Turner, Yuanqing Wang, Ivy Zhang, John D. Chodera, Gianni De Fabritiis, Thomas E. Markland

Abstract: Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features on simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein (GFP) chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations at only a modest increase in cost.

Submitted 29 November, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: 16 pages, 5 figures

ACM Class: J.2; J.3

arXiv:2303.08993 [pdf]

doi 10.1016/j.bpj.2023.03.028

Folding@home: achievements from over twenty years of citizen science herald the exascale era

Authors: Vincent A. Voelz, Vijay S. Pande, Gregory R. Bowman

Abstract: Simulations of biomolecules have enormous potential to inform our understanding of biology but require extremely demanding calculations. For over twenty years, the Folding@home distributed computing project has pioneered a massively parallel approach to biomolecular simulation, harnessing the resources of citizen scientists across the globe. Here, we summarize the scientific and technical advances this perspective has enabled. As the project's name implies, the early years of Folding@home focused on driving advances in our understanding of protein folding by developing statistical methods for capturing long-timescale processes and facilitating insight into complex dynamical processes. Success laid a foundation for broadening the scope of Folding@home to address other functionally relevant conformational changes, such as receptor signaling, enzyme dynamics, and ligand binding. Continued algorithmic advances, hardware developments such as GPU-based computing, and the growing scale of Folding@home have enabled the project to focus on new areas where massively parallel sampling can be impactful. While previous work sought to expand toward larger proteins with slower conformational changes, new work focuses on large-scale comparative studies of different protein sequences and chemical compounds to better understand biology and inform the development of small molecule drugs.
Progress on these fronts enabled the community to pivot quickly in response to the COVID-19 pandemic, expanding to become the world's first exascale computer and deploying this massive resource to provide insight into the inner workings of the SARS-CoV-2 virus and aid the development of new antivirals. This success provides a glimpse of what's to come as exascale supercomputers come online, and Folding@home continues its work.

Submitted 15 March, 2023; originally announced March 2023.

Comments: 24 pages, 6 figures

arXiv:1910.10675 [pdf, other]

Classical Quantum Optimization with Neural Network Quantum States

Authors: Joseph Gomes, Keri A. McKiernan, Peter Eastman, Vijay S. Pande

Abstract: The classical simulation of quantum systems typically requires exponential resources. Recently, the introduction of a machine learning-based wavefunction ansatz has led to the ability to solve the quantum many-body problem in regimes that had previously been intractable for existing exact numerical methods. Here, we demonstrate the utility of the variational representation of quantum states based on artificial neural networks for performing quantum optimization. We show empirically that this methodology achieves high approximation ratio solutions with polynomial classical computing resources for a range of instances of the Maximum Cut (MaxCut) problem whose solutions have been encoded into the ground state of quantum many-body systems up to and including 256 qubits.

Submitted 23 October, 2019; originally announced October 2019.

Comments: Second Workshop on Machine Learning and the Physical Sciences (NeurIPS 2019), Vancouver, Canada

arXiv:1908.00971 [pdf]

Physical machine learning outperforms "human learning" in Quantum Chemistry

Authors: Anton V. Sinitskiy, Vijay S. Pande

Abstract: Two types of approaches to modeling molecular systems have demonstrated high practical efficiency. Density functional theory (DFT), the most widely used quantum chemical method, is a physical approach predicting energies and electron densities of molecules. Recently, numerous papers on machine learning (ML) of molecular properties have also been published. ML models greatly outperform DFT in terms of computational costs, and may even reach comparable accuracy, but they are missing physicality - a direct link to Quantum Physics - which limits their applicability. Here, we propose an approach that combines the strong sides of DFT and ML, namely, physicality and low computational cost. By generalizing the famous Hohenberg-Kohn theorems, we derive general equations for exact electron densities and energies that can naturally guide applications of ML in Quantum Chemistry. Based on these equations, we build a deep neural network that can compute electron densities and energies of a wide range of organic molecules not only much faster, but also closer to exact physical values than current versions of DFT. In particular, we reached a mean absolute error in energies of molecules with up to eight non-hydrogen atoms as low as 0.9 kcal/mol relative to CCSD(T) values, noticeably lower than those of DFT (down to ~3 kcal/mol on the same set of molecules) and ML (down to ~1.5 kcal/mol) methods.
A simultaneous improvement in the accuracy of predictions of electron densities and energies suggests that the proposed approach describes the physics of molecules better than DFT functionals developed by "human learning" earlier. Thus, physics-based ML offers exciting opportunities for modeling, with high-theory-level quantum chemical accuracy, of much larger molecular systems than currently possible.

Submitted 27 February, 2020; v1 submitted 1 August, 2019; originally announced August 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1809.02723

arXiv:1907.03041 [pdf]

Predicting Gene Expression Between Species with Neural Networks

Authors: Peter Eastman, Vijay S. Pande

Abstract: We train a neural network to predict human gene expression levels based on experimental data for rat cells. The network is trained with paired human/rat samples from the Open TG-GATEs database, where paired samples were treated with the same compound at the same dose. When evaluated on a test set of held out compounds, the network successfully predicts human expression levels. On the majority of the test compounds, the list of differentially expressed genes determined from predicted expression levels agrees well with the list of differentially expressed genes determined from actual human experimental data.

Submitted 5 July, 2019; originally announced July 2019.

Comments: 12 pages, 5 figures

arXiv:1903.11789 [pdf, other]

Step Change Improvement in ADMET Prediction with PotentialNet Deep Featurization

Authors: Evan N. Feinberg, Robert Sheridan, Elizabeth Joshi, Vijay S. Pande, Alan C. Cheng

Abstract: The Absorption, Distribution, Metabolism, Elimination, and Toxicity (ADMET) properties of drug candidates are estimated to account for up to 50% of all clinical trial failures. Predicting ADMET properties has therefore been of great interest to the cheminformatics and medicinal chemistry communities in recent decades. Traditional cheminformatics approaches, whether the learner is a random forest or a deep neural network, leverage fixed fingerprint feature representations of molecules. In contrast, in this paper, we learn the features most relevant to each chemical task at hand by representing each molecule explicitly as a graph, where each node is an atom and each edge is a bond. By applying graph convolutions to this explicit molecular representation, we achieve, to our knowledge, unprecedented accuracy in prediction of ADMET properties. By challenging our methodology with rigorous cross-validation procedures and prospective analyses, we show that deep featurization better enables molecular predictors to not only interpolate but also extrapolate to new regions of chemical space.

Submitted 28 March, 2019; originally announced March 2019.

Comments: 41 pages

arXiv:1902.00060 [pdf]

Predicting Toxicity from Gene Expression with Neural Networks

Authors: Peter Eastman, Vijay S. Pande

Abstract: We train a neural network to predict chemical toxicity based on gene expression data. The input to the network is a full expression profile collected either in vitro from cultured cells or in vivo from live animals. The output is a set of fine grained predictions for the presence of a variety of pathological effects in treated animals. When trained on the Open TG-GATEs database it produces good results, outperforming classical models trained on the same data. This is a promising approach for efficiently screening chemicals for toxic effects, and for more accurately evaluating drug candidates based on preclinical data.

Submitted 31 January, 2019; originally announced February 2019.

Comments: 12 pages, 2 figures, 4 tables

arXiv:1809.02723 [pdf]

Deep Neural Network Computes Electron Densities and Energies of a Large Set of Organic Molecules Faster than Density Functional Theory (DFT)

Authors: Anton V. Sinitskiy, Vijay S. Pande

Abstract: Density functional theory (DFT) is one of the main methods in Quantum Chemistry that offers an attractive trade-off between the cost and accuracy of quantum chemical computations. The electron density plays a key role in DFT. In this work, we explore whether machine learning - more specifically, deep neural networks (DNNs) - can be trained to predict electron densities faster than DFT. First, we choose a practically efficient combination of a DFT functional and a basis set (PBE0/pcS-3) and use it to generate a database of DFT solutions for more than 133,000 organic molecules from a previously published database QM9. Next, we train a DNN to predict electron densities and energies of such molecules. The only input to the DNN is an approximate electron density computed with a cheap quantum chemical method in a small basis set (HF/cc-VDZ). We demonstrate that the DNN successfully learns differences in the electron densities arising both from electron correlation and small basis set artifacts in the HF computations. All qualitative features in density differences, including local minima on lone pairs, local maxima on nuclei, toroidal shapes around C-H and C-C bonds, complex shapes around aromatic and cyclopropane rings and CN group, etc. are captured by the DNN. Accuracy of energy predictions by the DNN is ~ 1 kcal/mol, on par with other models reported in the literature, while those models do not predict the electron density.
Computations with the DNN, including HF computations, take much less time than DFT computations (by a factor of ~20-30 for most QM9 molecules in the current version, and it is clear how it could be further improved).

Submitted 7 September, 2018; originally announced September 2018.

arXiv:1804.08206 [pdf]

doi 10.1016/j.bpj.2017.11.390

Binding Pathway of Opiates to $μ$ Opioid Receptors Revealed by Unsupervised Machine Learning

Authors: Amir Barati Farimani, Evan N. Feinberg, Vijay S. Pande

Abstract: Many important analgesics relieve pain by binding to the $μ$-Opioid Receptor ($μ$OR), which makes the $μ$OR among the most clinically relevant proteins of the G Protein Coupled Receptor (GPCR) family. Despite previous studies on the activation pathways of the GPCRs, the mechanism of opiate binding and the selectivity of $μ$OR are largely unknown. We performed extensive molecular dynamics (MD) simulation and analysis to find the selective allosteric binding sites of the $μ$OR and the path opiates take to bind to the orthosteric site. In this study, we predicted that the allosteric site is responsible for the attraction and selection of opiates. Using Markov state models and machine learning, we traced the pathway of opiates in binding to the orthosteric site, the main binding pocket. Our results have important implications in designing novel analgesics.

Submitted 22 April, 2018; originally announced April 2018.

Comments: 25 pages, 8 figures

arXiv:1803.08993 [pdf, other]

Deep Learning Phase Segregation

Authors: Amir Barati Farimani, Joseph Gomes, Rishi Sharma, Franklin L. Lee, Vijay S. Pande

Abstract: Phase segregation, the process by which the components of a binary mixture spontaneously separate, is a key process in the evolution and design of many chemical, mechanical, and biological systems. In this work, we present a data-driven approach for the learning, modeling, and prediction of phase segregation. A direct mapping between an initially dispersed, immiscible binary fluid and the equilibrium concentration field is learned by conditional generative convolutional neural networks. Concentration field predictions by the deep learning model conserve phase fraction, correctly predict phase transition, and reproduce area, perimeter, and total free energy distributions up to 98% accuracy.

Submitted 23 March, 2018; originally announced March 2018.

Comments: arXiv admin note: text overlap with arXiv:1709.02432

arXiv:1803.06449 [pdf, other]

doi 10.1063/1.5043303

Note: Variational Encoding of Protein Dynamics Benefits from Maximizing Latent Autocorrelation

Authors: Hannah K. Wayment-Steele, Vijay S. Pande

Abstract: As deep Variational Auto-Encoder (VAE) frameworks become more widely used for modeling biomolecular simulation data, we emphasize the capability of the VAE architecture to concurrently maximize the timescale of the latent space while inferring a reduced coordinate, which assists in finding slow processes according to the variational approach to conformational dynamics. We additionally provide evidence that the VDE framework (Hernández et al., 2017), which uses this autocorrelation loss along with a time-lagged reconstruction loss, obtains a variationally optimized latent coordinate in comparison with related loss functions. We thus recommend leveraging the autocorrelation of the latent space while training neural network models of biomolecular simulation data to better represent slow processes.

Submitted 16 March, 2018; originally announced March 2018.

arXiv:1803.04479 [pdf]

Machine Learning Harnesses Molecular Dynamics to Discover New $μ$ Opioid Chemotypes

Authors: Evan N. Feinberg, Amir Barati Farimani, Rajendra Uprety, Amanda Hunkele, Gavril W. Pasternak, Susruta Majumdar, Vijay S. Pande

Abstract: Computational chemists typically assay drug candidates by virtually screening compounds against crystal structures of a protein despite the fact that some targets, like the $μ$ Opioid Receptor and other members of the GPCR family, traverse many non-crystallographic states. We discover new conformational states of $μ$OR with molecular dynamics simulation and then machine learn ligand-structure relationships to predict opioid ligand function. These artificial intelligence models identified a novel $μ$ opioid chemotype.

Submitted 12 March, 2018; originally announced March 2018.

Comments: 28 pages, machine learning, computational biology, GPCRs, molecular dynamics, molecular docking, molecular simulation

arXiv:1803.04465 [pdf, other]

PotentialNet for Molecular Property Prediction

Authors: Evan N. Feinberg, Debnil Sur, Zhenqin Wu, Brooke E. Husic, Huanghao Mai, Yang Li, Saisai Sun, Jianyi Yang, Bharath Ramsundar, Vijay S. Pande

Abstract: The arc of drug discovery entails a multiparameter optimization problem spanning vast length scales. The key parameters range from solubility (angstroms) to protein-ligand binding (nanometers) to in vivo toxicity (meters). Through feature learning---instead of feature engineering---deep neural networks promise to outperform both traditional physics-based and knowledge-based machine learning models for predicting molecular properties pertinent to drug discovery. To this end, we present the PotentialNet family of graph convolutions. These models are specifically designed for and achieve state-of-the-art performance for protein-ligand binding affinity. We further validate these deep neural networks by setting new standards of performance in several ligand-based tasks. In parallel, we introduce a new metric, the Regression Enrichment Factor $EF_χ^{(R)}$, to measure the early enrichment of computational models for chemical data. Finally, we introduce a cross-validation strategy based on structural homology clustering that can more accurately measure model generalizability, which crucially distinguishes the aims of machine learning for drug discovery from standard machine learning tasks.

Submitted 22 October, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

Comments: 13 pages, 5 figures, 8 tables

arXiv:1803.03146 [pdf]

SentRNA: Improving computational RNA design by incorporating a prior of human design strategies

Authors: Jade Shi, Rhiju Das, Vijay S. Pande

Abstract: Solving the RNA inverse folding problem is a critical prerequisite to RNA design, an emerging field in bioengineering with a broad range of applications from reaction catalysis to cancer therapy. Although significant progress has been made in developing machine-based inverse RNA folding algorithms, current approaches still have difficulty designing sequences for large or complex targets. On the other hand, human players of the online RNA design game EteRNA have consistently shown superior performance in this regard, being able to readily design sequences for targets that are challenging for machine algorithms. Here we present a novel approach to the RNA design problem, SentRNA, a design agent consisting of a fully-connected neural network trained end-to-end using human-designed RNA sequences. We show that through this approach, SentRNA can solve complex targets previously unsolvable by any machine-based approach and achieve state-of-the-art performance on two separate challenging test sets. Our results demonstrate that incorporating human design strategies into a design algorithm can significantly boost machine performance and suggest a new paradigm for machine-based RNA design.

Submitted 5 March, 2019; v1 submitted 8 March, 2018; originally announced March 2018.

Comments: 27 pages (not including Supplementary Information), 9 figures, 7 tables

arXiv:1802.10548 [pdf, other]

Using Deep Learning for Segmentation and Counting within Microscopy Data

Authors: Carlos X. Hernández, Mohammad M. Sultan, Vijay S. Pande

Abstract: Cell counting is a ubiquitous, yet tedious task that would greatly benefit from automation. From basic biological questions to clinical trials, cell counts provide key quantitative feedback that drives research. Unfortunately, cell counting is most commonly a manual task and can be time-intensive. The task is made even more difficult due to overlapping cells, existence of multiple focal planes, and poor imaging quality, among other factors. Here, we describe a convolutional neural network approach, using a recently described feature pyramid network combined with a VGG-style neural network, for segmenting and subsequent counting of cells in a given microscopy image.

Submitted 28 February, 2018; originally announced February 2018.

arXiv:1802.10510 [pdf]

Automated design of collective variables using supervised machine learning

Authors: Mohammad M. Sultan, Vijay S. Pande

Abstract: Selection of appropriate collective variables for enhancing sampling of molecular simulations remains an unsolved problem in computational biophysics. In particular, picking initial collective variables (CVs) is particularly challenging in higher dimensions. Which atomic coordinates or transforms thereof from a list of thousands should one pick for enhanced sampling runs? How does a modeler even begin to pick starting coordinates for investigation? This remains true even in the case of simple two state systems and only increases in difficulty for multi-state systems. In this work, we solve the initial CV problem using a data-driven approach inspired by the field of supervised machine learning. In particular, we show how the decision functions in supervised machine learning (SML) algorithms can be used as initial CVs (SML_cv) for accelerated sampling. Using solvated alanine dipeptide and Chignolin mini-protein as our test cases, we illustrate how the distance to the Support Vector Machines' decision hyperplane, the output probability estimates from Logistic Regression, the outputs from deep neural network classifiers, and other classifiers may be used to reversibly sample slow structural transitions. We discuss the utility of other SML algorithms that might be useful for identifying CVs for accelerating molecular simulations.

Submitted 13 May, 2018; v1 submitted 28 February, 2018; originally announced February 2018.

Comments: 26 pages, 11 figures

arXiv:1802.05555 [pdf, ps, other]

doi 10.1063/1.5025826

Adaptive Boundaries in Multiscale Simulations

Authors: Jason A. Wagoner, Vijay S. Pande

Abstract: Combined-resolution simulations are an effective way to study molecular properties across a range of length- and time-scales. These simulations can benefit from adaptive boundaries that allow the high-resolution region to adapt (change size and/or shape) as the simulation progresses. The number of degrees of freedom required to accurately represent even a simple molecular process can vary by several orders of magnitude throughout the course of a simulation, and adaptive boundaries react to these changes to include an appropriate but not excessive amount of detail. Here, we derive the Hamiltonian and distribution function for such a molecular simulation. We also design an algorithm that can efficiently sample the boundary as a new coordinate of the system. We apply this framework to a mixed explicit/continuum representation of a peptide in solvent. We use this example to discuss the conditions necessary for a successful implementation of adaptive boundaries that is both efficient and accurate in reproducing molecular properties.

Submitted 3 April, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

arXiv:1801.00636 [pdf]

Transferable neural networks for enhanced sampling of protein dynamics

Authors: Mohammad M. Sultan, Hannah K. Wayment-Steele, Vijay S. Pande

Abstract: Variational auto-encoder frameworks have demonstrated success in reducing complex nonlinear dynamics in molecular simulation to a single non-linear embedding. In this work, we illustrate how this non-linear latent embedding can be used as a collective variable for enhanced sampling, and present a simple modification that allows us to rapidly perform sampling in multiple related systems. We first demonstrate our method is able to describe the effects of force field changes in capped alanine dipeptide after learning a model using AMBER99. We further provide a simple extension to variational dynamics encoders that allows the model to be trained in a more efficient manner on larger systems by encoding the outputs of a linear transformation using time-structure based independent component analysis (tICA). Using this technique, we show how such a model trained for one protein, the WW domain, can efficiently be transferred to perform enhanced sampling on a related mutant protein, the GTT mutation. This method shows promise for its ability to rapidly sample related systems using a single transferable collective variable and is generally applicable to sets of related simulations, enabling us to probe the effects of variation in increasingly large systems of biophysical interest.

Submitted 2 January, 2018; originally announced January 2018.

Comments: 20 pages, 10 figures

arXiv:1712.07704 [pdf, other]

Unsupervised learning of dynamical and molecular similarity using variance minimization

Authors: Brooke E. Husic, Vijay S. Pande

Abstract: In this report, we present an unsupervised machine learning method for determining groups of molecular systems according to similarity in their dynamics or structures using Ward's minimum variance objective function. We first apply the minimum variance clustering to a set of simulated tripeptides using the information theoretic Jensen-Shannon divergence between Markovian transition matrices in order to gain insight into how point mutations affect protein dynamics. Then, we extend the method to partition two chemoinformatic datasets according to structural similarity to motivate a train/validation/test split for supervised learning that avoids overfitting.
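
As a rough illustration of the Jensen-Shannon divergence mentioned in this abstract, the sketch below compares two row-stochastic transition matrices. The abstract does not say how the divergence is aggregated over matrix rows, so the unweighted row average in `matrix_js_divergence` is an assumption made here for illustration, and all function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Terms with p_i == 0 contribute zero by convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    It is symmetric and bounded above by ln(2) when measured in nats.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def matrix_js_divergence(t1, t2):
    """Assumed aggregate: unweighted mean of row-wise JS divergences.

    t1 and t2 are row-stochastic matrices (lists of rows) over the same states.
    """
    return sum(js_divergence(r1, r2) for r1, r2 in zip(t1, t2)) / len(t1)


# Two hypothetical 2-state transition matrices.
wild_type = [[0.9, 0.1], [0.2, 0.8]]
mutant = [[0.7, 0.3], [0.2, 0.8]]
print(matrix_js_divergence(wild_type, mutant))
```

Pairwise values computed this way could serve as the dissimilarities passed to a hierarchical clustering routine.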

Submitted 20 December, 2017; originally announced December 2017.

Comments: NIPS 2017 Workshop on Machine Learning for Molecules and Materials

arXiv:1711.08576 [pdf, other]

doi 10.1103/PhysRevE.97.062412

Variational Encoding of Complex Dynamics

Authors: Carlos X. Hernández, Hannah K. Wayment-Steele, Mohammad M. Sultan, Brooke E. Husic, Vijay S. Pande

Abstract: Often the analysis of time-dependent chemical and biophysical systems produces high-dimensional time-series data for which it can be difficult to interpret which individual features are most salient. While recent work from our group and others has demonstrated the utility of time-lagged co-variate models to study such systems, linearity assumptions can limit the compression of inherently nonlinear dynamics into just a few characteristic components. Recent work in the field of deep learning has led to the development of variational autoencoders (VAE), which are able to compress complex datasets into simpler manifolds. We present the use of a time-lagged VAE, or variational dynamics encoder (VDE), to reduce complex, nonlinear processes to a single embedding with high fidelity to the underlying dynamics. We demonstrate how the VDE is able to capture nontrivial dynamics in a variety of examples, including Brownian dynamics and atomistic protein folding. Additionally, we demonstrate a method for analyzing the VDE model, inspired by saliency mapping, to determine what features are selected by the VDE model to describe dynamics. The VDE presents an important step in applying techniques from deep learning to more accurately model and interpret complex biophysics.

Submitted 1 December, 2017; v1 submitted 23 November, 2017; originally announced November 2017.

Comments: Fixed typos and added references

Journal ref: Phys. Rev. E 97, 062412 (2018)

arXiv:1709.02432 [pdf, other]

Deep Learning the Physics of Transport Phenomena

Authors: Amir Barati Farimani, Joseph Gomes, Vijay S. Pande

Abstract: We have developed a new data-driven paradigm for the rapid inference, modeling and simulation of the physics of transport phenomena by deep learning. Using conditional generative adversarial networks (cGAN), we train models for the direct generation of solutions to steady state heat conduction and incompressible fluid flow purely on observation without knowledge of the underlying governing equations. Rather than using iterative numerical methods to approximate the solution of the constitutive equations, cGANs learn to directly generate the solutions to these phenomena, given arbitrary boundary conditions and domain, with high test accuracy (MAE$<$1\%) and state-of-the-art computational performance. The cGAN framework can be used to learn causal models directly from experimental observations where the underlying physical model is complex or unknown.

Submitted 7 September, 2017; originally announced September 2017.

arXiv:1708.08120 [pdf, other]

doi 10.1063/1.5002086

MSM lag time cannot be used for variational model selection

Authors: Brooke E. Husic, Vijay S. Pande

Abstract: The variational principle for conformational dynamics has enabled the systematic construction of Markov state models through the optimization of hyperparameters by approximating the transfer operator. In this note we discuss why lag time of the operator being approximated must be held constant in the variational approach.

Submitted 27 August, 2017; originally announced August 2017.

Journal ref: J. Chem. Phys. 2017, 147, 176101

arXiv:1708.03011 [pdf]

doi 10.1063/1.5005058

Theoretical restrictions on longest implicit timescales in Markov state models of biomolecular dynamics

Authors: Anton V. Sinitskiy, Vijay S. Pande

Abstract: Markov state models (MSMs) have been widely used to analyze computer simulations of various biomolecular systems. They can capture conformational transitions much slower than an average or maximal length of a single molecular dynamics (MD) trajectory from the set of trajectories used to build the MSM. A rule of thumb claiming that the slowest implicit timescale captured by an MSM should be comparable by the order of magnitude to the aggregate duration of all MD trajectories used to build this MSM has been known in the field. However, this rule has never been formally proved. In this work, we present analytical results for the slowest timescale in several types of MSMs, supporting the above rule. We conclude that the slowest implicit timescale equals the product of the aggregate sampling and four factors that quantify: (1) how much statistics on the conformational transitions corresponding to the longest implicit timescale is available, (2) how good the sampling of the destination Markov state is, (3) the gain in statistics from using a sliding window for counting transitions between Markov states, and (4) a bias in the estimate of the implicit timescale arising from finite sampling of the conformational transitions. We demonstrate that in many practically important cases all these four factors are on the order of unity, and we analyze possible scenarios that could lead to their significant deviation from unity. Overall, we provide for the first time analytical results on the slowest timescales captured by MSMs.
These results can guide further practical applications of MSMs to biomolecular dynamics and allow for higher computational efficiency of simulations.
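
For readers unfamiliar with the quantity discussed in this abstract, the sketch below computes the slowest "implicit" (more commonly called implied) timescale of a two-state MSM from the standard relation t = -tau / ln(lambda), where lambda is the second-largest eigenvalue of the transition matrix estimated at lag time tau. The two-state setup, the function name, and the numbers are illustrative assumptions and are not taken from the paper.

```python
import math


def two_state_implied_timescale(p_ab, p_ba, lag_time):
    """Slowest implied timescale of a two-state MSM.

    The transition matrix [[1 - p_ab, p_ab], [p_ba, 1 - p_ba]] has
    eigenvalues 1 and lam = 1 - p_ab - p_ba, and the implied timescale
    is -lag_time / ln(lam), in the same units as lag_time.
    """
    lam = 1.0 - p_ab - p_ba
    if not 0.0 < lam < 1.0:
        raise ValueError("second eigenvalue must lie strictly between 0 and 1")
    return -lag_time / math.log(lam)


# Hypothetical MSM estimated at a 10 ns lag time with rare transitions.
print(two_state_implied_timescale(p_ab=0.001, p_ba=0.004, lag_time=10.0))
```

Because the transition probabilities here are small, the resulting timescale is far longer than the lag time, which is the regime the abstract is concerned with.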

Submitted 9 August, 2017; originally announced August 2017.

arXiv:1703.10603 [pdf, other]

Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity

Authors: Joseph Gomes, Bharath Ramsundar, Evan N. Feinberg, Vijay S. Pande

Abstract: Empirical scoring functions based on either molecular force fields or cheminformatics descriptors are widely used, in conjunction with molecular docking, during the early stages of drug discovery to predict potency and binding affinity of a drug-like molecule to a given target. These models require expert-level knowledge of physical chemistry and biology to be encoded as hand-tuned parameters or features rather than allowing the underlying model to select features in a data-driven procedure. Here, we develop a general 3-dimensional spatial convolution operation for learning atomic-level chemical interactions directly from atomic coordinates and demonstrate its application to structure-based bioactivity prediction. The atomic convolutional neural network is trained to predict the experimentally determined binding affinity of a protein-ligand complex by direct calculation of the energy associated with the complex, protein, and ligand given the crystal structure of the binding pose. Non-covalent interactions present in the complex that are absent in the protein-ligand sub-structures are identified and the model learns the interaction strength associated with these features. We test our model by predicting the binding free energy of a subset of protein-ligand complexes found in the PDBBind dataset and compare with state-of-the-art cheminformatics and machine learning-based approaches. We find that all methods achieve experimental accuracy and that atomic convolutional networks either outperform or perform competitively with the cheminformatics based methods.
Unlike all previous protein-ligand prediction systems, atomic convolutional networks are end-to-end and fully-differentiable. They represent a new data-driven, physics-based deep learning model paradigm that offers a strong foundation for future improvements in structure-based bioactivity prediction.

Submitted 30 March, 2017; originally announced March 2017.

arXiv:1612.06319 [pdf]

Computationally Discovered Potentiating Role of Glycans on NMDA Receptors

Authors: Anton V. Sinitskiy, Nathaniel H. Stanley, David H. Hackos, Jesse E. Hanson, Benjamin D. Sellers, Vijay S. Pande

Abstract: N-methyl-D-aspartate receptors (NMDARs) are glycoproteins in the brain central to learning and memory. The effects of glycosylation on the structure and dynamics of NMDARs are largely unknown. In this work, we use extensive molecular dynamics simulations of GluN1 and GluN2B ligand binding domains (LBDs) of NMDARs to investigate these effects. Our simulations predict that intra-domain interactions involving the glycan attached to residue GluN1-N440 stabilize closed-clamshell conformations of the GluN1 LBD. The glycan on GluN2B-N688 shows a similar, though weaker, effect. Based on these results, and assuming the transferability of the results of LBD simulations to the full receptor, we predict that glycans at GluN1-N440 might play a potentiator role in NMDARs. To validate this prediction, we perform electrophysiological analysis of full-length NMDARs with a glycosylation-preventing GluN1-N440Q mutation, and demonstrate an increase in the glycine EC50 value. Overall, our results suggest an intramolecular potentiating role of glycans on NMDA receptors.

Submitted 19 December, 2016; originally announced December 2016.

arXiv:1610.01642 [pdf]

Learning Protein Dynamics with Metastable Switching Systems

Authors: Bharath Ramsundar, Vijay S. Pande

Abstract: We introduce a machine learning approach for extracting fine-grained representations of protein evolution from molecular dynamics datasets. Metastable switching linear dynamical systems extend standard switching models with a physically-inspired stability constraint. This constraint enables the learning of nuanced representations of protein dynamics that closely match physical reality. We derive an EM algorithm for learning, where the E-step extends the forward-backward algorithm for HMMs and the M-step requires the solution of large biconvex optimization problems. We construct an approximate semidefinite program solver based on the Frank-Wolfe algorithm and use it to solve the M-step. We apply our EM algorithm to learn accurate dynamics from large simulation datasets for the opioid peptide met-enkephalin and the proto-oncogene Src-kinase. Our learned models demonstrate significant improvements in temporal coherence over HMMs and standard switching models for met-enkephalin, and sample transition paths (possibly useful in rational drug design) for Src-kinase.

Submitted 5 October, 2016; originally announced October 2016.

arXiv:1602.08776 [pdf, other]

doi 10.1063/1.4974306

Identification of simple reaction coordinates from complex dynamics

Authors: Robert T. McGibbon, Brooke E. Husic, Vijay S. Pande

Abstract: Reaction coordinates are widely used throughout chemical physics to model and understand complex chemical transformations. We introduce a definition of the natural reaction coordinate, suitable for condensed phase and biomolecular systems, as a maximally predictive one-dimensional projection. We then show this criterion is uniquely satisfied by a dominant eigenfunction of an integral operator associated with the ensemble dynamics. We present a new sparse estimator for these eigenfunctions which can search through a large candidate pool of structural order parameters and build simple, interpretable approximations that employ only a small number of these order parameters. Example applications with a small molecule's rotational dynamics and simulations of protein conformational change and folding show that this approach can filter through statistical noise to identify simple reaction coordinates from complex dynamics.

Submitted 6 January, 2017; v1 submitted 28 February, 2016; originally announced February 2016.

Comments: 18 pages, 10 figures

arXiv:1504.01804 [pdf, other]

Efficient maximum likelihood parameterization of continuous-time Markov processes

Authors: Robert T. McGibbon, Vijay S. Pande

Abstract: Continuous-time Markov processes over finite state-spaces are widely used to model dynamical processes in many fields of natural and social science. Here, we introduce a maximum likelihood estimator for constructing such models from data observed at a finite time interval. This estimator is dramatically more efficient than prior approaches, enables the calculation of deterministic confidence intervals in all model parameters, and can easily enforce important physical constraints on the models such as detailed balance. We demonstrate and discuss the advantages of these models over existing discrete-time Markov models for the analysis of molecular dynamics simulations.

Submitted 30 June, 2015; v1 submitted 7 April, 2015; originally announced April 2015.

arXiv:1408.5446 [pdf, ps, other]

doi 10.1063/1.4895044

Perspective: Markov Models for Long-Timescale Biomolecular Dynamics

Authors: Christian R. Schwantes, Robert T. McGibbon, Vijay S. Pande

Abstract: Molecular dynamics simulations have the potential to provide atomic-level detail and insight to important questions in chemical physics that cannot be observed in typical experiments. However, simply generating a long trajectory is insufficient, as researchers must be able to transform the data in a simulation trajectory into specific scientific insights. Although this analysis step has often been taken for granted, it deserves further attention as large-scale simulations become increasingly routine. In this perspective, we discuss the application of Markov models to the analysis of large-scale biomolecular simulations. We draw attention to recent improvements in the construction of these models as well as several important open issues. In addition, we highlight recent theoretical advances that pave the way for a new generation of models of molecular kinetics.

Submitted 22 August, 2014; originally announced August 2014.

Comments: 7 pages

arXiv:1408.0255 [pdf, ps, other]

Efficient inference of protein structural ensembles

Authors: Thomas J. Lane, Christian R. Schwantes, Kyle A. Beauchamp, Vijay S. Pande

Abstract: It is becoming clear that traditional, single-structure models of proteins are insufficient for understanding their biological function. Here, we outline one method for inferring, from experiments, not only the most common structure a protein adopts (native state), but the entire ensemble of conformations the system can adopt. Such ensemble models are necessary to understand intrinsically disordered proteins, enzyme catalysis, and signaling. We suggest that the most difficult aspect of generating such a model will be finding a small set of configurations to accurately model structural heterogeneity and present one way to overcome this challenge.

Submitted 1 August, 2014; originally announced August 2014.

arXiv:1407.8083 [pdf, other]

doi 10.1063/1.4916292

Variational cross-validation of slow dynamical modes in molecular kinetics

Authors: Robert T. McGibbon, Vijay S. Pande

Abstract: Markov state models (MSMs) are a widely used method for approximating the eigenspectrum of the molecular dynamics propagator, yielding insight into the long-timescale statistical kinetics and slow dynamical modes of biomolecular systems. However, the lack of a unified theoretical framework for choosing between alternative models has hampered progress, especially for non-experts applying these methods to novel biological systems. Here, we consider cross-validation with a new objective function for estimators of these slow dynamical modes, a generalized matrix Rayleigh quotient (GMRQ), which measures the ability of a rank-$m$ projection operator to capture the slow subspace of the system. It is shown that a variational theorem bounds the GMRQ from above by the sum of the first $m$ eigenvalues of the system's propagator, but that this bound can be violated when the requisite matrix elements are estimated subject to statistical uncertainty. This overfitting can be detected and avoided through cross-validation. These results make it possible to construct Markov state models for protein dynamics in a way that appropriately captures the tradeoff between systematic and statistical errors.

Submitted 27 March, 2015; v1 submitted 30 July, 2014; originally announced July 2014.

Journal ref: J. Chem. Phys. 142, 124105 (2015)

arXiv:1405.1444 [pdf, other]

Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models

Authors: Robert T. McGibbon, Bharath Ramsundar, Mohammad M. Sultan, Gert Kiss, Vijay S. Pande

Abstract: We present a machine learning framework for modeling protein dynamics. Our approach uses L1-regularized, reversible hidden Markov models to understand large protein datasets generated via molecular dynamics simulations. Our model is motivated by three design principles: (1) the requirement of massive scalability; (2) the need to adhere to relevant physical law; and (3) the necessity of providing accessible interpretations, critical for both cellular biology and rational drug design. We present an EM algorithm for learning and introduce a model selection criterion based on the physical notion of convergence in relaxation timescales. We contrast our model with standard methods in biophysics and demonstrate improved robustness. We implement our algorithm on GPUs and apply the method to two large protein simulation datasets generated respectively on the NCSA Bluewaters supercomputer and the Folding@Home distributed computing network. Our analysis identifies the conformational dynamics of the ubiquitin protein critical to cellular signaling, and elucidates the stepwise activation mechanism of the c-Src kinase protein.

Submitted 6 May, 2014; originally announced May 2014.

Journal ref: Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014

arXiv:1305.0963 [pdf, other]

doi 10.1063/1.4823502

Probing the Origins of Two-State Folding

Authors: Thomas J. Lane, Christian R. Schwantes, Kyle A. Beauchamp, Vijay S. Pande

Abstract: Many protein systems fold in a two-state manner. Random models, however, rarely display two-state kinetics and thus such behavior should not be accepted as a default. To date, many theories for the prevalence of two-state kinetics have been presented, but none sufficiently explain the breadth of experimental observations. A model, making a minimum of assumptions, is introduced that suggests two-state behavior is likely for any system with an overwhelmingly populated native state. We show two-state folding is emergent and strengthened by increasing the occupancy population of the native state. Further, the model exhibits a hub-like behavior, with slow interconversions between unfolded states. Despite this, the unfolded state equilibrates quickly relative to the folding time. This apparent paradox is readily understood through this model. Finally, our results compare favorably with experimental measurements of protein folding rates as a function of chain length and Keq, and provide new insight into these results.

Submitted 4 May, 2013; originally announced May 2013.

arXiv:1301.4302 [pdf, other]

doi 10.1371/journal.pone.0078606

Inferring the Rate-Length Law of Protein Folding

Authors: Thomas J. Lane, Vijay S. Pande

Abstract: We investigate the rate-length scaling law of protein folding, a key undetermined scaling law in the analytical theory of protein folding. We demonstrate that chain length is a dominant factor determining folding times, and that the unambiguous determination of the way chain length correlates with folding times could provide key mechanistic insight into the folding process. Four specific proposed laws (power law, exponential, and two stretched exponentials) are tested against one another, and it is found that the power law best explains the data. At the same time, the fit power law results in rates that are very fast, nearly unreasonably so in a biological context. We show that any of the proposed forms are viable, conclude that more data is necessary to unequivocally infer the rate-length law, and that such data could be obtained through a small number of protein folding experiments on large protein domains.

Submitted 18 January, 2013; originally announced January 2013.

arXiv:1209.5944 [pdf, ps, other]

doi 10.1063/1.4769301

Reducing the effect of Metropolization on mixing times in molecular dynamics simulations

Authors: Jason A. Wagoner, Vijay S. Pande

Abstract: Molecular dynamics algorithms are subject to some amount of error dependent on the size of the time step that is used. This error can be corrected by periodically updating the system with a Metropolis criterion, where the integration step is treated as a selection probability for candidate state generation. Such a method, closely related to generalized hybrid Monte Carlo (GHMC), satisfies the balance condition by imposing a reversal of momenta upon candidate rejection. In the present study, we demonstrate that such momentum reversals can have a significant impact on molecular kinetics and extend the time required for system decorrelation, resulting in an order of magnitude increase in the integrated autocorrelation times of molecular variables for the worst cases. We present a simple method, referred to as reduced-flipping GHMC, that uses the information of the previous, current, and candidate states to reduce the probability of momentum flipping following candidate rejection while rigorously satisfying the balance condition. This method is a simple modification to traditional, automatic-flipping, GHMC methods and significantly mitigates the impact of such algorithms on molecular kinetics and simulation mixing times.

Submitted 24 September, 2012; originally announced September 2012.

Comments: 6 pages, 3 figures

arXiv:1108.2304 [pdf, other]

A robust approach to estimating rates from time-correlation functions

Authors: John D. Chodera, Phillip J. Elms, William C. Swope, Jan-Hendrik Prinz, Susan Marqusee, Carlos Bustamante, Frank Noé, Vijay S. Pande

Abstract: While seemingly straightforward in principle, the reliable estimation of rate constants is seldom easy in practice. Numerous issues, such as the complication of poor reaction coordinates, cause obvious approaches to yield unreliable estimates. When a reliable order parameter is available, the reactive flux theory of Chandler allows the rate constant to be extracted from the plateau region of an appropriate reactive flux function. However, when applied to real data from single-molecule experiments or molecular dynamics simulations, the rate can sometimes be difficult to extract due to the numerical differentiation of a noisy empirical correlation function or difficulty in locating the plateau region at low sampling frequencies. We present a modified version of this theory which does not require numerical derivatives, allowing rate constants to be robustly estimated from the time-correlation function directly. We compare these approaches using single-molecule force spectroscopy measurements of an RNA hairpin.

Submitted 10 August, 2011; originally announced August 2011.

arXiv:1105.0710 [pdf, other]

doi 10.1103/PhysRevLett.107.098102

Splitting probabilities as a test of reaction coordinate choice in single-molecule experiments

Authors: John D. Chodera, Vijay S. Pande

Abstract: To explain the observed dynamics in equilibrium single-molecule measurements of biomolecules, the experimental observable is often chosen as a putative reaction coordinate along which kinetic behavior is presumed to be governed by diffusive dynamics. Here, we invoke the splitting probability as a test of the suitability of such a proposed reaction coordinate. Comparison of the observed splitting probability with that computed from the kinetic model provides a simple test to reject poor reaction coordinates. We demonstrate this test for a force spectroscopy measurement of a DNA hairpin.
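
To make the "observed splitting probability" in this abstract concrete, the sketch below estimates it from a discretized one-dimensional trajectory: for a chosen intermediate bin, it counts the fraction of visits after which the trajectory reaches boundary `b` before boundary `a`. The binning into integer states, the function name, and the toy trajectory are assumptions made for illustration, and the paper's own estimator may differ.

```python
def empirical_splitting_probability(traj, a, b, x):
    """Fraction of visits to bin x that reach bin b before bin a.

    traj is a sequence of discrete bin indices. Visits to x that are
    still unresolved when the trajectory ends are ignored. Returns None
    if no visit to x is ever resolved.
    """
    reached_b = 0  # visits to x that were followed by b before a
    resolved = 0   # visits to x whose outcome is known
    pending = 0    # visits to x still waiting for a or b
    for state in traj:
        if state == x:
            pending += 1
        elif state == b:
            reached_b += pending
            resolved += pending
            pending = 0
        elif state == a:
            resolved += pending
            pending = 0
    return reached_b / resolved if resolved else None


# Toy trajectory over bins 0..3 with boundaries a = 0 and b = 3.
toy = [1, 2, 1, 0, 1, 2, 3]
print(empirical_splitting_probability(toy, a=0, b=3, x=1))
```

Repeating the estimate for every intermediate bin gives a curve that could then be compared with the splitting probability predicted by a diffusive kinetic model along the same coordinate.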

Submitted 13 July, 2011; v1 submitted 3 May, 2011; originally announced May 2011.

Journal ref: Phys. Rev. Lett., 107:098102 (2011)

arXiv:1007.0315 [pdf, ps, other]

doi 10.1103/PhysRevLett.105.198101

A simple theory of protein folding kinetics

Authors: Vijay S. Pande

Abstract: We present a simple model of protein folding dynamics that captures key qualitative elements recently seen in all-atom simulations. The goals of this theory are to serve as a simple formalism for gaining deeper insight into the physical properties seen in detailed simulations as well as to serve as a model to easily compare why these simulations suggest a different kinetic mechanism than previous simple models. Specifically, we find that non-native contacts play a key role in determining the mechanism, which can shift dramatically as the energetic strength of non-native interactions is changed. For protein-like non-native interactions, our model finds that the native state is a kinetic hub, connecting the strength of relevant interactions directly to the nature of folding kinetics.

Submitted 2 July, 2010; originally announced July 2010.

arXiv:0910.0505 [pdf, other]

Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU

Authors: Imran S. Haque, Vijay S. Pande

Abstract: Graphics processing units (GPUs) are gaining widespread use in computational chemistry and other scientific simulation contexts because of their huge performance advantages relative to conventional CPUs. However, the reliability of GPUs in error-intolerant applications is largely unproven. In particular, a lack of error checking and correcting (ECC) capability in the memory subsystems of graphics cards has been cited as a hindrance to the acceptance of GPUs as high-performance coprocessors, but the impact of this design has not been previously quantified. In this article we present MemtestG80, our software for assessing memory error rates on NVIDIA G80 and GT200-architecture-based graphics cards. Furthermore, we present the results of a large-scale assessment of GPU error rate, conducted by running MemtestG80 on over 20,000 hosts on the Folding@home distributed computing network. Our control experiments on consumer-grade and dedicated-GPGPU hardware in a controlled environment found no errors. However, our survey over cards on Folding@home finds that, in their installed environments, two-thirds of tested GPUs exhibit a detectable, pattern-sensitive rate of memory soft errors. We demonstrate that these errors persist after controlling for overclocking and environmental proxies for temperature, but depend strongly on board architecture.

Submitted 13 November, 2009; v1 submitted 2 October, 2009; originally announced October 2009.

Comments: 10 pages, 5 figures. For associated code and binaries, see https://simtk.org/home/memtest . Poster version to be presented at Supercomputing 2009. Version 1 of submission contained erroneous analysis of transaction coalescing on GT200

ACM Class: B.3.4

arXiv:0901.0866 [pdf]

Folding@Home and Genome@Home: Using distributed computing to tackle previously intractable problems in computational biology

Authors: Stefan M. Larson, Christopher D. Snow, Michael Shirts, Vijay S. Pande

Abstract: For decades, researchers have been applying computer simulation to address problems in biology. However, many of these "grand challenges" in computational biology, such as simulating how proteins fold, remained unsolved due to their great complexity. Indeed, even to simulate the fastest folding protein would require decades on the fastest modern CPUs. Here, we review novel methods to fundamentally speed such previously intractable problems using a new computational paradigm: distributed computing. By efficiently harnessing tens of thousands of computers throughout the world, we have been able to break previous computational barriers. However, distributed computing brings new challenges, such as how to efficiently divide a complex calculation among many PCs that are connected by relatively slow networking. Moreover, even if the challenge of accurately reproducing reality can be conquered, a new challenge emerges: how can we take the results of these simulations (typically tens to hundreds of gigabytes of raw data) and gain some insight into the questions at hand? This challenge of the analysis of the sea of data resulting from large-scale simulation will likely remain for decades to come.

Submitted 7 January, 2009; originally announced January 2009.

arXiv:0802.0522 [pdf, other]

doi 10.1529/biophysj.108.131037

Potential for modulation of the hydrophobic effect inside chaperonins

Authors: Jeremy L. England, Vijay S. Pande

Abstract: Despite the spontaneity of some in vitro protein folding reactions, native folding in vivo often requires the participation of barrel-shaped multimeric complexes known as chaperonins. Although it has long been known that chaperonin substrates fold upon sequestration inside the chaperonin barrel, the precise mechanism by which confinement within this space facilitates folding remains unknown. In this study, we examine the possibility that the chaperonin mediates a favorable reorganization of the solvent for the folding reaction. We begin by discussing the effect of electrostatic charge on solvent-mediated hydrophobic forces in an aqueous environment. Based on these initial physical arguments, we construct a simple, phenomenological theory for the thermodynamics of density and hydrogen bond order fluctuations in liquid water. Within the framework of this model, we investigate the effect of confinement within a chaperonin-like cavity on the configurational free energy of water by calculating solvent free energies for cavities corresponding to the different conformational states in the ATP-driven catalytic cycle of the prokaryotic chaperonin GroEL. Our findings suggest that one function of chaperonins may be to trap unfolded proteins and subsequently expose them to a micro-environment in which the hydrophobic effect, a crucial thermodynamic driving force for folding, is enhanced.

Submitted 4 February, 2008; originally announced February 2008.

arXiv:cond-mat/9609062 [pdf, ps, other]

doi 10.1103/PhysRevLett.77.3565

Freezing Transition of Compact Polyampholytes

Authors: Vijay S. Pande, Alexander Yu. Grosberg, Chris Joerg, Mehran Kardar, Toyoichi Tanaka

Abstract: Polyampholytes (PAs) are heteropolymers with long range Coulomb interactions. Unlike polymers with short range forces, PA energy levels have non-vanishing correlations and are thus very different from the Random Energy Model (REM). Nevertheless, if charges in the PA globule are screened as in a regular plasma, PAs freeze in REM fashion. Our results shed light on the potential role of Coulomb interactions in folding and evolution of proteins, which are weakly charged PAs, in particular making connection with the finding that sequences of charged amino acids in proteins are not random.

Submitted 5 September, 1996; originally announced September 1996.

Comments: 4 pages, 3 eps figures

arXiv:cond-mat/9604147 [pdf, ps, other]

doi 10.1103/PhysRevLett.76.3987

Is Heteropolymer Freezing Well Described by the Random Energy Model?

Authors: Vijay S. Pande, Alexander Yu. Grosberg, Chris Joerg, Toyoichi Tanaka

Abstract: It is widely held that the Random Energy Model (REM) describes the freezing transition of a variety of types of heteropolymers. We demonstrate that the hallmark property of REM, statistical independence of the energies of states over disorder, is violated in different ways for models commonly employed in heteropolymer freezing studies. The implications for proteins are also discussed.

Submitted 23 April, 1996; originally announced April 1996.

Comments: 4 pages, 3 eps figures. To appear in Physical Review Letters, May 1996

arXiv:cond-mat/9510123 [pdf, ps, other]

doi 10.1063/1.470009

How Accurate Must Potentials Be for Successful Modeling of Protein Folding?

Authors: Vijay S. Pande, Alexander Yu. Grosberg, Toyoichi Tanaka

Abstract: Protein sequences are believed to have been selected to provide the stability of, and reliable renaturation to, an encoded unique spatial fold. In recently proposed theoretical schemes, this selection is modeled as "minimal frustration," or "optimal energy" of the desirable target conformation over all possible sequences, such that the "design" of the sequence is governed by the interactions between monomers. With replica mean field theory, we examine the possibility to reconstruct the renaturation, or freezing transition, of the "designed" heteropolymer given the inevitable errors in the determination of interaction energies, that is, the difference between sets (matrices) of interactions governing chain design and conformations, respectively. We find that the possibility of folding to the designed conformation is controlled by the correlations of the elements of the design and renaturation interaction matrices; unlike random heteropolymers, the ground state of designed heteropolymers is sufficiently stable, such that even a substantial error in the interaction energy should still yield correct renaturation.

Submitted 20 October, 1995; originally announced October 1995.

Comments: 28 pages, 3 postscript figures; tared, compressed, uuencoded

arXiv:cond-mat/9412006 [pdf, ps, other]

doi 10.1103/PhysRevE.51.3381

Freezing Transition of Random Heteropolymers Consisting of an Arbitrary Set of Monomers

Authors: Vijay S. Pande, Alexander Yu. Grosberg, Toyoichi Tanaka

Abstract: Mean field replica theory is employed to analyze the freezing transition of random heteropolymers comprised of an arbitrary number ($q$) of types of monomers. Our formalism assumes that interactions are short range and heterogeneity comes only from pairwise interactions, which are defined by an arbitrary $q \times q$ matrix. We show that, in general, there exists a freezing transition from a random globule, in which the thermodynamic equilibrium is comprised of an essentially infinite number of polymer conformations, to a frozen globule, in which the equilibrium ensemble is dominated by one or very few conformations. We also examine some special cases of interaction matrices to analyze the relationship between the freezing transition and the nature of interactions involved.

Submitted 1 December, 1994; originally announced December 1994.

Comments: 30 pages, 1 postscript figure
