-
What geometrically constrained folding models can tell us about real-world protein contact maps
Authors:
Nora Molkenthin,
J. J. Güven,
Steffen Mühle,
Antonia S. J. S. Mey
Abstract:
The mechanisms by which a protein's 3D structure can be determined based on its amino acid sequence have long been one of the key mysteries of biophysics. Often, simplistic models, such as those derived from geometric constraints, capture bulk real-world 3D protein-protein properties well. One approach is using protein contact maps to better understand proteins' properties. Here, we investigate the emergent behaviour of contact maps for different geometrically constrained models and real-world protein systems. We derive an analytical approximation for the distribution of model amino acid distances, $s$, by means of a mean-field approach. This approximation is then validated for simulations using a 2D and 3D random interaction model, as well as from contact maps of real-world protein data. Using data from the RCSB Protein Data Bank (PDB) and AlphaFold 2 database, the analytical approximation is fitted to protein chain lengths of $L\approx100$, $L\approx200$, and $L\approx300$. While a universal scaling behaviour for protein chains of different lengths could not be deduced, we present evidence that the amino acid distance distributions can be attributed to geometric constraints of protein chains in bulk and that amino acid sequences play only a secondary role.
Submitted 29 December, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
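The contact-map idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' model or code. The function names, the 8.0 distance cutoff, and the toy coordinates are all assumptions, as is the reading of $s$ as the sequence separation $|i-j|$ between two residues in contact.

```python
import numpy as np


def contact_map(coords, cutoff=8.0):
    """Return a boolean L x L matrix, True where two residues lie within `cutoff`.

    `cutoff` is an assumed threshold in the same units as `coords`.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances via broadcasting.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    cmap = dist < cutoff
    np.fill_diagonal(cmap, False)  # a residue is not in contact with itself
    return cmap


def separation_counts(cmap):
    """counts[s] = number of contacts between residues i < j with j - i = s."""
    n_res = cmap.shape[0]
    i, j = np.nonzero(np.triu(cmap, k=1))  # count each contact once
    return np.bincount(j - i, minlength=n_res)


# Toy chain (assumption): four residues on a straight line, 3.8 units apart.
coords = np.array([[0.0, 0.0, 0.0],
                   [3.8, 0.0, 0.0],
                   [7.6, 0.0, 0.0],
                   [11.4, 0.0, 0.0]])
cmap = contact_map(coords)
counts = separation_counts(cmap)
print(counts)
```

Normalising `counts` by its sum would give an empirical distribution over $s$ that could be compared with an analytical form.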
-
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks
Authors:
David F. Hahn,
Christopher I. Bayly,
Hannah E. Bruce Macdonald,
John D. Chodera,
Vytautas Gapsys,
Antonia S. J. S. Mey,
David L. Mobley,
Laura Perez Benito,
Christina E. M. Schindler,
Gary Tresadern,
Gregory L. Warren
Abstract:
Free energy calculations are rapidly becoming indispensable in structure-enabled drug discovery programs. As new methods, force fields, and implementations are developed, assessing their expected accuracy on real-world systems (benchmarking) becomes critical to provide users with an assessment of the accuracy expected when these methods are applied within their domain of applicability, and developers with a way to assess the expected impact of new methodologies. These assessments require construction of a benchmark - a set of well-prepared, high quality systems with corresponding experimental measurements designed to ensure the resulting calculations provide a realistic assessment of expected performance when these methods are deployed within their domains of applicability. To date, the community has not yet adopted a common standardized benchmark, and existing benchmark reports suffer from a myriad of issues, including poor data quality, limited statistical power, and statistically deficient analyses, all of which can conspire to produce benchmarks that are poorly predictive of real-world performance. Here, we address these issues by presenting guidelines for (1) curating experimental data to develop meaningful benchmark sets, (2) preparing benchmark inputs according to best practices to facilitate widespread adoption, and (3) analyzing the resulting predictions to enable statistically meaningful comparisons among methods and force fields.
Submitted 12 November, 2021; v1 submitted 13 May, 2021;
originally announced May 2021.
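One common way to make the comparisons mentioned in point (3) statistically meaningful is to report an error metric together with a bootstrap confidence interval. The abstract does not specify this procedure, so the sketch below is an assumption, not the authors' protocol. The function names are hypothetical and the numbers are synthetic placeholders, not measured or computed affinities.

```python
import numpy as np


def rmse(pred, expt):
    """Root-mean-square error between predictions and experimental values."""
    pred = np.asarray(pred, dtype=float)
    expt = np.asarray(expt, dtype=float)
    return float(np.sqrt(np.mean((pred - expt) ** 2)))


def bootstrap_ci(pred, expt, stat=rmse, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat`, resampling ligands with replacement."""
    pred = np.asarray(pred, dtype=float)
    expt = np.asarray(expt, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(pred)
    samples = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample (pred, expt) pairs together
        samples[b] = stat(pred[idx], expt[idx])
    low, high = np.percentile(samples, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return stat(pred, expt), float(low), float(high)


# Synthetic placeholder values (assumption), nominally in kcal/mol.
expt = [-9.1, -8.4, -10.2, -7.7, -8.9, -9.6]
pred = [-8.6, -8.9, -9.5, -8.1, -9.4, -9.0]
point, low, high = bootstrap_ci(pred, expt)
print(f"RMSE = {point:.2f} (95% CI: {low:.2f} to {high:.2f})")
```

Two methods evaluated on the same set can then be compared by whether their intervals overlap, or more directly by bootstrapping the difference in the metric.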
-
Statistically optimal analysis of state-discretized trajectory data from multiple thermodynamic states
Authors:
Hao Wu,
Antonia S. J. S. Mey,
Edina Rosta,
Frank Noé
Abstract:
We propose a discrete transition-based reweighting analysis method (dTRAM) for analyzing configuration-space-discretized simulation trajectories produced at different thermodynamic states (temperatures, Hamiltonians, etc.). dTRAM provides maximum-likelihood estimates of stationary quantities (probabilities, free energies, expectation values) at any thermodynamic state. In contrast to the weighted histogram analysis method (WHAM), dTRAM does not require data to be sampled from global equilibrium, and can thus produce superior estimates for enhanced sampling data such as parallel/simulated tempering, replica exchange, umbrella sampling, or metadynamics. In addition, dTRAM provides optimal estimates of Markov state models (MSMs) from the discretized state-space trajectories at all thermodynamic states. Under suitable conditions, these MSMs can be used to calculate kinetic quantities (e.g. rates, timescales). In the limit of a single thermodynamic state, dTRAM estimates a maximum likelihood reversible MSM, while in the limit of uncorrelated sampling data, dTRAM is identical to WHAM. dTRAM is thus a generalization of both estimators.
Submitted 30 November, 2014; v1 submitted 11 November, 2014;
originally announced November 2014.
-
xTRAM: Estimating equilibrium expectations from time-correlated simulation data at multiple thermodynamic states
Authors:
Antonia S. J. S. Mey,
Hao Wu,
Frank Noé
Abstract:
Computing the equilibrium properties of complex systems, such as free energy differences, is often hampered by rare events in the dynamics. Enhanced sampling methods may be used in order to speed up sampling by, for example, using high temperatures, as in parallel tempering, or simulating with a biasing potential such as in the case of umbrella sampling. The equilibrium properties of the thermodynamic state of interest (e.g., lowest temperature or unbiased potential) can be computed using reweighting estimators such as the weighted histogram analysis method (WHAM) or the multistate Bennett acceptance ratio (MBAR). WHAM and MBAR produce unbiased estimates only if the simulation samples from the global equilibria at their respective thermodynamic states, a requirement that can be prohibitively expensive for some simulations such as a large parallel tempering ensemble of an explicitly solvated biomolecule. Here, we introduce the transition-based reweighting analysis method (TRAM), a class of estimators that exploit ideas from Markov modeling and only require the simulation data to be in local equilibrium within subsets of the configuration space. We formulate the expanded TRAM (xTRAM) estimator that is shown to be asymptotically unbiased and a generalization of MBAR. Using four exemplary systems of varying complexity, we demonstrate the improved convergence (ranging from a twofold improvement to several orders of magnitude) of xTRAM in comparison to a direct counting estimator and MBAR, with respect to the invested simulation effort. Lastly, we introduce a random-swapping simulation protocol that can be used with xTRAM, gaining orders-of-magnitude advantages over simulation protocols that require the constraint of sampling from a global equilibrium.
Submitted 10 February, 2015; v1 submitted 1 July, 2014;
originally announced July 2014.
-
Rare-event trajectory ensemble analysis reveals metastable dynamical phases in lattice proteins
Authors:
Antonia S. J. S. Mey,
Phillip L. Geissler,
Juan P. Garrahan
Abstract:
We explore the dynamical large-deviations of a lattice heteropolymer model of a protein by means of path sampling of trajectories. We uncover the existence of non-equilibrium dynamical phase-transitions in ensembles of trajectories between active and inactive dynamical phases, whose nature depends on properties of the interaction potential. When the full heterogeneity of interactions due to the amino-acid sequence is preserved, as in a fully interacting model or in a heterogeneous version of the Gō model where only native interactions are considered, the transition is between the equilibrium native state and a highly native but kinetically trapped state. In contrast, for the homogeneous Gō model, where there is a single native energy and the sequence plays no role, the dynamical transition is a direct consequence of the static bi-stability between unfolded and native states. In the heterogeneous case the native-active and native-inactive states, despite their static similarity, have widely varying dynamical properties, and the transition between them occurs even in lattice proteins whose sequences are designed to make them optimal folders.
Submitted 24 May, 2013;
originally announced May 2013.