-
Quantitative and Predictive Folding Models from Limited Single-Molecule Data Using Simulation-Based Inference
Authors:
Lars Dingeldein,
Aaron Lyons,
Pilar Cossio,
Michael Woodside,
Roberto Covino
Abstract:
The study of biomolecular folding has been greatly advanced by single-molecule force spectroscopy (SMFS), which enables the observation of the dynamics of individual molecules. However, extracting quantitative models of fundamental properties such as folding landscapes from SMFS data is very challenging due to instrumental noise, linker artifacts, and the inherent stochasticity of the process, often requiring extensive datasets and complex calibration experiments. Here, we introduce a framework based on simulation-based inference (SBI) that overcomes these limitations by integrating physics-based modeling with deep learning. We apply this framework to analyze constant-force measurements of a DNA hairpin. From a single, short experimental trajectory of only two seconds, we successfully reconstruct the hairpin's free energy landscape and folding dynamics, obtaining results that are in close agreement with established deconvolution methods that require approximately 100 times more data. Furthermore, the Bayesian nature of this approach robustly quantifies uncertainties for inferred parameter values, including the free-energy profile, diffusion coefficients, and linker stiffness, without needing independent measurements of instrumental properties. The inferred model is predictive, generating simulated trajectories that quantitatively reproduce the thermodynamic and kinetic properties of the experimental data. This work establishes SBI as a highly efficient and powerful tool for analyzing single-molecule experiments. The ability to derive statistically robust models from minimal datasets is crucial for investigating complex biomolecular systems where extensive data collection is impractical or impossible. Consequently, our SBI framework enables the rigorous quantitative analysis of previously intractable biomolecular systems, paving the way for novel applications of SMFS.
Submitted 4 August, 2025;
originally announced August 2025.
-
Understanding Reaction Mechanisms from Start to Finish
Authors:
Rik S. Breebaart,
Gianmarco Lazzeri,
Roberto Covino,
Peter G. Bolhuis
Abstract:
Understanding mechanisms of rare but important events in complex molecular systems, such as protein folding or ligand (un)binding, requires accurately mapping transition paths from an initial to a final state. The committor is the ideal reaction coordinate for this purpose, but calculating it for high-dimensional, nonlinear systems has long been considered intractable. Here, we introduce an iterative path sampling strategy for computing the committor function for systems with high free energy barriers. We start with an initial guess to define isocommittor interfaces for transition interface sampling. The resulting path ensemble is then reweighted and used to train a neural network, yielding a more accurate committor model. This process is repeated until convergence, effectively solving the long-standing circular problem in enhanced sampling where a good reaction coordinate is needed to generate efficient sampling, and vice-versa. The final, converged committor model can be interrogated to extract mechanistic insights. We demonstrate the power of our method on a benchmark 2D potential and a more complex host-guest (un)binding process in explicit solvent.
Submitted 5 July, 2025;
originally announced July 2025.
-
Follow the MEP: Scalable Neural Representations for Minimum-Energy Path Discovery in Molecular Systems
Authors:
Magnus Petersen,
Gemma Roig,
Roberto Covino
Abstract:
Characterizing conformational transitions in physical systems remains a fundamental challenge, as traditional sampling methods struggle with the high-dimensional nature of molecular systems and high-energy barriers between stable states. These rare events often represent the most biologically significant processes, yet may require months of continuous simulation to observe. One way to understand the function and mechanics of such systems is through the minimum energy path (MEP), which represents the most probable transition pathway between stable states in the high-friction, low-temperature limit. We present a method that reformulates MEP discovery as a fast and scalable neural optimization problem. By representing paths as implicit neural representations and training with differentiable molecular force fields, our method discovers transition pathways without expensive sampling. Our approach scales to large biomolecular systems through a simple loss function derived from the path's likelihood via the Onsager-Machlup action and a scalable new architecture, AdaPath. We demonstrate this approach on two proteins, including an explicitly hydrated BPTI system with more than 3,500 atoms. Our method identifies a MEP that captures the same conformational change observed in a millisecond-scale molecular dynamics (MD) simulation in just minutes on a standard GPU, rather than weeks on a specialized cluster.
Submitted 18 September, 2025; v1 submitted 22 April, 2025;
originally announced April 2025.
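As a rough illustration of the Onsager-Machlup action named in the abstract above, the following sketch evaluates the standard time-discretized action for overdamped Langevin dynamics on a toy 2D double-well potential. It is not the loss function or the AdaPath architecture from the paper; the function names (`om_action`, `grad_double_well`), the potential, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def om_action(path, grad_u, dt, gamma=1.0, kT=1.0):
    """Discretized Onsager-Machlup action of a path under overdamped Langevin dynamics.

    path   : array of shape (n_points, dim), consecutive points separated by time dt
    grad_u : callable returning the potential gradient at a point (hypothetical helper)
    """
    diffusion = kT / gamma
    # Deterministic drift velocity -grad U / gamma at each starting point
    drift = -np.array([grad_u(p) for p in path[:-1]]) / gamma
    # Deviation of each step from the noise-free Euler step
    residual = path[1:] - path[:-1] - drift * dt
    return float(np.sum(residual**2) / (4.0 * diffusion * dt))

def grad_double_well(p):
    # Gradient of the toy potential U(x, y) = (x^2 - 1)^2 + y^2 (an assumption)
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

dt = 1e-3

# A path that follows the deterministic relaxation exactly has (numerically) zero action
relaxing = [np.array([0.5, 0.3])]
for _ in range(200):
    relaxing.append(relaxing[-1] - grad_double_well(relaxing[-1]) * dt)
relaxing = np.array(relaxing)

# A straight line between the two minima must climb the barrier, so its action is positive
straight = np.linspace([-1.0, 0.0], [1.0, 0.0], 201)

action_relaxing = om_action(relaxing, grad_double_well, dt)
action_straight = om_action(straight, grad_double_well, dt)
```

Lower action corresponds to a more probable path under this model, which is why minimizing an action-derived loss over a parameterized path can, in principle, locate a transition pathway without sampling.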
-
Optimal Rejection-Free Path Sampling
Authors:
Gianmarco Lazzeri,
Peter G. Bolhuis,
Roberto Covino
Abstract:
We propose an efficient novel path sampling-based framework designed to accelerate the investigation of rare events in complex molecular systems. A key innovation is the shift from sampling restricted path ensemble distributions, as in transition path sampling, to directly sampling the distribution of shooting points. This allows for a rejection-free algorithm that samples the entire path ensemble efficiently. Optimal sampling is achieved by applying a selection bias that is the inverse of the free energy along a reaction coordinate. The optimal reaction coordinate, the committor, is iteratively constructed as a neural network using AI for Molecular Mechanism Discovery (AIMMD), concurrently with the free energy profile, which is obtained through reweighting the sampled path ensembles. We showcase our algorithm on theoretical and molecular benchmarks, and demonstrate how it simultaneously provides the molecular mechanism, free energy, and rates at a moderate computational cost.
Submitted 26 March, 2025;
originally announced March 2025.
-
Simulation-based inference of single-molecule experiments
Authors:
Lars Dingeldein,
Pilar Cossio,
Roberto Covino
Abstract:
Single-molecule experiments are a unique tool to characterize the structural dynamics of biomolecules. However, reconstructing molecular details from noisy single-molecule data is challenging. Simulation-based inference (SBI) integrates statistical inference, physics-based simulators, and machine learning and is emerging as a powerful framework for analysing complex experimental data. Recent advances in deep learning have accelerated the development of new SBI methods, enabling the application of Bayesian inference to an ever-increasing number of scientific problems. Here, we review the nascent application of SBI to the analysis of single-molecule experiments. We introduce parametric Bayesian inference and discuss its limitations. We then overview emerging deep-learning-based SBI methods to perform Bayesian inference for complex models encoded in computer simulators. We illustrate the first applications of SBI to single-molecule force-spectroscopy and cryo-electron microscopy experiments. SBI allows us to leverage powerful computer algorithms modeling complex biomolecular phenomena to connect scientific models and experiments in a principled way.
Submitted 21 October, 2024;
originally announced October 2024.
-
Free energy, rates, and mechanism of transmembrane dimerization in lipid bilayers from dynamically unbiased molecular dynamics simulations
Authors:
Emil Jackel,
Gianmarco Lazzeri,
Roberto Covino
Abstract:
The assembly of proteins in membranes plays a key role in many crucial cellular pathways. Despite its importance, characterizing transmembrane assembly remains challenging for experiments and simulations. Equilibrium molecular dynamics simulations do not cover the time scales required to sample the typical transmembrane assembly. Hence, most studies rely on enhanced sampling schemes that steer the dynamics of transmembrane proteins along a collective variable that should encode all slow degrees of freedom. However, given the complexity of the condensed-phase lipid environment, this is far from trivial, with the consequence that free energy profiles of dimerization can be poorly converged. Here, we introduce an alternative approach, which relies only on simulating short, dynamically unbiased trajectory segments, avoiding using collective variables or biasing forces. By merging all trajectories, we obtain free energy profiles, rates, and mechanisms of transmembrane dimerization with the same set of simulations. We showcase our algorithm by sampling the spontaneous association and dissociation of a transmembrane protein in a lipid bilayer, using the popular coarse-grained Martini force field. Our algorithm represents a promising way to investigate assembly processes in biologically relevant membranes, overcoming some of the challenges of conventional methods.
Submitted 2 August, 2024;
originally announced August 2024.
-
Sampling a rare protein transition with a hybrid classical-quantum computing algorithm
Authors:
Danial Ghamari,
Roberto Covino,
Pietro Faccioli
Abstract:
Simulating spontaneous structural rearrangements in macromolecules with classical Molecular Dynamics (MD) is an outstanding challenge. Conventional supercomputers can access time intervals up to tens of $μ$s, while many key events occur on exponentially longer time scales. Transition path sampling techniques have the advantage of focusing the computational power on barrier-crossing trajectories, but generating uncorrelated transition paths that explore diverse conformational regions remains an unsolved problem. We employ a path-sampling paradigm combining machine learning (ML) with quantum computing (QC) to address this issue. We use ML on a classical computer to perform a preliminary uncharted exploration of the conformational space. The data set generated in this exploration is then post-processed to obtain a network representation of the reactive kinetics.
Quantum annealing machines can exploit quantum superposition to encode all the transition pathways in this network in the initial quantum state and ensure the generation of completely uncorrelated transition paths. In particular, we resort to the DWAVE quantum computer to perform an all-atom simulation of a protein conformational transition that occurs on the ms timescale. Our results match those of a special purpose supercomputer designed to perform MD simulations. These results highlight the role of biomolecular simulation as a ground for applying, testing, and advancing quantum technologies.
Submitted 27 November, 2023;
originally announced November 2023.
-
Molecular free energies, rates, and mechanisms from data-efficient path sampling simulations
Authors:
Gianmarco Lazzeri,
Hendrik Jung,
Peter G. Bolhuis,
Roberto Covino
Abstract:
Molecular dynamics is a powerful tool for studying the thermodynamics and kinetics of complex molecular events. However, these simulations can rarely sample the required time scales in practice. Transition path sampling overcomes this limitation by collecting unbiased trajectories capturing the relevant events. Moreover, the integration of machine learning can boost the sampling while simultaneously learning a quantitative representation of the mechanism. Still, the resulting trajectories are by construction non-Boltzmann-distributed, preventing the calculation of free energies and rates. We developed an algorithm to approximate the equilibrium path ensemble from machine learning-guided path sampling data. At the same time, our algorithm provides efficient sampling, the mechanism, free energy, and rates of rare molecular events at a very moderate computational cost. We tested the method on the folding of the mini-protein chignolin. Our algorithm is straightforward and data-efficient, opening the door to applications on many challenging molecular systems.
Submitted 28 July, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
-
Simulation-based inference of single-molecule force spectroscopy
Authors:
Lars Dingeldein,
Pilar Cossio,
Roberto Covino
Abstract:
Single-molecule force spectroscopy (smFS) is a powerful approach to studying molecular self-organization. However, the coupling of the molecule with the ever-present experimental device introduces artifacts that complicate the interpretation of these experiments. Performing statistical inference to learn hidden molecular properties is challenging because these measurements produce non-Markovian time series, and even minimal models lead to intractable likelihoods. To overcome these challenges, we developed a computational framework built on novel statistical methods called simulation-based inference (SBI). SBI enabled us to directly estimate the Bayesian posterior, and extract reduced quantitative models from smFS, by encoding a mechanistic model into a simulator in combination with probabilistic deep learning. Using synthetic data, we could systematically disentangle the measurement of hidden molecular properties from experimental artifacts. The integration of physical models with machine learning density estimation is general, transparent, easy to use, and broadly applicable to other types of biophysical experiments.
Submitted 14 November, 2022; v1 submitted 21 September, 2022;
originally announced September 2022.
-
Sampling Rare Conformational Transitions with a Quantum Computer
Authors:
Danial Ghamari,
Philipp Hauke,
Roberto Covino,
Pietro Faccioli
Abstract:
Spontaneous structural rearrangements play a central role in the organization and function of complex biomolecular systems. In principle, physics-based computer simulations like Molecular Dynamics (MD) enable us to investigate these thermally activated processes with an atomic level of resolution. However, rare conformational transitions are intrinsically hard to investigate with MD, because an exponentially large fraction of computational resources must be invested to simulate thermal fluctuations in metastable states. Path sampling methods like Transition Path Sampling hold the great promise of focusing the available computational power on sampling the rare stochastic transition between metastable states. In these approaches, one of the outstanding limitations is to generate paths that visit significantly different regions of the conformational space at a low computational cost. To overcome these problems we introduce a rigorous approach that integrates a machine learning algorithm and MD simulations implemented on a classical computer with adiabatic quantum computing. First, using functional integral methods, we derive a rigorous low-resolution representation of the system's dynamics, based on a small set of molecular configurations generated with machine learning. Then, a quantum annealing machine is employed to explore the transition path ensemble of this low-resolution theory, without introducing un-physical biasing forces to steer the system's dynamics. Using the D-Wave quantum computer, we validate our scheme by simulating a benchmark conformational transition in a state-of-the-art atomistic description. We show that the quantum computing step generates uncorrelated trajectories, thus facilitating the sampling of the transition region in configuration space. Our results provide a new paradigm for MD simulations to integrate machine learning and quantum computing.
Submitted 8 February, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Autonomous artificial intelligence discovers mechanisms of molecular self-organization in virtual experiments
Authors:
Hendrik Jung,
Roberto Covino,
A Arjun,
Peter G. Bolhuis,
Gerhard Hummer
Abstract:
Molecular self-organization driven by concerted many-body interactions produces the ordered structures that define both inanimate and living matter. Understanding the physical mechanisms that govern the formation of molecular complexes and crystals is key to controlling the assembly of nanomachines and new materials. We present an artificial intelligence (AI) agent that uses deep reinforcement learning and transition path theory to discover the mechanism of molecular self-organization phenomena from computer simulations. The agent adaptively learns how to sample complex molecular events and, on the fly, constructs quantitative mechanistic models. By using the mechanistic understanding for AI-driven sampling, the agent closes the learning cycle and overcomes time-scale gaps of many orders of magnitude. Symbolic regression condenses the mechanism into a human-interpretable form. Applied to ion association in solution, gas-hydrate crystal formation, and membrane-protein assembly, the AI agent identifies the many-body solvent motions governing the assembly process, discovers the variables of classical nucleation theory, and reveals competing assembly pathways. The mechanistic descriptions produced by the agent are predictive and transferable to close thermodynamic states and similar systems. Autonomous AI sampling has the power to discover assembly and reaction mechanisms from materials science to biology.
Submitted 14 May, 2021;
originally announced May 2021.
-
Molecular free energy profiles from force spectroscopy experiments by inversion of observed committors
Authors:
Roberto Covino,
Michael T. Woodside,
Gerhard Hummer,
Attila Szabo,
Pilar Cossio
Abstract:
In single-molecule force spectroscopy experiments, a biomolecule is attached to a force probe via polymer linkers, and the total extension -- of molecule plus apparatus -- is monitored as a function of time. In a typical unfolding experiment at constant force, the total extension jumps between two values that correspond to the folded and unfolded states of the molecule. For several biomolecular systems the committor, which is the probability to fold starting from a given extension, has been used to extract the molecular activation barrier (a technique known as "committor inversion"). In this work, we study the influence of the force probe, which is much larger than the molecule being measured, on the activation barrier obtained by committor inversion. We use a two-dimensional framework in which the diffusion coefficient of the molecule and of the pulling device can differ. We systematically study the free energy profile along the total extension obtained from the committor, by numerically solving the Onsager equation and using Brownian dynamics simulations. We analyze the dependence of the extracted barrier on the linker stiffness, molecular barrier height, and diffusion anisotropy, and thus, establish the range of validity of committor inversion. Along the way, we showcase the committor of 2-dimensional diffusive models and illustrate how it is affected by barrier asymmetry and diffusion anisotropy.
Submitted 19 June, 2019;
originally announced June 2019.
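The idea of committor inversion described in the abstract above can be sketched in the simplest setting: one-dimensional diffusion with a constant diffusion coefficient, where the folding committor satisfies $|dq/dx| \propto e^{\beta G(x)}$, so that $\beta G(x) = \ln|dq/dx| + \text{const}$. The sketch below builds a committor from a known toy profile and inverts it again. It does not reproduce the two-dimensional molecule-plus-apparatus model studied in the paper; the double-well profile, the grid, and the function names are assumptions made for illustration.

```python
import numpy as np

def committor_from_profile(x, beta_g):
    """Probability to reach x[0] (folded) before x[-1] (unfolded) for 1D diffusion
    with constant diffusion coefficient on the free energy profile beta_g (in kT)."""
    w = np.exp(beta_g)
    # Cumulative trapezoidal integral of exp(beta*G) from x[0] up to each grid point
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(x))))
    return 1.0 - cum / cum[-1]

def profile_from_committor(x, q):
    """Committor inversion: beta*G(x) = ln|dq/dx| up to an additive constant."""
    return np.log(np.abs(np.gradient(q, x)))

# Toy double-well profile with a 4 kT barrier at x = 0 (an assumption)
x = np.linspace(-1.0, 1.0, 2001)
beta_g = 4.0 * (x**2 - 1.0) ** 2

q = committor_from_profile(x, beta_g)
beta_g_rec = profile_from_committor(x, q)

# The recovered profile should differ from the input only by a constant offset
offset = beta_g_rec - beta_g
```

In this idealized case the inversion is exact up to discretization error. The paper's question is how far the result can be trusted once the measured coordinate also includes a linker and a slowly diffusing force probe, which this one-dimensional sketch deliberately leaves out.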
-
Artificial Intelligence Assists Discovery of Reaction Coordinates and Mechanisms from Molecular Dynamics Simulations
Authors:
Hendrik Jung,
Roberto Covino,
Gerhard Hummer
Abstract:
Exascale computing holds great opportunities for molecular dynamics (MD) simulations. However, to take full advantage of the new possibilities, we must learn how to focus computational power on the discovery of complex molecular mechanisms, and how to extract them from enormous amounts of data. Both aspects still rely heavily on human experts, which becomes a serious bottleneck when a large number of parallel simulations have to be orchestrated to take full advantage of the available computing power. Here, we use artificial intelligence (AI) both to guide the sampling and to extract the relevant mechanistic information. We combine advanced sampling schemes with statistical inference, artificial neural networks, and deep learning to discover molecular mechanisms from MD simulations. Our framework adaptively and autonomously initializes simulations and learns the sampled mechanism, and is thus suitable for massively parallel computing architectures. We propose practical solutions to make the neural networks interpretable, as illustrated in applications to molecular systems.
Submitted 14 January, 2019;
originally announced January 2019.
-
iMapD: intrinsic Map Dynamics exploration for uncharted effective free energy landscapes
Authors:
Eliodoro Chiavazzo,
Ronald R. Coifman,
Roberto Covino,
C. William Gear,
Anastasia S. Georgiou,
Gerhard Hummer,
Ioannis G. Kevrekidis
Abstract:
We describe and implement iMapD, a computer-assisted approach for accelerating the exploration of uncharted effective Free Energy Surfaces (FES), and more generally for the extraction of coarse-grained, macroscopic information from atomistic or stochastic (here Molecular Dynamics, MD) simulations. The approach functionally links the MD simulator with nonlinear manifold learning techniques. The added value comes from biasing the simulator towards new, unexplored phase space regions by exploiting the smoothness of the (gradually, as the exploration progresses) revealed intrinsic low-dimensional geometry of the FES.
Submitted 31 December, 2016;
originally announced January 2017.
-
Folding Pathways of a Knotted Protein with a Realistic Atomistic Force Field
Authors:
Silvio a Beccara,
Tatjana Skrbic,
Roberto Covino,
Cristian Micheletti,
Pietro Faccioli
Abstract:
We report on atomistic simulation of the folding of a natively-knotted protein, MJ0366, based on a realistic force field. To the best of our knowledge this is the first reported effort where a realistic force field is used to investigate the folding pathways of a protein with complex native topology. By using the dominant-reaction pathway scheme we collected about 30 successful folding trajectories for the 82-amino acid long trefoil-knotted protein. Despite the dissimilarity of their initial unfolded configuration, these trajectories reach the natively-knotted state through a remarkably similar succession of steps. In particular it is found that knotting occurs essentially through a threading mechanism, involving the passage of the C-terminal through an open region created by the formation of the native beta-sheet at an earlier stage. The dominance of the knotting by threading mechanism is not observed in MJ0366 folding simulations using simplified, native-centric models. This points to a previously underappreciated role of concerted amino acid interactions, including non-native ones, in aiding the appropriate order of contact formation to achieve knotting.
Submitted 8 February, 2013;
originally announced February 2013.