-
How pairwise coevolutionary models capture the collective residue variability in proteins
Authors:
Matteo Figliuzzi,
Pierre Barrat-Charlaix,
Martin Weigt
Abstract:
Global coevolutionary models of homologous protein families, as constructed by direct coupling analysis (DCA), have recently gained popularity in particular due to their capacity to accurately predict residue-residue contacts from sequence information alone, and thereby to facilitate tertiary and quaternary protein structure prediction. More recently, they have also been used to predict fitness effects of amino-acid substitutions in proteins, and to predict evolutionarily conserved protein-protein interactions. These models are based on two currently unjustified hypotheses: (a) correlations in the amino-acid usage of different positions result collectively from networks of direct couplings; and (b) pairwise couplings are sufficient to capture the amino-acid variability. Here we propose a highly precise inference scheme based on Boltzmann-machine learning, which allows us to systematically address these hypotheses. We show how correlations are built up in a highly collective way by a large number of coupling paths, which are based on the protein's three-dimensional structure. We further find that pairwise coevolutionary models capture the collective residue variability across homologous proteins even for quantities which are not imposed by the inference procedure, like three-residue correlations, the clustered structure of protein families in sequence space or the sequence distances between homologs. These findings strongly suggest that pairwise coevolutionary models are actually sufficient to accurately capture the residue variability in homologous protein families.
Submitted 12 January, 2018;
originally announced January 2018.
-
Inverse Statistical Physics of Protein Sequences: A Key Issues Review
Authors:
Simona Cocco,
Christoph Feinauer,
Matteo Figliuzzi,
Remi Monasson,
Martin Weigt
Abstract:
In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
Submitted 3 March, 2017;
originally announced March 2017.
-
Improving landscape inference by integrating heterogeneous data in the inverse Ising problem
Authors:
Pierre Barrat-Charlaix,
Matteo Figliuzzi,
Martin Weigt
Abstract:
The inverse Ising problem and its generalizations to Potts and continuous spin models have recently attracted much attention thanks to their successful applications in the statistical modeling of biological data. In the standard setting, the parameters of an Ising model (couplings and fields) are inferred using a sample of equilibrium configurations drawn from the Boltzmann distribution. However, in the context of biological applications, quantitative information for a limited number of microscopic spin configurations has recently become available. In this paper, we extend the usual setting of the inverse Ising model by developing an integrative approach combining the equilibrium sample with (possibly noisy) measurements of the energy performed for a number of arbitrary configurations. Using simulated data, we show that our integrative approach outperforms standard inference based only on the equilibrium sample or the energy measurements, including error correction of noisy energy measurements. As a biological proof-of-concept application, we show that mutational fitness landscapes in proteins can be better described when combining evolutionary sequence data with complementary structural information about mutant sequences.
Submitted 5 November, 2016; v1 submitted 19 September, 2016;
originally announced September 2016.
-
Probing the limits to microRNA-mediated control of gene expression
Authors:
Araks Martirosyan,
Matteo Figliuzzi,
Enzo Marinari,
Andrea De Martino
Abstract:
According to the 'ceRNA hypothesis', microRNAs (miRNAs) may act as mediators of an effective positive interaction between long coding or non-coding RNA molecules, carrying significant potential implications for a variety of biological processes. Here, inspired by recent work providing a quantitative description of small regulatory elements as information-conveying channels, we characterize the effectiveness of miRNA-mediated regulation in terms of the optimal information flow achievable between modulator (transcription factors) and target nodes (long RNAs). Our findings show that, while a sufficiently large degree of target derepression is needed to activate miRNA-mediated transmission, (a) in case of differential mechanisms of complex processing and/or transcriptional capabilities, regulation by a post-transcriptional miRNA-channel can outperform that achieved through direct transcriptional control; moreover, (b) in the presence of large populations of weakly interacting miRNA molecules the extra noise coming from titration disappears, allowing the miRNA-channel to process information as effectively as the direct channel. These observations establish the limits of miRNA-mediated post-transcriptional cross-talk and suggest that, besides providing a degree of noise buffering, this type of control may be effectively employed in cells both as a failsafe mechanism and as a preferential fine tuner of gene expression, pointing to the specific situations in which each of these functionalities is maximized.
Submitted 26 January, 2016;
originally announced January 2016.
-
Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1
Authors:
Matteo Figliuzzi,
Hervé Jacquier,
Alexander Schug,
Olivier Tenaillon,
Martin Weigt
Abstract:
The quantitative characterization of mutational landscapes is a task of outstanding importance in evolutionary and medical biology: It is, e.g., of central importance for our understanding of the phenotypic effect of mutations related to disease and antibiotic drug resistance. Here we develop a novel inference scheme for mutational landscapes, which is based on the statistical analysis of large alignments of homologs of the protein of interest. Our method is able to capture epistatic couplings between residues, and therefore to assess the dependence of mutational effects on the sequence context where they appear. Compared to recent large-scale mutagenesis data of the beta-lactamase TEM-1, a protein providing resistance against beta-lactam antibiotics, our method leads to an increase of about 40% in explicative power as compared to approaches neglecting epistasis. We find that the informative sequence context extends to residues at native distances of about 20Å from the mutated site, reaching thus far beyond residues in direct physical contact.
Submitted 12 October, 2015;
originally announced October 2015.
-
RNA-based regulation: dynamics and response to perturbations of competing RNAs
Authors:
Matteo Figliuzzi,
Andrea De Martino,
Enzo Marinari
Abstract:
The observation that, through a titration mechanism, microRNAs (miRNAs) can act as mediators of effective interactions among their common targets (competing endogenous RNAs or ceRNAs) has brought forward the idea ('ceRNA hypothesis') that RNAs can regulate each other in extended 'cross-talk' networks. Such an ability might play a major role in post-transcriptional regulation (PTR) in shaping a cell's protein repertoire. Recent work focusing on the emergent properties of the cross-talk networks has emphasized the high flexibility and selectivity that may be achieved at stationarity. On the other hand, dynamical aspects, possibly crucial on the relevant time scales, are far less clear. We have carried out a dynamical study of the ceRNA hypothesis on a model of PTR. Sensitivity analysis shows that ceRNA cross-talk is dynamically extended, i.e. it may take place on time scales shorter than those required to achieve stationarity even in cases where no cross-talk occurs in the steady state, and is possibly amplified. Besides, in case of large, transfection-like perturbations the system may develop a strongly non-linear threshold response. Finally, we show that the ceRNA effect provides a very efficient way for a cell to achieve fast positive shifts in the level of a ceRNA when necessary. These results indicate that competition for miRNAs may indeed provide an elementary mechanism to achieve system-level regulatory effects on the transcriptome over physiologically relevant time scales.
Submitted 19 December, 2013;
originally announced December 2013.
-
MicroRNAs as a selective channel of communication between competing RNAs: a steady-state theory
Authors:
Matteo Figliuzzi,
Enzo Marinari,
Andrea De Martino
Abstract:
It has recently been suggested that the competition for a finite pool of microRNAs (miRNAs) gives rise to effective interactions among their common targets (competing endogenous RNAs or ceRNAs) that could prove to be crucial for post-transcriptional regulation (PTR). We have studied a minimal model of PTR where the emergence and the nature of such interactions can be characterized in detail at steady state. Sensitivity analysis shows that binding free energies and repression mechanisms are the key ingredients for the cross-talk between ceRNAs to arise. Interactions emerge in specific ranges of repression values, can be symmetrical (one ceRNA influences another and vice-versa) or asymmetrical (one ceRNA influences another but not the reverse) and may be highly selective, while possibly limited by noise. In addition, we show that non-trivial correlations among ceRNAs can emerge in experimental readouts due to transcriptional fluctuations even in the absence of miRNA-mediated cross-talk.
Submitted 30 January, 2013; v1 submitted 8 October, 2012;
originally announced October 2012.
-
A scalable algorithm to explore the Gibbs energy landscape of genome-scale metabolic networks
Authors:
Daniele De Martino,
Matteo Figliuzzi,
Andrea De Martino,
Enzo Marinari
Abstract:
The integration of various types of genomic data into predictive models of biological networks is one of the main challenges currently faced by computational biology. Constraint-based models in particular play a key role in the attempt to obtain a quantitative understanding of cellular metabolism at genome scale. In essence, their goal is to frame the metabolic capabilities of an organism based on minimal assumptions that describe the steady states of the underlying reaction network via suitable stoichiometric constraints, specifically mass balance and energy balance (i.e. thermodynamic feasibility). The implementation of these requirements to generate viable configurations of reaction fluxes and/or to test given flux profiles for thermodynamic feasibility can however prove to be computationally intensive. We propose here a fast and scalable stoichiometry-based method to explore the Gibbs energy landscape of a biochemical network at steady state. The method is applied to the problem of reconstructing the Gibbs energy landscape underlying metabolic activity in the human red blood cell, and to that of identifying and removing thermodynamically infeasible reaction cycles in the Escherichia coli metabolic network (iAF1260). In the former case, we produce consistent predictions for chemical potentials (or log-concentrations) of intracellular metabolites; in the latter, we identify a restricted set of loops (23 in total) in the periplasmic and cytoplasmic core as the origin of thermodynamic infeasibility in a large sample ($10^6$) of flux configurations generated randomly and compatibly with the prior information available on reaction reversibility.
Submitted 22 June, 2012;
originally announced June 2012.
-
Computing fluxes and chemical potential distributions in biochemical networks: energy balance analysis of the human red blood cell
Authors:
Daniele De Martino,
Matteo Figliuzzi,
Andrea De Martino,
Enzo Marinari
Abstract:
The analysis of non-equilibrium steady states of biochemical reaction networks relies on finding the configurations of fluxes and chemical potentials satisfying stoichiometric (mass balance) and thermodynamic (energy balance) constraints. Efficient methods to explore such states are crucial to predict reaction directionality, calculate physiologic ranges of variability, estimate correlations, and reconstruct the overall energy balance of the network from the underlying molecular processes. While different techniques for sampling the space generated by mass balance constraints are currently available, thermodynamics is generically harder to incorporate. Here we introduce a method to sample the free energy landscape of a reaction network at steady state. In its most general form, it allows one to calculate distributions of fluxes and concentrations starting from trial functions that may contain prior biochemical information. We apply our method to the human red blood cell's metabolic network, whose space of mass-balanced flux states has been sampled extensively in recent years. Specifically, we profile its thermodynamically feasible flux configurations, characterizing in detail how fluctuations of fluxes and potentials are correlated. Based on this, we derive the cell's energy balance in terms of entropy production, chemical work done and thermodynamic efficiency.
Submitted 12 July, 2011;
originally announced July 2011.
-
One way to grow, many ways to shrink: the reversible Von Neumann expanding model
Authors:
A. De Martino,
M. Figliuzzi,
M. Marsili
Abstract:
We study the solutions of Von Neumann's expanding model with reversible processes for an infinite reaction network. We show that, contrary to the irreversible case, the solution space need not be convex in contracting phases (i.e. phases where the concentrations of reagents necessarily decrease over time). At optimality, this implies that, while multiple dynamical paths of global contraction exist, optimal expansion is achieved by a unique time evolution of reaction fluxes. This scenario is investigated in a statistical mechanics framework by a replica symmetric theory. The transition from a non-convex to a convex solution space, which turns out to be well described by a phenomenological order parameter (the fraction of unused reversible reactions) is analyzed numerically.
Submitted 7 June, 2010;
originally announced June 2010.