-
The energy-spectrum of bicompatible sequences
Authors:
Fenix W. Huang,
Christopher L. Barrett,
Christian M. Reidys
Abstract:
Background: Genotype-phenotype maps provide a meaningful filtration of sequence space, and RNA secondary structures are one particular class of such phenotypes. Compatible sequences, i.e., sequences that satisfy the base pairing constraints of a given RNA structure, play an important role in the context of neutral networks and inverse folding. Sequences satisfying the constraints of two structures simultaneously are called bicompatible, and phenotypic change, induced by erroneously replicating populations of RNA sequences, is closely connected to bicompatibility. Furthermore, bicompatible sequences are relevant for riboswitch sequences, beacons of evolution that realize two distinct phenotypes.
Results: We present a full loop energy model Boltzmann sampler of bicompatible sequences for pairs of structures. The novel dynamic programming algorithm is based on a topological framework encapsulating the relations between loops. We utilize our sequence sampler to study the energy spectra and density of bicompatible sequences, the rankings of the structures and key properties for evolutionary transitions.
Conclusion: Our analysis of riboswitch sequences shows that key properties of bicompatible sequences depend on the particular pair of structures. While there always exist bicompatible sequences for random structure pairs, they are less suited to facilitate transitions. We show that native riboswitch sequences exhibit a distinct signature with regard to the ranking of their two phenotypes relative to the minimum free energy, suggesting a new criterion for identifying native sequences and sequences subjected to evolutionary pressure.
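To make the notion of (bi)compatibility concrete, here is a minimal brute-force sketch. It assumes dot-bracket input and the usual canonical plus wobble pairs (AU, UA, GC, CG, GU, UG); the helper names are hypothetical. It enumerates all $4^n$ sequences, so it is only meant for tiny examples and is not the loop-energy Boltzmann sampler described above.

```python
from itertools import product

# Assumed pairing rules: Watson-Crick pairs plus GU wobble pairs
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return the base pairs (i, j), i < j, of a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(sequence, structure):
    """True if every base pair of the structure is an allowed pair in the sequence."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in parse_dot_bracket(structure))


def count_bicompatible(s1, s2):
    """Brute-force count of sequences compatible with both structures (tiny n only)."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    pairs = parse_dot_bracket(s1) + parse_dot_bracket(s2)
    return sum(
        all((seq[i], seq[j]) in ALLOWED_PAIRS for i, j in pairs)
        for seq in product("ACGU", repeat=len(s1))
    )


print(is_compatible("GGCC", "(())"))       # True
print(is_compatible("GGCC", "()()"))       # False: G-G is not an allowed pair
print(count_bicompatible("(())", "()()"))  # 14
```

The arcs of the two structures together form paths and even cycles (each structure is a matching), and every arc constrains its two endpoints. For `(())` and `()()` the union is a single 4-cycle, which leaves 14 of the 36 sequences compatible with `(())` alone.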
Submitted 30 September, 2019;
originally announced October 2019.
-
On an enhancement of RNA probing data using Information Theory
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
Identifying the secondary structure of an RNA is crucial for understanding its diverse regulatory functions. This paper focuses on how to enhance target identification in a Boltzmann ensemble of structures via chemical probing data. We employ an information-theoretic approach to solve the problem by considering a variant of the Rényi-Ulam game. Our framework is centered around the ensemble tree, a hierarchical bi-partition of the input ensemble that is constructed by recursively querying whether or not a base pair of maximum information entropy is contained in the target. These queries are answered by relating local with global probing data, employing the modularity in RNA secondary structures. We show that the leaves of the tree consist of sub-samples exhibiting a distinguished structure with high probability. In particular, for a Boltzmann ensemble incorporating probing data, which is well established in the literature, the probability of our framework correctly identifying the target in the leaf is greater than $90\%$.
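The recursive construction of the ensemble tree can be sketched as follows. This is a minimal illustration under simplifying assumptions: structures are sets of base pairs, each query is answered exactly by membership (an oracle standing in for the probing-data step described above), and ties in entropy are broken by the smallest pair. The function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_entropy_pair(sample):
    """Base pair whose presence in the sample has maximal entropy, or None
    if every pair occurs in all structures or in none of them."""
    best, best_h = None, 0.0
    for bp in sorted(set().union(*sample)):
        p = sum(bp in s for s in sample) / len(sample)
        h = binary_entropy(p)
        if h > best_h:  # strict: ties keep the smallest pair
            best, best_h = bp, h
    return best


def ensemble_tree(sample):
    """Recursively bi-partition a sample by the max-entropy base pair."""
    bp = max_entropy_pair(sample)
    if bp is None:
        return ("leaf", sample)
    yes = [s for s in sample if bp in s]
    no = [s for s in sample if bp not in s]
    return ("node", bp, ensemble_tree(yes), ensemble_tree(no))


def leaves(tree):
    if tree[0] == "leaf":
        return [tree[1]]
    return leaves(tree[2]) + leaves(tree[3])


# Toy sample of four structures (frozensets of base pairs)
sample = [
    frozenset({(0, 5), (1, 4)}),
    frozenset({(0, 5)}),
    frozenset({(0, 5), (1, 4)}),
    frozenset({(2, 3)}),
]
tree = ensemble_tree(sample)
print(tree[1])            # (1, 4): present in half the sample, entropy 1 bit
print(len(leaves(tree)))  # 3
```

Splitting on the pair with probability closest to 1/2 maximizes the information gained per query, which is why the root query here is `(1, 4)`.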
Submitted 12 September, 2019;
originally announced September 2019.
-
The block spectrum of RNA pseudoknot structures
Authors:
Thomas J. X. Li,
Christina S. Burris,
Christian M. Reidys
Abstract:
In this paper we analyze the length-spectrum of blocks in $γ$-structures. $γ$-structures are a class of RNA pseudoknot structures that plays a key role in the context of polynomial time RNA folding. A $γ$-structure is constructed by nesting and concatenating specific building components having topological genus at most $γ$. A block is a substructure enclosed by crossing maximal arcs with respect to the partial order induced by nesting. We show that, in uniformly generated $γ$-structures, there is a significant gap in this length-spectrum, i.e., there asymptotically almost surely exists a unique longest block of length at least $n-O(n^{1/2})$ and that with high probability any other block has finite length. For fixed $γ$, we prove that the length of the longest block converges to a discrete limit law, and that the distribution of short blocks of given length tends to a negative binomial distribution in the limit of long sequences. We refine this analysis to the length spectrum of blocks of specific pseudoknot types, such as H-type and kissing hairpins. Our results generalize the rainbow spectrum on secondary structures by the first and third authors and are put into context with the structural prediction of long non-coding RNAs.
Submitted 12 June, 2018;
originally announced June 2018.
-
The rainbow-spectrum of RNA secondary structures
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
In this paper we analyze the length-spectrum of rainbows in RNA secondary structures. A rainbow in a secondary structure is a maximal arc with respect to the partial order induced by nesting. We show that there is a significant gap in this length-spectrum. We shall prove that there asymptotically almost surely exists a unique longest rainbow of length at least $n-O(n^{1/2})$ and that with high probability any other rainbow has finite length. We show that the distribution of the length of the longest rainbow converges to a discrete limit law and that, for finite $k$, the distribution of rainbows of length $k$ becomes, for large $n$, a negative binomial distribution. We then put the results of this paper into context, comparing the analytical results with those observed in RNA minimum free energy structures and biological RNA structures, and relating our findings to the sparsification of folding algorithms.
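In a noncrossing secondary structure, the arcs that are maximal with respect to nesting are exactly the arcs opened at bracket depth zero, so the rainbows can be read off in one left-to-right scan. The sketch below assumes dot-bracket input and takes the length of an arc $(i,j)$ to be $j-i$; that length convention and the function names are assumptions for illustration.

```python
def rainbows(structure):
    """Return the rainbows (arcs not nested in any other arc) of a dot-bracket string."""
    depth, start, result = 0, None, []
    for j, ch in enumerate(structure):
        if ch == "(":
            if depth == 0:
                start = j  # an arc opened at top level
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure")
            if depth == 0:
                result.append((start, j))  # closing a top-level arc
    if depth != 0:
        raise ValueError("unbalanced structure")
    return result


def rainbow_lengths(structure):
    """Lengths j - i of the rainbows (assumed length convention)."""
    return [j - i for i, j in rainbows(structure)]


print(rainbows("((..)).(.)"))         # [(0, 5), (7, 9)]
print(rainbow_lengths("((..)).(.)"))  # [5, 2]
```

Applied to sampled structures, the list returned by `rainbow_lengths` is the empirical rainbow spectrum, whose single long entry plus short ones is the gap the abstract describes.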
Submitted 8 June, 2018;
originally announced June 2018.
-
Genetic robustness of let-7 miRNA sequence-structure pairs
Authors:
Qijun He,
Fenix W. Huang,
Christopher Barrett,
Christian M. Reidys
Abstract:
Genetic robustness, the preservation of evolved phenotypes against genotypic mutations, is one of the central concepts in evolution. In recent years a large body of work has focused on the origins, mechanisms, and consequences of robustness in a wide range of biological systems. In particular, research on ncRNAs has studied the ability of sequences to maintain folded structures against single-point mutations. In these studies, the structure is merely a reference. However, recent work revealed evidence that structure itself contributes to the genetic robustness of ncRNAs. We follow this line of thought, consider sequence-structure pairs as the unit of evolution, and introduce the spectrum of inverse folding rates (IFR-spectrum) as a measurement of genetic robustness. Our analysis of the miRNA let-7 family captures key features of structure-modulated evolution and facilitates the study of robustness against multiple-point mutations.
Submitted 11 January, 2018;
originally announced January 2018.
-
An efficient dual sampling algorithm with Hamming distance filtration
Authors:
Fenix W. Huang,
Qijun He,
Christopher Barrett,
Christian M. Reidys
Abstract:
Recently, a framework considering RNA sequences and their RNA secondary structures as pairs led to some information-theoretic perspectives on how the semantics encoded in RNA sequences can be inferred. In this context, the pairing arises naturally from the energy model of RNA secondary structures. Fixing the sequence in the pairing produces the RNA energy landscape, whose partition function was discovered by McCaskill. Dually, fixing the structure induces the energy landscape of sequences. The latter has been considered for designing more efficient inverse folding algorithms.
We present here the Hamming distance filtered, dual partition function, together with a Boltzmann sampler using novel dynamic programming routines for the loop-based energy model. The time complexity of the algorithm is $O(h^2n)$, where $h$ and $n$ are the Hamming distance and sequence length, respectively, reducing the time complexity of samplers reported in the literature by $O(n^2)$. We then present two applications, the first being in the context of the evolution of natural sequence-structure pairs of microRNAs and the second constructing neutral paths. The former studies the inverse fold rate (IFR) of sequence-structure pairs, filtered by Hamming distance, observing that such pairs evolve towards higher levels of robustness, i.e., increasing IFR. The latter is an algorithm that constructs neutral paths: given two sequences in a neutral network, we employ the sampler in order to construct short paths connecting them, consisting of sequences all contained in the neutral network.
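The idea of filtering the dual (sequence) partition function by Hamming distance can be illustrated in a deliberately simplified energy model. The sketch below assumes a toy model in which each base pair contributes an independent, user-supplied energy and unpaired bases contribute nothing; under that assumption the partition function factorizes into a polynomial in a variable $y$ marking mismatches to the reference, and coefficient $h$ is the partition function restricted to Hamming distance $h$. This is not the loop-based model or the $O(h^2n)$ dynamic program of the paper; `hamming_filtered_z`, `pair_energy` and the default `kT` (about 0.6163 kcal/mol at 37 °C) are assumptions for illustration.

```python
import math

# Assumed pairing rules: Watson-Crick pairs plus GU wobble pairs
ALLOWED_PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")]


def parse_dot_bracket(structure):
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def poly_mul(a, b, h_max):
    """Product of two coefficient lists in y, truncated after degree h_max."""
    out = [0.0] * min(len(a) + len(b) - 1, h_max + 1)
    for i, x in enumerate(a):
        for j, w in enumerate(b):
            if i + j <= h_max:
                out[i + j] += x * w
    return out


def hamming_filtered_z(structure, reference, pair_energy=None, kT=0.6163, h_max=None):
    """Toy model: entry h is the partition function over sequences compatible
    with `structure` at Hamming distance exactly h from `reference`."""
    n = len(structure)
    if len(reference) != n:
        raise ValueError("structure and reference must have equal length")
    h_max = n if h_max is None else h_max
    pair_energy = pair_energy or {}
    pairs = parse_dot_bracket(structure)
    paired = {k for bp in pairs for k in bp}
    z = [1.0]
    for i, j in pairs:
        # weight of each allowed pair, binned by its number of mismatches (0, 1, 2)
        factor = [0.0, 0.0, 0.0]
        for a, b in ALLOWED_PAIRS:
            d = (a != reference[i]) + (b != reference[j])
            factor[d] += math.exp(-pair_energy.get((a, b), 0.0) / kT)
        z = poly_mul(z, factor, h_max)
    for k in range(n):
        if k not in paired:
            # unpaired base: keep the reference base, or mutate to one of 3 others
            z = poly_mul(z, [1.0, 3.0], h_max)
    return z


print(hamming_filtered_z("(..)", "GAAC"))  # [1.0, 7.0, 19.0, 33.0, 36.0]
```

With all energies zero, the coefficients simply count compatible sequences by distance and sum to $6 \cdot 4^2 = 96$; truncating at `h_max` keeps only the low-distance part, which is the point of the filtration.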
Submitted 31 October, 2017;
originally announced November 2017.
-
The boundary length and point spectrum enumeration of partial chord diagrams using cut and join recursion
Authors:
Jørgen Ellegaard Andersen,
Hiroyuki Fuji,
Robert C. Penner,
Christian M. Reidys
Abstract:
We introduce the boundary length and point spectrum, as a joint generalization of the boundary length spectrum and boundary point spectrum in arXiv:1307.0967. We establish by cut-and-join methods that the number of partial chord diagrams filtered by the boundary length and point spectrum satisfies a recursion relation, which combined with an initial condition determines these numbers uniquely. This recursion relation is equivalent to a second order, non-linear, algebraic partial differential equation for the generating function of the numbers of partial chord diagrams filtered by the boundary length and point spectrum.
Submitted 1 April, 2017; v1 submitted 19 December, 2016;
originally announced December 2016.
-
Statistics of topological RNA structures
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
In this paper we study properties of topological RNA structures, i.e., RNA contact structures with cross-serial interactions that are filtered by their topological genus. RNA secondary structures within this framework are topological structures having genus zero. We derive a new bivariate generating function whose singular expansion allows us to analyze the distributions of arcs, stacks, hairpin-, interior- and multi-loops. We then extend this analysis to H-type pseudoknots, kissing hairpins as well as $3$-knots and compute their respective expectation values. Finally we discuss our results and put them into context with data obtained by uniformly sampling structures of fixed genus.
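The topological genus that filters these structures can be computed from the usual permutation description of a one-backbone diagram: close the backbone into a polygon, let $\gamma$ send each arc endpoint to the next endpoint along the backbone and $\sigma$ send it to its partner, and count the cycles of $\gamma\circ\sigma$ (the boundary components). Collapsing the backbone to one vertex, Euler's relation $1 - e + r = 2 - 2g$ gives $g = (1 + e - r)/2$ for $e$ arcs and $r$ boundary components. A minimal sketch, assuming arcs are given as position pairs; the function name is hypothetical.

```python
def genus(arcs):
    """Topological genus of a one-backbone diagram given as arcs (i, j).

    Unpaired positions are ignored, since they do not change the genus.
    """
    if not arcs:
        return 0
    endpoints = sorted(k for arc in arcs for k in arc)
    if len(endpoints) != len(set(endpoints)):
        raise ValueError("each position may belong to at most one arc")
    index = {pos: r for r, pos in enumerate(endpoints)}  # relabel to 0..2e-1
    m = len(endpoints)
    partner = [0] * m
    for i, j in arcs:
        partner[index[i]], partner[index[j]] = index[j], index[i]
    seen = [False] * m
    boundaries = 0
    for start in range(m):
        if not seen[start]:
            boundaries += 1
            r = start
            while not seen[r]:
                seen[r] = True
                r = (partner[r] + 1) % m  # apply sigma, then gamma
    # one vertex - e edges + r boundary components = 2 - 2g
    return (1 + len(arcs) - boundaries) // 2


print(genus([(0, 3), (1, 2)]))          # 0: nested arcs (secondary structure)
print(genus([(0, 2), (1, 3)]))          # 1: H-type pseudoknot
print(genus([(0, 3), (1, 4), (2, 5)]))  # 1: 3-knot
```

Secondary structures always give genus zero here, matching the statement that they are the genus-zero topological structures.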
Submitted 22 June, 2016;
originally announced June 2016.
-
Topological language for RNA
Authors:
Fenix W. D. Huang,
Christian M. Reidys
Abstract:
In this paper we introduce a novel, context-free grammar, {\it RNAFeatures$^*$}, capable of generating any RNA structure including pseudoknot structures (pk-structures). We represent pk-structures as orientable fatgraphs, which naturally leads to a filtration by their topological genus. Within this framework, RNA secondary structures correspond to pk-structures of genus zero. {\it RNAFeatures$^*$} acts on formal, arc-labeled RNA secondary structures, called $λ$-structures. $λ$-structures correspond one-to-one to pk-structures together with some additional information. This information consists of the specific rearrangement of the backbone, by which a pk-structure can be made cross-free. {\it RNAFeatures$^*$} is an extension of the grammar for secondary structures and employs an enhancement by labelings of the symbols as well as the production rules. We discuss how to use {\it RNAFeatures$^*$} to obtain a stochastic context-free grammar for pk-structures, using data of RNA sequences and structures. The induced grammar facilitates fast Boltzmann sampling and statistical analysis. As a first application, we present an $O(n \log n)$ runtime algorithm which samples pk-structures based on ninety tRNA sequences and structures from the Nucleic Acid Database (NDB).
Submitted 9 May, 2016;
originally announced May 2016.
-
RNA secondary structures having a compatible sequence of certain nucleotide ratios
Authors:
Christopher L. Barrett,
Thomas J. X. Li,
Christian M. Reidys
Abstract:
Given a random RNA secondary structure, $S$, we study RNA sequences having fixed ratios of nucleotides that are compatible with $S$. We perform this analysis for RNA secondary structures subject to various base pairing rules and minimum arc- and stack-length restrictions. Our main result reads as follows: in the simplex of the nucleotide ratios there exists a convex region in which, in the limit of long sequences, a random structure a.a.s. has a compatible sequence with these ratios, and outside of which a random structure a.a.s. has no such compatible sequence. We localize this region for RNA secondary structures subject to various base pairing rules and minimum arc- and stack-length restrictions. In particular, for {\bf GC}-sequences having a ratio of {\bf G} nucleotides smaller than $1/3$, a random RNA secondary structure without any minimum arc- and stack-length restrictions a.a.s. has no such compatible sequence. For sequences having a ratio of {\bf G} nucleotides larger than $1/3$, a random RNA secondary structure a.a.s. has such compatible sequences. We discuss our results in the context of various families of RNA structures.
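For a single fixed structure, the GC case reduces to a counting condition: over the alphabet {G, C} the only admissible pairs are G-C and C-G, so every arc uses exactly one G and one C. A structure with $p$ arcs on $n$ positions therefore has a compatible GC-sequence with exactly $g$ G's if and only if $p \le g \le n-p$, and in this simple setting the question becomes how the number of arcs of a random structure compares with $g$ and $n-g$. The sketch below, with the hypothetical helper `gc_compatible_sequence`, checks the condition and constructs a witness; it does not reproduce the paper's asymptotic analysis.

```python
def parse_dot_bracket(structure):
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def gc_compatible_sequence(structure, num_g):
    """Return a {G, C}-sequence with exactly num_g G's that is compatible with
    the structure (pairs G-C or C-G only), or None if none exists."""
    pairs = parse_dot_bracket(structure)
    n, p = len(structure), len(pairs)
    if not p <= num_g <= n - p:
        return None
    seq = ["C"] * n
    for i, _ in pairs:
        seq[i] = "G"  # 5' end of each arc gets G, the 3' end keeps C
    paired = {k for bp in pairs for k in bp}
    extra = num_g - p  # remaining G's go to unpaired positions (enough exist)
    for k in range(n):
        if extra == 0:
            break
        if k not in paired:
            seq[k] = "G"
            extra -= 1
    return "".join(seq)


print(gc_compatible_sequence("((..))", 3))  # GGGCCC
print(gc_compatible_sequence("((..))", 1))  # None: two arcs need two G's
```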
Submitted 11 March, 2016;
originally announced March 2016.
-
Sequence-structure relations of biopolymers
Authors:
Christopher Barrett,
Fenix W. Huang,
Christian M. Reidys
Abstract:
Motivation: DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we ask to what extent sequence- and structure-information correlate. We view this correlation as structural semantics of sequence data that allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded "patterns" in DNA and RNA sequences. Results: We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence-structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis for three PDB structures, 2JXV (hairpin), 2N3R (3-branch multi-loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann sampled sequences versus those sequences that refold into the same structure and derive a criterion to identify native structures. We illustrate that there are multiple sequences in the partition function of a fixed structure, each having nearly the same mutual information, that are nevertheless poorly aligned. This suggests that relevant patterns may be embedded in the sequences that are not discoverable using alignments.
Submitted 22 August, 2016; v1 submitted 10 November, 2015;
originally announced November 2015.
-
Shapes of interacting RNA complexes
Authors:
Benjamin MingMing Fu,
Christian M. Reidys
Abstract:
Shapes of interacting RNA complexes are studied using a filtration via their topological genus. A shape of an RNA complex is obtained by (iteratively) collapsing stacks and eliminating hairpin loops. This shape-projection preserves the topological core of the RNA complex, and for fixed topological genus there are only finitely many such shapes. Our main result is a new bijection that relates the shapes of RNA complexes with shapes of RNA structures. This allows us to compute the shape polynomial of RNA complexes via the shape polynomial of RNA structures. We furthermore present a linear time uniform sampling algorithm for shapes of RNA complexes of fixed topological genus.
Submitted 20 May, 2014; v1 submitted 14 May, 2014;
originally announced May 2014.
-
Shapes of topological RNA structures
Authors:
Fenix W. D. Huang,
Christian M. Reidys
Abstract:
A topological RNA structure is derived from a diagram, and its shape is obtained by collapsing the stacks of the structure into single arcs and by removing any arcs of length one. Shapes contain key topological information, and for fixed topological genus there exist only finitely many such shapes. We shall express topological RNA structures as unicellular maps, i.e., graphs together with a cyclic ordering of their half-edges. In this paper we prove a bijection of shapes of topological RNA structures. We furthermore derive a linear time algorithm generating shapes of fixed topological genus. We derive explicit expressions for the coefficients of the generating polynomial of these shapes and the generating function of RNA structures of genus $g$. Furthermore we outline how shapes can be used in order to extract essential information of RNA structure databases.
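The shape projection can be sketched directly on arc lists. This sketch assumes one reading of the projection: unpaired positions are dropped, and then, until neither rule applies, an arc of length one is removed or a stack $(i,j),(i+1,j-1)$ is collapsed into its outer arc, relabelling positions after each step. The paper works with unicellular maps and may differ in details; `shape` and `_relabel` are hypothetical names.

```python
def _relabel(arcs):
    """Drop unpaired positions: map arc endpoints onto 0..2e-1, keeping order."""
    arcs = [tuple(sorted(arc)) for arc in arcs]
    endpoints = sorted(k for arc in arcs for k in arc)
    index = {pos: r for r, pos in enumerate(endpoints)}
    return sorted((index[i], index[j]) for i, j in arcs)


def shape(arcs):
    """Iteratively remove 1-arcs and collapse stacks until neither rule applies."""
    arcs = _relabel(arcs)
    while True:
        arc_set = set(arcs)
        one_arcs = [(i, j) for i, j in arcs if j == i + 1]
        if one_arcs:
            arc_set.discard(one_arcs[0])  # remove an arc of length one
        else:
            stacked = [(i, j) for i, j in arcs if (i + 1, j - 1) in arc_set]
            if not stacked:
                return arcs  # no 1-arcs and no stacks left
            i, j = stacked[0]
            arc_set.discard((i + 1, j - 1))  # collapse the stack onto its outer arc
        arcs = _relabel(arc_set)


# H-type pseudoknot with stacked helices: ((..[[..))..]]
print(shape([(0, 9), (1, 8), (4, 13), (5, 12)]))   # [(0, 2), (1, 3)]
# A secondary structure ((..((..))..)) projects to the empty shape
print(shape([(0, 13), (1, 12), (4, 9), (5, 8)]))   # []
```

Both operations leave the genus unchanged, which is why only finitely many shapes remain for a fixed genus and every genus-zero structure collapses to the empty shape.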
Submitted 11 March, 2014;
originally announced March 2014.
-
Uniform generation of RNA-RNA interaction structures of fixed topological genus
Authors:
Benjamin Mingming Fu,
Hillary Siwei Han,
Christian M. Reidys
Abstract:
Interacting RNA complexes are studied via bicellular maps using a filtration via their topological genus. Our main result is a new bijection for RNA-RNA interaction structures and a linear time uniform sampling algorithm for RNA complexes of fixed topological genus. The bijection allows one to either reduce the topological genus of a bicellular map directly, or to lose connectivity by decomposing the complex into a pair of single stranded RNA structures. Our main result is proved bijectively. It provides an explicit algorithm of how to rewire the corresponding complexes and an unambiguous decomposition grammar. Using the concept of genus induction, we construct bicellular maps of fixed topological genus $g$ uniformly in linear time. We present various statistics on these topological RNA complexes and compare our findings with biological complexes. Furthermore we show how to construct loop-energy based complexes using our decomposition grammar.
Submitted 12 April, 2014; v1 submitted 4 November, 2013;
originally announced November 2013.
-
Uniform generation of RNA pseudoknot structures with genus filtration
Authors:
Fenix W. D. Huang,
Markus E. Nebel,
Christian M. Reidys
Abstract:
In this paper we present a sampling framework for RNA structures of fixed topological genus. We introduce a novel, linear time, uniform sampling algorithm for RNA structures of fixed topological genus $g$, for arbitrary $g>0$. Furthermore we develop a linear time sampling algorithm for RNA structures of fixed topological genus $g$ that are weighted by a simplified, loop-based energy functional. For this process the partition function of the energy functional has to be computed once, which has $O(n^2)$ time complexity.
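For orientation, the genus-zero case of uniform sampling can be done with the classical count-then-backtrack method: count structures by decomposing on the first position, then walk the same decomposition with probabilities proportional to the counts. This sketch assumes secondary structures with no minimum arc length (counted by the Motzkin numbers) and uses hypothetical names; it is not the linear-time, genus-filtered algorithm of the paper, and the naive counting recursion here costs $O(n^2)$.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def count_structures(n):
    """Number of secondary structures on n positions (no minimum arc length)."""
    if n <= 1:
        return 1
    # position 0 unpaired, or paired with position k + 1 enclosing k positions
    return count_structures(n - 1) + sum(
        count_structures(k) * count_structures(n - 2 - k) for k in range(n - 1)
    )


def sample_structure(n, rng=random):
    """Draw a secondary structure on n positions uniformly at random (dot-bracket)."""
    if n <= 1:
        return "." * n
    r = rng.randrange(count_structures(n))
    if r < count_structures(n - 1):
        return "." + sample_structure(n - 1, rng)
    r -= count_structures(n - 1)
    for k in range(n - 1):
        block = count_structures(k) * count_structures(n - 2 - k)
        if r < block:
            return "(" + sample_structure(k, rng) + ")" + sample_structure(n - 2 - k, rng)
        r -= block
    raise AssertionError("unreachable: r is below count_structures(n)")


print([count_structures(n) for n in range(7)])  # [1, 1, 2, 4, 9, 21, 51]
print(sample_structure(20, random.Random(0)))
```

Both functions recurse to depth about $n$, so very long sequences would need an iterative rewrite; the genus-filtered setting of the paper additionally requires counting by genus, which this sketch does not attempt.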
Submitted 27 April, 2013;
originally announced April 2013.
-
Enumeration of RNA complexes via random matrix theory
Authors:
Jørgen E. Andersen,
Leonid O. Chekhov,
R. C. Penner,
Christian M. Reidys,
Piotr Sułkowski
Abstract:
We review a derivation of the numbers of RNA complexes of an arbitrary topology. These numbers are encoded in the free energy of the hermitian matrix model with potential V(x)=x^2/2-stx/(1-tx), where s and t are respective generating parameters for the number of RNA molecules and hydrogen bonds in a given complex. The free energies of this matrix model are computed using the so-called topological recursion, which is a powerful new formalism arising from random matrix theory. These numbers of RNA complexes also have profound meaning in mathematics: they provide the number of chord diagrams of fixed genus with specified numbers of backbones and chords as well as the number of cells in Riemann's moduli spaces for bordered surfaces of fixed topological type.
Submitted 6 March, 2013;
originally announced March 2013.
-
Topological recursion for chord diagrams, RNA complexes, and cells in moduli spaces
Authors:
Jørgen E. Andersen,
Leonid O. Chekhov,
R. C. Penner,
Christian M. Reidys,
Piotr Sułkowski
Abstract:
We introduce and study the Hermitian matrix model with potential V(x)=x^2/2-stx/(1-tx), which enumerates the number of linear chord diagrams of fixed genus with specified numbers of backbones generated by s and chords generated by t. For the one-cut solution, the partition function, correlators and free energies are convergent for small t and all s as a perturbation of the Gaussian potential, which arises for st=0. This perturbation is computed using the formalism of the topological recursion. The corresponding enumeration of chord diagrams gives at once the number of RNA complexes of a given topology as well as the number of cells in Riemann's moduli spaces for bordered surfaces. The free energies are computed here in principle for all genera and explicitly for genera less than four.
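Expanding the rational part of the potential as a geometric series makes the perturbation of the Gaussian term explicit. This is a routine expansion added for illustration, valid for $|tx|<1$:

```latex
V(x) = \frac{x^2}{2} - \frac{s\,t\,x}{1 - t x}
     = \frac{x^2}{2} - s \sum_{k \ge 1} t^{k} x^{k},
\qquad |t x| < 1 .
```

Every term of the sum carries a factor $s$ and a factor $t^k$, so setting $st = 0$ (either $s = 0$ or $t = 0$) leaves only the Gaussian potential $x^2/2$.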
Submitted 3 May, 2012;
originally announced May 2012.
-
The topological filtration of $γ$-structures
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
In this paper we study $γ$-structures filtered by topological genus. $γ$-structures are a class of RNA pseudoknot structures that plays a key role in the context of polynomial time folding of RNA pseudoknot structures. A $γ$-structure is composed of specific building blocks that have topological genus less than or equal to $γ$, where composition means concatenation and nesting of such blocks. Our main results are the derivation of a new bivariate generating function for $γ$-structures via symbolic methods, the singularity analysis of the solutions and a central limit theorem for the distribution of topological genus in $γ$-structures of given length. In our derivation specific bivariate polynomials play a central role. Their coefficients count particular motifs of fixed topological genus and they are of relevance in the context of genus recursion and novel folding algorithms.
Submitted 6 February, 2012;
originally announced February 2012.
-
Combinatorial analysis of interacting RNA molecules
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
Recently several minimum free energy (MFE) folding algorithms for predicting the joint structure of two interacting RNA molecules have been proposed. Their folding targets are interaction structures, which can be represented as diagrams with two backbones drawn horizontally on top of each other such that (1) intramolecular and intermolecular bonds are noncrossing and (2) there is no "zig-zag" configuration. This paper studies joint structures with arc-length at least four in which both interior and exterior stack-lengths are at least two (no isolated arcs). The key idea in this paper is to consider a new type of shape, based on which joint structures can be derived via symbolic enumeration. Our results imply simple asymptotic formulas for the number of joint structures with surprisingly small exponential growth rates. They are of interest in the context of designing prediction algorithms for RNA-RNA interactions.
Submitted 21 June, 2010;
originally announced June 2010.
-
Combinatorics of RNA-RNA interaction
Authors:
Thomas J. X. Li,
Christian M. Reidys
Abstract:
RNA-RNA binding is an important phenomenon observed for many classes of non-coding RNAs and plays a crucial role in a number of regulatory processes. Recently several MFE folding algorithms for predicting the joint structure of two interacting RNA molecules have been proposed. Here joint structure means that in a diagram representation the intramolecular bonds of each partner are pseudoknot-free, that the intermolecular binding pairs are noncrossing, and that there is no so-called "zig-zag" configuration. This paper presents the combinatorics of RNA interaction structures including their generating function, singularity analysis as well as explicit recurrence relations. In particular, our results imply simple asymptotic formulas for the number of joint structures.
Submitted 15 June, 2010;
originally announced June 2010.
-
RNA-RNA interaction prediction based on multiple sequence alignments
Authors:
Andrew X. Li,
Manja Marz,
Jing Qin,
Christian M. Reidys
Abstract:
Many computational methods for RNA-RNA interaction structure prediction have been developed. Recently, $O(N^6)$ time and $O(N^4)$ space dynamic programming algorithms that compute the partition function of RNA-RNA interaction complexes have become available. However, few of these methods incorporate knowledge of related sequences, so relevant evolutionary information is often neglected in structure determination. It is therefore of considerable practical interest to introduce a method that takes into account both thermodynamic stability and sequence covariation. We present the \emph{a priori} folding algorithm \texttt{ripalign}, whose input consists of two (given) multiple sequence alignments (MSA). \texttt{ripalign} outputs (1) the partition function, (2) base-pairing probabilities, (3) hybrid probabilities and (4) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible with the alignments. Compared to the single sequence-pair folding algorithm \texttt{rip}, \texttt{ripalign} requires negligible additional memory. Furthermore, we incorporate possible structure constraints as input parameters into our algorithm. The algorithm described here is implemented in C as part of the \texttt{rip} package. The supplemental material, source code and input/output files can be freely downloaded from \url{http://www.combinatorics.cn/cbpc/ripalign.html}. Contact: Christian Reidys.
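A minimal Python sketch of an alignment compatibility test of the kind mentioned above. It assumes (our assumption, not necessarily ripalign's rule) that a pair of alignment columns is compatible only if every aligned sequence carries a Watson-Crick or G-U pair there, with gaps never allowed to pair. The helpers `column_pair_compatible` and `compatible_pairs` are hypothetical.

```python
# Canonical base pairs: Watson-Crick plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column_pair_compatible(alignment, i, j):
    """Return True if columns i and j (0-based) can pair in every sequence.

    Assumption for this sketch: every aligned sequence must carry a
    canonical pair at (i, j); a gap '-' in either column rules it out.
    """
    return all((seq[i], seq[j]) in CANONICAL for seq in alignment)


def compatible_pairs(alignment, min_arc_length=2):
    """List column pairs (i, j) with j - i >= min_arc_length compatible with the MSA."""
    ncols = len(alignment[0])
    if any(len(seq) != ncols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        (i, j)
        for i in range(ncols)
        for j in range(i + min_arc_length, ncols)
        if column_pair_compatible(alignment, i, j)
    ]
```

For example, `compatible_pairs(["GAC", "CAG"])` keeps only the column pair (0, 2), where the two sequences covary as G-C and C-G.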
Submitted 14 July, 2010; v1 submitted 21 March, 2010;
originally announced March 2010.
-
Inverse Folding of RNA Pseudoknot Structures
Authors:
James Z. M. Gao,
Linda Y. M. Li,
Christian M. Reidys
Abstract:
Background: RNA exhibits a variety of structural configurations. Here we consider a structure to be tantamount to the noncrossing Watson-Crick and G-U base pairings (secondary structure) and additional cross-serial base pairs. These interactions are called pseudoknots and are observed across the whole spectrum of RNA functionalities. In the context of studying natural RNA structures, searching for new ribozymes and designing artificial RNA, it is of interest to find RNA sequences folding into a specific structure and to analyze their induced neutral networks. Since the established inverse folding algorithms, {\tt RNAinverse}, {\tt RNA-SSD} as well as {\tt INFO-RNA}, are limited to RNA secondary structures, we present in this paper the inverse folding algorithm {\tt Inv}, which can deal with 3-noncrossing, canonical pseudoknot structures.
Results: In this paper we present the inverse folding algorithm {\tt Inv}. We give a detailed analysis of {\tt Inv}, including pseudocode. We show that {\tt Inv} can in particular design 3-noncrossing nonplanar RNA pseudoknot structures, a class which is difficult to construct via dynamic programming routines. {\tt Inv} is freely available at \url{http://www.combinatorics.cn/cbpc/inv.html}.
Conclusions: The algorithm {\tt Inv} extends inverse folding capabilities to RNA pseudoknot structures. In comparison with {\tt RNAinverse}, it uses new ideas, for instance considering sets of competing structures. As a result, {\tt Inv} is not only able to find novel sequences even for RNA secondary structures, but does so in the context of competing structures that potentially exhibit cross-serial interactions.
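Inverse folding searches among sequences compatible with the target, i.e. sequences whose bases at the two ends of every arc form a canonical pair. The sketch below, with hypothetical helpers `is_compatible` and `random_compatible_sequence`, draws such a sequence for an arbitrary arc set, including pseudoknotted ones. It is a common way to seed a search and is not claimed to be the procedure used by Inv.

```python
import random

# Canonical base pairs: Watson-Crick plus G-U wobble.
PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")]


def is_compatible(seq, arcs):
    """True if every arc (i, j), 1-based, joins a canonical pair in seq."""
    return all((seq[i - 1], seq[j - 1]) in PAIRS for i, j in arcs)


def random_compatible_sequence(n, arcs, seed=None):
    """Draw a random length-n sequence compatible with the arc set.

    Crossing (pseudoknot) arcs pose no difficulty, because each position
    lies in at most one arc, so arcs can be filled in independently.
    Hypothetical helper, not Inv's actual procedure.
    """
    rng = random.Random(seed)
    seq = [rng.choice("ACGU") for _ in range(n)]  # unpaired positions: any base
    for i, j in arcs:
        seq[i - 1], seq[j - 1] = rng.choice(PAIRS)  # paired positions: canonical pair
    return "".join(seq)
```

An H-type pseudoknot such as `[(1, 7), (2, 6), (4, 10), (5, 9)]` over 10 nucleotides is handled in the same way as a secondary structure.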
Submitted 9 March, 2010;
originally announced March 2010.
-
Target prediction and a statistical sampling algorithm for RNA-RNA interaction
Authors:
F. W. D. Huang,
J. Qin,
C. M. Reidys,
P. F. Stadler
Abstract:
It has been shown that the accessibility of target sites has a critical influence on miRNA and siRNA targeting. In this paper we present a program, rip2.0, that computes not only the energetically most favorable target site, based on hybrid probabilities, but also a statistical sample of structures that characterizes the Boltzmann ensemble of RNA-RNA interaction structures. The outputs are retrieved by backtracing an improved dynamic programming solution for the partition function, based on the approach of Huang et al. (Bioinformatics). The $O(N^6)$ time and $O(N^4)$ space algorithm is implemented in C (available from \url{http://www.combinatorics.cn/cbpc/rip2.html}).
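The idea of sampling by backtracing a partition-function recursion can be illustrated on a much simpler model than rip2.0's. The sketch below assumes a single sequence, a toy energy `pair_energy` (default -1, in units of kT) per canonical base pair, and a minimum arc span `min_span`. These parameters and the function names are hypothetical, and the code implements neither RNA-RNA interaction nor a loop-based energy model.

```python
import math
import random

# Canonical pairs (Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_table(seq, pair_energy=-1.0, kT=1.0, min_span=4):
    """Z[i][j] = partition function of seq[i:j] in a toy base-pair model.

    Toy assumptions (not rip2.0's energy model): each canonical pair adds
    pair_energy, and a pair (k, l) needs l - k >= min_span.
    """
    n = len(seq)
    w = math.exp(-pair_energy / kT)  # Boltzmann weight of one pair
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]  # empty intervals weigh 1
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length  # interval seq[i:j]; its last base is j - 1
            total = Z[i][j - 1]  # case: base j - 1 unpaired
            for k in range(i, j - min_span):  # case: base j - 1 paired with k
                if (seq[k], seq[j - 1]) in PAIRS:
                    total += Z[i][k] * w * Z[k + 1][j - 1]
            Z[i][j] = total
    return Z


def boltzmann_samples(seq, count, pair_energy=-1.0, kT=1.0, min_span=4, seed=None):
    """Return (partition function of seq, list of `count` sampled pair lists)."""
    Z = partition_table(seq, pair_energy, kT, min_span)
    w = math.exp(-pair_energy / kT)
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        pairs, stack = [], [(0, len(seq))]
        while stack:  # stochastic backtracing over intervals seq[i:j]
            i, j = stack.pop()
            if j <= i:
                continue
            r = rng.random() * Z[i][j]
            if r < Z[i][j - 1]:
                stack.append((i, j - 1))  # base j - 1 left unpaired
                continue
            r -= Z[i][j - 1]
            for k in range(i, j - min_span):
                if (seq[k], seq[j - 1]) in PAIRS:
                    r -= Z[i][k] * w * Z[k + 1][j - 1]
                    if r < 0:  # choose pair (k, j - 1) with its Boltzmann weight
                        pairs.append((k, j - 1))
                        stack += [(i, k), (k + 1, j - 1)]
                        break
            else:  # floating-point round-off guard: treat j - 1 as unpaired
                stack.append((i, j - 1))
        samples.append(sorted(pairs))
    return Z[0][len(seq)], samples
```

Each case of the backtrace is chosen with probability equal to its share of `Z[i][j]`, which is what makes the sampled structures Boltzmann-distributed under the toy model.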
Submitted 5 August, 2009;
originally announced August 2009.
-
Inverse folding of RNA pseudoknot structures
Authors:
James Z. M. Gao,
Linda Y. M. Li,
Christian M. Reidys
Abstract:
Background:
RNA exhibits a variety of structural configurations. Here we consider a structure to be tantamount to the noncrossing Watson-Crick and G-U base pairings (secondary structure) and additional cross-serial base pairs. These interactions are called pseudoknots and are observed across the whole spectrum of RNA functionalities. In the context of studying natural RNA structures, searching for new ribozymes and designing artificial RNA, it is of interest to find RNA sequences folding into a specific structure and to analyze their induced neutral networks. Since the established inverse folding algorithms, {\tt RNAinverse}, {\tt RNA-SSD} as well as {\tt INFO-RNA}, are limited to RNA secondary structures, we present in this paper the inverse folding algorithm {\tt Inv}, which can deal with 3-noncrossing, canonical pseudoknot structures.
Results:
In this paper we present the inverse folding algorithm {\tt Inv}. We give a detailed analysis of {\tt Inv}, including pseudocode. We show that {\tt Inv} can in particular design 3-noncrossing nonplanar RNA pseudoknot structures, a class which is difficult to construct via dynamic programming routines. {\tt Inv} is freely available at \url{http://www.combinatorics.cn/cbpc/inv.html}.
Conclusions:
The algorithm {\tt Inv} extends inverse folding capabilities to RNA pseudoknot structures. In comparison with {\tt RNAinverse}, it uses new ideas, for instance considering sets of competing structures. As a result, {\tt Inv} is not only able to find novel sequences even for RNA secondary structures, but does so in the context of competing structures that potentially exhibit cross-serial interactions.
Submitted 10 March, 2010; v1 submitted 5 May, 2009;
originally announced May 2009.
-
Irreducibility in RNA structures
Authors:
Emma Y. Jin,
Christian M. Reidys
Abstract:
In this paper we study irreducibility in RNA structures. By RNA structure we mean RNA secondary as well as RNA pseudoknot structures. In our analysis we contrast random and minimum free energy (mfe) configurations. We compute the distributions of the number of irreducible substructures, of their locations and of their sizes, parameterized in terms of the maximal number of mutually crossing arcs, $k-1$, and the minimal stack size $σ$. In particular, we analyze the size of the largest irreducible substructure for random and mfe structures, which is the key factor for the folding time of mfe configurations.
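A sketch of splitting a structure into irreducible blocks. It assumes, for illustration, that the backbone can be cut between positions p and p+1 whenever no arc (i, j) satisfies i <= p < j, and that each maximal uncut stretch containing arcs is one irreducible substructure. This cut-based definition and the function `irreducible_blocks` are our assumptions and may differ in detail from the paper's.

```python
def irreducible_blocks(n, arcs):
    """Split a structure over positions 1..n into irreducible blocks.

    Assumed definition: a cut after position p is allowed when no arc
    (i, j) has i <= p < j; a block is a maximal stretch between allowed
    cuts that contains at least one arc. Works for crossing arcs too.
    Returns the blocks as lists of arcs, ordered along the backbone.
    """
    by_left = {i: (i, j) for i, j in arcs}  # each position is in at most one arc
    blocks, current, right = [], [], 0
    for p in range(1, n + 1):
        if p in by_left:  # an arc opens at p: extend the current block
            current.append(by_left[p])
            right = max(right, by_left[p][1])
        if current and p >= right:  # no open arc spans the cut after p
            blocks.append(current)
            current, right = [], 0
    return blocks
```

The size of a block, e.g. for studying the largest irreducible substructure, can then be read off as its rightmost minus its leftmost endpoint plus one.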
Submitted 22 February, 2009;
originally announced February 2009.
-
On the decomposition of $k$-noncrossing RNA structures
Authors:
Emma Y. Jin,
Christian M. Reidys
Abstract:
A $k$-noncrossing RNA structure can be identified with a $k$-noncrossing diagram over $[n]$, which in turn corresponds to a vacillating tableau having at most $(k-1)$ rows. In this paper we derive the limit distribution of irreducible substructures by studying their corresponding vacillating tableaux. Our main result proves that the limit distribution of the number of irreducible substructures in $k$-noncrossing, $σ$-canonical RNA structures is determined by the density function of a $Γ(-\lnτ_k,2)$-distribution for some $τ_k<1$.
Submitted 22 February, 2009;
originally announced February 2009.
-
$k$-noncrossing RNA structures with arc-length $\ge 3$
Authors:
Emma Y. Jin,
Christian M. Reidys
Abstract:
In this paper we enumerate $k$-noncrossing RNA pseudoknot structures with given minimum arc- and stack-length. That is, we study the numbers of RNA pseudoknot structures with arc-length $\ge 3$, stack-length $\ge σ$ and in which there are at most $k-1$ mutually crossing bonds, denoted by ${\sf T}_{k,σ}^{[3]}(n)$. In particular, we prove that the numbers of 3-, 4- and 5-noncrossing RNA structures with arc-length $\ge 3$ and stack-length $\ge 2$ satisfy ${\sf T}_{3,2}^{[3]}(n)\sim K_3 n^{-5} 2.5723^n$, ${\sf T}^{[3]}_{4,2}(n)\sim K_4 n^{-{21/2}} 3.0306^n$, and ${\sf T}^{[3]}_{5,2}(n)\sim K_5 n^{-18} 3.4092^n$, respectively, where $K_3,K_4,K_5$ are constants. Our results are of importance for prediction algorithms for RNA pseudoknot structures.
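For small n, numbers such as ${\sf T}_{k,σ}^{[3]}(n)$ can be checked by brute force, which also makes the three constraints concrete. In this sketch (hypothetical function names) an arc (i, j) over 1..n must satisfy j - i >= `min_arc` (`min_arc=3` for arc-length $\ge 3$), no k arcs may cross pairwise, and every stack, i.e. maximal run (i, j), (i+1, j-1), ..., must contain at least `sigma` arcs. The enumeration is exponential and only meant for small n.

```python
from itertools import combinations


def structures(n, min_arc=2):
    """Yield every partial matching of 1..n whose arcs (i, j) satisfy j - i >= min_arc."""
    def extend(free):
        if not free:
            yield []
            return
        first, rest = free[0], free[1:]
        yield from extend(rest)  # `first` stays unpaired
        for idx, partner in enumerate(rest):
            if partner - first >= min_arc:
                remaining = rest[:idx] + rest[idx + 1:]
                for arcs in extend(remaining):
                    yield [(first, partner)] + arcs
    yield from extend(tuple(range(1, n + 1)))


def crossing(a, b):
    """Arcs (i, j) and (k, l) with i < k cross iff i < k < j < l."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


def is_k_noncrossing(arcs, k):
    """True if no k arcs are pairwise crossing (at most k - 1 mutually cross)."""
    return not any(
        all(crossing(a, b) for a, b in combinations(group, 2))
        for group in combinations(arcs, k)
    )


def min_stack_length(arcs):
    """Length of the shortest stack; infinity for the empty structure."""
    arcset = set(arcs)
    lengths = []
    for i, j in arcset:
        if (i - 1, j + 1) in arcset:
            continue  # not the outermost arc of its stack
        size = 1
        while (i + size, j - size) in arcset:
            size += 1
        lengths.append(size)
    return min(lengths, default=float("inf"))


def count(n, k, min_arc, sigma):
    """Brute-force count of k-noncrossing structures with the given constraints."""
    return sum(
        1
        for arcs in structures(n, min_arc)
        if is_k_noncrossing(arcs, k) and min_stack_length(arcs) >= sigma
    )
```

As a sanity check, `count(n, 2, 2, 1)` for n = 0, ..., 7 gives 1, 1, 1, 2, 4, 8, 17, 37, the numbers of RNA secondary structures without 1-arcs; `count(n, 3, 3, 2)` corresponds to ${\sf T}_{3,2}^{[3]}(n)$.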
Submitted 4 December, 2007; v1 submitted 15 November, 2007;
originally announced November 2007.
-
RNA-LEGO: Combinatorial Design of Pseudoknot RNA
Authors:
Emma Y. Jin,
Christian M. Reidys
Abstract:
In this paper we enumerate $k$-noncrossing RNA pseudoknot structures with given minimum stack-length. We show that the numbers of $k$-noncrossing structures without isolated base pairs are significantly smaller than the number of all $k$-noncrossing structures. In particular, we prove that for large $n$ the numbers of 3- and 4-noncrossing RNA structures with stack-length $\ge 2$ are given by $311.2470 \frac{4!}{n(n-1)\cdots(n-4)}2.5881^n$ and $1.217\cdot 10^{7} n^{-{21/2}} 3.0382^n$, respectively. We furthermore show that for $k$-noncrossing RNA structures the drop in exponential growth rate between the number of all structures and the number of structures with stack-length $\ge 2$ increases significantly. Our results are of importance for prediction algorithms for pseudoknot RNA and provide evidence that neutral networks of RNA pseudoknot structures exist.
Submitted 4 December, 2007; v1 submitted 8 November, 2007;
originally announced November 2007.
-
Pseudoknot RNA Structures with Arc-Length $\ge 3$
Authors:
Emma Y. Jin,
Christian M. Reidys
Abstract:
In this paper we study $k$-noncrossing RNA structures with arc-length $\ge 3$, i.e. RNA molecules in which, for any $i$, the nucleotides labeled $i$ and $i+j$ ($j=1,2$) cannot form a bond and in which there are at most $k-1$ mutually crossing arcs. Let ${\sf S}_{k,3}(n)$ denote their number. Based on a novel functional equation for the generating function $\sum_{n\ge 0}{\sf S}_{k,3}(n)z^n$, we derive the exponential growth factors for arbitrary $k\ge 3$ and the subexponential factor for $k=3$. Our main result is the derivation of the formula ${\sf S}_{3,3}(n) \sim \frac{6.11170\cdot 4!}{n(n-1)...(n-4)} 4.54920^n$.
Submitted 23 August, 2007;
originally announced August 2007.
-
Central and Local Limit Theorems for RNA Structures
Authors:
Emma Y. Jin,
Christian M. Reidys
Abstract:
A $k$-noncrossing RNA pseudoknot structure is a graph over $\{1,...,n\}$ without 1-arcs, i.e. arcs of the form $(i,i+1)$, and in which there exists no $k$-set of mutually intersecting arcs. In particular, RNA secondary structures are 2-noncrossing RNA structures. In this paper we prove a central and a local limit theorem for the distribution of the number of 3-noncrossing RNA structures over $n$ nucleotides with exactly $h$ bonds. We build on the results of \cite{Reidys:07rna1} and \cite{Reidys:07rna2}, where the generating function of $k$-noncrossing RNA pseudoknot structures and the asymptotics for its coefficients have been derived. The results of this paper explain the findings on the numbers of arcs of RNA secondary structures obtained by molecular folding algorithms and predict the distributions for $k$-noncrossing RNA folding algorithms which are currently being developed.
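For small n, the distribution of the number of arcs can be computed exactly. The sketch below treats only the 2-noncrossing case (RNA secondary structures without 1-arcs), not the 3-noncrossing case studied in the paper. It uses a recursion on the last vertex: vertex n is either unpaired, or paired with some i <= n - 2, which splits the structure into one part on 1..i-1 and one inside the arc. The name `arc_polynomials` is hypothetical.

```python
def arc_polynomials(n_max):
    """T[n][h] = number of secondary structures on n vertices with h arcs.

    Arcs (i, j) need j - i >= 2 (no 1-arcs). Vertex n is either unpaired,
    or paired with i <= n - 2, splitting off structures on 1..i-1 and on
    i+1..n-1. T[n] has n // 2 + 1 entries; trailing zeros are possible.
    """
    T = [[1]]  # n = 0: only the empty structure
    for n in range(1, n_max + 1):
        # Case 1: vertex n unpaired (pad to n // 2 + 1 coefficients).
        poly = T[n - 1] + [0] * (n // 2 + 1 - len(T[n - 1]))
        # Case 2: arc (i, n); convolve left part with inner part, one extra arc.
        for i in range(1, n - 1):
            left, inside = T[i - 1], T[n - i - 1]
            for a, ca in enumerate(left):
                for b, cb in enumerate(inside):
                    poly[a + b + 1] += ca * cb
        T.append(poly)
    return T
```

For instance, `arc_polynomials(5)[5]` is `[1, 6, 1]`: one structure without arcs, six with one arc and one with two arcs. Normalizing a row gives the exact arc-number distribution whose limiting shape the central limit theorem describes.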
Submitted 31 July, 2007; v1 submitted 29 July, 2007;
originally announced July 2007.
-
Asymptotic Enumeration of RNA Structures with Pseudoknots
Authors:
Emma Y. Jin,
Christian M. Reidys
Abstract:
In this paper we present the asymptotic enumeration of RNA structures with pseudoknots. We develop a general framework for the computation of exponential growth rates and subexponential factors for $k$-noncrossing RNA structures. Our results are based on the generating function for the number of $k$-noncrossing RNA pseudoknot structures, ${\sf S}_k(n)$, derived in \cite{Reidys:07pseu}, where $k-1$ denotes the maximal size of sets of mutually intersecting bonds. We prove a functional equation for the generating function $\sum_{n\ge 0}{\sf S}_k(n)z^n$ and obtain for $k=2$ and $k=3$ the analytic continuation and singular expansions, respectively. It is implicit in our results that singular expansions exist for arbitrary $k$, and via transfer theorems of analytic combinatorics we obtain asymptotic expressions for the coefficients. We explicitly derive the asymptotic expressions for 2- and 3-noncrossing RNA structures. Our main result is the derivation of the formula ${\sf S}_3(n) \sim \frac{10.4724\cdot 4!}{n(n-1)...(n-4)} (\frac{5+\sqrt{21}}{2})^n$.
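A small numerical reading of the main formula: the exponential growth rate is $(5+\sqrt{21})/2 \approx 4.7913$ and the subexponential factor $4!/(n(n-1)\cdots(n-4))$ behaves like $24\,n^{-5}$. The constant 10.4724 is taken from the abstract. The function names are ours, and the value is only meaningful as an asymptotic approximation for large n.

```python
import math


def falling_factorial(n, m):
    """n (n-1) ... (n-m+1)."""
    result = 1
    for t in range(m):
        result *= n - t
    return result


def s3_asymptotic(n):
    """Asymptotic approximation 10.4724 * 4! / (n(n-1)...(n-4)) * ((5+sqrt 21)/2)^n."""
    growth = (5 + math.sqrt(21)) / 2  # exponential growth rate, about 4.7913
    return 10.4724 * math.factorial(4) / falling_factorial(n, 5) * growth ** n
```

The ratio of consecutive values is exactly growth * (n - 4) / (n + 1), which tends to the growth rate as n increases; this is the sense in which the subexponential factor only perturbs the exponential term.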
Submitted 21 June, 2007;
originally announced June 2007.
-
Neutral Networks of Sequence to Shape Maps
Authors:
Emma Y. Jin,
Jing Qin,
Christian M. Reidys
Abstract:
In this paper we present a novel framework for sequence to shape maps. These combinatorial maps realize exponentially many shapes and have preimages which contain extended connected subgraphs of diameter $n$ (neutral networks). We prove that all basic properties of RNA folding maps also hold for combinatorial maps. Our construction is as follows: suppose we are given a graph $H$ over the vertex set $\{1,\dots,n\}$ and an alphabet of nucleotides together with a symmetric relation $\mathcal{R}$, implied by base pairing rules. Then the shape of a sequence of length $n$ is the maximal $H$-subgraph in which all pairs of nucleotides incident to $H$-edges satisfy $\mathcal{R}$. Our main result is to prove the existence of at least $\sqrt{2}^{n-1}$ shapes with extended neutral networks, i.e. shapes that have a preimage with diameter $n$ and a connected component of size at least $(\frac{1+\sqrt{5}}{2})^n+(\frac{1-\sqrt{5}}{2})^n$. Furthermore, we show that there exists a certain subset of shapes which carries a natural graph structure. In this graph any two shapes are connected by a path of shapes with respective neutral networks of distance one. We finally discuss our results and provide a comparison with RNA folding maps.
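The shape map described above is short to state in code. In this sketch (hypothetical names) H is given as a set of 1-based edges and the relation R as a set of ordered nucleotide pairs, assumed symmetric; the shape keeps exactly the H-edges whose endpoints satisfy R. The lower bound $(\frac{1+\sqrt{5}}{2})^n+(\frac{1-\sqrt{5}}{2})^n$ on the component size is the $n$-th Lucas number, which `lucas` computes exactly with integers.

```python
def shape(seq, H_edges, R):
    """Maximal subgraph of H whose edges join nucleotides related by R.

    seq: nucleotide string; H_edges: iterable of 1-based pairs (i, j);
    R: set of ordered nucleotide pairs, assumed symmetric.
    """
    return {(i, j) for i, j in H_edges if (seq[i - 1], seq[j - 1]) in R}


def lucas(n):
    """Lucas number L_n = phi^n + psi^n, with L_0 = 2 and L_1 = 1."""
    a, b = 2, 1
    for _ in range(n):
        a, b = b, a + b  # same recurrence as Fibonacci, different start
    return a
```

With R the Watson-Crick and G-U pairs, the sequence "GAAUC" over H = {(1, 4), (2, 5), (1, 5)} has shape {(1, 4), (1, 5)}, since A and C do not pair.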
Submitted 4 August, 2007; v1 submitted 6 June, 2007;
originally announced June 2007.