Conservation and losses of avian non-coding RNA loci
Authors:
Paul P. Gardner,
Mario Fasold,
Sarah W. Burge,
Maria Ninova,
Jana Hertel,
Stephanie Kehr,
Tammy E. Steeves,
Sam Griffiths-Jones,
Peter F. Stadler
Abstract:
Here we present the results of a large-scale bioinformatic annotation of non-coding RNA loci in 48 avian genomes. Our approach uses probabilistic models of hand-curated families from the Rfam database to infer conserved RNA families within each avian genome. We supplement these annotations with predictions from the tRNA annotation tool tRNAscan-SE and with microRNAs from miRBase. We show that a number of lncRNA-associated loci are conserved between birds and mammals, including several intriguing cases where the reported mammalian lncRNA function is not conserved in birds. We also demonstrate extensive conservation of classical ncRNAs (e.g., tRNAs) and more recently discovered ncRNAs (e.g., snoRNAs and miRNAs) in birds. Furthermore, we describe numerous "losses" of several RNA families, and attribute these to genuine loss, divergence or missing data. In particular, we show that many of these losses are due to the challenges associated with assembling avian microchromosomes. These combined results illustrate the utility of applying homology-based methods to the annotation of novel vertebrate genomes.
Submitted 27 June, 2014;
originally announced June 2014.
Studying RNA homology and conservation with Infernal: from single sequences to RNA families
Authors:
Lars Barquist,
Sarah W. Burge,
Paul P. Gardner
Abstract:
Emerging high-throughput technologies have led to a deluge of putative non-coding RNA (ncRNA) sequences identified in a wide variety of organisms. Systematic characterization of these transcripts will be a tremendous challenge. Homology detection is critical to making maximal use of functional information gathered about ncRNAs: identifying homologous sequences allows us to transfer information gathered in one organism to another quickly and with a high degree of confidence. ncRNA presents a challenge for homology detection, as the primary sequence is often poorly conserved and de novo secondary structure prediction and search remain difficult. This protocol introduces methods developed by the Rfam database for identifying "families" of homologous ncRNAs starting from single "seed" sequences, using manually curated sequence alignments to build powerful statistical models of sequence and structure conservation known as covariance models (CMs), implemented in the Infernal software package. We provide a step-by-step iterative protocol for identifying ncRNA homologs, then constructing an alignment and corresponding CM. We also work through an example for the bacterial small RNA MicA, discovering a previously unreported family of divergent MicA homologs in the genus Xenorhabdus in the process.
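The abstract notes that ncRNA primary sequence is often poorly conserved while secondary structure is retained, which is why structure-aware models such as CMs help. The key signal is covariation: compensatory mutations that change both bases of a pair while preserving pairing. As an illustration only, the sketch below measures that signal with column-wise mutual information on a hypothetical toy alignment. This is not Infernal's method (CMs are profile stochastic context-free grammars). The alignment and helper names are assumptions made for the example.

```python
import math
from collections import Counter

# Watson-Crick plus G-U wobble pairs (an assumed, commonly used pairing set).
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                   ("C", "G"), ("G", "U"), ("U", "G")}


def column(alignment, k):
    """Return column k of a list of equal-length aligned sequences."""
    return "".join(seq[k] for seq in alignment)


def column_mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns.

    High values mean the residues at i and j vary together across sequences,
    as expected for a base pair maintained by compensatory mutations.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same length")
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


def pairing_fraction(col_i, col_j):
    """Fraction of sequences whose residues at i and j form a canonical pair."""
    return sum((a, b) in CANONICAL_PAIRS for a, b in zip(col_i, col_j)) / len(col_i)


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 0 and 3 covary; columns 1 and 2 are invariant.
    toy = ["GGCC", "CGCG", "AGCU", "UGCA"]
    c0, c3 = column(toy, 0), column(toy, 3)
    c1, c2 = column(toy, 1), column(toy, 2)
    print(column_mutual_information(c0, c3), pairing_fraction(c0, c3))
    print(column_mutual_information(c1, c2), pairing_fraction(c1, c2))
```

Columns 0 and 3 differ in every sequence yet always pair, giving 2 bits of mutual information. Columns 1 and 2 also always pair, but because they never vary they carry no covariation evidence. This is one reason diverse seed alignments matter when building a CM.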
Submitted 26 January, 2016; v1 submitted 18 June, 2012;
originally announced June 2012.