-
Evidence-based gene models for structural and functional annotations of the oil palm genome
Authors:
Chan Kuang Lim,
Tatiana V. Tatarinova,
Rozana Rosli,
Nadzirah Amiruddin,
Norazah Azizi,
Mohd Amin Ab Halim,
Nik Shazana Nik Mohd Sanusi,
Jayanthi Nagappan,
Petr Ponomarenko,
Martin Triska,
Victor Solovyev,
Mohd Firdaus-Raih,
Ravigadevi Sambanthamurthi,
Denis Murphy,
Leslie Low Eng Ti
Abstract:
The advent of rapid and inexpensive DNA sequencing has led to an explosion of data waiting to be transformed into knowledge about genome organization and function. Gene prediction is customarily the starting point for genome analysis. This paper presents a bioinformatics study of the oil palm genome, including comparative genomics analysis, database and tools development, and mining of biological data for genes of interest. We have annotated 26,059 oil palm genes integrated from two independent gene-prediction pipelines, Fgenesh++ and Seqping. This integrated annotation constitutes a significant improvement in comparison to the preliminary annotation published in 2013. We conducted a comprehensive analysis of intronless, resistance and fatty acid biosynthesis genes, and demonstrated the high quality of the current genome annotation. We identified 3,658 intronless genes in the oil palm genome, an important resource for evolutionary study. Further analysis of the oil palm genes revealed 210 candidate resistance genes involved in pathogen defense. Fatty acids have diverse applications ranging from food to industrial feedstocks, and we identified 42 key genes involved in fatty acid biosynthesis in oil palm. These results provide an important resource for studies of plant genomes and a theoretical foundation for marker-assisted breeding of oil palm and related crops.
Submitted 5 April, 2017; v1 submitted 23 February, 2017;
originally announced February 2017.
-
The mysterious orphans of Mycoplasmataceae
Authors:
Tatiana V. Tatarinova,
Inna Lysnyansky,
Yuri V. Nikolsky,
Alexander Bolshoy
Abstract:
Background: The length of a protein sequence is largely determined by its function, i.e. each functional group is associated with an optimal size. However, comparative genomics has revealed that protein length may be affected by additional factors. In 2002 it was shown that in the bacterium Escherichia coli and the archaeon Archaeoglobus fulgidus, protein sequences with no homologs are, on average, shorter than those with homologs. Most experts now agree that the length distributions are distinctly different between protein sequences with and without homologs in bacterial and archaeal genomes. In this study, we examine this postulate through a comprehensive analysis of all annotated prokaryotic genomes, focusing on certain exceptions.
Results: We compared the length distributions of proteins having homologs (HHPs) and proteins having no homologs (orphans or ORFans) in all currently annotated, completely sequenced prokaryotic genomes. As expected, the HHPs and ORFans have strikingly different length distributions in almost all genomes. As previously established, the HHPs are, on average, longer than the ORFans, and the length distributions for the ORFans have a relatively narrow peak, in contrast to the HHPs, whose lengths spread over a wider range of values. However, about thirty genomes do not obey these rules. Practically all genomes of Mycoplasma and Ureaplasma have atypical ORFan distributions, with the mean length of ORFans larger than the mean length of HHPs. These genera constitute over 80% of atypical genomes.
Conclusions: We confirmed on a comprehensive set of genomes the previous observation that HHPs and ORFans have different gene length distributions. We also showed that Mycoplasmataceae genomes have distinctive distributions of ORFan lengths. We offer several possible biological explanations of this phenomenon.
Submitted 24 August, 2015;
originally announced August 2015.
-
Genomic study of the Ket: a Paleo-Eskimo-related ethnic group with significant ancient North Eurasian ancestry
Authors:
Pavel Flegontov,
Piya Changmai,
Anastassiya Zidkova,
Maria D. Logacheva,
Olga Flegontova,
Mikhail S. Gelfand,
Evgeny S. Gerasimov,
Ekaterina E. Khrameeva,
Olga P. Konovalova,
Tatiana Neretina,
Yuri V. Nikolsky,
George Starostin,
Vita V. Stepanova,
Igor V. Travinsky,
Martin Tříska,
Petr Tříska,
Tatiana V. Tatarinova
Abstract:
The Kets, an ethnic group in the Yenisei River basin, Russia, are considered the last nomadic hunter-gatherers of Siberia, and the Ket language has no transparent affiliation with any language family. We investigated connections between the Kets and Siberian and North American populations, with emphasis on the Mal'ta and Paleo-Eskimo ancient genomes, using original data from 46 unrelated samples of Kets and 42 samples of their neighboring ethnic groups (Uralic-speaking Nganasans, Enets, and Selkups). We genotyped over 130,000 autosomal SNPs, determined mitochondrial and Y-chromosomal haplogroups, and performed high-coverage genome sequencing of two Ket individuals. We established that the Kets belong to the cluster of Siberian populations related to Paleo-Eskimos. Unlike other members of this cluster (Nganasans, Ulchi, Yukaghirs, and Evens), Kets and closely related Selkups have a high degree of Mal'ta ancestry. Implications of these findings for the linguistic hypothesis uniting Ket and Na-Dene languages into a language macrofamily are discussed.
Submitted 12 August, 2015;
originally announced August 2015.
-
benchNGS: An approach to benchmark short reads alignment tools
Authors:
Farzana Rahman,
Mehedi Hassan,
Alona Kryshchenko,
Inna Dubchak,
Tatiana V. Tatarinova,
Nickolai Alexandrov
Abstract:
In the last decade a number of algorithms and associated software have been developed to align next generation sequencing (NGS) reads with relevant reference genomes. The accuracy of these programs may vary significantly, especially when the NGS reads are quite different from the available reference genome.
In this paper we propose a benchmark to assess the accuracy of short-read mapping based on the pre-computed global alignment of closely related genome sequences. We outline the method and also present a short report of an experiment performed on five popular alignment tools, based on the pairwise alignments of the Escherichia coli O6 CFT073 genome with the genomes of seven other bacteria.
Submitted 24 April, 2015;
originally announced April 2015.
-
An extended reply to Mendez et al.: The 'extremely ancient' chromosome that still isn't
Authors:
Eran Elhaik,
Tatiana V. Tatarinova,
Anatole A. Klyosov,
Dan Graur
Abstract:
Earlier this year, we published a scathing critique of a paper by Mendez et al. (2013) in which the claim was made that a Y chromosome was 237,000-581,000 years old. Elhaik et al. (2014) also attacked a popular article in Scientific American by the senior author of Mendez et al. (2013), whose title was "Sex with other human species might have been the secret of Homo sapiens's [sic] success" (Hammer 2013). Five of the 11 authors of Mendez et al. (2013) have now written a "rebuttal," and we were allowed to reply. Unfortunately, our reply was censored for being "too sarcastic and inflamed." References were removed, meanings were castrated, and a dedication in the Acknowledgments was deleted. Now that the so-called rebuttal by 45% of the authors of Mendez et al. (2013) has been published together with our vasectomized reply, we have decided to make public our entire reply to the so-called "rebuttal." In fact, we go one step further, and publish a version of the reply that has not even been self-censored.
Submitted 20 October, 2014; v1 submitted 15 October, 2014;
originally announced October 2014.