-
RNA-seq data science: From raw data to effective interpretation
Authors:
Dhrithi Deshpande,
Karishma Chhugani,
Yutong Chang,
Aaron Karlsberg,
Caitlin Loeffler,
Jinyang Zhang,
Agata Muszynska,
Jeremy Rotman,
Laura Tao,
Brunilda Balliu,
Elizabeth Tseng,
Eleazar Eskin,
Fangqing Zhao,
Pejman Mohammadi,
Pawel P Labaj,
Serghei Mangul
Abstract:
RNA-sequencing (RNA-seq) has become an exemplar technology in modern biology and clinical applications over the past decade. It has gained immense popularity in recent years, driven by the continuous efforts of the bioinformatics community to develop accurate and scalable computational tools. RNA-seq is a method of analyzing the RNA content of a sample using modern sequencing platforms. It generates enormous amounts of transcriptomic data in the form of nucleotide sequences, known as reads. RNA-seq analysis enables the probing of genes and their corresponding transcripts, which is essential for answering important biological questions, such as detecting novel exons and transcripts, quantifying gene expression, and studying alternative splicing structure. However, obtaining meaningful biological signals from raw data using computational methods is challenging due to the limitations of modern sequencing technologies. The need to address these technological challenges has driven the rapid development of many novel computational tools, which have evolved and diversified in accordance with technological advancements, leading to the current wide array of RNA-seq tools. Our review provides a systematic overview of RNA-seq technology and 235 available RNA-seq tools across various domains published from 2008 to 2020, discussing the interdisciplinary nature of bioinformatics involved in RNA sequencing, analysis, and software development.
Submitted 16 February, 2021; v1 submitted 5 October, 2020;
originally announced October 2020.
-
Technology dictates algorithms: Recent developments in read alignment
Authors:
Mohammed Alser,
Jeremy Rotman,
Kodi Taraszka,
Huwenbo Shi,
Pelin Icer Baykal,
Harry Taegyun Yang,
Victor Xue,
Sergey Knyazev,
Benjamin D. Singer,
Brunilda Balliu,
David Koslicki,
Pavel Skums,
Alex Zelikovsky,
Can Alkan,
Onur Mutlu,
Serghei Mangul
Abstract:
Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of genomic data in the form of nucleotide sequences, or reads. Aligning reads onto reference genomes enables the identification of individual-specific genetic variants and is an essential step in the majority of genomic analysis pipelines. Aligned reads are essential for answering important biological questions, such as detecting mutations driving various human diseases and complex traits, as well as identifying species present in metagenomic samples. The read alignment problem is extremely challenging due to the large size of analyzed datasets and numerous technological limitations of sequencing platforms, and researchers have developed novel bioinformatics algorithms to tackle these difficulties. Importantly, computational algorithms have evolved and diversified in accordance with technological advances, leading to today's diverse array of bioinformatics tools. Our review provides a survey of algorithmic foundations and methodologies across 107 alignment methods published between 1988 and 2020, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on the speed and efficiency of read aligners. We separately discuss how longer read lengths produce unique advantages and limitations for read alignment techniques. We also discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology, including whole transcriptome, adaptive immune repertoire, and human microbiome studies.
Submitted 9 July, 2020; v1 submitted 28 February, 2020;
originally announced March 2020.
-
Metagenomics for clinical diagnostics: technologies and informatics
Authors:
Caitlin Loeffler,
Keylie M. Gibson,
Lana Martin,
Liz Chang,
Jeremy Rotman,
Ian V. Toma,
Christopher E. Mason,
Eleazar Eskin,
Joseph P. Zackular,
Keith A. Crandall,
David Koslicki,
Serghei Mangul
Abstract:
The human-associated microbiome is closely tied to human health and is of substantial clinical interest. Metagenomics-based tools are emerging for clinical diagnostics, tracking the spread of diseases, and surveillance of potential pathogens. In some cases, these tools are overcoming limitations of traditional clinical approaches. However, metagenomics still has limitations that bar these tools from clinical validation. Once these hurdles are overcome, clinical metagenomics will inform doctors of the best targeted treatment for their patients and provide early detection of disease. Here we present an overview of metagenomics methods with a discussion of computational challenges and limitations.
Submitted 7 August, 2020; v1 submitted 25 November, 2019;
originally announced November 2019.