-
Genomic reproducibility in the bioinformatics era
Authors:
Pelin Icer Baykal,
Paweł P. Łabaj,
Florian Markowetz,
Lynn M. Schriml,
Daniel J. Stekhoven,
Serghei Mangul,
Niko Beerenwinkel
Abstract:
In biomedical research, validation of a new scientific discovery is tied to the reproducibility of its experimental results. However, in genomics, the definition and implementation of reproducibility remain imprecise. Here, we argue that genomic reproducibility, defined as the ability of bioinformatics tools to maintain consistent genomics results across technical replicates, is key to generating scientific knowledge and enabling medical applications. We first discuss different concepts of reproducibility and then focus on reproducibility in the context of genomics, aiming to establish clear definitions of relevant terms. We then turn to the role of bioinformatics tools and their impact on genomic reproducibility and assess methods of evaluating bioinformatics tools in terms of genomic reproducibility. Lastly, we suggest best practices for enhancing genomic reproducibility, with an emphasis on assessing the performance of bioinformatics tools through rigorous testing across multiple technical replicates.
Submitted 18 August, 2023;
originally announced August 2023.
-
RNA-seq data science: From raw data to effective interpretation
Authors:
Dhrithi Deshpande,
Karishma Chhugani,
Yutong Chang,
Aaron Karlsberg,
Caitlin Loeffler,
Jinyang Zhang,
Agata Muszynska,
Jeremy Rotman,
Laura Tao,
Brunilda Balliu,
Elizabeth Tseng,
Eleazar Eskin,
Fangqing Zhao,
Pejman Mohammadi,
Pawel P Labaj,
Serghei Mangul
Abstract:
RNA-sequencing (RNA-seq) has become an exemplar technology in modern biology and clinical applications over the past decade. It has gained immense popularity in recent years, driven by the continuous efforts of the bioinformatics community to develop accurate and scalable computational tools. RNA-seq is a method of analyzing the RNA content of a sample using modern sequencing platforms. It generates enormous amounts of transcriptomic data in the form of nucleotide sequences, known as reads. RNA-seq analysis enables the probing of genes and their corresponding transcripts, which is essential for answering important biological questions, such as detecting novel exons and transcripts, quantifying gene expression, and studying alternative splicing structure. However, obtaining meaningful biological signals from raw data using computational methods is challenging due to the limitations of modern sequencing technologies. The need to overcome these technological challenges has pushed the rapid development of many novel computational tools, which have evolved and diversified in accordance with technological advancements, leading to the current myriad of RNA-seq tools. Our review provides a systematic overview of RNA-seq technology and 235 available RNA-seq tools across various domains published from 2008 to 2020, discussing the interdisciplinary nature of bioinformatics involved in RNA sequencing, analysis, and software development.
Submitted 16 February, 2021; v1 submitted 5 October, 2020;
originally announced October 2020.
-
Assessing Technical Performance in Differential Gene Expression Experiments with External Spike-in RNA Control Ratio Mixtures
Authors:
Sarah A. Munro,
Steve P. Lund,
P. Scott Pine,
Hans Binder,
Djork-Arné Clevert,
Ana Conesa,
Joaquin Dopazo,
Mario Fasold,
Sepp Hochreiter,
Huixiao Hong,
Nederah Jafari,
David P. Kreil,
Paweł P. Łabaj,
Sheng Li,
Yang Liao,
Simon Lin,
Joseph Meehan,
Christopher E. Mason,
Javier Santoyo,
Robert A. Setterquist,
Leming Shi,
Wei Shi,
Gordon K. Smyth,
Nancy Stralis-Pavese,
Zhenqiang Su,
et al. (8 additional authors not shown)
Abstract:
There is a critical need for standard approaches to assess, report, and compare the technical performance of genome-scale differential gene expression experiments. We assess technical performance with a proposed "standard" dashboard of metrics derived from analysis of external spike-in RNA control ratio mixtures. These control ratio mixtures with defined abundance ratios enable assessment of diagnostic performance of differentially expressed transcript lists, limit of detection of ratio (LODR) estimates, and expression ratio variability and measurement bias. The performance metrics suite is applicable to analysis of a typical experiment, and here we also apply these metrics to evaluate technical performance among laboratories. An interlaboratory study using identical samples shared amongst 12 laboratories with three different measurement processes demonstrated generally consistent diagnostic power across 11 laboratories. Ratio measurement variability and bias were also comparable amongst laboratories for the same measurement process. Different biases were observed for measurement processes using different mRNA enrichment protocols.
Submitted 18 June, 2014;
originally announced June 2014.