-
Variant tolerant read mapping using min-hashing
Authors:
Jens Quedenfeld,
Sven Rahmann
Abstract:
DNA read mapping is a ubiquitous task in bioinformatics, and many tools have been developed to solve the read mapping problem. However, there are two trends that are changing the landscape of read mapping: First, new sequencing technologies provide very long reads with high error rates (up to 15%). Second, many genetic variants in the population are known, so the reference genome is not considered as a single string over ACGT, but as a complex object containing these variants. Most existing read mappers do not handle these new circumstances appropriately.
We introduce a new read mapper prototype called VATRAM that considers variants. It is based on Min-Hashing of q-gram sets of reference genome windows. Min-Hashing is one form of locality-sensitive hashing. The variants are directly inserted into VATRAM's index, which leads to a fast mapping process. Our results show that VATRAM achieves better precision and recall than state-of-the-art read mappers like BWA under certain circumstances. VATRAM is open source and can be accessed at https://bitbucket.org/Quedenfeld/vatram-src/.
Submitted 8 February, 2017; v1 submitted 6 February, 2017;
originally announced February 2017.
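The Min-Hashing idea named in the abstract above can be illustrated with a minimal sketch. This is not VATRAM's implementation: the function names (`qgrams`, `minhash_signature`, `estimate_jaccard`), the choice of BLAKE2b as hash family, and the parameters `q=4` and `num_hashes=64` are all assumptions made for illustration, and variants are not handled here.

```python
import hashlib


def qgrams(seq, q):
    """Return the set of all substrings of length q in seq."""
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}


def _seeded_hash(qgram, seed):
    """Hypothetical hash family: one 64-bit hash function per seed."""
    data = f"{seed}:{qgram}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(seq, q=4, num_hashes=64):
    """For each hash function, keep the minimum hash value over the q-gram set."""
    grams = qgrams(seq, q)
    if not grams:
        raise ValueError("sequence is shorter than q")
    return [min(_seeded_hash(g, seed) for g in grams) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions estimates the Jaccard
    similarity of the two underlying q-gram sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

In a mapper built along these lines, signatures of reference windows would be stored in an index, and a read's signature would be compared against them to find candidate windows. How VATRAM organizes its index and inserts variants is not shown by this sketch.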
-
Massively parallel read mapping on GPUs with PEANUT
Authors:
Johannes Köster,
Sven Rahmann
Abstract:
We present PEANUT (ParallEl AligNment UTility), a highly parallel GPU-based read mapper with several distinguishing features, including a novel q-gram index (called the q-group index) with small memory footprint built on the fly over the reads and the possibility to output either the best hits or all hits of a read. Designing the algorithm particularly for the GPU architecture, we were able to reach maximum core occupancy for several key steps. Our benchmarks show that PEANUT outperforms other state-of-the-art mappers in terms of speed and sensitivity. The software is available at http://peanut.readthedocs.org.
Submitted 7 March, 2014;
originally announced March 2014.
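The abstract above describes an index built over the reads rather than over the reference. The following is a plain, CPU-only q-gram index sketched under that assumption; it is not PEANUT's q-group index, whose layout is not described in the abstract, and the names `build_qgram_index` and `candidate_hits` are hypothetical.

```python
from collections import defaultdict


def build_qgram_index(reads, q):
    """Map each q-gram to the (read_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        for pos in range(len(read) - q + 1):
            index[read[pos:pos + q]].append((read_id, pos))
    return index


def candidate_hits(index, reference, q):
    """Scan the reference once and count shared q-grams per (read_id, diagonal).

    The diagonal is reference position minus read position; many shared
    q-grams on one diagonal suggest a candidate mapping location.
    """
    counts = defaultdict(int)
    for ref_pos in range(len(reference) - q + 1):
        for read_id, read_pos in index.get(reference[ref_pos:ref_pos + q], ()):
            counts[(read_id, ref_pos - read_pos)] += 1
    return dict(counts)
```

Candidates found this way would still need verification by alignment, and a choice between reporting the best hits or all hits, as the abstract mentions.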
-
Protein Hypernetworks: a Logic Framework for Interaction Dependencies and Perturbation Effects in Protein Networks
Authors:
Johannes Köster,
Eli Zamir,
Sven Rahmann
Abstract:
Motivation: Protein interactions are fundamental building blocks of biochemical reaction systems underlying cellular functions. The complexity and functionality of such systems emerge not from the protein interactions themselves but from the dependencies between these interactions. Therefore, a comprehensive approach for integrating and using information about such dependencies is required. Results: We present an approach for endowing protein networks with interaction dependencies using propositional logic, thereby obtaining protein hypernetworks. First we demonstrate how this framework straightforwardly improves the prediction of protein complexes. Next we show that modeling protein perturbations in hypernetworks, rather than in networks, allows better inference of the functional necessity of proteins for yeast. Furthermore, hypernetworks improve the prediction of synthetic lethal interactions in yeast, indicating their capability to capture high-order functional relations between proteins. Conclusion: Protein hypernetworks are a consistent formal framework for modeling dependencies between protein interactions within protein networks. First applications of protein hypernetworks to the yeast interactome indicate their value for inferring functional features of complex biochemical systems.
Submitted 13 June, 2011;
originally announced June 2011.
-
Probabilistic Arithmetic Automata and their Applications
Authors:
Tobias Marschall,
Inke Herms,
Hans-Michael Kaltenbach,
Sven Rahmann
Abstract:
We present probabilistic arithmetic automata (PAAs), a general model to describe chains of operations whose operands depend on chance, along with two different algorithms to exactly calculate the distribution of the results obtained by such probabilistic calculations. PAAs provide a unifying framework to approach many problems arising in computational biology and elsewhere. Here, we present five different applications, namely (1) pattern matching statistics on random texts, including the computation of the distribution of occurrence counts, waiting time and clump size under HMM background models; (2) exact analysis of window-based pattern matching algorithms; (3) sensitivity of filtration seeds used to detect candidate sequence alignments; (4) length and mass statistics of peptide fragments resulting from enzymatic cleavage reactions; and (5) read length statistics of 454 sequencing reads. The diversity of these applications indicates the flexibility and unifying character of the presented framework.
While the construction of a PAA depends on the particular application, we single out a frequently applicable construction method for pattern statistics: We introduce deterministic arithmetic automata (DAAs) to model deterministic calculations on sequences, and demonstrate how to construct a PAA from a given DAA and a finite-memory random text model. We show how to transform a finite automaton into a DAA and then into the corresponding PAA.
Submitted 26 November, 2010;
originally announced November 2010.
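The construction described in the abstract above (finite automaton, then deterministic arithmetic automaton, then PAA) can be illustrated for its first application, the distribution of pattern occurrence counts. The sketch below is a simplified instance and not the authors' algorithms: it assumes an i.i.d. text model instead of an HMM background, uses a single pattern, and the function names and the `char_probs` parameter are hypothetical.

```python
from collections import defaultdict


def build_pattern_dfa(pattern, alphabet):
    """delta[state][c] is the next state; a state is the length of the
    longest pattern prefix that is a suffix of the text read so far."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for c in alphabet:
            s = pattern[:state] + c
            k = min(m, len(s))
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            delta[state][c] = k
    return delta


def occurrence_count_distribution(pattern, char_probs, text_length):
    """Exact distribution of the number of (possibly overlapping)
    occurrences of pattern in an i.i.d. random text.

    The joint distribution over (automaton state, running count) is
    propagated one character at a time; the arithmetic operation attached
    to each transition is "add 1 if the accepting state is entered".
    """
    m = len(pattern)
    delta = build_pattern_dfa(pattern, list(char_probs))
    dist = {(0, 0): 1.0}
    for _ in range(text_length):
        nxt = defaultdict(float)
        for (state, count), p in dist.items():
            for c, pc in char_probs.items():
                new_state = delta[state][c]
                emitted = 1 if new_state == m else 0
                nxt[(new_state, count + emitted)] += p * pc
        dist = nxt
    result = defaultdict(float)
    for (_, count), p in dist.items():
        result[count] += p
    return dict(result)
```

For example, `occurrence_count_distribution("AA", {"A": 0.5, "C": 0.5}, 3)` enumerates the eight equally likely texts of length 3 implicitly: five contain no occurrence, two contain one, and `AAA` contains two overlapping occurrences.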