-
Probabilistic Models of k-mer Frequencies (Extended Abstract)
Authors:
Askar Gafurov,
Tomáš Vinař,
Broňa Brejová
Abstract:
In this article, we review existing probabilistic models for modeling the abundance of fixed-length strings (k-mers) in DNA sequencing data. These models capture the dependence of the abundance on various phenomena, such as the size and repeat content of the genome, heterozygosity levels, and sequencing error rate. This in turn makes it possible to estimate these properties from k-mer abundance histograms observed in real data. We also briefly discuss the issue of comparing k-mer abundance between related sequencing samples and meaningfully summarizing the results.
Submitted 30 December, 2021;
originally announced December 2021.
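The k-mer abundance histogram that the abstract above refers to can be illustrated with a minimal sketch. It assumes error-free toy reads and exact counting in memory; the function names `kmer_counts` and `abundance_histogram` and the example reads are hypothetical and are not taken from the paper.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(counts):
    """Map abundance j to the number of distinct k-mers seen exactly j times."""
    return Counter(counts.values())


# Toy input (hypothetical): three short reads, k = 4.
reads = ["ACGTACGT", "CGTACGTA", "ACGTTCGT"]
hist = abundance_histogram(kmer_counts(reads, 4))
for abundance in sorted(hist):
    print(abundance, hist[abundance])
```

A probabilistic model of the kind reviewed in the paper would then be fitted to such a histogram; the fitting step is not sketched here.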
-
Technical comment on The impact of population-wide rapid antigen testing on SARS-CoV-2 prevalence in Slovakia
Authors:
Matus Medo,
Martin Suster,
Katarina Bodova,
Alexandra Brazinova,
Brona Brejova,
Richard Kollar,
Vladimir Leksa,
Jana Lindbloom,
Jozef Nosek,
Tomas Vinar
Abstract:
Pavelka et al. (Science, Reports, 7 May 2021) claim that a single round of population-wide antigen testing in Slovakia reduced the observed COVID-19 prevalence by 58%, and that it played a substantial role in curbing the pandemic. We argue that this estimate, which is based on incorrect assumptions, is exaggerated, and that the relief was short-lived with little effect on mitigating the pandemic.
Submitted 28 May, 2021;
originally announced May 2021.
-
Dynamic Pooling Improves Nanopore Base Calling Accuracy
Authors:
Vladimír Boža,
Peter Perešíni,
Broňa Brejová,
Tomáš Vinař
Abstract:
In nanopore sequencing, an electrical signal is measured as DNA molecules pass through the sequencing pores. Translating these signals into DNA bases (base calling) is a highly non-trivial task, and its quality has a large impact on the sequencing accuracy. The most successful nanopore base callers to date use convolutional neural networks (CNN) to accomplish the task.
Convolutional layers in CNNs are typically composed of filters with constant window size, performing best in analysis of signals with uniform speed. However, the speed of nanopore sequencing varies greatly both within reads and between sequencing runs. Here, we present dynamic pooling, a novel neural network component, which addresses this problem by adaptively adjusting the pooling ratio. To demonstrate the usefulness of dynamic pooling, we developed two base callers: Heron and Osprey. Heron improves the accuracy beyond the experimental high-accuracy base caller Bonito developed by Oxford Nanopore. Osprey is a fast base caller that can compete in accuracy with Guppy high-accuracy mode, but does not require GPU acceleration and achieves a near real-time speed on common desktop CPUs.
Availability: https://github.com/fmfi-compbio/osprey, https://github.com/fmfi-compbio/heron
Keywords: nanopore sequencing, base calling, convolutional neural networks, pooling
Submitted 16 May, 2021;
originally announced May 2021.
-
B.1.258$Δ$, a SARS-CoV-2 variant with $Δ$H69/$Δ$V70 in the Spike protein circulating in the Czech Republic and Slovakia
Authors:
Broňa Brejová,
Viktória Hodorová,
Kristína Boršová,
Viktória Čabanová,
Lenka Reizigová,
Evan D. Paul,
Pavol Čekan,
Boris Klempa,
Jozef Nosek,
Tomáš Vinař
Abstract:
SARS-CoV-2 mutants carrying the $Δ$H69/$Δ$V70 deletion in the amino terminal domain of the Spike protein emerged independently in at least six lineages of the virus (namely, B.1.1.7, B.1.1.298, B.1.160, B.1.177, B.1.258, B.1.375). Routine RT-qPCR tests including TaqPath or similar assays based on a drop-out of the Spike gene target are incapable of distinguishing among these lineages and often lead to the false conclusion that clinical samples contain the B.1.1.7 variant, which recently emerged in the United Kingdom and is quickly spreading through the human population. We analyzed SARS-CoV-2 samples collected from various regions of Slovakia between November and December 2020 that were presumed to contain the B.1.1.7 variant due to the travel history of the virus carriers or their contacts. Sequencing of these isolates revealed that although in some cases the samples were indeed confirmed as B.1.1.7, a substantial fraction of isolates contained another $Δ$H69/$Δ$V70-carrying mutant belonging to the lineage B.1.258, which has been circulating in Central Europe since August 2020, long before the import of B.1.1.7. Phylogenetic analysis shows that the early sublineage of B.1.258 acquired the N439K substitution in the receptor binding domain (RBD) of the Spike protein and, later on, also the deletion $Δ$H69/$Δ$V70 in the Spike N-terminal domain (NTD). This variant is particularly common in several European countries including the Czech Republic and Slovakia, and we propose to name it B.1.258$Δ$.
Submitted 9 February, 2021;
originally announced February 2021.
-
Nanopore Base Calling on the Edge
Authors:
Peter Perešíni,
Vladimír Boža,
Broňa Brejová,
Tomáš Vinař
Abstract:
We developed a new base caller, DeepNano-coral, for nanopore sequencing, which is optimized to run on the Coral Edge Tensor Processing Unit, a small USB-attached hardware accelerator. To achieve this goal, we have designed new versions of two key components used in convolutional neural networks for speech recognition and base calling. In our components, we propose a new way of factorizing a full convolution into smaller operations, which reduces the number of memory accesses, the bottleneck on this device. DeepNano-coral achieves real-time base calling during sequencing with accuracy slightly better than the fast mode of the Guppy base caller and is extremely energy efficient, using only 10 W of power. Availability: https://github.com/fmfi-compbio/coral-basecaller
Submitted 9 November, 2020;
originally announced November 2020.
-
Improving Nanopore Reads Raw Signal Alignment
Authors:
Vladimír Boža,
Broňa Brejová,
Tomáš Vinař
Abstract:
We investigate the use of the dynamic time warping (DTW) algorithm for aligning raw signal data from the MinION sequencer. DTW is mostly used for fast alignment in selective sequencing, to quickly determine whether a read comes from a sequence of interest.
We show that the standard usage of DTW has low discriminative power, mainly due to the difficulty of accurately estimating scaling parameters. We propose a simple variation of the DTW algorithm that does not suffer from scaling problems and has much higher discriminative power.
Submitted 3 May, 2017;
originally announced May 2017.
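For readers unfamiliar with DTW, the standard algorithm that the abstract above takes as its baseline can be sketched as follows. This is the textbook dynamic program with an absolute-difference local cost, which is an assumption made here for illustration; it is not the authors' proposed variation, which the abstract does not describe. The name `dtw_distance` and the toy signals are hypothetical.

```python
def dtw_distance(a, b):
    """Textbook DTW: minimal cumulative cost of warping sequence a onto b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match to b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a match to a[i-1]
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


# Toy signals (hypothetical): a time-stretched copy aligns at zero cost,
# while a rescaled copy does not, which illustrates the sensitivity to scaling.
stretched = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
rescaled = dtw_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(stretched, rescaled)
```

The quadratic table is kept in full for clarity; practical implementations usually restrict the warping band or keep only two rows.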
-
Using Sequence Ensembles for Seeding Alignments of MinION Sequencing Data
Authors:
Rastislav Rabatin,
Broňa Brejová,
Tomáš Vinař
Abstract:
The Oxford Nanopore MinION sequencer is currently the smallest sequencing device available. While it can produce very long reads (reads of up to 100 kbp have been reported), it is prone to high sequencing error rates of up to 30%. Since most of these errors are insertions or deletions, it is very difficult to adapt popular seed-based algorithms designed for aligning data sets with much lower error rates.
Base calling of MinION reads is typically done using hidden Markov models. In this paper, we propose to represent each sequencing read by an ensemble of sequences sampled from such a probabilistic model. This approach can improve the sensitivity and false positive rate of seeding an alignment compared to using a single representative base call sequence for each read.
Submitted 28 June, 2016;
originally announced June 2016.
-
DeepNano: Deep Recurrent Neural Networks for Base Calling in MinION Nanopore Reads
Authors:
Vladimír Boža,
Broňa Brejová,
Tomáš Vinař
Abstract:
Motivation: The MinION device by Oxford Nanopore is the first portable sequencing device. MinION is able to produce very long reads (reads over 100 kbp have been reported); however, it suffers from a high sequencing error rate. In this paper, we show that the error rate can be reduced by improving the base calling process.
Results: We present the first open-source DNA base caller for the MinION sequencing platform by Oxford Nanopore. By employing carefully crafted recurrent neural networks, our tool improves the base calling accuracy compared to the default base caller supplied by the manufacturer. This advance may further enhance applicability of MinION for genome sequencing and various clinical applications.
Availability: DeepNano can be downloaded at http://compbio.fmph.uniba.sk/deepnano/.
Contact: [email protected]
Submitted 30 March, 2016;
originally announced March 2016.
-
Probabilistic Approaches to Alignment with Tandem Repeats
Authors:
Michal Nánási,
Tomáš Vinař,
Broňa Brejová
Abstract:
We propose a simple tractable pair hidden Markov model for pairwise sequence alignment that accounts for the presence of short tandem repeats. Using the framework of gain functions, we design several optimization criteria for decoding this model and describe the resulting decoding algorithms, ranging from the traditional Viterbi and posterior decoding to block-based decoding algorithms specialized for our model. We compare the accuracy of individual decoding algorithms on simulated data and find our approach superior to the classical three-state pair HMM in simulations.
Submitted 30 July, 2013;
originally announced July 2013.
-
A New Approach to the Small Phylogeny Problem
Authors:
Jakub Kováč,
Broňa Brejová,
Tomáš Vinař
Abstract:
In the small phylogeny problem, we are given a phylogenetic tree and gene orders of the extant species, and our goal is to reconstruct all of the ancestral genomes so that the number of evolutionary operations is minimized. Algorithms for reconstructing evolutionary history from gene orders are usually based on repeatedly computing medians of genomes at neighbouring vertices of the tree. We propose a new, more general approach, based on an iterative local optimization procedure. In each step, we propose candidates for ancestral genomes and choose the best ones by dynamic programming. We have implemented our method and used it to reconstruct the evolutionary history of 16 yeast mtDNAs and 13 Campanulaceae cpDNAs.
Submitted 4 December, 2010;
originally announced December 2010.
-
The Highest Expected Reward Decoding for HMMs with Application to Recombination Detection
Authors:
Michal Nánási,
Tomáš Vinař,
Broňa Brejová
Abstract:
Hidden Markov models are traditionally decoded by the Viterbi algorithm, which finds the highest-probability state path in the model. In recent years, several limitations of Viterbi decoding have been demonstrated, and new algorithms have been developed to address them \citep{Kall2005,Brejova2007,Gross2007,Brown2010}.
In this paper, we propose a new efficient highest expected reward decoding algorithm (HERD) that allows for uncertainty in the boundaries of individual sequence features. We demonstrate the usefulness of our approach on jumping HMMs for recombination detection in viral genomes.
Submitted 25 January, 2010;
originally announced January 2010.
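The Viterbi decoding that the abstract above uses as its point of comparison can be sketched as follows. This is the textbook algorithm in log space, not HERD, whose details the abstract does not give. The two-state toy model, its probabilities, and the function name `viterbi` are hypothetical, and the sketch assumes that all probabilities are nonzero.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the highest-probability state path for the observation sequence."""
    # best[t][s] = log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s] = predecessor of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + math.log(trans_p[r][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy two-state model (hypothetical numbers, all probabilities nonzero).
states = ("H", "L")
start_p = {"H": 0.5, "L": 0.5}
trans_p = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
emit_p = {"H": {"x": 0.9, "y": 0.1}, "L": {"x": 0.1, "y": 0.9}}
path = viterbi("xxyy", states, start_p, trans_p, emit_p)
print(path)
```

Viterbi commits to a single state path, so the exact position of a state change (here between the second and third symbol) is reported without any measure of uncertainty, which is the kind of limitation the abstract mentions.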