Showing 1–6 of 6 results for author: Goga, A

  1. arXiv:2311.04538  [pdf, ps, other]

    cs.DS

    Faster Maximal Exact Matches with Lazy LCP Evaluation

    Authors: Adrián Goga, Lore Depuydt, Nathaniel K. Brown, Jan Fostier, Travis Gagie, Gonzalo Navarro

    Abstract: MONI (Rossi et al., JCB 2022) is a BWT-based compressed index for computing the matching statistics and maximal exact matches (MEMs) of a pattern (usually a DNA read) with respect to a highly repetitive text (usually a database of genomes) using two operations: LF-steps and longest common extension (LCE) queries on a grammar-compressed representation of the text. In practice, most of the ope…

    Submitted 8 November, 2023; originally announced November 2023.

  2. arXiv:2308.09836  [pdf, other]

    cs.DS

    Wheeler maps

    Authors: Andrej Baláž, Travis Gagie, Adrián Goga, Simon Heumos, Gonzalo Navarro, Alessia Petescia, Jouni Sirén

    Abstract: Motivated by challenges in pangenomic read alignment, we propose a generalization of Wheeler graphs that we call Wheeler maps. A Wheeler map stores a text $T[1..n]$ and an assignment of tags to the characters of $T$ such that we can preprocess a pattern $P[1..m]$ and then, given $i$ and $j$, quickly return all the distinct tags labeling the first characters of the occurrences of $P[i..j]$ in $T$.…

    Submitted 18 August, 2023; originally announced August 2023.

  3. arXiv:2212.02327  [pdf, ps, other]

    cs.DS

    Space-efficient conversions from SLPs

    Authors: Travis Gagie, Adrián Goga, Artur Jeż, Gonzalo Navarro

    Abstract: We give algorithms that, given a straight-line program (SLP) with $g$ rules that generates (only) a text $T[1..n]$, build within $O(g)$ space the Lempel-Ziv (LZ) parse of $T$ (of $z$ phrases) in time $O(n\log^2 n)$ or in time $O(gz\log^2(n/z))$. We also show how to build a locally consistent grammar (LCG) of optimal size $g_{lc} = O(δ\log\frac{n}δ)$ from the SLP within $O(g+g_{lc})$ space and in…

    Submitted 10 October, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

  4. arXiv:2209.09218  [pdf, ps, other]

    cs.DS

    MARIA: Multiple-alignment $r$-index with aggregation

    Authors: Adrián Goga, Andrej Baláž, Alessia Petescia, Travis Gagie

    Abstract: There now exist compact indexes that can efficiently list all the occurrences of a pattern in a dataset consisting of thousands of genomes, or even all the occurrences of all the pattern's maximal exact matches (MEMs) with respect to the dataset. Unless we are lucky and the pattern is specific to only a few genomes, however, we could be swamped by hundreds of matches -- or even hundreds per MEM --…

    Submitted 19 September, 2022; originally announced September 2022.

  5. Prefix-free parsing for building large tunnelled Wheeler graphs

    Authors: Adrián Goga, Andrej Baláž

    Abstract: We propose a new technique for creating a space-efficient index for large repetitive text collections, such as pangenomic databases containing sequences of many individuals from the same species. We combine two recent techniques from this area: Wheeler graphs (Gagie et al., 2017) and prefix-free parsing (PFP, Boucher et al., 2019). Wheeler graphs (WGs) are a general framework encompassing several…

    Submitted 30 June, 2022; originally announced June 2022.

    Comments: 12 pages, 3 figures, 2 tables, to be published in the WABI (Workshop on Algorithms in Bioinformatics) 2022 conference proceedings

    ACM Class: E.1

  6. arXiv:2106.13649  [pdf]

    q-bio.GN cs.CE

    SnakeLines: integrated set of computational pipelines for sequencing reads

    Authors: Jaroslav Budis, Werner Krampl, Marcel Kucharik, Rastislav Hekel, Adrian Goga, Michal Lichvar, David Smolak, Miroslav Bohmer, Andrej Balaz, Frantisek Duris, Juraj Gazdarica, Katarina Soltys, Jan Turna, Jan Radvanszky, Tomas Szemes

    Abstract: Background: With the rapid growth of massively parallel sequencing technologies, still more laboratories are utilizing sequenced DNA fragments for genomic analyses. Interpretation of sequencing data is, however, strongly dependent on bioinformatics processing, which is often too demanding for clinicians and researchers without a computational background. Another problem represents the reproducibil…

    Submitted 25 June, 2021; originally announced June 2021.

    Comments: 22 pages, 3 figures, 1 table