Showing 1–8 of 8 results for author: Cunial, F

Searching in archive cs.
  1. arXiv:1901.10165  [pdf, other]

    cs.DS

    Fully-functional bidirectional Burrows-Wheeler indexes

    Authors: Fabio Cunial, Djamal Belazzougui

    Abstract: Given a string $T$ on an alphabet of size $σ$, we describe a bidirectional Burrows-Wheeler index that takes $O(|T|\logσ)$ bits of space, and that supports the addition \emph{and removal} of one character, on the left or right side of any substring of $T$, in constant time. Previously known data structures that used the same space allowed constant-time addition to any substring of $T$, but they cou…

    Submitted 9 June, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

  2. arXiv:1707.08197  [pdf, ps, other]

    cs.DS

    Fast Label Extraction in the CDAWG

    Authors: Djamal Belazzougui, Fabio Cunial

    Abstract: The compact directed acyclic word graph (CDAWG) of a string $T$ of length $n$ takes space proportional just to the number $e$ of right extensions of the maximal repeats of $T$, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which $e$ grows significantly more slowly than $n$. We reduce from $O(m\log{\log{n}})$ to $O(m)$ the tim…

    Submitted 26 September, 2017; v1 submitted 25 July, 2017; originally announced July 2017.

    Comments: 16 pages, 1 figure. In proceedings of the 24th International Symposium on String Processing and Information Retrieval (SPIRE 2017). arXiv admin note: text overlap with arXiv:1705.08640

  3. arXiv:1705.08640  [pdf, other]

    cs.DS

    Representing the suffix tree with the CDAWG

    Authors: Djamal Belazzougui, Fabio Cunial

    Abstract: Given a string $T$, it is known that its suffix tree can be represented using the compact directed acyclic word graph (CDAWG) with $e_T$ arcs, taking overall $O(e_T+e_{\overline{T}})$ words of space, where ${\overline{T}}$ is the reverse of $T$, and supporting some key operations in time between $O(1)$ and $O(\log{\log{n}})$ in the worst case. This representation is especially appealing for highly…

    Submitted 24 May, 2017; originally announced May 2017.

    Comments: 16 pages, 1 figure. Presented at the 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

  4. arXiv:1609.06378  [pdf, ps, other]

    cs.DS

    Linear-time string indexing and analysis in small space

    Authors: Djamal Belazzougui, Fabio Cunial, Juha Kärkkäinen, Veli Mäkinen

    Abstract: The field of succinct data structures has flourished over the last 16 years. Starting from the compressed suffix array (CSA) by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and applications of string indexes based on the Burrows-Wheeler transform (BWT) have been developed, all taking an amount of space that is close to the input s…

    Submitted 20 September, 2016; originally announced September 2016.

    Comments: Journal submission (52 pages, 2 figures)

  5. arXiv:1604.06002  [pdf, other]

    cs.DS

    Practical combinations of repetition-aware data structures

    Authors: Djamal Belazzougui, Fabio Cunial, Travis Gagie, Nicola Prezza, Mathieu Raffinot

    Abstract: Highly-repetitive collections of strings are increasingly being amassed by genome sequencing and genetic variation experiments, as well as by storing all versions of human-generated files, like webpages and source code. Existing indexes for locating all the exact occurrences of a pattern in a highly-repetitive string take advantage of a single measure of repetition. However, multiple, distinct mea…

    Submitted 21 April, 2016; v1 submitted 20 April, 2016; originally announced April 2016.

    Comments: arXiv admin note: text overlap with arXiv:1502.05937

  6. arXiv:1508.02968  [pdf, other]

    cs.DS

    Space-efficient detection of unusual words

    Authors: Djamal Belazzougui, Fabio Cunial

    Abstract: Detecting all the strings that occur in a text more frequently or less frequently than expected according to an IID or a Markov model is a basic problem in string mining, yet current algorithms are based on data structures that are either space-inefficient or incur large slowdowns, and current implementations cannot scale to genomes or metagenomes in practice. In this paper we engineer an algorith…

    Submitted 12 August, 2015; originally announced August 2015.

    Comments: arXiv admin note: text overlap with arXiv:1502.06370

  7. arXiv:1502.06370  [pdf, ps, other]

    cs.DS

    A framework for space-efficient string kernels

    Authors: Djamal Belazzougui, Fabio Cunial

    Abstract: String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient, or incur large slowdowns. We show that a number of exact string kernels, like the $k$-mer kernel, the substrings kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels w…

    Submitted 23 February, 2015; originally announced February 2015.

  8. arXiv:1502.05937  [pdf, other]

    cs.DS

    Composite repetition-aware data structures

    Authors: Djamal Belazzougui, Fabio Cunial, Travis Gagie, Nicola Prezza, Mathieu Raffinot

    Abstract: In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for…

    Submitted 23 February, 2015; v1 submitted 20 February, 2015; originally announced February 2015.

    Comments: (the name of the third co-author was inadvertently omitted from previous version)