
Showing 1–12 of 12 results for author: Pisanti, N

  1. arXiv:2506.01092  [pdf, ps, other]

    cs.DS

    BWT for string collections

    Authors: Davide Cenzato, Zsuzsanna Lipták, Nadia Pisanti, Giovanna Rosone, Marinella Sciortino

    Abstract: We survey the different methods used for extending the BWT to collections of strings, following largely [Cenzato and Lipták, CPM 2022, Bioinformatics 2024]. We analyze the specific aspects and combinatorial properties of the resulting BWT variants and give a categorization of publicly available tools for computing the BWT of string collections. We show how the specific method used impacts on the r…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 29 pages, 2 figures, 7 tables

  2. arXiv:2504.05917  [pdf, other]

    cs.DS cs.DB

    Indexing Strings with Utilities

    Authors: Giulia Bernardini, Huiping Chen, Alessio Conte, Roberto Grossi, Veronica Guerrini, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis

    Abstract: Applications in domains ranging from bioinformatics to advertising feature strings that come with numerical scores (utilities). The utilities quantify the importance, interest, profit, or risk of the letters occurring at every position of a string. Motivated by the ever-increasing rate of generating such data, as well as by their importance in several domains, we introduce Useful String Indexing (…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: ICDE 2025 (abstract abridged to satisfy arXiv requirements)

  3. arXiv:2411.07782  [pdf, other]

    cs.DS

    Elastic-Degenerate String Comparison

    Authors: Esteban Gabory, Moses Njagi Mwaniki, Nadia Pisanti, Solon P. Pissis, Jakub Radoszewski, Michelle Sweering, Wiktor Zuba

    Abstract: An elastic-degenerate (ED) string $T$ is a sequence of $n$ sets $T[1],\ldots,T[n]$ containing $m$ strings in total whose cumulative length is $N$. We call $n$, $m$, and $N$ the length, the cardinality and the size of $T$, respectively. The language of $T$ is defined as $L(T)=\{S_1 \cdots S_n\,:\,S_i \in T[i]$ for all $i\in[1,n]\}$. ED strings have been introduced to represent a set of closely-rela…

    Submitted 12 November, 2024; originally announced November 2024.

  4. arXiv:2410.20932  [pdf, other]

    cs.DS q-bio.GN

    Popping Bubbles in Pangenome Graphs

    Authors: Njagi Mwaniki, Erik Garrison, Nadia Pisanti

    Abstract: In this paper, we introduce flubbles, a new definition of "bubbles" corresponding to variants in a (pan)genome graph $G$. We then show a characterization for flubbles in terms of equivalence classes regarding cycles in an intermediate data structure we built from the spanning tree of the $G$, which leads us to a linear time and space solution for finding all flubbles. Furthermore, we show how a re…

    Submitted 28 October, 2024; originally announced October 2024.

  5. arXiv:2206.03242  [pdf, other]

    cs.DS q-bio.GN

    Fast Exact String to D-Texts Alignments

    Authors: Njagi Moses Mwaniki, Erik Garrison, Nadia Pisanti

    Abstract: In recent years, aligning a sequence to a pangenome has become a central problem in genomics and pangenomics. A fast and accurate solution to this problem can serve as a toolkit to many crucial tasks such as read-correction, Multiple Sequences Alignment (MSA), genome assemblies, variant calling, just to name a few. In this paper we propose a new, fast and exact method to align a string to a D-stri…

    Submitted 7 June, 2022; originally announced June 2022.

  6. arXiv:2006.16137  [pdf, other]

    cs.DS

    Pattern Masking for Dictionary Matching

    Authors: Panagiotis Charalampopoulos, Huiping Chen, Peter Christen, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, Jakub Radoszewski

    Abstract: In the Pattern Masking for Dictionary Matching (PMDM) problem, we are given a dictionary $\mathcal{D}$ of $d$ strings, each of length $\ell$, a query string $q$ of length $\ell$, and a positive integer $z$, and we are asked to compute a smallest set $K\subseteq\{1,\ldots,\ell\}$, so that if $q[i]$, for all $i\in K$, is replaced by a wildcard, then $q$ matches at least $z$ strings from…

    Submitted 8 March, 2024; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: Published in Algorithmica. Abstract abridged due to arXiv requirements

  7. arXiv:1906.11030  [pdf, other]

    cs.DS

    Combinatorial Algorithms for String Sanitization

    Authors: Giulia Bernardini, Huiping Chen, Alessio Conte, Roberto Grossi, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, Giovanna Rosone, Michelle Sweering

    Abstract: String data are often disseminated to support applications such as location-based service provision or DNA sequence analysis. This dissemination, however, may expose sensitive patterns that model confidential knowledge. In this paper, we consider the problem of sanitizing a string by concealing the occurrences of sensitive patterns, while maintaining data utility, in two settings that are relevant…

    Submitted 28 December, 2019; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: Extended version of a paper accepted to ECML/PKDD 2019

  8. arXiv:1905.02298  [pdf, other]

    cs.DS

    Elastic-Degenerate String Matching via Fast Matrix Multiplication

    Authors: Giulia Bernardini, Paweł Gawrychowski, Nadia Pisanti, Solon P. Pissis, Giovanna Rosone

    Abstract: An elastic-degenerate (ED) string is a sequence of $n$ sets of strings of total length $N$, which was recently proposed to model a set of similar sequences. The ED string matching (EDSM) problem is to find all occurrences of a pattern of length $m$ in an ED text. An $O(nm^{1.5}\sqrt{\log m}+N)$-time algorithm for EDSM is known [Aoyama et al., CPM 2018]. The standard assumption in the prior work on…

    Submitted 4 May, 2021; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: Extended version of paper in ICALP 2019

  9. arXiv:1810.02099  [pdf, other]

    cs.DS

    Longest Property-Preserved Common Factor

    Authors: Lorraine A. K Ayad, Giulia Bernardini, Roberto Grossi, Costas S. Iliopoulos, Nadia Pisanti, Solon P. Pissis, Giovanna Rosone

    Abstract: In this paper we introduce a new family of string processing problems. We are given two or more strings and we are asked to compute a factor common to all strings that preserves a specific property and has maximal length. Here we consider three fundamental string properties: square-free factors, periodic factors, and palindromic factors under three different settings, one per property. In the firs…

    Submitted 4 October, 2018; originally announced October 2018.

    Comments: Extended version of SPIRE 2018 paper

  10. arXiv:1805.01876  [pdf, other]

    cs.DS

    Detecting Mutations by eBWT

    Authors: Nicola Prezza, Nadia Pisanti, Marinella Sciortino, Giovanna Rosone

    Abstract: In this paper we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP array based procedure can locate these clusters in the…

    Submitted 10 May, 2018; v1 submitted 4 May, 2018; originally announced May 2018.

    Comments: simplified Proposition 4; extended Thm 2 to ambiguous clusters

  11. arXiv:1205.2766  [pdf, ps, other]

    cs.DS

    Optimal Listing of Cycles and st-Paths in Undirected Graphs

    Authors: Rui Ferreira, Roberto Grossi, Andrea Marino, Nadia Pisanti, Romeo Rizzi, Gustavo Sacomoto

    Abstract: We present the first optimal algorithm for the classical problem of listing all the cycles in an undirected graph. We exploit their properties so that the total cost is the time taken to read the input graph plus the time to list the output, namely, the edges in each of the cycles. The algorithm uses a reduction to the problem of listing all the paths from a vertex s to a vertex t which we also so…

    Submitted 5 July, 2012; v1 submitted 12 May, 2012; originally announced May 2012.

    Comments: 12 Pages, 7 Page Appendix

  12. arXiv:1002.0874  [pdf, ps, other]

    cs.DS

    MADMX: A Novel Strategy for Maximal Dense Motif Extraction

    Authors: Roberto Grossi, Andrea Pietracaprina, Nadia Pisanti, Geppino Pucci, Eli Upfal, Fabio Vandin

    Abstract: We develop, analyze and experiment with a new tool, called MADMX, which extracts frequent motifs, possibly including don't care characters, from biological sequences. We introduce density, a simple and flexible measure for bounding the number of don't cares in a motif, defined as the ratio of solid (i.e., different from don't care) characters to the total length of the motif. By extracting only…

    Submitted 3 February, 2010; originally announced February 2010.

    Comments: A preliminary version of this work was presented in WABI 2009. 10 pages, 0 figures