Showing 1–9 of 9 results for author: Pirola, Y

  1. arXiv:2202.13884  [pdf, other]

    q-bio.GN cs.FL cs.LG

    Numeric Lyndon-based feature embedding of sequencing reads for machine learning approaches

    Authors: Paola Bonizzoni, Matteo Costantini, Clelia De Felice, Alessia Petescia, Yuri Pirola, Marco Previtali, Raffaella Rizzi, Jens Stoye, Rocco Zaccagnino, Rosalba Zizza

    Abstract: Feature embedding methods have been proposed in literature to represent sequences as numeric vectors to be used in some bioinformatics investigations, such as family classification and protein structure prediction. Recent theoretical results showed that the well-known Lyndon factorization preserves common factors in overlapping strings. Surprisingly, the fingerprint of a sequencing read, which is…

    Submitted 2 June, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    ACM Class: I.2.6; F.4.3

    Journal ref: Information Sciences 607 (2022) 458-476

  2. Computing the BWT and LCP array of a Set of Strings in External Memory

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola, Marco Previtali, Raffaella Rizzi

    Abstract: Indexing very large collections of strings, such as those produced by the widespread next generation sequencing technologies, heavily relies on multistring generalization of the Burrows-Wheeler Transform (BWT): large requirements of in-memory approaches have stimulated recent developments on external memory algorithms. The related problem of computing the Longest Common Prefix (LCP) array of a set…

    Submitted 4 December, 2020; v1 submitted 19 May, 2017; originally announced May 2017.

    Comments: Theoretical Computer Science (2020). arXiv admin note: text overlap with arXiv:1607.08342

  3. arXiv:1604.03587  [pdf, ps, other]

    cs.DS q-bio.GN

    FSG: Fast String Graph Construction for De Novo Assembly of Reads Data

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola, Marco Previtali, Raffaella Rizzi

    Abstract: The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this paper, we explore a novel approach to compute the string graph, based on the FM-index and Burrows-Wheeler Transform. We describe a simple algorithm that uses only the FM-index representation of the collection…

    Submitted 29 May, 2017; v1 submitted 12 April, 2016; originally announced April 2016.

    Comments: Accepted to Journal of Computational Biology

  4. arXiv:1405.7520  [pdf, other]

    cs.DS q-bio.GN

    An External-Memory Algorithm for String Graph Construction

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola, Marco Previtali, Raffaella Rizzi

    Abstract: Some recent results have introduced external-memory algorithms to compute self-indexes of a set of strings, mainly via computing the Burrows-Wheeler Transform (BWT) of the input strings. The motivations for those results stem from Bioinformatics, where a large number of short strings (called reads) are routinely produced and analyzed. In that field, a fundamental problem is to assemble a genome fr…

    Submitted 11 June, 2015; v1 submitted 29 May, 2014; originally announced May 2014.

  5. Covering Pairs in Directed Acyclic Graphs

    Authors: Niko Beerenwinkel, Stefano Beretta, Paola Bonizzoni, Riccardo Dondi, Yuri Pirola

    Abstract: The Minimum Path Cover problem on directed acyclic graphs (DAGs) is a classical problem that provides a clear and simple mathematical formulation for several applications in different areas and that has an efficient algorithmic solution. In this paper, we study the computational complexity of two constrained variants of Minimum Path Cover motivated by the recent introduction of next-generation seq…

    Submitted 18 October, 2013; originally announced October 2013.

    Journal ref: Proc. of Language and Automata Theory and Applications (LATA 2014), LNCS Vol. 8370, 2014, pp 126-137

  6. arXiv:1107.3724  [pdf, other]

    cs.DS q-bio.PE

    Haplotype Inference on Pedigrees with Recombinations, Errors, and Missing Genotypes via SAT solvers

    Authors: Yuri Pirola, Gianluca Della Vedova, Stefano Biffani, Alessandra Stella, Paola Bonizzoni

    Abstract: The Minimum-Recombinant Haplotype Configuration problem (MRHC) has been highly successful in providing a sound combinatorial formulation for the important problem of genotype phasing on pedigrees. Despite several algorithmic advances and refinements that led to some efficient algorithms, its applicability to real datasets has been limited by the absence of some important characteristics of these d…

    Submitted 19 July, 2011; originally announced July 2011.

    Comments: 14 pages, 1 figure, 4 tables, the associated software reHCstar is available at http://www.algolab.eu/reHCstar

    ACM Class: F.2.2

    Journal ref: IEEE/ACM Trans. on Computational Biology and Bioinformatics 9.6 (2012) 1582-1594

  7. Pure Parsimony Xor Haplotyping

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola, Romeo Rizzi

    Abstract: The haplotype resolution from xor-genotype data has been recently formulated as a new model for genetic studies. The xor-genotype data is a cheaply obtainable type of data distinguishing heterozygous from homozygous sites without identifying the homozygous alleles. In this paper we propose a formulation based on a well-known model used in haplotype inference: pure parsimony. We exhibit exact sol…

    Submitted 8 January, 2010; originally announced January 2010.

    Journal ref: IEEE/ACM Trans. on Computational Biology and Bioinformatics 7.4 (2010) 598-610

  8. Variants of Constrained Longest Common Subsequence

    Authors: Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola

    Abstract: In this work, we consider a variant of the classical Longest Common Subsequence problem called Doubly-Constrained Longest Common Subsequence (DC-LCS). Given two strings s1 and s2 over an alphabet A, a set C_s of strings, and a function C_o from A to N, the DC-LCS problem consists in finding the longest subsequence s of s1 and s2 such that s is a supersequence of all the strings in C_s and such tha…

    Submitted 2 December, 2009; originally announced December 2009.

    Journal ref: Information Processing Letters 110.20 (2010) 877-881

  9. arXiv:0910.3148  [pdf, other]

    cs.DS cs.DB cs.DM

    Parameterized Complexity of the k-anonymity Problem

    Authors: Stefano Beretta, Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi, Yuri Pirola

    Abstract: The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization that has been recently proposed is the $k$-anonymity. This approach requires that the rows of a table are partitioned in clusters of size at least $k$ and that all the rows in a cluster become the same tuple, after the suppression of some entries. The natural optimiz…

    Submitted 17 May, 2010; v1 submitted 16 October, 2009; originally announced October 2009.

    Comments: 22 pages, 2 figures

    Journal ref: J. of Combinatorial Optimization 26.1 (2013) 19-43