Showing 1–11 of 11 results for author: Medvedev, P

Searching in archive cs.
  1. arXiv:2205.01785  [pdf, other]

    cs.DS

    The theoretical analysis of sequencing bioinformatics algorithms and beyond

    Authors: Paul Medvedev

    Abstract: The theoretical analysis of performance has been an important tool in the engineering of algorithms in many application domains. Its goals are to predict the empirical performance of an algorithm and to be a yardstick that drives the design of novel algorithms that perform well in practice. While these goals have been achieved in many instances, they have not been achieved ubiquitously across cruc…

    Submitted 14 November, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

  2. arXiv:2204.09535  [pdf, ps, other]

    cs.DS q-bio.GN

    Theoretical analysis of edit distance algorithms: an applied perspective

    Authors: Paul Medvedev

    Abstract: Given its status as a classic problem and its importance to both theoreticians and practitioners, edit distance provides an excellent lens through which to understand how the theoretical analysis of algorithms impacts practical implementations. From an applied perspective, the goals of theoretical analysis are to predict the empirical performance of an algorithm and to serve as a yardstick to desi…

    Submitted 30 January, 2023; v1 submitted 20 April, 2022; originally announced April 2022.

  3. arXiv:1903.12312  [pdf, other]

    cs.DS q-bio.GN

    Data structures to represent a set of k-long DNA sequences

    Authors: Rayan Chikhi, Jan Holub, Paul Medvedev

    Abstract: The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying a k-mer set has emerged as a shared underlying component. A set of k-mers has unique fea…

    Submitted 11 June, 2020; v1 submitted 28 March, 2019; originally announced March 2019.

  4. arXiv:1805.04765  [pdf, other]

    cs.DM

    Bipartite Graphs of Small Readability

    Authors: Rayan Chikhi, Vladan Jovicic, Stefan Kratsch, Paul Medvedev, Martin Milanic, Sofya Raskhodnikova, Nithin Varma

    Abstract: We study a parameter of bipartite graphs called readability, introduced by Chikhi et al. (Discrete Applied Mathematics, 2016) and motivated by applications of overlap graphs in bioinformatics. The behavior of the parameter is poorly understood. The complexity of computing it is open and it is not known whether the decision version of the problem is in NP. The only known upper bound on the readabil…

    Submitted 12 May, 2018; originally announced May 2018.

    Comments: 16 pages (including references) and 5 figures

  5. arXiv:1706.05429  [pdf, other]

    cs.DS cs.DM q-bio.GN

    Modeling Biological Problems in Computer Science: A Case Study in Genome Assembly

    Authors: Paul Medvedev

    Abstract: As computer scientists working in bioinformatics/computational biology, we often face the challenge of coming up with an algorithm to answer a biological question. This occurs in many areas, such as variant calling, alignment, and assembly. In this tutorial, we use the example of the genome assembly problem to demonstrate how to go from a question in the biological realm to a solution in the compu…

    Submitted 2 January, 2018; v1 submitted 16 June, 2017; originally announced June 2017.

  6. arXiv:1602.05856  [pdf, other]

    cs.DS q-bio.GN

    TwoPaCo: An efficient algorithm to build the compacted de Bruijn graph from many complete genomes

    Authors: Ilia Minkin, Son Pham, Paul Medvedev

    Abstract: Motivation: De Bruijn graphs have been proposed as a data structure to facilitate the analysis of related whole genome sequences, in both a population and comparative genomic settings. However, current approaches do not scale well to many genomes of large size (such as mammalian genomes). Results: In this paper, we present TwoPaCo, a simple and scalable low memory algorithm for the direct construc…

    Submitted 18 February, 2016; originally announced February 2016.

    MSC Class: 68

  7. arXiv:1601.02932  [pdf, other]

    q-bio.QM cs.DM cs.DS q-bio.GN

    Safe and complete contig assembly via omnitigs

    Authors: Alexandru I. Tomescu, Paul Medvedev

    Abstract: Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs -- a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a…

    Submitted 16 August, 2016; v1 submitted 12 January, 2016; originally announced January 2016.

    Comments: Full version of the paper in the proceedings of RECOMB 2016

  8. arXiv:1504.04616  [pdf, ps, other]

    cs.DM cs.DS math.CO q-bio.GN

    On the readability of overlap digraphs

    Authors: Rayan Chikhi, Paul Medvedev, Martin Milanic, Sofya Raskhodnikova

    Abstract: We introduce the graph parameter readability and study it as a function of the number of vertices in a graph. Given a digraph D, an injective overlap labeling assigns a unique string to each vertex such that there is an arc from x to y if and only if x properly overlaps y. The readability of D is the minimum string length for which an injective overlap labeling exists. In applications that utilize…

    Submitted 17 April, 2015; originally announced April 2015.

    Comments: This is a full version of a conference paper of the same title at the 26th Annual Symposium on Combinatorial Pattern Matching (CPM 2015)

  9. arXiv:1401.5383  [pdf, ps, other]

    q-bio.QM cs.DS q-bio.GN

    On the representation of de Bruijn graphs

    Authors: Rayan Chikhi, Antoine Limasset, Shaun Jackman, Jared Simpson, Paul Medvedev

    Abstract: The de Bruijn graph plays an important role in bioinformatics, especially in the context of de novo assembly. However, the representation of the de Bruijn graph in memory is a computational bottleneck for many assemblers. Recent papers proposed a navigational data structure approach in order to improve memory usage. We prove several theoretical space lower bounds to show the limitation of these ty…

    Submitted 6 October, 2014; v1 submitted 21 January, 2014; originally announced January 2014.

    Comments: Journal version (JCB). A preliminary version of this article was published in the proceedings of RECOMB 2014

  10. Shortest paths between shortest paths and independent sets

    Authors: Marcin Kaminski, Paul Medvedev, Martin Milanic

    Abstract: We study problems of reconfiguration of shortest paths in graphs. We prove that the shortest reconfiguration sequence can be exponential in the size of the graph and that it is NP-hard to compute the shortest reconfiguration sequence even when we know that the sequence has polynomial length. Moreover, we also study reconfiguration of independent sets in three different models and analyze relations…

    Submitted 7 February, 2011; v1 submitted 26 August, 2010; originally announced August 2010.

  11. The Plane-Width of Graphs

    Authors: Marcin Kaminski, Paul Medvedev, Martin Milanic

    Abstract: Map vertices of a graph to (not necessarily distinct) points of the plane so that two adjacent vertices are mapped at least a unit distance apart. The plane-width of a graph is the minimum diameter of the image of the vertex set over all such mappings. We establish a relation between the plane-width of a graph and its chromatic number, and connect it to other well-known areas, including the circ…

    Submitted 23 December, 2008; originally announced December 2008.

    Journal ref: Journal of Graph Theory 68 (2011) 229-245