Showing 1–2 of 2 results for author: Hartikainen, A

  1. arXiv:1605.09362  [pdf, other]

    cs.IR

    Document Retrieval on Repetitive String Collections

    Authors: Travis Gagie, Aleksi Hartikainen, Kalle Karhu, Juha Kärkkäinen, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén

    Abstract: Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient docum…

    Submitted 18 May, 2017; v1 submitted 30 May, 2016; originally announced May 2016.

    Comments: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941. Accepted to the Information Retrieval Journal

  2. arXiv:1409.6780  [pdf, other]

    cs.DS

    Document Counting in Practice

    Authors: Travis Gagie, Aleksi Hartikainen, Juha Kärkkäinen, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén

    Abstract: We address the problem of counting the number of strings in a collection where a given pattern appears, which has applications in information retrieval and data mining. Existing solutions are in a theoretical stage. We implement these solutions and develop some new variants, comparing them experimentally on various datasets. Our results not only show which are the best options for each situation a…

    Submitted 1 October, 2015; v1 submitted 23 September, 2014; originally announced September 2014.

    Comments: This is a slightly extended version of the paper that was presented at DCC 2015. The implementations are available at http://jltsiren.kapsi.fi/rlcsa and https://github.com/ahartik/succinct