Showing 1–3 of 3 results for author: Doleschal, J

Searching in archive cs.
  1. The Complexity of Aggregates over Extractions by Regular Expressions

    Authors: Johannes Doleschal, Benny Kimelfeld, Wim Martens

    Abstract: Regular expressions with capture variables, also known as regex-formulas, extract relations of spans (intervals identified by their start and end indices) from text. In turn, the class of regular document spanners is the closure of the regex formulas under the Relational Algebra. We investigate the computational complexity of querying text by aggregate functions, such as sum, average, and quantile…

    Submitted 8 August, 2023; v1 submitted 20 February, 2020; originally announced February 2020.

    Journal ref: Logical Methods in Computer Science, Volume 19, Issue 3 (August 9, 2023) lmcs:8623

  2. Weight Annotation in Information Extraction

    Authors: Johannes Doleschal, Benny Kimelfeld, Wim Martens, Liat Peterfreund

    Abstract: The framework of document spanners abstracts the task of information extraction from text as a function that maps every document (a string) into a relation over the document's spans (intervals identified by their start and end indices). For instance, the regular spanners are the closure under the Relational Algebra (RA) of the regular expressions with capture variables, and the expressive power of…

    Submitted 28 January, 2022; v1 submitted 30 August, 2019; originally announced August 2019.

    Journal ref: Logical Methods in Computer Science, Volume 18, Issue 1 (January 31, 2022) lmcs:6936

  3. Split-Correctness in Information Extraction

    arXiv:1810.03367 [cs.DB]

    Authors: Johannes Doleschal, Benny Kimelfeld, Wim Martens, Frank Neven, Matthias Niewerth

    Abstract: Programs for extracting structured information from text, namely information extractors, often operate separately on document segments obtained from a generic splitting operation such as sentences, paragraphs, k-grams, HTTP requests, and so on. An automated detection of this behavior of extractors, which we refer to as split-correctness, would allow text analysis systems to devise query plans with…

    Submitted 20 May, 2021; v1 submitted 8 October, 2018; originally announced October 2018.
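The first result, "The Complexity of Aggregates over Extractions by Regular Expressions", describes regex formulas that extract relations of spans from text and aggregate functions such as sum, average, and quantile evaluated over them. The following is a minimal Python sketch of that idea under stated assumptions: Python's `re` module stands in for a regex formula with a single capture variable x, the helpers `extract_spans` and `aggregate` are hypothetical and not taken from the paper, and spans use Python's 0-based, end-exclusive convention. Because this particular pattern captures maximal runs of digits, `re.finditer` returns exactly those runs. For a general regex formula, non-overlapping leftmost matches would not enumerate every span assignment.

```python
import re
from statistics import mean, median

def extract_spans(document: str) -> list[tuple[int, int]]:
    # Hypothetical stand-in for a one-variable regex formula: x captures every
    # maximal run of digits. Spans are 0-based, end-exclusive (start, end) pairs.
    return [m.span() for m in re.finditer(r"[0-9]+", document)]

def aggregate(document: str, spans: list[tuple[int, int]], func):
    # Interpret the text of each extracted span as an integer, then aggregate.
    values = [int(document[start:end]) for start, end in spans]
    return func(values)

doc = "items: 3 apples, 10 pears, 2 plums"
spans = extract_spans(doc)
print(spans)                          # [(7, 8), (17, 19), (27, 28)]
print(aggregate(doc, spans, sum))     # 15
print(aggregate(doc, spans, mean))    # 5
print(aggregate(doc, spans, median))  # 3, i.e. the 0.5-quantile
```

Separating extraction from aggregation mirrors the two stages in the abstract: first a relation of spans, then a numeric function over the values those spans denote.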
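The third result, "Split-Correctness in Information Extraction", concerns extractors that may be run separately on the segments produced by a splitter. The following Python sketch checks the corresponding equality on a single document: it compares the spans extracted from the whole document with the spans extracted from each segment, after shifting the segment-local spans back into document positions. It rests on several assumptions. The extractors and the line splitter are hypothetical regex-based stand-ins, spans are 0-based and end-exclusive, and the name `agrees_on` is illustrative. A `True` result only means this one document is not a counterexample. It does not decide split-correctness over all documents, which is what the automated detection described in the abstract would require.

```python
import re
from typing import Callable

Span = tuple[int, int]
Extractor = Callable[[str], list[Span]]

def regex_extractor(pattern: str) -> Extractor:
    # Hypothetical extractor: spans of the non-overlapping leftmost matches.
    return lambda text: [m.span() for m in re.finditer(pattern, text)]

def line_splitter(text: str) -> list[Span]:
    # Hypothetical splitter: each maximal run of non-newline characters.
    return [m.span() for m in re.finditer(r"[^\n]+", text)]

def agrees_on(extractor: Extractor, splitter: Callable[[str], list[Span]], document: str) -> bool:
    whole = set(extractor(document))
    pieces = set()
    for seg_start, seg_end in splitter(document):
        for start, end in extractor(document[seg_start:seg_end]):
            # Shift segment-local spans back into document coordinates.
            pieces.add((start + seg_start, end + seg_start))
    return whole == pieces

doc = "10 apples\n20 pears"
numbers = regex_extractor(r"[0-9]+")
word_then_number = regex_extractor(r"[a-z]+\s+[0-9]+")
print(agrees_on(numbers, line_splitter, doc))           # True
print(agrees_on(word_then_number, line_splitter, doc))  # False: "apples\n20" crosses a line break
```

Comparing sets treats each extractor's output as a relation of spans, so the order in which matches are found does not matter.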