-
Characterization and Decidability of FC-Definable Regular Languages
Authors:
Sam M. Thompson,
Nicole Schweikardt,
Dominik D. Freydenberger
Abstract:
FC is a first-order logic that reasons over all factors of a finite word using concatenation, and can define non-regular languages like that of all squares (ww). In this paper, we establish that there are regular languages that are not FC-definable. Moreover, we give a decidable characterization of the FC-definable regular languages in terms of algebra, automata, and regular expressions. The latte…
▽ More
FC is a first-order logic that reasons over all factors of a finite word using concatenation, and can define non-regular languages like that of all squares (ww). In this paper, we establish that there are regular languages that are not FC-definable. Moreover, we give a decidable characterization of the FC-definable regular languages in terms of algebra, automata, and regular expressions. The latter of which is natural and concise: Star-free generalized regular expressions extended with the Kleene star of terminal words.
△ Less
Submitted 14 May, 2025;
originally announced May 2025.
-
Generalized Core Spanner Inexpressibility via Ehrenfeucht-Fraïssé Games for FC
Authors:
Sam M. Thompson,
Dominik D. Freydenberger
Abstract:
Despite considerable research on document spanners, little is known about the expressive power of generalized core spanners. In this paper, we use Ehrenfeucht-Fraïssé games to obtain general inexpressibility lemmas for the logic FC (a finite-model variant of the theory of concatenation). Applying these lemmas give inexpressibility results for FC that we lift to generalized core spanners. In partic…
▽ More
Despite considerable research on document spanners, little is known about the expressive power of generalized core spanners. In this paper, we use Ehrenfeucht-Fraïssé games to obtain general inexpressibility lemmas for the logic FC (a finite-model variant of the theory of concatenation). Applying these lemmas give inexpressibility results for FC that we lift to generalized core spanners. In particular, we give several relations that cannot be selected by generalized core spanners, thus demonstrating the effectiveness of the inexpressibility lemmas. As an immediate consequence, we also gain new insights into the expressive power of core spanners.
△ Less
Submitted 28 June, 2023;
originally announced June 2023.
-
Conjunctive Queries for Logic-Based Information Extraction
Authors:
Sam M. Thompson
Abstract:
This thesis offers two logic-based approaches to conjunctive queries in the context of information extraction. The first and main approach is the introduction of conjunctive query fragments of the logics FC and FC[REG], denoted as FC-CQ and FC[REG]-CQ respectively. FC is a first-order logic based on word equations, where the semantics are defined by limiting the universe to the factors of some fin…
▽ More
This thesis offers two logic-based approaches to conjunctive queries in the context of information extraction. The first and main approach is the introduction of conjunctive query fragments of the logics FC and FC[REG], denoted as FC-CQ and FC[REG]-CQ respectively. FC is a first-order logic based on word equations, where the semantics are defined by limiting the universe to the factors of some finite input word. FC[REG] is FC extended with regular constraints. The second approach is to consider the dynamic complexity of FC.
△ Less
Submitted 2 August, 2022;
originally announced August 2022.
-
Splitting Spanner Atoms: A Tool for Acyclic Core Spanners
Authors:
Dominik D. Freydenberger,
Sam M. Thompson
Abstract:
This paper investigates regex CQs with string equalities (SERCQs), a subclass of core spanners. As shown by Freydenberger, Kimelfeld, and Peterfreund (PODS 2018), these queries are intractable, even if restricted to acyclic queries. This previous result defines acyclicity by treating regex formulas as atoms. In contrast to this, we propose an alternative definition by converting SERCQs into FC-CQs…
▽ More
This paper investigates regex CQs with string equalities (SERCQs), a subclass of core spanners. As shown by Freydenberger, Kimelfeld, and Peterfreund (PODS 2018), these queries are intractable, even if restricted to acyclic queries. This previous result defines acyclicity by treating regex formulas as atoms. In contrast to this, we propose an alternative definition by converting SERCQs into FC-CQs -- conjunctive queries in FC, a logic that is based on word equations. We introduce a way to decompose word equations of unbounded arity into a conjunction of binary word equations. If the result of the decomposition is acyclic, then evaluation and enumeration of results become tractable. The main result of this work is an algorithm that decides in polynomial time whether an FC-CQ can be decomposed into an acyclic FC-CQ. We also give an efficient conversion from synchronized SERCQs to FC-CQs with regular constraints. As a consequence, tractability results for acyclic relational CQs directly translate to a large class of SERCQs.
△ Less
Submitted 19 January, 2022; v1 submitted 10 April, 2021;
originally announced April 2021.
-
Dynamic Complexity of Document Spanners
Authors:
Dominik D. Freydenberger,
Sam M. Thompson
Abstract:
The present paper investigates the dynamic complexity of document spanners, a formal framework for information extraction introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (JACM 2015). We first look at the class of regular spanners and prove that any regular spanner can be maintained in the dynamic complexity class DynPROP. This result follows from work done previously on the dynamic complexi…
▽ More
The present paper investigates the dynamic complexity of document spanners, a formal framework for information extraction introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (JACM 2015). We first look at the class of regular spanners and prove that any regular spanner can be maintained in the dynamic complexity class DynPROP. This result follows from work done previously on the dynamic complexity of formal languages by Gelade, Marquardt, and Schwentick (TOCL 2012).
To investigate core spanners we use SpLog, a concatenation logic that exactly captures core spanners. We show that the dynamic complexity class DynCQ, is more expressive than SpLog and therefore can maintain any core spanner. This result is then extended to show that DynFO can maintain any generalized core spanner and that DynFO is at least as powerful as SpLog with negation.
△ Less
Submitted 9 January, 2020; v1 submitted 24 September, 2019;
originally announced September 2019.