-
Jumbled Scattered Factors
Authors:
Pamela Fleischmann,
Annika Huch,
Melf Kammholz,
Tore Koß
Abstract:
In this work, we combine the research on (absent) scattered factors with that on jumbled words. For instance, $\mathtt{wolf}$ is an absent scattered factor of $\mathtt{cauliflower}$, but since $\mathtt{lfow}$, a jumbled (or abelian) version of $\mathtt{wolf}$, is a scattered factor, $\mathtt{wolf}$ occurs as a jumbled scattered factor in $\mathtt{cauliflower}$. A \emph{jumbled scattered factor} $u$ of a word $w$ is constructed from letters of $w$ with the only rule that the number of occurrences of each letter in $u$ is at most its number of occurrences in $w$. We proceed to partition and characterise the set of jumbled scattered factors by the number of jumbled letters and use the latter as a measure. For this new class of words, we relate the folklore longest common subsequence (scattered factor) to the number of required jumbles. Further, we investigate the smallest possible number of jumbles alongside the jumbled scattered factor relation as well as Simon's congruence from the point of view of jumbled scattered factors and jumbled universality.
Submitted 4 June, 2025;
originally announced June 2025.
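The two notions contrasted in this abstract can be illustrated with a minimal sketch. It assumes only the definitions quoted above (order-preserving deletion for scattered factors, per-letter counts for jumbled scattered factors); the function names are hypothetical and not taken from the paper.

```python
from collections import Counter


def is_scattered_factor(u: str, w: str) -> bool:
    """True if u can be obtained from w by deleting letters, keeping the order."""
    remaining = iter(w)
    # `ch in remaining` advances the iterator, so matches must occur left to right.
    return all(ch in remaining for ch in u)


def is_jumbled_scattered_factor(u: str, w: str) -> bool:
    """True if every letter occurs in u at most as often as it occurs in w."""
    need, have = Counter(u), Counter(w)
    return all(have[letter] >= count for letter, count in need.items())
```

With these helpers, `wolf` is rejected as a scattered factor of `cauliflower`, while `lfow` is accepted, and `wolf` passes the jumbled test.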
-
$k$-Universality of Regular Languages Revisited
Authors:
Duncan Adamson,
Pamela Fleischmann,
Annika Huch,
Tore Koß,
Florin Manea
Abstract:
A subsequence of a word $w$ is a word $u$ such that $u = w[i_1] w[i_2] \cdots w[i_k]$, for some set of indices $1 \leq i_1 < i_2 < \dots < i_k \leq \vert w \vert$. A word $w$ is \emph{$k$-subsequence universal} over an alphabet $Σ$ if every word over $Σ$ up to length $k$ appears in $w$ as a subsequence. In this paper, we revisit the problem $k$-ESU of deciding, for a given integer $k$, whether a regular language, given either as a nondeterministic finite automaton or as a regular expression, contains a $k$-universal word. [Adamson et al., ISAAC 2023] showed that this problem is NP-hard, even when $k=1$, and gave an FPT algorithm w.r.t. the size of the input alphabet. In this paper, we improve the aforementioned algorithmic result and complete the analysis of this problem w.r.t. other parameters. That is, we propose a more efficient FPT algorithm for $k$-ESU, with respect to the size of the input alphabet, and propose new FPT algorithms for this problem w.r.t.~the number of states of the input automaton and the length of the input regular expression. We also discuss corresponding lower bounds. Our results significantly improve the understanding of this problem.
Submitted 24 March, 2025;
originally announced March 2025.
-
Nuremberg Letterbooks: A Multi-Transcriptional Dataset of Early 15th Century Manuscripts for Document Analysis
Authors:
Martin Mayr,
Julian Krenz,
Katharina Neumeier,
Anna Bub,
Simon Bürcky,
Nina Brolich,
Klaus Herbers,
Mechthild Habermann,
Peter Fleischmann,
Andreas Maier,
Vincent Christlein
Abstract:
Most datasets in the field of document analysis utilize highly standardized labels, which, while simplifying specific tasks, often produce outputs that are not directly applicable to humanities research. In contrast, the Nuremberg Letterbooks dataset, which comprises historical documents from the early 15th century, addresses this gap by providing multiple types of transcriptions and accompanying metadata. This approach allows for developing methods that are more closely aligned with the needs of the humanities. The dataset includes 4 books containing 1711 labeled pages written by 10 scribes. Three types of transcriptions are provided for handwritten text recognition: Basic, diplomatic, and regularized. For the latter two, versions with and without expanded abbreviations are also available. A combination of letter ID and writer ID supports writer identification due to changing writers within pages. In the technical validation, we established baselines for various tasks, demonstrating data consistency and providing benchmarks for future research to build upon.
Submitted 11 November, 2024;
originally announced November 2024.
-
Generalized Word-Representable Graphs
Authors:
Zhidan Feng,
Henning Fernau,
Pamela Fleischmann,
Kevin Mann,
Silas Cato Sacher
Abstract:
The literature on word-representable graphs is quite rich, and a number of variations of the original definition have been proposed over the years. We are initiating a systematic study of such variations based on formal languages. In our framework, we can associate a graph class to each language over the binary alphabet \{0,1\}. All graph classes that are language-representable in this sense are hereditary and enjoy further common properties. Besides word-representable graphs and, more generally, 1^k- or k-11-representable graphs, we can identify many more graph classes in our framework, like (co)bipartite graphs, (co)comparability graphs, to name a few. It was already known that any graph is 111- or 2-11-representable. When such representations are considered for storing graphs, 111- or 2-11-representability bears the disadvantage of being significantly inferior to standard adjacency matrices or lists. We prove that quite famous languages like the palindromes, the copy language or the Lyndon words can match the efficiency of standard graph representations. The perspective of language theory allows us to prove general results that hold for all graph classes that can be defined in this way. This includes certain closure properties (e.g., all language-definable graph classes are hereditary) as well as certain limitations (e.g., all language-representable graph classes contain graphs of arbitrarily large treewidth and of arbitrarily large degeneracy, except a trivial case). As each language describes a graph class, we can also ask decidability questions concerning graph classes, given a concrete presentation of a formal language. We also present a systematic study of graph classes that can be represented by languages in which each letter occurs at most twice. Here, we find graph classes like interval, permutation, circle, bipartite chain, convex, and threshold graphs.
Submitted 5 November, 2024;
originally announced November 2024.
-
$k$-local Graphs
Authors:
Christian Beth,
Pamela Fleischmann,
Annika Huch,
Daniyal Kazempour,
Peer Kröger,
Andrea Kulow,
Matthias Renz
Abstract:
In 2017 Day et al. introduced the notion of locality as a structural complexity-measure for patterns in the field of pattern matching established by Angluin in 1980. In 2019 Casel et al. showed that determining the locality of an arbitrary pattern is NP-complete. Inspired by hierarchical clustering, we extend the notion to coloured graphs, i.e., given a coloured graph determine an enumeration of the colours such that colouring the graph stepwise according to the enumeration leads to as few clusters as possible. Next to first theoretical results on graph classes, we propose a priority search algorithm to compute the $k$-locality of a graph. The algorithm is optimal in the number of marking prefix expansions, and is faster by orders of magnitude than an exhaustive search. Finally, we perform a case study on a DBLP subgraph to demonstrate the potential of $k$-locality for knowledge discovery.
Submitted 8 May, 2025; v1 submitted 1 October, 2024;
originally announced October 2024.
-
Rollercoasters with Plateaus
Authors:
Duncan Adamson,
Pamela Fleischmann,
Annika Huch
Abstract:
In this paper we investigate the problem of detecting, counting, and enumerating (generating) all maximum length plateau-$k$-rollercoasters appearing as a subsequence of some given word (sequence, string), while allowing for plateaus. We define a plateau-$k$-rollercoaster as a word consisting of an alternating sequence of (weakly) increasing and decreasing \emph{runs}, with each run containing at least $k$ \emph{distinct} elements, allowing the run to contain multiple copies of the same symbol consecutively. This differs from previous work, where runs within rollercoasters have been defined only as sequences of distinct values. Here, we are concerned with rollercoasters of \emph{maximum} length embedded in a given word $w$, that is, the longest rollercoasters that are a subsequence of $w$.
We present algorithms allowing us to determine the longest plateau-$k$-rollercoasters appearing as a subsequence in any given word $w$ of length $n$ over an alphabet of size $σ$ in $O(n σ k)$ time, to count the number of plateau-$k$-rollercoasters in $w$ of maximum length in $O(n σ k)$ time, and to output all of them with $O(n)$ delay after $O(n σ k)$ preprocessing. Furthermore, we present an algorithm to determine the longest common plateau-$k$-rollercoaster within a set of words in $O(N k σ)$ time, where $N$ is the product of all word lengths within the set.
Submitted 26 July, 2024;
originally announced July 2024.
-
Tight Bounds for the Number of Absent Scattered Factors
Authors:
Duncan Adamson,
Pamela Fleischmann,
Annika Huch,
Max Wiedenhöft
Abstract:
A scattered factor of a word $w$ is a word $u$ that can be obtained by deleting arbitrary letters from $w$ while keeping the order of the remaining ones. Barker et al. introduced the notion of $k$-universality, calling a word $k$-universal if it contains all possible words of length $k$ over a given alphabet $Σ$ as scattered factors. Kosche et al. introduced the notion of absent scattered factors to categorise the words not being scattered factors of a given word.
In this paper, we investigate tight bounds on the possible number of absent scattered factors of a given length $k$ (also strictly longer than the shortest absent scattered factors) among all words with the same universality extending the results of Kosche et al. Specifically, given a length $k$ and universality index $ι$, we characterize $ι$-universal words with both the maximal and minimal number of absent scattered factors of length $k$. For the lower bound, we provide the exact number in a closed form. For the upper bound, we offer efficient algorithms to compute the number based on the constructed words. Moreover, by combining old results, we present an enumeration with constant delay of the set of scattered factors of a fixed length in time $O(|Σ||w|)$.
Submitted 26 July, 2024;
originally announced July 2024.
-
$k$-Universality of Regular Languages
Authors:
Duncan Adamson,
Pamela Fleischmann,
Annika Huch,
Tore Koß,
Florin Manea,
Dirk Nowotka
Abstract:
A subsequence of a word $w$ is a word $u$ such that $u = w[i_1] w[i_2] \dots w[i_{k}]$, for some set of indices $1 \leq i_1 < i_2 < \dots < i_k \leq \lvert w\rvert$. A word $w$ is $k$-subsequence universal over an alphabet $Σ$ if every word in $Σ^k$ appears in $w$ as a subsequence. In this paper, we study the intersection between the set of $k$-subsequence universal words over some alphabet $Σ$ and regular languages over $Σ$. We call a regular language $L$ \emph{$k$-$\exists$-subsequence universal} if there exists a $k$-subsequence universal word in $L$, and \emph{$k$-$\forall$-subsequence universal} if every word of $L$ is $k$-subsequence universal. We give algorithms solving the problems of deciding if a given regular language, represented by a finite automaton recognising it, is \emph{$k$-$\exists$-subsequence universal} and, respectively, if it is \emph{$k$-$\forall$-subsequence universal}, for a given $k$. The algorithms are FPT w.r.t.~the size of the input alphabet, and their run-time does not depend on $k$; they run in polynomial time in the number $n$ of states of the input automaton when the size of the input alphabet is $O(\log n)$. Moreover, we show that the problem of deciding if a given regular language is \emph{$k$-$\exists$-subsequence universal} is NP-complete, when the language is over a large alphabet. Further, we provide algorithms for counting the number of $k$-subsequence universal words (paths) accepted by a given deterministic (respectively, nondeterministic) finite automaton, and ranking an input word (path) within the set of $k$-subsequence universal words accepted by a given finite automaton.
Submitted 17 November, 2023;
originally announced November 2023.
-
Matching Patterns with Variables Under Simon's Congruence
Authors:
Pamela Fleischmann,
Sungmin Kim,
Tore Koß,
Florin Manea,
Dirk Nowotka,
Stefan Siemer,
Max Wiedenhöft
Abstract:
We introduce and investigate a series of matching problems for patterns with variables under Simon's congruence. Our results provide a thorough picture of these problems' computational complexity.
Submitted 16 August, 2023;
originally announced August 2023.
-
$α$-$β$-Factorization and the Binary Case of Simon's Congruence
Authors:
Pamela Fleischmann,
Jonas Höfer,
Annika Huch,
Dirk Nowotka
Abstract:
In 1991 Hébrard introduced a factorization of words that turned out to be a powerful tool for the investigation of a word's scattered factors (also known as (scattered) subwords or subsequences). Based on this, first Karandikar and Schnoebelen introduced the notion of $k$-richness and later on Barker et al. the notion of $k$-universality. In 2022 Fleischmann et al. presented a generalization of the arch factorization by intersecting the arch factorization of a word and its reverse. While the authors merely used this factorization for the investigation of shortest absent scattered factors, in this work we investigate this new $α$-$β$-factorization as such. We characterize the famous Simon congruence of $k$-universal words in terms of $1$-universal words. Moreover, we apply these results to binary words. In this special case, we obtain a full characterization of the classes and calculate the index of the congruence. Lastly, we start investigating the ternary case, present a full list of possibilities for $αβα$-factors, and characterize their congruence.
Submitted 11 September, 2023; v1 submitted 25 June, 2023;
originally announced June 2023.
-
MaxSAT with Absolute Value Functions: A Parameterized Perspective
Authors:
Max Bannach,
Pamela Fleischmann,
Malte Skambath
Abstract:
The natural generalization of the Boolean satisfiability problem to optimization problems is the task of determining the maximum number of clauses that can simultaneously be satisfied in a propositional formula in conjunctive normal form. In the weighted maximum satisfiability problem each clause has a positive weight and one seeks an assignment of maximum weight. The literature almost solely considers the case of positive weights. While the general case of the problem is only restricted slightly by this constraint, many special cases become trivial in the absence of negative weights. In this work we study the problem with negative weights and observe that the problem becomes computationally harder - which we formalize from a parameterized perspective in the sense that various variations of the problem become W[1]-hard if negative weights are present.
Allowing negative weights also introduces new variants of the problem: Instead of maximizing the sum of weights of satisfied clauses, we can maximize the absolute value of that sum. This turns out to be surprisingly expressive even restricted to monotone formulas in disjunctive normal form with at most two literals per clause. In contrast to the versions without the absolute value, however, we prove that these variants are fixed-parameter tractable. As technical contribution we present a kernelization for an auxiliary problem on hypergraphs in which we seek, given an edge-weighted hypergraph, an induced subgraph that maximizes the absolute value of the sum of edge-weights.
Submitted 26 April, 2022;
originally announced April 2022.
-
On the Self Shuffle Language
Authors:
Pamela Fleischmann,
Tero Harju,
Lukas Haschke,
Jonas Höfer,
Dirk Nowotka
Abstract:
The shuffle product \(u\shuffle v\) of two words \(u\) and \(v\) is the set of all words which can be obtained by interleaving \(u\) and \(v\). Motivated by the paper \emph{The Shuffle Product: New Research Directions} by Restivo (2015) we investigate a special case of the shuffle product. In this work we consider the shuffle of a word with itself called the \emph{self shuffle} or \emph{shuffle square}, showing first that the self shuffle language and the shuffle of the language are in general different sets. We prove that the language of all words arising as a self shuffle of some word is context sensitive but not context free. Furthermore, we show that the self shuffle \(w \shuffle w\) uniquely determines \(w\).
Submitted 2 March, 2022; v1 submitted 16 February, 2022;
originally announced February 2022.
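As a small illustration of the shuffle product defined in this abstract, the following sketch enumerates all interleavings of two short words. It is a brute-force helper under the stated definition only, exponential in general, and the name `shuffle_product` is an assumption made for this example.

```python
from functools import lru_cache
from typing import FrozenSet


def shuffle_product(u: str, v: str) -> FrozenSet[str]:
    """Set of all words obtained by interleaving u and v (brute force)."""

    @lru_cache(maxsize=None)
    def interleave(i: int, j: int) -> FrozenSet[str]:
        # Once one word is exhausted, the rest of the other word is forced.
        if i == len(u):
            return frozenset({v[j:]})
        if j == len(v):
            return frozenset({u[i:]})
        take_u = frozenset(u[i] + rest for rest in interleave(i + 1, j))
        take_v = frozenset(v[j] + rest for rest in interleave(i, j + 1))
        return take_u | take_v

    return interleave(0, 0)


def self_shuffle(w: str) -> FrozenSet[str]:
    """The shuffle of a word with itself."""
    return shuffle_product(w, w)
```

For example, `self_shuffle("ab")` yields the two words `abab` and `aabb`.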
-
m-Nearly k-Universal Words -- Investigating Simon Congruence
Authors:
Pamela Fleischmann,
Lukas Haschke,
Annika Huch,
Annika Mayrock,
Dirk Nowotka
Abstract:
Determining the index of the Simon congruence is a long-standing open problem. Two words $u$ and $v$ are called Simon congruent if they have the same set of scattered factors, which are parts of the word in the correct order but not necessarily consecutive, e.g., $\mathtt{oath}$ is a scattered factor of $\mathtt{logarithm}$. Following the idea of scattered factor $k$-universality, we investigate $m$-nearly $k$-universality, i.e., words where $m$ scattered factors of length $k$ are absent, w.r.t. Simon congruence. We present a full characterisation as well as the index of the congruence for $m=1$. For $m\neq 1$, we show some results if in addition $w$ is $(k-1)$-universal as well as some further insights for different $m$.
Submitted 16 February, 2022;
originally announced February 2022.
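The congruence described in this abstract can be checked by brute force on short words. The sketch below assumes the common length-bounded formulation, in which two words are compared via their scattered factors of length at most $k$; it enumerates these sets explicitly and is only meant for small inputs. The function names are hypothetical.

```python
from typing import Set


def scattered_factors_up_to(w: str, k: int) -> Set[str]:
    """All scattered factors of w of length at most k (including the empty word)."""
    found = {""}
    for ch in w:
        # Extend every factor found so far by the current letter.
        found |= {x + ch for x in found if len(x) < k}
    return found


def simon_congruent(u: str, v: str, k: int) -> bool:
    """True if u and v share the same scattered factors of length at most k."""
    return scattered_factors_up_to(u, k) == scattered_factors_up_to(v, k)
```

On the abstract's example, `"oath" in scattered_factors_up_to("logarithm", 4)` holds.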
-
The Show Must Go On -- Examination During a Pandemic
Authors:
Pamela Fleischmann,
Mitja Kulczynski,
Dirk Nowotka
Abstract:
When unexpected incidents occur, new, innovative, and flexible solutions are required. If the event is as radical and dramatic as the COVID-19 pandemic, these solutions must aim to guarantee as much normality as possible while protecting lives. After a moment of shock, our university decided that the students had to be able to pursue their studies so that they could obtain a degree in the expected time, since most of them faced immediate financial problems due to the loss of their student jobs. This implied, for us as teachers, that we not only had to reorganise the teaching methods from nearly one day to the next, but also had to come up with an adjusted form of examination, which had to take place in person with pen and paper under strict hygiene rules. On the other hand, the correction should avoid personal contact. We developed a framework which allowed us to correct the digitalised exams safely at home while meeting the high standards set by the general data protection regulation of our country. Moreover, the time spent in the offices could be reduced to a minimum thanks to automatically generated exam sheets and automatically re-digitalised and sorted worked-on exams.
Submitted 25 June, 2021;
originally announced July 2021.
-
Scattered Factor Universality -- The Power of the Remainder
Authors:
Pamela Fleischmann,
Sebastian Bernhard Germann,
Dirk Nowotka
Abstract:
Scattered factor (circular) universality was first introduced by Barker et al. in 2020. A word $w$ is called $k$-universal for some natural number $k$ if every word of length $k$ over $w$'s alphabet occurs as a scattered factor in $w$; it is called circular $k$-universal if a conjugate of $w$ is $k$-universal. Here, a word $u=u_1\cdots u_n$ is called a scattered factor of $w$ if $u$ is obtained from $w$ by deleting parts of $w$, i.e., there exist (possibly empty) words $v_1,\dots,v_{n+1}$ with $w=v_1u_1v_2\cdots v_nu_nv_{n+1}$. In this work, we solve two problems left open in the aforementioned paper, namely a generalisation of one of their main theorems to arbitrary alphabets and a slight modification of another theorem such that we characterise the circular universality by the universality. On the way, we present deep insights into the behaviour of the remainder of the so-called arch factorisation by Hébrard when repetitions of words are considered.
Submitted 19 April, 2021;
originally announced April 2021.
-
Blocksequences of k-local Words
Authors:
Pamela Fleischmann,
Lukas Haschke,
Florin Manea,
Dirk Nowotka,
Cedric Tsatia Tsida,
Judith Wiedenbeck
Abstract:
The locality of words is a relatively young structural complexity measure, introduced by Day et al. in 2017 in order to define classes of patterns with variables which can be matched in polynomial time. The main tool used to compute the locality of a word is called a marking sequence: an ordering of the distinct letters occurring in the respective word. Once a marking sequence is defined, the letters of the word are marked in steps: in the $i$th marking step, all occurrences of the $i$th letter of the marking sequence are marked. As such, after each marking step, the word can be seen as a sequence of blocks of marked letters separated by blocks of non-marked letters. By keeping track of the evolution of the marked blocks of the word through the marking defined by a marking sequence, one defines the blocksequence of the respective marking sequence. We first show that the words sharing the same blocksequence are only loosely connected, so we consider the stronger notion of extended blocksequence, which stores additional information on the form of each single marked block. In this context, we present a series of combinatorial results for words sharing the extended blocksequence.
Submitted 17 August, 2020; v1 submitted 8 August, 2020;
originally announced August 2020.
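The marking process described in this abstract can be simulated directly. The sketch below records the number of marked blocks after each step; it does not claim to reproduce the paper's exact blocksequence or extended blocksequence. The `locality` helper assumes the usual definition (the minimum, over all marking sequences, of the maximum number of marked blocks) and tries every ordering, so it is only suitable for small alphabets. All names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence


def marked_block_counts(w: str, marking_sequence: Sequence[str]) -> List[int]:
    """Number of maximal marked blocks in w after each marking step."""
    marked = [False] * len(w)
    counts = []
    for letter in marking_sequence:
        for i, ch in enumerate(w):
            if ch == letter:
                marked[i] = True
        # A block starts at a marked position whose left neighbour is unmarked.
        blocks = sum(
            1 for i, m in enumerate(marked) if m and (i == 0 or not marked[i - 1])
        )
        counts.append(blocks)
    return counts


def locality(w: str) -> int:
    """Brute-force locality: best marking sequence over all letter orderings."""
    letters = sorted(set(w))
    if not letters:
        return 0
    return min(max(marked_block_counts(w, seq)) for seq in permutations(letters))
```

For `banana`, marking `n`, then `a`, then `b` keeps the number of marked blocks at two or fewer in every step.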
-
The Edit Distance to $k$-Subsequence Universality
Authors:
Pamela Fleischmann,
Maria Kosche,
Tore Koß,
Florin Manea,
Stefan Siemer
Abstract:
A word $u$ is a subsequence of another word $w$ if $u$ can be obtained from $w$ by deleting some of its letters. The word $w$ with alph$(w)=Σ$ is called $k$-subsequence universal if the set of subsequences of length $k$ of $w$ contains all possible words of length $k$ over $Σ$. We propose a series of efficient algorithms computing the minimal number of edit operations (insertion, deletion, substitution) one needs to apply to a given word in order to reach the set of $k$-subsequence universal words.
Submitted 17 July, 2020;
originally announced July 2020.
-
Weighted Prefix Normal Words: Mind the Gap
Authors:
Yannik Eikmeier,
Pamela Fleischmann,
Mitja Kulczynski,
Dirk Nowotka
Abstract:
A prefix normal word is a binary word whose prefixes contain at least as many 1s as any of its factors of the same length. Introduced by Fici and Lipták in 2011, the notion of prefix normality is so far only defined for words over the binary alphabet. In this work we investigate a generalisation for finite words over arbitrary finite alphabets, namely weighted prefix normality. We prove that weighted prefix normality is more expressive than binary prefix normality. Furthermore, we investigate the existence of a weighted prefix normal form since weighted prefix normality comes with several new peculiarities that did not already occur in the binary case. We characterise these issues and finally present a standard technique to obtain a generalised prefix normal form for all words over arbitrary, finite alphabets.
Submitted 19 April, 2021; v1 submitted 19 May, 2020;
originally announced May 2020.
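The binary definition quoted at the start of this abstract translates into a direct check. The following is a minimal quadratic-time sketch for the binary case only, using prefix sums; it does not cover the weighted generalisation studied in the paper, and the function name is hypothetical.

```python
def is_prefix_normal(w: str) -> bool:
    """Binary prefix normality: each prefix has at least as many 1s as any
    factor of the same length."""
    n = len(w)
    # ones[i] is the number of 1s in w[:i].
    ones = [0]
    for ch in w:
        ones.append(ones[-1] + (ch == "1"))
    for length in range(1, n + 1):
        prefix_ones = ones[length]
        max_factor_ones = max(
            ones[i + length] - ones[i] for i in range(n - length + 1)
        )
        if prefix_ones < max_factor_ones:
            return False
    return True
```

For instance, `1101` is prefix normal, whereas `1011` is not, because its factor `11` contains more 1s than its prefix `10`.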
-
Scattered Factor-Universality of Words
Authors:
Laura Barker,
Pamela Fleischmann,
Katharina Harwardt,
Florin Manea,
Dirk Nowotka
Abstract:
A word $u=u_1\dots u_n$ is a scattered factor of a word $w$ if $u$ can be obtained from $w$ by deleting some of its letters: there exist (potentially empty) words $v_0,v_1,\ldots,v_n$ such that $w = v_0u_1v_1\cdots u_nv_n$. The set of all scattered factors up to length $k$ of a word is called its full $k$-spectrum. Firstly, we show an algorithm deciding whether the $k$-spectra for given $k$ of two words are equal or not, running in optimal time. Secondly, we consider a notion of scattered-factors universality: the word $w$, with $\operatorname{alph}(w)=Σ$, is called $k$-universal if its $k$-spectrum includes all words of length $k$ over the alphabet $Σ$; we extend this notion to $k$-circular universality. After a series of preliminary combinatorial results, we present an algorithm computing, for a given $k'$-universal word $w$, the minimal $i$ such that $w^i$ is $k$-universal for some $k>k'$. Several other connected problems are also considered.
Submitted 10 March, 2020;
originally announced March 2020.
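The two notions defined in the abstract above, scattered factors and $k$-universality, can be illustrated with a short Python sketch. It is not the algorithm of the paper: the universality check uses the standard greedy idea of counting consecutive blocks ("arches") that each contain every letter of the alphabet, and the function names `is_scattered_factor` and `universality_index` are hypothetical.

```python
def is_scattered_factor(u: str, w: str) -> bool:
    """Return True if u can be obtained from w by deleting letters."""
    remaining = iter(w)
    # Each membership test consumes `remaining` up to the matched letter,
    # so the letters of u must occur in w in the same order.
    return all(ch in remaining for ch in u)


def universality_index(w: str, alphabet: str) -> int:
    """Return the largest k such that w is k-universal over `alphabet`,
    computed by greedily counting blocks that contain every letter."""
    sigma = set(alphabet)
    seen = set()
    arches = 0
    for ch in w:
        if ch in sigma:
            seen.add(ch)
            if len(seen) == len(sigma):
                arches += 1
                seen.clear()
    return arches
```

For instance, `is_scattered_factor("lfow", "cauliflower")` holds while `is_scattered_factor("wolf", "cauliflower")` does not, and `universality_index("abba", "ab")` is 2 because `abba` contains `aa`, `ab`, `ba` and `bb` but not `aaa`.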
-
Reconstructing Words from Right-Bounded-Block Words
Authors:
Pamela Fleischmann,
Marie Lejeune,
Florin Manea,
Dirk Nowotka,
Michel Rigo
Abstract:
A reconstruction problem of words from scattered factors asks for the minimal information, like multisets of scattered factors of a given length or the number of occurrences of scattered factors from a given set, necessary to uniquely determine a word. We show that a word $w \in \{a, b\}^{*}$ can be reconstructed from the number of occurrences of at most $\min(|w|_a, |w|_b)+ 1$ scattered factors of the form $a^{i} b$. Moreover, we generalize the result to alphabets of the form $\{1,\ldots,q\}$ by showing that at most $\sum^{q-1}_{i=1} |w|_i (q-i+1)$ scattered factors suffice to reconstruct $w$. Both results improve on the upper bounds known so far. Time complexity bounds for reconstruction algorithms are also considered here.
Submitted 16 March, 2020; v1 submitted 30 January, 2020;
originally announced January 2020.
-
On Collapsing Prefix Normal Words
Authors:
Pamela Fleischmann,
Mitja Kulczynski,
Dirk Nowotka
Abstract:
Prefix normal words are binary words in which each prefix has at least the same number of 1s as any factor of the same length. Prefix normal words were first introduced by Fici and Lipták in 2011, and the problem of determining the index of the prefix equivalence relation is still open. In this paper, we investigate two aspects of the problem, namely prefix normal palindromes and so-called collapsing words (extending the notion of critical words). We prove characterizations for both the palindromes and the collapsing words and show their connection. Based on this, we show that still open problems regarding prefix normal words can be split into certain subproblems.
Submitted 19 May, 2020; v1 submitted 28 May, 2019;
originally announced May 2019.
-
k-Spectra of weakly-c-Balanced Words
Authors:
Joel D. Day,
Pamela Fleischmann,
Florin Manea,
Dirk Nowotka
Abstract:
A word $u$ is a scattered factor of $w$ if $u$ can be obtained from $w$ by deleting some of its letters. That is, there exist (potentially empty) words $u_1,u_2,\ldots,u_n$ and $v_0,v_1,\ldots,v_n$ such that $u = u_1u_2\cdots u_n$ and $w = v_0u_1v_1u_2v_2\cdots u_nv_n$. We consider the set of length-$k$ scattered factors of a given word $w$, called here the $k$-spectrum and denoted $\ScatFact_k(w)$. We prove a series of properties of the sets $\ScatFact_k(w)$ for binary strictly balanced and, respectively, $c$-balanced words $w$, i.e., words over a two-letter alphabet where the number of occurrences of each letter is the same, or, respectively, one letter has $c$ more occurrences than the other. In particular, we consider the question of which cardinalities $n= |\ScatFact_k(w)|$ are obtainable, for a positive integer $k$, when $w$ is either a strictly balanced binary word of length $2k$, or a $c$-balanced binary word of length $2k-c$. We also consider the problem of reconstructing words from their $k$-spectra.
Submitted 24 May, 2019; v1 submitted 19 April, 2019;
originally announced April 2019.
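To make the $k$-spectrum from the abstract above concrete, here is a small brute-force Python sketch. It simply enumerates all ways of choosing $k$ positions, so it is exponential and only meant for short words; the function name `k_spectrum` is hypothetical and not taken from the paper.

```python
from itertools import combinations


def k_spectrum(w: str, k: int) -> set[str]:
    """Return the set of all length-k scattered factors of w.

    combinations() picks k letters of w in increasing position order,
    which is exactly a length-k scattered factor; the set removes
    factors that arise from several position choices.
    """
    return {"".join(letters) for letters in combinations(w, k)}
```

For the two strictly balanced binary words `abab` and `aabb` of length $2k$ with $k=2$, this gives spectra of different cardinalities: `k_spectrum("abab", 2)` has four elements, whereas `k_spectrum("aabb", 2)` has only three, since `ba` is missing.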
-
Graph and String Parameters: Connections Between Pathwidth, Cutwidth and the Locality Number
Authors:
Katrin Casel,
Joel D. Day,
Pamela Fleischmann,
Tomasz Kociumaka,
Florin Manea,
Markus L. Schmid
Abstract:
We investigate the locality number, a recently introduced structural parameter for strings (with applications in pattern matching with variables), and its connection to two important graph-parameters, cutwidth and pathwidth. These connections allow us to show that computing the locality number is NP-hard, but fixed-parameter tractable, if parameterised by the locality number or by the alphabet size, which have been formulated as open problems in the literature. Moreover, the locality number can be approximated with ratio O(sqrt(log(opt)) log(n)). An important aspect of our work -- that is relevant in its own right and of independent interest -- is that we identify connections between the string parameter of the locality number on the one hand, and the famous graph parameters of cutwidth and pathwidth, on the other hand. These two parameters have been jointly investigated in the literature and are arguably among the most central graph parameters that are based on "linearisations" of graphs. In this way, we also identify a direct approximation preserving reduction from cutwidth to pathwidth, which shows that any polynomial f(opt,|V|)-approximation algorithm for pathwidth yields a polynomial 2f(2 opt,h)-approximation algorithm for cutwidth on multigraphs (where h is the number of edges). In particular, this translates known approximation ratios for pathwidth into new approximation ratios for cutwidth, namely O(sqrt(log(opt)) log(h)) and O(sqrt(log(opt)) opt) for (multi) graphs with h edges.
Submitted 25 April, 2024; v1 submitted 28 February, 2019;
originally announced February 2019.
-
Repetition avoidance in products of factors
Authors:
Pamela Fleischmann,
Pascal Ochem,
Kamellia Reshadi
Abstract:
We consider a variation on a classical avoidance problem from combinatorics on words that has been introduced by Mousavi and Shallit at DLT 2013. Let $\texttt{pexp}_i(w)$ be the supremum of the exponent over the products of $i$ factors of the word $w$. The repetition threshold $\texttt{RT}_i(k)$ is then the infimum of $\texttt{pexp}_i(w)$ over all words $w\in Σ^ω_k$. Mousavi and Shallit obtained that $\texttt{RT}_i(2)=2i$ and $\texttt{RT}_2(3)=\tfrac{13}{4}$. We show that $\texttt{RT}_i(3)=\tfrac{3i}{2}+\tfrac{1}{4}$ if $i$ is even and $\texttt{RT}_i(3)=\tfrac{3i}{2}+\tfrac{1}{6}$ if $i$ is odd and $i\ge3$.
Submitted 27 April, 2019; v1 submitted 5 September, 2018;
originally announced September 2018.