Classifying Words with 3-sort Automata
Authors:
Tomasz Jastrząb,
Frédéric Lardeux,
Eric Monfroy
Abstract:
Grammatical inference consists in learning a language or a grammar from data. In this paper, we consider a number of models for inferring a non-deterministic finite automaton (NFA) with 3 sorts of states, that must accept some words, and reject some other words from a given sample. We then propose a transformation from this 3-sort NFA into weighted-frequency and probabilistic NFA, and we apply the…
▽ More
Grammatical inference consists in learning a language or a grammar from data. In this paper, we consider a number of models for inferring a non-deterministic finite automaton (NFA) with 3 sorts of states, that must accept some words, and reject some other words from a given sample. We then propose a transformation from this 3-sort NFA into weighted-frequency and probabilistic NFA, and we apply the latter to a classification task. The experimental evaluation of our approach shows that the probabilistic NFAs can be successfully applied for classification tasks on both real-life and superficial benchmark data sets.
△ Less
Submitted 2 January, 2024;
originally announced January 2024.
Taking advantage of a very simple property to efficiently infer NFAs
Authors:
Tomasz Jastrzab,
Frédéric Lardeux,
Eric Monfroy
Abstract:
Grammatical inference consists in learning a formal grammar as a finite state machine or as a set of rewrite rules. In this paper, we are concerned with inferring Nondeterministic Finite Automata (NFA) that must accept some words, and reject some other words from a given sample. This problem can naturally be modeled in SAT. The standard model being enormous, some models based on prefixes, suffixe…
▽ More
Grammatical inference consists in learning a formal grammar as a finite state machine or as a set of rewrite rules. In this paper, we are concerned with inferring Nondeterministic Finite Automata (NFA) that must accept some words, and reject some other words from a given sample. This problem can naturally be modeled in SAT. The standard model being enormous, some models based on prefixes, suffixes, and hybrids were designed to generate smaller SAT instances. There is a very simple and obvious property that says: if there is an NFA of size k for a given sample, there is also an NFA of size k+1. We first strengthen this property by adding some characteristics to the NFA of size k+1. Hence, we can use this property to tighten the bounds of the size of the minimal NFA for a given sample. We then propose simplified and refined models for NFA of size k+1 that are smaller than the initial models for NFA of size k. We also propose a reduction algorithm to build an NFA of size k from a specific NFA of size k+1. Finally, we validate our proposition with some experimentation that shows the efficiency of our approach.
△ Less
Submitted 16 March, 2023;
originally announced March 2023.