-
5' -> 3' Watson-Crick Automata accepting Necklaces
Authors:
Benedek Nagy
Abstract:
Watson-Crick (WK) finite automata work on a Watson-Crick tape representing a DNA molecule. They have two reading heads. In 5'->3' WK automata, the heads move and read the input in opposite physical directions. In this paper, we consider inputs that are necklaces, i.e., they represent circular DNA molecules. In sensing 5'->3' WK automata, the computation on the input is finished when the heads meet. As the original model is capable of accepting the linear context-free languages, the necklace languages we are investigating here have strong relations to that class. Here, we use these automata in two different acceptance modes. On the one hand, in weak acceptance mode the heads start nondeterministically at any point of the input (as if the necklace were cut at a nondeterministically chosen point), and if the input is accepted, it is in the accepted necklace language. These languages can be seen as the languages obtained from the linear context-free languages by taking their closure under the cyclic shift operation. On the other hand, in strong acceptance mode, it is required that the input is accepted when the heads start the computation from every point of the cycle. These languages can be seen as the maximal cyclic shift closed languages included in a linear language. Moreover, as will be shown, they have a kind of locally testable property. We present some hierarchy results based on restricted variants of the WK automata, such as stateless or all-final variants.
Submitted 10 September, 2024;
originally announced September 2024.
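The two acceptance modes described in this abstract can be illustrated with a minimal sketch. It does not simulate a WK automaton: the function `in_linear_language` is a hypothetical stand-in for membership in some linear context-free language (here {a^n b^n}, chosen only for illustration), and weak and strong acceptance are modelled as "some cyclic shift is accepted" and "every cyclic shift is accepted".

```python
def rotations(word):
    """All cyclic shifts (conjugates) of a word; the empty word has exactly one."""
    return [word[i:] + word[:i] for i in range(max(len(word), 1))]


def in_linear_language(word):
    """Hypothetical stand-in language: {a^n b^n : n >= 0} (an assumption, not from the paper)."""
    n = len(word) // 2
    return word == "a" * n + "b" * n


def weakly_accepted(word, member=in_linear_language):
    """Weak mode: the necklace is accepted if at least one cut point yields an accepted word."""
    return any(member(r) for r in rotations(word))


def strongly_accepted(word, member=in_linear_language):
    """Strong mode: the necklace is accepted only if every cut point yields an accepted word."""
    return all(member(r) for r in rotations(word))
```

Under these assumptions, the necklace of "bbaa" is weakly accepted (the shift "aabb" is in the stand-in language) but not strongly accepted.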
-
Metronome: tracing variation in poetic meters via local sequence alignment
Authors:
Ben Nagy,
Artjoms Šeļa,
Mirella De Sisto,
Petr Plecháč
Abstract:
All poetic forms come from somewhere. Prosodic templates can be copied for generations, altered by individuals, imported from foreign traditions, or fundamentally changed under the pressures of language evolution. Yet these relationships are notoriously difficult to trace across languages and times. This paper introduces an unsupervised method for detecting structural similarities in poems using local sequence alignment. The method relies on encoding poetic texts as strings of prosodic features using a four-letter alphabet; these sequences are then aligned to derive a distance measure based on weighted symbol (mis)matches. Local alignment allows poems to be clustered according to emergent properties of their underlying prosodic patterns. We evaluate method performance on a meter recognition task against strong baselines and show its potential for cross-lingual and historical research using three short case studies: 1) mutations in quantitative meter in classical Latin, 2) European diffusion of the Renaissance hendecasyllable, and 3) comparative alignment of modern meters in 18th-19th century Czech, German and Russian. We release an implementation of the algorithm as a Python package with an open license.
Submitted 26 April, 2024;
originally announced April 2024.
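As a rough illustration of local sequence alignment over a four-letter alphabet, the sketch below computes a Smith-Waterman style local alignment score. It is not the released package: the function name, the alphabet letters used in the examples, and the scoring weights (`match`, `mismatch`, `gap`) are all assumptions made for illustration, and the step that turns scores into a distance measure is not shown.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between strings a and b (Smith-Waterman recurrence).

    The weights are illustrative assumptions, not the values used in the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

With these weights, two identical four-symbol strings score 8, and two strings sharing only the three-symbol run "ACG" score 6.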
-
(Not) Understanding Latin Poetic Style with Deep Learning
Authors:
Ben Nagy
Abstract:
This article summarizes some mostly unsuccessful attempts to understand authorial style by examining the attention of various neural networks (LSTMs and CNNs) trained on a corpus of classical Latin verse that has been encoded to include sonic and metrical features. Carefully configured neural networks are shown to be extremely strong authorship classifiers, so it is hoped that they might therefore teach 'traditional' readers something about how the authors differ in style. Sadly their reasoning is, so far, inscrutable. While the overall goal has not yet been reached, this work reports some useful findings in terms of effective ways to encode and embed verse, the relative strengths and weaknesses of the neural network families, and useful (and not so useful) techniques for designing and inspecting NN models in this domain. This article suggests that, for poetry, CNNs are better choices than LSTMs -- they train more quickly, have equivalent accuracy, and (potentially) offer better interpretability. Based on a great deal of experimentation, it also suggests that simple, trainable embeddings are more effective than domain-specific schemes, and stresses the importance of techniques to reduce overfitting, like dropout and batch normalization.
Submitted 9 April, 2024;
originally announced April 2024.
-
On Languages Generated by Signed Grammars
Authors:
Ömer Eğecioğlu,
Benedek Nagy
Abstract:
We consider languages defined by signed grammars, which are similar to context-free grammars except that productions with signs associated to them are allowed. As a consequence, the words generated also have signs. We use the structure of the formal series of yields of all derivation trees over such a grammar as a method of specifying a formal language and study properties of the resulting family of languages.
Submitted 15 September, 2023;
originally announced September 2023.
-
Proceedings of the 13th International Workshop on Non-Classical Models of Automata and Applications
Authors:
Benedek Nagy,
Rudolf Freund
Abstract:
The Thirteenth International Workshop on Non-Classical Models of Automata and Applications (NCMA 2023) was held in Famagusta, North Cyprus, on September 18 and 19, 2023, organized by the Eastern Mediterranean University. The NCMA workshop series was established in 2009 as an annual event for researchers working on non-classical and classical models of automata, grammars or related devices. Such models are investigated both as theoretical models and as formal models for applications from various points of view.
Submitted 13 September, 2023;
originally announced September 2023.
-
State-deterministic Finite Automata with Translucent Letters and Finite Automata with Nondeterministically Translucent Letters
Authors:
Benedek Nagy
Abstract:
Deterministic and nondeterministic finite automata with translucent letters were introduced by Nagy and Otto more than a decade ago as Cooperative Distributed systems of a kind of stateless restarting automata with window size one. These finite state machines have a surprisingly large expressive power: all commutative semi-linear languages and all rational trace languages can be accepted by them, including various non-context-free languages. While the nondeterministic variant defines a language class with nice closure properties, the deterministic variant is weaker; however, it contains all regular languages, some non-regular context-free languages, such as the Dyck language, and also some languages that are not even context-free. In all those models, for each state, each letter of the alphabet falls into one of the following categories: the automaton cannot see the letter (it is translucent), there is a transition defined on the letter (possibly more than one transition in the nondeterministic case), or neither of the above (the automaton gets stuck on seeing this letter in the given state and the computation is not accepting).
State-deterministic automata are recent models in which the next state of the computation is determined by the structure of the automaton and is independent of the processed letters. In this paper our aim is twofold. On the one hand, we investigate state-deterministic finite automata with translucent letters. These automata are specially restricted deterministic finite automata with translucent letters.
In the other novel model we present, the set of translucent letters and the set of letters for which a transition is defined are allowed to overlap for a given state. One can interpret this as the automaton having a nondeterministic choice, for each occurrence of such a letter, either to see it (and then erase it and make the transition) or not to see that occurrence at that time. Based on these semi-translucent letters, the expressive power of the automata increases, i.e., in this way a proper generalization of the previous models is obtained.
Submitted 6 September, 2023;
originally announced September 2023.
-
From stage to page: language independent bootstrap measures of distinctiveness in fictional speech
Authors:
Artjoms Šeļa,
Ben Nagy,
Joanna Byszuk,
Laura Hernández-Lorenzo,
Botond Szemes,
Maciej Eder
Abstract:
Stylometry is mostly applied to authorial style. Recently, researchers have begun investigating the style of characters, finding that the variation remains within authorial bounds. We address the stylistic distinctiveness of characters in drama. Our primary contribution is methodological; we introduce and evaluate two non-parametric methods to produce a summary statistic for character distinctiveness that can be usefully applied and compared across languages and times. Our first method is based on bootstrap distances between 3-gram probability distributions, the second (reminiscent of 'unmasking' techniques) on word keyness curves. Both methods are validated and explored by applying them to a reasonably large corpus (a subset of DraCor): we analyse 3301 characters drawn from 2324 works, covering five centuries and four languages (French, German, Russian, and the works of Shakespeare). Both methods appear useful; the 3-gram method is statistically more powerful but the word keyness method offers rich interpretability. Both methods are able to capture phonological differences such as accent or dialect, as well as broad differences in topic and lexical richness. Based on exploratory analysis, we find that smaller characters tend to be more distinctive, and that women are cross-linguistically more distinctive than men, with this latter finding carefully interrogated using multiple regression. This greater distinctiveness stems from a historical tendency for female characters to be restricted to an 'internal narrative domain' covering mainly direct discourse and family/romantic themes. It is hoped that direct, comparable statistical measures will form a basis for more sophisticated future studies, and advances in theory.
Submitted 13 January, 2023;
originally announced January 2023.
-
Comparing Spectroscopy Measurements in the Prediction of in Vitro Dissolution Profile using Artificial Neural Networks
Authors:
Mohamed Azouz Mrad,
Kristóf Csorba,
Dorián László Galata,
Zsombor Kristóf Nagy,
Brigitta Nagy
Abstract:
Dissolution testing is part of the target product quality that is essential in approving new products in the pharmaceutical industry. The prediction of the dissolution profile based on spectroscopic data is an alternative to the current destructive and time-consuming method. Raman and near-infrared (NIR) spectroscopies are two fast and complementary methods that provide information on the tablets' physical and chemical properties and can help predict their dissolution profiles. This work aims to compare the information collected by these spectroscopy methods to support the decision of which measurements should be used so that the accuracy requirement of the industry is met. Artificial neural network models were created, in which the spectroscopy data and the measured compression curves were used as an input individually and in different combinations in order to estimate the dissolution profiles. Results showed that using only the NIR transmission method along with the compression force data or the Raman and NIR reflection methods, the dissolution profile was estimated within the acceptance limits of the f2 similarity factor. Adding further spectroscopy measurements increased the prediction accuracy.
Submitted 19 October, 2022;
originally announced October 2022.
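For readers unfamiliar with the f2 similarity factor mentioned in this abstract, the sketch below computes it from two dissolution profiles using the standard definition, f2 = 50 * log10(100 / sqrt(1 + mean squared difference)), with values given in percent dissolved at matching time points. The function name and the sample profiles are assumptions for illustration; the range 50 to 100 is the conventional acceptance range for similarity, and the abstract does not state which limits the authors applied.

```python
import math


def f2_similarity(reference, test):
    """f2 similarity factor between two dissolution profiles (percent dissolved).

    Both profiles must be sampled at the same time points.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    mean_sq = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq))
```

Identical profiles give f2 = 100, and a uniform 10 percentage point offset gives a value just under 50.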
-
Quasi-deterministic 5' -> 3' Watson-Crick Automata
Authors:
Benedek Nagy
Abstract:
Watson-Crick (WK) finite automata work on a Watson-Crick tape, that is, on a DNA molecule. A double-stranded DNA molecule contains two strands, each having a 5' and a 3' end, and these two strands together form the molecule with the following properties. The strands have the same length, their 5' to 3' directions are opposite, and in each position, the two strands have nucleotides that are complements of each other (by the Watson-Crick complementarity relation). Consequently, WK automata have two reading heads, one for each strand. In traditional WK automata both heads read the whole input in the same physical direction, but in 5'->3' WK automata the heads start from the two extremes and read the input in opposite directions. In sensing 5'->3' WK automata, the process on the input is finished when the heads meet, and the model is capable of accepting the class of linear context-free languages. Deterministic variants are weaker: they accept the class named 2detLIN, a proper subclass of the linear languages. Recently, another specific variant, the state-deterministic sensing 5'->3' WK automata, was investigated, in which the graph of the automaton has the special property that for each node of the graph, all out-edges (if any) go to a sole node, i.e., for each state there is (at most) one state that can be reached by a direct transition. It was shown that this concept is somewhat orthogonal to the usual concept of determinism in the case of sensing 5'->3' WK automata. In this paper a new concept, quasi-determinism, is investigated: in each configuration of a computation (if it is not finished yet), the next state is uniquely determined, although the next configuration may not be when various transitions are enabled at the same time.
We show that this new concept is a common generalisation of the usual determinism and the state-determinism, i.e., the class of quasi-deterministic sensing 5'->3' WK automata is a superclass of both of the mentioned other classes. There are various usual restrictions on WK automata, e.g., stateless or 1-limited variants. We also prove some hierarchy results among language classes accepted by various subclasses of quasi-deterministic sensing 5'->3' WK automata and also some other already known language classes.
Submitted 31 August, 2022;
originally announced August 2022.
-
Some Stylometric Remarks on Ovid's Heroides and the Epistula Sapphus
Authors:
Ben Nagy
Abstract:
This article aims to contribute to two well-worn areas of debate in classical Latin philology, relating to Ovid's Heroides. The first is the question of the authenticity (and, to a lesser extent, the correct position) of the letter placed fifteenth by almost every editor -- the so-called Epistula Sapphus (henceforth ES). The second question, although perhaps now less fervently debated, is the authenticity of the 'Double Heroides', placed by those who accept them as letters 16-21. I employ a variety of methods drawn from the domain of computational stylometry to consider the poetics and the lexico-grammatical features of these elegiac poems in the broader context of a corpus of 'shorter' (from 20 to 546 lines) elegiac works from five authors (266 poems in all) comprising more or less all of the non-fragmentary classical corpus. Based on a variety of techniques, every measure gives clear indication that the poetic style of the Heroides is Ovidian, but distinctive; they can be accurately isolated from Ovid more broadly. The Single and Double Heroides split into two clear groups, with the ES grouped consistently with the single letters. Furthermore, by comparing the style of the letters with the 'early' (although there are complications in this label) works of the Amores and the late works of the Ex Ponto, the evidence supports sequential composition -- meaning that the ES is correctly placed -- and, further, supports the growing consensus that the double letters were composed significantly later, in exile.
Submitted 23 February, 2022;
originally announced February 2022.
-
Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO
Authors:
Balazs Nagy,
Philipp Foehn,
Davide Scaramuzza
Abstract:
The recent introduction of powerful embedded graphics processing units (GPUs) has allowed for unforeseen improvements in real-time computer vision applications. It has enabled algorithms to run onboard, well above the standard video rates, yielding not only higher information processing capability, but also reduced latency. This work focuses on the applicability of efficient low-level, GPU hardware-specific instructions to improve on existing computer vision algorithms in the field of visual-inertial odometry (VIO). While most steps of a VIO pipeline work on visual features, they rely on image data for detection and tracking, both of which are well suited for parallelization. Non-maxima suppression and the subsequent feature selection, in particular, are prominent contributors to the overall image processing latency. Our work first revisits the problem of non-maxima suppression for feature detection specifically on GPUs, and proposes a solution that selects local response maxima, imposes spatial feature distribution, and extracts features simultaneously. Our second contribution introduces an enhanced FAST feature detector that applies the aforementioned non-maxima suppression method. Finally, we compare our method to other state-of-the-art CPU and GPU implementations and outperform all of them in feature tracking and detection, resulting in over 1000fps throughput on an embedded Jetson TX2 platform. Additionally, we demonstrate our work integrated in a VIO pipeline achieving a metric state estimation at ~200fps.
Submitted 3 August, 2020; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Metre as a stylometric feature in Latin hexameter poetry
Authors:
Benjamin Nagy
Abstract:
This paper demonstrates that metre is a privileged indicator of authorial style in classical Latin hexameter poetry. Using only metrical features, pairwise classification experiments are performed between 5 first-century authors (10 comparisons) using four different machine-learning models. The results showed a two-label classification accuracy of at least 95% with samples as small as ten lines and no greater than eighty lines (up to around 500 words). These sample sizes are an order of magnitude smaller than those typically recommended for BOW ('bag of words') or n-gram approaches, and the reported accuracy is outstanding. Additionally, this paper explores the potential for novelty (forgery) detection, or 'one-class classification'. An analysis of the disputed Aldine Additamentum (Sil. Ital. Puni. 8:144-225) concludes (p=0.0013) that the metrical style differs significantly from that of the rest of the poem.
Submitted 1 December, 2019; v1 submitted 27 November, 2019;
originally announced November 2019.
-
Counting of Shortest Paths in Cubic Grid
Authors:
Mousumi Dutt,
Arindam Biswas,
Benedek Nagy
Abstract:
The enumeration of shortest paths in the cubic grid is presented herein, which could have importance in image processing and also in the network sciences. The cubic grid considers three neighborhoods - namely, 6-, 18- and 26-neighborhood related to face connectivity, edge connectivity and vertex connectivity, respectively. The formulation for distance metrics is given. L1, D18, and $L_\infty$ are the three metrics for 6-neighborhood, 18-neighborhood and 26-neighborhood. The task is to count the number of minimal paths, based on given neighborhood relations, from any given point to any other, in the three-dimensional cubic grid. Based on the coordinate triplets describing the grid, the formulations for the three neighborhoods are presented in this work. The problem is both of theoretical importance and has several practical aspects.
Submitted 14 November, 2024; v1 submitted 12 March, 2018;
originally announced March 2018.
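The counting task in this abstract can be checked on small instances with a brute-force sketch. The code below is not the paper's closed-form formulation: it is a layered breadth-first search from the origin, where the hypothetical parameter `max_nonzero` selects the neighborhood (1 for the 6-neighborhood, 2 for the 18-neighborhood, 3 for the 26-neighborhood) by limiting how many coordinates a single step may change.

```python
from itertools import product


def count_shortest_paths(target, max_nonzero):
    """Return (distance, number of shortest paths) from the origin to target.

    max_nonzero = 1, 2 or 3 selects the 6-, 18- or 26-neighborhood.
    Brute force: only suitable for targets close to the origin.
    """
    steps = [s for s in product((-1, 0, 1), repeat=3)
             if 1 <= sum(map(abs, s)) <= max_nonzero]
    origin = (0, 0, 0)
    seen = {origin}          # every point reached in an earlier or current layer
    layer = {origin: 1}      # points at the current distance -> path counts
    dist = 0
    while target not in layer:
        nxt = {}
        for point, count in layer.items():
            for s in steps:
                q = (point[0] + s[0], point[1] + s[1], point[2] + s[2])
                if q not in seen:
                    nxt[q] = nxt.get(q, 0) + count
        seen.update(nxt)
        layer = nxt
        dist += 1
    return dist, layer[target]
```

For the target (1, 1, 1) this gives distance 3 with 6 paths in the 6-neighborhood, distance 2 with 6 paths in the 18-neighborhood, and distance 1 with a single path in the 26-neighborhood. In the 6-neighborhood the count agrees with the multinomial coefficient (|x|+|y|+|z|)! / (|x|! |y|! |z|!).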
-
A New Sensing 5'-->3' Watson-Crick Automata Concept
Authors:
Benedek Nagy,
Shaghayegh Parchami,
Hamid Mir-Mohammad-Sadeghi
Abstract:
Watson-Crick (WK) finite automata work on a Watson-Crick tape, that is, on a DNA molecule. Therefore, they have two reading heads. While in traditional WK automata both heads read the whole input in the same physical direction, in 5'->3' WK automata the heads start from the two extremes and read the input in opposite directions. In sensing 5'->3' WK automata the process on the input is finished when the heads meet. Since the heads of a WK automaton may read longer strings in a transition, in previous models a so-called sensing parameter took care of the proper meeting of the heads (not allowing them to read the same positions of the input in the last step). In this paper, a new model is investigated, which works without the sensing parameter (this is achieved by an appropriate change of the concept of configuration). Consequently, the accepted language classes of the variants are also changed. Various hierarchy results are proven in the paper.
Submitted 21 August, 2017;
originally announced August 2017.
-
Representations of Circular Words
Authors:
László Hegedüs,
Benedek Nagy
Abstract:
In this article we give two different ways of representing circular words. Representations with tuples are intended as a compact notation, while representations with trees give a way to easily process all conjugates of a word. The latter form can also be used as a graphical representation of periodic properties of finite (in some cases, infinite) words. We also define iterative representations which can be seen as an encoding utilizing the flexible properties of circular words. Every word over the two-letter alphabet can be constructed starting from ab by applying the fractional power and the cyclic shift operators one after the other, iteratively.
Submitted 21 May, 2014;
originally announced May 2014.
-
Computing discrete logarithm by interval-valued paradigm
Authors:
Benedek Nagy,
Sándor Vályi
Abstract:
Interval-valued computing is a relatively new computing paradigm. It uses finitely many interval segments over the unit interval in a computation as data structure. The satisfiability of Quantified Boolean formulae and other hard problems, like integer factorization, can be solved in an effective way by its massive parallelism. The discrete logarithm problem plays an important role in practice: there are cryptographic methods based on its computational hardness. In this paper we show that the discrete logarithm problem can be solved by an interval-valued computation in a polynomial number of steps (within this paradigm).
Submitted 31 March, 2014;
originally announced April 2014.
-
Pumping lemmas for linear and nonlinear context-free languages
Authors:
Géza Horváth,
Benedek Nagy
Abstract:
Pumping lemmas are created to prove that given languages do not belong to certain language classes. There are several known pumping lemmas for the whole class and some special classes of the context-free languages. In this paper we prove new, interesting pumping lemmas for special linear and context-free language classes. Some of them can be used to pump regular languages in two places simultaneously. Another lemma can be used to pump context-free languages in arbitrarily many places.
Submitted 30 November, 2010;
originally announced December 2010.
-
Approximating the Euclidean circle in the square grid using neighbourhood sequences
Authors:
Janos Farkas,
Szabolcs Bajak,
Benedek Nagy
Abstract:
Distance measuring is a very important task in digital geometry and digital image processing. Due to our natural approach to geometry we think of the set of points that are equally far from a given point as a Euclidean circle. Using the classical neighbourhood relations on digital grids, we get circles that greatly differ from the Euclidean circle. In this paper we examine different methods of approximating the Euclidean circle in the square grid, considering the possible motivations as well. We compare the perimeter-, area-, curve- and noncompactness-based approximations and examine their realization using neighbourhood sequences. We also provide a table which summarizes our results, and can be used when developing applications that support neighbourhood sequences.
Submitted 17 June, 2010;
originally announced June 2010.
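To make the notion of a digital circle generated by a neighbourhood sequence concrete, the sketch below builds the set of grid points reachable from the origin in a given number of steps. It assumes the usual convention that an entry 1 in the sequence allows a 4-neighbourhood step and an entry 2 allows an 8-neighbourhood step, with the sequence repeated periodically; the function name `disc` and this encoding are illustrative assumptions, and the approximation criteria compared in the paper are not implemented.

```python
def disc(sequence, radius):
    """Points of the square grid reachable from (0, 0) in `radius` steps.

    sequence: list of 1s and 2s, applied periodically
              (1 = 4-neighbourhood step, 2 = 8-neighbourhood step).
    """
    n4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    n8 = n4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    points = {(0, 0)}
    for k in range(radius):
        steps = n4 if sequence[k % len(sequence)] == 1 else n8
        # Dilate the current disc by one step of the chosen neighbourhood.
        points |= {(x + dx, y + dy) for (x, y) in points for (dx, dy) in steps}
    return points
```

For example, the alternating sequence [1, 2] with radius 2 yields an octagon of 21 points, whereas the constant sequence [1] yields a diamond of 13 points. The number of points can then be compared with the area of the Euclidean circle of the same radius.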