-
Characterization of Isometric Words based on Swap and Mismatch Distance
Authors:
M. Anselmo,
G. Castiglione,
M. Flores,
D. Giammarresi,
M. Madonia,
S. Mantaci
Abstract:
In this paper we consider an edit distance with swap and mismatch operations, called tilde-distance, and introduce the corresponding definition of tilde-isometric word. Isometric words are classically defined with respect to Hamming distance and combine the notion of edit distance with the property that a word does not appear as a factor in other words. A word f is said to be tilde-isometric if, for any pair of f-free words u and v, there exists a transformation from u to v via the related edit operations such that all the intermediate words are also f-free. This new setting is studied here, giving a full characterization of the tilde-isometric words in terms of overlaps with errors.
Submitted 21 April, 2024;
originally announced April 2024.
-
Time-Constrained Continuous Subgraph Matching Using Temporal Information for Filtering and Backtracking
Authors:
Seunghwan Min,
Jihoon Jang,
Kunsoo Park,
Dora Giammarresi,
Giuseppe F. Italiano,
Wook-Shin Han
Abstract:
Real-time analysis of graphs containing temporal information, such as social media streams, Q&A networks, and cyber data sources, plays an important role in various applications. Among them, detecting patterns is one of the fundamental graph analysis problems. In this paper, we study time-constrained continuous subgraph matching, which detects a pattern with a strict partial order on the edge set in real time whenever a temporal data graph changes. We propose a new algorithm based on two novel techniques. First, we introduce a filtering technique called time-constrained matchable edge that uses temporal information for filtering with polynomial space. Second, we develop time-constrained pruning techniques that reduce the search space by pruning some of the parallel edges in backtracking, utilizing temporal information. Extensive experiments on real and synthetic datasets show that our approach outperforms the state-of-the-art algorithm by up to two orders of magnitude in terms of query processing time.
Submitted 16 December, 2023;
originally announced December 2023.
-
Hypercubes and Isometric Words based on Swap and Mismatch Distance
Authors:
Marcella Anselmo,
Giuseppa Castiglione,
Manuela Flores,
Dora Giammarresi,
Maria Madonia,
Sabrina Mantaci
Abstract:
The hypercube of dimension n is the graph whose vertices are the 2^n binary words of length n, and there is an edge between two of them if they have Hamming distance 1. We consider an edit distance based on swaps and mismatches, to which we refer as tilde-distance, and define the tilde-hypercube with edges linking words at tilde-distance 1. Then, we introduce and study some isometric subgraphs of the tilde-hypercube obtained by using special words called tilde-isometric words. The subgraphs keep only the vertices that avoid a given tilde-isometric word as a factor. In the case of word 11, the subgraph is called tilde-Fibonacci cube, as a generalization of the classical Fibonacci cube. The tilde-hypercube and the tilde-Fibonacci cube can be recursively defined; the same holds for the number of their edges. This allows an asymptotic estimation of the number of edges in the tilde-Fibonacci cube, in comparison to the total number in the tilde-hypercube.
Submitted 17 March, 2023;
originally announced March 2023.
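The graph definitions in this abstract can be made concrete for small n. The following is a minimal sketch, not the authors' construction: it assumes that a swap exchanges two adjacent, different symbols and that a mismatch flips one symbol, so that two words are at tilde-distance 1 when they differ by exactly one such operation. All function names are hypothetical.

```python
from itertools import product


def binary_words(n):
    """All 2^n binary words of length n."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def hamming_neighbors(w):
    """Words obtained from w by one mismatch (flipping a single symbol)."""
    flip = {"0": "1", "1": "0"}
    return {w[:i] + flip[w[i]] + w[i + 1:] for i in range(len(w))}


def swap_neighbors(w):
    """Words obtained from w by one swap.

    Assumption: a swap exchanges two adjacent symbols that are different.
    """
    return {
        w[:i] + w[i + 1] + w[i] + w[i + 2:]
        for i in range(len(w) - 1)
        if w[i] != w[i + 1]
    }


def tilde_neighbors(w):
    """Words assumed to be at tilde-distance 1 from w (one swap or one mismatch)."""
    return hamming_neighbors(w) | swap_neighbors(w)


def edge_count(vertices, neighbors):
    """Number of undirected edges induced on `vertices` by a neighbor function."""
    vs = set(vertices)
    return sum(1 for w in vs for x in neighbors(w) if x in vs) // 2


def avoiding(words, f):
    """Keep only the words that do not contain f as a factor."""
    return [w for w in words if f not in w]


if __name__ == "__main__":
    for n in range(1, 7):
        cube = binary_words(n)
        fib = avoiding(cube, "11")
        print(
            n,
            len(cube), edge_count(cube, hamming_neighbors), edge_count(cube, tilde_neighbors),
            len(fib), edge_count(fib, hamming_neighbors), edge_count(fib, tilde_neighbors),
        )
```

Under these assumptions, the vertex set of the tilde-Fibonacci cube coincides with that of the classical Fibonacci cube (the words avoiding 11); only the edge set grows, because swap edges are added to the mismatch edges.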
-
Isometric Words based on Swap and Mismatch Distance
Authors:
Marcella Anselmo,
Giuseppa Castiglione,
Manuela Flores,
Dora Giammarresi,
Maria Madonia,
Sabrina Mantaci
Abstract:
An edit distance is a metric between words that quantifies how two words differ by counting the number of edit operations needed to transform one word into the other. A word f is said to be isometric with respect to an edit distance if, for any pair of f-free words u and v, there exists a transformation of minimal length from u to v via the related edit operations such that all the intermediate words are also f-free. The adjective 'isometric' comes from the fact that, if the Hamming distance is considered (i.e., only mismatches), then isometric words are connected with definitions of isometric subgraphs of hypercubes. We consider the case of edit distance with swap and mismatch. We compare it with the case of mismatch only and prove some properties of isometric words that are related to particular features of their overlaps.
Submitted 6 March, 2023;
originally announced March 2023.
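For the mismatch-only (Hamming) case, the definition of an isometric word in this abstract can be tested by brute force on short words. The sketch below is an illustration under that restriction and not a method from the paper: it compares, for every pair of f-free words of a given length, the Hamming distance with the length of a shortest path that stays inside the f-free words. It only checks lengths up to a bound, so a positive answer is evidence rather than a proof. The function names are hypothetical, and the swap-and-mismatch case is not covered.

```python
from collections import deque
from itertools import product


def f_free_words(f, n):
    """All binary words of length n that do not contain f as a factor."""
    words = ("".join(bits) for bits in product("01", repeat=n))
    return [w for w in words if f not in w]


def hamming(u, v):
    """Number of positions where u and v differ (words of equal length)."""
    return sum(a != b for a, b in zip(u, v))


def distances_within(source, vertices):
    """BFS distances from source using single-symbol flips that stay in `vertices`."""
    flip = {"0": "1", "1": "0"}
    dist = {source: 0}
    queue = deque([source])
    while queue:
        w = queue.popleft()
        for i in range(len(w)):
            x = w[:i] + flip[w[i]] + w[i + 1:]
            if x in vertices and x not in dist:
                dist[x] = dist[w] + 1
                queue.append(x)
    return dist


def hamming_isometric_up_to(f, max_n):
    """True if, for every n <= max_n, every pair of f-free words of length n
    is joined by an f-free path whose length equals their Hamming distance."""
    for n in range(1, max_n + 1):
        vertices = set(f_free_words(f, n))
        for u in vertices:
            dist = distances_within(u, vertices)
            for v in vertices:
                # A missing entry means v is unreachable through f-free words.
                if dist.get(v) != hamming(u, v):
                    return False
    return True


if __name__ == "__main__":
    print(hamming_isometric_up_to("11", 6))
    print(hamming_isometric_up_to("101", 4))
```

As a small check of the second call: the words 1001 and 1111 are both 101-free and at Hamming distance 2, but the only two intermediate words, 1101 and 1011, both contain 101.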
-
Symmetric Continuous Subgraph Matching with Bidirectional Dynamic Programming
Authors:
Seunghwan Min,
Sung Gwan Park,
Kunsoo Park,
Dora Giammarresi,
Giuseppe F. Italiano,
Wook-Shin Han
Abstract:
In many real datasets, such as social media streams and cyber data sources, graphs change over time through a graph update stream of edge insertions and deletions. Detecting critical patterns in such dynamic graphs plays an important role in various application domains such as fraud detection, cyber security, and recommendation systems for social networks. Given a dynamic data graph and a query graph, the continuous subgraph matching problem is to find all positive matches for each edge insertion and all negative matches for each edge deletion. The state-of-the-art algorithm TurboFlux uses a spanning tree of a query graph for filtering. However, using the spanning tree may have a low pruning power because it does not take into account all edges of the query graph. In this paper, we present SymBi, a symmetric and much faster algorithm that maintains an auxiliary data structure based on a directed acyclic graph instead of a spanning tree; this structure stores the intermediate results of bidirectional dynamic programming between the query graph and the dynamic graph. Extensive experiments with real and synthetic datasets show that SymBi outperforms the state-of-the-art algorithm by up to three orders of magnitude in terms of elapsed time.
Submitted 2 April, 2021;
originally announced April 2021.
-
Construction of Non-expandable Non-overlapping Sets of Pictures
Authors:
Marcella Anselmo,
Dora Giammarresi,
Maria Madonia
Abstract:
The non-overlapping sets of pictures are sets such that no two pictures in the set (properly) overlap. They are the generalization to two dimensions of the cross-bifix-free sets of strings. Non-overlapping sets of pictures are non-expandable when no other picture can be added without violating the property. We present a construction of non-expandable non-overlapping (NENO) sets of pictures and show some examples of application.
Submitted 29 May, 2016;
originally announced May 2016.
-
Tiling-Recognizable Two-Dimensional Languages: From Non-Determinism to Determinism through Unambiguity
Authors:
Dora Giammarresi
Abstract:
Tiling recognizable two-dimensional languages, also known as REC, generalize recognizable string languages to two dimensions and share with them several theoretical properties. Nevertheless, REC is not closed under complementation and the membership problem is NP-complete. This implies that the family REC is intrinsically non-deterministic. The natural and immediate definition of unambiguity corresponds to a family UREC of languages that is strictly contained in REC. On the other hand, this definition of unambiguity leads to an undecidability result and therefore it cannot correspond to any deterministic notion. We introduce the notion of line-unambiguous tiling recognizable languages and prove that it corresponds to, or somehow naturally introduces, different notions of determinism that define a hierarchy inside REC.
Submitted 3 December, 2010;
originally announced December 2010.