-
MultiGraphMatch: a subgraph matching algorithm for multigraphs
Authors:
Giovanni Micale,
Antonio Di Maria,
Roberto Grasso,
Vincenzo Bonnici,
Alfredo Ferro,
Dennis Shasha,
Rosalba Giugno,
Alfredo Pulvirenti
Abstract:
Subgraph matching is the problem of finding all the occurrences of a small graph, called the query, in a larger graph, called the target. Although the problem has been widely studied in simple graphs, few solutions have been proposed for multigraphs, in which two nodes can be connected by multiple edges, each denoting a possibly different type of relationship. In our new algorithm MultiGraphMatch, nodes and edges can be associated with labels and multiple properties. MultiGraphMatch introduces a novel data structure called bit matrix to efficiently index both the query and the target and filter the set of target edges that are matchable with each query edge. In addition, the algorithm uses a new technique for ordering the processing of query edges based on the cardinalities of the sets of matchable edges. Using the Cypher query definition language, MultiGraphMatch can perform queries with logical conditions on node and edge labels. We compare MultiGraphMatch with SuMGra and the graph database systems Memgraph and Neo4j, showing comparable or better performance in all queries on a wide variety of synthetic and real-world graphs.
Submitted 16 January, 2025;
originally announced January 2025.
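The two ideas named in the MultiGraphMatch abstract, filtering the target edges matchable with each query edge and ordering query edges by the size of those sets, can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces the bit matrix with plain label comparisons, and the function names, the edge representation, and the choice of ascending order are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (source node, destination node, edge label).
Edge = Tuple[int, int, str]


def matchable_edges(
    query_edges: List[Edge],
    query_labels: Dict[int, str],
    target_edges: List[Edge],
    target_labels: Dict[int, str],
) -> Dict[int, List[int]]:
    """For each query edge index, list the target edge indices that agree
    on the edge label and on both endpoint node labels."""
    candidates: Dict[int, List[int]] = {}
    for qi, (qs, qd, ql) in enumerate(query_edges):
        candidates[qi] = [
            ti
            for ti, (ts, td, tl) in enumerate(target_edges)
            if tl == ql
            and target_labels[ts] == query_labels[qs]
            and target_labels[td] == query_labels[qd]
        ]
    return candidates


def order_query_edges(candidates: Dict[int, List[int]]) -> List[int]:
    """Process the query edge with the fewest candidates first
    (ascending cardinality is an assumption, not stated in the abstract)."""
    return sorted(candidates, key=lambda qi: len(candidates[qi]))


# Toy multigraph: nodes 0 and 1 are joined by several parallel edges.
query_labels = {0: "A", 1: "B"}
query_edges = [(0, 1, "x"), (0, 1, "y")]
target_labels = {0: "A", 1: "B", 2: "B"}
target_edges = [(0, 1, "x"), (0, 2, "x"), (0, 1, "y"), (0, 1, "x")]

candidates = matchable_edges(query_edges, query_labels, target_edges, target_labels)
order = order_query_edges(candidates)
```

In this toy case the query edge labeled "y" has a single candidate and is processed first, which narrows the search before the less selective edge labeled "x" is considered.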
-
Spectral concepts in genome informational analysis
Authors:
Vincenzo Bonnici,
Giuditta Franco,
Vincenzo Manca
Abstract:
The concept of k-spectrum for genomes is here investigated as a basic tool to analyze genomes. Related spectral notions based on k-mers are introduced with some related mathematical properties which are relevant for informational analysis of genomes. Procedures to generate spectral segmentations of genomes are provided and are tested (under several values of length k for k-mers) on cases of real genomes, such as some human chromosomes and Saccharomyces cerevisiae.
Submitted 25 June, 2021;
originally announced June 2021.
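As a rough illustration of the k-spectrum notion in the abstract above, the sketch below counts every overlapping k-mer in a sequence. It assumes that a k-spectrum is the collection of k-mers together with their multiplicities; the abstract does not spell out the definition, and the function name `k_spectrum` is hypothetical.

```python
from collections import Counter


def k_spectrum(genome: str, k: int) -> Counter:
    """Count every overlapping k-mer (substring of length k) in the genome.

    Assumption: the k-spectrum is taken to be the k-mers with their
    multiplicities; the paper's formal definition may differ.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


# Toy sequence, not a real genome.
spectrum = k_spectrum("ACGTACGA", 3)
```

A sequence of length n contains n - k + 1 overlapping k-mers, so the multiplicities in the toy example sum to 6.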
-
A word recurrence based algorithm to extract genomic dictionaries
Authors:
Vincenzo Bonnici,
Giuditta Franco,
Vincenzo Manca
Abstract:
Genomes may be analyzed from an information viewpoint as very long strings, containing functional elements of variable length, which have been assembled by evolution. In this work an innovative information theory based algorithm is proposed, to extract significant (relatively small) dictionaries of genomic words. Namely, conceptual analyses are here combined with empirical studies, to open up a methodology for the extraction of variable length dictionaries from genomic sequences, based on the information content of some factors. Its application to human chromosomes highlights an original inter-chromosomal similarity in terms of factor distributions.
Submitted 22 September, 2020;
originally announced September 2020.
-
Kullback-Leibler divergence between quantum distributions, and its upper-bound
Authors:
Vincenzo Bonnici
Abstract:
This work presents an upper bound on the value that the Kullback-Leibler (KL) divergence can reach for a class of probability distributions called quantum distributions (QD). The aim is to find a distribution $U$ which maximizes the KL divergence from a given distribution $P$ under the assumption that $P$ and $U$ have been generated by distributing a given discrete quantity, a quantum. Quantum distributions naturally represent a wide range of probability distributions that are used in practical applications. Moreover, such a class of distributions can be obtained as an approximation of any probability distribution. Deriving an upper bound for the entropic divergence is shown to be possible under the condition that the compared distributions are quantum distributions over the same quantum value, which makes them comparable. Entropic divergence thus acquires a more powerful meaning when it is applied to comparable distributions. This aspect should be taken into account in future developments of divergences. The theoretical findings are used to propose a notion of normalized KL divergence that is empirically shown to behave differently from already known measures.
Submitted 10 December, 2020; v1 submitted 13 August, 2020;
originally announced August 2020.
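To make the setting of the KL divergence abstract above concrete, the sketch below builds two distributions by sharing out the same number of discrete units over the same outcomes and computes the standard KL divergence between them. It assumes a quantum distribution can be represented by integer counts of units per outcome; the function names are hypothetical, and the paper's upper bound and normalized divergence are not reproduced here.

```python
import math
from typing import List, Sequence


def quantum_distribution(counts: Sequence[int]) -> List[float]:
    """Turn integer counts of discrete units into probabilities.

    Assumption: each probability is a multiple of 1 / sum(counts).
    """
    total = sum(counts)
    if total <= 0 or any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative with a positive sum")
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], u: Sequence[float]) -> float:
    """Standard KL divergence D(P || U) in bits."""
    if len(p) != len(u):
        raise ValueError("distributions must have the same length")
    divergence = 0.0
    for pi, ui in zip(p, u):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing
        if ui == 0:
            return math.inf  # P puts mass where U has none
        divergence += pi * math.log2(pi / ui)
    return divergence


# Both distributions share out 4 units over 2 outcomes.
P = quantum_distribution([2, 2])
U = quantum_distribution([1, 3])
d = kl_divergence(P, U)
```

Because every outcome in `U` receives at least one unit, the divergence stays finite, which is the kind of situation in which an upper bound is meaningful.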
-
MultiRI: Fast Subgraph Matching in Labeled Multigraphs
Authors:
Giovanni Micale,
Vincenzo Bonnici,
Alfredo Ferro,
Dennis Shasha,
Rosalba Giugno,
Alfredo Pulvirenti
Abstract:
The Subgraph Matching (SM) problem consists of finding all the embeddings of a given small graph, called the query, into a large graph, called the target. The SM problem has been widely studied for simple graphs, i.e. graphs where there is at most one edge between two nodes and nodes have single labels, but few approaches have been devised for labeled multigraphs, i.e. graphs in which nodes may have multiple labels and a pair of nodes may have multiple labeled edges between them. Here we present MultiRI, a novel algorithm for the Sub-Multigraph Matching (SMM) problem, i.e. subgraph matching in labeled multigraphs. MultiRI improves on the state-of-the-art by computing compatibility domains and symmetry breaking conditions on query nodes to filter the search space of possible solutions. Empirically, we show that MultiRI outperforms the state-of-the-art method for the SMM problem in both synthetic and real graphs, with a multiplicative speedup between five and ten for large graphs, while using a limited amount of memory.
Submitted 25 March, 2020;
originally announced March 2020.