-
Improving Skip-Gram based Graph Embeddings via Centrality-Weighted Sampling
Authors:
Pedro Almagro-Blanco,
Fernando Sancho-Caparrini
Abstract:
Network embedding techniques inspired by word2vec represent an effective unsupervised relational learning model. Commonly, by means of a Skip-Gram procedure, these techniques learn low-dimensional vector representations of the nodes in a graph by sampling node-context examples. Although many ways of sampling the context of a node have been proposed, the effects of the way a node is chosen have not been analyzed in depth. To fill this gap, we have re-implemented the four main word2vec-inspired graph embedding techniques under the same framework and analyzed how different sampling distributions affect embedding performance when tested on node classification problems. We present a set of experiments on several well-known real datasets showing that using popular centrality distributions in sampling leads to improvements, speeding up learning by up to a factor of 2 and increasing accuracy in all cases.
Submitted 20 July, 2019;
originally announced July 2019.
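The sampling idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' re-implementation: the toy graph, the choice of degree centrality, the walk length, the window size and the helper names (`degree_weights`, `random_walk`, `sample_pairs`) are all hypothetical. Target nodes are drawn with probability proportional to their degree instead of uniformly, and contexts are taken from a short random walk rooted at the target, in the style of DeepWalk; the resulting (node, context) pairs would then be fed to a Skip-Gram trainer.

```python
import random
from collections import Counter

# Hypothetical toy undirected graph stored as an adjacency list.
GRAPH = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["a", "c", "e"],
    "e": ["d"],
}

def degree_weights(graph):
    """Degree centrality, used here (as an assumption) as the sampling weight."""
    return {node: len(neigh) for node, neigh in graph.items()}

def random_walk(graph, start, length, rng):
    """Uniform random walk of `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

def sample_pairs(graph, n_samples, walk_length=5, window=2, seed=0):
    """Draw targets proportionally to centrality and emit (target, context)
    pairs from the first `window` steps of a walk rooted at each target."""
    rng = random.Random(seed)
    weights = degree_weights(graph)
    nodes = list(weights)
    targets = rng.choices(nodes, weights=[weights[n] for n in nodes], k=n_samples)
    pairs = []
    for target in targets:
        walk = random_walk(graph, target, walk_length, rng)
        for ctx in walk[1:window + 1]:
            pairs.append((target, ctx))
    return pairs

pairs = sample_pairs(GRAPH, n_samples=1000)
counts = Counter(t for t, _ in pairs)
print(counts.most_common())
```

Swapping `degree_weights` for another centrality measure (or for uniform weights) changes only which nodes get trained most often, which is the kind of comparison the abstract describes.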
-
Semantic Preserving Embeddings for Generalized Graphs
Authors:
Pedro Almagro-Blanco,
Fernando Sancho-Caparrini
Abstract:
A new approach to the study of Generalized Graphs as semantic data structures using machine learning techniques is presented. We show how vector representations that maintain the semantic characteristics of the original data can be obtained from a given graph using neural encoding architectures and considering the topological properties of the graph. The semantic features of these new representations are tested on several machine learning tasks, and new directions for efficient link discovery, entity retrieval and long-distance query methodologies on large relational datasets are investigated using real datasets.
----
This work presents a new approach, in the context of multi-relational machine learning, to the study of Generalized Graphs. It shows how vector representations that preserve semantic characteristics of the original graph can be obtained using neural encoders and considering the topological properties of the graph. In addition, the semantic characteristics captured by these new representations are evaluated, and new efficient methodologies related to Link Discovery, Entity Retrieval and long-distance queries on large relational datasets are investigated using real databases.
Submitted 7 September, 2017;
originally announced September 2017.
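One way to picture link discovery with such embeddings is through vector offsets: if a relation type behaves roughly like a constant translation in the embedding space, candidate targets for a new source node can be ranked by cosine similarity to source + offset. The sketch below assumes that reading; it is not taken from the paper. The 2-D vectors, node names and the helpers `relation_offset` and `rank_targets` are hypothetical, standing in for embeddings that would actually be learned by the neural encoder.

```python
import math

# Hypothetical node embeddings (in practice learned by a neural encoder).
EMB = {
    "paris": [1.0, 0.0],
    "france": [1.0, 1.0],
    "rome": [0.0, 1.0],
    "italy": [0.1, 2.0],
    "berlin": [2.0, 0.5],
    "germany": [2.0, 1.4],
}

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def norm(u):
    return math.sqrt(sum(a * a for a in u))

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def relation_offset(emb, known_pairs):
    """Average of (target - source) over known links of one relation type."""
    diffs = [sub(emb[t], emb[s]) for s, t in known_pairs]
    return [sum(col) / len(diffs) for col in zip(*diffs)]

def rank_targets(emb, source, offset, candidates):
    """Rank candidate targets by cosine similarity to source + offset."""
    query = add(emb[source], offset)
    return sorted(candidates, key=lambda c: cosine(emb[c], query), reverse=True)

offset = relation_offset(EMB, [("paris", "france"), ("rome", "italy")])
print(rank_targets(EMB, "berlin", offset, ["germany", "france", "italy"]))
# → ['germany', 'france', 'italy']
```

Because ranking only needs vector arithmetic and a similarity search, this kind of query avoids traversing the graph itself, which is why embedding-based link discovery can be efficient on large datasets.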
-
Induction of Decision Trees based on Generalized Graph Queries
Authors:
Pedro Almagro-Blanco,
Fernando Sancho-Caparrini
Abstract:
Usually, decision tree induction algorithms are limited to working with non-relational data. Given a record, they do not take into account the attributes of other objects, even though these can provide valuable information for the learning task. In this paper we present GGQ-ID3, a multi-relational decision tree learning algorithm that uses Generalized Graph Queries (GGQs) as predicates in the decision nodes. GGQs can express complex patterns (including cycles) and can be refined step by step. They can also evaluate structures (not only single records) and perform Regular Pattern Matching. GGQs are built dynamically (pattern mining) during the GGQ-ID3 tree construction process. We show how GGQ-ID3 can be used to perform multi-relational machine learning while keeping complexity under control. Finally, some real examples of automatically obtained classification trees and semantic patterns are shown.
-----
Decision tree induction algorithms normally work with non-relational data. Given a record, they do not take into account the attributes of other objects, even though these can provide useful information for the learning task. In this paper we present GGQ-ID3, a multi-relational decision tree learning algorithm that uses Generalized Graph Queries (GGQs) as predicates in the decision nodes. GGQs can express complex patterns (including cycles) and can be refined step by step. In addition, they can evaluate structures (not only records) and perform Regular Pattern Matching. In GGQ-ID3, GGQs are built dynamically (pattern mining) during the tree construction process. Finally, some real examples of automatically obtained multi-relational classification trees and semantic patterns are shown.
Submitted 18 August, 2017;
originally announced August 2017.
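As its name suggests, GGQ-ID3 builds on ID3, with decision nodes that test whether a record matches a graph query. The sketch below shows only the ID3 step of scoring candidate boolean predicates by information gain and keeping the best one. It is an illustration under assumptions, not GGQ-ID3 itself: the users, products, `viewed` edges and candidate predicates are hypothetical, and the predicates are plain Python functions standing in for real GGQs, which the paper builds and refines dynamically.

```python
import math
from collections import Counter

# Hypothetical records to classify (users) and their class labels.
LABELS = {"u1": "buyer", "u2": "buyer", "u3": "other", "u4": "other"}
# Hypothetical related objects (products) and typed edges from records to them.
PRICE = {"p1": 900, "p2": 20}
EDGES = [("u1", "viewed", "p1"), ("u2", "viewed", "p1"),
         ("u2", "viewed", "p2"), ("u3", "viewed", "p2"), ("u4", "viewed", "p2")]

def viewed(user):
    return [dst for src, rel, dst in EDGES if src == user and rel == "viewed"]

# Stand-ins for graph queries: each maps a record to True/False by looking at
# the structure around it, not only at the record's own attributes.
CANDIDATES = {
    "viewed_any": lambda u: len(viewed(u)) > 0,
    "viewed_expensive": lambda u: any(PRICE[p] > 100 for p in viewed(u)),
    "viewed_several": lambda u: len(viewed(u)) > 1,
}

def entropy(records):
    """Shannon entropy (bits) of the class labels of `records`."""
    counts = Counter(LABELS[r] for r in records)
    n = len(records)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(records, predicate):
    """ID3 gain of splitting `records` by the boolean `predicate`."""
    yes = [r for r in records if predicate(r)]
    no = [r for r in records if not predicate(r)]
    remainder = sum(len(part) / len(records) * entropy(part)
                    for part in (yes, no) if part)
    return entropy(records) - remainder

records = list(LABELS)
gains = {name: information_gain(records, p) for name, p in CANDIDATES.items()}
best = max(gains, key=gains.get)
print(best, round(gains[best], 3))  # → viewed_expensive 1.0
```

The relational predicate `viewed_expensive` separates the classes perfectly although no user attribute alone could, which illustrates why looking beyond the record itself can pay off.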
-
Generalized Graph Pattern Matching
Authors:
Pedro Almagro-Blanco,
Fernando Sancho-Caparrini
Abstract:
Most machine learning algorithms are limited to learning from flat data: a record set with a predefined structure. When learning from a record, these algorithms do not take into account other objects, even though they are directly connected to it and can provide valuable information for the learning task. In this paper we present the concept of Generalized Graph Query, a query tool over graphs or multi-relational data structures. These queries are built using the same graph structure as generalized graphs and can express powerful relational and non-relational restrictions on this type of data. This paper also shows mechanisms to build such queries dynamically and how they can be used to perform bottom-up discovery processes through machine learning techniques.
-----
Most algorithms that learn from data are limited in that they can only learn from data structured as a table, in which each row represents a record and each column an associated property. These algorithms do not take into account the attributes of the structures to which a given record may be related, even though these can provide useful information for the learning task. In this paper we present the concept of Generalized Graph Query, a tool for querying patterns in generalized graphs. This tool has been built using the Generalized Graph structure and allows relational and non-relational restrictions to be expressed over this type of structure. In addition, this paper presents mechanisms for the automatic construction of such queries and shows how they can be used in bottom-up discovery processes through Machine Learning techniques.
Submitted 11 August, 2017;
originally announced August 2017.
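The basic operation behind such queries, checking where a pattern of constrained nodes and typed edges occurs in the data, can be illustrated with a small backtracking matcher. This is a simplified sketch and not the GGQ formalism: it assumes a directed graph with typed edges and attribute dictionaries, attribute restrictions written as Python predicates, and homomorphism-style matching in which two pattern variables may map to the same data node. The data, the pattern and the `match` function are hypothetical, and real GGQs allow more expressive restrictions than this sketch covers.

```python
# Hypothetical data graph: node attributes and directed, typed edges.
NODES = {
    "alice": {"type": "person", "age": 34},
    "bob": {"type": "person", "age": 19},
    "acme": {"type": "company"},
    "paper1": {"type": "paper", "year": 2017},
}
EDGES = {("alice", "works_at", "acme"), ("bob", "works_at", "acme"),
         ("alice", "wrote", "paper1"), ("bob", "cites", "paper1")}

# Pattern: variables with attribute predicates, plus typed edges between them.
PATTERN_NODES = {
    "x": lambda a: a.get("type") == "person" and a.get("age", 0) > 30,
    "c": lambda a: a.get("type") == "company",
    "p": lambda a: a.get("type") == "paper",
}
PATTERN_EDGES = [("x", "works_at", "c"), ("x", "wrote", "p")]

def match(pattern_nodes, pattern_edges, nodes, edges):
    """Return every assignment of pattern variables to data nodes that satisfies
    all node predicates and all typed edges (simple backtracking search)."""
    variables = list(pattern_nodes)
    results = []

    def consistent(assign):
        # Only check pattern edges whose two endpoints are already assigned.
        return all((assign[s], rel, assign[t]) in edges
                   for s, rel, t in pattern_edges
                   if s in assign and t in assign)

    def extend(assign, i):
        if i == len(variables):
            results.append(dict(assign))
            return
        var = variables[i]
        for node, attrs in nodes.items():
            if pattern_nodes[var](attrs):
                assign[var] = node
                if consistent(assign):
                    extend(assign, i + 1)
                del assign[var]

    extend({}, 0)
    return results

print(match(PATTERN_NODES, PATTERN_EDGES, NODES, EDGES))
# → [{'x': 'alice', 'c': 'acme', 'p': 'paper1'}]
```

Refining a query step by step, as needed for dynamic construction, amounts here to adding one predicate to `PATTERN_NODES` or one edge to `PATTERN_EDGES` and re-running the matcher.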