-
NeuSTIP: A Novel Neuro-Symbolic Model for Link and Time Prediction in Temporal Knowledge Graphs
Authors:
Ishaan Singh,
Navdeep Kaur,
Garima Gaur,
Mausam
Abstract:
While Knowledge Graph Completion (KGC) on static facts is a mature field, Temporal Knowledge Graph Completion (TKGC), which incorporates validity time into static facts, is still in its nascent stage. KGC methods fall into multiple categories, including embedding-based, rule-based, GNN-based, and pretrained-language-model-based approaches. However, these dimensions have not been explored for TKGs. To that end, we propose a novel temporal neuro-symbolic model, NeuSTIP, that performs link prediction and time interval prediction in a TKG. NeuSTIP learns temporal rules in the presence of Allen predicates, which ensure temporal consistency between neighboring predicates in a given rule. We further design a unique scoring function that uses the learned rules to evaluate the confidence of candidate answers during link prediction and time interval prediction. Our empirical evaluation on two time-interval-based TKGC datasets suggests that our model outperforms state-of-the-art models on both the link prediction and the time interval prediction tasks.
Submitted 15 May, 2023;
originally announced May 2023.
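The Allen predicates mentioned in the NeuSTIP abstract above come from Allen's interval algebra, which names the 13 possible relations between two time intervals. The following is a minimal Python sketch of that classification, not the paper's implementation: the function name `allen_relation`, the `(start, end)` tuple representation, and the assumption that every interval satisfies `start < end` are illustrative choices, and the abstract does not say which subset of predicates NeuSTIP uses.

```python
def allen_relation(a, b):
    """Return the Allen relation that holds from interval a to interval b.

    Intervals are (start, end) tuples with start < end (an assumption of
    this sketch). Exactly one of the 13 relation names is returned.
    """
    s1, e1 = a
    s2, e2 = b
    if s1 >= e1 or s2 >= e2:
        raise ValueError("intervals must satisfy start < end")

    # Disjoint intervals.
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    # Intervals that touch at exactly one endpoint.
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met_by"
    # Shared endpoints.
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started_by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished_by"
    # Strict containment.
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining case: partial overlap.
    return "overlaps" if s1 < s2 else "overlapped_by"
```

For example, two facts valid over (2000, 2005) and (2005, 2010) stand in the `meets` relation, which is the kind of constraint a temporal rule could require between neighboring predicates.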
-
Computing and Maintaining Provenance of Query Result Probabilities in Uncertain Knowledge Graphs
Authors:
Garima Gaur,
Abhishek Dang,
Arnab Bhattacharya,
Srikanta Bedathur
Abstract:
Knowledge graphs (KGs) that model the relationships between entities as labeled edges (or facts) in a graph are mostly constructed using a suite of automated extractors, thereby inherently leading to uncertainty in the extracted facts. Modeling the uncertainty as probabilistic confidence scores results in a probabilistic knowledge graph. Graph queries over such probabilistic KGs require answer computation along with the computation of the result probabilities, also known as probabilistic inference. We propose a system, HAPPI (How Provenance of Probabilistic Inference), to handle such query processing. Complying with the standard provenance semiring model, we propose a novel commutative semiring to symbolically compute the probability of the result of a query. These provenance-polynomial-like symbolic expressions encode fine-grained information about the probability computation process. We leverage this encoding to efficiently compute as well as maintain the probability of results as the underlying KG changes. Focusing on a popular class of conjunctive basic graph pattern queries on the KG, we compare the performance of HAPPI against a possible-world model of computation and a knowledge compilation tool over two large datasets. We also propose an adaptive system that leverages the strengths of both HAPPI and compilation-based techniques. Since existing systems for probabilistic databases mostly focus on query computation, they default to re-computation when facts in the KG are updated. HAPPI, on the other hand, not only performs probabilistic inference and maintains the provenance of the results, but also provides a mechanism to incrementally maintain them as the KG changes. We extend this maintainability as part of our proposed adaptive system.
Submitted 17 August, 2021;
originally announced August 2021.
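To make the possible-world baseline mentioned in the HAPPI abstract above concrete, here is a small Python sketch. It is not HAPPI's semiring; it is the brute-force computation the abstract compares against, under the assumption that edges are independent. The names `answer_probability`, `derivations`, and `edge_prob` are hypothetical. Each derivation is the set of edges one query match uses, so the answer holds in a world if at least one derivation is fully present.

```python
from itertools import product


def answer_probability(derivations, edge_prob):
    """Probability that at least one derivation holds, by enumerating worlds.

    derivations: iterable of frozensets of edge ids (one per query match).
    edge_prob:   dict mapping edge id -> independent existence probability.
    Exponential in the number of distinct edges; for illustration only.
    """
    derivations = list(derivations)
    edges = sorted(set().union(*derivations))
    total = 0.0
    # Each world fixes every edge as present (True) or absent (False).
    for world in product([False, True], repeat=len(edges)):
        present = {e for e, keep in zip(edges, world) if keep}
        if any(d <= present for d in derivations):
            weight = 1.0
            for e, keep in zip(edges, world):
                weight *= edge_prob[e] if keep else 1.0 - edge_prob[e]
            total += weight
    return total
```

With derivations {e1, e2} and {e3} and every edge at probability 0.5, the result is 1 - (1 - 0.25)(1 - 0.5) = 0.625. The cost doubles with every additional edge, which is why a symbolic encoding of the computation is attractive.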
-
BERT Meets Relational DB: Contextual Representations of Relational Databases
Authors:
Siddhant Arora,
Vinayak Gupta,
Garima Gaur,
Srikanta Bedathur
Abstract:
In this paper, we address the problem of learning low-dimensional representations of entities in relational databases consisting of multiple tables. Embeddings help to capture the semantics encoded in the database and can be used in a variety of settings, such as auto-completion of tables, fully-neural query processing of relational join queries, seamless handling of missing values, and more. Current work is restricted to working with just a single table, or to using embeddings pretrained over an external corpus, making it unsuitable for use in real-world databases. In this work, we look into ways of using attention-based models to learn embeddings for entities in a relational database. We are inspired by BERT-style pretraining methods and are interested in observing how they can be extended for representation learning on structured databases. We evaluate our approach on the autocompletion of relational databases and achieve improvements over standard baselines.
Submitted 30 April, 2021;
originally announced April 2021.
-
Tracking entities in technical procedures -- a new dataset and baselines
Authors:
Saransh Goyal,
Pratyush Pandey,
Garima Gaur,
Subhalingam D,
Srikanta Bedathur,
Maya Ramanath
Abstract:
We introduce TechTrack, a new dataset for tracking entities in technical procedures. The dataset, prepared by annotating open-domain articles from WikiHow, consists of 1351 procedures (e.g., "How to connect a printer") and identifies more than 1200 unique entities, with an average of 4.7 entities per procedure. We evaluate the performance of state-of-the-art models on the entity-tracking task and find that they fall well below human annotation performance. We describe how TechTrack can be used to advance research on understanding procedures from temporal texts.
Submitted 15 April, 2021;
originally announced April 2021.
-
How and Why is An Answer (Still) Correct? Maintaining Provenance in Dynamic Knowledge Graphs
Authors:
Garima Gaur,
Arnab Bhattacharya,
Srikanta Bedathur
Abstract:
Knowledge graphs (KGs) have increasingly become the backbone of many critical knowledge-centric applications. Most large-scale KGs used in practice are automatically constructed based on an ensemble of extraction techniques applied over diverse data sources. Therefore, it is important to establish the provenance of results for a query to determine how these were computed. Provenance has been shown to be useful for assigning confidence scores to the results, for debugging the KG generation itself, and for providing answer explanations. In many such applications, certain queries are registered as standing queries since their answers are needed often. However, KGs change continuously due to reasons such as changes in the source data, improvements to the extraction techniques, refinement/enrichment of information, and so on. This brings us to the issue of efficiently maintaining the provenance polynomials of complex graph pattern queries for dynamic and large KGs instead of having to recompute them from scratch each time the KG is updated. Addressing these issues, we present HUKA, which uses provenance polynomials for tracking the derivation of query results over knowledge graphs by encoding the edges involved in generating the answer. More importantly, HUKA also maintains these provenance polynomials in the face of updates (insertions as well as deletions of facts) to the underlying KG. Experimental results over large real-world KGs such as YAGO and DBpedia with various benchmark SPARQL query workloads reveal that HUKA can be almost 50 times faster than existing systems for provenance computation on dynamic KGs.
Submitted 29 July, 2020;
originally announced July 2020.
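The idea in the HUKA abstract above, keeping provenance up to date under insertions and deletions rather than recomputing it, can be illustrated with a toy Python sketch. This is not HUKA's algorithm: the class `PathProvenance`, its fixed two-hop query shape `(?x, r1, ?y), (?y, r2, ?z)`, and the representation of each monomial as a pair of edges are all assumptions made for illustration.

```python
from collections import defaultdict


class PathProvenance:
    """Toy provenance store for the query (?x, r1, ?y), (?y, r2, ?z).

    Each answer (x, z) maps to a set of monomials; a monomial is the pair
    of edges (subject, relation, object) that jointly derive the answer.
    """

    def __init__(self, r1, r2):
        self.r1, self.r2 = r1, r2
        self.edges = set()
        self.prov = defaultdict(set)

    def insert(self, edge):
        """Add an edge and only the monomials that the new edge enables."""
        if edge in self.edges:
            return
        self.edges.add(edge)
        s, r, o = edge
        if r == self.r1:  # new edge plays the first hop
            for e2 in self.edges:
                if e2[1] == self.r2 and e2[0] == o:
                    self.prov[(s, e2[2])].add((edge, e2))
        if r == self.r2:  # new edge plays the second hop
            for e1 in self.edges:
                if e1[1] == self.r1 and e1[2] == s:
                    self.prov[(e1[0], o)].add((e1, edge))

    def delete(self, edge):
        """Remove an edge and drop every monomial that used it."""
        self.edges.discard(edge)
        for answer in list(self.prov):
            kept = {m for m in self.prov[answer] if edge not in m}
            if kept:
                self.prov[answer] = kept
            else:
                del self.prov[answer]  # no derivation left for this answer
```

An answer stays valid after a deletion as long as at least one of its monomials survives, which is the "still correct" question the paper title poses. A real system would index edges rather than scan them on every update.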