-
Towards Computer-Using Personal Agents
Authors:
Piero A. Bonatti,
John Domingue,
Anna Lisa Gentile,
Andreas Harth,
Olaf Hartig,
Aidan Hogan,
Katja Hose,
Ernesto Jimenez-Ruiz,
Deborah L. McGuinness,
Chang Sun,
Ruben Verborgh,
Jesse Wright
Abstract:
Computer-Using Agents (CUAs) enable users to automate increasingly complex tasks using graphical interfaces such as browsers. As many potential tasks require personal data, we propose Computer-Using Personal Agents (CUPAs) that have access to an external repository of the user's personal data. Compared with CUAs, CUPAs offer users better control of their personal data, the potential to automate more tasks involving personal data, better interoperability with external sources of data, and better capabilities to coordinate with other CUPAs in order to solve collaborative tasks involving the personal data of multiple users.
Submitted 31 January, 2025;
originally announced March 2025.
-
Semantic Web and Creative AI -- A Technical Report from ISWS 2023
Authors:
Raia Abu Ahmad,
Reham Alharbi,
Roberto Barile,
Martin Böckling,
Francisco Bolanos,
Sara Bonfitto,
Oleksandra Bruns,
Irene Celino,
Yashrajsinh Chudasama,
Martin Critelli,
Claudia d'Amato,
Giada D'Ippolito,
Ioannis Dasoulas,
Stefano De Giorgis,
Vincenzo De Leo,
Chiara Di Bonaventura,
Marco Di Panfilo,
Daniil Dobriy,
John Domingue,
Xuemin Duan,
Michel Dumontier,
Sefika Efeoglu,
Ruben Eschauzier,
Fakih Ginwa,
Nicolas Ferranti,
et al. (52 additional authors not shown)
Abstract:
The International Semantic Web Research School (ISWS) is a week-long intensive program designed to immerse participants in the field. This document reports a collaborative effort performed by ten teams of students, each guided by a senior researcher as their mentor, attending ISWS 2023. Each team provided a different perspective on the topic of creative AI, substantiated by a set of research questions as the main subject of their investigation. The 2023 edition of ISWS focused on the intersection of Semantic Web technologies and creative AI. A key area of focus was the potential of LLMs as support tools for knowledge engineering. Participants also delved into the multifaceted applications of LLMs, including legal aspects of creative content production, humans in the loop, decentralised approaches to multimodal generative AI models, nanopublications and AI for personal scientific knowledge graphs, commonsense knowledge in automatic story and narrative completion, generative AI for art critique, prompt engineering, automatic music composition, commonsense prototyping and conceptual blending, and elicitation of tacit knowledge. As Large Language Models and semantic technologies continue to evolve, new exciting prospects are emerging: a future where the boundaries between creative expression and factual knowledge become increasingly permeable, leading to a world of knowledge that is both informative and inspiring.
Submitted 30 January, 2025;
originally announced January 2025.
-
Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users' Questions
Authors:
Aidan Hogan,
Xin Luna Dong,
Denny Vrandečić,
Gerhard Weikum
Abstract:
Much has been discussed about how Large Language Models, Knowledge Graphs and Search Engines can be combined in a synergistic manner. A dimension largely absent from current academic discourse is the user perspective. In particular, there remain many open questions regarding how best to address the diverse information needs of users, incorporating varying facets and levels of difficulty. This paper introduces a taxonomy of user information needs, which guides us to study the pros, cons and possible synergies of Large Language Models, Knowledge Graphs and Search Engines. From this study, we derive a roadmap for future research.
Submitted 11 January, 2025;
originally announced January 2025.
-
Holistic Processing of Colour Images Using Novel Quaternion-Valued Wavelets on the Plane
Authors:
Neil D. Dizon,
Jeffrey A. Hogan
Abstract:
Recently, novel quaternion-valued wavelets on the plane were constructed using an optimisation approach. These wavelets are compactly supported, smooth, orthonormal, non-separable and truly quaternionic. However, they have not been tested in application. In this paper, we introduce a methodology for decomposing and reconstructing colour images using quaternionic wavelet filters associated to recently developed quaternion-valued wavelets on the plane. We investigate its applicability in compression, enhancement, segmentation, and denoising of colour images. Our results demonstrate these wavelets as promising tools for an end-to-end quaternion processing of colour images.
Submitted 11 January, 2024; v1 submitted 31 August, 2023;
originally announced August 2023.
-
Time- and Space-Efficient Regular Path Queries on Graphs
Authors:
Diego Arroyuelo,
Aidan Hogan,
Gonzalo Navarro,
Javiel Rojas-Ledesma
Abstract:
We introduce a time- and space-efficient technique to solve regular path queries over labeled graphs. We combine a bit-parallel simulation of the Glushkov automaton of the regular expression with the ring index introduced by Arroyuelo et al., exploiting its wavelet tree representation of the triples in order to efficiently reach the states of the product graph that are relevant for the query. Our query algorithm is able to simultaneously process several automaton states, as well as several graph nodes/labels. Our experimental results show that our representation uses 3-5 times less space than the alternatives in the literature, while generally outperforming them in query times (1.67 times faster than the next best).
Submitted 8 November, 2021;
originally announced November 2021.
-
MillenniumDB: A Persistent, Open-Source, Graph Database
Authors:
Domagoj Vrgoc,
Carlos Rojas,
Renzo Angles,
Marcelo Arenas,
Diego Arroyuelo,
Carlos Buil Aranda,
Aidan Hogan,
Gonzalo Navarro,
Cristian Riveros,
Juan Romero
Abstract:
In this systems paper, we present MillenniumDB: a novel graph database engine that is modular, persistent, and open source. MillenniumDB is based on a graph data model, which we call domain graphs, that provides a simple abstraction upon which a variety of popular graph models can be supported. The engine itself is founded on a combination of tried and tested techniques from relational data management, state-of-the-art algorithms for worst-case-optimal joins, as well as graph-specific algorithms for evaluating path queries. In this paper, we present the main design principles underlying MillenniumDB, describing the abstract graph model and query semantics supported, the concrete data model and query syntax implemented, as well as the storage, indexing, query planning and query evaluation techniques used. We evaluate MillenniumDB over real-world data and queries from the Wikidata knowledge graph, where we find that it outperforms other popular persistent graph database engines (including both enterprise and open source alternatives) that support similar query features.
Submitted 2 November, 2021;
originally announced November 2021.
-
Sampling low-spectrum signals on graphs via cluster-concentrated modes: examples
Authors:
Joseph D. Lakey,
Jeffrey A. Hogan
Abstract:
We establish frame inequalities for signals in Paley--Wiener spaces on two specific families of graphs consisting of combinations of cubes and cycles. The frame elements are localizations to cubes, regarded as clusters in the graphs, of vertex functions that are eigenvectors of certain spatio--spectral limiting operators on graph signals.
Submitted 28 October, 2021;
originally announced October 2021.
-
Question Answering over Knowledge Graphs with Neural Machine Translation and Entity Linking
Authors:
Daniel Diomedi,
Aidan Hogan
Abstract:
The goal of Question Answering over Knowledge Graphs (KGQA) is to find answers for natural language questions over a knowledge graph. Recent KGQA approaches adopt a neural machine translation (NMT) approach, where the natural language question is translated into a structured query language. However, NMT suffers from the out-of-vocabulary problem, where terms in a question may not have been seen during training, impeding their translation. This issue is particularly problematic for the millions of entities that large knowledge graphs describe. We rather propose a KGQA approach that delegates the processing of entities to entity linking (EL) systems. NMT is then used to create a query template with placeholders that are filled by entities identified in an EL phase. Slot filling is used to decide which entity fills which placeholder. Experiments for QA over Wikidata show that our approach outperforms pure NMT: while there remains a strong dependence on having seen similar query templates during training, errors relating to entities are greatly reduced.
Submitted 6 July, 2021;
originally announced July 2021.
-
A Survey of RDF Stores & SPARQL Engines for Querying Knowledge Graphs
Authors:
Waqas Ali,
Muhammad Saleem,
Bin Yao,
Aidan Hogan,
Axel-Cyrille Ngonga Ngomo
Abstract:
RDF has seen increased adoption in recent years, prompting the standardization of the SPARQL query language for RDF, and the development of local and distributed engines for processing SPARQL queries. This survey paper provides a comprehensive review of techniques and systems for querying RDF knowledge graphs. While other reviews on this topic tend to focus on the distributed setting, the main focus of the work is on providing a comprehensive survey of state-of-the-art storage, indexing and query processing techniques for efficiently evaluating SPARQL queries in a local setting (on one machine). To keep the survey self-contained, we also provide a short discussion on graph partitioning techniques used in the distributed setting. We conclude by discussing contemporary research challenges for further improving SPARQL query engines. This extended version also provides a survey of over one hundred SPARQL query engines and the techniques they use, along with twelve benchmarks and their features.
Submitted 13 October, 2021; v1 submitted 25 February, 2021;
originally announced February 2021.
-
Storage, Indexing, Query Processing, and Benchmarking in Centralized and Distributed RDF Engines: A Survey
Authors:
Waqas Ali,
Muhammad Saleem,
Bin Yao,
Aidan Hogan,
Axel-Cyrille Ngonga Ngomo
Abstract:
Recent advances in the Semantic Web and Linked Data have changed how the traditional web works. The Resource Description Framework (RDF) format has been widely adopted for storing web-based data. This adoption has paved the way for the development of various centralized and distributed RDF processing engines. These engines employ various mechanisms to implement critical components of query processing such as data storage, indexing, language support, and query execution. All these components govern how queries are executed and can have a substantial effect on the query runtime. For example, the way RDF data is stored significantly affects the storage space required and the query runtime performance. The type of indexing approach used in RDF engines is critical for fast data lookup. The type of the underlying querying language (e.g., SPARQL or SQL) used for query execution is a crucial optimization component of the RDF storage solutions. Finally, query execution involving different join orders significantly affects the query response time. This paper provides a comprehensive review of centralized and distributed RDF engines in terms of storage, indexing, language support, and query execution.
Submitted 23 September, 2020; v1 submitted 22 September, 2020;
originally announced September 2020.
-
Recursive SPARQL for Graph Analytics
Authors:
Aidan Hogan,
Juan Reutter,
Adrian Soto
Abstract:
Work on knowledge graphs and graph-based data management often focuses either on declarative graph query languages or on frameworks for graph analytics, with little work on combining both approaches. However, many real-world tasks conceptually involve combinations of these approaches: a graph query can be used to select the appropriate data, which is then enriched with analytics, and then possibly filtered or combined again with other data by means of a query language. In this paper we propose a declarative language that is well suited to perform graph querying and analytical tasks. We do this by proposing a minimalistic extension of SPARQL to allow for expressing analytical tasks; in particular, we propose to extend SPARQL with recursive features, and provide a formal syntax and semantics for our language. We show that this language can express key analytical tasks on graphs (in fact, it is Turing complete), offering a more declarative alternative to existing frameworks and languages. We show how procedures in our language can be implemented over an off-the-shelf SPARQL engine with a specialised client that allows parallelisation and batch-based processing when memory is limited. Results show that with such an implementation, procedures for popular analytics currently run in seconds or minutes for selective sub-graphs (our target use-case) but struggle at larger scales.
Submitted 3 April, 2020;
originally announced April 2020.
-
Knowledge Graphs
Authors:
Aidan Hogan,
Eva Blomqvist,
Michael Cochez,
Claudia d'Amato,
Gerard de Melo,
Claudio Gutierrez,
José Emilio Labra Gayo,
Sabrina Kirrane,
Sebastian Neumaier,
Axel Polleres,
Roberto Navigli,
Axel-Cyrille Ngonga Ngomo,
Sabbir M. Rashid,
Anisa Rula,
Lukas Schmelzeisen,
Juan Sequeda,
Steffen Staab,
Antoine Zimmermann
Abstract:
In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After some opening remarks, we motivate and contrast various graph-based data models and query languages that are used for knowledge graphs. We discuss the roles of schema, identity, and context in knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We summarise methods for the creation, enrichment, quality assessment, refinement, and publication of knowledge graphs. We provide an overview of prominent open knowledge graphs and enterprise knowledge graphs, their applications, and how they use the aforementioned techniques. We conclude with high-level future research directions for knowledge graphs.
Submitted 11 September, 2021; v1 submitted 4 March, 2020;
originally announced March 2020.
-
Rapid identification of pathogenic bacteria using Raman spectroscopy and deep learning
Authors:
Chi-Sing Ho,
Neal Jean,
Catherine A. Hogan,
Lena Blackmon,
Stefanie S. Jeffrey,
Mark Holodniy,
Niaz Banaei,
Amr A. E. Saleh,
Stefano Ermon,
Jennifer Dionne
Abstract:
Rapid identification of bacteria is essential to prevent the spread of infectious disease, help combat antimicrobial resistance, and improve patient outcomes. Raman optical spectroscopy promises to combine bacterial detection, identification, and antibiotic susceptibility testing in a single step. However, achieving clinically relevant speeds and accuracies remains challenging due to the weak Raman signal from bacterial cells and the large number of bacterial species and phenotypes. By amassing the largest known dataset of bacterial Raman spectra, we are able to apply state-of-the-art deep learning approaches to identify 30 of the most common bacterial pathogens from noisy Raman spectra, achieving antibiotic treatment identification accuracies of 99.0$\pm$0.1%. This novel approach distinguishes between methicillin-resistant and -susceptible isolates of Staphylococcus aureus (MRSA and MSSA) as well as a pair of isogenic MRSA and MSSA that are genetically identical apart from deletion of the mecA resistance gene, indicating the potential for culture-free detection of antibiotic resistance. Results from initial clinical validation are promising: using just 10 bacterial spectra from each of 25 isolates, we achieve 99.0$\pm$1.9% species identification accuracy. Our combined Raman-deep learning system represents an important proof-of-concept for rapid, culture-free identification of bacterial isolates and antibiotic resistance and could be readily extended for diagnostics on blood, urine, and sputum.
Submitted 5 November, 2019; v1 submitted 22 January, 2019;
originally announced January 2019.
-
Efficiently Charting RDF
Authors:
Oren Kalinsky,
Oren Mishali,
Aidan Hogan,
Yoav Etsion,
Benny Kimelfeld
Abstract:
We propose a visual query language for interactively exploring large-scale knowledge graphs. Starting from an overview, the user explores bar charts through three interactions: class expansion, property expansion, and subject/object expansion. A major challenge faced is performance: a state-of-the-art SPARQL engine may require tens of minutes to compute the multiway join, grouping and counting required to render a bar chart. A promising alternative is to apply approximation through online aggregation, trading precision for performance. However, state-of-the-art online aggregation algorithms such as Wander Join have two limitations for our exploration scenario: (1) a high number of rejected paths slows the convergence of the count estimations, and (2) no unbiased estimator exists for counts under the distinct operator. We thus devise a specialized algorithm for online aggregation that augments Wander Join with exact partial computations to reduce the number of rejected paths encountered, as well as a novel estimator that we prove to be unbiased in the case of the distinct operator. In an experimental study with random interactions exploring two large-scale knowledge graphs, our algorithm shows a clear reduction in error with respect to computation time versus Wander Join.
Submitted 26 January, 2019; v1 submitted 27 November, 2018;
originally announced November 2018.
-
Universal Decision-Based Black-Box Perturbations: Breaking Security-Through-Obscurity Defenses
Authors:
Thomas A. Hogan,
Bhavya Kailkhura
Abstract:
We study the problem of finding a universal (image-agnostic) perturbation to fool machine learning (ML) classifiers (e.g., neural nets, decision trees) in the hard-label black-box setting. Recent work in adversarial ML in the white-box setting (model parameters are known) has shown that many state-of-the-art image classifiers are vulnerable to universal adversarial perturbations: a fixed human-imperceptible perturbation that, when added to any image, causes it to be misclassified with high probability (Kurakin et al. [2016], Szegedy et al. [2013], Chen et al. [2017a], Carlini and Wagner [2017]). This paper considers a more practical and challenging problem of finding such universal perturbations in an obscure (or black-box) setting. More specifically, we use zeroth order optimization algorithms to find such a universal adversarial perturbation when no model information is revealed, except that the attacker can make queries to probe the classifier. We further relax the assumption that the output of a query is continuous-valued confidence scores for all the classes and consider the case where the output is a hard-label decision. Surprisingly, we found that even in these extremely obscure regimes, state-of-the-art ML classifiers can be fooled with a very high probability just by adding a single human-imperceptible image perturbation to any natural image. The surprising existence of universal perturbations in a hard-label black-box setting raises serious security concerns with the existence of a universal noise vector that adversaries can possibly exploit to break a classifier on most natural images.
Submitted 13 November, 2018; v1 submitted 8 November, 2018;
originally announced November 2018.
-
Tverberg-Type Theorems with Trees and Cycles as (Nerve) Intersection Patterns
Authors:
Jesús A. De Loera,
Thomas A. Hogan,
Deborah Oliveros,
Dominic Yang
Abstract:
Tverberg's theorem says that a set with sufficiently many points in $\mathbb{R}^d$ can always be partitioned into $m$ parts so that the $(m-1)$-simplex is the (nerve) intersection pattern of the convex hulls of the parts. The main results of our paper demonstrate that Tverberg's theorem is but a special case of a more general situation. Given sufficiently many points, all trees and cycles can also be induced by at least one partition of a point set.
Submitted 1 August, 2018;
originally announced August 2018.
-
Tverberg theorems over discrete sets of points
Authors:
Jesús A. De Loera,
Thomas A. Hogan,
Frédéric Meunier,
Nabil Mustafa
Abstract:
This paper discusses Tverberg-type theorems with coordinate constraints (i.e., versions of these theorems where all points lie within a subset $S \subset \mathbb{R}^d$ and the intersection of convex hulls is required to have a non-empty intersection with $S$). We determine the $m$-Tverberg number, when $m \geq 3$, of any discrete subset $S$ of $\mathbb{R}^2$ (a generalization of an unpublished result of J.-P. Doignon). We also present improvements on the upper bounds for the Tverberg numbers of $\mathbb{Z}^3$ and $\mathbb{Z}^j \times \mathbb{R}^k$ and an integer version of the well-known positive-fraction selection lemma of J. Pach.
Submitted 29 January, 2019; v1 submitted 5 March, 2018;
originally announced March 2018.
-
Foundations of Modern Query Languages for Graph Databases
Authors:
Renzo Angles,
Marcelo Arenas,
Pablo Barcelo,
Aidan Hogan,
Juan Reutter,
Domagoj Vrgoc
Abstract:
We survey foundational features underlying modern graph query languages. We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected by directed, labelled edges; and property graphs, where nodes and edges can further have attributes. Next we discuss the two most fundamental graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed, and how they can be combined with graph patterns. We also discuss several semantics under which queries using the previous features can be evaluated, what effects the selection of features and semantics has on complexity, and offer examples of such features in three modern languages that are used to query graphs: SPARQL, Cypher and Gremlin. We conclude by discussing the importance of formalisation for graph query languages; a summary of what is known about SPARQL, Cypher and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area.
Submitted 15 June, 2017; v1 submitted 19 October, 2016;
originally announced October 2016.
-
OWL: Yet to arrive on the Web of Data?
Authors:
Birte Glimm,
Aidan Hogan,
Markus Krötzsch,
Axel Polleres
Abstract:
Seven years on from OWL becoming a W3C recommendation, and two years on from the more recent OWL 2 W3C recommendation, OWL has still experienced only patchy uptake on the Web. Although certain OWL features (like owl:sameAs) are very popular, other features of OWL are largely neglected by publishers in the Linked Data world. This may suggest that despite the promise of easy implementations and the proposal of tractable profiles suggested in OWL's second version, there is still no "right" standard fragment for the Linked Data community. In this paper, we (1) analyse uptake of OWL on the Web of Data, (2) gain insights into the OWL fragment that is actually used/usable on the Web, where we arrive at the conclusion that this fragment is likely to be a simplified profile based on OWL RL, (3) propose and discuss such a new fragment, which we call OWL LD (for Linked Data).
Submitted 1 February, 2012;
originally announced February 2012.
-
Improving the recall of decentralised linked data querying through implicit knowledge
Authors:
Jürgen Umbrich,
Aidan Hogan,
Axel Polleres
Abstract:
Aside from crawling, indexing, and querying RDF data centrally, Linked Data principles allow for processing SPARQL queries on-the-fly by dereferencing URIs. Proposed link-traversal query approaches for Linked Data have the benefits of up-to-date results and decentralised (i.e., client-side) execution, but operate on incomplete knowledge available in dereferenced documents, thus affecting recall. In this paper, we investigate how implicit knowledge - specifically that found through owl:sameAs and RDFS reasoning - can improve the recall in this setting. We start with an empirical analysis of a large crawl featuring 4 million Linked Data sources and 1.1 billion quadruples: we (1) measure expected recall by only considering dereferenceable information, and (2) measure the improvement in recall given by considering rdfs:seeAlso links as previous proposals did. We further propose and measure the impact of additionally considering (3) owl:sameAs links, and (4) applying lightweight RDFS reasoning (specifically ρDF) for finding more results, relying on static schema information. We evaluate our methods for live queries over our crawl.
Submitted 1 September, 2011;
originally announced September 2011.