
Towards Computer-Using Personal Agents

Authors: Piero A. Bonatti, John Domingue, Anna Lisa Gentile, Andreas Harth, Olaf Hartig, Aidan Hogan, Katja Hose, Ernesto Jimenez-Ruiz, Deborah L. McGuinness, Chang Sun, Ruben Verborgh, Jesse Wright

Abstract: Computer-Using Agents (CUAs) enable users to automate increasingly complex tasks using graphical interfaces such as browsers. As many potential tasks require personal data, we propose Computer-Using Personal Agents (CUPAs) that have access to an external repository of the user's personal data. Compared with CUAs, CUPAs offer users better control of their personal data, the potential to automate more tasks involving personal data, better interoperability with external sources of data, and better capabilities to coordinate with other CUPAs in order to solve collaborative tasks involving the personal data of multiple users.

Submitted 31 January, 2025; originally announced March 2025.

Comments: This report is a result of Dagstuhl Seminar 25051 "Trust and Accountability in Knowledge Graph-Based AI for Self Determination", which took place in January 2025

ACM Class: I.2.7; I.2.4; I.2.11; H.3.5

arXiv:2411.05622 [pdf]

From Resource Control to Digital Trust with User-Managed Access

Authors: Wouter Termont, Ruben Dedecker, Wout Slabbinck, Beatriz Esteves, Ben De Meester, Ruben Verborgh

Abstract: The User-Managed Access (UMA) extension to OAuth 2.0 is a promising candidate for increasing Digital Trust in personal data ecosystems like Solid. With minor modifications, it can achieve many requirements regarding usage control and transaction contextualization, even though additional specification is needed to address delegation of control and retraction of usage policies.

Submitted 7 January, 2025; v1 submitted 8 November, 2024; originally announced November 2024.

arXiv:2407.00998 [pdf, other]

Opportunities for Shape-based Optimization of Link Traversal Queries

Authors: Bryan-Elliott Tam, Ruben Taelman, Pieter Colpaert, Ruben Verborgh

Abstract: Data on the web is naturally unindexed and decentralized. Centralizing web data, especially personal data, raises ethical and legal concerns. Yet, compared to centralized query approaches, decentralization-friendly alternatives such as Link Traversal Query Processing (LTQP) are significantly less performant and understood. The two main difficulties of LTQP are the lack of a priori information about data sources and the high number of HTTP requests. Exploring decentralization-friendly ways to document unindexed networks of data sources could lead to solutions to alleviate those difficulties. RDF data shapes are widely used to validate linked data documents; therefore, it is worthwhile to investigate their potential for LTQP optimization. In our work, we built an early version of a source selection algorithm for LTQP using RDF data shape mappings with linked data documents and measured its performance in a realistic setup. In this article, we present our algorithm and early results, thus opening opportunities for further research on shape-based optimization of link traversal queries. Our initial experiments show that, with little maintenance and work from the server, our method can reduce the execution time by up to 80% and the number of links traversed by up to 97% during realistic queries. Given our early results and the descriptive power of RDF data shapes, it would be worthwhile to investigate non-heuristic-based query planning using RDF shapes.

Submitted 28 August, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

Comments: 6 pages, 2 figures

arXiv:2406.10659 [pdf, other]

RDF Surfaces: Enabling Classical Negation on the Semantic Web

Authors: Patrick Hochstenbach, Mathijs van Noort, Dörthe Arndt, Rebekka Martens, Jos De Roo, Ruben Verborgh, Pieter Bonte, Femke Ongenae

Abstract: The Resource Description Framework (RDF) is a fundamental technology in the Semantic Web, enabling the representation and interchange of structured data. However, RDF lacks the capability to express negated statements in a generic way. As a result, exchanging negative information on a Web scale is thus far restricted to specific cases and predefined statements. The ability to negate (virtually) any RDF statement allows for a comprehensive way to refute, deny or otherwise invalidate claims on a Web scale. Via an intermediate step of a diagrammatic approach to logical expressions called Peirce graphs, we introduce RDF Surfaces, an extension of RDF that incorporates the concept of classic negation, known from first-order logic. Overall, RDF Surfaces provides an abstract, visual approach to negation within the Semantic Web, offering a more general and widely applicable approach than previous attempts at incorporating negation. Aside from a (traditional) programmatic syntax, RDF Surfaces can also be represented visually by means of diagrams inspired by Peirce graphs. We demonstrate negation via RDF Surfaces and how to reason upon it in illustrative use cases drawn from the domains of academic publishing and eHealth. We hope this vision paper attracts new implementers and opens the discussion to its formal specification.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2309.16365 [pdf, other]

Libertas: Privacy-Preserving Collective Computation for Decentralised Personal Data Stores

Authors: Rui Zhao, Naman Goel, Nitin Agrawal, Jun Zhao, Jake Stein, Wael Albayaydh, Ruben Verborgh, Reuben Binns, Tim Berners-Lee, Nigel Shadbolt

Abstract: Data and data processing have become an indispensable aspect of our society. Insights drawn from collective data make invaluable contributions to scientific and societal research and business. But there are increasing worries about privacy issues and data misuse. This has prompted the emergence of decentralised personal data stores (PDS) like Solid that provide individuals more control over their personal data. However, existing PDS frameworks face challenges in ensuring data privacy when performing collective computations with data from multiple users. While Secure Multi-Party Computation (MPC) offers input secrecy protection during the computation without relying on any single party, issues emerge when directly applying MPC in the context of PDS, particularly due to key factors like autonomy and decentralisation. In this work, we discuss the essence of this issue, identify a potential solution, and introduce a modular architecture, Libertas, to integrate MPC with PDS like Solid, without requiring protocol-level changes. We introduce a paradigm shift from an 'omniscient' view to an individual-based, user-centric view of trust and security, and discuss the threat model of Libertas. Two realistic use cases for collaborative data processing are used for evaluation, both for technical feasibility and empirical benchmark, highlighting its effectiveness in empowering gig workers and generating differentially private synthetic data.
The results of our experiments underscore Libertas' linear scalability and provide valuable insights into compute optimisations, thereby advancing the state-of-the-art in privacy-preserving data processing practices. By offering practical solutions for maintaining both individual autonomy and privacy in collaborative data processing environments, Libertas contributes significantly to the ongoing discourse on privacy protection in data-driven decision-making contexts.

Submitted 30 March, 2025; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: Accepted by CSCW 2025; manuscript version

arXiv:2305.08476 [pdf, other]

RDF Surfaces: Computer Says No

Authors: Patrick Hochstenbach, Jos De Roo, Ruben Verborgh

Abstract: Logic can define how agents are provided or denied access to resources, how to interlink resources using mining processes and provide users with choices for possible next steps in a workflow. These decisions are for the most part hidden, internal to machines processing data. In order to exchange this internal logic, a portable Web logic is required, which the Semantic Web could provide. Combining logic and data provides insights into the reasoning process and creates a new level of trust on the Semantic Web. Current Web logics carry only a fragment of first-order logic (FOL) to keep exchange languages decidable or easily processable. But this comes at a cost: the portability of logic. Machines require implicit agreements to know which fragment of logic is being exchanged and need a strategy for how to cope with the different fragments. These choices could obscure insights into the reasoning process. We created RDF Surfaces in order to express the full expressivity of FOL, including saying explicitly 'no'. This vision paper provides basic principles and compares existing work. Even though support for FOL is semi-decidable, we argue these problems are surmountable. RDF Surfaces span many use cases, including describing misuse of information, adding explainability and trust to reasoning, and providing scope for reasoning over streams of data and queries. RDF Surfaces provide the direct translation of FOL for the Semantic Web. We hope this vision paper attracts new implementers and opens the discussion to its formal specification.

Submitted 15 May, 2023; originally announced May 2023.

Comments: 5 pages, position paper for the ESWC2023 TrusDeKW workshop

ACM Class: D.3; F.3; H.4

arXiv:2302.14411 [pdf, other]

Distributed Subweb Specifications for Traversing the Web

Authors: Bart Bogaerts, Bas Ketsman, Younes Zeboudj, Heba Aamer, Ruben Taelman, Ruben Verborgh

Abstract: Link Traversal-based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, in a time where the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query is entirely in the hands of the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretic example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and indeed observe that not just the data quality but also the efficiency of querying improves. Under consideration in Theory and Practice of Logic Programming (TPLP).

Submitted 27 March, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)

arXiv:2302.06933 [pdf]

Evaluation of Link Traversal Query Execution over Decentralized Environments with Structural Assumptions

Authors: Ruben Taelman, Ruben Verborgh

Abstract: To counter societal and economic problems caused by data silos on the Web, efforts such as Solid strive to reclaim private data by storing it in permissioned documents over a large number of personal vaults across the Web. Building applications on top of such a decentralized Knowledge Graph involves significant technical challenges: centralized aggregation prior to query processing is excluded for legal reasons, and current federated querying techniques cannot handle this large scale of distribution at the expected performance. We propose an extension to Link Traversal Query Processing (LTQP) that incorporates structural properties within decentralized environments to tackle their unprecedented scale. In this article, we analyze the structural properties of the Solid decentralization ecosystem that are relevant for query execution, and provide the SolidBench benchmark to simulate Solid environments representatively. We introduce novel LTQP algorithms leveraging these structural properties, and evaluate their effectiveness. Our experiments indicate that these new algorithms obtain accurate results in the order of seconds for non-complex queries, which existing algorithms cannot achieve. Furthermore, we discuss limitations with respect to more complex queries. This work reveals that a traversal-based querying method using structural assumptions can be effective for large-scale decentralization, but that advances are needed in the area of query planning for LTQP to handle more complex queries.
These insights open the door to query-driven decentralized applications, in which declarative queries shield developers from the inherent complexity of a decentralized landscape.

Submitted 14 February, 2023; originally announced February 2023.

Comments: Not peer-reviewed

arXiv:2210.04631 [pdf]

A Prospective Analysis of Security Vulnerabilities within Link Traversal-Based Query Processing (Extended Version)

Authors: Ruben Taelman, Ruben Verborgh

Abstract: The societal and economic consequences surrounding Big Data-driven platforms have increased the call for decentralized solutions. However, retrieving and querying data in more decentralized environments requires fundamentally different approaches, whose properties are not yet well understood. Link Traversal-based Query Processing (LTQP) is a technique for querying over decentralized data networks, in which a client-side query engine discovers data by traversing links between documents. Since decentralized environments are potentially unsafe due to their non-centrally controlled nature, there is a need for client-side LTQP query engines to be resistant against security threats aimed at the query engine's host machine or the query initiator's personal data. As such, we have performed an analysis of potential security vulnerabilities of LTQP. This article provides an overview of security threats in related domains, which are used as inspiration for the identification of 10 LTQP security threats. Each threat is explained, together with an example, and one or more avenues for mitigations are proposed. We conclude with several concrete recommendations for LTQP query engine developers and data publishers as a first step to mitigate some of these issues. With this work, we start filling the unknowns for enabling querying over decentralized environments. Aside from future work on security, wider research is needed to uncover missing building blocks for enabling true decentralization.

Submitted 10 October, 2022; originally announced October 2022.

Comments: This is an extended version of an article with the same title published in the proceedings of the QuWeDa workshop at ISWC 2022. Next to more details in the related work and conclusions sections, this extension introduces concrete mitigations of each vulnerability

arXiv:2208.00665 [pdf, other]

Event Notifications in Value-Adding Networks

Authors: Patrick Hochstenbach, Herbert Van de Sompel, Miel Vander Sande, Ruben Dedecker, Ruben Verborgh

Abstract: Linkages between research outputs are crucial in the scholarly knowledge graph. They include online citations, but also links between versions that differ according to various dimensions and links to resources that were used to arrive at research results. In current scholarly communication systems this information is only made available post factum and is obtained via elaborate batch processing. In this paper we report on work aimed at making linkages available in real-time, in which an alternative, decentralised scholarly communication network is considered that consists of interacting data nodes that host artifacts and service nodes that add value to artifacts. The first result of this work, the "Event Notifications in Value-Adding Networks" specification, details interoperability requirements for the exchange of real-time life-cycle information pertaining to artifacts using Linked Data Notifications. In an experiment, we applied our specification to one particular use-case: distributing Scholix data-literature links to a network of Belgian institutional repositories by a national service node. The results of our experiment confirm the potential of our approach and provide a framework to create a network of interacting nodes implementing the core scholarly functions (registration, certification, awareness and archiving) in a decentralized and decoupled way.

Submitted 3 August, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: 12 pages, 2 figures, Accepted at the 26th International Conference on Theory and Practice of Digital Libraries, Padua, Italy

arXiv:2005.02239 [pdf, ps, other]

Guided Link-Traversal-Based Query Processing

Authors: Ruben Verborgh, Ruben Taelman

Abstract: Link-Traversal-Based Query Processing (LTBQP) is a technique for evaluating queries over a web of data by starting with a set of seed documents that is dynamically expanded through following hyperlinks. Compared to query evaluation over a static set of sources, LTBQP is significantly slower because of the number of needed network requests. Furthermore, there are concerns regarding relevance and trustworthiness of results, given that sources are selected dynamically. To address both issues, we propose guided LTBQP, a technique in which information about document linking structure and content policies is passed to a query processor. Thereby, the processor can prune the search tree of documents by only following relevant links, and restrict the result set to desired results by limiting which documents are considered for what kinds of content. In this exploratory paper, we describe the technique at a high level and sketch some of its applications. We argue that such guidance can make LTBQP a valuable query strategy in decentralized environments, where data is spread across documents with varying levels of user trust.

Submitted 3 May, 2020; originally announced May 2020.

Comments: 4 pages

arXiv:1609.07108 [pdf, other]

A Web API ecosystem through feature-based reuse

Authors: Ruben Verborgh, Michel Dumontier

Abstract: The fast-growing Web API landscape brings clients more options than ever before, at least in theory. In practice, they cannot easily switch between different providers offering similar functionality. We discuss a vision for developing Web APIs based on reuse of interface parts called features. Through the introduction of 5 design principles, we investigate the impact of feature-based reuse on Web APIs. Applying these principles enables a granular reuse of client and server code, documentation, and tools. Together, they can foster a measurable ecosystem with cross-API compatibility, opening the door to a more flexible generation of Web clients.

Submitted 12 March, 2018; v1 submitted 22 September, 2016; originally announced September 2016.

arXiv:1512.07780 [pdf, other]

doi 10.1017/S1471068416000016

The Pragmatic Proof: Hypermedia API Composition and Execution

Authors: Ruben Verborgh, Dörthe Arndt, Sofie Van Hoecke, Jos De Roo, Giovanni Mels, Thomas Steiner, Joaquim Gabarro

Abstract: Machine clients are increasingly making use of the Web to perform tasks. While Web services traditionally mimic remote procedure calling interfaces, a new generation of so-called hypermedia APIs works through hyperlinks and forms, in a way similar to how people browse the Web. This means that existing composition techniques, which determine a procedural plan upfront, are not sufficient to consume hypermedia APIs, which need to be navigated at runtime. Clients instead need a more dynamic plan that allows them to follow hyperlinks and use forms with a preset goal. Therefore, in this article, we show how compositions of hypermedia APIs can be created by generic Semantic Web reasoners. This is achieved through the generation of a proof based on semantic descriptions of the APIs' functionality. To pragmatically verify the applicability of compositions, we introduce the notion of pre-execution and post-execution proofs. The runtime interaction between a client and a server is guided by proofs but driven by hypermedia, allowing the client to react to the application's actual state indicated by the server's response. We describe how to generate compositions from descriptions, discuss a computer-assisted process to generate descriptions, and verify reasoner performance on various composition tasks using a benchmark suite. The experimental results lead to the conclusion that proof-based consumption of hypermedia APIs is a feasible strategy at Web scale.

Submitted 24 December, 2015; originally announced December 2015.

Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)

arXiv:1501.06329 [pdf, other]

Disaster Monitoring with Wikipedia and Online Social Networking Sites: Structured Data and Linked Data Fragments to the Rescue?

Authors: Thomas Steiner, Ruben Verborgh

Abstract: In this paper, we present the first results of our ongoing early-stage research on a realtime disaster detection and monitoring tool. Based on Wikipedia, it is language-agnostic and leverages user-generated multimedia content shared on online social networking sites to help disaster responders prioritize their efforts. We make the tool and its source code publicly available as we make progress on it. Furthermore, we strive to publish detected disasters and accompanying multimedia content following the Linked Data principles to facilitate its wide consumption, redistribution, and evaluation of its usefulness.

Submitted 26 January, 2015; originally announced January 2015.

Comments: Accepted for publication at the AAAI Spring Symposium 2015: Structured Data for Humanitarian Technologies: Perfect fit or Overkill? #SD4HumTech15

Showing 1–14 of 14 results for author: Verborgh, R