-
Manifesto from Dagstuhl Perspectives Workshop 24352 -- Conversational Agents: A Framework for Evaluation (CAFE)
Authors:
Christine Bauer,
Li Chen,
Nicola Ferro,
Norbert Fuhr,
Avishek Anand,
Timo Breuer,
Guglielmo Faggioli,
Ophir Frieder,
Hideo Joho,
Jussi Karlgren,
Johannes Kiesel,
Bart P. Knijnenburg,
Aldo Lipani,
Lien Michiels,
Andrea Papenmeier,
Maria Soledad Pera,
Mark Sanderson,
Scott Sanner,
Benno Stein,
Johanne R. Trippas,
Karin Verspoor,
Martijn C Willemsen
Abstract:
During the workshop, we deeply discussed what CONversational Information ACcess (CONIAC) is and its unique features, proposing a world model abstracting it, and defined the Conversational Agents Framework for Evaluation (CAFE) for the evaluation of CONIAC systems, consisting of six major components: 1) goals of the system's stakeholders, 2) user tasks to be studied in the evaluation, 3) aspects of…
▽ More
During the workshop, we deeply discussed what CONversational Information ACcess (CONIAC) is and its unique features, proposing a world model abstracting it, and defined the Conversational Agents Framework for Evaluation (CAFE) for the evaluation of CONIAC systems, consisting of six major components: 1) goals of the system's stakeholders, 2) user tasks to be studied in the evaluation, 3) aspects of the users carrying out the tasks, 4) evaluation criteria to be considered, 5) evaluation methodology to be applied, and 6) measures for the quantitative criteria chosen.
△ Less
Submitted 8 June, 2025;
originally announced June 2025.
-
Extending Business Process Management for Regulatory Transparency
Authors:
Jannis Kiesel,
Elias Grünewald
Abstract:
Ever-increasingly complex business processes are enabled by loosely coupled cloud-native systems. In such fast-paced development environments, data controllers face the challenge of capturing and updating all personal data processing activities due to considerable communication overhead between development teams and data protection staff. To date, established business process management methods ge…
▽ More
Ever-increasingly complex business processes are enabled by loosely coupled cloud-native systems. In such fast-paced development environments, data controllers face the challenge of capturing and updating all personal data processing activities due to considerable communication overhead between development teams and data protection staff. To date, established business process management methods generate valuable insights about systems, however, they do not account for all regulatory transparency obligations. For instance, data controllers need to record all information about data categories, legal purpose specifications, third-country transfers, etc. Therefore, we propose to bridge the gap between business processes and application systems by providing three contributions that assist in modeling, discovering, and checking personal data transparency through a process-oriented perspective. We enable transparency modeling for relevant business activities by providing a plug-in extension to BPMN featuring regulatory transparency information. Furthermore, we utilize event logs to record regulatory transparency information in realistic cloud-native systems. On this basis, we leverage process mining techniques to discover and analyze personal data flows in business processes, e.g., through transparency conformance checking. We design and implement prototypes for all contributions, emphasizing the appropriate integration and modeling effort required to create business-process-oriented transparency. Altogether, we connect current business process engineering techniques with regulatory needs as imposed by the GDPR and other legal frameworks.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Hook-in Privacy Techniques for gRPC-based Microservice Communication
Authors:
Louis Loechel,
Siar-Remzi Akbayin,
Elias Grünewald,
Jannis Kiesel,
Inga Strelnikova,
Thomas Janke,
Frank Pallas
Abstract:
gRPC is at the heart of modern distributed system architectures. Based on HTTP/2 and Protocol Buffers, it provides highly performant, standardized, and polyglot communication across loosely coupled microservices and is increasingly preferred over REST- or GraphQL-based service APIs in practice. Despite its widespread adoption, gRPC lacks any advanced privacy techniques beyond transport encryption…
▽ More
gRPC is at the heart of modern distributed system architectures. Based on HTTP/2 and Protocol Buffers, it provides highly performant, standardized, and polyglot communication across loosely coupled microservices and is increasingly preferred over REST- or GraphQL-based service APIs in practice. Despite its widespread adoption, gRPC lacks any advanced privacy techniques beyond transport encryption and basic token-based authentication. Such advanced techniques are, however, increasingly important for fulfilling regulatory requirements. For instance, anonymizing or otherwise minimizing (personal) data before responding to requests, or pre-processing data based on the purpose of the access may be crucial in certain usecases. In this paper, we therefore propose a novel approach for integrating such advanced privacy techniques into the gRPC framework in a practically viable way. Specifically, we present a general approach along with a working prototype that implements privacy techniques, such as data minimization and purpose limitation, in a configurable, extensible, and gRPC-native way utilizing a gRPC interceptor. We also showcase how to integrate this contribution into a realistic example of a food delivery use case. Alongside these implementations, a preliminary performance evaluation shows practical applicability with reasonable overheads. Altogether, we present a viable solution for integrating advanced privacy techniques into real-world gRPC-based microservice architectures, thereby facilitating regulatory compliance ``by design''.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Evaluating Generative Ad Hoc Information Retrieval
Authors:
Lukas Gienapp,
Harrisen Scells,
Niklas Deckers,
Janek Bevendorff,
Shuai Wang,
Johannes Kiesel,
Shahbaz Syed,
Maik Fröbe,
Guido Zuccon,
Benno Stein,
Matthias Hagen,
Martin Potthast
Abstract:
Recent advances in large language models have enabled the development of viable generative retrieval systems. Instead of a traditional document ranking, generative retrieval systems often directly return a grounded generated text as a response to a query. Quantifying the utility of the textual responses is essential for appropriately evaluating such generative ad hoc retrieval. Yet, the establishe…
▽ More
Recent advances in large language models have enabled the development of viable generative retrieval systems. Instead of a traditional document ranking, generative retrieval systems often directly return a grounded generated text as a response to a query. Quantifying the utility of the textual responses is essential for appropriately evaluating such generative ad hoc retrieval. Yet, the established evaluation methodology for ranking-based ad hoc retrieval is not suited for the reliable and reproducible evaluation of generated responses. To lay a foundation for developing new evaluation methods for generative retrieval systems, we survey the relevant literature from the fields of information retrieval and natural language processing, identify search tasks and system architectures in generative retrieval, develop a new user model, and study its operationalization.
△ Less
Submitted 22 May, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.
-
Hawk: DevOps-driven Transparency and Accountability in Cloud Native Systems
Authors:
Elias Grünewald,
Jannis Kiesel,
Siar-Remzi Akbayin,
Frank Pallas
Abstract:
Transparency is one of the most important principles of modern privacy regulations, such as the GDPR or CCPA. To be compliant with such regulatory frameworks, data controllers must provide data subjects with precise information about the collection, processing, storage, and transfer of personal data. To do so, respective facts and details must be compiled and always kept up to date. In traditional…
▽ More
Transparency is one of the most important principles of modern privacy regulations, such as the GDPR or CCPA. To be compliant with such regulatory frameworks, data controllers must provide data subjects with precise information about the collection, processing, storage, and transfer of personal data. To do so, respective facts and details must be compiled and always kept up to date. In traditional, rather static system environments, this inventory (including details such as the purposes of processing or the storage duration for each system component) could be done manually. In current circumstances of agile, DevOps-driven, and cloud-native information systems engineering, however, such manual practices do not suit anymore, making it increasingly hard for data controllers to achieve regulatory compliance. To allow for proper collection and maintenance of always up-to-date transparency information smoothly integrating into DevOps practices, we herein propose a set of novel approaches explicitly tailored to specific phases of the DevOps lifecycle most relevant in matters of privacy-related transparency and accountability at runtime: Release, Operation, and Monitoring. For each of these phases, we examine the specific challenges arising in determining the details of personal data processing, develop a distinct approach and provide respective proof of concept implementations that can easily be applied in cloud native systems. We also demonstrate how these components can be integrated with each other to establish transparency information comprising design- and runtime-elements. Furthermore, our experimental evaluation indicates reasonable overheads. On this basis, data controllers can fulfill their regulatory transparency obligations in line with actual engineering practices.
△ Less
Submitted 4 June, 2023;
originally announced June 2023.
-
The Touché23-ValueEval Dataset for Identifying Human Values behind Arguments
Authors:
Nailia Mirzakhmedova,
Johannes Kiesel,
Milad Alshomary,
Maximilian Heinrich,
Nicolas Handke,
Xiaoni Cai,
Barriere Valentin,
Doratossadat Dastgheib,
Omid Ghahroodi,
Mohammad Ali Sadraei,
Ehsaneddin Asgari,
Lea Kawaletz,
Henning Wachsmuth,
Benno Stein
Abstract:
We present the Touché23-ValueEval Dataset for Identifying Human Values behind Arguments. To investigate approaches for the automated detection of human values behind arguments, we collected 9324 arguments from 6 diverse sources, covering religious texts, political discussions, free-text arguments, newspaper editorials, and online democracy platforms. Each argument was annotated by 3 crowdworkers f…
▽ More
We present the Touché23-ValueEval Dataset for Identifying Human Values behind Arguments. To investigate approaches for the automated detection of human values behind arguments, we collected 9324 arguments from 6 diverse sources, covering religious texts, political discussions, free-text arguments, newspaper editorials, and online democracy platforms. Each argument was annotated by 3 crowdworkers for 54 values. The Touché23-ValueEval dataset extends the Webis-ArgValues-22. In comparison to the previous dataset, the effectiveness of a 1-Baseline decreases, but that of an out-of-the-box BERT model increases. Therefore, though the classification difficulty increased as per the label distribution, the larger dataset allows for training better models.
△ Less
Submitted 31 January, 2023;
originally announced January 2023.
-
Topic Ontologies for Arguments
Authors:
Yamen Ajjour,
Johannes Kiesel,
Benno Stein,
Martin Potthast
Abstract:
Many computational argumentation tasks, like stance classification, are topic-dependent: the effectiveness of approaches to these tasks significantly depends on whether the approaches were trained on arguments from the same topics as those they are tested on. So, which are these topics that researchers train approaches on? This paper contributes the first comprehensive survey of topic coverage, as…
▽ More
Many computational argumentation tasks, like stance classification, are topic-dependent: the effectiveness of approaches to these tasks significantly depends on whether the approaches were trained on arguments from the same topics as those they are tested on. So, which are these topics that researchers train approaches on? This paper contributes the first comprehensive survey of topic coverage, assessing 45 argument corpora. For the assessment, we take the first step towards building an argument topic ontology, consulting three diverse authoritative sources: the World Economic Forum, the Wikipedia list of controversial topics, and Debatepedia. Comparing the topic sets between the authoritative sources and corpora, our analysis shows that the corpora topics-which are mostly those frequently discussed in public online fora - are covered well by the sources. However, other topics from the sources are less extensively covered by the corpora of today, revealing interesting future directions for corpus construction.
△ Less
Submitted 23 January, 2023;
originally announced January 2023.
-
The Infinite Index: Information Retrieval on Generative Text-To-Image Models
Authors:
Niklas Deckers,
Maik Fröbe,
Johannes Kiesel,
Gianluca Pandolfo,
Christopher Schröder,
Benno Stein,
Martin Potthast
Abstract:
Conditional generative models such as DALL-E and Stable Diffusion generate images based on a user-defined text, the prompt. Finding and refining prompts that produce a desired image has become the art of prompt engineering. Generative models do not provide a built-in retrieval model for a user's information need expressed through prompts. In light of an extensive literature review, we reframe prom…
▽ More
Conditional generative models such as DALL-E and Stable Diffusion generate images based on a user-defined text, the prompt. Finding and refining prompts that produce a desired image has become the art of prompt engineering. Generative models do not provide a built-in retrieval model for a user's information need expressed through prompts. In light of an extensive literature review, we reframe prompt engineering for generative models as interactive text-based retrieval on a novel kind of "infinite index". We apply these insights for the first time in a case study on image generation for game design with an expert. Finally, we envision how active learning may help to guide the retrieval of generated images.
△ Less
Submitted 21 January, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
SCAI-QReCC Shared Task on Conversational Question Answering
Authors:
Svitlana Vakulenko,
Johannes Kiesel,
Maik Fröbe
Abstract:
Search-Oriented Conversational AI (SCAI) is an established venue that regularly puts a spotlight upon the recent work advancing the field of conversational search. SCAI'21 was organised as an independent on-line event and featured a shared task on conversational question answering. Since all of the participant teams experimented with answer generation models for this task, we identified evaluation…
▽ More
Search-Oriented Conversational AI (SCAI) is an established venue that regularly puts a spotlight upon the recent work advancing the field of conversational search. SCAI'21 was organised as an independent on-line event and featured a shared task on conversational question answering. Since all of the participant teams experimented with answer generation models for this task, we identified evaluation of answer correctness in this settings as the major challenge and a current research gap. Alongside the automatic evaluation, we conducted two crowdsourcing experiments to collect annotations for answer plausibility and faithfulness. As a result of this shared task, the original conversational QA dataset used for evaluation was further extended with alternative correct answers produced by the participant systems.
△ Less
Submitted 26 January, 2022;
originally announced January 2022.
-
Web Archive Analytics
Authors:
Michael Völske,
Janek Bevendorff,
Johannes Kiesel,
Benno Stein,
Maik Fröbe,
Matthias Hagen,
Martin Potthast
Abstract:
Web archive analytics is the exploitation of publicly accessible web pages and their evolution for research purposes -- to the extent organizationally possible for researchers. In order to better understand the complexity of this task, the first part of this paper puts the entirety of the world's captured, created, and replicated data (the "Global Datasphere") in relation to other important data s…
▽ More
Web archive analytics is the exploitation of publicly accessible web pages and their evolution for research purposes -- to the extent organizationally possible for researchers. In order to better understand the complexity of this task, the first part of this paper puts the entirety of the world's captured, created, and replicated data (the "Global Datasphere") in relation to other important data sets such as the public internet and its web pages, or what is preserved thereof by the Internet Archive.
Recently, the Webis research group, a network of university chairs to which the authors belong, concluded an agreement with the Internet Archive to download a substantial part of its web archive for research purposes. The second part of the paper in hand describes our infrastructure for processing this data treasure: We will eventually host around 8 PB of web archive data from the Internet Archive and Common Crawl, with the goal of supplementing existing large scale web corpora and forming a non-biased subset of the 30 PB web archive at the Internet Archive.
△ Less
Submitted 2 July, 2021;
originally announced July 2021.
-
A Stylometric Inquiry into Hyperpartisan and Fake News
Authors:
Martin Potthast,
Johannes Kiesel,
Kevin Reinartz,
Janek Bevendorff,
Benno Stein
Abstract:
This paper reports on a writing style analysis of hyperpartisan (i.e., extremely one-sided) news in connection to fake news. It presents a large corpus of 1,627 articles that were manually fact-checked by professional journalists from BuzzFeed. The articles originated from 9 well-known political publishers, 3 each from the mainstream, the hyperpartisan left-wing, and the hyperpartisan right-wing.…
▽ More
This paper reports on a writing style analysis of hyperpartisan (i.e., extremely one-sided) news in connection to fake news. It presents a large corpus of 1,627 articles that were manually fact-checked by professional journalists from BuzzFeed. The articles originated from 9 well-known political publishers, 3 each from the mainstream, the hyperpartisan left-wing, and the hyperpartisan right-wing. In sum, the corpus contains 299 fake news, 97% of which originated from hyperpartisan publishers.
We propose and demonstrate a new way of assessing style similarity between text categories via Unmasking---a meta-learning approach originally devised for authorship verification---, revealing that the style of left-wing and right-wing news have a lot more in common than any of the two have with the mainstream. Furthermore, we show that hyperpartisan news can be discriminated well by its style from the mainstream (F1=0.78), as can be satire from both (F1=0.81). Unsurprisingly, style-based fake news detection does not live up to scratch (F1=0.46). Nevertheless, the former results are important to implement pre-screening for fake news detectors.
△ Less
Submitted 18 February, 2017;
originally announced February 2017.