-
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition
Authors:
Joerg Deigmoeller,
Stephan Hasler,
Nakul Agarwal,
Daniel Tanneberg,
Anna Belardinelli,
Reza Ghoddoosian,
Chao Wang,
Felix Ocker,
Fan Zhang,
Behzad Dariush,
Michael Gienger
Abstract:
We introduce CARMA, a system for situational grounding in human-robot group interactions. Effective collaboration in such group settings requires situational awareness based on a consistent representation of present persons and objects coupled with an episodic abstraction of events regarding actors and manipulated objects. This calls for a clear and consistent assignment of instances, ensuring tha…
▽ More
We introduce CARMA, a system for situational grounding in human-robot group interactions. Effective collaboration in such group settings requires situational awareness based on a consistent representation of present persons and objects coupled with an episodic abstraction of events regarding actors and manipulated objects. This calls for a clear and consistent assignment of instances, ensuring that robots correctly recognize and track actors, objects, and their interactions over time. To achieve this, CARMA uniquely identifies physical instances of such entities in the real world and organizes them into grounded triplets of actors, objects, and actions.
To validate our approach, we conducted three experiments, where multiple humans and a robot interact: collaborative pouring, handovers, and sorting. These scenarios allow the assessment of the system's capabilities as to role distinction, multi-actor awareness, and consistent instance identification. Our experiments demonstrate that the system can reliably generate accurate actor-action-object triplets, providing a structured and robust foundation for applications requiring spatiotemporal reasoning and situated decision-making in collaborative settings.
△ Less
Submitted 25 June, 2025;
originally announced June 2025.
-
Mirror Eyes: Explainable Human-Robot Interaction at a Glance
Authors:
Matti Krüger,
Daniel Tanneberg,
Chao Wang,
Stephan Hasler,
Michael Gienger
Abstract:
The gaze of a person tends to reflect their interest. This work explores what happens when this statement is taken literally and applied to robots. Here we present a robot system that employs a moving robot head with a screen-based eye model that can direct the robot's gaze to points in physical space and present a reflection-like mirror image of the attended region on top of each eye. We conducte…
▽ More
The gaze of a person tends to reflect their interest. This work explores what happens when this statement is taken literally and applied to robots. Here we present a robot system that employs a moving robot head with a screen-based eye model that can direct the robot's gaze to points in physical space and present a reflection-like mirror image of the attended region on top of each eye. We conducted a user study with 33 participants, who were asked to instruct the robot to perform pick-and-place tasks, monitor the robot's task execution, and interrupt it in case of erroneous actions. Despite a deliberate lack of instructions about the role of the eyes and a very brief system exposure, participants felt more aware about the robot's information processing, detected erroneous actions earlier, and rated the user experience higher when eye-based mirroring was enabled compared to non-reflective eyes. These results suggest a beneficial and intuitive utilization of the introduced method in cooperative human-robot interaction.
△ Less
Submitted 23 June, 2025;
originally announced June 2025.
-
Spatial Reasoner: A 3D Inference Pipeline for XR Applications
Authors:
Steven Häsler,
Philipp Ackermann
Abstract:
Modern extended reality XR systems provide rich analysis of image data and fusion of sensor input and demand AR/VR applications that can reason about 3D scenes in a semantic manner. We present a spatial reasoning framework that bridges geometric facts with symbolic predicates and relations to handle key tasks such as determining how 3D objects are arranged among each other ('on', 'behind', 'near',…
▽ More
Modern extended reality XR systems provide rich analysis of image data and fusion of sensor input and demand AR/VR applications that can reason about 3D scenes in a semantic manner. We present a spatial reasoning framework that bridges geometric facts with symbolic predicates and relations to handle key tasks such as determining how 3D objects are arranged among each other ('on', 'behind', 'near', etc.). Its foundation relies on oriented 3D bounding box representations, enhanced by a comprehensive set of spatial predicates, ranging from topology and connectivity to directionality and orientation, expressed in a formalism related to natural language. The derived predicates form a spatial knowledge graph and, in combination with a pipeline-based inference model, enable spatial queries and dynamic rule evaluation. Implementations for client- and server-side processing demonstrate the framework's capability to efficiently translate geometric data into actionable knowledge, ensuring scalable and technology-independent spatial reasoning in complex 3D environments. The Spatial Reasoner framework is fostering the creation of spatial ontologies, and seamlessly integrates with and therefore enriches machine learning, natural language processing, and rule systems in XR applications.
△ Less
Submitted 25 April, 2025;
originally announced April 2025.
-
Sounds Good? Fast and Secure Contact Exchange in Groups
Authors:
Florentin Putz,
Steffen Haesler,
Matthias Hollick
Abstract:
Trustworthy digital communication requires the secure exchange of contact information, but current approaches lack usability and scalability for larger groups of users. We evaluate the usability of two secure contact exchange systems: the current state of the art, SafeSlinger, and our newly designed protocol, PairSonic, which extends trust from physical encounters to spontaneous online communicati…
▽ More
Trustworthy digital communication requires the secure exchange of contact information, but current approaches lack usability and scalability for larger groups of users. We evaluate the usability of two secure contact exchange systems: the current state of the art, SafeSlinger, and our newly designed protocol, PairSonic, which extends trust from physical encounters to spontaneous online communication. Our lab study (N=45) demonstrates PairSonic's superior usability, automating the tedious verification tasks from previous approaches via an acoustic out-of-band channel. Although participants significantly preferred our system, minimizing user effort surprisingly decreased the perceived security for some users, who associated security with complexity. We discuss user perceptions of the different protocol components and identify remaining usability barriers for CSCW application scenarios.
△ Less
Submitted 20 November, 2024;
originally announced November 2024.
-
PairSonic: Helping Groups Securely Exchange Contact Information
Authors:
Florentin Putz,
Steffen Haesler,
Thomas Völkl,
Maximilian Gehring,
Nils Rollshausen,
Matthias Hollick
Abstract:
Securely exchanging contact information is essential for establishing trustworthy communication channels that facilitate effective online collaboration. However, current methods are neither user-friendly nor scalable for large groups of users. In response, we introduce PairSonic, a novel group pairing protocol that extends trust from physical encounters to online communication. PairSonic simplifie…
▽ More
Securely exchanging contact information is essential for establishing trustworthy communication channels that facilitate effective online collaboration. However, current methods are neither user-friendly nor scalable for large groups of users. In response, we introduce PairSonic, a novel group pairing protocol that extends trust from physical encounters to online communication. PairSonic simplifies the pairing process by automating the tedious verification tasks of previous methods through an acoustic out-of-band channel using smartphones' built-in hardware. Our protocol not only facilitates connecting users for computer-supported collaboration, but also provides a more user-friendly and scalable solution to the authentication ceremonies currently used in end-to-end encrypted messengers like Signal or WhatsApp. PairSonic is available as open-source software: https://github.com/seemoo-lab/pairsonic
△ Less
Submitted 20 November, 2024;
originally announced November 2024.
-
Stacked Confusion Reject Plots (SCORE)
Authors:
Stephan Hasler,
Lydia Fischer
Abstract:
Machine learning is more and more applied in critical application areas like health and driver assistance. To minimize the risk of wrong decisions, in such applications it is necessary to consider the certainty of a classification to reject uncertain samples. An established tool for this are reject curves that visualize the trade-off between the number of rejected samples and classification perfor…
▽ More
Machine learning is more and more applied in critical application areas like health and driver assistance. To minimize the risk of wrong decisions, in such applications it is necessary to consider the certainty of a classification to reject uncertain samples. An established tool for this are reject curves that visualize the trade-off between the number of rejected samples and classification performance metrics. We argue that common reject curves are too abstract and hard to interpret by non-experts. We propose Stacked Confusion Reject Plots (SCORE) that offer a more intuitive understanding of the used data and the classifier's behavior. We present example plots on artificial Gaussian data to document the different options of SCORE and provide the code as a Python package.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Efficient Symbolic Planning with Views
Authors:
Stephan Hasler,
Daniel Tanneberg,
Michael Gienger
Abstract:
Robotic planning systems model spatial relations in detail as these are needed for manipulation tasks. In contrast to this, other physical attributes of objects and the effect of devices are usually oversimplified and expressed by abstract compound attributes. This limits the ability of planners to find alternative solutions. We propose to break these compound attributes down into a shared set of…
▽ More
Robotic planning systems model spatial relations in detail as these are needed for manipulation tasks. In contrast to this, other physical attributes of objects and the effect of devices are usually oversimplified and expressed by abstract compound attributes. This limits the ability of planners to find alternative solutions. We propose to break these compound attributes down into a shared set of elementary attributes. This strongly facilitates generalization between different tasks and environments and thus helps to find innovative solutions. On the down-side, this generalization comes with an increased complexity of the solution space. Therefore, as the main contribution of the paper, we propose a method that splits the planning problem into a sequence of views, where in each view only an increasing subset of attributes is considered. We show that this view-based strategy offers a good compromise between planning speed and quality of the found plan, and discuss its general applicability and limitations.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
To Help or Not to Help: LLM-based Attentive Support for Human-Robot Group Interactions
Authors:
Daniel Tanneberg,
Felix Ocker,
Stephan Hasler,
Joerg Deigmoeller,
Anna Belardinelli,
Chao Wang,
Heiko Wersing,
Bernhard Sendhoff,
Michael Gienger
Abstract:
How can a robot provide unobtrusive physical support within a group of humans? We present Attentive Support, a novel interaction concept for robots to support a group of humans. It combines scene perception, dialogue acquisition, situation understanding, and behavior generation with the common-sense reasoning capabilities of Large Language Models (LLMs). In addition to following user instructions,…
▽ More
How can a robot provide unobtrusive physical support within a group of humans? We present Attentive Support, a novel interaction concept for robots to support a group of humans. It combines scene perception, dialogue acquisition, situation understanding, and behavior generation with the common-sense reasoning capabilities of Large Language Models (LLMs). In addition to following user instructions, Attentive Support is capable of deciding when and how to support the humans, and when to remain silent to not disturb the group. With a diverse set of scenarios, we show and evaluate the robot's attentive behavior, which supports and helps the humans when required, while not disturbing if no help is needed.
△ Less
Submitted 24 April, 2025; v1 submitted 19 March, 2024;
originally announced March 2024.
-
LaMI: Large Language Models for Multi-Modal Human-Robot Interaction
Authors:
Chao Wang,
Stephan Hasler,
Daniel Tanneberg,
Felix Ocker,
Frank Joublin,
Antonello Ceravola,
Joerg Deigmoeller,
Michael Gienger
Abstract:
This paper presents an innovative large language model (LLM)-based robotic system for enhancing multi-modal human-robot interaction (HRI). Traditional HRI systems relied on complex designs for intent estimation, reasoning, and behavior generation, which were resource-intensive. In contrast, our system empowers researchers and practitioners to regulate robot behavior through three key aspects: prov…
▽ More
This paper presents an innovative large language model (LLM)-based robotic system for enhancing multi-modal human-robot interaction (HRI). Traditional HRI systems relied on complex designs for intent estimation, reasoning, and behavior generation, which were resource-intensive. In contrast, our system empowers researchers and practitioners to regulate robot behavior through three key aspects: providing high-level linguistic guidance, creating "atomic actions" and expressions the robot can use, and offering a set of examples. Implemented on a physical robot, it demonstrates proficiency in adapting to multi-modal inputs and determining the appropriate manner of action to assist humans with its arms, following researchers' defined guidelines. Simultaneously, it coordinates the robot's lid, neck, and ear movements with speech output to produce dynamic, multi-modal expressions. This showcases the system's potential to revolutionize HRI by shifting from conventional, manual state-and-flow design methods to an intuitive, guidance-based, and example-driven approach. Supplementary material can be found at https://hri-eu.github.io/Lami/
△ Less
Submitted 11 April, 2024; v1 submitted 26 January, 2024;
originally announced January 2024.
-
CoPAL: Corrective Planning of Robot Actions with Large Language Models
Authors:
Frank Joublin,
Antonello Ceravola,
Pavel Smirnov,
Felix Ocker,
Joerg Deigmoeller,
Anna Belardinelli,
Chao Wang,
Stephan Hasler,
Daniel Tanneberg,
Michael Gienger
Abstract:
In the pursuit of fully autonomous robotic systems capable of taking over tasks traditionally performed by humans, the complexity of open-world environments poses a considerable challenge. Addressing this imperative, this study contributes to the field of Large Language Models (LLMs) applied to task and motion planning for robots. We propose a system architecture that orchestrates a seamless inter…
▽ More
In the pursuit of fully autonomous robotic systems capable of taking over tasks traditionally performed by humans, the complexity of open-world environments poses a considerable challenge. Addressing this imperative, this study contributes to the field of Large Language Models (LLMs) applied to task and motion planning for robots. We propose a system architecture that orchestrates a seamless interplay between multiple cognitive levels, encompassing reasoning, planning, and motion generation. At its core lies a novel replanning strategy that handles physically grounded, logical, and semantic errors in the generated plans. We demonstrate the efficacy of the proposed feedback architecture, particularly its impact on executability, correctness, and time complexity via empirical evaluation in the context of a simulation and two intricate real-world scenarios: blocks world, barman and pizza preparation.
△ Less
Submitted 24 April, 2025; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Communicating Robot's Intentions while Assisting Users via Augmented Reality
Authors:
Chao Wang,
Theodoros Stouraitis,
Anna Belardinelli,
Stephan Hasler,
Michael Gienger
Abstract:
This paper explores the challenges faced by assistive robots in effectively cooperating with humans, requiring them to anticipate human behavior, predict their actions' impact, and generate understandable robot actions. The study focuses on a use-case involving a user with limited mobility needing assistance with pouring a beverage, where tasks like unscrewing a cap or reaching for objects demand…
▽ More
This paper explores the challenges faced by assistive robots in effectively cooperating with humans, requiring them to anticipate human behavior, predict their actions' impact, and generate understandable robot actions. The study focuses on a use-case involving a user with limited mobility needing assistance with pouring a beverage, where tasks like unscrewing a cap or reaching for objects demand coordinated support from the robot. Yet, anticipating the robot's intentions can be challenging for the user, which can hinder effective collaboration. To address this issue, we propose an innovative solution that utilizes Augmented Reality (AR) to communicate the robot's intentions and expected movements to the user, fostering a seamless and intuitive interaction.
△ Less
Submitted 21 August, 2023;
originally announced August 2023.
-
FIDO2 the Rescue? Platform vs. Roaming Authentication on Smartphones
Authors:
Leon Würsching,
Florentin Putz,
Steffen Haesler,
Matthias Hollick
Abstract:
Modern smartphones support FIDO2 passwordless authentication using either external security keys or internal biometric authentication, but it is unclear whether users appreciate and accept these new forms of web authentication for their own accounts. We present the first lab study (N=87) comparing platform and roaming authentication on smartphones, determining the practical strengths and weaknesse…
▽ More
Modern smartphones support FIDO2 passwordless authentication using either external security keys or internal biometric authentication, but it is unclear whether users appreciate and accept these new forms of web authentication for their own accounts. We present the first lab study (N=87) comparing platform and roaming authentication on smartphones, determining the practical strengths and weaknesses of FIDO2 as perceived by users in a mobile scenario. Most participants were willing to adopt passwordless authentication during our in-person user study, but closer analysis shows that participants prioritize usability, security, and availability differently depending on the account type. We identify remaining adoption barriers that prevent FIDO2 from succeeding password authentication, such as missing support for contemporary usage patterns, including account delegation and usage on multiple clients.
△ Less
Submitted 15 February, 2023;
originally announced February 2023.
-
Explainable Human-Robot Training and Cooperation with Augmented Reality
Authors:
Chao Wang,
Anna Belardinelli,
Stephan Hasler,
Theodoros Stouraitis,
Daniel Tanneberg,
Michael Gienger
Abstract:
The current spread of social and assistive robotics applications is increasingly highlighting the need for robots that can be easily taught and interacted with, even by users with no technical background. Still, it is often difficult to grasp what such robots know or to assess if a correct representation of the task is being formed. Augmented Reality (AR) has the potential to bridge this gap. We d…
▽ More
The current spread of social and assistive robotics applications is increasingly highlighting the need for robots that can be easily taught and interacted with, even by users with no technical background. Still, it is often difficult to grasp what such robots know or to assess if a correct representation of the task is being formed. Augmented Reality (AR) has the potential to bridge this gap. We demonstrate three use cases where AR design elements enhance the explainability and efficiency of human-robot interaction: 1) a human teaching a robot some simple kitchen tasks by demonstration, 2) the robot showing its plan for solving novel tasks in AR to a human for validation, and 3) a robot communicating its intentions via AR while assisting people with limited mobility during daily activities.
△ Less
Submitted 2 February, 2023;
originally announced February 2023.
-
Improved Multi-Source Domain Adaptation by Preservation of Factors
Authors:
Sebastian Schrom,
Stephan Hasler,
Jürgen Adamy
Abstract:
Domain Adaptation (DA) is a highly relevant research topic when it comes to image classification with deep neural networks. Combining multiple source domains in a sophisticated way to optimize a classification model can improve the generalization to a target domain. Here, the difference in data distributions of source and target image datasets plays a major role. In this paper, we describe based o…
▽ More
Domain Adaptation (DA) is a highly relevant research topic when it comes to image classification with deep neural networks. Combining multiple source domains in a sophisticated way to optimize a classification model can improve the generalization to a target domain. Here, the difference in data distributions of source and target image datasets plays a major role. In this paper, we describe based on a theory of visual factors how real-world scenes appear in images in general and how recent DA datasets are composed of such. We show that different domains can be described by a set of so called domain factors, whose values are consistent within a domain, but can change across domains. Many DA approaches try to remove all domain factors from the feature representation to be domain invariant. In this paper we show that this can lead to negative transfer since task-informative factors can get lost as well. To address this, we propose Factor-Preserving DA (FP-DA), a method to train a deep adversarial unsupervised DA model, which is able to preserve specific task relevant factors in a multi-domain scenario. We demonstrate on CORe50, a dataset with many domains, how such factors can be identified by standard one-to-one transfer experiments between single domains combined with PCA. By applying FP-DA, we show that the highest average and minimum performance can be achieved.
△ Less
Submitted 16 October, 2020; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Designing Interaction for Multi-agent Cooperative System in an Office Environment
Authors:
Chao Wang,
Stephan Hasler,
Manuel Muehlig,
Frank Joublin,
Antonello Ceravola,
Joerg Deigmoeller,
Lydia Fischer
Abstract:
Future intelligent system will involve very various types of artificial agents, such as mobile robots, smart home infrastructure or personal devices, which share data and collaborate with each other to execute certain tasks.Designing an efficient human-machine interface, which can support users to express needs to the system, supervise the collaboration progress of different entities and evaluate…
▽ More
Future intelligent system will involve very various types of artificial agents, such as mobile robots, smart home infrastructure or personal devices, which share data and collaborate with each other to execute certain tasks.Designing an efficient human-machine interface, which can support users to express needs to the system, supervise the collaboration progress of different entities and evaluate the result, will be challengeable. This paper presents the design and implementation of the human-machine interface of Intelligent Cyber-Physical system (ICPS),which is a multi-entity coordination system of robots and other smart devices in a working environment. ICPS gathers sensory data from entities and then receives users' command, then optimizes plans to utilize the capability of different entities to serve people. Using multi-model interaction methods, e.g. graphical interfaces, speech interaction, gestures and facial expressions, ICPS is able to receive inputs from users through different entities, keep users aware of the progress and accomplish the task efficiently
△ Less
Submitted 15 February, 2020;
originally announced February 2020.