-
Feasible Action Space Reduction for Quantifying Causal Responsibility in Continuous Spatial Interactions
Authors:
Ashwin George,
Luciano Cavalcante Siebert,
David A. Abbink,
Arkady Zgonnikov
Abstract:
Understanding the causal influence of one agent on another agent is crucial for safely deploying artificially intelligent systems such as automated vehicles and mobile robots into human-inhabited environments. Existing models of causal responsibility deal with simplified abstractions of scenarios with discrete actions, thus, limiting real-world use when understanding responsibility in spatial inte…
▽ More
Understanding the causal influence of one agent on another agent is crucial for safely deploying artificially intelligent systems such as automated vehicles and mobile robots into human-inhabited environments. Existing models of causal responsibility deal with simplified abstractions of scenarios with discrete actions, thus, limiting real-world use when understanding responsibility in spatial interactions. Based on the assumption that spatially interacting agents are embedded in a scene and must follow an action at each instant, Feasible Action-Space Reduction (FeAR) was proposed as a metric for causal responsibility in a grid-world setting with discrete actions. Since real-world interactions involve continuous action spaces, this paper proposes a formulation of the FeAR metric for measuring causal responsibility in space-continuous interactions. We illustrate the utility of the metric in prototypical space-sharing conflicts, and showcase its applications for analysing backward-looking responsibility and in estimating forward-looking responsibility to guide agent decision making. Our results highlight the potential of the FeAR metric for designing and engineering artificial agents, as well as for assessing the responsibility of agents around humans.
△ Less
Submitted 23 May, 2025;
originally announced May 2025.
-
ARMCHAIR: integrated inverse reinforcement learning and model predictive control for human-robot collaboration
Authors:
Angelo Caregnato-Neto,
Luciano Cavalcante Siebert,
Arkady Zgonnikov,
Marcos Ricardo Omena de Albuquerque Maximo,
Rubens Junqueira Magalhães Afonso
Abstract:
One of the key issues in human-robot collaboration is the development of computational models that allow robots to predict and adapt to human behavior. Much progress has been achieved in developing such models, as well as control techniques that address the autonomy problems of motion planning and decision-making in robotics. However, the integration of computational models of human behavior with…
▽ More
One of the key issues in human-robot collaboration is the development of computational models that allow robots to predict and adapt to human behavior. Much progress has been achieved in developing such models, as well as control techniques that address the autonomy problems of motion planning and decision-making in robotics. However, the integration of computational models of human behavior with such control techniques still poses a major challenge, resulting in a bottleneck for efficient collaborative human-robot teams. In this context, we present a novel architecture for human-robot collaboration: Adaptive Robot Motion for Collaboration with Humans using Adversarial Inverse Reinforcement learning (ARMCHAIR). Our solution leverages adversarial inverse reinforcement learning and model predictive control to compute optimal trajectories and decisions for a mobile multi-robot system that collaborates with a human in an exploration task. During the mission, ARMCHAIR operates without human intervention, autonomously identifying the necessity to support and acting accordingly. Our approach also explicitly addresses the network connectivity requirement of the human-robot team. Extensive simulation-based evaluations demonstrate that ARMCHAIR allows a group of robots to safely support a simulated human in an exploration scenario, preventing collisions and network disconnections, and improving the overall performance of the task.
△ Less
Submitted 29 February, 2024;
originally announced February 2024.
-
Value Preferences Estimation and Disambiguation in Hybrid Participatory Systems
Authors:
Enrico Liscio,
Luciano C. Siebert,
Catholijn M. Jonker,
Pradeep K. Murukannaiah
Abstract:
Understanding citizens' values in participatory systems is crucial for citizen-centric policy-making. We envision a hybrid participatory system where participants make choices and provide motivations for those choices, and AI agents estimate their value preferences by interacting with them. We focus on situations where a conflict is detected between participants' choices and motivations, and propo…
▽ More
Understanding citizens' values in participatory systems is crucial for citizen-centric policy-making. We envision a hybrid participatory system where participants make choices and provide motivations for those choices, and AI agents estimate their value preferences by interacting with them. We focus on situations where a conflict is detected between participants' choices and motivations, and propose methods for estimating value preferences while addressing detected inconsistencies by interacting with the participants. We operationalize the philosophical stance that "valuing is deliberatively consequential." That is, if a participant's choice is based on a deliberation of value preferences, the value preferences can be observed in the motivation the participant provides for the choice. Thus, we propose and compare value preferences estimation methods that prioritize the values estimated from motivations over the values estimated from choices alone. Then, we introduce a disambiguation strategy that combines Natural Language Processing and Active Learning to address the detected inconsistencies between choices and motivations. We evaluate the proposed methods on a dataset of a large-scale survey on energy transition. The results show that explicitly addressing inconsistencies between choices and motivations improves the estimation of an individual's value preferences. The disambiguation strategy does not show substantial improvements when compared to similar baselines--however, we discuss how the novelty of the approach can open new research avenues and propose improvements to address the current limitations.
△ Less
Submitted 11 February, 2025; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Explaining Learned Reward Functions with Counterfactual Trajectories
Authors:
Jan Wehner,
Frans Oliehoek,
Luciano Cavalcante Siebert
Abstract:
Learning rewards from human behaviour or feedback is a promising approach to aligning AI systems with human values but fails to consistently extract correct reward functions. Interpretability tools could enable users to understand and evaluate possible flaws in learned reward functions. We propose Counterfactual Trajectory Explanations (CTEs) to interpret reward functions in reinforcement learning…
▽ More
Learning rewards from human behaviour or feedback is a promising approach to aligning AI systems with human values but fails to consistently extract correct reward functions. Interpretability tools could enable users to understand and evaluate possible flaws in learned reward functions. We propose Counterfactual Trajectory Explanations (CTEs) to interpret reward functions in reinforcement learning by contrasting an original with a counterfactual partial trajectory and the rewards they each receive. We derive six quality criteria for CTEs and propose a novel Monte-Carlo-based algorithm for generating CTEs that optimises these quality criteria. Finally, we measure how informative the generated explanations are to a proxy-human model by training it on CTEs. CTEs are demonstrably informative for the proxy-human model, increasing the similarity between its predictions and the reward function on unseen trajectories. Further, it learns to accurately judge differences in rewards between trajectories and generalises to out-of-distribution examples. Although CTEs do not lead to a perfect understanding of the reward, our method, and more generally the adaptation of XAI methods, are presented as a fruitful approach for interpreting learned reward functions.
△ Less
Submitted 15 October, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Reflective Hybrid Intelligence for Meaningful Human Control in Decision-Support Systems
Authors:
Catholijn M. Jonker,
Luciano Cavalcante Siebert,
Pradeep K. Murukannaiah
Abstract:
With the growing capabilities and pervasiveness of AI systems, societies must collectively choose between reduced human autonomy, endangered democracies and limited human rights, and AI that is aligned to human and social values, nurturing collaboration, resilience, knowledge and ethical behaviour. In this chapter, we introduce the notion of self-reflective AI systems for meaningful human control…
▽ More
With the growing capabilities and pervasiveness of AI systems, societies must collectively choose between reduced human autonomy, endangered democracies and limited human rights, and AI that is aligned to human and social values, nurturing collaboration, resilience, knowledge and ethical behaviour. In this chapter, we introduce the notion of self-reflective AI systems for meaningful human control over AI systems. Focusing on decision support systems, we propose a framework that integrates knowledge from psychology and philosophy with formal reasoning methods and machine learning approaches to create AI systems responsive to human values and social norms. We also propose a possible research approach to design and develop self-reflective capability in AI systems. Finally, we argue that self-reflective AI systems can lead to self-reflective hybrid systems (human + AI), thus increasing meaningful human control and empowering human moral reasoning by providing comprehensible information and insights on possible human moral blind spots.
△ Less
Submitted 12 July, 2023;
originally announced July 2023.
-
Feasible Action-Space Reduction as a Metric of Causal Responsibility in Multi-Agent Spatial Interactions
Authors:
Ashwin George,
Luciano Cavalcante Siebert,
David Abbink,
Arkady Zgonnikov
Abstract:
Modelling causal responsibility in multi-agent spatial interactions is crucial for safety and efficiency of interactions of humans with autonomous agents. However, current formal metrics and models of responsibility either lack grounding in ethical and philosophical concepts of responsibility, or cannot be applied to spatial interactions. In this work we propose a metric of causal responsibility w…
▽ More
Modelling causal responsibility in multi-agent spatial interactions is crucial for safety and efficiency of interactions of humans with autonomous agents. However, current formal metrics and models of responsibility either lack grounding in ethical and philosophical concepts of responsibility, or cannot be applied to spatial interactions. In this work we propose a metric of causal responsibility which is tailored to multi-agent spatial interactions, for instance interactions in traffic. In such interactions, a given agent can, by reducing another agent's feasible action space, influence the latter. Therefore, we propose feasible action space reduction (FeAR) as a metric of causal responsibility among agents. Specifically, we look at ex-post causal responsibility for simultaneous actions. We propose the use of Moves de Rigueur (MdR) - a consistent set of prescribed actions for agents - to model the effect of norms on responsibility allocation. We apply the metric in a grid world simulation for spatial interactions and show how the actions, contexts, and norms affect the causal responsibility ascribed to agents. Finally, we demonstrate the application of this metric in complex multi-agent interactions. We argue that the FeAR metric is a step towards an interdisciplinary framework for quantifying responsibility that is needed to ensure safety and meaningful human control in human-AI systems.
△ Less
Submitted 13 November, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
MARL-iDR: Multi-Agent Reinforcement Learning for Incentive-based Residential Demand Response
Authors:
Jasper van Tilburg,
Luciano C. Siebert,
Jochen L. Cremer
Abstract:
This paper presents a decentralized Multi-Agent Reinforcement Learning (MARL) approach to an incentive-based Demand Response (DR) program, which aims to maintain the capacity limits of the electricity grid and prevent grid congestion by financially incentivizing residential consumers to reduce their energy consumption. The proposed approach addresses the key challenge of coordinating heterogeneous…
▽ More
This paper presents a decentralized Multi-Agent Reinforcement Learning (MARL) approach to an incentive-based Demand Response (DR) program, which aims to maintain the capacity limits of the electricity grid and prevent grid congestion by financially incentivizing residential consumers to reduce their energy consumption. The proposed approach addresses the key challenge of coordinating heterogeneous preferences and requirements from multiple participants while preserving their privacy and minimizing financial costs for the aggregator. The participant agents use a novel Disjunctively Constrained Knapsack Problem optimization to curtail or shift the requested household appliances based on the selected demand reduction. Through case studies with electricity data from $25$ households, the proposed approach effectively reduced energy consumption's Peak-to-Average ratio (PAR) by $14.48$% compared to the original PAR while fully preserving participant privacy. This approach has the potential to significantly improve the efficiency and reliability of the electricity grid, making it an important contribution to the management of renewable energy resources and the growing electricity demand.
△ Less
Submitted 8 April, 2023;
originally announced April 2023.
-
MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning
Authors:
Markus Peschl,
Arkady Zgonnikov,
Frans A. Oliehoek,
Luciano C. Siebert
Abstract:
Inferring reward functions from demonstrations and pairwise preferences are auspicious approaches for aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the art methods typically focus on learning a single reward model, thus rendering it difficult to trade off different reward functions from multiple experts. We propose Multi-Objective Reinforced Active Learning (…
▽ More
Inferring reward functions from demonstrations and pairwise preferences are auspicious approaches for aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the art methods typically focus on learning a single reward model, thus rendering it difficult to trade off different reward functions from multiple experts. We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms into a Pareto-optimal policy. Through maintaining a distribution over scalarization weights, our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need for computing multiple policies. We empirically demonstrate the effectiveness of MORAL in two scenarios, which model a delivery and an emergency task that require an agent to act in the presence of normative conflicts. Overall, we consider our research a step towards multi-objective RL with learned rewards, bridging the gap between current reward learning and machine ethics literature.
△ Less
Submitted 30 December, 2021;
originally announced January 2022.
-
Meaningful human control: actionable properties for AI system development
Authors:
Luciano Cavalcante Siebert,
Maria Luce Lupetti,
Evgeni Aizenberg,
Niek Beckers,
Arkady Zgonnikov,
Herman Veluwenkamp,
David Abbink,
Elisa Giaccardi,
Geert-Jan Houben,
Catholijn M. Jonker,
Jeroen van den Hoven,
Deborah Forster,
Reginald L. Lagendijk
Abstract:
How can humans remain in control of artificial intelligence (AI)-based systems designed to perform tasks autonomously? Such systems are increasingly ubiquitous, creating benefits - but also undesirable situations where moral responsibility for their actions cannot be properly attributed to any particular person or group. The concept of meaningful human control has been proposed to address responsi…
▽ More
How can humans remain in control of artificial intelligence (AI)-based systems designed to perform tasks autonomously? Such systems are increasingly ubiquitous, creating benefits - but also undesirable situations where moral responsibility for their actions cannot be properly attributed to any particular person or group. The concept of meaningful human control has been proposed to address responsibility gaps and mitigate them by establishing conditions that enable a proper attribution of responsibility for humans; however, clear requirements for researchers, designers, and engineers are yet inexistent, making the development of AI-based systems that remain under meaningful human control challenging. In this paper, we address the gap between philosophical theory and engineering practice by identifying, through an iterative process of abductive thinking, four actionable properties for AI-based systems under meaningful human control, which we discuss making use of two applications scenarios: automated vehicles and AI-based hiring. First, a system in which humans and AI algorithms interact should have an explicitly defined domain of morally loaded situations within which the system ought to operate. Second, humans and AI agents within the system should have appropriate and mutually compatible representations. Third, responsibility attributed to a human should be commensurate with that human's ability and authority to control the system. Fourth, there should be explicit links between the actions of the AI agents and actions of humans who are aware of their moral responsibility. We argue that these four properties will support practically-minded professionals to take concrete steps toward designing and engineering for AI systems that facilitate meaningful human control.
△ Less
Submitted 19 May, 2022; v1 submitted 25 November, 2021;
originally announced December 2021.
-
Improving Confidence in the Estimation of Values and Norms
Authors:
Luciano Cavalcante Siebert,
Rijk Mercuur,
Virginia Dignum,
Jeroen van den Hoven,
Catholijn Jonker
Abstract:
Autonomous agents (AA) will increasingly be interacting with us in our daily lives. While we want the benefits attached to AAs, it is essential that their behavior is aligned with our values and norms. Hence, an AA will need to estimate the values and norms of the humans it interacts with, which is not a straightforward task when solely observing an agent's behavior. This paper analyses to what ex…
▽ More
Autonomous agents (AA) will increasingly be interacting with us in our daily lives. While we want the benefits attached to AAs, it is essential that their behavior is aligned with our values and norms. Hence, an AA will need to estimate the values and norms of the humans it interacts with, which is not a straightforward task when solely observing an agent's behavior. This paper analyses to what extent an AA is able to estimate the values and norms of a simulated human agent (SHA) based on its actions in the ultimatum game. We present two methods to reduce ambiguity in profiling the SHAs: one based on search space exploration and another based on counterfactual analysis. We found that both methods are able to increase the confidence in estimating human values and norms, but differ in their applicability, the latter being more efficient when the number of interactions with the agent is to be minimized. These insights are useful to improve the alignment of AAs with human values and norms.
△ Less
Submitted 2 April, 2020;
originally announced April 2020.