-
Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
Authors:
Calarina Muslimani,
Kerrick Johnstonbaugh,
Suyog Chandramouli,
Serena Booth,
W. Bradley Knox,
Matthew E. Taylor
Abstract:
Reinforcement learning agents are fundamentally limited by the quality of the reward functions they learn from, yet reward design is often overlooked under the assumption that a well-defined reward is readily available. However, in practice, designing rewards is difficult, and even when specified, evaluating their correctness is equally problematic: how do we know if a reward function is correctly specified? In our work, we address these challenges by focusing on reward alignment: assessing whether a reward function accurately encodes the preferences of a human stakeholder. As a concrete measure of reward alignment, we introduce the Trajectory Alignment Coefficient to quantify the similarity between a human stakeholder's ranking of trajectory distributions and the ranking induced by a given reward function. We show that the Trajectory Alignment Coefficient exhibits desirable properties, such as not requiring access to a ground truth reward, invariance to potential-based reward shaping, and applicability to online RL. Additionally, in an 11-person user study of RL practitioners, we found that access to the Trajectory Alignment Coefficient during reward selection led to statistically significant improvements. Compared to relying only on reward functions, our metric reduced cognitive workload by 1.5x, was preferred by 82% of users, and increased the success rate of selecting reward functions that produced performant policies by 41%.
Submitted 7 March, 2025;
originally announced March 2025.
-
Automating the Practice of Science -- Opportunities, Challenges, and Implications
Authors:
Sebastian Musslick,
Laura K. Bartlett,
Suyog H. Chandramouli,
Marina Dubova,
Fernand Gobet,
Thomas L. Griffiths,
Jessica Hullman,
Ross D. King,
J. Nathan Kutz,
Christopher G. Lucas,
Suhas Mahesh,
Franco Pestilli,
Sabina J. Sloman,
William R. Holmes
Abstract:
Automation has transformed various aspects of human civilization, revolutionizing industries and streamlining processes. In the domain of scientific inquiry, automated approaches have emerged as powerful tools, holding promise for accelerating discovery, enhancing reproducibility, and overcoming the traditional impediments to scientific progress. This article evaluates the scope of automation within scientific practice and assesses recent approaches. Furthermore, it discusses different perspectives on the following questions: Where do the greatest opportunities lie for automation in scientific practice? What are the current bottlenecks of automating scientific practice? What are significant ethical and practical consequences of automating scientific practice? By discussing the motivations behind automated science, analyzing the hurdles encountered, and examining its implications, this article invites researchers, policymakers, and stakeholders to navigate the rapidly evolving frontier of automated scientific practice.
Submitted 27 August, 2024;
originally announced September 2024.
-
Online simulator-based experimental design for cognitive model selection
Authors:
Alexander Aushev,
Aini Putkonen,
Gregoire Clarte,
Suyog Chandramouli,
Luigi Acerbi,
Samuel Kaski,
Andrew Howes
Abstract:
The problem of model selection with a limited number of experimental trials has received considerable attention in cognitive science, where the role of experiments is to discriminate between theories expressed as computational models. Research on this subject has mostly been restricted to optimal experiment design with analytically tractable models. However, cognitive models of increasing complexity, with intractable likelihoods, are becoming more commonplace. In this paper, we propose BOSMOS: an approach to experimental design that can select between computational models without tractable likelihoods. It does so in a data-efficient manner, by sequentially and adaptively generating informative experiments. In contrast to previous approaches, we introduce a novel simulator-based utility objective for design selection, and a new approximation of the model likelihood for model selection. In simulated experiments, we demonstrate that the proposed BOSMOS technique can accurately select models in up to 2 orders of magnitude less time than existing likelihood-free inference (LFI) alternatives for three cognitive science tasks: memory retention, sequential signal detection, and risky choice.
Submitted 3 March, 2023;
originally announced March 2023.
-
Interactive Causal Structure Discovery in Earth System Sciences
Authors:
Laila Melkas,
Rafael Savvides,
Suyog Chandramouli,
Jarmo Mäkelä,
Tuomo Nieminen,
Ivan Mammarella,
Kai Puolamäki
Abstract:
Causal structure discovery (CSD) models are making inroads into several domains, including Earth system sciences. Their widespread adoption is, however, hampered by the fact that the resulting models often do not take into account the domain knowledge of the experts and that it is often necessary to modify the resulting models iteratively. We present a workflow that is required to take this knowledge into account and to apply CSD algorithms in Earth system sciences. At the same time, we describe open research questions that still need to be addressed. We present a way to interactively modify the outputs of the CSD algorithms and argue that the user interaction can be modelled as a greedy finding of the local maximum-a-posteriori solution of the likelihood function, which is composed of the likelihood of the causal model and the prior distribution representing the knowledge of the expert user. We use a real-world data set for examples constructed in collaboration with our co-authors, who are the domain area experts. We show that finding maximally usable causal models in the Earth system sciences or other similar domains is a difficult task that contains many interesting open research questions. We argue that taking the domain knowledge into account has a substantial effect on the final causal models discovered.
Submitted 1 July, 2021;
originally announced July 2021.
-
A note on the rationing of divisible and indivisible goods in a general network
Authors:
Shyam Chandramouli,
Jay Sethuraman
Abstract:
The study of matching theory has gained importance recently with applications in Kidney Exchange, House Allocation, School Choice, etc. The general theme of these problems is to allocate goods in a fair manner amongst participating agents. The agents generally have a unit supply/demand of a good that they want to exchange with other agents. On the other hand, Bochet et al. study a more general version of the problem where they allow agents to have an arbitrary number of divisible goods to be rationed to other agents in the network. In the current work, our main focus is on non-bipartite networks where agents have arbitrary units of a homogeneous indivisible good that they want to exchange with their neighbors. Our aim is to develop mechanisms that would identify a fair and strategyproof allocation for the agents in the network. Thus, we generalize the kidney exchange problem to that of a network with arbitrary capacity of available goods. Our main idea is that this problem and a couple of other related versions of the non-bipartite fair allocation problem can be suitably transformed to one of fair allocations on bipartite networks, for which well-studied fair allocation mechanisms are known.
Submitted 17 January, 2020;
originally announced February 2020.
-
Strategyproof and Consistent Rules for Bipartite Flow Problems
Authors:
Shyam S Chandramouli,
Jay Sethuraman
Abstract:
We continue the study of Bochet et al. and Moulin and Sethuraman on fair allocation in bipartite networks. In these models, there is a moneyless market, in which a non-storable, homogeneous commodity is reallocated between agents with single-peaked preferences. Agents are either suppliers or demanders. While the egalitarian rule of Bochet et al. satisfies Pareto optimality, no-envy, and strategyproofness, it is not consistent. On the other hand, the work of Moulin and Sethuraman is related to consistent allocations and rules that are extensions of the uniform rule. We bridge the two streams of work by introducing the edge fair mechanism, which is both consistent and groupstrategyproof. On the way, we explore the "price of consistency", i.e., how the notion of consistency is fundamentally incompatible with certain notions of fairness like Lorenz Dominance and No-Envy. The current work also introduces the idea of strong invariance as a desideratum for groupstrategyproofness and generalizes the proof of Chandramouli and Sethuraman to a broader class of mechanisms. Finally, we conclude with the study of the edge fair mechanism in a transshipment model where the strategic agents are on the links connecting different supply/demand locations.
Submitted 16 April, 2013; v1 submitted 14 April, 2013;
originally announced April 2013.
-
Groupstrategyproofness of the Egalitarian Mechanism for Constrained Rationing Problems
Authors:
Shyam S Chandramouli,
Jay Sethuraman
Abstract:
The key contribution of the paper is a comprehensive study of the egalitarian mechanism with respect to manipulation by a coalition of agents. Our main result is that the egalitarian mechanism is, in fact, peak group strategyproof: no coalition of agents can (weakly) benefit from jointly misreporting their peaks. Furthermore, we show that the egalitarian mechanism cannot be manipulated by any coalition of suppliers (or any coalition of demanders) in the model where both the suppliers and demanders are agents. Our proofs shed light on the structure of the two models and simplify some of the proofs of strategyproofness in earlier papers. An implication of our results is that the well-known algorithm of Megiddo to compute a lexicographically optimal flow in a network is group strategyproof with respect to the source capacities (or sink capacities).
Submitted 22 July, 2011;
originally announced July 2011.