
How does Your RL Agent Explore? An Optimal Transport Analysis of Occupancy Measure Trajectories

Authors: Reabetswe M. Nkhumise, Debabrota Basu, Tony J. Prescott, Aditya Gilra

Abstract: The rising successes of RL are propelled by combining smart algorithmic strategies and deep architectures to optimize the distribution of returns and visitations over the state-action space. A quantitative framework to compare the learning processes of these eclectic RL algorithms is currently absent but desired in practice. We address this gap by representing the learning process of an RL algorithm as a sequence of policies generated during training, and then studying the policy trajectory induced in the manifold of state-action occupancy measures. Using an optimal transport-based metric, we measure the length of the paths induced by the policy sequence yielded by an RL algorithm between an initial policy and a final optimal policy. Hence, we first define the 'Effort of Sequential Learning' (ESL). ESL quantifies the relative distance that an RL algorithm travels compared to the shortest path from the initial to the optimal policy. Further, we connect the dynamics of policies in the occupancy measure space and regret (another metric to understand the suboptimality of an RL algorithm), by defining the 'Optimal Movement Ratio' (OMR). OMR assesses the fraction of movements in the occupancy measure space that effectively reduce an analogue of regret. Finally, we derive approximation guarantees to estimate ESL and OMR with a finite number of samples and without access to an optimal policy. Through empirical analyses across various environments and algorithms, we demonstrate that ESL and OMR provide insights into the exploration processes of RL algorithms and hardness of different tasks in discrete and continuous MDPs.

Submitted 16 October, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2102.11914 [pdf, other]

A Robotic Model of Hippocampal Reverse Replay for Reinforcement Learning

Authors: Matthew T. Whelan, Tony J. Prescott, Eleni Vasilaki

Abstract: Hippocampal reverse replay is thought to contribute to learning, and particularly reinforcement learning, in animals. We present a computational model of learning in the hippocampus that builds on a previous model of the hippocampal-striatal network viewed as implementing a three-factor reinforcement learning rule. To augment this model with hippocampal reverse replay, a novel policy gradient learning rule is derived that associates place cell activity with responses in cells representing actions. This new model is evaluated using a simulated robot spatial navigation task inspired by the Morris water maze. Results show that reverse replay can accelerate learning from reinforcement, whilst improving stability and robustness over multiple trials. Consistent with the neurobiological data, our study implies that reverse replay can make a significant positive contribution to reinforcement learning, although learning that is less efficient and less stable is possible in its absence. We conclude that reverse replay may enhance reinforcement learning in the mammalian hippocampal-striatal system rather than provide its core mechanism.

Submitted 23 February, 2021; originally announced February 2021.

Comments: 39 pages, 6 figures, 2 tables, journal, submitted to Bioinspiration and Biomimetics

arXiv:1911.03446 [pdf, other]

doi 10.1038/s41467-021-20901-5

Scaling advantage in quantum simulation of geometrically frustrated magnets

Authors: Andrew D. King, Jack Raymond, Trevor Lanting, Sergei V. Isakov, Masoud Mohseni, Gabriel Poulin-Lamarre, Sara Ejtemaee, William Bernoudy, Isil Ozfidan, Anatoly Yu. Smirnov, Mauricio Reis, Fabio Altomare, Michael Babcock, Catia Baron, Andrew J. Berkley, Kelly Boothby, Paul I. Bunyk, Holly Christiani, Colin Enderud, Bram Evert, Richard Harris, Emile Hoskinson, Shuiyuan Huang, Kais Jooya, Ali Khodabandelou, et al. (29 additional authors not shown)

Abstract: The promise of quantum computing lies in harnessing programmable quantum devices for practical applications such as efficient simulation of quantum materials and condensed matter systems. One important task is the simulation of geometrically frustrated magnets in which topological phenomena can emerge from competition between quantum and thermal fluctuations. Here we report on experimental observations of relaxation in such simulations, measured on up to 1440 qubits with microsecond resolution. By initializing the system in a state with topological obstruction, we observe quantum annealing (QA) relaxation timescales in excess of one microsecond. Measurements indicate a dynamical advantage in the quantum simulation over the classical approach of path-integral Monte Carlo (PIMC) fixed-Hamiltonian relaxation with multiqubit cluster updates. The advantage increases with both system size and inverse temperature, exceeding a million-fold speedup over a CPU. This is an important piece of experimental evidence that in general, PIMC does not mimic QA dynamics for stoquastic Hamiltonians. The observed scaling advantage, for simulation of frustrated magnetism in quantum condensed matter, demonstrates that near-term quantum devices can be used to accelerate computational tasks of practical relevance.

Submitted 8 November, 2019; originally announced November 2019.

Comments: 7 pages, 4 figures, 22 pages of supplemental material with 18 figures

arXiv:1706.03661 [pdf, other]

doi 10.1109/TCDS.2017.2754143

DAC-h3: A Proactive Robot Cognitive Architecture to Acquire and Express Knowledge About the World and the Self

Authors: Clément Moulin-Frier, Tobias Fischer, Maxime Petit, Grégoire Pointeau, Jordi-Ysard Puigbo, Ugo Pattacini, Sock Ching Low, Daniel Camilleri, Phuong Nguyen, Matej Hoffmann, Hyung Jin Chang, Martina Zambelli, Anne-Laure Mealier, Andreas Damianou, Giorgio Metta, Tony J. Prescott, Yiannis Demiris, Peter Ford Dominey, Paul F. M. J. Verschure

Abstract: This paper introduces a cognitive architecture for a humanoid robot to engage in a proactive, mixed-initiative exploration and manipulation of its environment, where the initiative can originate from both the human and the robot. The framework, based on a biologically-grounded theory of the brain and mind, integrates a reactive interaction engine, a number of state-of-the-art perceptual and motor learning algorithms, as well as planning abilities and an autobiographical memory. The architecture as a whole drives the robot behavior to solve the symbol grounding problem, acquire language capabilities, execute goal-oriented behavior, and express a verbal narrative of its own experience in the world. We validate our approach in human-robot interaction experiments with the iCub humanoid robot, showing that the proposed cognitive architecture can be applied in real time within a realistic scenario and that it can be used with naive users.

Submitted 18 September, 2017; v1 submitted 12 June, 2017; originally announced June 2017.

Comments: Preprint version; final version available at http://ieeexplore.ieee.org/ IEEE Transactions on Cognitive and Developmental Systems (Accepted) DOI: 10.1109/TCDS.2017.2754143

Journal ref: IEEE Transactions on Cognitive and Developmental Systems 10 (4), 1005-1022, 2018

arXiv:1611.02695 [pdf, other]

Automatic recognition of child speech for robotic applications in noisy environments

Authors: Samuel Fernando, Roger K. Moore, David Cameron, Emily C. Collins, Abigail Millings, Amanda J. Sharkey, Tony J. Prescott

Abstract: Automatic speech recognition (ASR) allows a natural and intuitive interface for robotic educational applications for children. However, there are a number of challenges to overcome to allow such an interface to operate robustly in realistic settings, including the intrinsic difficulties of recognising child speech and high levels of background noise often present in classrooms. As part of the EU EASEL project we have provided several contributions to address these challenges, implementing our own ASR module for use in robotics applications. We used the latest deep neural network algorithms, which provide a leap in performance over the traditional GMM approach, and applied data augmentation methods to improve robustness to noise and speaker variation. We provide a close integration between the ASR module and the rest of the dialogue system, allowing the ASR to receive in real-time the language models relevant to the current section of the dialogue, greatly improving the accuracy. We integrated our ASR module into an interactive, multimodal system using a small humanoid robot to help children learn about exercise and energy. The system was installed at a public museum event as part of a research study where 320 children (aged 3 to 14) interacted with the robot, with our ASR achieving 90% accuracy for fluent and near-fluent speech.

Submitted 8 November, 2016; originally announced November 2016.

Comments: Submission to Computer Speech and Language, special issue on Interaction Technologies for Children

arXiv:1606.06104 [pdf, other]

Impact of robot responsiveness and adult involvement on children's social behaviours in human-robot interaction

Authors: David Cameron, Samuel Fernando, Emily Collins, Abigail Millings, Roger Moore, Amanda Sharkey, Tony Prescott

Abstract: A key challenge in developing engaging social robots is creating convincing, autonomous and responsive agents, which users perceive, and treat, as social beings. As a part of the collaborative project: Expressive Agents for Symbiotic Education and Learning (EASEL), this study examines the impact of autonomous response to children's speech, by the humanoid robot Zeno, on their interactions with it as a social entity. Results indicate that robot autonomy and adult assistance during HRI can substantially influence children's behaviour during interaction and their affect after. Children working with a fully-autonomous, responsive robot demonstrated greater physical activity following robot instruction than those working with a less responsive robot, which required adult assistance to interact with. During dialogue with the robot, children working with the fully-autonomous robot also looked towards the robot in anticipation of its vocalisations on more occasions. In contrast, a less responsive robot, requiring adult assistance to interact with, led to greater self-report positive affect and more occasions of children looking to the robot in response to its vocalisations. We discuss the broader implications of these findings in terms of anthropomorphism of social robots and in relation to the overall project strategy to further the understanding of how interactions with social robots could lead to task-appropriate symbiotic relationships.

Submitted 20 June, 2016; originally announced June 2016.

Comments: 5th International Symposium on New Frontiers in Human-Robot Interaction 2016 (arXiv:1602.05456)

Report number: AISB-NFHRI/2016/07

Showing 1–6 of 6 results for author: Prescott, T