Search | arXiv e-print repository

The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning

Authors: Sheila Schoepp, Masoud Jafaripour, Yingyue Cao, Tianpei Yang, Fatemeh Abdollahi, Shadan Golestan, Zahin Sufiyan, Osmar R. Zaiane, Matthew E. Taylor

Abstract: Reinforcement learning (RL) has shown impressive results in sequential decision-making tasks. Meanwhile, Large Language Models (LLMs) and Vision-Language Models (VLMs) have emerged, exhibiting impressive capabilities in multimodal understanding and reasoning. These advances have led to a surge of research integrating LLMs and VLMs into RL. In this survey, we review representative works in which LL… ▽ More Reinforcement learning (RL) has shown impressive results in sequential decision-making tasks. Meanwhile, Large Language Models (LLMs) and Vision-Language Models (VLMs) have emerged, exhibiting impressive capabilities in multimodal understanding and reasoning. These advances have led to a surge of research integrating LLMs and VLMs into RL. In this survey, we review representative works in which LLMs and VLMs are used to overcome key challenges in RL, such as lack of prior knowledge, long-horizon planning, and reward design. We present a taxonomy that categorizes these LLM/VLM-assisted RL approaches into three roles: agent, planner, and reward. We conclude by exploring open problems, including grounding, bias mitigation, improved representations, and action advice. By consolidating existing research and identifying future directions, this survey establishes a framework for integrating LLMs and VLMs into RL, advancing approaches that unify natural language and visual understanding with sequential decision-making. △ Less

Submitted 21 February, 2025; originally announced February 2025.

Comments: 9 pages, 4 figures

arXiv:2501.14271 [pdf, other]

TLXML: Task-Level Explanation of Meta-Learning via Influence Functions

Authors: Yoshihiro Mitsuka, Shadan Golestan, Zahin Sufiyan, Sheila Schoepp, Shotaro Miwa, Osmar R. Zaiane

Abstract: The scheme of adaptation via meta-learning is seen as an ingredient for solving the problem of data shortage or distribution shift in real-world applications, but it also brings the new risk of inappropriate updates of the model in the user environment, which increases the demand for explainability. Among the various types of XAI methods, establishing a method of explanation based on past experien… ▽ More The scheme of adaptation via meta-learning is seen as an ingredient for solving the problem of data shortage or distribution shift in real-world applications, but it also brings the new risk of inappropriate updates of the model in the user environment, which increases the demand for explainability. Among the various types of XAI methods, establishing a method of explanation based on past experience in meta-learning requires special consideration due to its bi-level structure of training, which has been left unexplored. In this work, we propose influence functions for explaining meta-learning that measure the sensitivities of training tasks to adaptation and inference. We also argue that the approximation of the Hessian using the Gauss-Newton matrix resolves computational barriers peculiar to meta-learning. We demonstrate the adequacy of the method through experiments on task distinction and task distribution distinction using image classification tasks with MAML and Prototypical Network. △ Less

Submitted 7 February, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

Comments: 22 pages; v2: modification in metadata

arXiv:2501.03405 [pdf, other]

A Study of the Efficacy of Generative Flow Networks for Robotics and Machine Fault-Adaptation

Authors: Zahin Sufiyan, Shadan Golestan, Shotaro Miwa, Yoshihiro Mitsuka, Osmar Zaiane

Abstract: Advancements in robotics have opened possibilities to automate tasks in various fields such as manufacturing, emergency response and healthcare. However, a significant challenge that prevents robots from operating in real-world environments effectively is out-of-distribution (OOD) situations, wherein robots encounter unforseen situations. One major OOD situations is when robots encounter faults, m… ▽ More Advancements in robotics have opened possibilities to automate tasks in various fields such as manufacturing, emergency response and healthcare. However, a significant challenge that prevents robots from operating in real-world environments effectively is out-of-distribution (OOD) situations, wherein robots encounter unforseen situations. One major OOD situations is when robots encounter faults, making fault adaptation essential for real-world operation for robots. Current state-of-the-art reinforcement learning algorithms show promising results but suffer from sample inefficiency, leading to low adaptation speed due to their limited ability to generalize to OOD situations. Our research is a step towards adding hardware fault tolerance and fast fault adaptability to machines. In this research, our primary focus is to investigate the efficacy of generative flow networks in robotic environments, particularly in the domain of machine fault adaptation. We simulated a robotic environment called Reacher in our experiments. We modify this environment to introduce four distinct fault environments that replicate real-world machines/robot malfunctions. The empirical evaluation of this research indicates that continuous generative flow networks (CFlowNets) indeed have the capability to add adaptive behaviors in machines under adversarial conditions. Furthermore, the comparative analysis of CFlowNets with reinforcement learning algorithms also provides some key insights into the performance in terms of adaptation speed and sample efficiency. Additionally, a separate study investigates the implications of transferring knowledge from pre-fault task to post-fault environments. Our experiments confirm that CFlowNets has the potential to be deployed in a real-world machine and it can demonstrate adaptability in case of malfunctions to maintain functionality. △ Less

Submitted 6 January, 2025; originally announced January 2025.

arXiv:2407.15283 [pdf, other]

Enhancing Hardware Fault Tolerance in Machines with Reinforcement Learning Policy Gradient Algorithms

Authors: Sheila Schoepp, Mehran Taghian, Shotaro Miwa, Yoshihiro Mitsuka, Shadan Golestan, Osmar Zaïane

Abstract: Industry is rapidly moving towards fully autonomous and interconnected systems that can detect and adapt to changing conditions, including machine hardware faults. Traditional methods for adding hardware fault tolerance to machines involve duplicating components and algorithmically reconfiguring a machine's processes when a fault occurs. However, the growing interest in reinforcement learning-base… ▽ More Industry is rapidly moving towards fully autonomous and interconnected systems that can detect and adapt to changing conditions, including machine hardware faults. Traditional methods for adding hardware fault tolerance to machines involve duplicating components and algorithmically reconfiguring a machine's processes when a fault occurs. However, the growing interest in reinforcement learning-based robotic control offers a new perspective on achieving hardware fault tolerance. However, limited research has explored the potential of these approaches for hardware fault tolerance in machines. This paper investigates the potential of two state-of-the-art reinforcement learning algorithms, Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), to enhance hardware fault tolerance into machines. We assess the performance of these algorithms in two OpenAI Gym simulated environments, Ant-v2 and FetchReach-v1. Robot models in these environments are subjected to six simulated hardware faults. Additionally, we conduct an ablation study to determine the optimal method for transferring an agent's knowledge, acquired through learning in a normal (pre-fault) environment, to a (post-)fault environment in a continual learning setting. Our results demonstrate that reinforcement learning-based approaches can enhance hardware fault tolerance in simulated machines, with adaptation occurring within minutes. Specifically, PPO exhibits the fastest adaptation when retaining the knowledge within its models, while SAC performs best when discarding all acquired knowledge. Overall, this study highlights the potential of reinforcement learning-based approaches, such as PPO and SAC, for hardware fault tolerance in machines. These findings pave the way for the development of robust and adaptive machines capable of effectively operating in real-world scenarios. △ Less

Submitted 21 July, 2024; originally announced July 2024.

arXiv:2309.05784 [pdf, other]

Grey-box Bayesian Optimization for Sensor Placement in Assisted Living Environments

Authors: Shadan Golestan, Omid Ardakanian, Pierre Boulanger

Abstract: Optimizing the configuration and placement of sensors is crucial for reliable fall detection, indoor localization, and activity recognition in assisted living spaces. We propose a novel, sample-efficient approach to find a high-quality sensor placement in an arbitrary indoor space based on grey-box Bayesian optimization and simulation-based evaluation. Our key technical contribution lies in captur… ▽ More Optimizing the configuration and placement of sensors is crucial for reliable fall detection, indoor localization, and activity recognition in assisted living spaces. We propose a novel, sample-efficient approach to find a high-quality sensor placement in an arbitrary indoor space based on grey-box Bayesian optimization and simulation-based evaluation. Our key technical contribution lies in capturing domain-specific knowledge about the spatial distribution of activities and incorporating it into the iterative selection of query points in Bayesian optimization. Considering two simulated indoor environments and a real-world dataset containing human activities and sensor triggers, we show that our proposed method performs better compared to state-of-the-art black-box optimization techniques in identifying high-quality sensor placements, leading to accurate activity recognition in terms of F1-score, while also requiring a significantly lower (51.3% on average) number of expensive function queries. △ Less

Submitted 11 September, 2023; originally announced September 2023.

arXiv:1807.10986 [pdf]

A Comprehensive Review of Technologies Used for Screening, Assessment, and Rehabilitation of Autism Spectrum Disorder

Authors: Shadan Golestan, Pegah Soleiman, Hadi Moradi

Abstract: Autism Spectrum Disorder (ASD) is an umbrella term for a wide range of developmental disorders. For the past two decades, researchers proposed the use of various technologies in order to tackle specific symptoms of the disorder. Although there exist many literature reviews about screening, assessment, and rehabilitation of ASD, no comprehensive survey of types of technologies in all defined sympto… ▽ More Autism Spectrum Disorder (ASD) is an umbrella term for a wide range of developmental disorders. For the past two decades, researchers proposed the use of various technologies in order to tackle specific symptoms of the disorder. Although there exist many literature reviews about screening, assessment, and rehabilitation of ASD, no comprehensive survey of types of technologies in all defined symptoms of ASD has been presented. Therefore, in this paper a comprehensive survey of previous studies has been presented in which the studies are classified into three main categories, and several sub-categories, and three main technologies. An analysis of the number of studies in each category and sub-category is given to help researchers decide on areas which need further investigation. The analysis show that the majority of studies fall into the software-based systems technology category. Finally, a brief review of studies in each category of ASD is presented for each type of technology. As a result, this paper also helps researchers to obtain an overview of the typical methods of using a specific technology in ASD screening, assessment, and rehabilitation. △ Less

Submitted 28 July, 2018; originally announced July 2018.

Comments: 12 pages, 6 figures, 2 tables

Showing 1–6 of 6 results for author: Golestan, S