-
EVaDE : Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning
Authors:
Siddharth Aravindan,
Dixant Mittal,
Wee Sun Lee
Abstract:
Posterior Sampling for Reinforcement Learning (PSRL) is a well-known algorithm that augments model-based reinforcement learning (MBRL) algorithms with Thompson sampling. PSRL maintains posterior distributions of the environment transition dynamics and the reward function, which are intractable for tasks with high-dimensional state and action spaces. Recent works show that dropout, used in conjunct…
▽ More
Posterior Sampling for Reinforcement Learning (PSRL) is a well-known algorithm that augments model-based reinforcement learning (MBRL) algorithms with Thompson sampling. PSRL maintains posterior distributions of the environment transition dynamics and the reward function, which are intractable for tasks with high-dimensional state and action spaces. Recent works show that dropout, used in conjunction with neural networks, induces variational distributions that can approximate these posteriors. In this paper, we propose Event-based Variational Distributions for Exploration (EVaDE), which are variational distributions that are useful for MBRL, especially when the underlying domain is object-based. We leverage the general domain knowledge of object-based domains to design three types of event-based convolutional layers to direct exploration. These layers rely on Gaussian dropouts and are inserted between the layers of the deep neural network model to help facilitate variational Thompson sampling. We empirically show the effectiveness of EVaDE-equipped Simulated Policy Learning (EVaDE-SimPLe) on the 100K Atari game suite.
△ Less
Submitted 16 January, 2025;
originally announced January 2025.
-
Instructive3D: Editing Large Reconstruction Models with Text Instructions
Authors:
Kunal Kathare,
Ankit Dhiman,
K Vikas Gowda,
Siddharth Aravindan,
Shubham Monga,
Basavaraja Shanthappa Vandrotti,
Lokesh R Boregowda
Abstract:
Transformer based methods have enabled users to create, modify, and comprehend text and image data. Recently proposed Large Reconstruction Models (LRMs) further extend this by providing the ability to generate high-quality 3D models with the help of a single object image. These models, however, lack the ability to manipulate or edit the finer details, such as adding standard design patterns or cha…
▽ More
Transformer based methods have enabled users to create, modify, and comprehend text and image data. Recently proposed Large Reconstruction Models (LRMs) further extend this by providing the ability to generate high-quality 3D models with the help of a single object image. These models, however, lack the ability to manipulate or edit the finer details, such as adding standard design patterns or changing the color and reflectance of the generated objects, thus lacking fine-grained control that may be very helpful in domains such as augmented reality, animation and gaming. Naively training LRMs for this purpose would require generating precisely edited images and 3D object pairs, which is computationally expensive. In this paper, we propose Instructive3D, a novel LRM based model that integrates generation and fine-grained editing, through user text prompts, of 3D objects into a single model. We accomplish this by adding an adapter that performs a diffusion process conditioned on a text prompt specifying edits in the triplane latent space representation of 3D object models. Our method does not require the generation of edited 3D objects. Additionally, Instructive3D allows us to perform geometrically consistent modifications, as the edits done through user-defined text prompts are applied to the triplane latent representation thus enhancing the versatility and precision of 3D objects generated. We compare the objects generated by Instructive3D and a baseline that first generates the 3D object meshes using a standard LRM model and then edits these 3D objects using text prompts when images are provided from the Objaverse LVIS dataset. We find that Instructive3D produces qualitatively superior 3D objects with the properties specified by the edit prompts.
△ Less
Submitted 8 January, 2025;
originally announced January 2025.
-
ExPoSe: Combining State-Based Exploration with Gradient-Based Online Search
Authors:
Dixant Mittal,
Siddharth Aravindan,
Wee Sun Lee
Abstract:
Online tree-based search algorithms iteratively simulate trajectories and update action-values for a set of states stored in a tree structure. It works reasonably well in practice but fails to effectively utilise the information gathered from similar states. Depending upon the smoothness of the action-value function, one approach to overcoming this issue is through online learning, where informati…
▽ More
Online tree-based search algorithms iteratively simulate trajectories and update action-values for a set of states stored in a tree structure. It works reasonably well in practice but fails to effectively utilise the information gathered from similar states. Depending upon the smoothness of the action-value function, one approach to overcoming this issue is through online learning, where information is interpolated among similar states; Policy Gradient Search provides a practical algorithm to achieve this. However, Policy Gradient Search lacks an explicit exploration mechanism, which is a key feature of tree-based online search algorithms. In this paper, we propose an efficient and effective online search algorithm called Exploratory Policy Gradient Search (ExPoSe), which leverages information sharing among states by updating the search policy parameters directly, while incorporating a well-defined exploration mechanism during the online search process. We evaluate ExPoSe on a range of decision-making problems, including Atari games, Sokoban, and Hamiltonian cycle search in sparse graphs. The results demonstrate that ExPoSe consistently outperforms other popular online search algorithms across all domains. The ExPoSe source code is available at \textit{\url{https://github.com/dixantmittal/ExPoSe}}.
△ Less
Submitted 4 March, 2023; v1 submitted 3 February, 2022;
originally announced February 2022.
-
State-Aware Variational Thompson Sampling for Deep Q-Networks
Authors:
Siddharth Aravindan,
Wee Sun Lee
Abstract:
Thompson sampling is a well-known approach for balancing exploration and exploitation in reinforcement learning. It requires the posterior distribution of value-action functions to be maintained; this is generally intractable for tasks that have a high dimensional state-action space. We derive a variational Thompson sampling approximation for DQNs which uses a deep network whose parameters are per…
▽ More
Thompson sampling is a well-known approach for balancing exploration and exploitation in reinforcement learning. It requires the posterior distribution of value-action functions to be maintained; this is generally intractable for tasks that have a high dimensional state-action space. We derive a variational Thompson sampling approximation for DQNs which uses a deep network whose parameters are perturbed by a learned variational noise distribution. We interpret the successful NoisyNets method \cite{fortunato2018noisy} as an approximation to the variational Thompson sampling method that we derive. Further, we propose State Aware Noisy Exploration (SANE) which seeks to improve on NoisyNets by allowing a non-uniform perturbation, where the amount of parameter perturbation is conditioned on the state of the agent. This is done with the help of an auxiliary perturbation module, whose output is state dependent and is learnt end to end with gradient descent. We hypothesize that such state-aware noisy exploration is particularly useful in problems where exploration in certain \textit{high risk} states may result in the agent failing badly. We demonstrate the effectiveness of the state-aware exploration method in the off-policy setting by augmenting DQNs with the auxiliary perturbation module.
△ Less
Submitted 7 February, 2021;
originally announced February 2021.
-
An Analysis of Frame-skipping in Reinforcement Learning
Authors:
Shivaram Kalyanakrishnan,
Siddharth Aravindan,
Vishwajeet Bagdawat,
Varun Bhatt,
Harshith Goka,
Archit Gupta,
Kalpesh Krishna,
Vihari Piratla
Abstract:
In the practice of sequential decision making, agents are often designed to sense state at regular intervals of $d$ time steps, $d > 1$, ignoring state information in between sensing steps. While it is clear that this practice can reduce sensing and compute costs, recent results indicate a further benefit. On many Atari console games, reinforcement learning (RL) algorithms deliver substantially be…
▽ More
In the practice of sequential decision making, agents are often designed to sense state at regular intervals of $d$ time steps, $d > 1$, ignoring state information in between sensing steps. While it is clear that this practice can reduce sensing and compute costs, recent results indicate a further benefit. On many Atari console games, reinforcement learning (RL) algorithms deliver substantially better policies when run with $d > 1$ -- in fact with $d$ even as high as $180$. In this paper, we investigate the role of the parameter $d$ in RL; $d$ is called the "frame-skip" parameter, since states in the Atari domain are images. For evaluating a fixed policy, we observe that under standard conditions, frame-skipping does not affect asymptotic consistency. Depending on other parameters, it can possibly even benefit learning. To use $d > 1$ in the control setting, one must first specify which $d$-step open-loop action sequences can be executed in between sensing steps. We focus on "action-repetition", the common restriction of this choice to $d$-length sequences of the same action. We define a task-dependent quantity called the "price of inertia", in terms of which we upper-bound the loss incurred by action-repetition. We show that this loss may be offset by the gain brought to learning by a smaller task horizon. Our analysis is supported by experiments on different tasks and learning algorithms.
△ Less
Submitted 6 February, 2021;
originally announced February 2021.
-
Learning to Prune Deep Neural Networks via Reinforcement Learning
Authors:
Manas Gupta,
Siddharth Aravindan,
Aleksandra Kalisz,
Vijay Chandrasekhar,
Lin Jie
Abstract:
This paper proposes PuRL - a deep reinforcement learning (RL) based algorithm for pruning neural networks. Unlike current RL based model compression approaches where feedback is given only at the end of each episode to the agent, PuRL provides rewards at every pruning step. This enables PuRL to achieve sparsity and accuracy comparable to current state-of-the-art methods, while having a much shorte…
▽ More
This paper proposes PuRL - a deep reinforcement learning (RL) based algorithm for pruning neural networks. Unlike current RL based model compression approaches where feedback is given only at the end of each episode to the agent, PuRL provides rewards at every pruning step. This enables PuRL to achieve sparsity and accuracy comparable to current state-of-the-art methods, while having a much shorter training cycle. PuRL achieves more than 80% sparsity on the ResNet-50 model while retaining a Top-1 accuracy of 75.37% on the ImageNet dataset. Through our experiments we show that PuRL is also able to sparsify already efficient architectures like MobileNet-V2. In addition to performance characterisation experiments, we also provide a discussion and analysis of the various RL design choices that went into the tuning of the Markov Decision Process underlying PuRL. Lastly, we point out that PuRL is simple to use and can be easily adapted for various architectures.
△ Less
Submitted 9 July, 2020;
originally announced July 2020.