Showing 1–8 of 8 results for author: Bhandwaldar, A

  1. arXiv:2504.07097  [pdf, other]

    cs.LG cs.AI cs.CL math.PR stat.ML

    Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning

    Authors: Nikhil Shivakumar Nayak, Krishnateja Killamsetty, Ligong Han, Abhishek Bhandwaldar, Prateek Chanda, Kai Xu, Hao Wang, Aldo Pareja, Oleg Silkin, Mustafa Eyceoz, Akash Srivastava

    Abstract: Continual learning in large language models (LLMs) is prone to catastrophic forgetting, where adapting to new tasks significantly degrades performance on previously learned ones. Existing methods typically rely on low-rank, parameter-efficient updates that limit the model's expressivity and introduce additional parameters per task, leading to scalability issues. To address these limitations, we pr…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 25 pages, 13 figures, 6 tables

    MSC Class: 68T50

    ACM Class: I.2.0; G.3

  2. arXiv:2412.13337  [pdf, other]

    cs.LG cs.AI stat.ML

    Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

    Authors: Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang, Krishnateja Killamsetty, Shivchander Sudalairaj, Wenlong Zhao, Seungwook Han, Abhishek Bhandwaldar, Guangxuan Xu, Kai Xu, Ligong Han, Luke Inglis, Akash Srivastava

    Abstract: The rise of large language models (LLMs) has created a significant disparity: industrial research labs with their computational resources, expert teams, and advanced infrastructures, can effectively fine-tune LLMs, while individual developers and small organizations face barriers due to limited resources. In this paper, we aim to bridge this gap by presenting a comprehensive study on supervised fi…

    Submitted 17 December, 2024; originally announced December 2024.

    Comments: 33 pages, 19 figures. Appendix included in submission. Submitted to ICLR 2025

    MSC Class: 53-04

    ACM Class: I.2.7; I.2.6; I.2.4

  3. arXiv:2403.01081  [pdf, other]

    cs.CL cs.LG

    LAB: Large-Scale Alignment for ChatBots

    Authors: Shivchander Sudalairaj, Abhishek Bhandwaldar, Aldo Pareja, Kai Xu, David D. Cox, Akash Srivastava

    Abstract: This work introduces LAB (Large-scale Alignment for chatBots), a novel methodology designed to overcome the scalability challenges in the instruction-tuning phase of large language model (LLM) training. Leveraging a taxonomy-guided synthetic data generation process and a multi-phase tuning framework, LAB significantly reduces reliance on expensive human annotations and proprietary models like GPT-…

    Submitted 29 April, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Corresponding Author: Akash Srivastava. Equal Contribution: Shivchander Sudalairaj, Abhishek Bhandwaldar, Aldo Pareja, Akash Srivastava, Code: https://github.com/instructlab

  4. arXiv:2310.04413  [pdf, other]

    cs.LG cs.AI

    Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

    Authors: Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar, Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit Agrawal

    Abstract: Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirica…

    Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: Accepted NeurIPS 2023

    Journal ref: NeurIPS 2023

  5. arXiv:2110.06912  [pdf, other]

    cs.RO cs.AI cs.LG

    OPEn: An Open-ended Physics Environment for Learning Without a Task

    Authors: Chuang Gan, Abhishek Bhandwaldar, Antonio Torralba, Joshua B. Tenenbaum, Phillip Isola

    Abstract: Humans have mental models that allow them to plan, experiment, and reason in the physical world. How should an intelligent agent go about learning such models? In this paper, we will study if models of the world learned in an open-ended physics environment, without any specific tasks, can be reused for downstream physics reasoning tasks. To this end, we build a benchmark Open-ended Physics ENviron…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: IROS 2021. Project page: http://open.csail.mit.edu/

  6. arXiv:2103.14025  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI

    Authors: Chuang Gan, Siyuan Zhou, Jeremy Schwartz, Seth Alter, Abhishek Bhandwaldar, Dan Gutfreund, Daniel L. K. Yamins, James J DiCarlo, Josh McDermott, Antonio Torralba, Joshua B. Tenenbaum

    Abstract: We introduce a visually-guided and physics-driven task-and-motion planning benchmark, which we call the ThreeDWorld Transport Challenge. In this challenge, an embodied agent equipped with two 9-DOF articulated arms is spawned randomly in a simulated physical home environment. The agent is required to find a small set of objects scattered around the house, pick them up, and transport them to a desi…

    Submitted 25 March, 2021; originally announced March 2021.

    Comments: Project page: http://tdw-transport.csail.mit.edu/

  7. arXiv:2102.12321  [pdf, other]

    cs.AI cs.CV cs.LG

    AGENT: A Benchmark for Core Psychological Reasoning

    Authors: Tianmin Shu, Abhishek Bhandwaldar, Chuang Gan, Kevin A. Smith, Shari Liu, Dan Gutfreund, Elizabeth Spelke, Joshua B. Tenenbaum, Tomer D. Ullman

    Abstract: For machine agents to successfully interact with humans in real-world settings, they will need to develop an understanding of human mental life. Intuitive psychology, the ability to reason about hidden mental variables that drive observable actions, comes naturally to people: even pre-verbal infants can tell agents from objects, expecting agents to act efficiently to achieve goals given constraint…

    Submitted 25 July, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: ICML 2021, 12 pages, 7 figures

  8. arXiv:2007.04954  [pdf, other]

    cs.CV cs.GR cs.LG cs.RO

    ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation

    Authors: Chuang Gan, Jeremy Schwartz, Seth Alter, Damian Mrowca, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund, David Cox, Antonio Torralba, James J. DiCarlo, Joshua B. Tenenbaum, Josh H. McDermott, Daniel L. K. Yamins

    Abstract: We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments. Unique properties include: real-time near-photo-realistic image rendering; a library of objects and environments, and routines for their customization; generative procedu…

    Submitted 28 December, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Oral Presentation at NeurIPS 21 Datasets and Benchmarks Track. Project page: http://www.threedworld.org