Showing 1–15 of 15 results for author: Puig, X

Searching in archive cs.
  1. arXiv:2504.04040

    cs.AI cs.RO

    ADAPT: Actively Discovering and Adapting to Preferences for any Task

    Authors: Maithili Patel, Xavier Puig, Ruta Desai, Roozbeh Mottaghi, Sonia Chernova, Joanne Truong, Akshara Rai

    Abstract: Assistive agents should be able to perform under-specified long-horizon tasks while respecting user preferences. We introduce Actively Discovering and Adapting to Preferences for any Task (ADAPT) -- a benchmark designed to evaluate agents' ability to adhere to user preferences across various household tasks through active questioning. Next, we propose Reflection-DPO, a novel training approach for…

    Submitted 4 April, 2025; originally announced April 2025.

  2. arXiv:2504.00907

    cs.AI

    Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning

    Authors: Ram Ramrakhya, Matthew Chang, Xavier Puig, Ruta Desai, Zsolt Kira, Roozbeh Mottaghi

    Abstract: Embodied agents operating in real-world environments must interpret ambiguous and under-specified human instructions. A capable household robot should recognize ambiguity and ask relevant clarification questions to infer the user intent accurately, leading to more effective task execution. To study this problem, we introduce the Ask-to-Act task, where an embodied agent must fetch a specific object…

    Submitted 1 April, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

  3. arXiv:2502.05271

    cs.RO

    RobotMover: Learning to Move Large Objects From Human Demonstrations

    Authors: Tianyu Li, Joanne Truong, Jimmy Yang, Alexander Clegg, Akshara Rai, Sehoon Ha, Xavier Puig

    Abstract: Moving large objects, such as furniture or appliances, is a critical capability for robots operating in human environments. This task presents unique challenges, including whole-body coordination to avoid collisions and managing the dynamics of bulky, heavy objects. In this work, we present RobotMover, a learning-based system for large object manipulation that uses human-object interaction demonst…

    Submitted 13 May, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

  4. arXiv:2411.00081

    cs.RO cs.AI

    PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

    Authors: Matthew Chang, Gunjan Chhablani, Alexander Clegg, Mikael Dallaire Cote, Ruta Desai, Michal Hlavac, Vladimir Karashchuk, Jacob Krantz, Roozbeh Mottaghi, Priyam Parashar, Siddharth Patki, Ishita Prasad, Xavier Puig, Akshara Rai, Ram Ramrakhya, Daniel Tran, Joanne Truong, John M. Turner, Eric Undersander, Tsung-Yen Yang

    Abstract: We present a benchmark for Planning And Reasoning Tasks in humaN-Robot collaboration (PARTNR) designed to study human-robot coordination in household activities. PARTNR tasks exhibit characteristics of everyday tasks, such as spatial, temporal, and heterogeneous agent capability constraints. We employ a semi-automated task generation pipeline using Large Language Models (LLMs), incorporating simul…

    Submitted 31 October, 2024; originally announced November 2024.

    Comments: Alphabetical author order

  5. arXiv:2407.12061

    cs.HC cs.AI cs.RO

    Situated Instruction Following

    Authors: So Yeon Min, Xavi Puig, Devendra Singh Chaplot, Tsung-Yen Yang, Akshara Rai, Priyam Parashar, Ruslan Salakhutdinov, Yonatan Bisk, Roozbeh Mottaghi

    Abstract: Language is never spoken in a vacuum. It is expressed, comprehended, and contextualized within the holistic backdrop of the speaker's history, actions, and environment. Since humans are used to communicating efficiently with situated language, the practicality of robotic assistants hinge on their ability to understand and act upon implicit and situated instructions. In traditional instruction foll…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: European Conference on Computer Vision 2024 (ECCV 2024)

  6. arXiv:2312.03913

    cs.CV

    Controllable Human-Object Interaction Synthesis

    Authors: Jiaman Li, Alexander Clegg, Roozbeh Mottaghi, Jiajun Wu, Xavier Puig, C. Karen Liu

    Abstract: Synthesizing semantic-aware, long-horizon, human-object interaction is critical to simulate realistic human behaviors. In this work, we address the challenging problem of generating synchronized object motion and human motion guided by language descriptions in 3D scenes. We propose Controllable Human-Object Interaction Synthesis (CHOIS), an approach that generates object motion and human motion si…

    Submitted 14 July, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: ECCV 2024, project webpage: https://lijiaman.github.io/projects/chois/

  7. arXiv:2310.13724

    cs.HC cs.AI cs.CV cs.GR cs.MA cs.RO

    Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots

    Authors: Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi

    Abstract: We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate humanoid simulation: addressing challenges in modeling complex deformable bodies and diversity in appearance and motion, all while ensuring high simulation speed. (2) Human-in-the-loop infrastructure: enabling real h…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Project page: http://aihabitat.org/habitat3

  8. arXiv:2304.02061

    cs.CV

    Generating Continual Human Motion in Diverse 3D Scenes

    Authors: Aymen Mir, Xavier Puig, Angjoo Kanazawa, Gerard Pons-Moll

    Abstract: We introduce a method to synthesize animator guided human motion across 3D scenes. Given a set of sparse (3 or 4) joint locations (such as the location of a person's hand and two feet) and a seed motion sequence in a 3D scene, our method generates a plausible motion sequence starting from the seed motion while satisfying the constraints imposed by the provided keypoints. We decompose the continual…

    Submitted 2 February, 2025; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Webpage: https://virtualhumans.mpi-inf.mpg.de/origin_2/

  9. arXiv:2301.05223

    cs.RO cs.AI cs.LG cs.MA

    NOPA: Neurally-guided Online Probabilistic Assistance for Building Socially Intelligent Home Assistants

    Authors: Xavier Puig, Tianmin Shu, Joshua B. Tenenbaum, Antonio Torralba

    Abstract: In this work, we study how to build socially intelligent robots to assist people in their homes. In particular, we focus on assistance with online goal inference, where robots must simultaneously infer humans' goals and how to help them achieve those goals. Prior assistance methods either lack the adaptivity to adjust helping strategies (i.e., when and how to help) in response to uncertainty about…

    Submitted 12 January, 2023; originally announced January 2023.

    Comments: Project website: https://www.tshu.io/online_watch_and_help. Code: https://github.com/xavierpuigf/online_watch_and_help

  10. arXiv:2202.01771

    cs.LG cs.CL

    Pre-Trained Language Models for Interactive Decision-Making

    Authors: Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

    Abstract: Language model (LM) pre-training is useful in many language processing tasks. But can pre-trained LMs be further leveraged for more general machine learning problems? We propose an approach for using LMs to scaffold learning and generalization in general sequential decision-making problems. In this approach, goals and observations are represented as a sequence of embeddings, and a policy network i…

    Submitted 29 October, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

  11. arXiv:2106.05258

    cs.CV

    Generative Models as a Data Source for Multiview Representation Learning

    Authors: Ali Jahanian, Xavier Puig, Yonglong Tian, Phillip Isola

    Abstract: Generative models are now capable of producing highly realistic images that look nearly indistinguishable from the data on which they are trained. This raises the question: if we have good enough generative models, do we still need datasets? We investigate this question in the setting of learning general-purpose visual representations from a black-box generative model rather than directly from dat…

    Submitted 15 March, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

  12. arXiv:2010.09890

    cs.AI cs.LG cs.MA

    Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration

    Authors: Xavier Puig, Tianmin Shu, Shuang Li, Zilin Wang, Yuan-Hong Liao, Joshua B. Tenenbaum, Sanja Fidler, Antonio Torralba

    Abstract: In this paper, we introduce Watch-And-Help (WAH), a challenge for testing social intelligence in agents. In WAH, an AI agent needs to help a human-like agent perform a complex household task efficiently. To succeed, the AI agent needs to i) understand the underlying goal of the task by watching a single demonstration of the human-like agent performing the same task (social perception), and ii) coo…

    Submitted 3 May, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: ICLR 2021

  13. arXiv:1806.07011

    cs.CV cs.AI cs.LG

    VirtualHome: Simulating Household Activities via Programs

    Authors: Xavier Puig, Kevin Ra, Marko Boben, Jiaman Li, Tingwu Wang, Sanja Fidler, Antonio Torralba

    Abstract: In this paper, we are interested in modeling complex activities that occur in a typical household. We propose to use programs, i.e., sequences of atomic actions and interactions, as a high level representation of complex tasks. Programs are interesting because they provide a non-ambiguous representation of a task, and allow agents to execute them. However, nowadays, there is no database providing…

    Submitted 18 June, 2018; originally announced June 2018.

    Comments: CVPR 2018 (Oral)

  14. arXiv:1703.08769

    cs.CV cs.AI

    Open Vocabulary Scene Parsing

    Authors: Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba

    Abstract: Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets. In this paper, we propose a new task that aims at parsing scenes with a large and open vocabulary, and several evaluation metrics are explored for this problem. Our proposed approach to this problem is a joint image pixel and word concept embeddings framew…

    Submitted 4 April, 2017; v1 submitted 26 March, 2017; originally announced March 2017.

  15. arXiv:1608.05442

    cs.CV

    Semantic Understanding of Scenes through the ADE20K Dataset

    Authors: Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba

    Abstract: Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision. Despite the community's efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with dense and detailed annotations for scene parsing. In this paper, we introduce and analyze the ADE20K dataset, spanning diverse anno…

    Submitted 16 October, 2018; v1 submitted 18 August, 2016; originally announced August 2016.

    Comments: IJCV extension