
Showing 1–11 of 11 results for author: Berges, V

Searching in archive cs.

  1. arXiv:2504.14151

    cs.CV cs.AI cs.RO

    Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

    Authors: Sergio Arnaud, Paul McVay, Ada Martin, Arjun Majumdar, Krishna Murthy Jatavallabhula, Phillip Thomas, Ruslan Partsey, Daniel Dugas, Abha Gejji, Alexander Sax, Vincent-Pierre Berges, Mikael Henaff, Ayush Jain, Ang Cao, Ishita Prasad, Mrinal Kalakrishnan, Michael Rabbat, Nicolas Ballas, Mido Assran, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

    Abstract: We present LOCATE 3D, a model for localizing objects in 3D scenes from referring expressions like "the small coffee table between the sofa and the lamp." LOCATE 3D sets a new state-of-the-art on standard referential grounding benchmarks and showcases robust generalization capabilities. Notably, LOCATE 3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world depl…

    Submitted 18 April, 2025; originally announced April 2025.

    ACM Class: I.2.10; I.2.6; I.2.9; I.3.7; I.4.6; I.4.8

  2. arXiv:2502.20389

    cs.CV

    From Thousands to Billions: 3D Visual Language Grounding via Render-Supervised Distillation from 2D VLMs

    Authors: Ang Cao, Sergio Arnaud, Oleksandr Maksymets, Jianing Yang, Ayush Jain, Sriram Yenamandra, Ada Martin, Vincent-Pierre Berges, Paul McVay, Ruslan Partsey, Aravind Rajeswaran, Franziska Meier, Justin Johnson, Jeong Joon Park, Alexander Sax

    Abstract: 3D vision-language grounding faces a fundamental data bottleneck: while 2D models train on billions of images, 3D models have access to only thousands of labeled scenes--a six-order-of-magnitude gap that severely limits performance. We introduce LIFT-GS, a practical distillation technique that overcomes this limitation by using differentiable rendering to bridge 3D and 2D supervision. L…

    Submitted 9 June, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: Project page: https://liftgs.github.io

  3. arXiv:2412.09764

    cs.CL cs.AI

    Memory Layers at Scale

    Authors: Vincent-Pierre Berges, Barlas Oğuz, Daniel Haziza, Wen-tau Yih, Luke Zettlemoyer, Gargi Ghosh

    Abstract: Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply. This work takes memory layers beyond proof-of-concept, proving their utility at contemporary scale. On downstre…

    Submitted 20 December, 2024; v1 submitted 12 December, 2024; originally announced December 2024.

  4. arXiv:2310.13724

    cs.HC cs.AI cs.CV cs.GR cs.MA cs.RO

    Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots

    Authors: Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi

    Abstract: We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate humanoid simulation: addressing challenges in modeling complex deformable bodies and diversity in appearance and motion, all while ensuring high simulation speed. (2) Human-in-the-loop infrastructure: enabling real h…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Project page: http://aihabitat.org/habitat3

  5. arXiv:2310.02219

    cs.RO cs.AI cs.CV cs.LG

    What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?

    Authors: Sneha Silwal, Karmesh Yadav, Tingfan Wu, Jay Vakil, Arjun Majumdar, Sergio Arnaud, Claire Chen, Vincent-Pierre Berges, Dhruv Batra, Aravind Rajeswaran, Mrinal Kalakrishnan, Franziska Meier, Oleksandr Maksymets

    Abstract: We present a large empirical investigation on the use of pre-trained visual representations (PVRs) for training downstream policies that execute real-world tasks. Our study involves five different PVRs, each trained for five distinct manipulation or indoor navigation tasks. We performed this evaluation using three different robots and two different policy learning paradigms. From this effort, we c…

    Submitted 13 July, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Project website: https://pvrs-sim2real.github.io/

    MSC Class: 68T45 (Primary) 68T40; 68T05 (Secondary)

    ACM Class: I.2.9; I.2.6; I.4.8; I.5.4

  6. arXiv:2306.07552

    cs.LG cs.AI cs.RO

    Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second

    Authors: Vincent-Pierre Berges, Andrew Szot, Devendra Singh Chaplot, Aaron Gokaslan, Roozbeh Mottaghi, Dhruv Batra, Eric Undersander

    Abstract: We present Galactic, a large-scale simulation and reinforcement-learning (RL) framework for robotic mobile manipulation in indoor environments. Specifically, a Fetch robot (equipped with a mobile base, 7DoF arm, RGBD camera, egomotion, and onboard sensing) is spawned in a home environment and asked to rearrange objects - by navigating to an object, picking it up, navigating to a target location, a…

    Submitted 13 June, 2023; originally announced June 2023.

  7. arXiv:2303.18240

    cs.CV cs.AI cs.LG cs.RO

    Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

    Authors: Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

    Abstract: We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate CortexBench, consisting of 17 different tasks spanning locomotion, navigation, dexterous, and mobile manipulation. Next, we systematically evaluate existing PVRs and find that none are universally dominant. To study the effect of…

    Submitted 1 February, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

    Comments: Project website: https://eai-vc.github.io

  8. arXiv:2204.13226

    cs.CV cs.LG

    Offline Visual Representation Learning for Embodied Navigation

    Authors: Karmesh Yadav, Ram Ramrakhya, Arjun Majumdar, Vincent-Pierre Berges, Sachit Kuhar, Dhruv Batra, Alexei Baevski, Oleksandr Maksymets

    Abstract: How should we learn visual representations for embodied agents that must see and move? The status quo is tabula rasa in vivo, i.e. learning visual representations from scratch while also learning to move, potentially augmented with auxiliary tasks (e.g. predicting the action taken between two successive observations). In this paper, we show that an alternative 2-stage strategy is far more effectiv…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: 15 pages, 4 figures, 7 tables and supplementary

  9. arXiv:2111.05992

    cs.LG cs.AI

    On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning

    Authors: Andrew Cohen, Ervin Teng, Vincent-Pierre Berges, Ruo-Ping Dong, Hunter Henry, Marwan Mattar, Alexander Zook, Sujoy Ganguly

    Abstract: The creation and destruction of agents in cooperative multi-agent reinforcement learning (MARL) is a critically under-explored area of research. Current MARL algorithms often assume that the number of agents within a group remains fixed throughout an experiment. However, in many practical problems, an agent may terminate before their teammates. This early termination issue presents a challenge: th…

    Submitted 6 June, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

    Comments: RL in Games Workshop AAAI 2022

  10. arXiv:1902.01378

    cs.AI cs.LG

    Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning

    Authors: Arthur Juliani, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Ervin Teng, Hunter Henry, Adam Crespi, Julian Togelius, Danny Lange

    Abstract: The rapid pace of recent research in AI has been driven in part by the presence of fast and challenging simulation environments. These environments often take the form of games; with tasks ranging from simple board games, to competitive video games. We propose a new benchmark - Obstacle Tower: a high fidelity, 3D, 3rd person, procedurally generated environment. An agent playing Obstacle Tower must…

    Submitted 1 July, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: IJCAI 2019

  11. arXiv:1809.02627

    cs.LG cs.AI cs.NE stat.ML

    Unity: A General Platform for Intelligent Agents

    Authors: Arthur Juliani, Vincent-Pierre Berges, Ervin Teng, Andrew Cohen, Jonathan Harper, Chris Elion, Chris Goy, Yuan Gao, Hunter Henry, Marwan Mattar, Danny Lange

    Abstract: Recent advances in artificial intelligence have been driven by the presence of increasingly realistic and complex simulated environments. However, many of the existing environments provide either unrealistic visuals, inaccurate physics, low task complexity, restricted agent perspective, or a limited capacity for interaction among artificial agents. Furthermore, many platforms lack the ability to f…

    Submitted 6 May, 2020; v1 submitted 7 September, 2018; originally announced September 2018.