Showing 1–2 of 2 results for author: Gejji, A

  1. arXiv:2506.09985  [pdf, ps, other]

    cs.AI cs.CV cs.LG cs.RO

    V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

    Authors: Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Mojtaba Komeili, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, Sergio Arnaud, Abha Gejji, Ada Martin, Francois Robert Hogan, Daniel Dugas, Piotr Bojanowski, Vasil Khalidov, Patrick Labatut, Francisco Massa, Marc Szafraniec, Kapil Krishnakumar, Yong Li, Xiaodong Ma, et al. (5 additional authors not shown)

    Abstract: A major challenge for modern AI is to learn to understand the world and learn to act largely by observation. This paper explores a self-supervised approach that combines internet-scale video data with a small amount of interaction data (robot trajectories), to develop models capable of understanding, predicting, and planning in the physical world. We first pre-train an action-free joint-embedding-…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 48 pages, 19 figures

  2. arXiv:2504.14151  [pdf, other]

    cs.CV cs.AI cs.RO

    Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

    Authors: Sergio Arnaud, Paul McVay, Ada Martin, Arjun Majumdar, Krishna Murthy Jatavallabhula, Phillip Thomas, Ruslan Partsey, Daniel Dugas, Abha Gejji, Alexander Sax, Vincent-Pierre Berges, Mikael Henaff, Ayush Jain, Ang Cao, Ishita Prasad, Mrinal Kalakrishnan, Michael Rabbat, Nicolas Ballas, Mido Assran, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

    Abstract: We present LOCATE 3D, a model for localizing objects in 3D scenes from referring expressions like "the small coffee table between the sofa and the lamp." LOCATE 3D sets a new state-of-the-art on standard referential grounding benchmarks and showcases robust generalization capabilities. Notably, LOCATE 3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world depl…

    Submitted 18 April, 2025; originally announced April 2025.

    ACM Class: I.2.10; I.2.6; I.2.9; I.3.7; I.4.6; I.4.8