Showing 1–10 of 10 results for author: Hansen-Estruch, P

  1. arXiv:2501.09755

    cs.CV cs.AI

    Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

    Authors: Philippe Hansen-Estruch, David Yan, Ching-Yao Chung, Orr Zohar, Jialiang Wang, Tingbo Hou, Tao Xu, Sriram Vishwanath, Peter Vajda, Xinlei Chen

    Abstract: Visual tokenization via auto-encoding empowers state-of-the-art image and video generative models by compressing pixels into a latent space. Although scaling Transformer-based generators has been central to recent advances, the tokenizer component itself is rarely scaled, leaving open questions about how auto-encoder design choices influence both its objective of reconstruction and downstream gene…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: 28 pages, 25 figures, 7 Tables

    ACM Class: I.2.10; I.4.2; I.4.5

  2. arXiv:2412.10360

    cs.CV cs.AI

    Apollo: An Exploration of Video Understanding in Large Multimodal Models

    Authors: Orr Zohar, Xiaohan Wang, Yann Dubois, Nikhil Mehta, Tong Xiao, Philippe Hansen-Estruch, Licheng Yu, Xiaofang Wang, Felix Juefei-Xu, Ning Zhang, Serena Yeung-Levy, Xide Xia

    Abstract: Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), the underlying mechanisms driving their video understanding remain poorly understood. Consequently, many design decisions in this domain are made without proper justification or analysis. The high computational cost of training and evaluating such models, coupled with limited open research, hinders…

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: https://apollo-lmms.github.io

  3. arXiv:2408.08441

    cs.LG cs.RO

    D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

    Authors: Rafael Rafailov, Kyle Hatch, Anikait Singh, Laura Smith, Aviral Kumar, Ilya Kostrikov, Philippe Hansen-Estruch, Victor Kolev, Philip Ball, Jiajun Wu, Chelsea Finn, Sergey Levine

    Abstract: Offline reinforcement learning algorithms hold the promise of enabling data-driven RL methods that do not require costly or dangerous real-world exploration and benefit from large pre-collected datasets. This in turn can facilitate real-world applications, as well as a more standardized approach to RL research. Furthermore, offline RL methods can provide effective initializations for online finetu…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: RLC 2024

  4. arXiv:2407.09533

    cs.CV cs.AI

    Video Occupancy Models

    Authors: Manan Tomar, Philippe Hansen-Estruch, Philip Bachman, Alex Lamb, John Langford, Matthew E. Taylor, Sergey Levine

    Abstract: We introduce a new family of video prediction models designed to support downstream control tasks. We call these models Video Occupancy models (VOCs). VOCs operate in a compact latent space, thus avoiding the need to make predictions about individual pixels. Unlike prior latent-space world models, VOCs directly predict the discounted distribution of future states in a single step, thus avoiding th…

    Submitted 25 June, 2024; originally announced July 2024.

  5. arXiv:2406.17688

    cs.CV cs.AI

    Unified Auto-Encoding with Masked Diffusion

    Authors: Philippe Hansen-Estruch, Sriram Vishwanath, Amy Zhang, Manan Tomar

    Abstract: At the core of both successful generative and self-supervised representation learning models there is a reconstruction objective that incorporates some form of image corruption. Diffusion models implement this approach through a scheduled Gaussian corruption process, while masked auto-encoder models do so by masking patches of the image. Despite their different approaches, the underlying similarit…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 19 Pages, 8 Figures, 3 Tables

    ACM Class: I.2.10

  6. arXiv:2308.12952

    cs.RO cs.LG

    BridgeData V2: A Dataset for Robot Learning at Scale

    Authors: Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine

    Abstract: We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains,…

    Submitted 17 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: 9 pages

  7. arXiv:2307.00117

    cs.RO cs.LG

    Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control

    Authors: Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine

    Abstract: Our goal is for robots to follow natural language instructions like "put the towel next to the microwave." But getting large amounts of labeled data, i.e. data that contains demonstrations of tasks labeled with the language instruction, is prohibitive. In contrast, obtaining policies that respond to image goals is much easier, because any autonomous trial or demonstration can be labeled in hindsig…

    Submitted 17 August, 2023; v1 submitted 30 June, 2023; originally announced July 2023.

    Comments: 15 pages, 5 figures

    Journal ref: Conference on Robot Learning (CoRL), 2023

  8. arXiv:2304.10573

    cs.LG cs.AI

    IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies

    Authors: Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, Sergey Levine

    Abstract: Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a modified Bellman backup. However, it is unclear which policy actually attains the values represented by this implicitly trained Q-function. In this paper, we reinterpret IQL as an actor-critic method by generalizi…

    Submitted 19 May, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: 9 Pages, 4 Figures, 3 Tables

  9. arXiv:2204.13060

    cs.LG

    Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning

    Authors: Philippe Hansen-Estruch, Amy Zhang, Ashvin Nair, Patrick Yin, Sergey Levine

    Abstract: Building generalizable goal-conditioned agents from rich observations is a key to reinforcement learning (RL) solving real world problems. Traditionally in goal-conditioned RL, an agent is provided with the exact goal they intend to reach. However, it is often not realistic to know the configuration of the goal before performing a task. A more scalable framework would allow us to provide the agent…

    Submitted 16 May, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

    Comments: ICML 2022. 20 Pages, 15 Figures, 4 Tables. Website at https://sites.google.com/view/gc-bisimulation

    MSC Class: 68T07

    ACM Class: I.2.8

  10. arXiv:2104.02844

    eess.SY cs.AI

    GEM: Group Enhanced Model for Learning Dynamical Control Systems

    Authors: Philippe Hansen-Estruch, Wenling Shang, Lerrel Pinto, Pieter Abbeel, Stas Tiomkin

    Abstract: Learning the dynamics of a physical system wherein an autonomous agent operates is an important task. Often these systems present apparent geometric structures. For instance, the trajectories of a robotic manipulator can be broken down into a collection of its transitional and rotational motions, fully characterized by the corresponding Lie groups and Lie algebras. In this work, we take advantage…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 14 pages, 8 figures