Showing 1–50 of 294 results for author: Finn, C

  1. arXiv:2506.18123  [pdf, ps, other]

    cs.RO cs.LG

    RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies

    Authors: Pranav Atreya, Karl Pertsch, Tony Lee, Moo Jin Kim, Arhan Jain, Artur Kuramshin, Clemens Eppner, Cyrus Neary, Edward Hu, Fabio Ramos, Jonathan Tremblay, Kanav Arora, Kirsty Ellis, Luca Macesanu, Matthew Leonard, Meedeum Cho, Ozgur Aslan, Shivin Dass, Jie Wang, Xingfang Yuan, Xuning Yang, Abhishek Gupta, Dinesh Jayaraman, Glen Berseth, Kostas Daniilidis, et al. (5 additional authors not shown)

    Abstract: Comprehensive, unbiased, and comparable evaluation of modern generalist policies is uniquely challenging: existing approaches for robot benchmarking typically rely on heavy standardization, either by specifying fixed evaluation tasks and environments, or by hosting centralized "robot challenges", and do not readily scale to evaluating generalist policies across a broad range of tasks and environ…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Website: https://robo-arena.github.io/

  2. arXiv:2506.07505  [pdf, ps, other]

    cs.LG cs.AI

    Reinforcement Learning via Implicit Imitation Guidance

    Authors: Perry Dong, Alec M. Lessing, Annie S. Chen, Chelsea Finn

    Abstract: We study the problem of sample efficient reinforcement learning, where prior data such as demonstrations are provided for initialization in lieu of a dense reward signal. A natural approach is to incorporate an imitation learning objective, either as regularization during training or to acquire a reference policy. However, imitation learning objectives can ultimately degrade long-term performance,…

    Submitted 9 June, 2025; originally announced June 2025.

  3. arXiv:2506.05256  [pdf, ps, other]

    cs.AI cs.LG

    Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning

    Authors: Violet Xiang, Chase Blagden, Rafael Rafailov, Nathan Lile, Sang Truong, Chelsea Finn, Nick Haber

    Abstract: Large reasoning models (LRMs) achieve higher performance on challenging reasoning tasks by generating more tokens at inference time, but this verbosity often wastes computation on easy problems. Existing solutions, including supervised finetuning on shorter traces, user-controlled budgets, or RL with uniform penalties, either require data curation, manual configuration, or treat all problems alike…

    Submitted 5 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  4. arXiv:2505.10251  [pdf, ps, other]

    cs.RO

    SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning

    Authors: Ji Woong Kim, Juo-Tung Chen, Pascal Hansen, Lucy X. Shi, Antony Goldenberg, Samuel Schmidgall, Paul Maria Scheikl, Anton Deguet, Brandon M. White, De Ru Tsai, Richard Cha, Jeffrey Jopling, Chelsea Finn, Axel Krieger

    Abstract: Research on autonomous surgery has largely focused on simple task automation in controlled environments. However, real-world surgical applications demand dexterous manipulation over extended durations and robust generalization to the inherent variability of human tissue. These challenges remain difficult to address using existing logic-based or conventional end-to-end learning strategies. To addre…

    Submitted 17 June, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

  5. arXiv:2505.09561  [pdf, ps, other]

    cs.RO cs.AI cs.LG

    Learning Long-Context Diffusion Policies via Past-Token Prediction

    Authors: Marcel Torne, Andy Tang, Yuejiang Liu, Chelsea Finn

    Abstract: Reasoning over long sequences of observations and actions is essential for many robotic tasks. Yet, learning effective long-context policies from demonstrations remains challenging. As context length increases, training becomes increasingly expensive due to rising memory demands, and policy performance often degrades as a result of spurious correlations. Recent methods typically sidestep these iss…

    Submitted 19 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

    Comments: Videos are available at https://long-context-dp.github.io

  6. arXiv:2505.08078  [pdf, other]

    cs.RO cs.AI

    What Matters for Batch Online Reinforcement Learning in Robotics?

    Authors: Perry Dong, Suvir Mirchandani, Dorsa Sadigh, Chelsea Finn

    Abstract: The ability to learn from large batches of autonomously collected data for policy improvement -- a paradigm we refer to as batch online reinforcement learning -- holds the promise of enabling truly scalable robot learning by significantly reducing the need for human effort of data collection while getting benefits from self-improvement. Yet, despite the promise of this paradigm, it remains challen…

    Submitted 12 May, 2025; originally announced May 2025.

  7. arXiv:2504.16925  [pdf, other]

    cs.RO cs.AI

    Latent Diffusion Planning for Imitation Learning

    Authors: Amber Xie, Oleh Rybkin, Dorsa Sadigh, Chelsea Finn

    Abstract: Recent progress in imitation learning has been enabled by policy architectures that scale to complex visuomotor tasks, multimodal distributions, and large datasets. However, these methods often rely on learning from large amounts of expert demonstrations. To address these shortcomings, we propose Latent Diffusion Planning (LDP), a modular approach consisting of a planner which can leverage action-f…

    Submitted 23 April, 2025; originally announced April 2025.

  8. arXiv:2504.16054  [pdf, other]

    cs.LG cs.RO

    $π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

    Authors: Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Manuel Y. Galliker, Dibya Ghosh, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Devin LeBlanc, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Allen Z. Ren, et al. (11 additional authors not shown)

    Abstract: In order for robots to be useful, they must perform practically relevant tasks in the real world, outside of the lab. While vision-language-action (VLA) models have demonstrated impressive results for end-to-end robot control, it remains an open question how far such models can generalize in the wild. We describe $π_{0.5}$, a new model based on $π_{0}$ that uses co-training on heterogeneous tasks…

    Submitted 22 April, 2025; originally announced April 2025.

  9. arXiv:2503.22020  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

    Authors: Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, Ankur Handa, Ming-Yu Liu, Donglai Xiang, Gordon Wetzstein, Tsung-Yi Lin

    Abstract: Vision-language-action models (VLAs) have shown potential in leveraging pretrained vision-language models and diverse robot demonstrations for learning generalizable sensorimotor control. While this paradigm effectively utilizes large-scale data from both robotic and non-robotic sources, current VLAs primarily focus on direct input--output mappings, lacking the intermediate reasoning steps crucial…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Project website: https://cot-vla.github.io/

    Journal ref: CVPR 2025

  10. arXiv:2503.03707  [pdf, other]

    cs.RO cs.AI cs.LG

    Curating Demonstrations using Online Experience

    Authors: Annie S. Chen, Alec M. Lessing, Yuejiang Liu, Chelsea Finn

    Abstract: Many robot demonstration datasets contain heterogeneous demonstrations of varying quality. This heterogeneity may benefit policy pre-training, but can hinder robot performance when used with a final imitation learning objective. In particular, some strategies in the data may be less reliable than others or may be underrepresented in the data, leading to poor performance when such strategies are sa…

    Submitted 5 March, 2025; originally announced March 2025.

  11. arXiv:2502.19645  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

    Authors: Moo Jin Kim, Chelsea Finn, Percy Liang

    Abstract: Recent vision-language-action models (VLAs) build upon pretrained vision-language models and leverage diverse robot datasets to demonstrate strong task execution, language following ability, and semantic generalization. Despite these successes, VLAs struggle with novel robot setups and require fine-tuning to achieve good performance, yet how to most effectively fine-tune them is unclear given many…

    Submitted 28 April, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: Accepted to Robotics: Science and Systems (RSS) 2025. Project website: https://openvla-oft.github.io/

  12. arXiv:2502.19417  [pdf, other]

    cs.RO cs.AI cs.LG

    Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models

    Authors: Lucy Xiaoyang Shi, Brian Ichter, Michael Equi, Liyiming Ke, Karl Pertsch, Quan Vuong, James Tanner, Anna Walling, Haohuan Wang, Niccolo Fusai, Adrian Li-Bell, Danny Driess, Lachy Groom, Sergey Levine, Chelsea Finn

    Abstract: Generalist robots that can perform a range of different tasks in open-world settings must be able to not only reason about the steps needed to accomplish their goals, but also process complex instructions, prompts, and even feedback during task execution. Intricate instructions (e.g., "Could you make me a vegetarian sandwich?" or "I don't like that one") require not just the ability to physically…

    Submitted 26 February, 2025; originally announced February 2025.

  13. arXiv:2502.19312  [pdf, other]

    cs.LG cs.AI cs.CL cs.HC stat.ML

    FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users

    Authors: Anikait Singh, Sheryl Hsu, Kyle Hsu, Eric Mitchell, Stefano Ermon, Tatsunori Hashimoto, Archit Sharma, Chelsea Finn

    Abstract: Effective personalization of LLMs is critical for a broad range of user-interfacing applications such as virtual assistants and content curation. Inspired by the strong in-context learning capabilities of LLMs, we propose Few-Shot Preference Optimization (FSPO), which reframes reward modeling as a meta-learning problem. Under this framework, an LLM learns to quickly adapt to a user via a few label…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: Website: https://fewshot-preference-optimization.github.io/

  14. arXiv:2502.01719  [pdf, other]

    cs.CV

    MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation

    Authors: Haibo Tong, Zhaoyang Wang, Zhaorun Chen, Haonian Ji, Shi Qiu, Siwei Han, Kexin Geng, Zhongkai Xue, Yiyang Zhou, Peng Xia, Mingyu Ding, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

    Abstract: Recent advancements in video generation have significantly improved the ability to synthesize videos from text instructions. However, existing models still struggle with key challenges such as instruction misalignment, content hallucination, safety concerns, and bias. Addressing these limitations, we introduce MJ-BENCH-VIDEO, a large-scale video preference benchmark designed to evaluate video gene…

    Submitted 6 February, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

  15. arXiv:2501.09747  [pdf, other]

    cs.RO cs.LG

    FAST: Efficient Action Tokenization for Vision-Language-Action Models

    Authors: Karl Pertsch, Kyle Stachowicz, Brian Ichter, Danny Driess, Suraj Nair, Quan Vuong, Oier Mees, Chelsea Finn, Sergey Levine

    Abstract: Autoregressive sequence models, such as Transformer-based vision-language action (VLA) policies, can be tremendously effective for capturing complex and generalizable robotic behaviors. However, such models require us to choose a tokenization of our continuous action signals, which determines how the discrete symbols predicted by the model map to continuous robot actions. We find that current appr…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: Website: https://www.pi.website/research/fast

  16. arXiv:2501.04682  [pdf, other]

    cs.AI cs.CL

    Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

    Authors: Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn

    Abstract: We propose a novel framework, Meta Chain-of-Thought (Meta-CoT), which extends traditional Chain-of-Thought (CoT) by explicitly modeling the underlying reasoning required to arrive at a particular CoT. We present empirical evidence from state-of-the-art models exhibiting behaviors consistent with in-context search, and explore methods for producing Meta-CoT via process supervision, synthetic data g…

    Submitted 8 January, 2025; originally announced January 2025.

  17. arXiv:2412.08812  [pdf, other]

    cs.LG

    Test-Time Alignment via Hypothesis Reweighting

    Authors: Yoonho Lee, Jonathan Williams, Henrik Marklund, Archit Sharma, Eric Mitchell, Anikait Singh, Chelsea Finn

    Abstract: Large pretrained models often struggle with underspecified tasks -- situations where the training data does not fully define the desired behavior. For example, chatbots must handle diverse and often conflicting user preferences, requiring adaptability to various user needs. We propose a novel framework to address the general challenge of aligning models to test-time user intent, which is rarely fu…

    Submitted 11 December, 2024; originally announced December 2024.

    Comments: Preprint

  18. arXiv:2412.06685  [pdf, other]

    cs.LG cs.AI

    Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone

    Authors: Max Sobol Mark, Tian Gao, Georgia Gabriela Sampaio, Mohan Kumar Srirama, Archit Sharma, Chelsea Finn, Aviral Kumar

    Abstract: Recent advances in learning decision-making policies can largely be attributed to training expressive policy models, largely via imitation learning. While imitation learning discards non-expert data, reinforcement learning (RL) can still learn from suboptimal data. However, instantiating RL training of a new policy class often presents a different challenge: most deep RL machinery is co-developed…

    Submitted 9 December, 2024; originally announced December 2024.

  19. arXiv:2411.01915  [pdf, other]

    cs.RO

    RoboCrowd: Scaling Robot Data Collection through Crowdsourcing

    Authors: Suvir Mirchandani, David D. Yuan, Kaylee Burns, Md Sazzad Islam, Tony Z. Zhao, Chelsea Finn, Dorsa Sadigh

    Abstract: In recent years, imitation learning from large-scale human demonstrations has emerged as a promising paradigm for training robot policies. However, the burden of collecting large quantities of human demonstrations is significant in terms of collection time and the need for access to expert operators. We introduce a new data collection paradigm, RoboCrowd, which distributes the workload by utilizin…

    Submitted 21 May, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: 21 pages, 25 figures. International Conference on Robotics and Automation (ICRA) 2025

  20. arXiv:2410.24164  [pdf, other]

    cs.LG cs.RO

    $π_0$: A Vision-Language-Action Flow Model for General Robot Control

    Authors: Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Lucy Xiaoyang Shi, James Tanner, Quan Vuong, Anna Walling, Haohuan Wang, Ury Zhilinsky

    Abstract: Robot learning holds tremendous promise to unlock the full potential of flexible, general, and dexterous robot systems, as well as to address some of the deepest questions in artificial intelligence. However, bringing robot learning to the level of generality required for effective real-world systems faces major obstacles in terms of data, generalization, and robustness. In this paper, we discuss…

    Submitted 13 November, 2024; v1 submitted 31 October, 2024; originally announced October 2024.

    Comments: See project website for videos: https://physicalintelligence.company/blog/pi0

  21. arXiv:2410.23214  [pdf, other]

    cs.LG cs.AI

    Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval

    Authors: Sheryl Hsu, Omar Khattab, Chelsea Finn, Archit Sharma

    Abstract: The hallucinations of large language models (LLMs) are increasingly mitigated by allowing LLMs to search for information and to ground their answers in real sources. Unfortunately, LLMs often struggle with posing the right search queries, especially when dealing with complex or otherwise indirect topics. Observing that LLMs can learn to search for relevant facts by $\textit{trying}$ different quer…

    Submitted 30 October, 2024; v1 submitted 30 October, 2024; originally announced October 2024.

  22. arXiv:2410.13126  [pdf, other]

    cs.RO

    ALOHA Unleashed: A Simple Recipe for Robot Dexterity

    Authors: Tony Z. Zhao, Jonathan Tompson, Danny Driess, Pete Florence, Kamyar Ghasemipour, Chelsea Finn, Ayzaan Wahid

    Abstract: Recent work has shown promising results for learning end-to-end robot policies using imitation learning. In this work we address the question of how far can we push imitation learning for challenging dexterous manipulation tasks. We show that a simple recipe of large scale data collection on the ALOHA 2 platform, combined with expressive models such as Diffusion Policies, can be effective in learn…

    Submitted 16 October, 2024; originally announced October 2024.

  23. arXiv:2410.12832  [pdf, other]

    cs.LG

    Generative Reward Models

    Authors: Dakota Mahan, Duy Van Phung, Rafael Rafailov, Chase Blagden, Nathan Lile, Louis Castricato, Jan-Philipp Fränken, Chelsea Finn, Alon Albalak

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has greatly improved the performance of modern Large Language Models (LLMs). The RLHF process is resource-intensive and technically challenging, generally requiring a large collection of human preference labels over model-generated outputs. Reinforcement Learning from AI Feedback (RLAIF) addresses this data collection challenge by leveraging synthe…

    Submitted 2 October, 2024; originally announced October 2024.

  24. arXiv:2410.06232  [pdf, other]

    q-bio.NC cs.AI cs.LG cs.NE

    Range, not Independence, Drives Modularity in Biologically Inspired Representations

    Authors: Will Dorrell, Kyle Hsu, Luke Hollingsworth, Jin Hwa Lee, Jiajun Wu, Chelsea Finn, Peter E Latham, Tim EJ Behrens, James CR Whittington

    Abstract: Why do biological and artificial neurons sometimes modularise, each encoding a single meaningful variable, and sometimes entangle their representation of many variables? In this work, we develop a theory of when biologically inspired networks -- those that are nonnegative and energy efficient -- modularise their representation of source variables (sources). We derive necessary and sufficient condi…

    Submitted 11 April, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

    Comments: 37 pages, 12 figures. WD and KH contributed equally; LH and JHL contributed equally

  25. arXiv:2410.00231  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models

    Authors: Qi Wu, Zipeng Fu, Xuxin Cheng, Xiaolong Wang, Chelsea Finn

    Abstract: Learning-based methods have achieved strong performance for quadrupedal locomotion. However, several challenges prevent quadrupeds from learning helpful indoor skills that require interaction with environments and humans: lack of end-effectors for manipulation, limited semantic understanding using only simulation data, and low traversability and reachability in indoor environments. We present a sy…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: Project website: https://helpful-doggybot.github.io/

  26. arXiv:2409.19817  [pdf, other]

    cs.LG cs.AI cs.CL

    Calibrating Language Models with Adaptive Temperature Scaling

    Authors: Johnathan Xie, Annie S. Chen, Yoonho Lee, Eric Mitchell, Chelsea Finn

    Abstract: The effectiveness of large language models (LLMs) is not only measured by their ability to generate accurate outputs but also by their calibration -- how well their confidence scores reflect the probability of their outputs being correct. While unsupervised pre-training has been shown to yield LLMs with well-calibrated conditional probabilities, recent studies have shown that after fine-tuning with r…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024

  27. arXiv:2408.17355  [pdf, other]

    cs.RO cs.AI cs.LG

    Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling

    Authors: Yuejiang Liu, Jubayer Ibn Hamid, Annie Xie, Yoonho Lee, Maximilian Du, Chelsea Finn

    Abstract: Predicting and executing a sequence of actions without intermediate replanning, known as action chunking, is increasingly used in robot learning from human demonstrations. Yet, its effects on the learned policy remain inconsistent: some studies find it crucial for achieving strong results, while others observe decreased performance. In this paper, we first dissect how action chunking impacts the d…

    Submitted 25 April, 2025; v1 submitted 30 August, 2024; originally announced August 2024.

    Comments: Project website: https://bid-robot.github.io/

  28. arXiv:2408.08441  [pdf, other]

    cs.LG cs.RO

    D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

    Authors: Rafael Rafailov, Kyle Hatch, Anikait Singh, Laura Smith, Aviral Kumar, Ilya Kostrikov, Philippe Hansen-Estruch, Victor Kolev, Philip Ball, Jiajun Wu, Chelsea Finn, Sergey Levine

    Abstract: Offline reinforcement learning algorithms hold the promise of enabling data-driven RL methods that do not require costly or dangerous real-world exploration and benefit from large pre-collected datasets. This in turn can facilitate real-world applications, as well as a more standardized approach to RL research. Furthermore, offline RL methods can provide effective initializations for online finetu…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: RLC 2024

  29. arXiv:2408.07199  [pdf, other]

    cs.AI cs.LG

    Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

    Authors: Pranav Putta, Edmund Mills, Naman Garg, Sumeet Motwani, Chelsea Finn, Divyansh Garg, Rafael Rafailov

    Abstract: Large Language Models (LLMs) have shown remarkable capabilities in natural language tasks requiring complex reasoning, yet their application in agentic, multi-step reasoning within interactive environments remains a difficult challenge. Traditional supervised pre-training on static datasets falls short in enabling autonomous agent capabilities needed to perform complex decision-making in dynamic s…

    Submitted 13 August, 2024; originally announced August 2024.

  30. arXiv:2407.17387  [pdf, other]

    cs.CL

    PERSONA: A Reproducible Testbed for Pluralistic Alignment

    Authors: Louis Castricato, Nathan Lile, Rafael Rafailov, Jan-Philipp Fränken, Chelsea Finn

    Abstract: The rapid advancement of language models (LMs) necessitates robust alignment with diverse user values. However, current preference optimization approaches often fail to capture the plurality of user opinions, instead reinforcing majority viewpoints and marginalizing minority perspectives. We introduce PERSONA, a reproducible test bed designed to evaluate and improve pluralistic alignment of LMs. W…

    Submitted 24 July, 2024; originally announced July 2024.

  31. arXiv:2407.12998  [pdf, other]

    cs.RO

    Surgical Robot Transformer (SRT): Imitation Learning for Surgical Tasks

    Authors: Ji Woong Kim, Tony Z. Zhao, Samuel Schmidgall, Anton Deguet, Marin Kobilarov, Chelsea Finn, Axel Krieger

    Abstract: We explore whether surgical manipulation tasks can be learned on the da Vinci robot via imitation learning. However, the da Vinci system presents unique challenges which hinder straight-forward implementation of imitation learning. Notably, its forward kinematics is inconsistent due to imprecise joint measurements, and naively training a policy using such approximate kinematics data often leads to…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 8 pages

  32. arXiv:2407.10341  [pdf, other]

    cs.RO cs.AI cs.LG

    Affordance-Guided Reinforcement Learning via Visual Prompting

    Authors: Olivia Y. Lee, Annie Xie, Kuan Fang, Karl Pertsch, Chelsea Finn

    Abstract: Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existing learning-based approaches require significant data, such as human demonstrations of success and failure, to learn task-specific reward functions. Recently, th…

    Submitted 5 March, 2025; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures. Robotics: Science and Systems (RSS) 2024, Task Specification for General-Purpose Intelligent Robots & Lifelong Robot Learning Workshops

  33. arXiv:2407.08693  [pdf, other]

    cs.RO cs.LG

    Robotic Control via Embodied Chain-of-Thought Reasoning

    Authors: Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, Sergey Levine

    Abstract: A key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models as the backbone of learned robot policies can substantially improve their robustness and generalization ability. Yet, one of the most exciting capabilities…

    Submitted 6 March, 2025; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Project Website: https://embodied-cot.github.io. Updated funding information

  34. arXiv:2407.07775  [pdf, other]

    cs.RO cs.AI

    Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

    Authors: Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan

    Abstract: An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recor…

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  35. arXiv:2407.04842  [pdf, other]

    cs.CV cs.CL cs.LG

    MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

    Authors: Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

    Abstract: While text-to-image models like DALLE-3 and Stable Diffusion are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, current multimodal judges frequent…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 42 pages, 13 figures, 33 tables

  36. arXiv:2407.02666  [pdf, other]

    cs.RO cs.AI

    Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models

    Authors: Annie S. Chen, Alec M. Lessing, Andy Tang, Govind Chada, Laura Smith, Sergey Levine, Chelsea Finn

    Abstract: Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions. For example, in a search and rescue mission, a legged robot could climb over debris, crawl through gaps, and navigate out of dead ends. However, the robot's controller needs to respond intelligently to such varied obstacles, and this requires handling unexpected and unu…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 27 pages

  37. arXiv:2406.15917  [pdf, other]

    cs.RO

    To Err is Robotic: Rapid Value-Based Trial-and-Error during Deployment

    Authors: Maximilian Du, Alexander Khazatsky, Tobias Gerstenberg, Chelsea Finn

    Abstract: When faced with a novel scenario, it can be hard to succeed on the first attempt. In these challenging situations, it is important to know how to retry quickly and meaningfully. Retrying behavior can emerge naturally in robots trained on diverse data, but such robot policies will typically only exhibit undirected retrying behavior and may not terminate a suboptimal approach before an unrecoverable…

    Submitted 22 June, 2024; originally announced June 2024.

  38. arXiv:2406.10454  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    HumanPlus: Humanoid Shadowing and Imitation from Humans

    Authors: Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, Chelsea Finn

    Abstract: One of the key arguments for building robots that have similar form factors to human beings is that we can leverage the massive human data for training. Yet, doing so has remained challenging in practice due to the complexities in humanoid perception and control, lingering physical gaps between humanoids and humans in morphologies and actuation, and lack of a data pipeline for humanoids to learn a…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: project website: https://humanoid-ai.github.io/

  39. arXiv:2406.09246  [pdf, other]

    cs.RO cs.LG

    OpenVLA: An Open-Source Vision-Language-Action Model

    Authors: Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn

    Abstract: Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control. Yet, widespread adoption of VLAs for robotics has be…

    Submitted 5 September, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Website: https://openvla.github.io/

  40. arXiv:2406.02900  [pdf, other]

    cs.LG cs.AI cs.CL

    Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

    Authors: Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit Sikchi, Joey Hejna, Bradley Knox, Chelsea Finn, Scott Niekum

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs), however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained to represent human preferences, which is in turn used by an online reinforcement learning (RL) algorithm to optimize the LLM. A prominent issue with such methods…

    Submitted 4 November, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 30 pages, 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

  41. arXiv:2405.13193  [pdf, other]

    cs.LG

    Efficient Imitation Learning with Conservative World Models

    Authors: Victor Kolev, Rafael Rafailov, Kyle Hatch, Jiajun Wu, Chelsea Finn

    Abstract: We tackle the problem of policy learning from expert demonstrations without a reward function. A central challenge in this space is that these policies fail upon deployment due to issues of distributional shift, environment stochasticity, or compounding errors. Adversarial imitation learning alleviates this issue but requires additional on-policy training samples for stability, which presents a ch…

    Submitted 15 August, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: Oral presentation, L4DC 2024

  42. arXiv:2405.12213  [pdf, other]

    cs.RO cs.LG

    Octo: An Open-Source Generalist Robot Policy

    Authors: Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine

    Abstract: Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sen…

    Submitted 26 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Project website: https://octo-models.github.io

  43. arXiv:2405.05941  [pdf, other]

    cs.RO cs.CV cs.LG

    Evaluating Real-World Robot Manipulation Policies in Simulation

    Authors: Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao

    Abstract: The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policies broaden the spectrum of tasks they can perform. We identify control and visual disparities between real and simulated environments as key challenges for reliab…

    Submitted 9 May, 2024; originally announced May 2024.

  44. arXiv:2405.02292  [pdf, other]

    cs.RO cs.LG

    ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation

    Authors: ALOHA 2 Team, Jorge Aldaco, Travis Armstrong, Robert Baruch, Jeff Bingham, Sanky Chan, Kenneth Draper, Debidatta Dwibedi, Chelsea Finn, Pete Florence, Spencer Goodrich, Wayne Gramlich, Torr Hage, Alexander Herzog, Jonathan Hoech, Thinh Nguyen, Ian Storz, Baruch Tabanpour, Leila Takayama, Jonathan Tompson, Ayzaan Wahid, Ted Wahrburg, Sichun Xu, Sergey Yaroshenko, Kevin Zakka, et al. (1 additional author not shown)

    Abstract: Diverse demonstration datasets have powered significant advances in robot learning, but the dexterity and scale of such data can be limited by the hardware cost, the hardware robustness, and the ease of teleoperation. We introduce ALOHA 2, an enhanced version of ALOHA that has greater performance, ergonomics, and robustness compared to the original design. To accelerate research in large-scale bim…

    Submitted 7 February, 2024; originally announced May 2024.

    Comments: Project website: aloha-2.github.io

  45. arXiv:2404.14367  [pdf, other]

    cs.LG

    Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

    Authors: Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar

    Abstract: Learning from preference labels plays a crucial role in fine-tuning large language models. There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning. Different methods come with different implementation tradeoffs and performance differences, and existing empirical findings present different concl…

    Submitted 2 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: International Conference on Machine Learning (ICML), 2024

  46. arXiv:2404.12358  [pdf, other]

    cs.LG

    From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

    Authors: Rafael Rafailov, Joey Hejna, Ryan Park, Chelsea Finn

    Abstract: Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preference Optimization (DPO) have emerged as an alternative approach. Although DPO solves the same objective as the standard RLHF setup, there is a mismatch…

    Submitted 12 August, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: COLM 2024

  47. arXiv:2404.10282  [pdf, other]

    cs.LG cs.CV

    Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning

    Authors: Kyle Hsu, Jubayer Ibn Hamid, Kaylee Burns, Chelsea Finn, Jiajun Wu

    Abstract: Inductive biases are crucial in disentangled representation learning for narrowing down an underspecified solution set. In this work, we consider endowing a neural network autoencoder with three select inductive biases from the literature: data compression into a grid-like latent space via quantization, collective independence amongst latents, and minimal functional influence of any latent on how…

    Submitted 24 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: ICML 2024 camera-ready. 22 pages, 10 figures, code available at https://github.com/kylehkhsu/tripod

  48. arXiv:2403.19159  [pdf, other]

    cs.CL cs.LG

    Disentangling Length from Quality in Direct Preference Optimization

    Authors: Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has been a crucial component in the recent success of Large Language Models. However, RLHF is known to exploit biases in human preferences, such as verbosity. A well-formatted and eloquent answer is often more highly rated by users, even when it is less helpful and objective. A number of approaches have been developed to control those biases in the…

    Submitted 9 September, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  49. arXiv:2403.12945  [pdf, other]

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, et al. (76 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu…

    Submitted 22 April, 2025; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  50. arXiv:2403.12910  [pdf, other]

    cs.RO cs.AI cs.LG

    Yell At Your Robot: Improving On-the-Fly from Language Corrections

    Authors: Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, Chelsea Finn

    Abstract: Hierarchical policies that combine language and low-level control have been shown to perform impressively long-horizon robotic tasks, by leveraging either zero-shot high-level planners like pretrained language and vision-language models (LLMs/VLMs) or models trained on annotated robotic demonstrations. However, for complex and dexterous skills, attaining high success rates on long-horizon tasks st…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://yay-robot.github.io/