Showing 1–23 of 23 results for author: Itkina, M

  1. arXiv:2507.05331  [pdf, ps, other]

    cs.RO

    A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation

    Authors: TRI LBM Team, Jose Barreiros, Andrew Beaulieu, Aditya Bhat, Rick Cory, Eric Cousineau, Hongkai Dai, Ching-Hsin Fang, Kunimatsu Hashimoto, Muhammad Zubair Irshad, Masha Itkina, Naveen Kuppuswamy, Kuan-Hui Lee, Katherine Liu, Dale McConachie, Ian McMahon, Haruki Nishimura, Calder Phillips-Grafflin, Charles Richter, Paarth Shah, Krishnan Srinivasan, Blake Wulfe, Chen Xu, Mengchao Zhang, Alex Alspach, et al. (57 additional authors not shown)

    Abstract: Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. While these models have garnere…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2506.19121  [pdf, ps, other]

    cs.RO cs.AI cs.LG

    CUPID: Curating Data your Robot Loves with Influence Functions

    Authors: Christopher Agia, Rohan Sinha, Jingyun Yang, Rika Antonova, Marco Pavone, Haruki Nishimura, Masha Itkina, Jeannette Bohg

    Abstract: In robot imitation learning, policy performance is tightly coupled with the quality and composition of the demonstration data. Yet, developing a precise understanding of how individual demonstrations contribute to downstream outcomes - such as closed-loop task success or failure - remains a persistent challenge. We propose CUPID, a robot data curation method based on a novel influence function-the…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Project page: https://cupid-curation.github.io. 28 pages, 15 figures

    ACM Class: I.2.6; I.2.9

  3. arXiv:2506.09937  [pdf, ps, other]

    cs.RO cs.AI

    SAFE: Multitask Failure Detection for Vision-Language-Action Models

    Authors: Qiao Gu, Yuanliang Ju, Shengxiang Sun, Igor Gilitschenski, Haruki Nishimura, Masha Itkina, Florian Shkurti

    Abstract: While vision-language-action models (VLAs) have shown promising robotic behaviors across a diverse set of manipulation tasks, they achieve limited success rates when deployed on novel tasks out-of-the-box. To allow these policies to safely interact with their environments, we need a failure detector that gives a timely alert such that the robot can stop, backtrack, or ask for help. However, existi…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Project Page: https://vla-safe.github.io/

  4. arXiv:2505.20781  [pdf, ps, other]

    cs.RO cs.LG

    STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation

    Authors: Hossein Goli, Michael Gimelfarb, Nathan Samuel de Lara, Haruki Nishimura, Masha Itkina, Florian Shkurti

    Abstract: Off-policy evaluation (OPE) estimates the performance of a target policy using offline data collected from a behavior policy, and is crucial in domains such as robotics or healthcare where direct interaction with the environment is costly or unsafe. Existing OPE methods are ineffective for high-dimensional, long-horizon problems, due to exponential blow-ups in variance from importance weighting or…

    Submitted 27 May, 2025; originally announced May 2025.

  5. arXiv:2503.10966  [pdf, ps, other]

    cs.RO stat.ML

    Is Your Imitation Learning Policy Better than Mine? Policy Comparison with Near-Optimal Stopping

    Authors: David Snyder, Asher James Hancock, Apurva Badithela, Emma Dixon, Patrick Miller, Rares Andrei Ambrus, Anirudha Majumdar, Masha Itkina, Haruki Nishimura

    Abstract: Imitation learning has enabled robots to perform complex, long-horizon tasks in challenging dexterous manipulation settings. As new methods are developed, they must be rigorously evaluated and compared against corresponding baselines through repeated evaluation trials. However, policy comparison is fundamentally constrained by a small feasible sample size (e.g., 10 or 50) due to significant human…

    Submitted 6 June, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: 14 + 5 pages, 10 figures, 4 tables. Accepted to RSS 2025

  6. arXiv:2503.08558  [pdf, ps, other]

    cs.RO cs.AI cs.LG

    Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies

    Authors: Chen Xu, Tony Khuong Nguyen, Emma Dixon, Christopher Rodriguez, Patrick Miller, Robert Lee, Paarth Shah, Rares Ambrus, Haruki Nishimura, Masha Itkina

    Abstract: Recent years have witnessed impressive robotic manipulation systems driven by advances in imitation learning and generative modeling, such as diffusion- and flow-based approaches. As robot policy performance increases, so does the complexity and time horizon of achievable tasks, inducing unexpected and diverse failure modes that are difficult to predict a priori. To enable trustworthy policy deplo…

    Submitted 20 June, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: Accepted by Robotics: Science and Systems 2025

  7. arXiv:2410.20018  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    GHIL-Glue: Hierarchical Control with Filtered Subgoal Images

    Authors: Kyle B. Hatch, Ashwin Balakrishna, Oier Mees, Suraj Nair, Seohong Park, Blake Wulfe, Masha Itkina, Benjamin Eysenbach, Sergey Levine, Thomas Kollar, Benjamin Burchfiel

    Abstract: Image and video generative models that are pre-trained on Internet-scale data can greatly increase the generalization capacity of robot learning systems. These models can function as high-level planners, generating intermediate subgoals for low-level goal-conditioned policies to reach. However, the performance of these systems can be greatly bottlenecked by the interface between generative models…

    Submitted 25 October, 2024; originally announced October 2024.

    Comments: Code, model checkpoints and videos can be found at https://ghil-glue.github.io

  8. arXiv:2407.21126  [pdf, other]

    cs.CV cs.RO

    Self-supervised Multi-future Occupancy Forecasting for Autonomous Driving

    Authors: Bernard Lange, Masha Itkina, Jiachen Li, Mykel J. Kochenderfer

    Abstract: Environment prediction frameworks are critical for the safe navigation of autonomous vehicles (AVs) in dynamic settings. LiDAR-generated occupancy grid maps (L-OGMs) offer a robust bird's-eye view for the scene representation, enabling self-supervised joint scene predictions while exhibiting resilience to partial observability and perception detection failures. Prior approaches have focused on det…

    Submitted 22 May, 2025; v1 submitted 30 July, 2024; originally announced July 2024.

  9. arXiv:2405.05439  [pdf, other]

    cs.RO cs.AI cs.LG stat.AP

    How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation

    Authors: Joseph A. Vincent, Haruki Nishimura, Masha Itkina, Paarth Shah, Mac Schwager, Thomas Kollar

    Abstract: With the rise of stochastic generative models in robot policy learning, end-to-end visuomotor policies are increasingly successful at solving complex tasks by learning from human demonstrations. Nevertheless, since real-world evaluation costs afford users only a small number of policy rollouts, it remains a challenge to accurately gauge the performance of such policies. This is exacerbated by dist…

    Submitted 18 July, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  10. arXiv:2403.15941  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Explore until Confident: Efficient Exploration for Embodied Question Answering

    Authors: Allen Z. Ren, Jaden Clark, Anushri Dixit, Masha Itkina, Anirudha Majumdar, Dorsa Sadigh

    Abstract: We consider the problem of Embodied Question Answering (EQA), which refers to settings where an embodied agent such as a robot needs to actively explore an environment to gather information until it is confident about the answer to a question. In this work, we leverage the strong semantic reasoning capabilities of large vision-language models (VLMs) to efficiently explore and answer such questions…

    Submitted 7 July, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: Robotics: Science and Systems (RSS) 2024

  11. arXiv:2403.12945  [pdf, other]

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, et al. (76 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu…

    Submitted 22 April, 2025; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  12. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (269 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 14 May, 2025; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  13. arXiv:2211.08701  [pdf, other]

    cs.RO cs.CV cs.LG

    Interpretable Self-Aware Neural Networks for Robust Trajectory Prediction

    Authors: Masha Itkina, Mykel J. Kochenderfer

    Abstract: Although neural networks have seen tremendous success as predictive models in a variety of domains, they can be overly confident in their predictions on out-of-distribution (OOD) data. To be viable for safety-critical applications, like autonomous vehicles, neural networks must accurately estimate their epistemic or model uncertainty, achieving a level of system self-awareness. Techniques for epis…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: Conference on Robot Learning (CoRL) 2022, 15 pages, 4 figures

    ACM Class: I.2.9; I.2.6; I.2.10

  14. arXiv:2210.01249  [pdf, other]

    cs.RO cs.CV

    LOPR: Latent Occupancy PRediction using Generative Models

    Authors: Bernard Lange, Masha Itkina, Mykel J. Kochenderfer

    Abstract: Environment prediction frameworks are integral for autonomous vehicles, enabling safe navigation in dynamic environments. LiDAR generated occupancy grid maps (L-OGMs) offer a robust bird's eye-view scene representation that facilitates joint scene predictions without relying on manual labeling unlike commonly used trajectory prediction frameworks. Prior approaches have optimized deterministic L-OG…

    Submitted 24 August, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

  15. arXiv:2210.00552  [pdf, other]

    cs.RO cs.HC cs.LG

    Occlusion-Aware Crowd Navigation Using People as Sensors

    Authors: Ye-Ji Mun, Masha Itkina, Shuijing Liu, Katherine Driggs-Campbell

    Abstract: Autonomous navigation in crowded spaces poses a challenge for mobile robots due to the highly dynamic, partially observable environment. Occlusions are highly prevalent in such settings due to a limited sensor field of view and obstructing human agents. Previous work has shown that observed interactive behaviors of human agents can be used to estimate potential obstacles despite occlusions. We pro…

    Submitted 28 April, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: 7 pages. Accepted to 2023 IEEE International Conference on Robotics and Automation (ICRA)

  16. arXiv:2203.14155  [pdf, other]

    cs.RO cs.AI cs.LG

    How Do We Fail? Stress Testing Perception in Autonomous Vehicles

    Authors: Harrison Delecki, Masha Itkina, Bernard Lange, Ransalu Senanayake, Mykel J. Kochenderfer

    Abstract: Autonomous vehicles (AVs) rely on environment perception and behavior prediction to reason about agents in their surroundings. These perception systems must be robust to adverse weather such as rain, fog, and snow. However, validation of these systems is challenging due to their complexity and dependence on observation histories. This paper presents a method for characterizing failures of LiDAR-ba…

    Submitted 26 March, 2022; originally announced March 2022.

    Comments: Submitted to IEEE IROS 2022

  17. arXiv:2110.14182  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models

    Authors: Phil Chen, Masha Itkina, Ransalu Senanayake, Mykel J. Kochenderfer

    Abstract: Many applications of generative models rely on the marginalization of their high-dimensional output probability distributions. Normalization functions that yield sparse probability distributions can make exact marginalization more computationally tractable. However, sparse normalization functions usually require alternative loss functions for training since the log-likelihood is undefined for spar…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021. Code is available at https://github.com/sisl/EvSoftmax

  18. arXiv:2109.02173  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG cs.MA

    Multi-Agent Variational Occlusion Inference Using People as Sensors

    Authors: Masha Itkina, Ye-Ji Mun, Katherine Driggs-Campbell, Mykel J. Kochenderfer

    Abstract: Autonomous vehicles must reason about spatial occlusions in urban environments to ensure safety without being overly cautious. Prior work explored occlusion inference from observed social behaviors of road agents, hence treating people as sensors. Inferring occupancy from agent behaviors is an inherently multimodal problem; a driver may behave similarly for different occupancy patterns ahead of th…

    Submitted 2 March, 2022; v1 submitted 5 September, 2021; originally announced September 2021.

    Comments: 12 pages, 9 figures, International Conference on Robotics and Automation (ICRA) 2022

    ACM Class: I.2.9; I.2.10

  19. arXiv:2011.09045  [pdf, other]

    cs.RO

    Double-Prong ConvLSTM for Spatiotemporal Occupancy Prediction in Dynamic Environments

    Authors: Maneekwan Toyungyernsub, Masha Itkina, Ransalu Senanayake, Mykel J. Kochenderfer

    Abstract: Predicting the future occupancy state of an environment is important to enable informed decisions for autonomous vehicles. Common challenges in occupancy prediction include vanishing dynamic objects and blurred predictions, especially for long prediction horizons. In this work, we propose a double-prong neural network architecture to predict the spatiotemporal evolution of the occupancy state. One…

    Submitted 27 September, 2022; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: Accepted at 2021 International Conference on Robotics and Automation (ICRA 2021)

    ACM Class: I.2.9; I.2.10

  20. arXiv:2011.01413  [pdf, other]

    cs.CV cs.RO

    Out-of-Distribution Detection for Automotive Perception

    Authors: Julia Nitsch, Masha Itkina, Ransalu Senanayake, Juan Nieto, Max Schmidt, Roland Siegwart, Mykel J. Kochenderfer, Cesar Cadena

    Abstract: Neural networks (NNs) are widely used for object classification in autonomous driving. However, NNs can fail on input data not well represented by the training dataset, known as out-of-distribution (OOD) data. A mechanism to detect OOD samples is important for safety-critical applications, such as automotive perception, to trigger a safe fallback mode. NNs often rely on softmax normalization for c…

    Submitted 5 September, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: 6 pages, 4 figures, paper accepted at Intelligent Transportation Systems Conference (ITSC) 2021

    ACM Class: I.2.10; I.2.9

  21. arXiv:2010.09662  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Attention Augmented ConvLSTM for Environment Prediction

    Authors: Bernard Lange, Masha Itkina, Mykel J. Kochenderfer

    Abstract: Safe and proactive planning in robotic systems generally requires accurate predictions of the environment. Prior work on environment prediction applied video frame prediction techniques to bird's-eye view environment representations, such as occupancy grids. ConvLSTM-based frameworks used previously often result in significant blurring and vanishing of moving objects, thus hindering their applicab…

    Submitted 10 September, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: Accepted to be published on 2021 International Conference on Intelligent Robots and Systems (IROS)

    ACM Class: I.2.9; I.2.10

  22. arXiv:2010.09164  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders

    Authors: Masha Itkina, Boris Ivanovic, Ransalu Senanayake, Mykel J. Kochenderfer, Marco Pavone

    Abstract: Discrete latent spaces in variational autoencoders have been shown to effectively capture the data distribution for many real-world problems such as natural language understanding, human intent prediction, and visual scene representation. However, discrete latent spaces need to be sufficiently large to capture the complexities of real-world data, rendering downstream tasks computationally challeng…

    Submitted 18 January, 2021; v1 submitted 18 October, 2020; originally announced October 2020.

    Comments: 21 pages, 15 figures, 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

    ACM Class: I.2.10; I.2.9; I.2.6

  23. arXiv:1904.12374  [pdf, other]

    cs.CV cs.LG cs.RO

    Dynamic Environment Prediction in Urban Scenes using Recurrent Representation Learning

    Authors: Masha Itkina, Katherine Driggs-Campbell, Mykel J. Kochenderfer

    Abstract: A key challenge for autonomous driving is safe trajectory planning in cluttered, urban environments with dynamic obstacles, such as pedestrians, bicyclists, and other vehicles. A reliable prediction of the future environment, including the behavior of dynamic agents, would allow planning algorithms to proactively generate a trajectory in response to a rapidly changing environment. We present a nov…

    Submitted 18 August, 2019; v1 submitted 28 April, 2019; originally announced April 2019.

    Comments: 8 pages, updated final draft, accepted into Intelligent Transportation Systems Conference (ITSC) 2019