
Showing 1–32 of 32 results for author: Mees, O

  1. arXiv:2506.06862

    cs.RO cs.AI cs.CV cs.LG cs.SD eess.AS

    Multimodal Spatial Language Maps for Robot Navigation and Manipulation

    Authors: Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard

    Abstract: Grounding language to a navigating agent's observations can leverage pretrained multimodal foundation models to match perceptions to object or event descriptions. However, previous approaches remain disconnected from environment mapping, lack the spatial precision of geometric maps, or neglect additional modality information beyond vision. To address this, we propose multimodal spatial language ma…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: Accepted to the International Journal of Robotics Research (IJRR). 24 pages, 18 figures. The paper contains text from VLMaps (arXiv:2210.05714) and AVLMaps (arXiv:2303.07522). The project page is https://mslmaps.github.io/

  2. arXiv:2505.08243

    cs.RO

    Training Strategies for Efficient Embodied Reasoning

    Authors: William Chen, Suneel Belkhale, Suvir Mirchandani, Oier Mees, Danny Driess, Karl Pertsch, Sergey Levine

    Abstract: Robot chain-of-thought reasoning (CoT) -- wherein a model predicts helpful intermediate representations before choosing actions -- provides an effective method for improving the generalization and performance of robot policies, especially vision-language-action models (VLAs). While such approaches have been shown to improve performance and generalization, they suffer from core limitations, like ne…

    Submitted 17 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: Updated figure layout, added project page link

  3. arXiv:2501.09747

    cs.RO cs.LG

    FAST: Efficient Action Tokenization for Vision-Language-Action Models

    Authors: Karl Pertsch, Kyle Stachowicz, Brian Ichter, Danny Driess, Suraj Nair, Quan Vuong, Oier Mees, Chelsea Finn, Sergey Levine

    Abstract: Autoregressive sequence models, such as Transformer-based vision-language action (VLA) policies, can be tremendously effective for capturing complex and generalizable robotic behaviors. However, such models require us to choose a tokenization of our continuous action signals, which determines how the discrete symbols predicted by the model map to continuous robot actions. We find that current appr…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: Website: https://www.pi.website/research/fast

  4. arXiv:2501.04693

    cs.RO cs.AI

    Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding

    Authors: Joshua Jones, Oier Mees, Carmelo Sferrazza, Kyle Stachowicz, Pieter Abbeel, Sergey Levine

    Abstract: Interacting with the world is a multi-sensory experience: achieving effective general-purpose interaction requires making use of all available modalities -- including vision, touch, and audio -- to fill in gaps from partial observation. For example, when vision is occluded reaching into a bag, a robot should rely on its senses of touch and sound. However, state-of-the-art generalist robot policies…

    Submitted 14 January, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

  5. arXiv:2410.20018

    cs.RO cs.AI cs.CV cs.LG

    GHIL-Glue: Hierarchical Control with Filtered Subgoal Images

    Authors: Kyle B. Hatch, Ashwin Balakrishna, Oier Mees, Suraj Nair, Seohong Park, Blake Wulfe, Masha Itkina, Benjamin Eysenbach, Sergey Levine, Thomas Kollar, Benjamin Burchfiel

    Abstract: Image and video generative models that are pre-trained on Internet-scale data can greatly increase the generalization capacity of robot learning systems. These models can function as high-level planners, generating intermediate subgoals for low-level goal-conditioned policies to reach. However, the performance of these systems can be greatly bottlenecked by the interface between generative models…

    Submitted 25 October, 2024; originally announced October 2024.

    Comments: Code, model checkpoints and videos can be found at https://ghil-glue.github.io

  6. arXiv:2410.17772

    cs.RO cs.AI cs.CV cs.LG

    Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models

    Authors: Nils Blank, Moritz Reuss, Marcel Rühle, Ömer Erdinç Yağmurlu, Fabian Wenzel, Oier Mees, Rudolf Lioutikov

    Abstract: A central challenge towards developing robots that can relate human language to their perception and actions is the scarcity of natural language annotations in diverse robot datasets. Moreover, robot policies that follow natural language instructions are typically trained on either templated language or expensive human-labeled instructions, hindering their scalability. To this end, we introduce NI…

    Submitted 26 October, 2024; v1 submitted 23 October, 2024; originally announced October 2024.

    Comments: Project Website at https://robottasklabeling.github.io/

  7. arXiv:2410.13816

    cs.RO cs.LG

    Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance

    Authors: Mitsuhiko Nakamoto, Oier Mees, Aviral Kumar, Sergey Levine

    Abstract: Large, general-purpose robotic policies trained on diverse demonstration datasets have been shown to be remarkably effective both for controlling a variety of robots in a range of different scenes, and for acquiring broad repertoires of manipulation skills. However, the data that such policies are trained on is generally of mixed quality -- not only are human-collected demonstrations unlikely to p…

    Submitted 24 February, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

    Comments: Conference on Robot Learning (CoRL) 2024. Project Page: https://nakamotoo.github.io/V-GPS

  8. arXiv:2410.10088

    cs.RO cs.AI cs.CV cs.LG

    The Ingredients for Robotic Diffusion Transformers

    Authors: Sudeep Dasari, Oier Mees, Sebastian Zhao, Mohan Kumar Srirama, Sergey Levine

    Abstract: In recent years roboticists have achieved remarkable progress in solving increasingly general tasks on dexterous robotic hardware by leveraging high capacity Transformer network architectures and generative diffusion models. Unfortunately, combining these two orthogonal improvements has proven surprisingly difficult, since there is no clear and well-understood process for making important design c…

    Submitted 13 October, 2024; originally announced October 2024.

  9. arXiv:2410.03603

    cs.RO

    LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos

    Authors: Noriaki Hirose, Catherine Glossop, Ajay Sridhar, Dhruv Shah, Oier Mees, Sergey Levine

    Abstract: The world is filled with a wide variety of objects. For robots to be useful, they need the ability to find arbitrary objects described by people. In this paper, we present LeLaN (Learning Language-conditioned Navigation policy), a novel approach that consumes unlabeled, action-free egocentric data to learn scalable, language-conditioned object navigation. Our framework, LeLaN, leverages the semantic…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: 23 pages, 9 figures, 5 tables, Conference on Robot Learning 2024

  10. arXiv:2408.16228

    cs.RO cs.LG

    Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation

    Authors: Vivek Myers, Bill Chunyuan Zheng, Oier Mees, Sergey Levine, Kuan Fang

    Abstract: Learned language-conditioned robot policies often struggle to effectively adapt to new real-world tasks even when pre-trained across a diverse set of instructions. We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition provided by vision-language models (VLMs). Our method, Policy Adaptation via Language Optimization (PALO)…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 27 pages, 14 figures

    Journal ref: Conference on Robot Learning, 2024

  11. arXiv:2408.11812

    cs.RO cs.LG

    Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation

    Authors: Ria Doshi, Homer Walke, Oier Mees, Sudeep Dasari, Sergey Levine

    Abstract: Modern machine learning systems rely on large datasets to attain broad generalization, and this often poses a challenge in robot learning, where each robotic platform and task might have only a small dataset. By training a single policy across many different kinds of robots, a robot learning method can leverage much broader and more diverse datasets, which in turn can lead to better generalization…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Project website at https://crossformer-model.github.io/

  12. arXiv:2407.20635

    cs.RO cs.AI

    Autonomous Improvement of Instruction Following Skills via Foundation Models

    Authors: Zhiyuan Zhou, Pranav Atreya, Abraham Lee, Homer Walke, Oier Mees, Sergey Levine

    Abstract: Intelligent instruction-following robots capable of improving from autonomously collected experience have the potential to transform robot learning: instead of collecting costly teleoperated demonstration data, large-scale deployment of fleets of robots can quickly collect larger quantities of autonomous data that can collectively improve their performance. However, autonomous improvement requires…

    Submitted 15 October, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 2024 Conference on Robot Learning (CoRL)

    Journal ref: Conference on Robot Learning 2024

  13. arXiv:2407.08693

    cs.RO cs.LG

    Robotic Control via Embodied Chain-of-Thought Reasoning

    Authors: Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, Sergey Levine

    Abstract: A key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models as the backbone of learned robot policies can substantially improve their robustness and generalization ability. Yet, one of the most exciting capabilities…

    Submitted 6 March, 2025; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Project Website: https://embodied-cot.github.io. Updated funding information

  14. arXiv:2405.12213

    cs.RO cs.LG

    Octo: An Open-Source Generalist Robot Policy

    Authors: Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine

    Abstract: Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sen…

    Submitted 26 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Project website: https://octo-models.github.io

  15. arXiv:2405.05941

    cs.RO cs.CV cs.LG

    Evaluating Real-World Robot Manipulation Policies in Simulation

    Authors: Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao

    Abstract: The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policies broaden the spectrum of tasks they can perform. We identify control and visual disparities between real and simulated environments as key challenges for reliab…

    Submitted 9 May, 2024; originally announced May 2024.

  16. arXiv:2402.02651

    cs.LG cs.AI cs.CV

    Vision-Language Models Provide Promptable Representations for Reinforcement Learning

    Authors: William Chen, Oier Mees, Aviral Kumar, Sergey Levine

    Abstract: Humans can quickly learn new behaviors by leveraging background world knowledge. In contrast, agents trained with reinforcement learning (RL) typically learn behaviors from scratch. We thus propose a novel approach that uses the vast amounts of general and indexable world knowledge encoded in vision-language models (VLMs) pre-trained on Internet-scale data for embodied RL. We initialize policies w…

    Submitted 22 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  17. arXiv:2312.10807

    cs.RO

    Bridging Language and Action: A Survey of Language-Conditioned Robot Manipulation

    Authors: Hongkuan Zhou, Xiangtong Yao, Oier Mees, Yuan Meng, Ted Xiao, Yonatan Bisk, Jean Oh, Edward Johns, Mohit Shridhar, Dhruv Shah, Jesse Thomason, Kai Huang, Joyce Chai, Zhenshan Bing, Alois Knoll

    Abstract: Language-conditioned robot manipulation is an emerging field aimed at enabling seamless communication and cooperation between humans and robotic agents by teaching robots to comprehend and execute instructions conveyed in natural language. This interdisciplinary area integrates scene understanding, language processing, and policy learning to bridge the gap between human instructions and robotic ac…

    Submitted 17 February, 2025; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: 37 pages, 15 figures, 4 tables, 354 citations

  18. arXiv:2310.08864

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (269 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 14 May, 2025; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  19. arXiv:2303.07522

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Audio Visual Language Maps for Robot Navigation

    Authors: Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard

    Abstract: While interacting in the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments. In this work, we propose Audio-Visual-Language Maps (AVLMaps), a unified 3D spatial map representation for storing cross-modal information from audio, visual, and language cues. AVLMaps integrate the open-vocabulary capabilities of…

    Submitted 27 March, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Project page: https://avlmaps.github.io/

  20. arXiv:2210.05714

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Visual Language Maps for Robot Navigation

    Authors: Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard

    Abstract: Grounding language to the visual observations of a navigating agent can be performed using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image captions). While this is useful for matching images to natural language descriptions of object goals, it remains disjoint from the process of mapping the environment, so that it lacks the spatial precision of classic geometri…

    Submitted 8 March, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted at the 2023 IEEE International Conference on Robotics and Automation (ICRA). Project page: https://vlmaps.github.io

  21. arXiv:2210.01911

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Grounding Language with Visual Affordances over Unstructured Data

    Authors: Oier Mees, Jessica Borja-Diaz, Wolfram Burgard

    Abstract: Recent works have shown that Large Language Models (LLMs) can be applied to ground natural language to a wide variety of robot skills. However, in practice, learning multi-task, language-conditioned robotic skills typically requires large-scale data collection and frequent human intervention to reset the environment or help correcting the current policies. In this work, we propose a novel approach…

    Submitted 8 March, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Accepted at the 2023 IEEE International Conference on Robotics and Automation (ICRA). Project website: http://hulc2.cs.uni-freiburg.de

  22. arXiv:2209.08959

    cs.RO cs.AI cs.CV cs.LG

    Latent Plans for Task-Agnostic Offline Reinforcement Learning

    Authors: Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, Joschka Boedecker, Wolfram Burgard

    Abstract: Everyday tasks of long-horizon and comprising a sequence of multiple implicit subtasks still impose a major challenge in offline robot control. While a number of prior methods aimed to address this setting with variants of imitation and offline reinforcement learning, the learned behavior is typically narrow and often struggles to reach configurable long-horizon goals. As both paradigms have compl…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: CoRL 2022. Project website: http://tacorl.cs.uni-freiburg.de/

  23. arXiv:2204.06252

    cs.RO cs.AI cs.CL cs.CV

    What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data

    Authors: Oier Mees, Lukas Hermann, Wolfram Burgard

    Abstract: A long-standing goal in robotics is to build robots that can perform a wide range of daily tasks from perceptions obtained with their onboard sensors and specified only via natural language. While recently substantial advances have been achieved in language-driven robotics by leveraging end-to-end learning from pixels, there is no clear and well-understood process for making various design choices…

    Submitted 30 August, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Accepted for publication at IEEE Robotics and Automation Letters (RAL). Codebase and trained models available at http://hulc.cs.uni-freiburg.de

  24. arXiv:2203.00352

    cs.RO cs.AI cs.CV cs.LG

    Affordance Learning from Play for Sample-Efficient Policy Learning

    Authors: Jessica Borja-Diaz, Oier Mees, Gabriel Kalweit, Lukas Hermann, Joschka Boedecker, Wolfram Burgard

    Abstract: Robots operating in human-centered environments should have the ability to understand how objects function: what can be done with each object, where this interaction may occur, and how the object is used to achieve a goal. To this end, we propose a novel approach that extracts a self-supervised visual affordance model from human teleoperated play data and leverages it to enable efficient policy le…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: Accepted at the 2022 IEEE International Conference on Robotics and Automation (ICRA). Videos at http://vapo.cs.uni-freiburg.de/

  25. arXiv:2112.03227

    cs.RO cs.AI cs.CL cs.CV cs.LG

    CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

    Authors: Oier Mees, Lukas Hermann, Erick Rosete-Beas, Wolfram Burgard

    Abstract: General-purpose robots coexisting with humans in their environment must learn to relate human language to their perceptions and actions to be useful in a range of daily tasks. Moreover, they need to acquire a diverse repertoire of general-purpose skills that allow composing long-horizon tasks by following unconstrained language instructions. In this paper, we present CALVIN (Composing Actions from…

    Submitted 13 July, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: Accepted for publication at IEEE Robotics and Automation Letters (RAL). Code, models and dataset available at http://calvin.cs.uni-freiburg.de

  26. arXiv:2102.08094

    cs.RO cs.AI cs.CV cs.LG

    Composing Pick-and-Place Tasks By Grounding Language

    Authors: Oier Mees, Wolfram Burgard

    Abstract: Controlling robots to perform tasks via natural language is one of the most challenging topics in human-robot interaction. In this work, we present a robot system that follows unconstrained language instructions to pick and place arbitrary objects and effectively resolves ambiguities through dialogues. Our approach infers objects and their relationships from input images and language expressions a…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: Accepted at the International Symposium on Experimental Robotics (ISER) 2020. Videos at http://speechrobot.cs.uni-freiburg.de

  27. arXiv:2008.00456

    cs.RO cs.CV cs.LG

    Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction

    Authors: Iman Nematollahi, Oier Mees, Lukas Hermann, Wolfram Burgard

    Abstract: A key challenge for an agent learning to interact with the world is to reason about physical properties of objects and to foresee their dynamics under the effect of applied forces. In order to scale learning through interaction to many objects and scenes, robots should be able to improve their own performance from real-world experience without requiring human supervision. To this end, we propose a…

    Submitted 2 August, 2020; originally announced August 2020.

    Comments: Accepted at the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  28. arXiv:2001.08481

    cs.RO cs.AI cs.CV

    Learning Object Placements For Relational Instructions by Hallucinating Scene Representations

    Authors: Oier Mees, Alp Emek, Johan Vertens, Wolfram Burgard

    Abstract: Robots coexisting with humans in their environment and performing services for them need the ability to interact with them. One particular requirement for such robots is that they are able to understand spatial relations and can place objects in accordance with the spatial relations expressed by their user. In this work, we present a convolutional neural network for estimating pixelwise object pla…

    Submitted 21 February, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

    Comments: Accepted at the 2020 IEEE International Conference on Robotics and Automation (ICRA). Video at https://www.youtube.com/watch?v=zaZkHTWFMKM

  29. arXiv:1910.09430

    cs.CV cs.LG cs.RO

    Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video

    Authors: Oier Mees, Markus Merklinger, Gabriel Kalweit, Wolfram Burgard

    Abstract: Key challenges for the deployment of reinforcement learning (RL) agents in the real world are the discovery, representation and reuse of skills in the absence of a reward function. To this end, we propose a novel approach to learn a task-agnostic skill embedding space from unlabeled multi-view videos. Our method learns a general skill embedding independently from the task context by using an adver…

    Submitted 6 February, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: Accepted at the 2020 IEEE International Conference on Robotics and Automation (ICRA). Video at https://www.youtube.com/watch?v=z8gG1k9kSqA. Project page at http://robotskills.cs.uni-freiburg.de

  30. arXiv:1910.07948

    cs.RO cs.CV cs.LG eess.IV

    Self-supervised 3D Shape and Viewpoint Estimation from Single Images for Robotics

    Authors: Oier Mees, Maxim Tatarchenko, Thomas Brox, Wolfram Burgard

    Abstract: We present a convolutional neural network for joint 3D shape prediction and viewpoint estimation from a single input image. During training, our network gets the learning signal from a silhouette of an object in the input image - a form of self-supervision. It does not require ground truth data for 3D shapes and the viewpoints. Because it relies on such a weak form of supervision, our approach can…

    Submitted 17 October, 2019; originally announced October 2019.

    Comments: Accepted at the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Video at https://www.youtube.com/watch?v=oQgHG9JdMP4

  31. arXiv:1707.05733

    cs.RO cs.AI cs.CV cs.LG

    Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments

    Authors: Oier Mees, Andreas Eitel, Wolfram Burgard

    Abstract: Object detection is an essential task for autonomous robots operating in dynamic and changing environments. A robot should be able to detect objects in the presence of sensor noise that can be induced by changing lighting conditions for cameras and false depth readings for range sensors, especially RGB-D cameras. To tackle these challenges, we propose a novel adaptive fusion approach for object de…

    Submitted 19 November, 2019; v1 submitted 18 July, 2017; originally announced July 2017.

    Comments: Published at the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems. Added a new baseline with respect to the IROS version. Project page with code, pretrained models and our InOutDoorPeople RGB-D dataset at http://adaptivefusion.cs.uni-freiburg.de/

  32. arXiv:1703.01946

    cs.RO cs.AI cs.LG

    Metric Learning for Generalizing Spatial Relations to New Objects

    Authors: Oier Mees, Nichola Abdo, Mladen Mazuran, Wolfram Burgard

    Abstract: Human-centered environments are rich with a wide variety of spatial relations between everyday objects. For autonomous robots to operate effectively in such environments, they should be able to reason about these relations and generalize them to objects with different shapes and sizes. For example, having learned to place a toy inside a basket, a robot should be able to generalize this concept usi…

    Submitted 24 July, 2017; v1 submitted 6 March, 2017; originally announced March 2017.

    Comments: Accepted at the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems. The new Freiburg Spatial Relations Dataset and a demo video of our approach running on the PR-2 robot are available at our project website: http://spatialrelations.cs.uni-freiburg.de