Showing 1–29 of 29 results for author: Sasabuchi, K

  1. arXiv:2505.00871  [pdf, other]

    cs.RO cs.AI

    IK Seed Generator for Dual-Arm Human-like Physicality Robot with Mobile Base

    Authors: Jun Takamatsu, Atsushi Kanehira, Kazuhiro Sasabuchi, Naoki Wake, Katsushi Ikeuchi

    Abstract: Robots are strongly expected as a means of replacing human tasks. If a robot has a human-like physicality, the possibility of replacing human tasks increases. In the case of household service robots, it is desirable for them to be of a human-like size so that they do not become excessively large in order to coexist with humans in their operating environment. However, robots with size limitations t…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 8 pages, 12 figures, 4 tables

  2. arXiv:2504.18084  [pdf, other]

    cs.RO

    RL-Driven Data Generation for Robust Vision-Based Dexterous Grasping

    Authors: Atsushi Kanehira, Naoki Wake, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: This work presents reinforcement learning (RL)-driven data augmentation to improve the generalization of vision-action (VA) models for dexterous grasping. While real-to-sim-to-real frameworks, where a few real demonstrations seed large-scale simulated data, have proven effective for VA models, applying them to dexterous settings remains challenging: obtaining stable multi-finger contacts is nontri…

    Submitted 25 April, 2025; originally announced April 2025.

  3. arXiv:2504.04939  [pdf, other]

    cs.RO cs.AI cs.CV

    A Taxonomy of Self-Handover

    Authors: Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: Self-handover, transferring an object between one's own hands, is a common but understudied bimanual action. While it facilitates seamless transitions in complex tasks, the strategies underlying its execution remain largely unexplored. Here, we introduce the first systematic taxonomy of self-handover, derived from manual annotation of over 12 hours of cooking activity performed by 21 participants.…

    Submitted 8 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

    Comments: 8 pages, 8 figures, 1 table, Last updated on April 7th, 2025

  4. arXiv:2504.01252  [pdf, other]

    cs.RO cs.AI

    Plan-and-Act using Large Language Models for Interactive Agreement

    Authors: Kazuhiro Sasabuchi, Naoki Wake, Atsushi Kanehira, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: Recent large language models (LLMs) are capable of planning robot actions. In this paper, we explore how LLMs can be used for planning actions with tasks involving situational human-robot interaction (HRI). A key problem of applying LLMs in situational HRI is balancing between "respecting the current human's activity" and "prioritizing the robot's task," as well as understanding the timing of when…

    Submitted 1 April, 2025; originally announced April 2025.

  5. arXiv:2503.15491  [pdf, other]

    cs.HC cs.CL cs.LG cs.RO

    Agreeing to Interact in Human-Robot Interaction using Large Language Models and Vision Language Models

    Authors: Kazuhiro Sasabuchi, Naoki Wake, Atsushi Kanehira, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: In human-robot interaction (HRI), the beginning of an interaction is often complex. Whether the robot should communicate with the human is dependent on several situational factors (e.g., the current human's activity, urgency of the interaction, etc.). We test whether large language models (LLM) and vision language models (VLM) can provide solutions to this problem. We compare four different system…

    Submitted 7 January, 2025; originally announced March 2025.

  6. arXiv:2501.03968  [pdf, other]

    cs.RO cs.AI cs.CV cs.HC

    VLM-driven Behavior Tree for Context-aware Task Planning

    Authors: Naoki Wake, Atsushi Kanehira, Jun Takamatsu, Kazuhiro Sasabuchi, Katsushi Ikeuchi

    Abstract: The use of Large Language Models (LLMs) for generating Behavior Trees (BTs) has recently gained attention in the robotics community, yet remains in its early stages of development. In this paper, we propose a novel framework that leverages Vision-Language Models (VLMs) to interactively generate and edit BTs that address visual conditions, enabling context-aware robot operations in visually complex…

    Submitted 10 January, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

    Comments: 10 pages, 11 figures, 5 tables. Last updated on January 9th, 2024

  7. arXiv:2412.11337  [pdf, other]

    cs.RO cs.AI cs.CV

    Modality-Driven Design for Multi-Step Dexterous Manipulation: Insights from Neuroscience

    Authors: Naoki Wake, Atsushi Kanehira, Daichi Saito, Jun Takamatsu, Kazuhiro Sasabuchi, Hideki Koike, Katsushi Ikeuchi

    Abstract: Multi-step dexterous manipulation is a fundamental skill in household scenarios, yet remains an underexplored area in robotics. This paper proposes a modular approach, where each step of the manipulation process is addressed with dedicated policies based on effective modality input, rather than relying on a single end-to-end model. To demonstrate this, a dexterous robotic hand performs a manipulat…

    Submitted 15 December, 2024; originally announced December 2024.

    Comments: 8 pages, 5 figures, 2 tables. Last updated on December 14th, 2024

  8. Open-Vocabulary Action Localization with Iterative Visual Prompting

    Authors: Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: Video action localization aims to find the timings of specific actions from a long video. Although existing learning-based approaches have been successful, they require annotating videos, which comes with a considerable labor cost. This paper proposes a training-free, open-vocabulary approach based on emerging off-the-shelf vision-language models (VLMs). The challenge stems from the fact that VLMs…

    Submitted 7 April, 2025; v1 submitted 30 August, 2024; originally announced August 2024.

    Comments: 9 pages, 5 figures, 6 tables. Published in IEEE Access. Last updated on April 7th, 2025

  9. arXiv:2407.11436  [pdf, other]

    cs.RO

    APriCoT: Action Primitives based on Contact-state Transition for In-Hand Tool Manipulation

    Authors: Daichi Saito, Atsushi Kanehira, Kazuhiro Sasabuchi, Naoki Wake, Jun Takamatsu, Hideki Koike, Katsushi Ikeuchi

    Abstract: In-hand tool manipulation is an operation that not only manipulates a tool within the hand (i.e., in-hand manipulation) but also achieves a grasp suitable for a task after the manipulation. This study aims to achieve an in-hand tool manipulation skill through deep reinforcement learning. The difficulty of learning the skill arises because this manipulation requires (A) exploring long-term contact-…

    Submitted 16 July, 2024; originally announced July 2024.

  10. arXiv:2403.02316  [pdf, other]

    cs.RO

    Designing Library of Skill-Agents for Hardware-Level Reusability

    Authors: Jun Takamatsu, Daichi Saito, Katsushi Ikeuchi, Atsushi Kanehira, Kazuhiro Sasabuchi, Naoki Wake

    Abstract: To use new robot hardware in a new environment, it is necessary to develop a control program tailored to that specific robot in that environment. Considering the reusability of software among robots is crucial to minimize the effort involved in this process and maximize software reuse across different robots in different environments. This paper proposes a method to remedy this process by consider…

    Submitted 20 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  11. arXiv:2311.12015  [pdf, other]

    cs.RO cs.CL cs.CV

    GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration

    Authors: Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: We introduce a pipeline that enhances a general-purpose Vision Language Model, GPT-4V(ision), to facilitate one-shot visual teaching for robotic manipulation. This system analyzes videos of humans performing tasks and outputs executable robot programs that incorporate insights into affordances. The process begins with GPT-4V analyzing the videos to obtain textual explanations of environmental and…

    Submitted 26 September, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: 8 pages, 10 figures, 3 tables. Published in IEEE Robotics and Automation Letters (RA-L) (in press). Last updated on September 26th, 2024

  12. arXiv:2311.11007  [pdf, other]

    cs.RO

    Constraint-aware Policy for Compliant Manipulation

    Authors: Daichi Saito, Kazuhiro Sasabuchi, Naoki Wake, Atsushi Kanehira, Jun Takamatsu, Hideki Koike, Katsushi Ikeuchi

    Abstract: Robot manipulation in a physically-constrained environment requires compliant manipulation. Compliant manipulation is a manipulation skill to adjust hand motion based on the force imposed by the environment. Recently, reinforcement learning (RL) has been applied to solve household operations involving compliant manipulation. However, previous RL methods have primarily focused on designing a policy…

    Submitted 18 November, 2023; originally announced November 2023.

  13. arXiv:2310.11753  [pdf, other]

    cs.RO cs.CL

    Bias in Emotion Recognition with ChatGPT

    Authors: Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: This technical report explores the ability of ChatGPT in recognizing emotions from text, which can be the basis of various applications like interactive chatbots, data annotation, and mental health analysis. While prior research has shown ChatGPT's basic ability in sentiment analysis, its performance in more nuanced emotion recognition is not yet explored. Here, we conducted experiments to evaluat…

    Submitted 4 December, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: 5 pages, 4 figures, 6 tables

  14. arXiv:2306.01741  [pdf, other]

    cs.RO cs.CL

    GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System

    Authors: Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: This technical paper introduces a chatting robot system that utilizes recent advancements in large-scale language models (LLMs) such as GPT-3 and ChatGPT. The system is integrated with a co-speech gesture generation system, which selects appropriate gestures based on the conceptual meaning of speech. Our motivation is to explore ways of utilizing the recent progress in LLMs for practical robotic a…

    Submitted 10 May, 2023; originally announced June 2023.

  15. arXiv:2304.09966  [pdf, other]

    cs.RO

    Applying Learning-from-observation to household service robots: three common-sense formulation

    Authors: Katsushi Ikeuchi, Jun Takamatsu, Kazuhiro Sasabuchi, Naoki Wake, Atsushi Kanehiro

    Abstract: Utilizing a robot in a new application requires the robot to be programmed each time. To reduce such programming effort, we have been developing "Learning-from-observation (LfO)" that automatically generates robot programs by observing human demonstrations. One of the main issues with introducing this LfO system into the domain of household tasks is the cluttered environments, which cause d…

    Submitted 19 April, 2023; originally announced April 2023.

  16. ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application

    Authors: Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: This paper demonstrates how OpenAI's ChatGPT can be used in a few-shot setting to convert natural language instructions into a sequence of executable robot actions. The paper proposes easy-to-customize input prompts for ChatGPT that meet common requirements in practical applications, such as easy integration with robot execution systems and applicability to various environments while minimizing th…

    Submitted 29 August, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: 21 figures, 7 tables. Published in IEEE Access (in press). Last updated August 29th, 2023

  17. arXiv:2301.01382  [pdf, other]

    cs.RO

    Task-sequencing Simulator: Integrated Machine Learning to Execution Simulation for Robot Manipulation

    Authors: Kazuhiro Sasabuchi, Daichi Saito, Atsushi Kanehira, Naoki Wake, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: A task-sequencing simulator in robotics manipulation to integrate simulation-for-learning and simulation-for-execution is introduced. Unlike existing machine-learning simulation where a non-decomposed simulation is used to simulate a training scenario, the task-sequencing simulator runs a composed simulation using building blocks. This way, the simulation-for-learning is structured similarly to a…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: 7 pages, 6 figures

  18. Interactive Task Encoding System for Learning-from-Observation

    Authors: Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: We present the Interactive Task Encoding System (ITES) for teaching robots to perform manipulative tasks. ITES is designed as an input system for the Learning-from-Observation (LfO) framework, which enables household robots to be programmed using few-shot human demonstrations without the need for coding. In contrast to previous LfO systems that rely solely on visual demonstrations, ITES leverages…

    Submitted 28 April, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

    Comments: 6 pages, 9 figures. Submitted to and accepted by 2023 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM). Last updated April 28th, 2023

  19. arXiv:2212.09242  [pdf, other]

    cs.RO

    Learning-from-Observation System Considering Hardware-Level Reusability

    Authors: Jun Takamatsu, Kazuhiro Sasabuchi, Naoki Wake, Atsushi Kanehira, Katsushi Ikeuchi

    Abstract: Robot developers develop various types of robots for satisfying users' various demands. Users' demands are related to their backgrounds and robots suitable for users may vary. If a certain developer would offer a robot that is different from the usual to a user, the robot-specific software has to be changed. On the other hand, robot-software developers would like to reuse their developed software…

    Submitted 18 December, 2022; originally announced December 2022.

    Comments: 5 pages, 4 figures

  20. arXiv:2203.15290  [pdf, other]

    cs.RO

    Design strategies for controlling neuron-connected robots using reinforcement learning

    Authors: Haruto Sawada, Naoki Wake, Kazuhiro Sasabuchi, Jun Takamatsu, Hirokazu Takahashi, Katsushi Ikeuchi

    Abstract: Despite the growing interest in robot control utilizing the computation of biological neurons, context-dependent behavior by neuron-connected robots remains a challenge. Context-dependent behavior here is defined as behavior that is not the result of a simple sensory-motor coupling, but rather based on an understanding of the task goal. This paper proposes design principles for training neuron-con…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Last updated March 29th, 2022

  21. arXiv:2203.00733  [pdf, other]

    cs.RO

    Task-grasping from human demonstration

    Authors: Daichi Saito, Kazuhiro Sasabuchi, Naoki Wake, Jun Takamatsu, Hideki Koike, Katsushi Ikeuchi

    Abstract: A challenge in robot grasping is to achieve task-grasping which is to select a grasp that is advantageous to the success of tasks before and after grasps. One of the frameworks to address this difficulty is Learning-from-Observation (LfO), which obtains various hints from human demonstrations. This paper solves three issues in the grasping skills in the LfO framework: 1) how to functionally mimic…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: 7 pages, 8 figures

  22. arXiv:2103.02201  [pdf, other]

    cs.RO

    Semantic constraints to represent common sense required in household actions for multi-modal Learning-from-observation robot

    Authors: Katsushi Ikeuchi, Naoki Wake, Riku Arakawa, Kazuhiro Sasabuchi, Jun Takamatsu

    Abstract: The paradigm of learning-from-observation (LfO) enables a robot to learn how to perform actions by observing human-demonstrated actions. Previous research in LfO has mainly focused on the industrial domain, which only consists of the observable physical constraints between a manipulating tool and the robot's working environment. In order to extend this paradigm to the household domain which consist…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: 18 pages, 31 figures

  23. Text-driven object affordance for guiding grasp-type recognition in multimodal robot teaching

    Authors: Naoki Wake, Daichi Saito, Kazuhiro Sasabuchi, Hideki Koike, Katsushi Ikeuchi

    Abstract: This study investigates how text-driven object affordance, which provides prior knowledge about grasp types for each object, affects image-based grasp-type recognition in robot teaching. The researchers created labeled datasets of first-person hand images to examine the impact of object affordance on recognition performance. They evaluated scenarios with real and illusory objects, considering mixe…

    Submitted 12 May, 2023; v1 submitted 27 February, 2021; originally announced March 2021.

    Comments: 8 pages, 11 figures. Last updated March 12, 2023. Accepted for publication in Machine Vision and Applications

  24. arXiv:2101.05061  [pdf, other]

    cs.CV cs.RO

    Understanding Action Sequences based on Video Captioning for Learning-from-Observation

    Authors: Iori Yanokura, Naoki Wake, Kazuhiro Sasabuchi, Katsushi Ikeuchi, Masayuki Inaba

    Abstract: Learning actions from human demonstration video is promising for intelligent robotic systems. Extracting the exact section and re-observing the extracted video section in detail is important for imitating complex skills because human motions give valuable hints for robots. However, the general video understanding methods focus more on the understanding of the full frame, lacking consideration on ex…

    Submitted 9 December, 2020; originally announced January 2021.

  25. arXiv:2010.06194  [pdf]

    cs.RO

    Labeling the Phrases of a Conversational Agent with a Unique Personalized Vocabulary

    Authors: Naoki Wake, Machiko Sato, Kazuhiro Sasabuchi, Minako Nakamura, Katsushi Ikeuchi

    Abstract: Mapping spoken text to gestures is an important research topic for robots with conversation capabilities. According to studies on human co-speech gestures, a reasonable solution for mapping is using a concept-based approach in which a text is first mapped to a semantic cluster (i.e., a concept) containing texts with similar meanings. Subsequently, each concept is mapped to a predefined gesture. By…

    Submitted 12 November, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: 8 pages, 3 figures. Submitted to and accepted by IEEE/SICE SII 2022. Last updated November 12th, 2021

  26. arXiv:2009.09813  [pdf]

    cs.RO cs.CV

    Grasp-type Recognition Leveraging Object Affordance

    Authors: Naoki Wake, Kazuhiro Sasabuchi, Katsushi Ikeuchi

    Abstract: A key challenge in robot teaching is grasp-type recognition with a single RGB image and a target object name. Here, we propose a simple yet effective pipeline to enhance learning-based recognition by leveraging a prior distribution of grasp types for each object. In the pipeline, a convolutional neural network (CNN) recognizes the grasp type from an RGB image. The recognition result is further cor…

    Submitted 26 August, 2020; originally announced September 2020.

    Comments: 2 pages, 2 figures. Submitted to and accepted by HOBI (IEEE RO-MAN Workshop 2020). Last updated August 26th, 2020

  27. A Learning-from-Observation Framework: One-Shot Robot Teaching for Grasp-Manipulation-Release Household Operations

    Authors: Naoki Wake, Riku Arakawa, Iori Yanokura, Takuya Kiyokawa, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: A household robot is expected to perform various manipulative operations with an understanding of the purpose of the task. To this end, a desirable robotic application should provide an on-site robot teaching framework for non-experts. Here we propose a Learning-from-Observation (LfO) framework for grasp-manipulation-release class household operations (GMR-operations). The framework maps human dem…

    Submitted 20 October, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: 6 pages, 6 figures. Submitted to and accepted by IEEE/SICE SII 2021. Last updated October 20th, 2020

  28. Task-oriented Motion Mapping on Robots of Various Configuration using Body Role Division

    Authors: Kazuhiro Sasabuchi, Naoki Wake, Katsushi Ikeuchi

    Abstract: Many works in robot teaching either focus only on teaching task knowledge, such as geometric constraints, or motion knowledge, such as the motion for accomplishing a task. However, to effectively teach a complex task sequence to a robot, it is important to take advantage of both task and motion knowledge. The task knowledge provides the goals of each individual task within the sequence and reduces…

    Submitted 30 December, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

    Comments: 8 pages, 10 figures

  29. arXiv:2007.08705  [pdf]

    cs.RO cs.HC

    Verbal Focus-of-Attention System for Learning-from-Observation

    Authors: Naoki Wake, Iori Yanokura, Kazuhiro Sasabuchi, Katsushi Ikeuchi

    Abstract: The learning-from-observation (LfO) framework aims to map human demonstrations to a robot to reduce programming effort. To this end, an LfO system encodes a human demonstration into a series of execution units for a robot, which are referred to as task models. Although previous research has proposed successful task-model encoders, there has been little discussion on how to guide a task-model encod…

    Submitted 24 March, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: 8 pages, 7 figures. Submitted to and accepted by IEEE ICRA 2021. Last updated March 3rd, 2021