Showing 1–50 of 80 results for author: Kasai, H

  1. arXiv:2504.19256  [pdf, other]

    cs.CV cs.RO

    LM-MCVT: A Lightweight Multi-modal Multi-view Convolutional-Vision Transformer Approach for 3D Object Recognition

    Authors: Songsong Xiong, Hamidreza Kasaei

    Abstract: In human-centered environments such as restaurants, homes, and warehouses, robots often face challenges in accurately recognizing 3D objects. These challenges stem from the complexity and variability of these environments, including diverse object shapes. In this paper, we propose a novel Lightweight Multi-modal Multi-view Convolutional-Vision Transformer network (LM-MCVT) to enhance 3D object rec…

    Submitted 27 April, 2025; originally announced April 2025.

  2. arXiv:2504.03500  [pdf, other]

    cs.RO cs.LG

    Learning Dual-Arm Coordination for Grasping Large Flat Objects

    Authors: Yongliang Wang, Hamidreza Kasaei

    Abstract: Grasping large flat objects, such as books or keyboards lying horizontally, presents significant challenges for single-arm robotic systems, often requiring extra actions like pushing objects against walls or moving them to the edge of a surface to facilitate grasping. In contrast, dual-arm manipulation, inspired by human dexterity, offers a more refined solution by directly coordinating both arms…

    Submitted 4 April, 2025; originally announced April 2025.

  3. arXiv:2503.10334  [pdf, other]

    cs.RO

    Enhanced View Planning for Robotic Harvesting: Tackling Occlusions with Imitation Learning

    Authors: Lun Li, Hamidreza Kasaei

    Abstract: In agricultural automation, inherent occlusion presents a major challenge for robotic harvesting. We propose a novel imitation learning-based viewpoint planning approach to actively adjust camera viewpoint and capture unobstructed images of the target crop. Traditional viewpoint planners and existing learning-based methods depend on manually designed evaluation metrics or reward functions, often…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Accepted at ICRA 2025

  4. arXiv:2502.01809  [pdf, other]

    cs.LG

    Self-supervised Subgraph Neural Network With Deep Reinforcement Walk Exploration

    Authors: Jianming Huang, Hiroyuki Kasai

    Abstract: Graph data, with its structurally variable nature, represents complex real-world phenomena like chemical compounds, protein structures, and social networks. Traditional Graph Neural Networks (GNNs) primarily utilize the message-passing mechanism, but their expressive power is limited and their prediction lacks explainability. To address these limitations, researchers have focused on graph substruc…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: 20 pages, 5 figures

  5. arXiv:2412.04052  [pdf, other]

    cs.RO

    Learning Dual-Arm Push and Grasp Synergy in Dense Clutter

    Authors: Yongliang Wang, Hamidreza Kasaei

    Abstract: Robotic grasping in densely cluttered environments is challenging due to scarce collision-free grasp affordances. Non-prehensile actions can increase feasible grasps in cluttered environments, but most research focuses on single-arm rather than dual-arm manipulation. Policies from single-arm systems fail to fully leverage the advantages of dual-arm coordination. We propose a target-oriented hierar…

    Submitted 2 April, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

  6. arXiv:2411.12503  [pdf, other]

    cs.RO

    ManiSkill-ViTac 2025: Challenge on Manipulation Skill Learning With Vision and Tactile Sensing

    Authors: Chuanyu Li, Renjun Dang, Xiang Li, Zhiyuan Wu, Jing Xu, Hamidreza Kasaei, Roberto Calandra, Nathan Lepora, Shan Luo, Hao Su, Rui Chen

    Abstract: This article introduces the ManiSkill-ViTac Challenge 2025, which focuses on learning contact-rich manipulation skills using both tactile and visual sensing. Expanding upon the 2024 challenge, ManiSkill-ViTac 2025 includes 3 independent tracks: tactile manipulation, tactile-vision fusion manipulation, and tactile sensor structure design. The challenge aims to push the boundaries of robotic manipul…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: Challenge webpage: https://ai-workshops.github.io/maniskill-vitac-challenge-2025/

  7. arXiv:2411.11609  [pdf, other]

    cs.RO

    VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation

    Authors: Bangguo Yu, Yuzhen Liu, Lei Han, Hamidreza Kasaei, Tingguang Li, Ming Cao

    Abstract: Following human instructions to explore and search for a specified target in an unfamiliar environment is a crucial skill for mobile service robots. Most of the previous works on object goal navigation have typically focused on a single input modality as the target, which may lead to limited consideration of language descriptions containing detailed attributes and spatial relationships. To address…

    Submitted 18 November, 2024; originally announced November 2024.

    Comments: 15 pages, 9 figures

  8. arXiv:2410.04302  [pdf, other]

    cs.RO

    PANav: Toward Privacy-Aware Robot Navigation via Vision-Language Models

    Authors: Bangguo Yu, Hamidreza Kasaei, Ming Cao

    Abstract: Navigating robots discreetly in human work environments while considering the possible privacy implications of robotic tasks presents significant challenges. Such scenarios are increasingly common, for instance, when robots transport sensitive objects that demand high levels of privacy in spaces crowded with human activities. While extensive research has been conducted on robotic path planning and…

    Submitted 5 October, 2024; originally announced October 2024.

    Comments: 7 pages, 6 figures, conference

  9. arXiv:2410.03522  [pdf, other]

    cs.RO

    HMT-Grasp: A Hybrid Mamba-Transformer Approach for Robot Grasping in Cluttered Environments

    Authors: Songsong Xiong, Hamidreza Kasaei

    Abstract: Robot grasping, whether handling isolated objects, cluttered items, or stacked objects, plays a critical role in industrial and service applications. However, current visual grasp detection methods based on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) often struggle to adapt to diverse scenarios, as they tend to emphasize either local or global features exclusively, neglecti…

    Submitted 9 March, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

  10. arXiv:2410.03031  [pdf, other]

    cs.RO

    Single-Shot 6DoF Pose and 3D Size Estimation for Robotic Strawberry Harvesting

    Authors: Lun Li, Hamidreza Kasaei

    Abstract: In this study, we introduce a deep-learning approach for determining both the 6DoF pose and 3D size of strawberries, aiming to significantly augment robotic harvesting efficiency. Our model was trained on a synthetic strawberry dataset, which is automatically generated within the Ignition Gazebo simulator, with a specific focus on the inherent symmetry exhibited by strawberries. By leveraging doma…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: Accepted at IROS 2024

  11. arXiv:2407.21244  [pdf, other]

    cs.RO cs.AI cs.CV

    VITAL: Interactive Few-Shot Imitation Learning via Visual Human-in-the-Loop Corrections

    Authors: Hamidreza Kasaei, Mohammadreza Kasaei

    Abstract: Imitation Learning (IL) has emerged as a powerful approach in robotics, allowing robots to acquire new skills by mimicking human actions. Despite its potential, the data collection process for IL remains a significant challenge due to the logistical difficulties and high costs associated with obtaining high-quality demonstrations. To address these issues, we propose a large-scale data generation f…

    Submitted 21 May, 2025; v1 submitted 30 July, 2024; originally announced July 2024.

  12. arXiv:2406.18746  [pdf, other]

    cs.RO

    Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models

    Authors: Georgios Tziafas, Hamidreza Kasaei

    Abstract: Large Language Models (LLMs) have emerged as a new paradigm for embodied reasoning and control, most recently by generating robot policy code that utilizes a custom library of vision and control primitive skills. However, prior arts fix their skills library and steer the LLM with carefully hand-crafted prompt engineering, limiting the agent to a stationary range of addressable tasks. In this work,…

    Submitted 15 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: ICRA 2024

  13. arXiv:2406.18742  [pdf, other]

    cs.CV cs.RO

    3D Feature Distillation with Object-Centric Priors

    Authors: Georgios Tziafas, Yucheng Xu, Zhibin Li, Hamidreza Kasaei

    Abstract: Grounding natural language to the physical world is a ubiquitous topic with a wide range of applications in computer vision and robotics. Recently, 2D vision-language models such as CLIP have been widely popularized, due to their impressive capabilities for open-vocabulary grounding in 2D images. Recent works aim to elevate 2D CLIP features to 3D via feature distillation, but either learn neural f…

    Submitted 5 October, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  14. arXiv:2406.18722  [pdf, other]

    cs.RO cs.CV

    Towards Open-World Grasping with Large Vision-Language Models

    Authors: Georgios Tziafas, Hamidreza Kasaei

    Abstract: The ability to grasp objects in-the-wild from open-ended language instructions constitutes a fundamental challenge in robotics. An open-world grasping system should be able to combine high-level contextual with low-level physical-geometric reasoning in order to be applicable in arbitrary scenarios. Recent works exploit the web-scale knowledge inherent in large language models (LLMs) to plan and re…

    Submitted 13 October, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 8th Conference on Robot Learning (CoRL 2024), Munich, Germany

  15. arXiv:2402.16045  [pdf, other]

    cs.RO

    Harnessing the Synergy between Pushing, Grasping, and Throwing to Enhance Object Manipulation in Cluttered Scenarios

    Authors: Hamidreza Kasaei, Mohammadreza Kasaei

    Abstract: In this work, we delve into the intricate synergy among non-prehensile actions like pushing, and prehensile actions such as grasping and throwing, within the domain of robotic manipulation. We introduce an innovative approach to learning these synergies by leveraging model-free deep reinforcement learning. The robot's workflow involves detecting the pose of the target object and the basket at each…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted at the 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

  16. arXiv:2311.05779  [pdf, other]

    cs.RO cs.CV

    Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter

    Authors: Georgios Tziafas, Yucheng Xu, Arushi Goel, Mohammadreza Kasaei, Zhibin Li, Hamidreza Kasaei

    Abstract: Robots operating in human-centric environments require the integration of visual grounding and grasping capabilities to effectively manipulate objects based on user instructions. This work focuses on the task of referring grasp synthesis, which predicts a grasp pose for an object referred through natural language in cluttered scenes. Existing approaches often employ multi-stage pipelines that firs…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: Poster CoRL 2023. Dataset and code available here: https://github.com/gtziafas/OCID-VLG

  17. Anchor Space Optimal Transport as a Fast Solution to Multiple Optimal Transport Problems

    Authors: Jianming Huang, Xun Su, Zhongxi Fang, Hiroyuki Kasai

    Abstract: In machine learning, Optimal Transport (OT) theory is extensively utilized to compare probability distributions across various applications, such as graph data represented by node distributions and image data represented by pixel distributions. In practical scenarios, it is often necessary to solve multiple OT problems. Traditionally, these problems are treated independently, with each OT problem…

    Submitted 29 January, 2025; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: 26 pages, 4 figures, 6 tables

    Journal ref: IEEE Transactions on Neural Networks and Learning Systems, early access (2024) 1-12

  18. arXiv:2310.07937  [pdf, other]

    cs.RO cs.AI

    Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation Using Vision Language Models

    Authors: Bangguo Yu, Qihao Yuan, Kailai Li, Hamidreza Kasaei, Ming Cao

    Abstract: Visual target navigation is a critical capability for autonomous robots operating in unknown environments, particularly in human-robot interaction scenarios. While classical and learning-based methods have shown promise, most existing approaches lack common-sense reasoning and are typically designed for single-robot settings, leading to reduced efficiency and robustness in complex environments. To…

    Submitted 6 May, 2025; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 8 pages, 4 figures

  19. arXiv:2307.00247  [pdf, other]

    math.OC cs.LG

    Safe Screening for Unbalanced Optimal Transport

    Authors: Xun Su, Zhongxi Fang, Hiroyuki Kasai

    Abstract: This paper introduces a framework that utilizes the Safe Screening technique to accelerate the optimization process of the Unbalanced Optimal Transport (UOT) problem by proactively identifying and eliminating zero elements in the sparse solutions. We demonstrate the feasibility of applying Safe Screening to the UOT problem with $\ell_2$-penalty and KL-penalty by conducting an analysis of the solut…

    Submitted 1 July, 2023; originally announced July 2023.

  20. arXiv:2306.15919  [pdf, other]

    cs.CV cs.AI

    Fine-grained 3D object recognition: an approach and experiments

    Authors: Junhyung Jo, Hamidreza Kasaei

    Abstract: Three-dimensional (3D) object recognition technology is being used as a core technology in advanced technologies such as autonomous driving of automobiles. There are two sets of approaches for 3D object recognition: (i) hand-crafted approaches like Global Orthographic Object Descriptor (GOOD), and (ii) deep learning-based approaches such as MobileNet and VGG. However, it is needed to know which of…

    Submitted 28 June, 2023; originally announced June 2023.

  21. arXiv:2304.07236  [pdf, other]

    cs.RO

    Learning Perceptive Bipedal Locomotion over Irregular Terrain

    Authors: Bart van Marum, Matthia Sabatelli, Hamidreza Kasaei

    Abstract: In this paper we propose a novel bipedal locomotion controller that uses noisy exteroception to traverse a wide variety of terrains. Building on the cutting-edge advancements in attention based belief encoding for quadrupedal locomotion, our work extends these methods to the bipedal domain, resulting in a robust and reliable internal belief of the terrain ahead despite noisy sensor inputs. Additio…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: 8 pages, 10 figures

  22. Frontier Semantic Exploration for Visual Target Navigation

    Authors: Bangguo Yu, Hamidreza Kasaei, Ming Cao

    Abstract: This work focuses on the problem of visual target navigation, which is very important for autonomous robots as it is closely related to high-level tasks. To find a special object in unknown environments, classical and learning-based approaches are fundamental components of navigation that have been investigated thoroughly in the past. However, due to the difficulty in the representation of complic…

    Submitted 25 December, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: 7 pages

    Journal ref: 2023 IEEE International Conference on Robotics and Automation (ICRA)

  23. L3MVN: Leveraging Large Language Models for Visual Target Navigation

    Authors: Bangguo Yu, Hamidreza Kasaei, Ming Cao

    Abstract: Visual target navigation in unknown environments is a crucial problem in robotics. Despite extensive investigation of classical and learning-based approaches in the past, robots lack common-sense knowledge about household objects and layouts. Prior state-of-the-art approaches to this task rely on learning the priors during the training and typically require significant expensive resources and time…

    Submitted 25 December, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: 7 pages

    Journal ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  24. arXiv:2303.05323  [pdf, other]

    cs.CV

    Controllable Video Generation by Learning the Underlying Dynamical System with Neural ODE

    Authors: Yucheng Xu, Li Nanbo, Arushi Goel, Zijian Guo, Zonghai Yao, Hamidreza Kasaei, Mohammadreza Kasaei, Zhibin Li

    Abstract: Videos depict the change of complex dynamical systems over time in the form of discrete image sequences. Generating controllable videos by learning the dynamical system is an important yet underexplored topic in the computer vision community. This paper presents a novel framework, TiV-ODE, to generate highly controllable videos from a static image and a text caption. Specifically, our framework le…

    Submitted 4 April, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  25. arXiv:2302.07824  [pdf, other]

    cs.RO

    Instance-wise Grasp Synthesis for Robotic Grasping

    Authors: Yucheng Xu, Mohammadreza Kasaei, Hamidreza Kasaei, Zhibin Li

    Abstract: Generating high-quality instance-wise grasp configurations provides critical information of how to grasp specific objects in a multi-object environment and is of high importance for robot manipulation tasks. This work proposed a novel Single-Stage Grasp (SSG) synthesis network, which performs high-quality instance-wise grasp synthesis in a single stage: instance mask and…

    Submitted 15 February, 2023; originally announced February 2023.

  26. arXiv:2301.07037  [pdf, other]

    cs.CV cs.AI

    Explain What You See: Open-Ended Segmentation and Recognition of Occluded 3D Objects

    Authors: H. Ayoobi, H. Kasaei, M. Cao, R. Verbrugge, B. Verheij

    Abstract: Local-HDP (for Local Hierarchical Dirichlet Process) is a hierarchical Bayesian method that has recently been used for open-ended 3D object category recognition. This method has been proven to be efficient in real-time robotic applications. However, the method is not robust to a high degree of occlusion. We address this limitation in two steps. First, we propose a novel semantic 3D object-parts se…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: Accepted at ICRA 2023 Conference

  27. arXiv:2210.04613  [pdf, other]

    cs.CV cs.AI

    Enhancing Fine-Grained 3D Object Recognition using Hybrid Multi-Modal Vision Transformer-CNN Models

    Authors: Songsong Xiong, Georgios Tziafas, Hamidreza Kasaei

    Abstract: Robots operating in human-centered environments, such as retail stores, restaurants, and households, are often required to distinguish between similar objects in different contexts with a high degree of accuracy. However, fine-grained object recognition remains a challenge in robotics due to the high intra-category and low inter-category dissimilarities. In addition, the limited number of fine-gra…

    Submitted 6 March, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

  28. arXiv:2210.03628  [pdf, other]

    cs.RO cs.AI

    GraspCaps: A Capsule Network Approach for Familiar 6DoF Object Grasping

    Authors: Tomas van der Velde, Hamed Ayoobi, Hamidreza Kasaei

    Abstract: As robots become more widely available outside industrial settings, the need for reliable object grasping and manipulation is increasing. In such environments, robots must be able to grasp and manipulate novel objects in various situations. This paper presents GraspCaps, a novel architecture based on Capsule Networks for generating per-point 6D grasp configurations for familiar objects. GraspCaps…

    Submitted 29 November, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: Submitted to CVPR 2023, Supplementary video: https://youtu.be/d13rEhKgApI?si=EhgbDI84nlXL5V2M

  29. arXiv:2210.00858  [pdf, other]

    cs.RO cs.AI cs.HC

    Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach

    Authors: Georgios Tziafas, Hamidreza Kasaei

    Abstract: In this paper we present a neurosymbolic architecture for coupling language-guided visual reasoning with robot manipulation. A non-expert human user can prompt the robot using unconstrained natural language, providing a referring expression (REF), a question (VQA), or a grasp action instruction. The system tackles all cases in a task-agnostic fashion through the utilization of a shared library of…

    Submitted 7 May, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Submitted T-RO

  30. arXiv:2210.00843  [pdf, other]

    cs.CV cs.RO

    Early or Late Fusion Matters: Efficient RGB-D Fusion in Vision Transformers for 3D Object Recognition

    Authors: Georgios Tziafas, Hamidreza Kasaei

    Abstract: The Vision Transformer (ViT) architecture has established its place in computer vision literature, however, training ViTs for RGB-D object recognition remains an understudied topic, viewed in recent literature only through the lens of multi-task pretraining in multiple vision modalities. Such approaches are often computationally intensive, relying on the scale of multiple pretraining datasets to a…

    Submitted 7 March, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Submitted IROS 23. Supplementary video here: https://youtu.be/L2gkDPkHsfo

  31. arXiv:2210.00803  [pdf, other]

    cs.RO cs.AI

    IPPO: Obstacle Avoidance for Robotic Manipulators in Joint Space via Improved Proximal Policy Optimization

    Authors: Yongliang Wang, Hamidreza Kasaei

    Abstract: Reaching tasks with random targets and obstacles is a challenging task for robotic manipulators. In this study, we propose a novel model-free reinforcement learning approach based on proximal policy optimization (PPO) for training a deep policy to map the task space to the joint space of a 6-DoF manipulator. To facilitate the training process in a large workspace, we develop an efficient represent…

    Submitted 9 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

  32. arXiv:2210.00609  [pdf, other]

    cs.RO

    Throwing Objects into A Moving Basket While Avoiding Obstacles

    Authors: Hamidreza Kasaei, Mohammadreza Kasaei

    Abstract: The capabilities of a robot will be increased significantly by exploiting throwing behavior. In particular, throwing will enable robots to rapidly place the object into the target basket, located outside its feasible kinematic space, without traveling to the desired location. In previous approaches, the robot often learned a parameterized throwing kernel through analytical approaches, imitation le…

    Submitted 2 October, 2022; originally announced October 2022.

    Comments: The video of our experiments can be found at https://youtu.be/VmIFF__c_84

  33. arXiv:2207.04216  [pdf, other]

    cs.LG cs.AI

    Wasserstein Graph Distance Based on $L_1$-Approximated Tree Edit Distance between Weisfeiler-Lehman Subtrees

    Authors: Zhongxi Fang, Jianming Huang, Xun Su, Hiroyuki Kasai

    Abstract: The Weisfeiler-Lehman (WL) test is a widely used algorithm in graph machine learning, including graph kernels, graph metrics, and graph neural networks. However, it focuses only on the consistency of the graph, which means that it is unable to detect slight structural differences. Consequently, this limits its ability to capture structural information, which also limits the performance of existing…

    Submitted 1 May, 2023; v1 submitted 9 July, 2022; originally announced July 2022.

  34. arXiv:2205.13846  [pdf, ps, other]

    cs.LG cs.AI math.OC

    On the Convergence of Semi-Relaxed Sinkhorn with Marginal Constraint and OT Distance Gaps

    Authors: Takumi Fukunaga, Hiroyuki Kasai

    Abstract: This paper presents consideration of the Semi-Relaxed Sinkhorn (SR-Sinkhorn) algorithm for the semi-relaxed optimal transport (SROT) problem, which relaxes one marginal constraint of the standard OT problem. For evaluation of how the constraint relaxation affects the algorithm behavior and solution, it is vitally necessary to present the theoretical convergence analysis in terms not only of the fu…

    Submitted 27 May, 2022; originally announced May 2022.

  35. Block-coordinate Frank-Wolfe algorithm and convergence analysis for semi-relaxed optimal transport problem

    Authors: Takumi Fukunaga, Hiroyuki Kasai

    Abstract: The optimal transport (OT) problem has been used widely for machine learning. It is necessary for computation of an OT problem to solve linear programming with tight mass-conservation constraints. These constraints prevent its application to large-scale problems. To address this issue, loosening such constraints enables us to propose the relaxed-OT method using a faster algorithm. This approach ha…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2022). arXiv admin note: substantial text overlap with arXiv:2103.05857

  36. arXiv:2205.12089  [pdf, other]

    cs.CV cs.AI

    Sim-To-Real Transfer of Visual Grounding for Human-Aided Ambiguity Resolution

    Authors: Georgios Tziafas, Hamidreza Kasaei

    Abstract: Service robots should be able to interact naturally with non-expert human users, not only to help them in various tasks but also to receive guidance in order to resolve ambiguities that might be present in the instruction. We consider the task of visual grounding, where the agent segments an object from a crowded scene given a natural language description. Modern holistic approaches to visual grou…

    Submitted 10 July, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Accepted CoLLAs 2022

  37. arXiv:2205.01982  [pdf, other]

    cs.RO cs.LG

    Lifelong Ensemble Learning based on Multiple Representations for Few-Shot Object Recognition

    Authors: Hamidreza Kasaei, Songsong Xiong

    Abstract: Service robots are integrating more and more into our daily lives to help us with various tasks. In such environments, robots frequently face new objects while working in the environment and need to learn them in an open-ended fashion. Furthermore, such robots must be able to recognize a wide range of object categories. In this paper, we present a lifelong ensemble learning approach based on multi…

    Submitted 9 January, 2024; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: The paper has been accepted for publication in the Robotics and Autonomous Systems journal

  38. arXiv:2203.02511  [pdf, other]

    cs.RO

    Self-Supervised Learning for Joint Pushing and Grasping Policies in Highly Cluttered Environments

    Authors: Yongliang Wang, Kamal Mokhtar, Cock Heemskerk, Hamidreza Kasaei

    Abstract: Robots often face situations where grasping a goal object is desirable but not feasible due to other present objects preventing the grasp action. We present a deep Reinforcement Learning approach to learn grasping and pushing policies for manipulating a goal object in highly cluttered environments to address this problem. In particular, a dual Reinforcement Learning model approach is proposed, whi…

    Submitted 16 March, 2024; v1 submitted 4 March, 2022; originally announced March 2022.

    Comments: This paper has been accepted for publication at the ICRA2024 conference

  39. arXiv:2109.11544  [pdf, other]

    cs.RO cs.CV cs.LG

    Lifelong 3D Object Recognition and Grasp Synthesis Using Dual Memory Recurrent Self-Organization Networks

    Authors: Krishnakumar Santhakumar, Hamidreza Kasaei

    Abstract: Humans learn to recognize and manipulate new objects in lifelong settings without forgetting the previously gained knowledge under non-stationary and sequential conditions. In autonomous systems, the agents also need to mitigate similar behavior to continually learn the new object categories and adapt to new environments. In most conventional deep neural networks, this is not possible due to the p…

    Submitted 23 January, 2022; v1 submitted 23 September, 2021; originally announced September 2021.

  40. arXiv:2106.01866  [pdf, other]

    cs.RO cs.CV

    Simultaneous Multi-View Object Recognition and Grasping in Open-Ended Domains

    Authors: Hamidreza Kasaei, Sha Luo, Remo Sasso, Mohammadreza Kasaei

    Abstract: To aid humans in everyday tasks, robots need to know which objects exist in the scene, where they are, and how to grasp and manipulate them in different situations. Therefore, object recognition and grasping are two key functionalities for autonomous robots. Most state-of-the-art approaches treat object recognition and grasping as two separate problems, even though both use visual input. Furthermo…

    Submitted 6 December, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: text overlap with arXiv:2103.10997

  41. arXiv:2103.13834  [pdf, other]

    cs.RO cs.AI

    Self-Imitation Learning by Planning

    Authors: Sha Luo, Hamidreza Kasaei, Lambert Schomaker

    Abstract: Imitation learning (IL) enables robots to acquire skills quickly by transferring expert knowledge, which is widely adopted in reinforcement learning (RL) to initialize exploration. However, in long-horizon motion planning tasks, a challenging problem in deploying IL and RL methods is how to generate and collect massive, broadly distributed data such that these methods can generalize effectively. I…

    Submitted 26 March, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

  42. arXiv:2103.10997  [pdf, other]

    cs.RO

    MVGrasp: Real-Time Multi-View 3D Object Grasping in Highly Cluttered Environments

    Authors: Hamidreza Kasaei, Mohammadreza Kasaei

    Abstract: Nowadays robots play an increasingly important role in our daily life. In human-centered environments, robots often encounter piles of objects, packed items, or isolated objects. Therefore, a robot must be able to grasp and manipulate different objects in various situations to help humans with daily tasks. In this paper, we propose a multi-view deep learning approach to handle robust object graspi…

    Submitted 5 October, 2022; v1 submitted 19 March, 2021; originally announced March 2021.

    Comments: The video of our experiments can be found here: https://youtu.be/c-4lzjbF7fY

  43. arXiv:2103.09863  [pdf, other]

    cs.RO

    MORE: Simultaneous Multi-View 3D Object Recognition and Pose Estimation

    Authors: Tommaso Parisotto, Subhaditya Mukherjee, Hamidreza Kasaei

    Abstract: Simultaneous object recognition and pose estimation are two key functionalities for robots to safely interact with humans as well as environments. Although both object recognition and pose estimation use visual input, most state-of-the-art tackles them as two separate problems since the former needs a view-invariant representation while object pose estimation necessitates a view-dependent descript…

    Submitted 7 April, 2023; v1 submitted 17 March, 2021; originally announced March 2021.

  44. arXiv:2103.09720  [pdf, other]

    cs.CV cs.AI

    Few-Shot Visual Grounding for Natural Human-Robot Interaction

    Authors: Giorgos Tziafas, Hamidreza Kasaei

    Abstract: Natural Human-Robot Interaction (HRI) is one of the key components for service robots to be able to work in human-centric environments. In such dynamic environments, the robot needs to understand the intention of the user to accomplish a task successfully. Towards addressing this point, we propose a software architecture that segments a target object from a crowded scene, indicated verbally by a h…

    Submitted 31 March, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

    Comments: 6 pages, 4 figures, ICARSC2021 accepted

  45. arXiv:2103.05857  [pdf, ps, other]

    cs.LG math.OC

    Fast block-coordinate Frank-Wolfe algorithm for semi-relaxed optimal transport

    Authors: Takumi Fukunaga, Hiroyuki Kasai

    Abstract: Optimal transport (OT), which provides a distance between two probability distributions by considering their spatial locations, has been applied to widely diverse applications. Computing an OT problem requires solution of linear programming with tight mass-conservation constraints. This requirement hinders its application to large-scale problems. To alleviate this issue, the recently proposed rela…

    Submitted 9 March, 2021; originally announced March 2021.

  46. arXiv:2103.00902  [pdf, other]

    cs.LG math.OC

    Manifold optimization for non-linear optimal transport problems

    Authors: Bamdev Mishra, N T V Satyadev, Hiroyuki Kasai, Pratik Jawanpuria

    Abstract: Optimal transport (OT) has recently found widespread interest in machine learning. It allows to define novel distances between probability measures, which have shown promise in several applications. In this work, we discuss how to computationally approach general non-linear OT problems within the framework of Riemannian manifold optimization. The basis of this is the manifold of doubly stochastic…

    Submitted 8 October, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: technical report, change in title, addition of experiments

  47. arXiv:2012.03612  [pdf, ps, other]

    cs.LG cs.AI cs.DS stat.ML

    LCS Graph Kernel Based on Wasserstein Distance in Longest Common Subsequence Metric Space

    Authors: Jianming Huang, Zhongxi Fang, Hiroyuki Kasai

    Abstract: For graph learning tasks, many existing methods utilize a message-passing mechanism where vertex features are updated iteratively by aggregation of neighbor information. This strategy provides an efficient means for graph features extraction, but obtained features after many iterations might contain too much information from other vertices, and tend to be similar to each other. This makes their re…

    Submitted 29 October, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Journal ref: Signal Processing, Vol.189, 2021

  48. arXiv:2011.12542  [pdf, ps, other]

    cs.LG stat.ML

    Wasserstein k-means with sparse simplex projection

    Authors: Takumi Fukunaga, Hiroyuki Kasai

    Abstract: This paper presents a proposal of a faster Wasserstein $k$-means algorithm for histogram data by reducing Wasserstein distance computations and exploiting sparse simplex projection. We shrink data samples, centroids, and the ground cost matrix, which leads to considerable reduction of the computations used to solve optimal transport problems without loss of clustering quality. Furthermore, we dyna…

    Submitted 25 November, 2020; originally announced November 2020.

    Comments: Accepted in ICPR2020

  49. arXiv:2011.12532  [pdf, ps, other]

    cs.LG stat.ML

    Consistency-aware and Inconsistency-aware Graph-based Multi-view Clustering

    Authors: Mitsuhiko Horie, Hiroyuki Kasai

    Abstract: Multi-view data analysis has gained increasing popularity because multi-view data are frequently encountered in machine learning applications. A simple but promising approach for clustering of multi-view data is multi-view clustering (MVC), which has been developed extensively to classify given subjects into some clustered groups by learning latent common features that are shared across multi-view…

    Submitted 25 November, 2020; originally announced November 2020.

    Comments: Accepted in EUSIPCO2020

  50. arXiv:2010.14773  [pdf, ps, other]

    cs.LG

    Graph embedding using multi-layer adjacent point merging model

    Authors: Jianming Huang, Hiroyuki Kasai

    Abstract: For graph classification tasks, many traditional kernel methods focus on measuring the similarity between graphs. These methods have achieved great success on resolving graph isomorphism problems. However, in some classification problems, the graph class depends on not only the topological similarity of the whole graph, but also constituent subgraph patterns. To this end, we propose a novel graph…

    Submitted 17 February, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021). arXiv admin note: text overlap with arXiv:2012.03612