-
Quantitative Hardness Assessment with Vision-based Tactile Sensing for Fruit Classification and Grasping
Authors:
Zhongyuan Liao,
Yipai Du,
Jianghua Duan,
Haobo Liang,
Michael Yu Wang
Abstract:
Accurate estimation of fruit hardness is essential for automated classification and handling systems, particularly in determining fruit variety, assessing ripeness, and ensuring proper harvesting force. This study presents an innovative framework for quantitative hardness assessment utilizing vision-based tactile sensing, tailored explicitly for robotic applications in agriculture. The proposed methodology derives normal force estimation from a vision-based tactile sensor, and, based on the dynamics of this normal force, calculates the hardness. This approach offers a rapid, non-destructive evaluation through single-contact interaction. The integration of this framework into robotic systems enhances real-time adaptability of grasping forces, thereby reducing the likelihood of fruit damage. Moreover, the general applicability of this approach, through a universal criterion based on average normal force dynamics, ensures its effectiveness across a wide variety of fruit types and sizes. Extensive experimental validation conducted across different fruit types and ripeness-tracking studies demonstrates the efficacy and robustness of the framework, marking a significant advancement in the domain of automated fruit handling.
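The abstract derives hardness from the dynamics of the estimated normal force during a single contact. Below is a minimal sketch, assuming the gripper closes at a constant speed so that the average rate of normal-force rise (dN/dt, fitted by least squares) can serve as a hardness proxy. The function names and the single-threshold rule are hypothetical illustrations, not the authors' criterion.

```python
def normal_force_rate(times, forces):
    """Least-squares slope dF/dt (N/s) of normal-force samples from one contact."""
    n = len(times)
    if n < 2 or n != len(forces):
        raise ValueError("need at least two paired (time, force) samples")
    t_mean = sum(times) / n
    f_mean = sum(forces) / n
    num = sum((t - t_mean) * (f - f_mean) for t, f in zip(times, forces))
    den = sum((t - t_mean) ** 2 for t in times)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den


def classify_hardness(rate, threshold):
    """Hypothetical criterion: a faster force rise at equal squeeze speed => harder fruit."""
    return "hard" if rate >= threshold else "soft"


# Illustrative samples: force rising 0.5 N every 0.1 s.
rate = normal_force_rate([0.0, 0.1, 0.2, 0.3], [0.0, 0.5, 1.0, 1.5])
print(round(rate, 6), classify_hardness(rate, threshold=3.0))  # 5.0 hard
```

In practice the threshold would have to be calibrated per gripper and closing speed; the sketch only shows how a force time series reduces to one scalar.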
Submitted 8 May, 2025;
originally announced May 2025.
-
Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation
Authors:
Qi Lv,
Hao Li,
Xiang Deng,
Rui Shao,
Yinchuan Li,
Jianye Hao,
Longxiang Gao,
Michael Yu Wang,
Liqiang Nie
Abstract:
Despite the significant success of imitation learning in robotic manipulation, its application to bimanual tasks remains highly challenging. Existing approaches mainly learn a policy to predict a distant next-best end-effector pose (NBP) and then compute the corresponding joint rotation angles for motion using inverse kinematics. However, they suffer from two important issues: (1) rarely considering the physical robotic structure, which may cause self-collisions or interferences, and (2) overlooking the kinematics constraint, which may result in the predicted poses not conforming to the actual limitations of the robot joints. In this paper, we propose Kinematics enhanced Spatial-TemporAl gRaph Diffuser (KStar Diffuser). Specifically, (1) to incorporate the physical robot structure information into action prediction, KStar Diffuser maintains a dynamic spatial-temporal graph according to the physical bimanual joint motions at continuous timesteps. This dynamic graph serves as the robot-structure condition for denoising the actions; (2) to make the NBP learning objective consistent with kinematics, we introduce differentiable kinematics to provide the reference for optimizing KStar Diffuser. This module regularizes the policy to predict more reliable and kinematics-aware next end-effector poses. Experimental results show that our method effectively leverages the physical structural information and generates kinematics-aware actions in both simulation and real-world settings.
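To illustrate what "differentiable kinematics as a reference" can mean, here is a minimal sketch of a kinematics-consistency loss: the forward kinematics (FK) of the predicted joints is compared with the predicted end-effector position, and the loss gradient with respect to the joints comes from the analytic Jacobian. The planar two-link arm, unit link lengths, and hand-derived Jacobian (instead of autodiff) are simplifying assumptions; KStar Diffuser's actual module is not specified in the abstract.

```python
import math


def fk(q, l1=1.0, l2=1.0):
    """End-effector (x, y) of a planar two-link arm with joint angles q = (q1, q2)."""
    q1, q2 = q
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))


def jacobian(q, l1=1.0, l2=1.0):
    """Analytic 2x2 Jacobian d(x, y)/d(q1, q2)."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def kinematics_loss_and_grad(q, pose_pred):
    """Loss 0.5*||FK(q) - pose_pred||^2 and its gradient J^T e with respect to q."""
    x, y = fk(q)
    ex, ey = x - pose_pred[0], y - pose_pred[1]
    loss = 0.5 * (ex * ex + ey * ey)
    J = jacobian(q)
    grad_q = [J[0][0] * ex + J[1][0] * ey, J[0][1] * ex + J[1][1] * ey]
    return loss, grad_q
```

A predicted pose the arm cannot reach yields a nonzero loss whose gradient pushes the joints (or, symmetrically, the pose) toward consistency.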
Submitted 13 March, 2025;
originally announced March 2025.
-
Generative Artificial Intelligence in Robotic Manipulation: A Survey
Authors:
Kun Zhang,
Peng Yun,
Jun Cen,
Junhao Cai,
Didi Zhu,
Hangjie Yuan,
Chao Zhao,
Tao Feng,
Michael Yu Wang,
Qifeng Chen,
Jia Pan,
Wei Zhang,
Bo Yang,
Hua Chen
Abstract:
This survey provides a comprehensive review of recent advances in generative learning models for robotic manipulation, addressing key challenges in the field. Robotic manipulation faces critical bottlenecks, including insufficient data and inefficient data acquisition, long-horizon and complex task planning, and the multi-modal reasoning needed for robust policy learning across diverse environments. To tackle these challenges, this survey introduces several generative model paradigms, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), diffusion models, probabilistic flow models, and autoregressive models, highlighting their strengths and limitations. The applications of these models are categorized into three hierarchical layers: the Foundation Layer, focusing on data generation and reward generation; the Intermediate Layer, covering language, code, visual, and state generation; and the Policy Layer, emphasizing grasp generation and trajectory generation. Each layer is explored in detail, along with notable works that have advanced the state of the art. Finally, the survey outlines future research directions and challenges, emphasizing the need for improved efficiency in data utilization, better handling of long-horizon tasks, and enhanced generalization across diverse robotic scenarios. All the related resources, including research papers, open-source data, and projects, are collected for the community at https://github.com/GAI4Manipulation/AwesomeGAIManipulation
Submitted 10 March, 2025; v1 submitted 5 March, 2025;
originally announced March 2025.
-
Occlusion-Aware Contingency Safety-Critical Planning for Autonomous Vehicles
Authors:
Lei Zheng,
Rui Yang,
Minzhe Zheng,
Zengqi Peng,
Michael Yu Wang,
Jun Ma
Abstract:
Ensuring safe driving while maintaining travel efficiency for autonomous vehicles in dynamic and occluded environments is a critical challenge. This paper proposes an occlusion-aware contingency safety-critical planning approach for real-time autonomous driving in such environments. Leveraging reachability analysis for risk assessment, forward reachable sets of occluded phantom vehicles are computed to quantify dynamic velocity boundaries. These velocity boundaries are incorporated into a biconvex nonlinear programming (NLP) formulation, enabling simultaneous optimization of exploration and fallback trajectories within a receding horizon planning framework. To facilitate real-time optimization and ensure coordination between trajectories, we employ the consensus alternating direction method of multipliers (ADMM) to decompose the biconvex NLP problem into low-dimensional convex subproblems. The effectiveness of the proposed approach is validated through simulation studies and real-world experiments in occluded intersections. Experimental results demonstrate enhanced safety and improved travel efficiency, enabling real-time safe trajectory generation in dynamic occluded intersections under varying obstacle conditions. A video showcasing the experimental results is available at https://youtu.be/CHayG7NChqM.
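Consensus ADMM splits one problem into local subproblems that agree on a shared variable. The sketch below applies scaled-form consensus ADMM to a toy problem, minimizing the sum of a_i/2 (x - b_i)^2, whose answer is the weighted mean of the b_i. It only illustrates the update structure (parallel local solves, averaging, dual update); it is not the paper's biconvex trajectory formulation, and rho and the stopping tolerance are arbitrary choices.

```python
def consensus_admm(a, b, rho=1.0, iters=1000, tol=1e-10):
    """Minimize sum_i a_i/2 * (x - b_i)^2 via local copies x_i constrained to x_i = z."""
    n = len(a)
    x = [0.0] * n
    u = [0.0] * n  # scaled dual variables
    z = 0.0
    for _ in range(iters):
        # Local subproblems: argmin a_i/2 (x-b_i)^2 + rho/2 (x - z + u_i)^2 (closed form,
        # independent of each other, so they could be solved in parallel).
        x = [(a[i] * b[i] + rho * (z - u[i])) / (a[i] + rho) for i in range(n)]
        z_old = z
        z = sum(x[i] + u[i] for i in range(n)) / n  # consensus (averaging) step
        u = [u[i] + x[i] - z for i in range(n)]     # dual update
        primal = max(abs(x[i] - z) for i in range(n))
        dual = rho * abs(z - z_old)
        if primal < tol and dual < tol:
            break
    return z


print(consensus_admm([1.0, 2.0, 3.0], [0.0, 3.0, 6.0]))  # close to (0 + 6 + 18) / 6 = 4
```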
Submitted 10 February, 2025;
originally announced February 2025.
-
FRTree Planner: Robot Navigation in Cluttered and Unknown Environments with Tree of Free Regions
Authors:
Yulin Li,
Zhicheng Song,
Chunxin Zheng,
Zhihai Bi,
Kai Chen,
Michael Yu Wang,
Jun Ma
Abstract:
In this work, we present FRTree planner, a novel robot navigation framework that leverages a tree structure of free regions, specifically designed for navigation in cluttered and unknown environments with narrow passages. The framework continuously incorporates real-time perceptive information to identify distinct navigation options and dynamically expands the tree toward explorable and traversable directions. This dynamically constructed tree incrementally encodes the geometric and topological information of the collision-free space, enabling efficient selection of intermediate goals, navigation around dead ends, and avoidance of dynamic obstacles without a prior map. Crucially, our method performs a comprehensive analysis of the geometric relationship between free regions and the robot during online replanning. In particular, the planner assesses the accessibility of candidate passages based on the robot's geometry, facilitating the effective selection of the most viable intermediate goals through accessible narrow passages while minimizing unnecessary detours. By combining the free region information with a bi-level trajectory optimization tailored for robots with specific geometries, our approach generates robust and adaptable obstacle avoidance strategies in confined spaces. Through extensive simulations and real-world experiments, FRTree demonstrates its superiority over benchmark methods in generating safe, efficient motion plans through highly cluttered and unknown terrains with narrow gaps.
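The idea of a tree of free regions with geometry-aware passage checks can be sketched as a breadth-first search that only crosses passages wide enough for the robot and returns the reachable region closest to the goal as the intermediate goal. The `FreeRegion` class, the single `passage_width` per edge, and the width-plus-margin test are hypothetical simplifications; the abstract's accessibility analysis considers the full robot geometry.

```python
import math
from collections import deque


class FreeRegion:
    """Node of a hypothetical free-region tree; passage_width is the opening from parent."""

    def __init__(self, name, center, parent=None, passage_width=math.inf):
        self.name = name
        self.center = center
        self.passage_width = passage_width
        self.children = []
        if parent is not None:
            parent.children.append(self)


def select_intermediate_goal(root, goal, robot_width, margin=0.05):
    """BFS through accessible passages; return the reachable region nearest the goal."""
    best, best_d = root, math.dist(root.center, goal)
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in node.children:
            if child.passage_width < robot_width + margin:
                continue  # passage too narrow for this robot: prune the whole subtree
            d = math.dist(child.center, goal)
            if d < best_d:
                best, best_d = child, d
            queue.append(child)
    return best
```

Pruning inaccessible subtrees is what lets the planner avoid committing to a dead end behind a gap the robot cannot fit through.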
Submitted 13 February, 2025; v1 submitted 26 October, 2024;
originally announced October 2024.
-
Safe and Real-Time Consistent Planning for Autonomous Vehicles in Partially Observed Environments via Parallel Consensus Optimization
Authors:
Lei Zheng,
Rui Yang,
Minzhe Zheng,
Michael Yu Wang,
Jun Ma
Abstract:
Ensuring safety and driving consistency is a significant challenge for autonomous vehicles operating in partially observed environments. This work introduces a consistent parallel trajectory optimization (CPTO) approach to enable safe and consistent driving in dense obstacle environments with perception uncertainties. Utilizing discrete-time barrier function theory, we develop a consensus safety barrier module that ensures reliable safety coverage within the spatiotemporal trajectory space across potential obstacle configurations. Following this, a bi-convex parallel trajectory optimization problem is derived that facilitates decomposition into a series of low-dimensional quadratic programming problems to accelerate computation. By leveraging the consensus alternating direction method of multipliers (ADMM) for parallel optimization, each generated candidate trajectory corresponds to a possible environment configuration while sharing a common consensus trajectory segment. This ensures driving safety and consistency when executing the consensus trajectory segment for the ego vehicle in real time. We validate our CPTO framework through extensive comparisons with state-of-the-art baselines across multiple driving tasks in partially observable environments. Our results demonstrate improved safety and consistency using both synthetic and real-world traffic datasets.
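A discrete-time barrier condition requires h(x_{k+1}) >= (1 - gamma) h(x_k) with 0 < gamma <= 1, which keeps h nonnegative once it starts nonnegative. The sketch below checks this condition for an ego trajectory against several hypothesized obstacle trajectories, mirroring the idea of safety coverage "across potential obstacle configurations". The circular distance barrier, the helper names, and gamma are assumptions for illustration, not the CPTO module itself.

```python
def h_circle(ego, obs, r_safe):
    """Barrier value: positive when ego is farther than r_safe from the obstacle."""
    dx, dy = ego[0] - obs[0], ego[1] - obs[1]
    return dx * dx + dy * dy - r_safe * r_safe


def dcbf_satisfied(h_values, gamma):
    """Discrete-time barrier condition h[k+1] >= (1 - gamma) * h[k] at every step."""
    return all(h1 >= (1.0 - gamma) * h0 for h0, h1 in zip(h_values, h_values[1:]))


def safe_for_all_hypotheses(ego_traj, obstacle_hypotheses, r_safe, gamma):
    """Accept the ego trajectory only if it is safe for every obstacle hypothesis."""
    for obs_traj in obstacle_hypotheses:
        h = [h_circle(e, o, r_safe) for e, o in zip(ego_traj, obs_traj)]
        if h[0] < 0 or not dcbf_satisfied(h, gamma):
            return False
    return True
```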
Submitted 16 September, 2024;
originally announced September 2024.
-
Distributed Motion Control of Multiple Mobile Manipulators for Reducing Interaction Wrench in Object Manipulation
Authors:
Wenhang Liu,
Meng Ren,
Kun Song,
Gaoming Chen,
Michael Yu Wang,
Zhenhua Xiong
Abstract:
In real-world cooperative manipulation of objects, multiple mobile manipulator systems may suffer from disturbances and asynchrony, leading to excessive interaction wrenches and potentially causing object damage or emergency stops. Existing methods often rely on torque control and dynamic models, which are uncommon in many industrial robots and settings. Additionally, dynamic models often neglect joint friction forces and are therefore inaccurate. These methods are challenging to implement and validate in physical systems. To address these problems, this paper presents a novel distributed motion control approach for reducing unnecessary interaction wrenches. The control law relies only on local information and joint velocity control, which enhances its practical applicability. Communication delays within the distributed architecture are taken into account. The stability of the control law is rigorously proven using Lyapunov theory. Simulations demonstrate its effectiveness and examine the impact of communication graph connectivity and communication delays. A comparison with other methods shows the advantages of the proposed control law in terms of convergence speed and robustness. Finally, the control law has been validated in physical experiments. It requires neither dynamic modeling nor torque control, and is thus more user-friendly for physical robots.
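A common pattern for velocity-level distributed control is a consensus correction: each robot tracks a shared reference velocity and adds a term built only from its neighbours' states, which drives relative position errors (a stand-in here for interaction wrench on a stiff object) to zero. The sketch below is a generic 1D consensus law with synchronous updates and no communication delay; the gains, offsets, and the position-error-as-wrench proxy are assumptions, not the paper's control law.

```python
def control_step(positions, offsets, neighbors, v_ref, k, dt):
    """One synchronous update of a 1D consensus velocity law for n robots."""
    new_positions = []
    for i, x_i in enumerate(positions):
        # Correction uses only neighbours' states (local information).
        correction = sum((positions[j] - offsets[j]) - (x_i - offsets[i]) for j in neighbors[i])
        v_i = v_ref + k * correction  # velocity command: no torque control or dynamics model
        new_positions.append(x_i + dt * v_i)
    return new_positions


def formation_error(positions, offsets):
    """Spread of offset-corrected positions; zero means no relative squeezing of the object."""
    shifted = [p - o for p, o in zip(positions, offsets)]
    return max(shifted) - min(shifted)


positions = [0.3, 1.0, 1.6]                 # illustrative initial positions
offsets = [0.0, 1.0, 2.0]                   # desired grasp offsets along the object
neighbors = {0: [1], 1: [0, 2], 2: [1]}     # line communication graph
for _ in range(200):
    positions = control_step(positions, offsets, neighbors, v_ref=0.1, k=1.0, dt=0.05)
print(formation_error(positions, offsets) < 1e-3)  # True
```

With dt * k times the largest graph-Laplacian eigenvalue (3 for this line graph) kept below 2, the discrete update is stable; delays would require a more careful analysis, as the abstract notes.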
Submitted 7 April, 2025; v1 submitted 8 June, 2024;
originally announced June 2024.
-
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL
Authors:
Qi Lv,
Xiang Deng,
Gongwei Chen,
Michael Yu Wang,
Liqiang Nie
Abstract:
While conditional sequence modeling with the transformer architecture has demonstrated its effectiveness in offline reinforcement learning (RL) tasks, it struggles to handle out-of-distribution states and actions. Existing work attempts to address this issue by data augmentation with the learned policy or by adding extra constraints with value-based RL algorithms. However, these studies still fail to overcome the following challenges: (1) insufficient use of the historical temporal information across steps, (2) overlooking the local intra-step relationships among return-to-gos (RTGs), states, and actions, and (3) overfitting suboptimal trajectories with noisy labels. To address these challenges, we propose Decision Mamba (DM), a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy. DM explicitly models the historical hidden state to extract temporal information using the Mamba architecture. To capture the relationships within RTG-state-action triplets, a fine-grained SSM module is designed and integrated into the original coarse-grained SSM in Mamba, resulting in a novel Mamba architecture tailored for offline RL. Finally, to mitigate overfitting on noisy trajectories, a self-evolving policy is proposed using progressive regularization. The policy evolves by using its own past knowledge to refine suboptimal actions, thus enhancing its robustness on noisy demonstrations. Extensive experiments on various tasks show that DM substantially outperforms other baselines.
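The hidden-state recurrence that Mamba-style SSMs use can be sketched as a selective scan: a diagonal continuous-time state matrix A is discretized with zero-order hold using an input-dependent step size, then h_t = A_bar h_{t-1} + B_bar x_t and y_t = C h_t. This is a scalar-input sketch of a generic selective SSM under assumed parameter shapes; it does not include Decision Mamba's fine-grained intra-step module or its self-evolving regularization.

```python
import math


def softplus(v):
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def selective_scan(xs, A, B, C, w_dt, b_dt):
    """Scalar-input selective SSM with diagonal A (entries must be negative)."""
    n = len(A)
    h = [0.0] * n
    ys = []
    for x in xs:
        dt = softplus(w_dt * x + b_dt)  # input-dependent step size ("selective")
        for i in range(n):
            a_bar = math.exp(dt * A[i])               # zero-order-hold discretization
            b_bar = (a_bar - 1.0) / A[i] * B[i]
            h[i] = a_bar * h[i] + b_bar * x           # hidden state carries history
        ys.append(sum(C[i] * h[i] for i in range(n)))
    return ys
```

With A = [-1], B = C = [1] and a constant input of 1, the output rises toward the steady state -B/A = 1, showing how the hidden state accumulates past inputs.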
Submitted 22 January, 2025; v1 submitted 8 June, 2024;
originally announced June 2024.
-
RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting
Authors:
Qi Wang,
Ruijie Lu,
Xudong Xu,
Jingbo Wang,
Michael Yu Wang,
Bo Dai,
Gang Zeng,
Dan Xu
Abstract:
The advancement of diffusion models has pushed the boundary of text-to-3D object generation. While it is straightforward to composite objects into a scene with reasonable geometry, it is nontrivial to texture such a scene perfectly due to style inconsistency and occlusions between objects. To tackle these problems, we propose a coarse-to-fine 3D scene texturing framework, referred to as RoomTex, to generate high-fidelity and style-consistent textures for untextured compositional scene meshes. In the coarse stage, RoomTex first unwraps the scene mesh to a panoramic depth map and leverages ControlNet to generate a room panorama, which is regarded as the coarse reference to ensure the global texture consistency. In the fine stage, based on the panoramic image and perspective depth maps, RoomTex will refine and texture every single object in the room iteratively along a series of selected camera views, until this object is completely painted. Moreover, we propose to maintain superior alignment between RGB and depth spaces via subtle edge detection methods. Extensive experiments show our method is capable of generating high-quality and diverse room textures, and more importantly, supporting interactive fine-grained texture control and flexible scene editing thanks to our inpainting-based framework and compositional mesh input. Our project page is available at https://qwang666.github.io/RoomTex/.
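The fine stage paints each object "along a series of selected camera views, until this object is completely painted". One simple way to choose such a sequence is greedy coverage: repeatedly pick the view that sees the most still-unpainted faces. The sketch below assumes per-view face-visibility sets are already known; the greedy rule and all names are illustrative assumptions, not RoomTex's view-selection procedure, and the inpainting call itself is omitted.

```python
def plan_views(object_faces, view_visibility):
    """Greedily order camera views until every face is painted or none can be reached.

    view_visibility maps a view name to the set of face ids visible from it.
    Returns (ordered view names, set of faces no view can see).
    """
    unpainted = set(object_faces)
    order = []
    while unpainted:
        view, covered = max(
            ((v, faces & unpainted) for v, faces in view_visibility.items()),
            key=lambda item: len(item[1]),
            default=(None, set()),
        )
        if not covered:
            break  # remaining faces are invisible from every candidate view
        order.append(view)
        unpainted -= covered  # a real pipeline would inpaint these faces from `view`
    return order, unpainted
```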
Submitted 4 June, 2024;
originally announced June 2024.
-
Collision-Free Trajectory Optimization in Cluttered Environments Using Sums-of-Squares Programming
Authors:
Yulin Li,
Chunxin Zheng,
Kai Chen,
Yusen Xie,
Xindong Tang,
Michael Yu Wang,
Jun Ma
Abstract:
In this work, we propose a trajectory optimization approach for robot navigation in cluttered 3D environments. We represent the robot's geometry as a semialgebraic set defined by polynomial inequalities such that robots with general shapes can be suitably characterized. To address the robot navigation task in obstacle-dense environments, we exploit the free space directly to construct a sequence of free regions, and allocate each waypoint on the trajectory to a specific region. Then, we incorporate a uniform scaling factor for each free region, and formulate a Sums-of-Squares (SOS) optimization problem that renders the containment relationship between the robot and the free space computationally tractable. The SOS optimization problem is further reformulated to a semidefinite program (SDP), and the collision-free constraints are shown to be equivalent to limiting the scaling factor along the entire trajectory. In this context, the robot at a specific configuration is tailored to stay within the free region. Next, to solve the trajectory optimization problem with the proposed safety constraints (which are implicitly dependent on the robot configurations), we derive the analytical solution to the gradient of the minimum scaling factor with respect to the robot configuration. As a result, this seamlessly facilitates the use of gradient-based methods in efficient solving of the trajectory optimization problem. Through a series of simulations and real-world experiments, the proposed trajectory optimization approach is validated in various challenging scenarios, and the results demonstrate its effectiveness in generating collision-free trajectories in dense and intricate environments populated with obstacles. Our code is available at: https://github.com/lyl00/minimum_scaling_free_region
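The "minimum scaling factor" idea can be made concrete in a special case with a closed form. Suppose, as an assumption in place of the paper's general semialgebraic robot and SOS/SDP machinery, that the robot is an ellipse {c_r + E u : ||u|| <= 1} and the free region is a polytope {x : a_i·x <= b_i} scaled about an interior point c. The robot fits in the region scaled by alpha exactly when a_i·(c_r - c) + ||E^T a_i|| <= alpha (b_i - a_i·c) for every face, so the minimum alpha is the largest of these ratios, and the robot is collision-free iff that minimum is at most 1.

```python
import math


def min_scaling_factor(A, b, c, robot_center, E):
    """Smallest alpha such that the ellipse {robot_center + E u, |u| <= 1} lies in
    {x : a_i.(x - c) <= alpha * (b_i - a_i.c)} for all faces (2D, closed form)."""
    alpha = -math.inf
    for a_i, b_i in zip(A, b):
        slack = b_i - (a_i[0] * c[0] + a_i[1] * c[1])
        if slack <= 0:
            raise ValueError("scaling center must lie strictly inside the region")
        # Support function of the ellipse along a_i: a_i.c_r + ||E^T a_i||.
        et_a = (E[0][0] * a_i[0] + E[1][0] * a_i[1], E[0][1] * a_i[0] + E[1][1] * a_i[1])
        reach = (a_i[0] * (robot_center[0] - c[0]) + a_i[1] * (robot_center[1] - c[1])
                 + math.hypot(*et_a))
        alpha = max(alpha, reach / slack)
    return alpha
```

Because alpha is a maximum of smooth ratios, it is differentiable with respect to the robot's pose wherever the active face is unique, which is the property that enables gradient-based trajectory optimization.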
Submitted 26 August, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models
Authors:
Qi Lv,
Hao Li,
Xiang Deng,
Rui Shao,
Michael Yu Wang,
Liqiang Nie
Abstract:
Multimodal Large Language Models (MLLMs) have shown impressive reasoning abilities and general intelligence in various domains. This has inspired researchers to train end-to-end MLLMs or to use large models to generate policies with human-selected prompts for embodied agents. However, these methods exhibit limited generalization to unseen tasks or scenarios, and overlook the multimodal environment information that is critical for robots to make decisions. In this paper, we introduce a novel Robotic Multimodal Perception-Planning (RoboMP$^2$) framework for robotic manipulation, which consists of a Goal-Conditioned Multimodal Perceptor (GCMP) and a Retrieval-Augmented Multimodal Planner (RAMP). Specifically, GCMP captures environment states by employing an MLLM tailored for embodied agents with semantic reasoning and localization abilities. RAMP uses a coarse-to-fine retrieval method to find the $k$ most relevant policies as in-context demonstrations to enhance the planner. Extensive experiments demonstrate the superiority of RoboMP$^2$ on both the VIMA benchmark and real-world tasks, with around 10% improvement over the baselines.
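Coarse-to-fine retrieval of the k most relevant policies can be sketched as a cheap filter followed by a finer ranking. In this sketch the coarse stage keeps memory entries with the same task type (falling back to the whole memory if none match) and the fine stage ranks them by cosine similarity of embeddings. The task-type labels, embedding vectors, and memory layout are hypothetical; the abstract does not specify what RAMP uses at each stage.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_top_k(query_type, query_emb, memory, k):
    """memory: list of (task_type, embedding, policy) tuples; returns k policies."""
    # Coarse stage: cheap filter on task type, falling back to the whole memory.
    candidates = [m for m in memory if m[0] == query_type] or memory
    # Fine stage: rank survivors by embedding similarity to the query.
    ranked = sorted(candidates, key=lambda m: cosine(query_emb, m[1]), reverse=True)
    return [m[2] for m in ranked[:k]]
```

The retrieved policies would then be placed in the planner's prompt as in-context demonstrations.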
Submitted 8 June, 2024; v1 submitted 7 April, 2024;
originally announced April 2024.
-
Incremental Bayesian Learning for Fail-Operational Control in Autonomous Driving
Authors:
Lei Zheng,
Rui Yang,
Zengqi Peng,
Wei Yan,
Michael Yu Wang,
Jun Ma
Abstract:
Abrupt maneuvers by surrounding vehicles (SVs) can typically lead to safety concerns and affect the task efficiency of the ego vehicle (EV), especially with model uncertainties stemming from environmental disturbances. This paper presents a real-time fail-operational controller that ensures the asymptotic convergence of an uncertain EV to a safe state, while preserving task efficiency in dynamic environments. An incremental Bayesian learning approach is developed to facilitate online learning and inference of changing environmental disturbances. Leveraging disturbance quantification and constraint transformation, we develop a stochastic fail-operational barrier based on the control barrier function (CBF). With this development, the uncertain EV is able to converge asymptotically from an unsafe state to a defined safe state with probabilistic stability. Subsequently, the stochastic fail-operational barrier is integrated into an efficient fail-operational controller based on quadratic programming (QP). This controller is tailored for the EV operating under control constraints in the presence of environmental disturbances, with both safety and efficiency objectives taken into consideration. We validate the proposed framework in connected cruise control (CCC) tasks, where SVs perform aggressive driving maneuvers. The simulation results demonstrate that our method empowers the EV to swiftly return to a safe state while upholding task efficiency in real time, even under time-varying environmental disturbances.
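The combination of online disturbance learning with a barrier-based QP can be illustrated on a 1D system x_dot = u + d with safe set h(x) = x_max - x >= 0. The sketch below uses a conjugate Gaussian update for a constant disturbance mean (an assumed, much simpler model than the paper's incremental Bayesian learner) and tightens the CBF condition h_dot >= -gamma h by kappa posterior standard deviations. With a single constraint, the QP min (u - u_nom)^2 subject to u <= u_max has the closed form min(u_nom, u_max). If h < 0, the same condition forces h to increase, so the state returns toward the safe set.

```python
import math


class GaussianDisturbanceEstimator:
    """Conjugate update of an unknown constant disturbance d from y = d + noise."""

    def __init__(self, prior_mean=0.0, prior_var=1.0, noise_var=0.1):
        self.mean, self.var, self.noise_var = prior_mean, prior_var, noise_var

    def update(self, y):
        k = self.var / (self.var + self.noise_var)  # Kalman-style gain
        self.mean += k * (y - self.mean)
        self.var *= 1.0 - k                          # posterior variance shrinks


def safe_control(x, u_nom, x_max, est, gamma=1.0, kappa=2.0):
    """Closed-form CBF-QP: enforce -(u + d) >= -gamma * h with d bounded by mean + kappa*std."""
    h = x_max - x
    u_max = gamma * h - (est.mean + kappa * math.sqrt(est.var))
    return min(u_nom, u_max)
```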
Submitted 6 March, 2024;
originally announced March 2024.
-
Barrier-Enhanced Parallel Homotopic Trajectory Optimization for Safety-Critical Autonomous Driving
Authors:
Lei Zheng,
Rui Yang,
Michael Yu Wang,
Jun Ma
Abstract:
Enforcing safety while preventing overly conservative behaviors is essential for autonomous vehicles to achieve high task performance. In this paper, we propose a barrier-enhanced parallel homotopic trajectory optimization (BPHTO) approach with the over-relaxed alternating direction method of multipliers (ADMM) for real-time integrated decision-making and planning. To facilitate safe interactions between the ego vehicle (EV) and surrounding vehicles, a spatiotemporal safety module exhibiting bi-convexity is developed based on barrier functions. Varying barrier coefficients are adopted for different time steps in a planning horizon to account for the motion uncertainties of surrounding vehicles and mitigate conservative behaviors. Additionally, we exploit the discrete characteristics of driving maneuvers to initialize nominal behavior-oriented free-end homotopic trajectories based on reachability analysis, and each trajectory is locally constrained to a specific driving maneuver while sharing the same task objectives. By leveraging the bi-convexity of the safety module and the kinematics of the EV, we formulate the BPHTO as a bi-convex optimization problem. Then constraint transcription and the over-relaxed ADMM are employed to streamline the optimization process, such that multiple trajectories are generated in real time with feasibility guarantees. Through a series of experiments, the proposed development demonstrates improved task accuracy, stability, and consistency in various traffic scenarios using synthetic and real-world traffic datasets.
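Over-relaxed ADMM differs from standard ADMM by one extra step: before the z- and dual updates, x is replaced by alpha * x + (1 - alpha) * z with alpha in (0, 2), where values around 1.5 to 1.8 often speed up convergence. The sketch applies it to a toy box-constrained quadratic, minimizing 0.5 (x - a)^2 subject to lo <= x <= hi, whose solution is a clipped to the box. It illustrates only the over-relaxation mechanics, not the BPHTO problem; rho and alpha are arbitrary choices.

```python
def clip(v, lo, hi):
    """Project v onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def box_qp_admm(a, lo, hi, rho=1.0, alpha=1.6, iters=500):
    """Scaled-form over-relaxed ADMM for min 0.5*(x - a)^2 s.t. x = z, z in [lo, hi]."""
    x = z = u = 0.0
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)   # argmin of the quadratic term
        x_hat = alpha * x + (1.0 - alpha) * z   # over-relaxation step
        z = clip(x_hat + u, lo, hi)             # projection handles the constraint
        u += x_hat - z                          # scaled dual update
    return z
```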
Submitted 9 May, 2025; v1 submitted 15 February, 2024;
originally announced February 2024.
-
CompdVision: Combining Near-Field 3D Visual and Tactile Sensing Using a Compact Compound-Eye Imaging System
Authors:
Lifan Luo,
Boyang Zhang,
Zhijie Peng,
Yik Kin Cheung,
Guanlan Zhang,
Zhigang Li,
Michael Yu Wang,
Hongyu Yu
Abstract:
As automation technologies advance, the need for compact and multi-modal sensors in robotic applications is growing. To address this demand, we introduce CompdVision, a novel sensor that employs a compound-eye imaging system to combine near-field 3D visual and tactile sensing within a compact form factor. CompdVision utilizes two types of vision units to address diverse sensing needs, eliminating the need for complex modality conversion. Stereo units with far-focus lenses can see through the transparent elastomer for depth estimation beyond the contact surface. Simultaneously, tactile units with near-focus lenses track the movement of markers embedded in the elastomer to obtain contact deformation. Experimental results validate the sensor's superior performance in 3D visual and tactile sensing, proving its capability for reliable external object depth estimation and precise measurement of tangential and normal contact forces. The dual modalities and compact design make the sensor a versatile tool for robotic manipulation.
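The two sensing paths can be sketched with standard relations. For the stereo units, pinhole stereo gives depth Z = f * B / d (focal length in pixels, baseline in metres, disparity in pixels). For the tactile units, tracked marker displacements can be averaged and mapped to a tangential force through a calibration constant. The linear force calibration and its constant are hypothetical assumptions; the abstract does not state how CompdVision converts marker motion into force.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole stereo depth Z = f * B / d in metres."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def mean_marker_shift(ref, cur):
    """Average in-plane marker displacement (pixels) between reference and current frames."""
    n = len(ref)
    dx = sum(c[0] - r[0] for r, c in zip(ref, cur)) / n
    dy = sum(c[1] - r[1] for r, c in zip(ref, cur)) / n
    return dx, dy


def tangential_force(shift, px_per_newton):
    """Hypothetical linear calibration from mean marker shift to tangential force (N)."""
    return shift[0] / px_per_newton, shift[1] / px_per_newton
```

Because the baseline between compound-eye units is small, disparity shrinks quickly with distance, which is consistent with the sensor targeting near-field depth.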
Submitted 18 July, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
A Novel Planning Framework for Complex Flipping Manipulation of Multiple Mobile Manipulators
Authors:
Wenhang Liu,
Meng Ren,
Kun Song,
Michael Yu Wang,
Zhenhua Xiong
Abstract:
During complex object manipulation, manipulator systems often face the configuration disconnectivity problem due to closed-chain constraints. Although regrasping can be adopted to obtain a piecewise connected manipulation, determining whether a plan exists without regrasping is challenging. To address this problem, a novel planning framework is proposed for multiple mobile manipulator systems. Coordinated platform motions and regrasping motions are proposed to enhance configuration connectivity. Given the object trajectory and the grasping pose set, the planning framework includes three steps. First, inverse kinematics for each mobile manipulator is verified along the given trajectory based on different grasping poses. Coverable trajectory segments are determined for each robot for a specific grasping pose. Second, the trajectory choice problem is formulated as a set cover problem, by which we can quickly determine whether the manipulation can be completed without regrasping or with the minimal number of regrasps. Finally, the motions of each mobile manipulator are planned with the assigned trajectory segments using existing methods. Both simulation and experimental results demonstrate the performance of the planner in complex flipping manipulation. Additionally, the proposed planner can greatly extend the adaptability of multiple mobile manipulator systems in complex manipulation tasks.
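When each grasping pose yields one contiguous coverable segment of the trajectory, the set cover step reduces to interval cover, which a greedy "furthest reach" rule solves optimally: at the current trajectory parameter, pick the segment that starts no later and ends furthest. The number of regrasps is then the number of chosen segments minus one. The sketch below treats a single robot with one interval per grasp; the paper's formulation over multiple robots is a general set cover, so this is a simplified special case.

```python
def min_regrasp_cover(segments, t_end):
    """segments: list of (start, end, grasp_label) coverable intervals on [0, t_end].

    Returns the chosen segments in order, or None if some part of the
    trajectory is not covered by any grasp.
    """
    chosen = []
    t = 0.0
    while t < t_end:
        best = None
        for seg in segments:
            if seg[0] <= t and (best is None or seg[1] > best[1]):
                best = seg  # furthest-reaching segment that covers the current point
        if best is None or best[1] <= t:
            return None     # gap: no grasp pose covers the trajectory at t
        chosen.append(best)
        t = best[1]
    return chosen
```

A result of length 1 means the manipulation can be completed without regrasping.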
Submitted 25 October, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Origami-inspired Bi-directional Actuator with Orthogonal Actuation
Authors:
Shuai Liu,
Sheeraz Athar,
Michael Yu Wang
Abstract:
Origami offers a promising alternative for designing innovative soft robotic actuators. While features of origami such as bi-directional motion and structural anisotropy have not been extensively explored in the past, this letter presents a novel design inspired by origami tubes for a bi-directional actuator. This actuator is capable of moving in two orthogonal directions and has separate channels throughout its body to control each movement. We introduce a bottom-up design methodology that can also be adapted for other complex movements. The actuator was manufactured using popular 3D printing techniques. To enhance its durability, we experimented with different 3D printing technologies and materials. The actuator's strength was further improved using silicone spin coating, and we compared the performance of coated, uncoated, and silicone-only specimens. The material model was empirically derived by testing specimens on a universal testing machine (UTM). Lastly, we suggest potential applications for these actuators, such as in quadruped robots.
Submitted 16 October, 2023;
originally announced October 2023.
-
Real-Time Parallel Trajectory Optimization with Spatiotemporal Safety Constraints for Autonomous Driving in Congested Traffic
Authors:
Lei Zheng,
Rui Yang,
Zengqi Peng,
Haichao Liu,
Michael Yu Wang,
Jun Ma
Abstract:
Multi-modal behaviors exhibited by surrounding vehicles (SVs) can typically lead to traffic congestion and reduce the travel efficiency of autonomous vehicles (AVs) in dense traffic. This paper proposes a real-time parallel trajectory optimization method for the AV to achieve high travel efficiency in dynamic and congested environments. A spatiotemporal safety module is developed to facilitate the safe interaction between the AV and SVs in the presence of trajectory prediction errors resulting from the multi-modal behaviors of the SVs. By leveraging multiple shooting and constraint transcription, we transform the trajectory optimization problem into a nonlinear programming problem, which allows for the use of optimization solvers and parallel computing techniques to generate multiple feasible trajectories in parallel. Subsequently, these spatiotemporal trajectories are fed into a multi-objective evaluation module considering both safety and efficiency objectives, such that the optimal feasible trajectory corresponding to the optimal target lane can be selected. The proposed framework is validated through simulations in a dense and congested driving scenario with multiple uncertain SVs. The results demonstrate that our method enables the AV to safely navigate through a dense and congested traffic scenario while achieving high travel efficiency and task accuracy in real time.
Submitted 11 September, 2023;
originally announced September 2023.
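The parallel-generation and multi-objective selection stages can be sketched as follows. This is a hedged illustration, not the authors' implementation: `plan_for_lane` is a hypothetical stand-in for one per-lane nonlinear program, and the cost (inverse minimum gap for safety minus distance travelled for efficiency) is an assumed proxy for the paper's safety and efficiency objectives.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    lane: int
    feasible: bool      # did the (hypothetical) per-lane NLP converge?
    min_gap_m: float    # smallest predicted gap to any SV (assumed safety proxy)
    progress_m: float   # longitudinal distance covered (assumed efficiency proxy)


def select_trajectory(lanes: List[int],
                      plan_for_lane: Callable[[int], Candidate],
                      w_safety: float = 1.0,
                      w_eff: float = 1.0) -> Optional[Candidate]:
    """Solve one problem per target lane in parallel, then pick the best feasible one."""
    # Threads help only if the solver releases the GIL; a process pool is an alternative.
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(plan_for_lane, lanes))
    feasible = [c for c in candidates if c.feasible]
    if not feasible:
        return None  # no lane yields a feasible trajectory this cycle

    def cost(c: Candidate) -> float:
        # Smaller gaps are penalized; more progress is rewarded.
        return w_safety / max(c.min_gap_m, 1e-6) - w_eff * c.progress_m

    return min(feasible, key=cost)
```

The weights make the safety/efficiency trade-off explicit: raising `w_safety` shifts the choice toward lanes with larger predicted gaps, even at some cost in progress.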
-
Spatiotemporal Receding Horizon Control with Proactive Interaction Towards Autonomous Driving in Dense Traffic
Authors:
Lei Zheng,
Rui Yang,
Zengqi Peng,
Michael Yu Wang,
Jun Ma
Abstract:
In dense traffic scenarios, ensuring safety while maintaining high task performance for autonomous driving is a critical challenge. To address this problem, this paper proposes a computationally efficient spatiotemporal receding horizon control (ST-RHC) scheme to generate a safe, dynamically feasible, energy-efficient trajectory in control space, where different driving tasks in dense traffic can be achieved with high accuracy and safety in real time. In particular, an embodied spatiotemporal safety barrier module considering proactive interactions is devised to mitigate the effects of inaccuracies resulting from the trajectory prediction of other vehicles. Subsequently, the motion planning and control problem is formulated as a constrained nonlinear optimization problem, which facilitates the effective use of off-the-shelf optimization solvers in conjunction with multiple shooting. The effectiveness of the proposed ST-RHC scheme is demonstrated through comprehensive comparisons with state-of-the-art algorithms on synthetic and real-world traffic datasets under dense traffic, where it achieves superior performance in terms of accuracy, efficiency, and safety.
Submitted 26 May, 2024; v1 submitted 11 August, 2023;
originally announced August 2023.
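Multiple shooting keeps every state node as a decision variable and ties neighbouring nodes together with equality ("defect") constraints, which the nonlinear solver drives to zero. A minimal sketch of those constraints follows, assuming a forward-Euler unicycle model with state `[px, py, heading, speed]` and control `[accel, yaw_rate]`; the abstract names neither the vehicle model nor the solver, so both are illustrative choices.

```python
import numpy as np


def unicycle_step(x: np.ndarray, u: np.ndarray, dt: float) -> np.ndarray:
    """One forward-Euler step of an assumed unicycle model."""
    px, py, th, v = x
    a, w = u
    return np.array([
        px + dt * v * np.cos(th),
        py + dt * v * np.sin(th),
        th + dt * w,
        v + dt * a,
    ])


def shooting_defects(X: np.ndarray, U: np.ndarray, x0: np.ndarray, dt: float) -> np.ndarray:
    """Stacked equality residuals for a multiple-shooting transcription.

    X: (N+1, 4) state nodes, U: (N, 2) controls. A feasible trajectory has all zeros.
    """
    defects = [X[0] - x0]  # initial-state constraint
    for k in range(U.shape[0]):
        # Continuity: each node must match the model propagated from the previous node.
        defects.append(X[k + 1] - unicycle_step(X[k], U[k], dt))
    return np.concatenate(defects)
```

An NLP solver would receive these residuals as equality constraints alongside the cost and the safety constraints; because each interval is propagated independently, the constraint Jacobian stays sparse and block-structured.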
-
Flipbot: Learning Continuous Paper Flipping via Coarse-to-Fine Exteroceptive-Proprioceptive Exploration
Authors:
Chao Zhao,
Chunli Jiang,
Junhao Cai,
Michael Yu Wang,
Hongyu Yu,
Qifeng Chen
Abstract:
This paper tackles the task of singulating and grasping paper-like deformable objects. We refer to such tasks as paper-flipping. In contrast to manipulating deformable objects that lack compression strength (such as shirts and ropes), minor variations in the physical properties of the paper-like deformable objects significantly impact the results, making manipulation highly challenging. Here, we present Flipbot, a novel solution for flipping paper-like deformable objects. Flipbot allows the robot to capture object physical properties by integrating exteroceptive and proprioceptive perceptions that are indispensable for manipulating deformable objects. Furthermore, by incorporating a proposed coarse-to-fine exploration process, the system is capable of learning the optimal control parameters for effective paper-flipping through proprioceptive and exteroceptive inputs. We deploy our method on a real-world robot with a soft gripper and learn in a self-supervised manner. The resulting policy demonstrates the effectiveness of Flipbot on paper-flipping tasks with various settings beyond the reach of prior studies, including but not limited to flipping pages throughout a book and emptying paper sheets in a box.
Submitted 5 April, 2023;
originally announced April 2023.
-
Learn to Grasp via Intention Discovery and its Application to Challenging Clutter
Authors:
Chao Zhao,
Chunli Jiang,
Junhao Cai,
Hongyu Yu,
Michael Yu Wang,
Qifeng Chen
Abstract:
Humans excel in grasping objects through diverse and robust policies, many of which are so rare that exploration-based learning methods seldom observe and learn them. Inspired by the human learning process, we propose a method to extract and exploit latent intents from demonstrations, and then learn diverse and robust grasping policies through self-exploration. The resulting policy can grasp challenging objects in various environments with an off-the-shelf parallel gripper. The key component is a learned intention estimator, which maps gripper pose and visual sensory input to a set of sub-intents covering important phases of the grasping movement. Sub-intents can be used to build an intrinsic reward to guide policy learning. The learned policy demonstrates remarkable zero-shot generalization from simulation to the real world while retaining its robustness against states never encountered during training, novel objects such as protractors and user manuals, and environments such as a cluttered conveyor.
Submitted 5 April, 2023;
originally announced April 2023.
-
ERRA: An Embodied Representation and Reasoning Architecture for Long-horizon Language-conditioned Manipulation Tasks
Authors:
Chao Zhao,
Shuai Yuan,
Chunli Jiang,
Junhao Cai,
Hongyu Yu,
Michael Yu Wang,
Qifeng Chen
Abstract:
This letter introduces ERRA, an embodied learning architecture that enables robots to jointly obtain three fundamental capabilities (reasoning, planning, and interaction) for solving long-horizon language-conditioned manipulation tasks. ERRA is based on tightly coupled probabilistic inferences at two granularity levels. Coarse-resolution inference is formulated as sequence generation through a large language model, which infers action language from natural language instruction and environment state. The robot then zooms to the fine-resolution inference part to perform the concrete action corresponding to the action language. Fine-resolution inference is constructed as a Markov decision process, which takes action language and environmental sensing as observations and outputs the action. The results of action execution in environments provide feedback for subsequent coarse-resolution reasoning. Such coarse-to-fine inference allows the robot to decompose and achieve long-horizon tasks interactively. In extensive experiments, we show that ERRA can complete various long-horizon manipulation tasks specified by abstract language instructions. We also demonstrate successful generalization to novel but similar natural language instructions.
Submitted 5 April, 2023;
originally announced April 2023.
-
A Novel Graph-based Motion Planner of Multi-Mobile Robot Systems with Formation and Obstacle Constraints
Authors:
Wenhang Liu,
Jiawei Hu,
Heng Zhang,
Michael Yu Wang,
Zhenhua Xiong
Abstract:
Multi-mobile robot systems show great advantages over a single robot in many applications. However, the robots are required to form desired task-specified formations, which significantly reduces the set of feasible motions. Thus, it is challenging to determine whether the robots can pass through an obstructed environment under formation constraints, especially in an obstacle-rich environment. Furthermore, is there an optimal path for the robots? To address these two problems, a novel graph-based motion planner is proposed in this paper. A mapping between the workspace and the configuration space of multi-mobile robot systems is first built, where valid configurations that satisfy both formation constraints and collision avoidance can be acquired. Then, an undirected graph is generated by verifying connectivity between valid configurations. The breadth-first search method is employed to determine whether there is a feasible path on the graph. Finally, an optimal path is planned on the updated graph, considering the cost of path length and formation preference. Simulation results show that the planner can be applied to obtain optimal motions of robots under formation constraints in obstacle-rich environments. Additionally, different constraints are considered.
Submitted 7 October, 2022;
originally announced October 2022.
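The feasibility query described above amounts to a breadth-first search on the undirected graph of valid configurations. A minimal sketch, assuming the graph is stored as an adjacency dictionary whose keys are hypothetical configuration labels (any hashable value other than `None`); the optimal-path step with length and formation-preference costs would replace BFS with a weighted search such as Dijkstra's algorithm.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional


def bfs_path(graph: Dict[Hashable, List[Hashable]],
             start: Hashable, goal: Hashable) -> Optional[List[Hashable]]:
    """Return a path from start to goal with the fewest edges, or None if disconnected."""
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None  # goal unreachable: no feasible motion under the constraints
```

A `None` result answers the first question in the abstract (the formation cannot pass); a returned path is a feasible, though not necessarily cost-optimal, sequence of configurations.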
-
Volumetric-based Contact Point Detection for 7-DoF Grasping
Authors:
Junhao Cai,
Jingcheng Su,
Zida Zhou,
Hui Cheng,
Qifeng Chen,
Michael Y Wang
Abstract:
In this paper, we propose a novel grasp pipeline based on contact point detection on the truncated signed distance function (TSDF) volume to achieve closed-loop 7-degree-of-freedom (7-DoF) grasping in cluttered environments. The key aspects of our method are that 1) the proposed pipeline exploits the TSDF volume in terms of multi-view fusion, contact-point sampling and evaluation, and collision checking, which provides reliable and collision-free 7-DoF gripper poses with real-time performance; 2) the contact-based pose representation effectively eliminates the ambiguity introduced by normal-based methods, which provides a more precise and flexible solution. Extensive simulated and real-robot experiments demonstrate that the proposed pipeline can select more antipodal and stable grasp poses and outperforms normal-based baselines in terms of grasp success rate in both simulated and physical scenarios.
Submitted 14 September, 2022;
originally announced September 2022.
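Multi-view fusion into a TSDF volume is commonly done with a per-voxel weighted running average of truncated distances. The sketch below shows that standard update (in the style of Curless and Levoy); the abstract does not specify the exact fusion rule, weighting, or truncation handling used by the authors, so treat these as assumptions. `sdf_obs` is taken to be the signed distance of each voxel to the observed surface along the camera ray, positive in front of the surface.

```python
import numpy as np


def fuse_tsdf(tsdf: np.ndarray, weight: np.ndarray, sdf_obs: np.ndarray,
              trunc: float, obs_weight: float = 1.0):
    """Fuse one depth observation into a TSDF grid by weighted averaging.

    tsdf, weight, sdf_obs share the voxel-grid shape. Returns (new_tsdf, new_weight).
    """
    # Voxels far behind the observed surface are occluded: leave them unchanged.
    valid = sdf_obs > -trunc
    # Normalize and truncate the observed distance to [-1, 1].
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)
    new_weight = weight + obs_weight * valid
    fused = np.where(
        valid,
        (weight * tsdf + obs_weight * d) / np.maximum(new_weight, 1e-12),
        tsdf,
    )
    return fused, new_weight
```

Averaging over views suppresses per-frame depth noise, which is what makes the fused zero-crossing usable for contact-point sampling and collision checking.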
-
Open-world Semantic Segmentation for LIDAR Point Clouds
Authors:
Jun Cen,
Peng Yun,
Shiwei Zhang,
Junhao Cai,
Di Luan,
Michael Yu Wang,
Ming Liu,
Mingqian Tang
Abstract:
Current methods for LIDAR semantic segmentation are not robust enough for real-world applications, e.g., autonomous driving, since they are closed-set and static. The closed-set assumption means the network can only output labels of trained classes, even for objects never seen before, while a static network cannot update its knowledge base according to what it has seen. Therefore, in this work, we propose the open-world semantic segmentation task for LIDAR point clouds, which aims to 1) identify both old and novel classes using open-set semantic segmentation, and 2) gradually incorporate novel objects into the existing knowledge base using incremental learning without forgetting old classes. For this purpose, we propose a REdundAncy cLassifier (REAL) framework to provide a general architecture for both the open-set semantic segmentation and incremental learning problems. The experimental results show that REAL can simultaneously achieve state-of-the-art performance in the open-set semantic segmentation task on the SemanticKITTI and nuScenes datasets and alleviate the catastrophic forgetting problem by a large margin during incremental learning.
Submitted 4 July, 2022;
originally announced July 2022.
-
Viko 2.0: A Hierarchical Gecko-inspired Adhesive Gripper with Visuotactile Sensor
Authors:
Chohei Pang,
Qicheng Wang,
Kinwing Mak,
Hongyu Yu,
Michael Yu Wang
Abstract:
Robotic grippers with visuotactile sensors have access to rich tactile information for grasping tasks but encounter difficulty in partially encompassing large objects with sufficient grip force. While hierarchical gecko-inspired adhesives are a potential technique for bridging performance gaps, they require a large contact area for efficient usage. In this work, we present a new version of an adaptive gecko gripper called Viko 2.0 that effectively combines the advantages of adhesives and visuotactile sensors. Compared with a non-hierarchical structure, a hierarchical structure with a multimaterial design achieves approximately 1.5 times the normal adhesion and double the contact area. The integrated visuotactile sensor captures a deformation image of the hierarchical structure and provides real-time measurement of contact area and shear force, as well as incipient slip detection, at 24 Hz. The gripper is implemented on a robotic arm to demonstrate an adaptive grasping pose based on contact area, and grasps objects with a wide range of geometries and textures.
Submitted 21 April, 2022;
originally announced April 2022.
-
A Thin Format Vision-Based Tactile Sensor with A Micro Lens Array (MLA)
Authors:
Xia Chen,
Guanlan Zhang,
Michael Yu Wang,
Hongyu Yu
Abstract:
Vision-based tactile sensors have been widely studied in the robotics field for their high spatial resolution and compatibility with machine learning algorithms. However, the imaging system of currently employed sensors is bulky, limiting their further application. Here we present a micro lens array (MLA) based vision system that achieves a thin sensor package with high tactile sensing performance. Multiple micromachined micro lens units cover the whole elastic touching layer and provide a stitched, clear tactile image, enabling high spatial resolution with a thickness of only 5 mm. The thermal reflow and soft lithography methods ensure a uniform spherical profile and smooth surface for the micro lenses. Both optical and mechanical characterization demonstrated the sensor's stable imaging and excellent tactile sensing, enabling precise 3D tactile information, such as displacement mapping and force distribution, with an ultra-compact, thin structure.
Submitted 19 April, 2022;
originally announced April 2022.
-
DelTact: A Vision-based Tactile Sensor Using Dense Color Pattern
Authors:
Guanlan Zhang,
Yipai Du,
Hongyu Yu,
Michael Yu Wang
Abstract:
Tactile sensing is an essential perception for robots to complete dexterous tasks. As a promising tactile sensing technique, vision-based tactile sensors have been developed to improve robot performance in manipulation and grasping. Here we propose a new design of a vision-based tactile sensor, DelTact. The sensor uses a modular hardware architecture for compactness whilst maintaining a contact measurement of full resolution (798×586) and large area (675 mm²). Moreover, it adopts an improved dense random color pattern based on the previous version to achieve high accuracy of contact deformation tracking. In particular, we optimize the color pattern generation process and select the appropriate pattern for coordinating with a dense optical flow algorithm under a real-world experimental sensory setting. The optical flow obtained from the raw image is processed to determine shape and force distribution on the contact surface. We also demonstrate the method to extract contact shape and force distribution from the raw images. Experimental results demonstrate that the sensor is capable of providing tactile measurements with low error and high frequency (40 Hz).
Submitted 31 May, 2022; v1 submitted 4 February, 2022;
originally announced February 2022.
-
Open-set 3D Object Detection
Authors:
Jun Cen,
Peng Yun,
Junhao Cai,
Michael Yu Wang,
Ming Liu
Abstract:
3D object detection has been widely studied in recent years, especially for robot perception systems. However, existing 3D object detection operates under a closed-set condition, meaning that the network can only output boxes of trained classes. Unfortunately, this closed-set condition is not robust enough for practical use, as it will mistakenly identify unknown objects as known. Therefore, in this paper, we propose an open-set 3D object detector, which aims to (1) identify known objects, like closed-set detection, and (2) identify unknown objects and give their accurate bounding boxes. Specifically, we divide the open-set 3D object detection problem into two steps: (1) finding the regions that contain unknown objects with high probability and (2) enclosing the points of these regions with proper bounding boxes. The first step is addressed based on the finding that unknown objects are often classified as known objects with low confidence, and we show that the Euclidean distance sum based on metric learning is a better confidence score than the naive softmax probability for differentiating unknown objects from known objects. On this basis, unsupervised clustering is used to refine the bounding boxes of unknown objects. The proposed method combining metric learning and unsupervised clustering is called the MLUC network. Our experiments show that our MLUC network achieves state-of-the-art performance and can identify both known and unknown objects as expected.
Submitted 2 December, 2021;
originally announced December 2021.
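A minimal sketch of a distance-sum confidence score, assuming the metric-learning network maps each region to an embedding and each known class is represented by a prototype (class center) in that space. The abstract only names the "Euclidean distance sum"; the prototype representation, the function names, and the fixed `threshold` are illustrative assumptions, not the MLUC implementation.

```python
import numpy as np


def distance_sum_score(embeddings: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Sum of Euclidean distances from each embedding to every class prototype.

    embeddings: (N, D), prototypes: (C, D). Returns shape (N,); larger means less familiar.
    """
    # Broadcast to (N, C, D), then take the L2 norm over the feature axis.
    diffs = embeddings[:, None, :] - prototypes[None, :, :]
    return np.linalg.norm(diffs, axis=-1).sum(axis=1)


def flag_unknown(embeddings: np.ndarray, prototypes: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical decision rule: mark embeddings whose score exceeds a tuned threshold."""
    return distance_sum_score(embeddings, prototypes) > threshold
```

Unlike a softmax probability, which can be high for an input that is merely closer to one class than to the others, this score grows when an embedding is far from every known class, which is the property needed to separate unknown objects.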
-
Deep Metric Learning for Open World Semantic Segmentation
Authors:
Jun Cen,
Peng Yun,
Junhao Cai,
Michael Yu Wang,
Ming Liu
Abstract:
Classical closed-set semantic segmentation networks have limited ability to detect out-of-distribution (OOD) objects, which is important for safety-critical applications such as autonomous driving. Incrementally learning these OOD objects with few annotations is an ideal way to enlarge the knowledge base of deep learning models. In this paper, we propose an open world semantic segmentation system that includes two modules: (1) an open-set semantic segmentation module to detect both in-distribution and OOD objects, and (2) an incremental few-shot learning module to gradually incorporate those OOD objects into its existing knowledge base. This open world semantic segmentation system behaves like a human being: it is able to identify OOD objects and gradually learn them with corresponding supervision. We adopt the Deep Metric Learning Network (DMLNet) with contrastive clustering to implement open-set semantic segmentation. Compared to other open-set semantic segmentation methods, our DMLNet achieves state-of-the-art performance on three challenging open-set semantic segmentation datasets without using additional data or generative models. On this basis, two incremental few-shot learning methods are further proposed to progressively improve the DMLNet with the annotations of OOD objects.
Submitted 10 August, 2021;
originally announced August 2021.
-
MFuseNet: Robust Depth Estimation with Learned Multiscopic Fusion
Authors:
Weihao Yuan,
Rui Fan,
Michael Yu Wang,
Qifeng Chen
Abstract:
We design a multiscopic vision system that utilizes a low-cost monocular RGB camera to acquire accurate depth estimation. Unlike multi-view stereo with images captured at unconstrained camera poses, the proposed system controls the motion of a camera to capture a sequence of images in horizontally or vertically aligned positions with the same parallax. In this system, we propose a new heuristic method and a robust learning-based method to fuse multiple cost volumes between the reference image and its surrounding images. To obtain training data, we build a synthetic dataset with multiscopic images. The experiments on the real-world Middlebury dataset and real robot demonstration show that our multiscopic vision system outperforms traditional two-frame stereo matching methods in depth estimation. Our code and dataset are available at https://sites.google.com/view/multiscopic.
Submitted 6 August, 2021; v1 submitted 5 August, 2021;
originally announced August 2021.
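Because the views are captured at aligned positions with the same parallax, the cost volumes computed between the reference image and each surrounding image can be indexed by the same disparity hypotheses and fused voxel by voxel. The sketch below uses a plain average followed by winner-take-all disparity selection. This is an illustrative baseline only; the paper's heuristic and learned fusion methods are not described in the abstract and may differ (for example, a per-pixel minimum to handle occlusion). It assumes each volume has already been aligned so that hypothesis `d` means the same depth in every view.

```python
from typing import List

import numpy as np


def fuse_and_select_disparity(cost_volumes: List[np.ndarray]) -> np.ndarray:
    """Average aligned (D, H, W) matching-cost volumes, then pick the lowest-cost disparity.

    Lower cost means a better match. Returns an (H, W) array of disparity indices.
    """
    stacked = np.stack(cost_volumes)  # (V, D, H, W), one volume per neighbouring view
    fused = stacked.mean(axis=0)      # simple average fusion across views
    return fused.argmin(axis=0)       # winner-take-all over disparity hypotheses
```

Averaging lets consistent evidence from several views reinforce the correct disparity, while a view with an ambiguous match at a pixel is outvoted by the others.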
-
Viko: An Adaptive Gecko Gripper with Vision-based Tactile Sensor
Authors:
Chohei Pang,
Kinwing Mak,
Yazhan Zhang,
Yang Yang,
Yu Alexander Tse,
Michael Yu Wang
Abstract:
Monitoring the state of contact is essential for robotic devices, especially grippers that implement gecko-inspired adhesives, where intimate contact is crucial for a firm attachment. However, due to the lack of deformable sensors, few have demonstrated tactile sensing for gecko grippers. We present Viko, an adaptive gecko gripper that utilizes vision-based tactile sensors to monitor contact state. The sensor provides high-resolution real-time measurements of contact area and shear force. Moreover, the sensor is adaptive, low-cost, and compact. We integrated gecko-inspired adhesives into the sensor surface without impeding its adaptiveness and performance. Using a robotic arm, we evaluate the performance of the gripper through a series of grasping tests. The gripper has a maximum payload of 8 N even at a low fingertip pitch angle of 30 degrees. We also showcase the gripper's ability to adjust fingertip pose for better contact using sensor feedback. Further, everyday object picking is presented as a demonstration of the gripper's adaptiveness.
Submitted 3 May, 2021;
originally announced May 2021.
-
Stereo Matching by Self-supervision of Multiscopic Vision
Authors:
Weihao Yuan,
Yazhan Zhang,
Bingkun Wu,
Siyu Zhu,
Ping Tan,
Michael Yu Wang,
Qifeng Chen
Abstract:
Self-supervised learning for depth estimation possesses several advantages over supervised learning. Its benefits, namely no need for ground-truth depth, online fine-tuning, and better generalization with unlimited data, attract researchers to seek self-supervised solutions. In this work, we propose a new self-supervised framework for stereo matching utilizing multiple images captured at aligned camera positions. A cross photometric loss, an uncertainty-aware mutual-supervision loss, and a new smoothness loss are introduced to optimize the network in learning disparity maps end-to-end without ground-truth depth information. To train this framework, we build a new multiscopic dataset consisting of synthetic images rendered by 3D engines and real images captured by real cameras. After being trained with only the synthetic images, our network can perform well in unseen outdoor scenes. Our experiments show that our model obtains better disparity maps than previous unsupervised methods on the KITTI dataset and is comparable to supervised methods when generalized to unseen data. Our source code and dataset are available at https://sites.google.com/view/multiscopic.
Submitted 16 August, 2021; v1 submitted 8 April, 2021;
originally announced April 2021.
-
A Tactile Sensing Foot for Single Robot Leg Stabilization
Authors:
Guanlan Zhang,
Yipai Du,
Yazhan Zhan,
Michael Yu Wang
Abstract:
Tactile sensing on human feet is crucial for motion control; however, it has not been explored in robotic counterparts. This work is dedicated to endowing legged robots' feet with tactile sensing and showing that a single-legged robot can be stabilized with only tactile sensing signals from its foot. We propose a robot leg with a novel vision-based tactile sensing foot system and implement a processing algorithm to extract contact information for feedback control in stabilizing tasks. A pipeline to convert images of the foot skin into high-level contact information using a deep learning framework is presented. The leg was quantitatively evaluated in a stabilization task on a tilting surface to show that the tactile foot was able to estimate both the surface tilting angle and the foot poses. The feasibility and effectiveness of the tactile system were investigated qualitatively in comparison with conventional single-legged robotic systems using inertial measurement units (IMU). Experiments demonstrate the capability of vision-based tactile sensors in assisting legged robots to maintain stability on unknown terrains and the potential for regulating more complex motions for humanoid robots.
Submitted 26 March, 2021;
originally announced March 2021.
-
Learning to Predict Vehicle Trajectories with Model-based Planning
Authors:
Haoran Song,
Di Luan,
Wenchao Ding,
Michael Yu Wang,
Qifeng Chen
Abstract:
Predicting the future trajectories of on-road vehicles is critical for autonomous driving. In this paper, we introduce a novel prediction framework called PRIME, which stands for Prediction with Model-based Planning. Unlike recent prediction methods that use neural networks to model scene context and produce unconstrained trajectories, PRIME is designed to generate accurate future trajectory predictions with guaranteed feasibility. PRIME guarantees trajectory feasibility by exploiting a model-based generator to produce future trajectories under explicit constraints, and it enables accurate multimodal prediction by using a learning-based evaluator to select future trajectories. We conduct experiments on the large-scale Argoverse Motion Forecasting Benchmark, where PRIME outperforms state-of-the-art methods in prediction accuracy, feasibility, and robustness under imperfect tracking.
Submitted 20 October, 2021; v1 submitted 5 March, 2021;
originally announced March 2021.
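The generator/evaluator split described in the PRIME abstract can be illustrated with a toy example. The sketch below assumes a unicycle motion model, a grid of constant acceleration and yaw-rate commands, and a lateral-acceleration limit as the explicit feasibility constraint; the scoring function passed to `select_top_k` is a hypothetical stand-in for PRIME's learned evaluator, and none of these names come from the paper.

```python
import numpy as np


def rollout(x, y, heading, speed, accel, yaw_rate, dt=0.1, steps=30):
    """Unicycle rollout; returns (steps, 2) positions and the speed at each step."""
    pts, speeds = [], []
    for _ in range(steps):
        speed = max(0.0, speed + accel * dt)     # the vehicle never reverses
        heading += yaw_rate * dt
        x += speed * np.cos(heading) * dt
        y += speed * np.sin(heading) * dt
        pts.append((x, y))
        speeds.append(speed)
    return np.array(pts), np.array(speeds)


def generate_feasible(state, accels, yaw_rates, max_lat_acc=3.0):
    """Model-based generator: keep rollouts whose lateral acceleration v*omega stays bounded."""
    candidates = []
    for a in accels:
        for w in yaw_rates:
            traj, speeds = rollout(*state, a, w)
            if np.max(speeds) * abs(w) <= max_lat_acc:
                candidates.append(traj)
    return candidates


def select_top_k(candidates, score, k=3):
    """Evaluator stage: rank candidates by a score (learned in PRIME, a stand-in here)."""
    order = sorted(range(len(candidates)), key=lambda i: score(candidates[i]), reverse=True)
    return [candidates[i] for i in order[:k]]


state = (0.0, 0.0, 0.0, 10.0)                    # x, y, heading, speed
cands = generate_feasible(state, [-2.0, 0.0, 2.0], [-0.5, -0.2, 0.0, 0.2, 0.5])
goal = np.array([30.0, 3.0])                     # hypothetical target for the toy score
best = select_top_k(cands, lambda t: -np.linalg.norm(t[-1] - goal), k=3)
print(len(cands), len(best))
```

Because infeasible rollouts are discarded before scoring, every trajectory the evaluator can return satisfies the constraint, which is the property the abstract attributes to the model-based generator.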
-
Origami-based Shape Morphing Fingertip to Enhance Grasping Stability and Dexterity
Authors:
Zicheng Kan,
Yazhan Zhang,
Chohei Pang,
Michael Yu Wang
Abstract:
Adaptation to various scene configurations and object properties, as well as stability and dexterity in robotic grasping manipulation, remain far from fully explored. This work presents an origami-based shape-morphing fingertip design that actively tackles the grasping stability and dexterity problems. The proposed fingertip uses origami as its skeleton, providing degrees of freedom at desired positions, and motor-driven four-bar linkages as its transmission components, which keeps the fingertip compact. Three morphing types that are commonly observed and essential in robotic grasping are studied and validated with geometric modeling. Experiments including grasping an object with convex point contact to pivot it or perform a pinch grasp, reorienting a grasped object, and enveloping grasps with concave fingertip surfaces are carried out to demonstrate the advantages of our fingertip over conventional parallel grippers. The multi-functionality of the fingertip in enhancing grasping stability and dexterity, via active adaptation to different grasped objects and manipulation tasks, is thereby validated. Video is available at youtu.be/jJoJ3xnDdVk/.
Submitted 10 October, 2020;
originally announced October 2020.
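For readers unfamiliar with the four-bar transmission mentioned above, the sketch below solves the position kinematics of a generic planar four-bar linkage by intersecting two circles. The link lengths, pivot placement, and the `branch` convention are illustrative assumptions; the abstract does not give the fingertip's actual linkage dimensions.

```python
import numpy as np


def fourbar_output_angle(theta2, a, b, c, d, branch=1):
    """Rocker angle theta4 of a planar four-bar linkage.

    Ground pivots O2 = (0, 0) and O4 = (d, 0); crank of length a at angle theta2,
    coupler of length b, rocker of length c. branch = +1 / -1 selects one of the
    two assembly modes.
    """
    A = np.array([a * np.cos(theta2), a * np.sin(theta2)])   # crank tip
    O4 = np.array([d, 0.0])
    chord = O4 - A
    D = np.linalg.norm(chord)
    if D > b + c or D < abs(b - c):
        raise ValueError("linkage cannot be assembled at this crank angle")
    # Coupler-rocker joint B lies on circle(A, b) and circle(O4, c).
    l = (b**2 - c**2 + D**2) / (2 * D)       # distance from A to the chord foot point
    h = np.sqrt(max(b**2 - l**2, 0.0))       # half-length of the common chord
    u = chord / D
    perp = np.array([-u[1], u[0]])
    B = A + l * u + branch * h * perp
    return float(np.arctan2(B[1] - O4[1], B[0] - O4[0]))


# A parallelogram linkage (a == c, b == d) should copy the input angle to the output.
print(np.degrees(fourbar_output_angle(np.pi / 3, 1.0, 2.0, 1.0, 2.0)))
```

The parallelogram case makes a convenient sanity check: in the open assembly mode the rocker stays parallel to the crank.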
-
Vacuum Driven Auxetic Switching Structure and Its Application on a Gripper and Quadruped
Authors:
Shuai Liu,
Sheeraz Athar,
Michael Yu Wang
Abstract:
The properties and applications of auxetics have been widely explored in recent years. Through proper use of auxetic structures, designs with unprecedented mechanical and structural behaviors can be produced. Taking advantage of this, we present the development of novel, low-cost 3D structures inspired by a simple auxetic unit. The core part, which we call the body in this paper, is a 3D realization of 2D rotating squares. This body structure was formed by joining four similar structures through softer material at the vertices. A monolithic structure of this kind is fabricated with a custom-built multi-material 3D printer. When torque is applied along the faces of the rotating squares, they tend to bend at the vertices made of softer material, and because the design is connected, a proper opening and closing motion is achieved. To demonstrate the potential of this part as an important component for robots, two applications are presented: a soft gripper and a crawling robot. Both applications are driven by vacuum-powered actuators. The proposed gripper combines, in a single design, the benefits of two types of grippers: those whose fingers are placed parallel to each other and those whose fingers are equally spaced. This gripper adapts to the size of the object and can grasp objects with both large and small cross-sections. A novel bending actuator, made of soft material and bending into a curve when vacuum is applied, provides the grasping action of the gripper. Crawling robots, in addition to their versatility, allow better interaction with humans. The designed crawling robot employs negative-pressure-driven actuators to demonstrate linear and turning locomotion.
Submitted 28 August, 2020;
originally announced August 2020.
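The 2D rotating-squares mechanism named above can be checked with a few lines of geometry. The sketch assumes the idealized planar case (rigid squares of side `a`, ideal point hinges, neighbours counter-rotated by +phi and -phi) rather than the printed 3D body with soft vertices. It shows that the centre spacing grows as a(cos phi + sin phi) equally in both directions, which corresponds to a Poisson's ratio of -1.

```python
import numpy as np


def rot(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])


def neighbour_spacing(a, phi):
    """Centre spacing (x, y) of rigid squares of side a, alternately rotated by +phi / -phi.

    Square 1 sits at the origin rotated by +phi; its neighbours are rotated by -phi
    and must share exactly one corner with it (an ideal hinge).
    """
    # x-neighbour: corner (a/2, -a/2) of square 1 meets the neighbour's corner (-a/2, -a/2)
    px = (rot(phi) @ np.array([a / 2, -a / 2]))[0] - (rot(-phi) @ np.array([-a / 2, -a / 2]))[0]
    # y-neighbour: corner (a/2, a/2) of square 1 meets the neighbour's corner (a/2, -a/2)
    py = (rot(phi) @ np.array([a / 2, a / 2]))[1] - (rot(-phi) @ np.array([a / 2, -a / 2]))[1]
    return px, py            # both equal a * (cos(phi) + sin(phi))


def poisson_ratio(a, phi0, phi1):
    """-eps_y / eps_x for an opening motion from phi0 to phi1."""
    x0, y0 = neighbour_spacing(a, phi0)
    x1, y1 = neighbour_spacing(a, phi1)
    return -((y1 - y0) / y0) / ((x1 - x0) / x0)


print(neighbour_spacing(1.0, 0.0), neighbour_spacing(1.0, np.pi / 4))
print(poisson_ratio(1.0, 0.1, 0.3))
```

At phi = 0 the squares tile the plane (spacing a); at 45 degrees the pattern is fully open (spacing a times the square root of 2).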
-
Self-supervised Object Tracking with Cycle-consistent Siamese Networks
Authors:
Weihao Yuan,
Michael Yu Wang,
Qifeng Chen
Abstract:
Self-supervised learning for visual object tracking has valuable advantages over supervised learning, such as not requiring laborious human annotation and enabling online training. In this work, we exploit an end-to-end Siamese network in a cycle-consistent self-supervised framework for object tracking. Self-supervision is achieved by exploiting the cycle consistency between forward and backward tracking. To better leverage the end-to-end learning of deep networks, we propose to integrate a Siamese region proposal and mask regression network into our tracking framework so that a fast and more accurate tracker can be learned without per-frame annotation. Experiments on the VOT dataset for visual object tracking and on the DAVIS dataset for video object segmentation propagation show that our method outperforms prior approaches on both tasks.
Submitted 3 August, 2020;
originally announced August 2020.
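The cycle-consistency idea can be sketched as follows: track a box forward through a clip, track it back to the first frame, and penalize any mismatch with the starting box. Here `track` is a hypothetical callable standing in for the Siamese tracker, and 1 - IoU is an assumed mismatch measure; the paper's actual training loss may differ.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cycle_consistency_loss(track, frames, init_box):
    """Track forward to the last frame, then back to the first, and compare boxes."""
    box = init_box
    for i in range(len(frames) - 1):             # forward pass
        box = track(box, frames[i], frames[i + 1])
    for i in range(len(frames) - 1, 0, -1):      # backward pass
        box = track(box, frames[i], frames[i - 1])
    return 1.0 - box_iou(init_box, box)


# Toy usage (hypothetical): each "frame" is just the object's x-offset.
def shift(box, dx):
    return (box[0] + dx, box[1], box[2] + dx, box[3])


def perfect(box, f0, f1):
    return shift(box, f1 - f0)


def drifting(box, f0, f1):
    return shift(box, f1 - f0 + 1)               # 1-pixel bias per step


frames = [0, 2, 5, 9]
print(cycle_consistency_loss(perfect, frames, (0, 0, 10, 10)))    # prints 0.0
print(cycle_consistency_loss(drifting, frames, (0, 0, 10, 10)))   # prints 0.75
```

No per-frame labels are needed: only the starting box is compared with the box the tracker returns after the round trip.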
-
A Flexible Connector for Soft Modular Robots Based on Micropatterned Intersurface Jamming
Authors:
Yu Alexander Tse,
Shuai Liu,
Yang Yang,
Michael Yu Wang
Abstract:
Soft modular robots offer more flexibility and safer interaction with a changing environment than traditional robots. However, it has remained challenging to create deformable connectors that can be integrated into soft machines. In this work, we propose a flexible connector for soft modular robots based on micropatterned intersurface jamming. The connector is composed of micropatterned dry adhesives made of silicone rubber and a flexible main body with inflatable chambers for active engagement and disengagement. Through connection force tests, we evaluate the characteristics of the connector both in the linear direction and under rotational disruptions. The connector can stably support an average maximum load of 22 N (83 times the connector's body weight) linearly and 10.86 N under planar rotation. The proposed connector demonstrates the potential to create a robust connection between soft modular robots without raising the system's overall stiffness, thus preserving the high flexibility of the robotic system.
Submitted 10 April, 2020;
originally announced April 2020.
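The load figures above imply the connector's approximate weight. The calculation below assumes standard gravity (9.81 m/s²) and treats the reported ratio of 83 as exact, so the derived values are only approximate.

```python
G = 9.81  # m/s^2, assumed standard gravity

max_linear_load_n = 22.0        # reported average maximum linear load
ratio_to_body_weight = 83.0     # reported "83 times the connector's body weight"
max_rotational_load_n = 10.86   # reported load under planar rotation

body_weight_n = max_linear_load_n / ratio_to_body_weight    # about 0.265 N
body_mass_g = body_weight_n / G * 1000                       # about 27 g
rotational_ratio = max_rotational_load_n / body_weight_n     # about 41 times body weight

print(f"{body_weight_n:.3f} N, {body_mass_g:.1f} g, {rotational_ratio:.0f}x")
# prints 0.265 N, 27.0 g, 41x
```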
-
PiP: Planning-informed Trajectory Prediction for Autonomous Driving
Authors:
Haoran Song,
Wenchao Ding,
Yuxuan Chen,
Shaojie Shen,
Michael Yu Wang,
Qifeng Chen
Abstract:
Predicting the motion of surrounding vehicles is critical for self-driving planning, especially in a socially compliant and flexible way. However, future prediction is challenging due to the interaction and uncertainty in driving behaviors. We propose planning-informed trajectory prediction (PiP) to tackle the prediction problem in the multi-agent setting. Our approach differs from the traditional manner of prediction, which is based only on historical information and is decoupled from planning. By informing the prediction process with the planning of the ego vehicle, our method achieves state-of-the-art performance in multi-agent forecasting on highway datasets. Moreover, our approach enables a novel pipeline that couples prediction and planning by conditioning PiP on multiple candidate trajectories of the ego vehicle, which is highly beneficial for autonomous driving in interactive scenarios.
Submitted 18 January, 2021; v1 submitted 25 March, 2020;
originally announced March 2020.
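The coupled prediction-and-planning pipeline described above can be sketched as a loop over candidate ego trajectories. `predict` and `cost` are hypothetical callables (a PiP-style conditional predictor and a planning cost); the toy usage, with one-dimensional positions and a vehicle that yields to an assertive ego plan, is invented purely to show the control flow.

```python
import math


def plan_with_pip(ego_candidates, history, predict, cost):
    """Return the ego plan (and its cost) whose conditioned prediction is cheapest."""
    best_plan, best_cost = None, math.inf
    for plan in ego_candidates:
        preds = predict(history, plan)       # predictions conditioned on this ego plan
        c = cost(plan, preds)
        if c < best_cost:
            best_plan, best_cost = plan, c
    return best_plan, best_cost


# Toy usage (hypothetical): 1-D positions; the other car, last seen at 10,
# yields (stops) when the ego plan ends beyond position 5.
def toy_predict(history, plan):
    start = history[-1]
    if plan[-1] > 5:
        return [start] * len(plan)
    return [start + k + 1 for k in range(len(plan))]


def toy_cost(plan, preds):
    unsafe = any(o - e < 2 for e, o in zip(plan, preds))
    return -plan[-1] + (100 if unsafe else 0)    # reward progress, penalise small gaps


print(plan_with_pip([[1, 2, 3], [2, 4, 6], [3, 6, 9]], [10], toy_predict, toy_cost))
# prints ([2, 4, 6], -6)
```

The middle plan wins because conditioning the prediction on it makes the other vehicle yield, something a prediction decoupled from the ego plan could not capture.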
-
Active Perception with A Monocular Camera for Multiscopic Vision
Authors:
Weihao Yuan,
Rui Fan,
Michael Yu Wang,
Qifeng Chen
Abstract:
We design a multiscopic vision system that uses a low-cost monocular RGB camera to obtain accurate depth estimates for robotic applications. Unlike multi-view stereo with images captured at unconstrained camera poses, the proposed system actively controls a robot arm with a mounted camera to capture a sequence of images in horizontally or vertically aligned positions with the same parallax. In this system, we combine the cost volumes for stereo matching between the reference image and the surrounding images to form a fused cost volume that is robust to outliers. Experiments on the Middlebury dataset and real robot experiments show that the disparity maps we obtain are more accurate than those from two-frame stereo matching: the average absolute error is reduced by 50.2% in our experiments.
Submitted 22 January, 2020;
originally announced January 2020.
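The cost-volume fusion described above can be illustrated on grayscale images. The sketch assumes per-pixel absolute-difference costs, a left and a right neighbour at equal baselines (vertical neighbours would work the same way along columns), and element-wise minimum fusion, so a pixel that falls out of view in one neighbour can still be matched in the other. The abstract does not specify the matching cost or fusion rule the authors use.

```python
import numpy as np


def sad_cost_volume(ref, other, max_disp, sign):
    """Per-pixel absolute-difference cost for disparities 0..max_disp.

    sign=+1: `other` was captured to the right of the reference (match at x - d);
    sign=-1: captured to the left (match at x + d). Out-of-view matches cost +inf.
    """
    h, w = ref.shape
    vol = np.full((max_disp + 1, h, w), np.inf)
    cols = np.arange(w)
    for d in range(max_disp + 1):
        xs = cols - sign * d
        valid = (xs >= 0) & (xs < w)
        vol[d][:, valid] = np.abs(ref[:, valid] - other[:, xs[valid]])
    return vol


def fused_disparity(ref, left, right, max_disp):
    """Fuse the two cost volumes by element-wise minimum, then take winner-take-all."""
    fused = np.minimum(sad_cost_volume(ref, right, max_disp, +1),
                       sad_cost_volume(ref, left, max_disp, -1))
    return np.argmin(fused, axis=0)


# Synthetic scene with constant disparity 2 between neighbouring viewpoints.
rng = np.random.default_rng(1)
base = rng.random((6, 24))
center, left, right = base[:, 2:22], base[:, 0:20], base[:, 4:24]
print(np.unique(fused_disparity(center, left, right, max_disp=4)))
```

Near the left border only the left neighbour provides a valid match and near the right border only the right one does; the fused volume still recovers the disparity at every pixel.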
-
Multi-Object Rearrangement with Monte Carlo Tree Search: A Case Study on Planar Nonprehensile Sorting
Authors:
Haoran Song,
Joshua A. Haustein,
Weihao Yuan,
Kaiyu Hang,
Michael Yu Wang,
Danica Kragic,
Johannes A. Stork
Abstract:
In this work, we address a planar non-prehensile sorting task. Here, a robot needs to push many densely packed objects belonging to different classes into a configuration where these classes are clearly separated from each other. To achieve this, we propose to employ Monte Carlo tree search equipped with a task-specific heuristic function. We evaluate the algorithm on various simulated and real-world sorting tasks. We observe that the algorithm is capable of reliably sorting large numbers of convex and non-convex objects, as well as convex objects in the presence of immovable obstacles.
Submitted 18 January, 2021; v1 submitted 15 December, 2019;
originally announced December 2019.
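A minimal Monte Carlo tree search with UCB1 selection and a heuristic leaf evaluation (used here in place of random rollouts) is sketched below. The toy domain in the usage line (stepping along the integers towards 5) and its heuristic are invented for illustration; the paper's push actions and task-specific heuristic are not described in the abstract.

```python
import math
import random


class Node:
    def __init__(self, state, actions, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action              # action that led here from the parent
        self.untried = list(actions)      # actions not yet expanded
        self.children = []
        self.visits = 0
        self.value = 0.0                  # sum of backed-up heuristic values


def uct_select(node, c):
    # UCB1: mean value plus an exploration bonus that shrinks with visits.
    return max(node.children,
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))


def mcts(root_state, actions, step, heuristic, iterations=500, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(root_state, actions(root_state))
    for _ in range(iterations):
        node = root
        while not node.untried and node.children:          # 1. selection
            node = uct_select(node, c)
        if node.untried:                                    # 2. expansion
            a = node.untried.pop(rng.randrange(len(node.untried)))
            s = step(node.state, a)
            child = Node(s, actions(s), parent=node, action=a)
            node.children.append(child)
            node = child
        reward = heuristic(node.state)                      # 3. heuristic evaluation
        while node is not None:                             # 4. backpropagation
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda ch: ch.visits).action


print(mcts(0, lambda s: [1, -1], lambda s, a: s + a, lambda s: -abs(s - 5)))
```

Returning the most-visited root action, rather than the one with the best mean, is a common and more stable recommendation rule for UCT.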
-
Towards Learning to Detect and Predict Contact Events on Vision-based Tactile Sensors
Authors:
Yazhan Zhang,
Weihao Yuan,
Zicheng Kan,
Michael Yu Wang
Abstract:
In essence, a successful grasp boils down to correct responses to multiple contact events between fingertips and objects. In most scenarios, tactile sensing is adequate to distinguish contact events. Due to the high dimensionality of tactile information, classifying spatiotemporal tactile signals using conventional model-based methods is difficult. In this work, we propose to predict and classify tactile signals using deep learning methods, seeking to enhance the adaptability of the robotic grasp system to external event changes that may lead to grasping failure. We develop a deep learning framework, collect 6650 tactile image sequences with a vision-based tactile sensor, and integrate the neural network into a contact-event-based robotic grasping system. In grasping experiments, compared with open-loop grasps, we achieved a 52% increase in object lifting success rate with contact detection and significantly higher robustness under unexpected loads with slip prediction. These results demonstrate that integrating the proposed framework into a robotic grasping system substantially improves the picking success rate and the capability to withstand external disturbances.
Submitted 9 October, 2019;
originally announced October 2019.
-
Effective Estimation of Contact Force and Torque for Vision-based Tactile Sensor with Helmholtz-Hodge Decomposition
Authors:
Yazhan Zhang,
Zicheng Kan,
Yang Yang,
Alexander Yu Tse,
Michael Yu Wang
Abstract:
Retrieving rich contact information from robotic tactile sensing has been a challenging yet significant task for effectively perceiving the properties of objects that the robot interacts with. This work is dedicated to developing an algorithm to estimate contact force and torque for vision-based tactile sensors. We first present simulated observations of the contact deformation patterns of hyperelastic materials under ideal single-axis loads. Then, based on these observations, we propose a method for estimating surface forces and torque from the contact deformation vector field with the Helmholtz-Hodge Decomposition (HHD) algorithm. Extensive calibration experiments and baseline comparisons follow, verifying the effectiveness of the proposed method in terms of prediction error and variance. The proposed algorithm is further integrated into a contact force visualization module as well as a closed-loop adaptive grasp force control framework, and is shown to be useful both for visualizing contact stability and for minimum-force grasping tasks.
Submitted 22 June, 2019;
originally announced June 2019.
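One standard way to compute a Helmholtz-Hodge decomposition on a regular grid is spectral projection, sketched below under the assumption of a periodic domain. Real marker fields are not periodic, so a practical implementation would need boundary handling. The function splits a 2D displacement field into curl-free, divergence-free, and constant (harmonic) parts; how the authors map these components to force and torque is not stated in the abstract.

```python
import numpy as np


def hhd_periodic(u, v):
    """Split a periodic 2-D field (u, v) into curl-free, divergence-free and constant parts."""
    h, w = u.shape
    ky = np.fft.fftfreq(h)[:, None]
    kx = np.fft.fftfreq(w)[None, :]
    U, V = np.fft.fft2(u), np.fft.fft2(v)
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                        # avoid 0/0; the mean (k = 0) is handled separately
    proj = (kx * U + ky * V) / k2         # component of each Fourier mode along k
    Uc, Vc = kx * proj, ky * proj         # curl-free (gradient) part
    Ud, Vd = U - Uc, V - Vc               # divergence-free part
    for arr in (Uc, Vc, Ud, Vd):
        arr[0, 0] = 0.0                   # remove the mean from both parts

    def back(F):
        return np.real(np.fft.ifft2(F))

    harmonic = (u.mean(), v.mean())       # on a periodic domain the harmonic part is constant
    return (back(Uc), back(Vc)), (back(Ud), back(Vd)), harmonic


# A field varying along x in its x-component is curl-free; one varying along y is
# divergence-free; the constant offset is the harmonic part.
yy, xx = np.mgrid[0:16, 0:32]
u = np.cos(2 * np.pi * xx / 32) + np.cos(2 * np.pi * yy / 16) + 0.5
v = np.zeros_like(u)
(uc, vc), (ud, vd), (mu, mv) = hhd_periodic(u, v)
print(round(mu, 6), np.abs(uc - np.cos(2 * np.pi * xx / 32)).max() < 1e-10)
```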
-
FingerVision Tactile Sensor Design and Slip Detection Using Convolutional LSTM Network
Authors:
Yazhan Zhang,
Zicheng Kan,
Yu Alexander Tse,
Yang Yang,
Michael Yu Wang
Abstract:
Tactile sensing is essential to the human perception system, and likewise to robots. In this paper, we develop a novel optical tactile sensor, "FingerVision", with effective signal processing algorithms. The sensor is composed of a soft skin with an embedded marker array bonded to a rigid frame, and a web camera with a fisheye lens. When contact force is applied, the camera tracks the movements of the markers, from which a deformation field is obtained. Compared to existing tactile sensors, our sensor features a compact footprint, high resolution, and ease of fabrication. Furthermore, using the deformation field estimation, we propose a slip classification framework based on convolutional Long Short-Term Memory (convolutional LSTM) networks. The data collection process takes advantage of the human sense of slip: a human hand holds 12 everyday objects, interacts with the sensor skin, and labels the data as slip or non-slip based on the human feeling of slip. Our slip classification framework achieves a high accuracy of 97.62% on the test dataset. It is expected to significantly enhance the stability of robot grasping, leading to better contact force control, finer object interaction, and more active sensing manipulation.
Submitted 5 October, 2018;
originally announced October 2018.
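Obtaining a deformation field from the tracked markers can be sketched as nearest-neighbour matching between reference and current marker positions. The matching rule, the distance threshold `max_match_dist`, and the omission of fisheye undistortion are assumptions made for brevity, not details from the abstract.

```python
import numpy as np


def marker_displacements(ref_pts, cur_pts, max_match_dist):
    """Match each reference marker to its nearest detection in the current frame.

    Returns an (N, 2) displacement array; markers with no detection within
    max_match_dist get NaN so downstream code can mask them out.
    """
    ref_pts = np.asarray(ref_pts, dtype=float)
    cur_pts = np.asarray(cur_pts, dtype=float)
    dists = np.linalg.norm(ref_pts[:, None, :] - cur_pts[None, :, :], axis=-1)  # (N, M)
    nearest = np.argmin(dists, axis=1)
    disp = cur_pts[nearest] - ref_pts
    disp[dists[np.arange(len(ref_pts)), nearest] > max_match_dist] = np.nan
    return disp


# 5 x 5 marker grid with 10-pixel pitch, uniformly sheared by (1.5, -0.5) pixels.
gx, gy = np.meshgrid(np.arange(5) * 10.0, np.arange(5) * 10.0)
ref = np.column_stack([gx.ravel(), gy.ravel()])
cur = ref + np.array([1.5, -0.5])
print(marker_displacements(ref, cur, max_match_dist=4.0)[:3])
```

Nearest-neighbour matching is only reliable while marker motion stays well below the marker pitch; larger deformations would need frame-to-frame tracking.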
-
Reinforcement Learning in Topology-based Representation for Human Body Movement with Whole Arm Manipulation
Authors:
Weihao Yuan,
Kaiyu Hang,
Haoran Song,
Danica Kragic,
Michael Y. Wang,
Johannes A. Stork
Abstract:
Moving a human body or a large and bulky object can require the strength of whole arm manipulation (WAM). This type of manipulation places the load on the robot's arms and relies on global properties of the interaction to succeed, rather than on local contacts such as grasping or non-prehensile pushing. In this paper, we learn to generate motions that enable WAM for holding and transporting humans in certain rescue or patient care scenarios. We model the task as a reinforcement learning problem in order to provide a behavior that can directly respond to external perturbation and human motion. For this, we represent global properties of the robot-human interaction with topology-based coordinates that are computed from arm and torso positions. These coordinates also allow the learned policy to be transferred to other body shapes and sizes. For training and evaluation, we simulate a dynamic sea rescue scenario and show in quantitative experiments that the policy can solve unseen scenarios with differently shaped humans, floating humans, or perception noise. Our qualitative experiments show that subsequent transport after holding is achieved, and we demonstrate that the policy can be directly transferred to a real-world setting.
Submitted 12 September, 2018;
originally announced September 2018.
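One classical topology-based quantity computed from two curves is the Gauss linking integral, approximated below with a midpoint rule over polyline segments. Whether the paper's topology-based coordinates use this exact integral is an assumption. For open curves, such as arm and torso chains, the same double sum yields a real-valued winding measure rather than an integer.

```python
import numpy as np


def gauss_linking(c1, c2):
    """Midpoint-rule Gauss linking integral between two closed polylines of shape (N, 3).

    Lk = 1/(4*pi) * sum_ij (m_i - m_j) . (d_i x d_j) / |m_i - m_j|^3,
    with segment vectors d and segment midpoints m.
    """
    d1 = np.roll(c1, -1, axis=0) - c1            # segment vectors (closing the loop)
    d2 = np.roll(c2, -1, axis=0) - c2
    m1 = c1 + 0.5 * d1                           # segment midpoints
    m2 = c2 + 0.5 * d2
    r = m1[:, None, :] - m2[None, :, :]          # pairwise offsets, (N1, N2, 3)
    cross = np.cross(d1[:, None, :], d2[None, :, :])
    num = np.einsum("ijk,ijk->ij", r, cross)
    den = np.linalg.norm(r, axis=-1) ** 3
    return float(np.sum(num / den) / (4 * np.pi))


# Two interlocked unit circles (a Hopf link) and the same pair pulled apart.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
ring_a = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
ring_b = np.stack([1 + np.cos(t), np.zeros_like(t), np.sin(t)], axis=1)
ring_far = ring_b + np.array([5.0, 0.0, 0.0])
print(round(gauss_linking(ring_a, ring_b), 3), round(gauss_linking(ring_a, ring_far), 3))
```

The sign of the result depends on the orientation of the two curves; its magnitude is close to 1 for the linked pair and close to 0 for the separated pair.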
-
Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning
Authors:
Weihao Yuan,
Johannes A. Stork,
Danica Kragic,
Michael Y. Wang,
Kaiyu Hang
Abstract:
Rearranging objects on a tabletop surface by means of nonprehensile manipulation is a task that requires skillful interaction with the physical world. Usually, this is achieved by precisely modeling physical properties of the objects, robot, and environment for explicit planning. In contrast, since explicitly modeling the physical environment is not always feasible and involves various uncertainties, we learn a nonprehensile rearrangement strategy with deep reinforcement learning based only on visual feedback. For this, we model the task with rewards and train a deep Q-network. Our potential field-based heuristic exploration strategy reduces the number of collisions that lead to suboptimal outcomes, and we actively balance the training set to avoid bias towards poor examples. Our training process leads to quicker learning and better performance on the task compared to uniform exploration and standard experience replay. We present empirical evidence from simulation that our method achieves a success rate of 85%, show that our system can cope with sudden changes in the environment, and compare our performance with human-level performance.
Submitted 15 March, 2018;
originally announced March 2018.
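The deep Q-network mentioned above is trained towards one-step targets r + γ·max Q(s′, a′). The sketch below computes those targets and adds a hypothetical balanced sampler that draws half of each minibatch from positive-reward transitions. The balancing rule is an assumption standing in for the paper's unspecified scheme, the sampler assumes both groups are non-empty, and the potential field-based exploration is not shown.

```python
import numpy as np


def dqn_targets(rewards, next_q, dones, gamma=0.99):
    """One-step Q-learning targets: r + gamma * max_a' Q_target(s', a'), zero bootstrap at episode end."""
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)


def balanced_batch(rewards, batch_size, rng):
    """Hypothetical balancing: half the batch from positive-reward transitions, half from the rest."""
    pos = np.flatnonzero(rewards > 0)
    neg = np.flatnonzero(rewards <= 0)
    half = batch_size // 2
    return np.concatenate([rng.choice(pos, half, replace=True),
                           rng.choice(neg, batch_size - half, replace=True)])


targets = dqn_targets(np.array([1.0, 0.0]),
                      np.array([[0.5, 2.0], [1.0, 3.0]]),
                      np.array([0.0, 1.0]), gamma=0.9)
print(targets)

replay_rewards = np.array([1.0, 0, 0, 0, 0, 0, 0, 0, -1.0, 0])
idx = balanced_batch(replay_rewards, 8, np.random.default_rng(0))
print(np.sum(replay_rewards[idx] > 0))            # half of the batch is positive
```

Without balancing, a replay buffer dominated by unsuccessful pushes would give the rare positive transitions very little weight in each update.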