-
ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model
Authors:
Tengbo Yu,
Guanxing Lu,
Zaijia Yang,
Haoyuan Deng,
Season Si Chen,
Jiwen Lu,
Wenbo Ding,
Guoqiang Hu,
Yansong Tang,
Ziwei Wang
Abstract:
Multi-task robotic bimanual manipulation is becoming increasingly popular as it enables sophisticated tasks that require diverse dual-arm collaboration patterns. Compared to unimanual manipulation, bimanual tasks pose challenges in understanding the multi-body spatiotemporal dynamics. An existing method, ManiGaussian, pioneers encoding the spatiotemporal dynamics into the visual representation via a Gaussian world model for single-arm settings, but it ignores the interaction of multiple embodiments in dual-arm systems, which leads to a significant performance drop. In this paper, we propose ManiGaussian++, an extension of the ManiGaussian framework that improves multi-task bimanual manipulation by digesting multi-body scene dynamics through a hierarchical Gaussian world model. To be specific, we first generate task-oriented Gaussian Splatting from intermediate visual features, which aims to differentiate acting and stabilizing arms for multi-body spatiotemporal dynamics modeling. We then build a hierarchical Gaussian world model with a leader-follower architecture, where the multi-body spatiotemporal dynamics are mined for the intermediate visual representation via future scene prediction. The leader predicts the Gaussian Splatting deformation caused by motions of the stabilizing arm, through which the follower generates the physical consequences resulting from the movement of the acting arm. As a result, our method significantly outperforms the current state-of-the-art bimanual manipulation techniques, with an improvement of 20.2% on 10 simulated tasks, and achieves a 60% average success rate on 9 challenging real-world tasks. Our code is available at https://github.com/April-Yz/ManiGaussian_Bimanual.
Submitted 24 June, 2025;
originally announced June 2025.
-
Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization
Authors:
Junying Wang,
Jingyuan Liu,
Xin Sun,
Krishna Kumar Singh,
Zhixin Shu,
He Zhang,
Jimei Yang,
Nanxuan Zhao,
Tuanfeng Y. Wang,
Simon S. Chen,
Ulrich Neumann,
Jae Shin Yoon
Abstract:
This paper introduces Comprehensive Relighting, the first all-in-one approach that can both control and harmonize the lighting from an image or video of humans with arbitrary body parts from any scene. Building such a generalizable model is extremely challenging due to the lack of datasets, which restricts existing image-based relighting models to a specific scenario (e.g., face or static human). To address this challenge, we repurpose a pre-trained diffusion model as a general image prior and jointly model human relighting and background harmonization in a coarse-to-fine framework. To further enhance the temporal coherence of the relighting, we introduce an unsupervised temporal lighting model that learns lighting cycle consistency from many real-world videos without any ground truth. At inference time, our temporal lighting module is combined with the diffusion models through spatio-temporal feature blending algorithms without extra training, and we apply a new guided refinement as a post-processing step to preserve the high-frequency details from the input image. In the experiments, Comprehensive Relighting shows strong generalizability and temporal lighting coherence, outperforming existing image-based human relighting and harmonization methods.
Submitted 3 April, 2025;
originally announced April 2025.
-
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
Authors:
Guanxing Lu,
Tengbo Yu,
Haoyuan Deng,
Season Si Chen,
Yansong Tang,
Ziwei Wang
Abstract:
Performing general language-conditioned bimanual manipulation tasks is of great importance for many applications ranging from household service to industrial assembly. However, collecting bimanual manipulation data is expensive due to the high-dimensional action space, which makes it challenging for conventional methods to handle general bimanual manipulation tasks. In contrast, unimanual policies have recently demonstrated impressive generalizability across a wide range of tasks because of scaled model parameters and training data, which can provide sharable manipulation knowledge for bimanual systems. To this end, we propose a plug-and-play method named AnyBimanual, which transfers a pre-trained unimanual policy to a general bimanual manipulation policy with few bimanual demonstrations. Specifically, we first introduce a skill manager to dynamically schedule the skill representations discovered from the pre-trained unimanual policy for bimanual manipulation tasks, which linearly combines skill primitives with task-oriented compensation to represent the bimanual manipulation instruction. To mitigate the observation discrepancy between unimanual and bimanual systems, we present a visual aligner that generates soft masks for the visual embedding of the workspace, which aims to align the visual input of the unimanual policy model for each arm with that seen during the pretraining stage. AnyBimanual shows superiority on 12 simulated tasks from RLBench2 with a sizable 12.67% improvement in success rate over previous methods. Experiments on 9 real-world tasks further verify its practicality with an average success rate of 84.62%.
Submitted 26 March, 2025; v1 submitted 9 December, 2024;
originally announced December 2024.
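The AnyBimanual abstract above describes a skill manager that linearly combines skill primitives with a task-oriented compensation term. A minimal sketch of that combination, not the authors' implementation, might look as follows. The function name `combine_skills`, the plain-list vector representation, and the fixed weights are all assumptions made for illustration; in the paper the weights and compensation would presumably be learned.

```python
from typing import List


def combine_skills(
    primitives: List[List[float]],
    weights: List[float],
    compensation: List[float],
) -> List[float]:
    """Return sum_k weights[k] * primitives[k] + compensation.

    Hypothetical helper: each primitive is an embedding vector, and the
    compensation vector is a task-specific offset of the same dimension.
    """
    if len(primitives) != len(weights):
        raise ValueError("one weight is required per skill primitive")
    dim = len(compensation)
    combined = list(compensation)  # start from the compensation term
    for weight, primitive in zip(weights, primitives):
        if len(primitive) != dim:
            raise ValueError("all vectors must share one dimension")
        for i, value in enumerate(primitive):
            combined[i] += weight * value
    return combined
```

For example, two 2-D primitives `[2.0, 0.0]` and `[0.0, 4.0]` with weights `[0.5, 0.25]` and compensation `[1.0, -1.0]` combine elementwise into a single 2-D instruction embedding.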
-
A Structured Framework for Predicting Sustainable Aviation Fuel Properties using Liquid-Phase FTIR and Machine Learning
Authors:
Ana E. Comesana,
Sharon S. Chen,
Kyle E. Niemeyer,
Vi H. Rapp
Abstract:
Sustainable aviation fuels have the potential for reducing emissions and environmental impact. To help identify viable sustainable aviation fuels and accelerate research, several machine learning models have been developed to predict relevant physicochemical properties. However, many of the models have limited applicability, leverage data from complex analytical techniques with confined spectral ranges, or use feature decomposition methods that have limited interpretability. Using liquid-phase Fourier Transform Infrared (FTIR) spectra, this study presents a structured method for creating accurate and interpretable property prediction models for neat molecules, aviation fuels, and blends. Liquid-phase FTIR spectra measurements can be collected quickly and consistently, offering high reliability, sensitivity, and component specificity using less than 2 mL of sample. The method first decomposes FTIR spectra into fundamental building blocks using Non-negative Matrix Factorization (NMF) to enable scientific analysis of FTIR spectra attributes and fuel properties. The NMF features are then used to create five ensemble models for predicting final boiling point, flash point, freezing point, density at 15 °C, and kinematic viscosity at -20 °C. All models were trained using experimental property data from neat molecules, aviation fuels, and blends. The models accurately predict properties while enabling interpretation of relationships between compositional elements of a fuel, such as functional groups or chemical classes, and its properties. To support sustainable aviation fuel research and development, the models and data are available on an interactive web tool.
Submitted 2 August, 2024;
originally announced August 2024.
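The abstract above decomposes FTIR spectra with Non-negative Matrix Factorization before fitting property models. As a rough illustration of the decomposition step only, the sketch below factors a non-negative matrix V (rows as samples, columns as spectral bins) into W (per-sample feature weights) and H (basis spectra) using the classic multiplicative update rules. This is a generic textbook NMF, not the authors' pipeline; the function names, the pure-Python matrix helpers, and the random initialization are assumptions, and the ensemble regression stage is omitted.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v: Matrix, rank: int, iterations: int = 200,
        seed: int = 0, eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor V ~= W H with non-negative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h
```

Under this reading, each row of W would serve as the feature vector passed to a downstream property model, and each row of H could be inspected as a candidate spectral building block.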
-
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
Authors:
Hongjian Zhou,
Fenglin Liu,
Boyang Gu,
Xinyu Zou,
Jinfa Huang,
Jinge Wu,
Yiru Li,
Sam S. Chen,
Peilin Zhou,
Junling Liu,
Yining Hua,
Chengfeng Mao,
Chenyu You,
Xian Wu,
Yefeng Zheng,
Lei Clifton,
Zheng Li,
Jiebo Luo,
David A. Clifton
Abstract:
Large language models (LLMs), such as ChatGPT, have received substantial attention due to their capabilities for understanding and generating human language. While there has been a burgeoning trend in research focusing on the employment of LLMs in supporting different medical tasks (e.g., enhancing clinical diagnostics and providing medical education), a review of these efforts, particularly their development, practical applications, and outcomes in medicine, remains scarce. Therefore, this review aims to provide a detailed overview of the development and deployment of LLMs in medicine, including the challenges and opportunities they face. In terms of development, we provide a detailed introduction to the principles of existing medical LLMs, including their basic model structures, number of parameters, and sources and scales of data used for model development. It serves as a guide for practitioners in developing medical LLMs tailored to their specific needs. In terms of deployment, we offer a comparison of the performance of different LLMs across various medical tasks, and further compare them with state-of-the-art lightweight models, aiming to provide an understanding of the advantages and limitations of LLMs in medicine. Overall, in this review, we address the following questions: 1) What are the practices for developing medical LLMs? 2) How to measure the medical task performance of LLMs in a medical setting? 3) How have medical LLMs been employed in real-world practice? 4) What challenges arise from the use of medical LLMs? and 5) How to more effectively develop and deploy medical LLMs? By answering these questions, this review aims to provide insights into the opportunities for LLMs in medicine and serve as a practical resource. We also maintain a regularly updated list of practical guides on medical LLMs at https://github.com/AI-in-Health/MedLLMsPracticalGuide
Submitted 22 July, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.
-
WangLab at MEDIQA-Chat 2023: Clinical Note Generation from Doctor-Patient Conversations using Large Language Models
Authors:
John Giorgi,
Augustin Toma,
Ronald Xie,
Sondra S. Chen,
Kevin R. An,
Grace X. Zheng,
Bo Wang
Abstract:
This paper describes our submission to the MEDIQA-Chat 2023 shared task for automatic clinical note generation from doctor-patient conversations. We report results for two approaches: the first fine-tunes a pre-trained language model (PLM) on the shared task data, and the second uses few-shot in-context learning (ICL) with a large language model (LLM). Both achieve high performance as measured by automatic metrics (e.g. ROUGE, BERTScore) and ranked second and first, respectively, of all submissions to the shared task. Expert human scrutiny indicates that notes generated via the ICL-based approach with GPT-4 are preferred about as often as human-written notes, making it a promising path toward automated note generation from doctor-patient conversations.
Submitted 3 June, 2023; v1 submitted 3 May, 2023;
originally announced May 2023.
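The second approach in the abstract above relies on few-shot in-context learning, where example conversation and note pairs are placed in the prompt ahead of the new conversation. The sketch below shows one way such a prompt could be assembled. The template wording, the function name `build_few_shot_prompt`, and the default instruction are hypothetical and are not taken from the authors' submission; no model API is called.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    dialogue: str,
    instruction: str = "Write a clinical note summarizing the conversation.",
) -> str:
    """Assemble an instruction, k worked examples, and the new dialogue.

    Each example is a (conversation, note) pair. The prompt ends with an
    open "Note:" slot that the language model is expected to complete.
    """
    parts = [instruction]
    for index, (conversation, note) in enumerate(examples, start=1):
        parts.append(
            f"Example {index}\nConversation:\n{conversation}\nNote:\n{note}"
        )
    parts.append(f"Conversation:\n{dialogue}\nNote:")
    return "\n\n".join(parts)
```

The returned string would then be sent to whichever LLM is in use. How the in-context examples are selected is a separate design choice that this sketch leaves to the caller.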
-
Study of improving nano-contouring performance by employing cross-coupling controller
Authors:
Wen Yuh Jywe,
Shih Shin Chen,
Hung-Shu Wang,
Chien Hung Liu,
Hsin Hung Jwo,
Yun Feng Teng,
Tung Hsien Hsieh
Abstract:
For path planning of the tracking stage, we design a two-axis cross-coupling control system that uses a PI controller to compensate for the contour error between axes. The stage adopted in this paper was designed by our laboratory (Precision Machine Center of National Formosa University). The cross-coupling controller calculates the actuating signal of each axis by combining the position errors of multiple axes. Hence, the cross-coupling controller improves the stage tracking ability and decreases the contour error. The experiments show excellent stage motion, which confirms that the proposed method is a powerful and efficient tool for improving stage tracking ability. Stage tracking was also found to reduce the contour error for two types of circular contours to approximately 25 nm.
Submitted 30 April, 2008;
originally announced April 2008.
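The abstract above describes a PI-based cross-coupling controller that turns the contour error of a two-axis stage into per-axis corrections. A minimal sketch of that idea for a circular contour is given below. It assumes the circle is centered at the origin, defines the contour error as the radial deviation from the commanded radius, and splits the PI output along the radial direction. The class name, gains, and sign convention are illustrative assumptions, not the authors' controller.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CrossCouplingPI:
    """PI compensator acting on the contour error of a circular path."""

    kp: float  # proportional gain (hypothetical value chosen by the user)
    ki: float  # integral gain (hypothetical value chosen by the user)
    dt: float  # sample period in seconds
    integral: float = 0.0

    def update(self, x: float, y: float,
               radius: float) -> Tuple[float, float, float]:
        """Return (contour_error, x_correction, y_correction).

        The contour error is positive when the stage lies outside the
        circle, so the corrections point back toward the contour.
        """
        distance = math.hypot(x, y)
        error = distance - radius
        self.integral += error * self.dt
        command = self.kp * error + self.ki * self.integral
        if distance == 0.0:
            # Radial direction is undefined at the center; apply nothing.
            return error, 0.0, 0.0
        normal_x, normal_y = x / distance, y / distance
        return error, -command * normal_x, -command * normal_y
```

In a full control loop these corrections would be added to each axis's own position-loop command, so both axes cooperate to reduce the contour error instead of each tracking its reference independently.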