Showing 1–14 of 14 results for author: Heng, L

  1. arXiv:2505.02166  [pdf, other]

    cs.RO

    CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation

    Authors: Xiaoqi Li, Lingyun Xu, Mingxu Zhang, Jiaming Liu, Yan Shen, Iaroslav Ponomarenko, Jiahui Xu, Liang Heng, Siyuan Huang, Shanghang Zhang, Hao Dong

    Abstract: In robotics, task goals can be conveyed through various modalities, such as language, goal images, and goal videos. However, natural language can be ambiguous, while images or videos may offer overly detailed specifications. To tackle these challenges, we introduce CrayonRobo that leverages comprehensive multi-modal prompts that explicitly convey both low-level actions and high-level planning in a…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  2. arXiv:2505.01809  [pdf, other]

    cs.CV

    3DWG: 3D Weakly Supervised Visual Grounding via Category and Instance-Level Alignment

    Authors: Xiaoqi Li, Jiaming Liu, Nuowei Han, Liang Heng, Yandong Guo, Hao Dong, Yang Liu

    Abstract: The 3D weakly-supervised visual grounding task aims to localize oriented 3D boxes in point clouds based on natural language descriptions without requiring annotations to guide model learning. This setting presents two primary challenges: category-level ambiguity and instance-level complexity. Category-level ambiguity arises from representing objects of fine-grained categories in a highly sparse po…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: ICRA 2025

  3. arXiv:2503.20384  [pdf, other]

    cs.RO cs.AI

    MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation

    Authors: Rongyu Zhang, Menghang Dong, Yuan Zhang, Liang Heng, Xiaowei Chi, Gaole Dai, Li Du, Yuan Du, Shanghang Zhang

    Abstract: Multimodal Large Language Models (MLLMs) excel in understanding complex language and visual data, enabling generalist robotic systems to interpret instructions and perform embodied tasks. Nevertheless, their real-world deployment is hindered by substantial computational and storage demands. Recent insights into the homogeneous patterns in the LLM layer have inspired sparsification techniques to ad…

    Submitted 14 April, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  4. arXiv:2405.17418  [pdf, other]

    cs.CV

    A Self-Correcting Vision-Language-Action Model for Fast and Slow System Manipulation

    Authors: Chenxuan Li, Jiaming Liu, Guanqun Wang, Xiaoqi Li, Sixiang Chen, Liang Heng, Chuyan Xiong, Jiaxin Ge, Renrui Zhang, Kaichen Zhou, Shanghang Zhang

    Abstract: Recently, some studies have integrated Multimodal Large Language Models into robotic manipulation, constructing vision-language-action models (VLAs) to interpret multimodal information and predict SE(3) poses. While VLAs have shown promising progress, they may suffer from failures when faced with novel and complex tasks. To emulate human-like reasoning for more robust manipulation, we propose the…

    Submitted 18 March, 2025; v1 submitted 27 May, 2024; originally announced May 2024.

  5. arXiv:2306.17717  [pdf, other]

    cs.GR eess.IV

    Content-Preserving Diffusion Model for Unsupervised AS-OCT image Despeckling

    Authors: Li Sanqian, Higashita Risa, Fu Huazhu, Li Heng, Niu Jingxuan, Liu Jiang

    Abstract: Anterior segment optical coherence tomography (AS-OCT) is a non-invasive imaging technique that is highly valuable for ophthalmic diagnosis. However, speckles in AS-OCT images can often degrade the image quality and affect clinical analysis. As a result, removing speckles in AS-OCT images can greatly benefit automatic ophthalmology analysis. Unfortunately, challenges still exist in deploying effec…

    Submitted 30 June, 2023; originally announced June 2023.

  6. arXiv:2201.02437  [pdf, other]

    cs.RO

    Continuous-time Radar-inertial Odometry for Automotive Radars

    Authors: Yin Zhi Ng, Benjamin Choi, Robby Tan, Lionel Heng

    Abstract: We present an approach for radar-inertial odometry which uses a continuous-time framework to fuse measurements from multiple automotive radars and an inertial measurement unit (IMU). Adverse weather conditions do not have a significant impact on the operating performance of radar sensors unlike that of camera and LiDAR sensors. Radar's robustness in such conditions and the increasing prevalence of…

    Submitted 7 January, 2022; originally announced January 2022.

    Comments: In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  7. arXiv:2112.01840  [pdf, other]

    cs.RO cs.AI cs.CV

    Graph-Guided Deformation for Point Cloud Completion

    Authors: Jieqi Shi, Lingyun Xu, Liang Heng, Shaojie Shen

    Abstract: For a long time, the point cloud completion task has been regarded as a pure generation task. After obtaining the global shape code through the encoder, a complete point cloud is generated using the shape priorly learnt by the networks. However, such models are undesirably biased towards prior average objects and inherently limited to fit geometry details. In this paper, we propose a Graph-Guided…

    Submitted 11 November, 2021; originally announced December 2021.

    Comments: RAL with IROS 2021

  8. arXiv:2102.11872  [pdf, other]

    cs.LG cs.AI

    Clustering Aware Classification for Risk Prediction and Subtyping in Clinical Data

    Authors: Shivin Srivastava, Siddharth Bhatia, Lingxiao Huang, Lim Jun Heng, Kenji Kawaguchi, Vaibhav Rajan

    Abstract: In data containing heterogeneous subpopulations, classification performance benefits from incorporating the knowledge of cluster structure in the classifier. Previous methods for such combined clustering and classification either 1) are classifier-specific and not generic, or 2) independently perform clustering and classifier training, which may not form clusters that can potentially benefit class…

    Submitted 3 January, 2023; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: 19 Pages, 5 figures

  9. arXiv:1909.13701  [pdf, other]

    cs.CV

    Nighttime Stereo Depth Estimation using Joint Translation-Stereo Learning: Light Effects and Uninformative Regions

    Authors: Aashish Sharma, Lionel Heng, Loong-Fah Cheong, Robby T. Tan

    Abstract: Nighttime stereo depth estimation is still challenging, as assumptions associated with daytime lighting conditions do not hold any longer. Nighttime is not only about low-light and dense noise, but also about glow/glare, flares, non-uniform distribution of light, etc. One of the possible solutions is to train a network on night stereo images in a fully supervised manner. However, to obtain proper…

    Submitted 8 October, 2020; v1 submitted 30 September, 2019; originally announced September 2019.

    Comments: Accepted to 3DV 2020 (Oral)

  10. arXiv:1810.08611  [pdf, other]

    cs.SD cs.LG eess.AS

    A database linking piano and orchestral MIDI scores with application to automatic projective orchestration

    Authors: Léopold Crestel, Philippe Esling, Lena Heng, Stephen McAdams

    Abstract: This article introduces the Projective Orchestral Database (POD), a collection of MIDI scores composed of pairs linking piano scores to their corresponding orchestrations. To the best of our knowledge, this is the first database of its kind, which performs piano or orchestral prediction, but more importantly which tries to learn the correlations between piano and orchestral scores. Hence, we also…

    Submitted 19 October, 2018; originally announced October 2018.

  11. arXiv:1809.06132  [pdf, other]

    cs.RO

    Real-Time Dense Mapping for Self-driving Vehicles using Fisheye Cameras

    Authors: Zhaopeng Cui, Lionel Heng, Ye Chuan Yeo, Andreas Geiger, Marc Pollefeys, Torsten Sattler

    Abstract: We present a real-time dense geometric mapping algorithm for large-scale environments. Unlike existing methods which use pinhole cameras, our implementation is based on fisheye cameras which have larger field of view and benefit some other tasks including Visual-Inertial Odometry, localization and object detection around vehicles. Our algorithm runs on in-vehicle PCs at 15 Hz approximately, enabli…

    Submitted 18 April, 2019; v1 submitted 17 September, 2018; originally announced September 2018.

    Comments: 7 pages, 10 figures

  12. arXiv:1809.05477  [pdf, other]

    cs.RO

    Project AutoVision: Localization and 3D Scene Perception for an Autonomous Vehicle with a Multi-Camera System

    Authors: Lionel Heng, Benjamin Choi, Zhaopeng Cui, Marcel Geppert, Sixing Hu, Benson Kuan, Peidong Liu, Rang Nguyen, Ye Chuan Yeo, Andreas Geiger, Gim Hee Lee, Marc Pollefeys, Torsten Sattler

    Abstract: Project AutoVision aims to develop localization and 3D scene perception capabilities for a self-driving vehicle. Such capabilities will enable autonomous navigation in urban and rural environments, in day and night, and with cameras as the only exteroceptive sensors. The sensor suite employs many cameras for both 360-degree coverage and accurate multi-view stereo; the use of low-cost cameras keeps…

    Submitted 4 March, 2019; v1 submitted 14 September, 2018; originally announced September 2018.

    Journal ref: 2019 IEEE International Conference on Robotics and Automation (ICRA)

  13. arXiv:1708.09839  [pdf, other]

    cs.CV

    3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection

    Authors: Christian Häne, Lionel Heng, Gim Hee Lee, Friedrich Fraundorfer, Paul Furgale, Torsten Sattler, Marc Pollefeys

    Abstract: Cameras are a crucial exteroceptive sensor for self-driving cars as they are low-cost and small, provide appearance information about the environment, and work in various weather conditions. They can be used for multiple purposes such as visual navigation and obstacle detection. We can use a surround multi-camera system to cover the full 360-degree field-of-view around the car. In this way, we avo…

    Submitted 31 August, 2017; originally announced August 2017.

  14. arXiv:1305.7272  [pdf, other]

    cs.NI cs.MA

    Accuracy of Range-Based Cooperative Localization in Wireless Sensor Networks: A Lower Bound Analysis

    Authors: Liang Heng, Grace Xingxin Gao

    Abstract: Accurate location information is essential for many wireless sensor network (WSN) applications. A location-aware WSN generally includes two types of nodes: sensors whose locations are to be determined and anchors whose locations are known a priori. For range-based localization, sensors' locations are deduced from anchor-to-sensor and sensor-to-sensor range measurements. Localization accuracy depends o…

    Submitted 14 March, 2014; v1 submitted 30 May, 2013; originally announced May 2013.

    Comments: 11 pages, 6 figures, 1 table

    ACM Class: C.2.1