
Showing 1–22 of 22 results for author: Sui, W

Searching in archive cs.
  1. arXiv:2506.10600 [pdf, ps, other]

    cs.RO cs.CV

    EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence

    Authors: Xinjie Wang, Liu Liu, Yu Cao, Ruiqi Wu, Wenkang Qin, Dehui Wang, Wei Sui, Zhizhong Su

    Abstract: Constructing a physically realistic and accurately scaled simulated 3D world is crucial for the training and evaluation of embodied intelligence tasks. The diversity, realism, low-cost accessibility and affordability of 3D data assets are critical for achieving generalization and scalability in embodied AI. However, most current embodied intelligence tasks still rely heavily on traditional 3D comp…

    Submitted 16 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  2. arXiv:2504.09927 [pdf, ps, other]

    cs.RO

    Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization

    Authors: Haiyong Yu, Yanqiong Jin, Yonghao He, Wei Sui

    Abstract: Imitation learning, particularly Diffusion Policy-based methods, has recently gained significant traction in embodied AI as a powerful approach to action policy generation. These models efficiently generate action policies by learning to predict noise. However, conventional Diffusion Policy methods rely on iterative denoising, leading to inefficient inference and slow response times, which hinde…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR 2025 Workshop on 2nd MEIS

  3. arXiv:2503.14247 [pdf]

    cs.RO cs.AI

    GeoFlow-SLAM: A Robust Tightly-Coupled RGBD-Inertial Fusion SLAM for Dynamic Legged Robotics

    Authors: Tingyang Xiao, Xiaolin Zhou, Liu Liu, Wei Sui, Wei Feng, Jiaxiong Qiu, Xinjie Wang, Zhizhong Su

    Abstract: This paper presents GeoFlow-SLAM, a robust and effective tightly-coupled RGBD-inertial SLAM for legged robots operating in highly dynamic environments. By integrating geometric consistency, legged odometry constraints, and dual-stream optical flow (GeoFlow), our method addresses three critical challenges: feature matching and pose initialization failures during fast locomotion and visual feature sca…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: 8 pages

  4. arXiv:2502.14616 [pdf, other]

    cs.CV

    Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion

    Authors: Jiangyuan Liu, Hongxuan Ma, Yuxin Guo, Yuhao Zhao, Chi Zhang, Wei Sui, Wei Zou

    Abstract: Transparent object perception is indispensable for numerous robotic tasks. However, accurately segmenting and estimating the depth of transparent objects remain challenging due to complex optical properties. Existing methods primarily delve into only one task using extra inputs or specialized sensors, neglecting the valuable interactions among tasks and the subsequent refinement process, leading t…

    Submitted 3 March, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: Accepted by ICRA 2025. The code is available at: https://github.com/L-J-Yuan/MODEST

  5. arXiv:2412.14680 [pdf, other]

    cs.CV cs.AI cs.RO

    A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space

    Authors: Yonghao He, Hu Su, Haiyong Yu, Cong Yang, Wei Sui, Cong Wang, Song Liu

    Abstract: Open-set object detection (OSOD) is highly desirable for robotic manipulation in unstructured environments. However, existing OSOD methods often fail to meet the requirements of robotic applications due to their high computational burden and complex deployment. To address this issue, this paper proposes a light-weight framework called Decoupled OSOD (DOSOD), which is a practical and highly efficie…

    Submitted 25 December, 2024; v1 submitted 19 December, 2024; originally announced December 2024.

  6. arXiv:2411.07762 [pdf, other]

    cs.LG cs.AI

    ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization

    Authors: Weibo Zhao, Yubin Shi, Xinyu Lyu, Wanchen Sui, Shen Li, Yong Li

    Abstract: Quantization stands as a pivotal technique for large language model (LLM) serving, yet it poses significant challenges, particularly in achieving effective low-bit quantization. The limited numerical mapping makes the quantized model produce a non-trivial error, causing intolerable performance degradation. This paper is anchored in the basic idea of model compression objectives, and delves into…

    Submitted 11 December, 2024; v1 submitted 12 November, 2024; originally announced November 2024.

    Comments: Accepted at AAAI 2025

  7. arXiv:2407.21331 [pdf, other]

    cs.CV

    CAMAv2: A Vision-Centric Approach for Static Map Element Annotation

    Authors: Shiyuan Chen, Jiaxin Zhang, Ruohong Mei, Yingfeng Cai, Haoran Yin, Tao Chen, Wei Sui, Cong Yang

    Abstract: The recent development of online static map element (a.k.a. HD map) construction algorithms has raised a vast demand for data with ground truth annotations. However, available public datasets currently cannot provide high-quality training data regarding consistency and accuracy. For instance, the manually labelled (low efficiency) nuScenes still contains misalignment and inconsistency between the HD…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2309.11754

  8. arXiv:2405.13571 [pdf, other]

    cs.CV

    Incomplete Multimodal Industrial Anomaly Detection via Cross-Modal Distillation

    Authors: Wenbo Sui, Daniel Lichau, Josselin Lefèvre, Harold Phelippeau

    Abstract: Recent studies of multimodal industrial anomaly detection (IAD) based on 3D point clouds and RGB images have highlighted the importance of exploiting the redundancy and complementarity among modalities for accurate classification and segmentation. However, achieving multimodal IAD in practical production lines remains a work in progress. It is essential to consider the trade-offs between the costs…

    Submitted 23 September, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  9. arXiv:2403.15026 [pdf, other]

    cs.CV

    VRSO: Visual-Centric Reconstruction for Static Object Annotation

    Authors: Chenyao Yu, Yingfeng Cai, Jiaxin Zhang, Hui Kong, Wei Sui, Cong Yang

    Abstract: As a part of the perception results of intelligent driving systems, static object detection (SOD) in 3D space provides crucial cues for driving environment understanding. With the rapid deployment of deep neural networks for SOD tasks, the demand for high-quality training samples soars. The traditional and reliable approach is manual labelling over the dense LiDAR point clouds and reference images.…

    Submitted 29 August, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted at 2024 IEEE International Conference on Intelligent Robots and Systems (IROS)

  10. arXiv:2402.06854 [pdf, other]

    cs.CV cs.GR cs.LG

    Gyroscope-Assisted Motion Deblurring Network

    Authors: Simin Luan, Cong Yang, Zeyd Boukhers, Xue Qin, Dongfeng Cheng, Wei Sui, Zhijun Li

    Abstract: Deblurring networks have attracted substantial attention in image research in recent years. Yet, their practical usage in real-world deblurring, especially of motion blur, remains limited due to the lack of pixel-aligned training triplets (background, blurred image, and blur heat map) and the restricted information inherent in blurred images. This paper presents a simple yet efficient framework to synthesize a…

    Submitted 9 February, 2024; originally announced February 2024.

  11. arXiv:2309.11754 [pdf, other]

    cs.CV

    A Vision-Centric Approach for Static Map Element Annotation

    Authors: Jiaxin Zhang, Shiyuan Chen, Haoran Yin, Ruohong Mei, Xuan Liu, Cong Yang, Qian Zhang, Wei Sui

    Abstract: The recent development of online static map element (a.k.a. HD Map) construction algorithms has raised a vast demand for data with ground truth annotations. However, available public datasets currently cannot provide high-quality training data regarding consistency and accuracy. To this end, we present CAMA: a vision-centric approach for Consistent and Accurate Map Annotation. Without LiDAR inputs…

    Submitted 16 February, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted at 2024 IEEE International Conference on Robotics and Automation (ICRA)

  12. arXiv:2306.11368 [pdf, other]

    cs.CV

    RoMe: Towards Large Scale Road Surface Reconstruction via Mesh Representation

    Authors: Ruohong Mei, Wei Sui, Jiaxin Zhang, Xue Qin, Gang Wang, Tao Peng, Cong Yang

    Abstract: In autonomous driving applications, accurate and efficient road surface reconstruction is paramount. This paper introduces RoMe, a novel framework designed for the robust reconstruction of large-scale road surfaces. Leveraging a unique mesh representation, RoMe ensures that the reconstructed road surfaces are accurate and seamlessly aligned with semantics. To address challenges in computational ef…

    Submitted 21 June, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Published in: IEEE Transactions on Intelligent Vehicles

  13. arXiv:2306.07467 [pdf, other]

    cs.IT

    ELF Codes: Concatenated Codes with an Expurgating Linear Function as the Outer Code

    Authors: Richard Wesel, Amaael Antonini, Linfang Wang, Wenhui Sui, Brendan Towell, Holden Grissett

    Abstract: An expurgating linear function (ELF) is a linear outer code that disallows the low-weight codewords of the inner code. ELFs can be designed either to maximize the minimum distance or to minimize the codeword error rate (CER) of the expurgated code. A list-decoding sieve of the inner code starting from the noiseless all-zeros codeword is an efficient way to identify ELFs that maximize the minimum d…

    Submitted 1 August, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: 6 arXiv pages (the actual ISTC paper is 5 pages with more compressed spacing), 6 figures, accepted to the 2023 International Symposium on Topics in Coding. The latest version is the camera-ready version for ISTC, edited for clarity and to reflect reviewer suggestions, with added references.

  14. arXiv:2304.09807 [pdf, other]

    cs.CV

    VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene

    Authors: Shaoyu Chen, Yunchi Zhang, Bencheng Liao, Jiafeng Xie, Tianheng Cheng, Wei Sui, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang

    Abstract: High-definition (HD) maps serve as essential infrastructure for autonomous driving. In this work, we build a systematic vectorized map annotation framework (termed VMA) for efficiently generating HD maps of large-scale driving scenes. We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geo…

    Submitted 27 August, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

    Comments: https://github.com/hustvl/VMA

  15. Towards Accurate Ground Plane Normal Estimation from Ego-Motion

    Authors: Jiaxin Zhang, Wei Sui, Qian Zhang, Tao Chen, Cong Yang

    Abstract: In this paper, we introduce a novel approach for ground plane normal estimation of wheeled vehicles. In practice, the ground plane changes dynamically due to braking and unstable road surfaces. As a result, the vehicle pose, especially the pitch angle, oscillates, ranging from subtle to obvious. Thus, estimating the ground plane normal is meaningful, since it can be encoded to improve the robustness of va…

    Submitted 8 December, 2022; originally announced December 2022.

    Journal ref: Sensors 2022, 22(23), 9375.

  16. arXiv:2212.04064 [pdf, ps, other]

    cs.IT

    CRC-Aided High-Rate Convolutional Codes With Short Blocklengths for List Decoding

    Authors: Wenhui Sui, Brendan Towell, Ava Asmani, Hengjie Yang, Holden Grissett, Richard D. Wesel

    Abstract: Recently, rate-1/n zero-terminated (ZT) and tail-biting (TB) convolutional codes (CCs) with cyclic redundancy check (CRC)-aided list decoding have been shown to closely approach the random-coding union (RCU) bound for short blocklengths. This paper designs CRC polynomials for rate-(n-1)/n ZT and TB CCs with short blocklengths. This paper considers both standard rate-(n-1)/n CC polynomials and rat…

    Submitted 9 October, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2111.07929

  17. arXiv:2112.08635 [pdf, other]

    cs.CV

    Road-aware Monocular Structure from Motion and Homography Estimation

    Authors: Wei Sui, Teng Chen, Jiaxin Zhang, Jiao Lu, Qian Zhang

    Abstract: Structure from motion (SFM) and ground plane homography estimation are critical to autonomous driving and other robotics applications. Recently, much progress has been made in using deep neural networks for SFM and homography estimation respectively. However, directly applying existing methods for ground plane homography estimation may fail because the road is often a small part of the scene. Besi…

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: 10 pages

  18. Monocular Road Planar Parallax Estimation

    Authors: Haobo Yuan, Teng Chen, Wei Sui, Jiafeng Xie, Lefei Zhang, Yuan Li, Qian Zhang

    Abstract: Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving. It is commonly solved either by using 3D sensors such as LiDAR or directly predicting the depth of points via deep learning. However, the former is expensive, and the latter lacks the use of geometry information for the scene. In this paper, instead of following ex…

    Submitted 9 July, 2023; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: Accepted by IEEE TIP

  19. arXiv:2111.07929 [pdf, other]

    cs.IT

    High-Rate Convolutional Codes with CRC-Aided List Decoding for Short Blocklengths

    Authors: Wenhui Sui, Hengjie Yang, Brendan Towell, Ava Asmani, Richard D. Wesel

    Abstract: Recently, rate-$1/ω$ zero-terminated and tail-biting convolutional codes (ZTCCs and TBCCs) with cyclic-redundancy-check (CRC)-aided list decoding have been shown to closely approach the random-coding union (RCU) bound for short blocklengths. This paper designs CRCs for rate-$(ω-1)/ω$ CCs with short blocklengths, considering both the ZT and TB cases. The CRC design seeks to optimize the frame error…

    Submitted 7 June, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 6 pages; submitted to 2022 IEEE International Conference on Communications (ICC 2022)

  20. Deep Online Correction for Monocular Visual Odometry

    Authors: Jiaxin Zhang, Wei Sui, Xinggang Wang, Wenming Meng, Hongmei Zhu, Qian Zhang

    Abstract: In this work, we propose a novel deep online correction (DOC) framework for monocular visual odometry. The whole pipeline has two stages: First, depth maps and initial poses are obtained from convolutional neural networks (CNNs) trained in a self-supervised manner. Second, the poses predicted by CNNs are further improved by minimizing photometric errors via gradient updates of poses during inferenc…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: Accepted at 2021 IEEE International Conference on Robotics and Automation (ICRA)

  21. arXiv:1811.08611 [pdf, other]

    cs.CV cs.LG

    A Novel Integrated Framework for Learning both Text Detection and Recognition

    Authors: Wanchen Sui, Qing Zhang, Jun Yang, Wei Chu

    Abstract: In this paper, we propose a novel integrated framework for learning both text detection and recognition. For most of the existing methods, detection and recognition are treated as two isolated tasks and trained separately, since parameters of detection and recognition models are different and two models target to optimize their own loss functions during individual training processes. In contrast t…

    Submitted 21 November, 2018; originally announced November 2018.

  22. arXiv:1602.04502 [pdf, other]

    cs.CV

    Do We Need Binary Features for 3D Reconstruction?

    Authors: Bin Fan, Qingqun Kong, Wei Sui, Zhiheng Wang, Xinchao Wang, Shiming Xiang, Chunhong Pan, Pascal Fua

    Abstract: Binary features have become increasingly popular in the past few years due to their low memory footprints and the efficient computation of Hamming distance between binary descriptors. They have shown promising results in some real-time applications, e.g., SLAM, where matching operations are relatively few. However, in computer vision, there are many applications such as 3D reconstructio…

    Submitted 14 February, 2016; originally announced February 2016.