Showing 1–19 of 19 results for author: Bu, Q

  1. arXiv:2505.06111 [pdf, ps, other]

    cs.RO cs.AI cs.LG

    UniVLA: Learning to Act Anywhere with Task-centric Latent Actions

    Authors: Qingwen Bu, Yanting Yang, Jisong Cai, Shenyuan Gao, Guanghui Ren, Maoqing Yao, Ping Luo, Hongyang Li

    Abstract: A generalist robot should perform effectively across various environments. However, most existing approaches heavily rely on scaling action-annotated data to enhance their capabilities. Consequently, they are often limited to single physical specification and struggle to learn transferable knowledge across different embodiments and environments. To confront these limitations, we propose UniVLA, a…

    Submitted 15 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted to RSS 2025. Code is available at https://github.com/OpenDriveLab/UniVLA

  2. arXiv:2503.06669 [pdf, other]

    cs.RO cs.CV cs.LG

    AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

    Authors: AgiBot-World-Contributors, Qingwen Bu, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao, Xindong He, Xuan Hu, Xu Huang, Shu Jiang, Yuxin Jiang, Cheng Jing, Hongyang Li, Jialu Li, Chiming Liu, Yi Liu, Yuxiang Lu, Jianlan Luo, Ping Luo, Yao Mu, Yuehan Niu, Yixuan Pan, Jiangmiao Pang, et al. (27 additional authors not shown)

    Abstract: We explore how scalable robot data can address real-world challenges for generalized robotic manipulation. Introducing AgiBot World, a large-scale platform comprising over 1 million trajectories across 217 tasks in five deployment scenarios, we achieve an order-of-magnitude increase in data scale compared to existing datasets. Accelerated by a standardized collection pipeline with human-in-the-loo…

    Submitted 30 April, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

    Comments: Project website: https://agibot-world.com/. Github repo: https://github.com/OpenDriveLab/AgiBot-World. The author list is ordered alphabetically by surname, with detailed contributions provided in the appendix

  3. arXiv:2412.18738 [pdf, other]

    cs.CV

    HELPNet: Hierarchical Perturbations Consistency and Entropy-guided Ensemble for Scribble Supervised Medical Image Segmentation

    Authors: Xiao Zhang, Shaoxuan Wu, Peilin Zhang, Zhuo Jin, Xiaosong Xiong, Qirong Bu, Jingkun Chen, Jun Feng

    Abstract: Creating fully annotated labels for medical image segmentation is prohibitively time-intensive and costly, emphasizing the necessity for innovative approaches that minimize reliance on detailed annotations. Scribble annotations offer a more cost-effective alternative, significantly reducing the expenses associated with full annotations. However, scribble annotations offer limited and imprecise inf…

    Submitted 24 December, 2024; originally announced December 2024.

  4. arXiv:2410.08001 [pdf, other]

    cs.RO cs.AI

    Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation

    Authors: Qingwen Bu, Hongyang Li, Li Chen, Jisong Cai, Jia Zeng, Heming Cui, Maoqing Yao, Yu Qiao

    Abstract: The increasing demand for versatile robotic systems to operate in diverse and dynamic environments has emphasized the importance of a generalist policy, which leverages a large cross-embodiment data corpus to facilitate broad adaptability and high-level reasoning. However, the generalist would struggle with inefficient inference and cost-expensive training. The specialist policy, instead, is curat…

    Submitted 6 February, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

    Comments: Project page: https://opendrivelab.com/RoboDual/

  5. arXiv:2409.09016 [pdf, other]

    cs.RO

    Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation

    Authors: Qingwen Bu, Jia Zeng, Li Chen, Yanchao Yang, Guyue Zhou, Junchi Yan, Ping Luo, Heming Cui, Yi Ma, Hongyang Li

    Abstract: Despite significant progress in robotics and embodied AI in recent years, deploying robots for long-horizon tasks remains a great challenge. Majority of prior arts adhere to an open-loop philosophy and lack real-time feedback, leading to error accumulation and undesirable robustness. A handful of approaches have endeavored to establish feedback mechanisms leveraging pixel-level differences or pre-…

    Submitted 16 October, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted at NeurIPS 2024. Code and models: https://github.com/OpenDriveLab/CLOVER

  6. arXiv:2406.00439 [pdf, other]

    cs.RO cs.CV

    Learning Manipulation by Predicting Interaction

    Authors: Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li

    Abstract: Representation learning approaches for robotic manipulation have boomed in recent years. Due to the scarcity of in-domain robot data, prevailing methodologies tend to leverage large-scale human video datasets to extract generalizable features for visuomotor policy learning. Despite the progress achieved, prior endeavors disregard the interactive dynamics that capture behavior patterns and physical…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted to RSS 2024. Project page: https://github.com/OpenDriveLab/MPI

  7. arXiv:2403.04593 [pdf, other]

    cs.CV

    Embodied Understanding of Driving Scenarios

    Authors: Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li

    Abstract: Embodied scene understanding serves as the cornerstone for autonomous agents to perceive, interpret, and respond to open driving scenarios. Such understanding is typically founded upon Vision-Language Models (VLMs). Nevertheless, existing VLMs are restricted to the 2D domain, devoid of spatial awareness and long-horizon extrapolation proficiencies. We revisit the key aspects of autonomous driving…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 43 pages, 16 figures

  8. arXiv:2312.13010 [pdf, other]

    cs.CL

    AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

    Authors: Dong Huang, Jie M. Zhang, Michael Luck, Qingwen Bu, Yuhao Qing, Heming Cui

    Abstract: The advancement of natural language processing (NLP) has been significantly boosted by the development of transformer-based large language models (LLMs). These models have revolutionized NLP tasks, particularly in code generation, aiding developers in creating software with enhanced efficiency. Despite their advancements, challenges in balancing code snippet generation with effective test case gen…

    Submitted 24 May, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: 24 pages, 12 figures

  9. arXiv:2309.14345 [pdf, other]

    cs.SE cs.AI

    Bias Testing and Mitigation in LLM-based Code Generation

    Authors: Dong Huang, Jie M. Zhang, Qingwen Bu, Xiaofei Xie, Junjie Chen, Heming Cui

    Abstract: As the adoption of LLMs becomes more widespread in software coding ecosystems, a pressing issue has emerged: does the generated code contain social bias and unfairness, such as those related to age, gender, and race? This issue concerns the integrity, fairness, and ethical foundation of software applications that depend on the code generated by these models but are underexplored in the literature.…

    Submitted 21 March, 2025; v1 submitted 3 September, 2023; originally announced September 2023.

    Comments: Accepted by TOSEM

  10. arXiv:2308.10531 [pdf, other]

    cs.CV

    SRFormer: Text Detection Transformer with Incorporated Segmentation and Regression

    Authors: Qingwen Bu, Sungrae Park, Minsoo Khang, Yichuan Cheng

    Abstract: Existing techniques for text detection can be broadly classified into two primary groups: segmentation-based and regression-based methods. Segmentation models offer enhanced robustness to font variations but require intricate post-processing, leading to high computational overhead. Regression-based methods undertake instance-aware prediction but face limitations in robustness and data efficiency d…

    Submitted 24 December, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: Title changed. Accepted to AAAI'24

  11. arXiv:2308.08784 [pdf, other]

    cs.SE cs.AI

    CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation

    Authors: Dong Huang, Qingwen Bu, Yuhao Qing, Heming Cui

    Abstract: Chain-of-thought (CoT) has emerged as a groundbreaking tool in NLP, notably for its efficacy in complex reasoning tasks, such as mathematical proofs. However, its application in code generation faces a distinct challenge, i.e., although the code generated with CoT reasoning is logically correct, it faces the problem of syntax error (e.g., invalid syntax error report) during code execution, which c…

    Submitted 22 February, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Title changed

  12. arXiv:2307.11565 [pdf, other]

    cs.LG cs.SE

    Adversarial Feature Map Pruning for Backdoor

    Authors: Dong Huang, Qingwen Bu

    Abstract: Deep neural networks have been widely used in many critical applications, such as autonomous vehicles and medical diagnosis. However, their security is threatened by backdoor attacks, which are achieved by adding artificial patterns to specific training data. Existing defense strategies primarily focus on using reverse engineering to reproduce the backdoor trigger generated by attackers and subseq…

    Submitted 23 February, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: Accepted to ICLR 2024

  13. arXiv:2307.11563 [pdf, other]

    cs.SE cs.AI

    Feature Map Testing for Deep Neural Networks

    Authors: Dong Huang, Qingwen Bu, Yahao Qing, Yichao Fu, Heming Cui

    Abstract: Due to the widespread application of deep neural networks (DNNs) in safety-critical tasks, deep learning testing has drawn increasing attention. During the testing process, test cases that have been fuzzed or selected using test metrics are fed into the model to find fault-inducing test units (e.g., neurons and feature maps, activating which will almost certainly result in a model error) and repor…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: 12 pages, 5 figures. arXiv admin note: text overlap with arXiv:2307.11011

  14. arXiv:2307.11011 [pdf, other]

    cs.LG cs.SE

    Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing

    Authors: Dong Huang, Qingwen Bu, Yichao Fu, Yuhao Qing, Bocheng Xiao, Heming Cui

    Abstract: Deep Neural Networks (DNNs) have been widely deployed in software to address various tasks (e.g., autonomous driving, medical diagnosis). However, they could also produce incorrect behaviors that result in financial losses and even threaten human safety. To reveal the incorrect behaviors in DNN and repair them, DNN developers often collect rich unlabeled datasets from the natural world and label t…

    Submitted 20 July, 2023; originally announced July 2023.

  15. arXiv:2307.09763 [pdf, other]

    cs.CV cs.AI

    Towards Building More Robust Models with Frequency Bias

    Authors: Qingwen Bu, Dong Huang, Heming Cui

    Abstract: The vulnerability of deep neural networks to adversarial samples has been a major impediment to their broad applications, despite their success in various fields. Recently, some works suggested that adversarially-trained models emphasize the importance of low-frequency information to achieve higher robustness. While several attempts have been made to leverage this frequency characteristic, they ha…

    Submitted 27 July, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV23

  16. arXiv:2208.08083 [pdf, other]

    cs.CV

    Two Heads are Better than One: Robust Learning Meets Multi-branch Models

    Authors: Dong Huang, Qingwen Bu, Yuhao Qing, Haowen Pi, Sen Wang, Heming Cui

    Abstract: Deep neural networks (DNNs) are vulnerable to adversarial examples, in which DNNs are misled to false outputs due to inputs containing imperceptible perturbations. Adversarial training, a reliable and effective method of defense, may significantly reduce the vulnerability of neural networks and becomes the de facto standard for robust learning. While many recent works practice the data-centric phi…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 10 pages, 5 Figures

  17. arXiv:2105.13987 [pdf, other]

    eess.SP cs.LG

    ScalingNet: extracting features from raw EEG data for emotion recognition

    Authors: Jingzhao Hu, Chen Wang, Qiaomei Jia, Qirong Bu, Jun Feng

    Abstract: Convolutional Neural Networks (CNNs) has achieved remarkable performance breakthrough in a variety of tasks. Recently, CNNs based methods that are fed with hand-extracted EEG features gradually produce a powerful performance on the EEG data based emotion recognition task. In this paper, we propose a novel convolutional layer allowing to adaptively extract effective data-driven spectrogram-like feat…

    Submitted 7 February, 2021; originally announced May 2021.

  18. arXiv:2102.02588 [pdf, other]

    cs.LG

    Lookup subnet based Spatial Graph Convolutional neural Network

    Authors: Jingzhao Hu, Xiaoqi Zhang, Qiaomei Jia, Chen Wang, Qirong Bu, Jun Feng

    Abstract: Convolutional Neural Networks (CNNs) has achieved remarkable performance breakthrough in Euclidean structure data. Recently, aggregation-transformation based Graph Neural networks (GNNs) gradually produce a powerful performance on non-Euclidean data. In this paper, we propose a cross-correlation based graph convolution method allowing to naturally generalize CNNs to non-Euclidean domains and inherit…

    Submitted 4 February, 2021; originally announced February 2021.

  19. arXiv:2002.03152 [pdf, other]

    cs.CV

    CTM: Collaborative Temporal Modeling for Action Recognition

    Authors: Qian Liu, Tao Wang, Jie Liu, Yang Guan, Qi Bu, Longfei Yang

    Abstract: With the rapid development of digital multimedia, video understanding has become an important field. For action recognition, temporal dimension plays an important role, and this is quite different from image recognition. In order to learn powerful feature of videos, we propose a Collaborative Temporal Modeling (CTM) block (Figure 1) to learn temporal information for action recognition. Besides a p…

    Submitted 8 February, 2020; originally announced February 2020.