Showing 1–9 of 9 results for author: Bué, P

Searching in archive cs.
  1. arXiv:2506.20332 [pdf, ps, other]

    cs.AI

    Mobile-R1: Towards Interactive Reinforcement Learning for VLM-Based Mobile Agent via Task-Level Rewards

    Authors: Jihao Gu, Qihang Ai, Yingyao Wang, Pi Bu, Jingxuan Xing, Zekun Zhu, Wei Jiang, Ziming Wang, Yingxiu Zhao, Ming-Liang Zhang, Jun Song, Yuning Jiang, Bo Zheng

    Abstract: Vision-language model-based mobile agents have gained the ability to not only understand complex instructions and mobile screenshots, but also optimize their action outputs via thinking and reasoning, benefiting from reinforcement learning, such as Group Relative Policy Optimization (GRPO). However, existing research centers on offline reinforcement learning training or online optimization using a…

    Submitted 27 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: 14 pages, 12 figures

  2. arXiv:2503.09527 [pdf, other]

    cs.CV cs.AI

    CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games

    Authors: Peng Chen, Pi Bu, Yingyao Wang, Xinyi Wang, Ziming Wang, Jie Guo, Yingxiu Zhao, Qi Zhu, Jun Song, Siran Yang, Jiamang Wang, Bo Zheng

    Abstract: Recent advances in Vision-Language-Action models (VLAs) have expanded the capabilities of embodied intelligence. However, significant challenges remain in real-time decision-making in complex 3D environments, which demand second-level responses, high-resolution perception, and tactical reasoning under dynamic conditions. To advance the field, we introduce CombatVLA, an efficient VLA model optimize…

    Submitted 12 March, 2025; originally announced March 2025.

  3. arXiv:2502.11718 [pdf, ps, other]

    cs.CL cs.CV

    "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models

    Authors: Jihao Gu, Yingyao Wang, Pi Bu, Chen Wang, Ziming Wang, Tengtao Song, Donglai Wei, Jiale Yuan, Yingxiu Zhao, Yancheng He, Shilong Li, Jiaheng Liu, Meng Cao, Jun Song, Yingshui Tan, Xiang Li, Wenbo Su, Zhicheng Zheng, Xiaoyong Zhu, Bo Zheng

    Abstract: The evaluation of factual accuracy in large vision language models (LVLMs) has lagged behind their rapid development, making it challenging to fully reflect these models' knowledge capacity and reliability. In this paper, we introduce the first factuality-based visual question-answering benchmark in Chinese, named ChineseSimpleVQA, aimed at assessing the visual factuality of LVLMs across 8 major t…

    Submitted 30 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: 26 pages, 21 figures

  4. arXiv:2412.18424 [pdf, other]

    cs.AI cs.CL

    LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating

    Authors: Chao Deng, Jiale Yuan, Pi Bu, Peijie Wang, Zhong-Zhi Li, Jian Xu, Xiao-Hui Li, Yuan Gao, Jun Song, Bo Zheng, Cheng-Lin Liu

    Abstract: Large vision language models (LVLMs) have improved the document understanding capabilities remarkably, enabling the handling of complex document elements, longer contexts, and a wider range of tasks. However, existing document understanding benchmarks have been limited to handling only a small number of pages and fail to provide a comprehensive analysis of layout elements locating. In this paper,…

    Submitted 27 December, 2024; v1 submitted 24 December, 2024; originally announced December 2024.

  5. arXiv:2412.14487 [pdf, other]

    cs.CV

    Token Preference Optimization with Self-Calibrated Visual-Anchored Rewards for Hallucination Mitigation

    Authors: Jihao Gu, Yingyao Wang, Meng Cao, Pi Bu, Jun Song, Yancheng He, Shilong Li, Bo Zheng

    Abstract: Direct Preference Optimization (DPO) has been demonstrated to be highly effective in mitigating hallucinations in Large Vision Language Models (LVLMs) by aligning their outputs more closely with human preferences. Despite the recent progress, existing methods suffer from two drawbacks: 1) Lack of scalable token-level rewards; and 2) Neglect of visual-anchored tokens. To this end, we propose a nove…

    Submitted 23 February, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

  6. arXiv:2409.12889 [pdf, other]

    cs.AI

    Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case

    Authors: Peng Chen, Pi Bu, Jun Song, Yuan Gao, Bo Zheng

    Abstract: Recently, large language model (LLM)-based agents have made significant advances across various fields. One of the most popular research areas involves applying these agents to video games. Traditionally, these methods have relied on game APIs to access in-game environmental and action data. However, this approach is limited by the availability of APIs and does not reflect how humans play games. W…

    Submitted 22 September, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

  7. arXiv:2112.13047 [pdf, other]

    cs.CV cs.AI cs.LG

    Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

    Authors: Jiaxing Yan, Hong Zhao, Penghui Bu, YuSheng Jin

    Abstract: Self-supervised learning has shown very promising results for monocular depth estimation. Scene structure and local details both are significant clues for high-quality depth estimation. Recent works suffer from the lack of explicit modeling of scene structure and proper handling of details information, which leads to a performance bottleneck and blurry artefacts in predicted results. In this paper…

    Submitted 24 December, 2021; originally announced December 2021.

  8. arXiv:1903.10601 [pdf, other]

    cs.CV cs.LG

    Unifying Unsupervised Domain Adaptation and Zero-Shot Visual Recognition

    Authors: Qian Wang, Penghui Bu, Toby P. Breckon

    Abstract: Unsupervised domain adaptation aims to transfer knowledge from a source domain to a target domain so that the target domain data can be recognized without any explicit labelling information for this domain. One limitation of the problem setting is that testing data, despite having no labels, from the target domain is needed during training, which prevents the trained model being directly applied t…

    Submitted 26 August, 2019; v1 submitted 25 March, 2019; originally announced March 2019.

    Comments: International Joint Conference on Neural Networks 2019, Budapest

  9. arXiv:1004.1262 [pdf, ps, other]

    cs.LO cs.SC cs.SE

    Syntactic Abstraction of B Models to Generate Tests

    Authors: Jacques Julliand, Nicolas Stouls, Pierre-Christophe Bué, Pierre-Alain Masson

    Abstract: In a model-based testing approach as well as for the verification of properties, B models provide an interesting solution. However, for industrial applications, the size of their state space often makes them hard to handle. To reduce the amount of states, an abstraction function can be used, often combining state variable elimination and domain abstractions of the remaining variables. This paper c…

    Submitted 31 May, 2010; v1 submitted 8 April, 2010; originally announced April 2010.

    Comments: Tests and Proofs 2010, Malaga : Spain (2010)