Showing 1–50 of 810 results for author: Xu, B

Searching in archive cs.
  1. arXiv:2505.08744  [pdf, other]

    cs.AI

    DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models

    Authors: Xiaoyang Chen, Xinan Dai, Yu Du, Qian Feng, Naixu Guo, Tingshuo Gu, Yuting Gao, Yingyi Gao, Xudong Han, Xiang Jiang, Yilin Jin, Hongyi Lin, Shisheng Lin, Xiangnan Li, Yuante Li, Yixing Li, Zhentao Lai, Zilu Ma, Yingrong Peng, Jiacheng Qian, Hao-Yu Sun, Jianbo Sun, Zirui Wang, Siwei Wu, Zian Wang, et al. (6 additional authors not shown)

    Abstract: To advance the mathematical proficiency of large language models (LLMs), the DeepMath team has launched an open-source initiative aimed at developing an open mathematical LLM and systematically evaluating its mathematical creativity. This paper represents the initial contribution of this initiative. While recent developments in mathematical LLMs have predominantly emphasized reasoning skills, as e…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 14 pages, 4 figures

  2. arXiv:2505.08459  [pdf, ps, other]

    cs.AI

    Strategy-Augmented Planning for Large Language Models via Opponent Exploitation

    Authors: Shuai Xu, Sijia Cui, Yanna Wang, Bo Xu, Qi Wang

    Abstract: Efficiently modeling and exploiting opponents is a long-standing challenge in adversarial domains. Large Language Models (LLMs) trained on extensive textual data have recently demonstrated outstanding performance in general tasks, introducing new research directions for opponent modeling. Some studies primarily focus on directly using LLMs to generate decisions based on the elaborate prompt contex…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted to IJCNN 2025

  3. arXiv:2505.08402  [pdf, ps, other]

    cs.CL

    TUMS: Enhancing Tool-use Abilities of LLMs with Multi-structure Handlers

    Authors: Aiyao He, Sijia Cui, Shuai Xu, Yanna Wang, Bo Xu

    Abstract: Recently, large language models (LLMs) have played an increasingly important role in solving a wide range of NLP tasks, leveraging their capabilities of natural language understanding and generation. Integration with external tools further enhances LLMs' effectiveness, providing more precise, timely, and specialized responses. However, LLMs still encounter difficulties with non-executable actions a…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted to ICONIP 2024

  4. arXiv:2505.07313  [pdf, ps, other]

    cs.CL cs.AI

    Towards Multi-Agent Reasoning Systems for Collaborative Expertise Delegation: An Exploratory Design Study

    Authors: Baixuan Xu, Chunyang Li, Weiqi Wang, Wei Fan, Tianshi Zheng, Haochen Shi, Tao Fan, Yangqiu Song, Qiang Yang

    Abstract: Designing effective collaboration structures for multi-agent LLM systems to enhance collective reasoning is crucial yet remains under-explored. In this paper, we systematically investigate how collaborative reasoning performance is affected by three key design dimensions: (1) Expertise-Domain Alignment, (2) Collaboration Paradigm (structured workflow vs. diversity-driven integration), and (3) Syste…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 18 pages

  5. arXiv:2505.07180  [pdf, ps, other]

    cs.LG stat.ML

    Causal View of Time Series Imputation: Some Identification Results on Missing Mechanism

    Authors: Ruichu Cai, Kaitao Zheng, Junxian Huang, Zijian Li, Zhengming Chen, Boyan Xu, Zhifeng Hao

    Abstract: Time series imputation is one of the most challenging problems and has broad applications in various fields like health care and the Internet of Things. Existing methods mainly aim to model the temporally latent dependencies and the generation process from the observed time series data. In real-world scenarios, different types of missing mechanisms, like MAR (Missing At Random) and MNAR (Missing No…

    Submitted 11 May, 2025; originally announced May 2025.

  6. arXiv:2505.06883  [pdf, other]

    cs.RO cs.AI cs.LG

    FACET: Force-Adaptive Control via Impedance Reference Tracking for Legged Robots

    Authors: Botian Xu, Haoyang Weng, Qingzhou Lu, Yang Gao, Huazhe Xu

    Abstract: Reinforcement learning (RL) has made significant strides in legged robot control, enabling locomotion across diverse terrains and complex loco-manipulation capabilities. However, the commonly used position or velocity tracking-based objectives are agnostic to forces experienced by the robot, leading to stiff and potentially dangerous behaviors and poor control during forceful interactions. To addr…

    Submitted 11 May, 2025; originally announced May 2025.

  7. InfoNCE is a Free Lunch for Semantically guided Graph Contrastive Learning

    Authors: Zixu Wang, Bingbing Xu, Yige Yuan, Huawei Shen, Xueqi Cheng

    Abstract: As an important graph pre-training method, Graph Contrastive Learning (GCL) continues to play a crucial role in the ongoing surge of research on graph foundation models or LLM as enhancer for graphs. Traditional GCL optimizes InfoNCE by using augmentations to define self-supervised tasks, treating augmented pairs as positive samples and others as negative. However, this leads to semantically simil…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures, Accepted by SIGIR2025

  8. arXiv:2505.03281  [pdf, other]

    cs.LG cs.AI

    Physics-inspired Energy Transition Neural Network for Sequence Learning

    Authors: Zhou Wu, Junyi An, Baile Xu, Furao Shen, Jian Zhao

    Abstract: Recently, the superior performance of Transformers has made them a more robust and scalable solution for sequence modeling than traditional recurrent neural networks (RNNs). However, the effectiveness of Transformers in capturing long-term dependencies is primarily attributed to their comprehensive pair-modeling process rather than inherent inductive biases toward sequence semantics. In this study,…

    Submitted 6 May, 2025; originally announced May 2025.

  9. arXiv:2505.03184  [pdf, other]

    cs.CV

    Interactive Instance Annotation with Siamese Networks

    Authors: Xiang Xu, Ruotong Li, Mengjun Yi, Baile XU, Furao Shen, Jian Zhao

    Abstract: Annotating instance masks is time-consuming and labor-intensive. A promising solution is to predict contours using a deep learning model and then allow users to refine them. However, most existing methods focus on in-domain scenarios, limiting their effectiveness for cross-domain annotation tasks. In this paper, we propose SiamAnno, a framework inspired by the use of Siamese networks in object tra…

    Submitted 6 May, 2025; originally announced May 2025.

  10. arXiv:2505.02078  [pdf, ps, other]

    cs.CL cs.AI

    LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning

    Authors: Joy Lim Jia Yin, Daniel Zhang-Li, Jifan Yu, Haoxuan Li, Shangqing Tu, Yuanchun Wang, Zhiyuan Liu, Huiqin Liu, Lei Hou, Juanzi Li, Bin Xu

    Abstract: Evaluating the quality of slide-based multimedia instruction is challenging. Existing methods like manual assessment, reference-based metrics, and large language model evaluators face limitations in scalability, context capture, or bias. In this paper, we introduce LecEval, an automated metric grounded in Mayer's Cognitive Theory of Multimedia Learning, to evaluate multimodal knowledge acquisition…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 6 pages, 3 figures

  11. arXiv:2504.17787  [pdf, other]

    cs.CV

    The Fourth Monocular Depth Estimation Challenge

    Authors: Anton Obukhov, Matteo Poggi, Fabio Tosi, Ripudaman Singh Arora, Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden, Shuaihang Wang, Zhenxin Ma, Weijie Chen, Baobei Xu, Fengyu Sun, Di Xie, Jiang Zhu, Mykola Lavreniuk, Haining Guan, Qun Wu, Yupei Zeng, Chao Lu, Huanran Wang, Guangyuan Zhou, Haotian Zhang, Jianxiong Wang, Qiang Rao, et al. (32 additional authors not shown)

    Abstract: This paper presents the results of the fourth edition of the Monocular Depth Estimation Challenge (MDEC), which focuses on zero-shot generalization to the SYNS-Patches benchmark, a dataset featuring challenging environments in both natural and indoor settings. In this edition, we revised the evaluation protocol to use least-squares alignment with two degrees of freedom to support disparity and aff…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: To appear in CVPRW2025

  12. RGB-D Tracking via Hierarchical Modality Aggregation and Distribution Network

    Authors: Boyue Xu, Yi Xu, Ruichao Hou, Jia Bei, Tongwei Ren, Gangshan Wu

    Abstract: The integration of dual-modal features has been pivotal in advancing RGB-Depth (RGB-D) tracking. However, current trackers are less efficient and focus solely on single-level features, resulting in weaker robustness in fusion and slower speeds that fail to meet the demands of real-world applications. In this paper, we introduce a novel network, denoted as HMAD (Hierarchical Modality Aggregation an…

    Submitted 24 April, 2025; originally announced April 2025.

  13. arXiv:2504.16922  [pdf, other]

    cs.CV cs.AI cs.LG

    Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light

    Authors: Ali Hassani, Fengzhe Zhou, Aditya Kane, Jiannan Huang, Chieh-Yun Chen, Min Shi, Steven Walton, Markus Hoehnerbach, Vijay Thakkar, Michael Isaev, Qinsheng Zhang, Bing Xu, Haicheng Wu, Wen-mei Hwu, Ming-Yu Liu, Humphrey Shi

    Abstract: Many sparse attention mechanisms such as Neighborhood Attention have typically failed to consistently deliver speedup over the self-attention baseline. This is largely due to the level of complexity in attention infrastructure, and the rapid evolution of AI hardware architecture. At the same time, many state-of-the-art foundational models, particularly in computer vision, are heavily bound by atte…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: https://github.com/SHI-Labs/NATTEN/

  14. arXiv:2504.16877  [pdf, other]

    cs.SE

    Context-Enhanced Vulnerability Detection Based on Large Language Model

    Authors: Yixin Yang, Bowen Xu, Xiang Gao, Hailong Sun

    Abstract: Vulnerability detection is a critical aspect of software security. Accurate detection is essential to prevent potential security breaches and protect software systems from malicious attacks. Recently, vulnerability detection methods leveraging deep learning and large language models (LLMs) have garnered increasing attention. However, existing approaches often focus on analyzing individual files or…

    Submitted 23 April, 2025; originally announced April 2025.

  15. arXiv:2504.16665  [pdf, other]

    cs.CV

    A Diff-Attention Aware State Space Fusion Model for Remote Sensing Classification

    Authors: Wenping Ma, Boyou Xue, Mengru Ma, Chuang Chen, Hekai Zhang, Hao Zhu

    Abstract: Multispectral (MS) and panchromatic (PAN) images describe the same land surface, so these images not only have their own advantages, but also share a lot of similar information. To separate this similar information from their respective advantages and reduce feature redundancy in the fusion stage, this paper introduces a diff-attention aware state space fusion model (DAS2F-Model) for mult…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 12 pages, 9 figures

  16. RGB-D Video Object Segmentation via Enhanced Multi-store Feature Memory

    Authors: Boyue Xu, Ruichao Hou, Tongwei Ren, Gangshan Wu

    Abstract: RGB-Depth (RGB-D) Video Object Segmentation (VOS) aims to integrate the fine-grained texture information of RGB with the spatial geometric clues of the depth modality, boosting the performance of segmentation. However, off-the-shelf RGB-D segmentation methods fail to fully explore cross-modal information and suffer from object drift during long-term prediction. In this paper, we propose a novel RG…

    Submitted 23 April, 2025; originally announced April 2025.

  17. arXiv:2504.15784  [pdf, other]

    cs.CL cs.AI

    Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach

    Authors: Ruizhe Li, Chiwei Zhu, Benfeng Xu, Xiaorui Wang, Zhendong Mao

    Abstract: Creative writing is a key capability of Large Language Models (LLMs), with potential applications in literature, storytelling, and various creative domains. However, evaluating the creativity of machine-generated texts remains a significant challenge, as existing methods either rely on costly manual annotations or fail to align closely with human assessments. In this paper, we propose an effective…

    Submitted 22 April, 2025; originally announced April 2025.

  18. arXiv:2504.15270  [pdf, other]

    cs.CV cs.CL

    An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes

    Authors: Ji Qi, Yuan Yao, Yushi Bai, Bin Xu, Juanzi Li, Zhiyuan Liu, Tat-Seng Chua

    Abstract: Large Multimodal Models (LMMs) uniformly perceive video frames, creating computational inefficiency for videos with inherently varying temporal information density. This paper presents Quicksviewer, an LMM with a new perceiving paradigm that partitions a video of nonuniform density into varying cubes using Gumbel Softmax, followed by a unified resampling for each cube to achieve efficient vi…

    Submitted 21 April, 2025; originally announced April 2025.

  19. arXiv:2504.13074  [pdf, other]

    cs.CV

    SkyReels-V2: Infinite-length Film Generative Model

    Authors: Guibin Chen, Dixuan Lin, Jiangping Yang, Chunze Lin, Junchen Zhu, Mingyuan Fan, Hao Zhang, Sheng Chen, Zheng Chen, Chengcheng Ma, Weiming Xiong, Wei Wang, Nuo Pang, Kang Kang, Zhiheng Xu, Yuzhe Jin, Yupeng Liang, Yubing Song, Peng Zhao, Boyuan Xu, Di Qiu, Debang Li, Zhengcong Fei, Yang Li, Yahui Zhou

    Abstract: Recent advances in video generation have been driven by diffusion models and autoregressive frameworks, yet critical challenges persist in harmonizing prompt adherence, visual quality, motion dynamics, and duration: compromises in motion dynamics to enhance temporal visual quality, constrained video duration (5-10 seconds) to prioritize resolution, and inadequate shot-aware generation stemming fro…

    Submitted 21 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: 31 pages, 10 figures

  20. arXiv:2504.10044  [pdf, other]

    cs.CV

    Aligning Anime Video Generation with Human Feedback

    Authors: Bingwen Zhu, Yudong Jiang, Baohan Xu, Siqian Yang, Mingyu Yin, Yidi Wu, Huyang Sun, Zuxuan Wu

    Abstract: Anime video generation faces significant challenges due to the scarcity of anime data and unusual motion patterns, leading to issues such as motion distortion and flickering artifacts, which result in misalignment with human preferences. Existing reward models, designed primarily for real-world videos, fail to capture the unique appearance and consistency requirements of anime. In this work, we pr…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 10 pages, 5 figures, 7 tables

  21. arXiv:2504.09138  [pdf, other]

    cs.IT

    White-Box AI Model: Next Frontier of Wireless Communications

    Authors: Jiayao Yang, Jiayi Zhang, Bokai Xu, Jiakang Zheng, Zhilong Liu, Ziheng Liu, Dusit Niyato, Mérouane Debbah, Zhu Han, Bo Ai

    Abstract: White-box AI (WAI), or the explainable AI (XAI) model, is a novel tool for exposing the reasoning behind decisions and predictions made by AI algorithms, making them more understandable and transparent. It offers a new approach to address key challenges of interpretability and mathematical validation in traditional black-box models. In this paper, WAI-aided wireless communication systems are proposed and…

    Submitted 12 April, 2025; originally announced April 2025.

  22. arXiv:2504.07754  [pdf, other]

    cs.CL

    Efficient Tuning of Large Language Models for Knowledge-Grounded Dialogue Generation

    Authors: Bo Zhang, Hui Ma, Dailin Li, Jian Ding, Jian Wang, Bo Xu, HongFei Lin

    Abstract: Large language models (LLMs) demonstrate remarkable text comprehension and generation capabilities but often lack the ability to utilize up-to-date or domain-specific knowledge not included in their training data. To address this gap, we introduce KEDiT, an efficient method for fine-tuning LLMs for knowledge-grounded dialogue generation. KEDiT operates in two main phases: first, it employs an info…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: Accepted at TACL; pre-MIT Press publication version. Code and data are available at https://github.com/zhangbo-nlp/KEDiT

  23. arXiv:2504.06835  [pdf, other]

    cs.CV

    LVC: A Lightweight Compression Framework for Enhancing VLMs in Long Video Understanding

    Authors: Ziyi Wang, Haoran Wu, Yiming Rong, Deyang Jiang, Yixin Zhang, Yunlong Zhao, Shuang Xu, Bo XU

    Abstract: Long video understanding is a complex task that requires both spatial detail and temporal awareness. While Vision-Language Models (VLMs) obtain frame-level understanding capabilities through multi-frame input, they suffer from information loss due to the sparse sampling strategy. In contrast, Video Large Language Models (Video-LLMs) capture temporal relationships within visual features but are lim…

    Submitted 9 April, 2025; originally announced April 2025.

  24. arXiv:2504.06684  [pdf, other]

    cs.RO cs.MA

    SDHN: Skewness-Driven Hypergraph Networks for Enhanced Localized Multi-Robot Coordination

    Authors: Delin Zhao, Yanbo Shan, Chang Liu, Shenghang Lin, Yingxin Shou, Bin Xu

    Abstract: Multi-Agent Reinforcement Learning is widely used for multi-robot coordination, where simple graphs typically model pairwise interactions. However, such representations fail to capture higher-order collaborations, limiting effectiveness in complex tasks. While hypergraph-based approaches enhance cooperation, existing methods often generate arbitrary hypergraph structures and lack adaptability to e…

    Submitted 9 April, 2025; originally announced April 2025.

  25. arXiv:2504.06310  [pdf]

    cs.GR

    Conformal Slit Mapping Based Spiral Tool Trajectory Planning for Ball-end Milling on Complex Freeform Surfaces

    Authors: Changqing Shen, BingZhou Xu, Xiaojian Zhang, Sijie Yan, Han Ding

    Abstract: This study presents a spiral-based complete coverage strategy for ball-end milling on freeform surfaces, utilizing conformal slit mapping to generate milling trajectories that are more compact, smoother, and evenly distributed when machining 2D cavities with islands. This approach, an upgrade from traditional methods, extends the original algorithm to effectively address 3D perforated surface mill…

    Submitted 12 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

    Comments: The revised manuscript has improved the quality of the figures

  26. arXiv:2504.05295  [pdf, other]

    cs.LG cs.AI math.OC

    Dion: A Communication-Efficient Optimizer for Large Models

    Authors: Kwangjun Ahn, Byron Xu

    Abstract: Training large AI models efficiently requires distributing computation across multiple accelerators, but this often incurs significant communication overhead -- especially during gradient synchronization. We introduce Dion, a communication-efficient optimizer that retains the synchronous semantics of standard distributed training (e.g., DDP, FSDP) while substantially reducing I/O costs. Unlike con…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: technical report; comments welcome!

  27. arXiv:2504.05081  [pdf, other]

    cs.CL

    The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning

    Authors: Tianshi Zheng, Yixiang Chen, Chengxi Li, Chunyang Li, Qing Zong, Haochen Shi, Baixuan Xu, Yangqiu Song, Ginny Y. Wong, Simon See

    Abstract: Chain-of-Thought (CoT) prompting has been widely recognized for its ability to enhance reasoning capabilities in large language models (LLMs) through the generation of explicit explanatory rationales. However, our study reveals a surprising contradiction to this prevailing perspective. Through extensive experiments involving 16 state-of-the-art LLMs and nine diverse pattern-based in-context learni…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: 30 pages, 12 tables, 6 figures

  28. arXiv:2504.02855  [pdf, other]

    eess.SY cs.AI

    Exploration of Multi-Element Collaborative Research and Application for Modern Power System Based on Generative Large Models

    Authors: Lu Cheng, Qixiu Zhang, Beibei Xu, Zhiwei Huang, Cirun Zhang, Yanan Lyu, Fan Zhang

    Abstract: The transition to intelligent, low-carbon power systems necessitates advanced optimization strategies for managing renewable energy integration, energy storage, and carbon emissions. Generative Large Models (GLMs) provide a data-driven approach to enhancing forecasting, scheduling, and market operations by processing multi-source data and capturing complex system dynamics. This paper explores the…

    Submitted 26 March, 2025; originally announced April 2025.

  29. arXiv:2504.00532  [pdf, other]

    cs.SE cs.CL

    SRLCG: Self-Rectified Large-Scale Code Generation with Multidimensional Chain-of-Thought and Dynamic Backtracking

    Authors: Hongru Ma, Yanjie Liang, Jiasheng Si, Weiyu Zhang, Hongjiao Guan, Chaoqun Zheng, Bing Xu, Wenpeng Lu

    Abstract: Large language models (LLMs) have revolutionized code generation, significantly enhancing developer productivity. However, for a vast number of users with minimal coding knowledge, LLMs provide little support, as they primarily generate isolated code snippets rather than complete, large-scale project code. Without coding expertise, these users struggle to interpret, modify, and iteratively refine…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 23 pages

  30. arXiv:2503.24377  [pdf, other]

    cs.CL cs.AI

    Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models

    Authors: Rui Wang, Hongru Wang, Boyang Xue, Jianhui Pang, Shudong Liu, Yi Chen, Jiahao Qiu, Derek Fai Wong, Heng Ji, Kam-Fai Wong

    Abstract: Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to perform complex reasoning tasks, transitioning from fast and intuitive thinking (System 1) to slow and deep reasoning (System 2). While System 2 reasoning improves task accuracy, it often incurs substantial computational costs due to its slow thinking nature and inefficient or unnecessary reasoning beh…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: In Progress; Paper list Repo: https://github.com/DevoAllen/Awesome-Reasoning-Economy-Papers

  31. arXiv:2503.23985  [pdf, other]

    cs.PL

    An Empirical Study of Rust-Specific Bugs in the rustc Compiler

    Authors: Zixi Liu, Yang Feng, Yunbo Ni, Shaohua Li, Xizhe Yin, Qingkai Shi, Baowen Xu, Zhendong Su

    Abstract: Rust is gaining popularity for its well-known memory safety guarantees and high performance, distinguishing it from C/C++ and JVM-based languages. Its compiler, rustc, enforces these guarantees through specialized mechanisms such as trait solving, borrow checking, and specific optimizations. However, Rust's unique language mechanisms introduce complexity to its compiler, leading to Rust-specific c…

    Submitted 31 March, 2025; originally announced March 2025.

  32. arXiv:2503.23199  [pdf, other]

    cs.RO cs.AI

    Incorporating GNSS Information with LIDAR-Inertial Odometry for Accurate Land-Vehicle Localization

    Authors: Jintao Cheng, Bohuan Xue, Shiyang Chen, Qiuchi Xiang, Xiaoyu Tang

    Abstract: Currently, visual odometry and LIDAR odometry perform well in pose estimation in some typical environments, but they still cannot recover the localization state at high speed or reduce accumulated drifts. In order to solve these problems, we propose a novel LIDAR-based localization framework, which achieves high accuracy and provides robust localization in 3D point cloud maps with informatio…

    Submitted 29 March, 2025; originally announced March 2025.

  33. arXiv:2503.21187  [pdf]

    cs.CV

    DSU-Net: An Improved U-Net Model Based on DINOv2 and SAM2 with Multi-scale Cross-model Feature Enhancement

    Authors: Yimin Xu, Fan Yang, Bin Xu

    Abstract: Despite the significant advancements in general image segmentation achieved by large-scale pre-trained foundation models (such as Meta's Segment Anything Model (SAM) series and DINOv2), their performance in specialized fields remains limited by two critical issues: the excessive training costs due to large model parameters, and the insufficient ability to represent specific domain characteristics…

    Submitted 31 March, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

  34. arXiv:2503.17965  [pdf, other]

    cs.CL cs.AI

    Understanding the Effects of RLHF on the Quality and Detectability of LLM-Generated Texts

    Authors: Beining Xu, Arkaitz Zubiaga

    Abstract: Large Language Models (LLMs) have demonstrated exceptional performance on a range of downstream NLP tasks by generating text that closely resembles human writing. However, the ease of achieving this similarity raises concerns about potential malicious uses at scale by bad actors, as LLM-generated text becomes increasingly difficult to discern from human text. Although detection methods have been de…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: 14 pages, 3 figures

    MSC Class: 68T50 ACM Class: I.2.7

  35. arXiv:2503.17195  [pdf, other]

    cs.LG cs.AI

    TreeSynth: Synthesizing Diverse Data from Scratch via Tree-Guided Subspace Partitioning

    Authors: Sheng Wang, Pengan Chen, Jingqi Zhou, Qintong Li, Jingwei Dong, Jiahui Gao, Boyang Xue, Jiyue Jiang, Lingpeng Kong, Chuan Wu

    Abstract: Model customization requires high-quality and diverse datasets, but acquiring such data remains challenging and costly. Although large language models (LLMs) can synthesize training data, current approaches are constrained by limited seed data, model bias and insufficient control over the generation process, resulting in limited diversity and biased distribution with the increase of data scales. T…

    Submitted 21 March, 2025; originally announced March 2025.

  36. arXiv:2503.16304  [pdf, other]

    cs.CY cs.AI

    Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1

    Authors: Peiran Gu, Fuhao Duan, Wenhao Li, Bochen Xu, Ying Cai, Teng Yao, Chenxun Zhuo, Tianming Liu, Bao Ge

    Abstract: In recent years, the development of Large Language Models (LLMs) has made significant breakthroughs in the field of natural language processing and has gradually been applied to the field of humanities and social sciences research. LLMs have a wide range of application value in the field of humanities and social sciences because of their strong text understanding, generation and reasoning capabiliti…

    Submitted 15 April, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

    Comments: 52 pages, 19 figures

  37. arXiv:2503.15470  [pdf, other]

    cs.CV cs.AI

    EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining

    Authors: Boshen Xu, Yuting Mei, Xinbi Liu, Sipeng Zheng, Qin Jin

    Abstract: Egocentric video-language pretraining has significantly advanced video representation learning. Humans perceive and interact with a fully 3D world, developing spatial awareness that extends beyond text-based understanding. However, most previous works learn from 1D text or 2D visual cues, such as bounding boxes, which inherently lack 3D understanding. To bridge this gap, we introduce EgoDTM, an Eg…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: Code will be released at: https://github.com/xuboshen/EgoDTM

  38. arXiv:2503.15147  [pdf, other]

    cs.GR

    Diffusion-based G-buffer generation and rendering

    Authors: Bowen Xue, Giuseppe Claudio Guarnera, Shuang Zhao, Zahra Montazeri

    Abstract: Despite recent advances in text-to-image generation, controlling geometric layout and material properties in synthesized scenes remains challenging. We present a novel pipeline that first produces a G-buffer (albedo, normals, depth, roughness, and metallic) from a text prompt and then renders a final image through a modular neural network. This intermediate representation enables fine-grained edit…

    Submitted 18 March, 2025; originally announced March 2025.

  39. Flying in Highly Dynamic Environments with End-to-end Learning Approach

    Authors: Xiyu Fan, Minghao Lu, Bowen Xu, Peng Lu

    Abstract: Obstacle avoidance for unmanned aerial vehicles like quadrotors is a popular research topic. Most existing research focuses only on static environments, and obstacle avoidance in environments with multiple dynamic obstacles remains challenging. This paper proposes a novel deep reinforcement learning-based approach for quadrotors to navigate through highly dynamic environments. We propose a lid…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: IEEE Robotics and Automation Letters (2025)

  40. arXiv:2503.13377  [pdf, other]

    cs.CV cs.AI cs.CL

    TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM

    Authors: Ye Wang, Boshen Xu, Zihao Yue, Zihan Xiao, Ziheng Wang, Liang Zhang, Dingyi Yang, Wenxuan Wang, Qin Jin

    Abstract: We introduce TimeZero, a reasoning-guided LVLM designed for the temporal video grounding (TVG) task. This task requires precisely localizing relevant video segments within long videos based on a given language query. TimeZero tackles this challenge by extending the inference process, enabling the model to reason about video-language relationships solely through reinforcement learning. To evaluate…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: Code: https://github.com/www-Ye/TimeZero

  41. arXiv:2503.09029  [pdf, other]

    cs.CL

    DAST: Difficulty-Aware Self-Training on Large Language Models

    Authors: Boyang Xue, Qi Zhu, Hongru Wang, Rui Wang, Sheng Wang, Hongling Xu, Fei Mi, Yasheng Wang, Lifeng Shang, Qun Liu, Kam-Fai Wong

    Abstract: Present Large Language Model (LLM) self-training methods always under-sample challenging queries, leading to inadequate learning on difficult problems, which limits LLMs' ability. Therefore, this work proposes a difficulty-aware self-training (DAST) framework that focuses on improving both the quantity and quality of self-generated responses on challenging queries during self-training. DAST is…

    Submitted 11 March, 2025; originally announced March 2025.

  42. arXiv:2503.08985  [pdf, other]

    cs.IT eess.SP

    Channel Estimation for Rydberg Atomic Receivers

    Authors: Bokai Xu, Jiayi Zhang, Zhongtao Chen, Bingyang Cheng, Ziheng Liu, Yik-Chung Wu, Bo Ai

    Abstract: The rapid development of quantum technology presents huge opportunities for 6G communications. Leveraging the quantum properties of highly excited Rydberg atoms, Rydberg atom-based antennas present distinct advantages, such as high sensitivity, broad frequency range, and compact size, over traditional antennas. To realize efficient precoding, accurate channel state information is essential. Ho…

    Submitted 11 March, 2025; originally announced March 2025.

  43. arXiv:2503.07189  [pdf, ps, other]

    cs.IT eess.SP

    Beamforming Design for Beyond Diagonal RIS-Aided Cell-Free Massive MIMO Systems

    Authors: Yizhuo Li, Jiakang Zheng, Bokai Xu, Yiyang Zhu, Jiayi Zhang, Bo Ai

    Abstract: Reconfigurable intelligent surface (RIS)-aided cell-free (CF) massive multiple-input multiple-output (mMIMO) is a promising architecture for further improving spectral efficiency (SE) with low cost and power consumption. However, conventional RIS has inevitable limitations due to its capability of only reflecting signals. In contrast, beyond-diagonal RIS (BD-RIS), with its ability to both reflect…

    Submitted 10 March, 2025; originally announced March 2025.

  44. arXiv:2503.07077  [pdf, other]

    cs.AI

    Rule-Based Conflict-Free Decision Framework in Swarm Confrontation

    Authors: Zhaoqi Dong, Zhinan Wang, Quanqi Zheng, Bin Xu, Lei Chen, Jinhu Lv

    Abstract: Traditional rule-based decision-making methods with the advantage of interpretability, such as finite state machines, suffer from jitter or deadlock (JoD) problems in extremely dynamic scenarios. To realize agent swarm confrontation, decision conflicts causing many JoD problems are a key issue to be solved. Here, we propose a novel decision-making framework that integrates probabilistic finite state machi…

    Submitted 10 March, 2025; originally announced March 2025.

  45. arXiv:2503.05182  [pdf, other]

    cs.CV

    MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions

    Authors: Qingyuan Zhou, Yuehu Gong, Weidong Yang, Jiaze Li, Yeqi Luo, Baixin Xu, Shuhao Li, Ben Fei, Ying He

    Abstract: Novel view synthesis (NVS) and surface reconstruction (SR) are essential tasks in 3D Gaussian Splatting (3D-GS). Despite recent progress, these tasks are often addressed independently, with GS-based rendering methods struggling under diverse light conditions and failing to produce accurate surfaces, while GS-based reconstruction methods frequently compromise rendering quality. This raises a centra…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: 11 pages, 7 figures

  46. arXiv:2503.04834  [pdf, other]

    cs.CL cs.AI

    Extrapolation Merging: Keep Improving With Extrapolation and Merging

    Authors: Yiguan Lin, Bin Xu, Yinghao Li, Yang Gao

    Abstract: Large Language Models (LLMs) require instruction fine-tuning to perform different downstream tasks. However, the instruction fine-tuning phase still demands significant computational resources and labeled data, lacking a paradigm that can improve model performance without additional computational power and data. Model merging aims to enhance performance by combining the parameters of different mod…

    Submitted 5 March, 2025; originally announced March 2025.

  47. arXiv:2502.21291  [pdf, other]

    cs.CV

    MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing

    Authors: Xueyun Tian, Wei Li, Bingbing Xu, Yige Yuan, Yuanzhuo Wang, Huawei Shen

    Abstract: Despite significant progress in diffusion-based image generation, subject-driven generation and instruction-based editing remain challenging. Existing methods typically treat them separately, struggling with limited high-quality data and poor generalization. However, both tasks require capturing complex visual variations while maintaining consistency between inputs and outputs. Therefore, we propo…

    Submitted 3 March, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

  48. arXiv:2502.20807  [pdf, other]

    cs.LG

    Digital Player: Evaluating Large Language Models based Human-like Agent in Games

    Authors: Jiawei Wang, Kai Wang, Shaojie Lin, Runze Wu, Bihan Xu, Lingeng Jiang, Shiwei Zhao, Renyu Zhu, Haoyu Liu, Zhipeng Hu, Zhong Fan, Le Li, Tangjie Lyu, Changjie Fan

    Abstract: With the rapid advancement of Large Language Models (LLMs), LLM-based autonomous agents have shown the potential to function as digital employees, such as digital analysts, teachers, and programmers. In this paper, we develop an application-level testbed based on the open-source strategy game "Unciv", which has millions of active players, to enable researchers to build a "data flywheel" for studyi…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: NeurIPS Datasets and Benchmarks 2024, not accepted

  49. arXiv:2502.19994  [pdf, other]

    cs.LG

    Learning Hamiltonian Density Using DeepONet

    Authors: Baige Xu, Yusuke Tanaka, Takashi Matsubara, Takaharu Yaguchi

    Abstract: In recent years, deep learning for modeling physical phenomena that can be described by partial differential equations (PDEs) has received significant attention. For example, for learning Hamiltonian mechanics, methods based on deep neural networks such as Hamiltonian Neural Networks (HNNs) and their variants have achieved progress. However, existing methods typically depend on the discretizatio…

    Submitted 27 February, 2025; originally announced February 2025.

  50. arXiv:2502.19328  [pdf, other]

    cs.CL cs.AI

    Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

    Authors: Hao Peng, Yunjia Qi, Xiaozhi Wang, Zijun Yao, Bin Xu, Lei Hou, Juanzi Li

    Abstract: Reward models (RMs) are crucial for the training and inference-time scaling up of large language models (LLMs). However, existing reward models primarily focus on human preferences, neglecting verifiable correctness signals which have shown strong potential in training LLMs. In this paper, we propose agentic reward modeling, a reward system that combines reward models with verifiable correctness s…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 16 pages, 5 figures