Showing 1–50 of 4,841 results for author: Chen, X

Searching in archive cs.
  1. arXiv:2505.10464  [pdf, ps, other]

    eess.IV cs.CV

    HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation

    Authors: Jiaming Liang, Lihuan Dai, Xiaoqi Sheng, Xiangguang Chen, Chun Yao, Guihua Tao, Qibin Leng, Honming Cai, Xi Zhong

    Abstract: Multimodal medical image segmentation faces significant challenges in the context of gastric cancer lesion analysis. This clinical context is defined by the scarcity of independent multimodal datasets and the imperative to amalgamate inherently misaligned modalities. As a result, algorithms are constrained to train on approximate data and depend on application migration, leading to substantial res…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: This work has been provisionally accepted for MICCAI 2025

  2. arXiv:2505.10008  [pdf, other]

    cs.SE

    SVA-ICL: Improving LLM-based Software Vulnerability Assessment via In-Context Learning and Information Fusion

    Authors: Chaoyang Gao, Xiang Chen, Guangbei Zhang

    Abstract: Context: Software vulnerability assessment (SVA) is critical for identifying, evaluating, and prioritizing security weaknesses in software applications. Objective: Despite the increasing application of large language models (LLMs) in various software engineering tasks, their effectiveness in SVA remains underexplored. Method: To address this gap, we introduce a novel approach SVA-ICL, which levera…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.09998  [pdf, other]

    cs.CV

    From Air to Wear: Personalized 3D Digital Fashion with AR/VR Immersive 3D Sketching

    Authors: Ying Zang, Yuanqi Hu, Xinyu Chen, Yuxia Xu, Suhui Wang, Chunan Yu, Lanyun Zhu, Deyi Ji, Xin Xu, Tianrun Chen

    Abstract: In the era of immersive consumer electronics, such as AR/VR headsets and smart devices, people increasingly seek ways to express their identity through virtual fashion. However, existing 3D garment design tools remain inaccessible to everyday users due to steep technical barriers and limited data. In this work, we introduce a 3D sketch-driven 3D garment generation framework that empowers ordinary…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 8 pages, 5 figures

  4. arXiv:2505.09655  [pdf, other]

    cs.CL

    DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models

    Authors: Xiwen Chen, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hao Wang, Haiyu Wu, Huayu Li, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

    Abstract: Recent advances in reinforcement learning for language model post-training, such as Group Relative Policy Optimization (GRPO), have shown promise in low-resource settings. However, GRPO typically relies on solution-level and scalar reward signals that fail to capture the semantic diversity among sampled completions. This leads to what we identify as a diversity-quality inconsistency, where distinc…

    Submitted 13 May, 2025; originally announced May 2025.

  5. arXiv:2505.09466  [pdf, other]

    cs.CV cs.AI

    A 2D Semantic-Aware Position Encoding for Vision Transformers

    Authors: Xi Chen, Shiyang Zhou, Muqi Huang, Jiaxu Feng, Yun Xiong, Kun Zhou, Biao Yang, Yuhui Zhang, Huishuai Bao, Sijia Peng, Chuan Li, Feng Shi

    Abstract: Vision transformers have demonstrated significant advantages in computer vision tasks due to their ability to capture long-range dependencies and contextual relationships through self-attention. However, existing position encoding techniques, which are largely borrowed from natural language processing, fail to effectively capture semantic-aware positional relationships between image patches. Tradi…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 14 pages, 4 figures, 3 tables

  6. arXiv:2505.08748  [pdf, ps, other]

    cs.LG

    Implet: A Post-hoc Subsequence Explainer for Time Series Models

    Authors: Fanyu Meng, Ziwen Kan, Shahbaz Rezaei, Zhaodan Kong, Xin Chen, Xin Liu

    Abstract: Explainability in time series models is crucial for fostering trust, facilitating debugging, and ensuring interpretability in real-world applications. In this work, we introduce Implet, a novel post-hoc explainer that generates accurate and concise subsequence-level explanations for time series models. Our approach identifies critical temporal segments that significantly contribute to the model's…

    Submitted 13 May, 2025; originally announced May 2025.

  7. arXiv:2505.08744  [pdf, other]

    cs.AI

    DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models

    Authors: Xiaoyang Chen, Xinan Dai, Yu Du, Qian Feng, Naixu Guo, Tingshuo Gu, Yuting Gao, Yingyi Gao, Xudong Han, Xiang Jiang, Yilin Jin, Hongyi Lin, Shisheng Lin, Xiangnan Li, Yuante Li, Yixing Li, Zhentao Lai, Zilu Ma, Yingrong Peng, Jiacheng Qian, Hao-Yu Sun, Jianbo Sun, Zirui Wang, Siwei Wu, Zian Wang, et al. (6 additional authors not shown)

    Abstract: To advance the mathematical proficiency of large language models (LLMs), the DeepMath team has launched an open-source initiative aimed at developing an open mathematical LLM and systematically evaluating its mathematical creativity. This paper represents the initial contribution of this initiative. While recent developments in mathematical LLMs have predominantly emphasized reasoning skills, as e…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 14 pages, 4 figures

  8. arXiv:2505.08598  [pdf, ps, other]

    cs.SE

    Grouptuner: Efficient Group-Aware Compiler Auto-tuning

    Authors: Bingyu Gao, Mengyu Yao, Ziming Wang, Dong Liu, Ding Li, Xiangqun Chen, Yao Guo

    Abstract: Modern compilers typically provide hundreds of options to optimize program performance, but users often cannot fully leverage them due to the huge number of options. While standard optimization combinations (e.g., -O3) provide reasonable defaults, they often fail to deliver near-peak performance across diverse programs and architectures. To address this challenge, compiler auto-tuning techniques h…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: The final version of this paper is going to appear in the ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES'25), June 16-17, 2025, Seoul, Republic of Korea

  9. arXiv:2505.08532  [pdf, ps, other]

    cs.SI cs.AI

    The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News

    Authors: Yuhan Liu, Yuxuan Liu, Xiaoqing Zhang, Xiuying Chen, Rui Yan

    Abstract: In today's digital environment, the rapid propagation of fake news via social networks poses significant social challenges. Most existing detection methods either employ traditional classification models, which suffer from low interpretability and limited generalization capabilities, or craft specific prompts for large language models (LLMs) to produce explanations and results directly, failing to…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: SIGIR 2025

  10. arXiv:2505.08448  [pdf, ps, other]

    cs.MA

    Scalable UAV Multi-Hop Networking via Multi-Agent Reinforcement Learning with Large Language Models

    Authors: Yanggang Xu, Weijie Hong, Jirong Zha, Geng Chen, Jianfeng Zheng, Chen-Chun Hsia, Xinlei Chen

    Abstract: In disaster scenarios, establishing robust emergency communication networks is critical, and unmanned aerial vehicles (UAVs) offer a promising solution to rapidly restore connectivity. However, organizing UAVs to form multi-hop networks in large-scale dynamic environments presents significant challenges, including limitations in algorithmic scalability and the vast exploration space required for c…

    Submitted 13 May, 2025; originally announced May 2025.

  11. arXiv:2505.08343  [pdf, other]

    cs.AI

    An Identifiable Cost-Aware Causal Decision-Making Framework Using Counterfactual Reasoning

    Authors: Ruichu Cai, Xi Chen, Jie Qiao, Zijian Li, Yuequn Liu, Wei Chen, Keli Zhang, Jiale Zheng

    Abstract: Decision making under abnormal conditions is a critical process that involves evaluating the current state and determining the optimal action to restore the system to a normal state at an acceptable cost. However, in such scenarios, existing decision-making frameworks highly rely on reinforcement learning or root cause analysis, resulting in them frequently neglecting the cost of the actions or fa…

    Submitted 13 May, 2025; originally announced May 2025.

  12. arXiv:2505.07747  [pdf, other]

    cs.CV

    Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets

    Authors: Weiyu Li, Xuanyang Zhang, Zheng Sun, Di Qi, Hao Li, Wei Cheng, Weiwei Cai, Shihao Wu, Jiarui Liu, Zihao Wang, Xiao Chen, Feipeng Tian, Jianxiong Pan, Zeming Li, Gang Yu, Xiangyu Zhang, Daxin Jiang, Ping Tan

    Abstract: While generative artificial intelligence has advanced significantly across text, image, audio, and video domains, 3D generation remains comparatively underdeveloped due to fundamental challenges such as data scarcity, algorithmic limitations, and ecosystem fragmentation. To this end, we present Step1X-3D, an open framework addressing these challenges through: (1) a rigorous data curation pipeline…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Technical report

  13. arXiv:2505.07581  [pdf, other]

    cs.AI cs.CY

    YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models

    Authors: Lei Wang, Heyang Gao, Xiaohe Bo, Xu Chen, Ji-Rong Wen

    Abstract: Leveraging large language model (LLM) based agents to simulate human social behaviors has recently gained significant attention. In this paper, we introduce a novel social simulator called YuLan-OneSim. Compared to previous works, YuLan-OneSim distinguishes itself in five key aspects: (1) Code-free scenario construction: Users can simply describe and refine their simulation scenarios through natur…

    Submitted 12 May, 2025; originally announced May 2025.

  14. arXiv:2505.07425  [pdf, ps, other]

    cs.SE

    A Systematic Literature Review on Neural Code Translation

    Authors: Xiang Chen, Jiacheng Xue, Xiaofei Xie, Caokai Liang, Xiaolin Ju

    Abstract: Code translation aims to convert code from one programming language to another automatically. It is motivated by the need for multi-language software development and legacy system migration. In recent years, neural code translation has gained significant attention, driven by rapid advancements in deep learning and large language models. Researchers have proposed various techniques to improve neura…

    Submitted 12 May, 2025; originally announced May 2025.

  15. Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity

    Authors: Guang Yan, Yuhui Zhang, Zimu Guo, Lutan Zhao, Xiaojun Chen, Chen Wang, Wenhao Wang, Dan Meng, Rui Hou

    Abstract: With the growing use of large language models (LLMs) hosted on cloud platforms to offer inference services, privacy concerns about the potential leakage of sensitive information are escalating. Secure multi-party computation (MPC) is a promising solution to protect the privacy in LLM inference. However, MPC requires frequent inter-server communication, causing high performance overhead. Inspired…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Accepted to SP 2025

  16. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  17. arXiv:2505.06987  [pdf, other]

    cs.CL cs.AI

    Convert Language Model into a Value-based Strategic Planner

    Authors: Xiaoyu Wang, Yue Zhao, Qingqing Gu, Zhonglin Jiang, Xiaokai Chen, Yong Chen, Luo Ji

    Abstract: Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations. Although large language models (LLMs) have obtained remarkable progress on ESC, most of these studies might not define the diagram from the state model perspective, therefore providing a suboptimal solution for long-term satisfaction. To address such an issue, we leverage t…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 11 pages, 5 figures, Accepted by ACL 2025 Industry Track

  18. arXiv:2505.06114  [pdf, other]

    cs.LG

    FIC-TSC: Learning Time Series Classification with Fisher Information Constraint

    Authors: Xiwen Chen, Wenhui Zhu, Peijie Qiu, Hao Wang, Huayu Li, Zihan Li, Yalin Wang, Aristeidis Sotiras, Abolfazl Razi

    Abstract: Analyzing time series data is crucial to a wide spectrum of applications, including economics, online marketplaces, and human healthcare. In particular, time series classification plays an indispensable role in segmenting different phases in stock markets, predicting customer behavior, and classifying worker actions and engagement levels. These aspects contribute significantly to the advancement o…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML2025. Pre camera-ready version

  19. arXiv:2505.05622  [pdf, ps, other]

    cs.RO cs.AI

    CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory

    Authors: Weichen Zhang, Chen Gao, Shiquan Yu, Ruiying Peng, Baining Zhao, Qian Zhang, Jinqiang Cui, Xinlei Chen, Yong Li

    Abstract: Aerial vision-and-language navigation (VLN), requiring drones to interpret natural language instructions and navigate complex urban environments, emerges as a critical embodied AI challenge that bridges human-robot interaction, 3D spatial reasoning, and real-world deployment. Although existing ground VLN agents achieved notable results in indoor and outdoor settings, they struggle in aerial VLN du…

    Submitted 8 May, 2025; originally announced May 2025.

  20. arXiv:2505.05465  [pdf, other]

    cs.CL cs.AI cs.LG

    ComPO: Preference Alignment via Comparison Oracles

    Authors: Peter Chen, Xi Chen, Wotao Yin, Tianyi Lin

    Abstract: Direct alignment methods are increasingly used for aligning large language models (LLMs) with human preferences. However, these methods suffer from the issues of verbosity and likelihood displacement, which can be driven by the noisy preference pairs that induce similar likelihood for preferred and dispreferred responses. The contributions of this paper are two-fold. First, we propose a new prefer…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 25 pages

  21. arXiv:2505.05446  [pdf, ps, other]

    cs.CV cs.CL

    Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding

    Authors: Han Xiao, Yina Xie, Guanxin Tan, Yinghao Chen, Rui Hu, Ke Wang, Aojun Zhou, Hao Li, Hao Shao, Xudong Lu, Peng Gao, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li

    Abstract: Visual Document Understanding has become essential with the increase of text-rich visual content. This field poses significant challenges due to the need for effective integration of visual perception and textual comprehension, particularly across diverse document types with complex layouts. Moreover, existing fine-tuning datasets for this domain often fall short in providing the detailed contextu…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: CVPR2025

  22. arXiv:2505.05391  [pdf, other]

    cs.CV

    EDmamba: A Simple yet Effective Event Denoising Method with State Space Model

    Authors: Ciyu Ruan, Zihang Gong, Ruishan Guo, Jingao Xu, Xinlei Chen

    Abstract: Event cameras excel in high-speed vision due to their high temporal resolution, high dynamic range, and low power consumption. However, as dynamic vision sensors, their output is inherently noisy, making efficient denoising essential to preserve their ultra-low latency and real-time processing capabilities. Existing event denoising methods struggle with a critical dilemma: computationally intensiv…

    Submitted 8 May, 2025; originally announced May 2025.

  23. arXiv:2505.05307  [pdf, other]

    cs.CV

    PRE-Mamba: A 4D State Space Model for Ultra-High-Frequent Event Camera Deraining

    Authors: Ciyu Ruan, Ruishan Guo, Zihang Gong, Jingao Xu, Wenhan Yang, Xinlei Chen

    Abstract: Event cameras excel in high temporal resolution and dynamic range but suffer from dense noise in rainy conditions. Existing event deraining methods face trade-offs between temporal precision, deraining effectiveness, and computational efficiency. In this paper, we propose PRE-Mamba, a novel point-based event camera deraining framework that fully exploits the spatiotemporal characteristics of raw e…

    Submitted 8 May, 2025; originally announced May 2025.

  24. arXiv:2505.05185  [pdf, other]

    cs.DS

    Efficient Parallel Ising Samplers via Localization Schemes

    Authors: Xiaoyu Chen, Hongyang Liu, Yitong Yin, Xinyuan Zhang

    Abstract: We introduce efficient parallel algorithms for sampling from the Gibbs distribution and estimating the partition function of Ising models. These algorithms achieve parallel efficiency, with polylogarithmic depth and polynomial total work, and are applicable to Ising models in the following regimes: (1) Ferromagnetic Ising models with external fields; (2) Ising models with interaction matrix $J$ of…

    Submitted 8 May, 2025; originally announced May 2025.

  25. arXiv:2505.05126  [pdf, other]

    cs.LG

    Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach

    Authors: Xuyang Chen, Keyu Yan, Lin Zhao

    Abstract: Offline reinforcement learning (RL) aims to learn decision-making policies from fixed datasets without online interactions, providing a practical solution where online data collection is expensive or risky. However, offline RL often suffers from distribution shift, resulting in inaccurate evaluation and substantial overestimation on out-of-distribution (OOD) actions. To address this, existing appr…

    Submitted 8 May, 2025; originally announced May 2025.

  26. arXiv:2505.04921  [pdf, other]

    cs.CV cs.CL

    Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

    Authors: Yunxin Li, Zhenyu Liu, Zitao Li, Xuanyu Zhang, Zhenran Xu, Xinyu Chen, Haoyuan Shi, Shenyuan Jiang, Xintong Wang, Jifang Wang, Shouzheng Huang, Xinping Zhao, Borui Jiang, Lanqing Hong, Longyue Wang, Zhuotao Tian, Baoxing Huai, Wenhan Luo, Weihua Luo, Zheng Zhang, Baotian Hu, Min Zhang

    Abstract: Reasoning lies at the heart of intelligence, shaping the ability to make decisions, draw conclusions, and generalize across domains. In artificial intelligence, as systems increasingly operate in open, uncertain, and multimodal environments, reasoning becomes essential for enabling robust and adaptive behavior. Large Multimodal Reasoning Models (LMRMs) have emerged as a promising paradigm, integra…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 75 pages, 10 figures; Project: https://github.com/HITsz-TMG/Awesome-Large-Multimodal-Reasoning-Models

  27. arXiv:2505.04869  [pdf, other]

    cs.HC

    From First Draft to Final Insight: A Multi-Agent Approach for Feedback Generation

    Authors: Jie Cao, Chloe Qianhui Zhao, Xian Chen, Shuman Wang, Christian Schunn, Kenneth R. Koedinger, Jionghao Lin

    Abstract: Producing large volumes of high-quality, timely feedback poses significant challenges to instructors. To address this issue, automation technologies-particularly Large Language Models (LLMs)-show great potential. However, current LLM-based research still shows room for improvement in terms of feedback quality. Our study proposed a multi-agent approach performing "generation, evaluation, and regene…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 14 pages, to be published at the 26th International Conference on Artificial Intelligence in Education (AIED '25)

  28. arXiv:2505.04396  [pdf, other]

    cs.LG physics.ao-ph

    Supporting renewable energy planning and operation with data-driven high-resolution ensemble weather forecast

    Authors: Jingnan Wang, Jie Chao, Shangshang Yang, Congyi Nai, Kaijun Ren, Kefeng Deng, Xi Chen, Yaxin Liu, Hanqiuzi Wen, Ziniu Xiao, Lifeng Zhang, Xiaodong Wang, Jiping Guan, Baoxiang Pan

    Abstract: The planning and operation of renewable energy, especially wind power, depend crucially on accurate, timely, and high-resolution weather information. Coarse-grid global numerical weather forecasts are typically downscaled to meet these requirements, introducing challenges of scale inconsistency, process representation error, computation cost, and entanglement of distinct uncertainty sources from c…

    Submitted 7 May, 2025; originally announced May 2025.

  29. arXiv:2505.04368  [pdf, other]

    cs.NI

    Pipelining Split Learning in Multi-hop Edge Networks

    Authors: Wei Wei, Zheng Lin, Tao Li, Xuanheng Li, Xianhao Chen

    Abstract: To support large-scale model training, split learning (SL) enables multiple edge devices/servers to share the intensive training workload. However, most existing works on SL focus solely on two-tier model splitting. Moreover, while some recent works have investigated the model splitting and placement problems for multi-hop SL, these solutions fail to overcome the resource idleness issue, resulting…

    Submitted 7 May, 2025; originally announced May 2025.

  30. arXiv:2505.04084  [pdf, other]

    cs.SE cs.AI

    An Empirical Study of OpenAI API Discussions on Stack Overflow

    Authors: Xiang Chen, Jibin Wang, Chaoyang Gao, Xiaolin Ju, Zhanqi Cui

    Abstract: The rapid advancement of large language models (LLMs), represented by OpenAI's GPT series, has significantly impacted various domains such as natural language processing, software development, education, healthcare, finance, and scientific research. However, OpenAI APIs introduce unique challenges that differ from traditional APIs, such as the complexities of prompt engineering, token-based cost m…

    Submitted 6 May, 2025; originally announced May 2025.

  31. arXiv:2505.03374  [pdf, ps, other]

    cs.CV

    Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models

    Authors: Abram Schonfeldt, Benjamin Maylor, Xiaofang Chen, Ronald Clark, Aiden Doherty

    Abstract: Introduction: Data from wearable devices collected in free-living settings, and labelled with physical activity behaviours compatible with health research, are essential for both validating existing wearable-based measurement approaches and developing novel machine learning approaches. One common way of obtaining these labels relies on laborious annotation of sequences of images captured by camera…

    Submitted 6 May, 2025; originally announced May 2025.

  32. arXiv:2505.03285  [pdf, other]

    cs.IR

    Soft Reasoning Paths for Knowledge Graph Completion

    Authors: Yanning Hou, Sihang Zhou, Ke Liang, Lingyuan Meng, Xiaoshu Chen, Ke Xu, Siwei Wang, Xinwang Liu, Jian Huang

    Abstract: Reasoning paths are reliable information in knowledge graph completion (KGC) in which algorithms can find strong clues of the actual relation between entities. However, in real-world applications, it is difficult to guarantee that computationally affordable paths exist toward all candidate entities. According to our observation, the prediction accuracy drops significantly when paths are absent. To…

    Submitted 6 May, 2025; originally announced May 2025.

  33. arXiv:2505.03266  [pdf]

    physics.optics cs.IT eess.SP

    Rapid diagnostics of reconfigurable intelligent surfaces using space-time-coding modulation

    Authors: Yi Ning Zheng, Lei Zhang, Xiao Qing Chen, Marco Rossi, Giuseppe Castaldi, Shuo Liu, Tie Jun Cui, Vincenzo Galdi

    Abstract: Reconfigurable intelligent surfaces (RISs) have emerged as a key technology for shaping smart wireless environments in next-generation wireless communication systems. To support the large-scale deployment of RISs, a reliable and efficient diagnostic method is essential to ensure optimal performance. In this work, a robust and efficient approach for RIS diagnostics is proposed using a space-time co…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 30 pages, 6 figures, 1 table, supporting information

  34. arXiv:2505.02847  [pdf, other]

    cs.CL cs.AI cs.CY

    Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models

    Authors: Bang Zhang, Ruotian Ma, Qingxuan Jiang, Peisong Wang, Jiaqi Chen, Zheng Xie, Xingyu Chen, Yue Wang, Fanghua Ye, Jian Li, Yifan Yang, Zhaopeng Tu, Xiaolong Li

    Abstract: Assessing how well a large language model (LLM) understands human, rather than merely text, remains an open challenge. To bridge the gap, we introduce Sentient Agent as a Judge (SAGE), an automated evaluation framework that measures an LLM's higher-order social cognition. SAGE instantiates a Sentient Agent that simulates human-like emotional changes and inner thoughts during interaction, providing…

    Submitted 9 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

    Comments: code: https://github.com/Tencent/digitalhuman/tree/main/SAGE

  35. arXiv:2505.02795  [pdf, other]

    cs.LG cs.AI cs.DC

    HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models

    Authors: Zheng Lin, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Praneeth Vepakomma, Wei Ni, Jun Luo, Yue Gao

    Abstract: Recently, large language models (LLMs) have achieved remarkable breakthroughs, revolutionizing the natural language processing domain and beyond. Due to immense parameter sizes, fine-tuning these models with private data for diverse downstream tasks has become mainstream. Though federated learning (FL) offers a promising solution for fine-tuning LLMs without sharing raw data, substantial computing…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 16 pages, 22 figures

  36. arXiv:2505.02387  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    RM-R1: Reward Modeling as Reasoning

    Authors: Xiusi Chen, Gaotang Li, Ziqi Wang, Bowen Jin, Cheng Qian, Yu Wang, Hongru Wang, Yu Zhang, Denghui Zhang, Tong Zhang, Hanghang Tong, Heng Ji

    Abstract: Reward modeling is essential for aligning large language models (LLMs) with human preferences through reinforcement learning (RL). To provide accurate reward signals, a reward model (RM) should stimulate deep thinking and conduct interpretable reasoning before assigning a score or a judgment. Inspired by recent advances of long chain-of-thought (CoT) on reasoning-intensive tasks, we hypothesize an…

    Submitted 15 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

    Comments: 24 pages, 8 figures

  37. arXiv:2505.02099  [pdf, other]

    cs.AI

    MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents

    Authors: Zeyu Zhang, Quanyu Dai, Xu Chen, Rui Li, Zhongyang Li, Zhenhua Dong

    Abstract: Recently, large language model based (LLM-based) agents have been widely applied across various fields. As a critical part, their memory capabilities have captured significant interest from both industrial and academic communities. Despite the proposal of many advanced memory models in recent research, however, there remains a lack of unified implementations under a general framework. To address t…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: Just accepted by TheWebConf'25 Resource Track

  38. arXiv:2505.02056  [pdf, other]

    cs.CV cs.LG

    Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin

    Authors: Yuchen Wang, Xuefeng Bai, Xiucheng Li, Weili Guan, Liqiang Nie, Xinyang Chen

    Abstract: Adapting vision-language models (VLMs) to downstream tasks with pseudolabels has gained increasing attention. A major obstacle is that the pseudolabels generated by VLMs tend to be imbalanced, leading to inferior performance. While existing methods have explored various strategies to address this, the underlying causes of imbalance remain insufficiently investigated. To fill this gap, we delve int…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: Accepted to ICML 2025

  39. arXiv:2505.01998  [pdf, other]

    cs.RO cs.AI physics.app-ph

    A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction

    Authors: Xiaoliang Chen, Xin Yu, Le Chang, Yunhe Huang, Jiashuai He, Shibo Zhang, Jin Li, Likai Lin, Ziyu Zeng, Xianling Tu, Shuyu Zhang

    Abstract: This paper introduces a novel framework integrating nonlinear acoustic computing and reinforcement learning to enhance advanced human-robot interaction under complex noise and reverberation. Leveraging physically informed wave equations (e.g., Westervelt, KZK), the approach captures higher-order phenomena such as harmonic generation and shock formation. By embedding these models in a reinforcement…

    Submitted 6 May, 2025; v1 submitted 4 May, 2025; originally announced May 2025.

    Comments: 34 pages, 11 figures, 10 tables, and 10 equations

    MSC Class: 68T01 ACM Class: I.2.8

  40. arXiv:2505.01755  [pdf, other]

    eess.IV cs.CV

    LensNet: An End-to-End Learning Framework for Empirical Point Spread Function Modeling and Lensless Imaging Reconstruction

    Authors: Jiesong Bai, Yuhao Yin, Yihang Dong, Xiaofeng Zhang, Chi-Man Pun, Xuhang Chen

    Abstract: Lensless imaging stands out as a promising alternative to conventional lens-based systems, particularly in scenarios demanding ultracompact form factors and cost-effective architectures. However, such systems are fundamentally governed by the Point Spread Function (PSF), which dictates how a point source contributes to the final captured signal. Traditional lensless techniques often require explic…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI 2025

  41. VisTaxa: Developing a Taxonomy of Historical Visualizations

    Authors: Yu Zhang, Xinyue Chen, Weili Zheng, Yuhan Guo, Guozheng Li, Siming Chen, Xiaoru Yuan

    Abstract: Historical visualizations are a rich resource for visualization research. While taxonomy is commonly used to structure and understand the design space of visualizations, existing taxonomies primarily focus on contemporary visualizations and largely overlook historical visualizations. To address this gap, we describe an empirical method for taxonomy development. We introduce a coding protocol and t…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: Accepted to IEEE TVCG (IEEE PacificVis 2025 Journal Track)

  42. arXiv:2505.01234  [pdf, ps, other]

    cs.IT eess.SP

    Robust Deep Learning-Based Physical Layer Communications: Strategies and Approaches

    Authors: Fenghao Zhu, Xinquan Wang, Chen Zhu, Tierui Gong, Zhaohui Yang, Chongwen Huang, Xiaoming Chen, Zhaoyang Zhang, Mérouane Debbah

    Abstract: Deep learning (DL) has emerged as a transformative technology with immense potential to reshape the sixth-generation (6G) wireless communication network. By utilizing advanced algorithms for feature extraction and pattern recognition, DL provides unprecedented capabilities in optimizing the network efficiency and performance, particularly in physical layer communications. Although DL technologies…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 8 pages, 3 figures. Accepted by IEEE Network Magazine

  43. arXiv:2505.01076  [pdf, other]

    cs.IT eess.SP

    Quasi-Static IRS: 3D Shaped Beamforming for Area Coverage Enhancement

    Authors: Zhenyu Jiang, Xintong Chen, Jiangbin Lyu, Liqun Fu, Rui Zhang

    Abstract: Intelligent reflecting surface (IRS) is a promising paradigm to reconfigure the wireless environment for enhanced communication coverage and quality. However, to compensate for the double pathloss effect, massive IRS elements are required, raising concerns on the scalability of cost and complexity. This paper introduces a new architecture of quasi-static IRS (QS-IRS), which tunes element phases vi…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 6 pages, 6 figures

  44. Global Optimality of Single-Timescale Actor-Critic under Continuous State-Action Space: A Study on Linear Quadratic Regulator

    Authors: Xuyang Chen, Jingliang Duan, Lin Zhao

    Abstract: Actor-critic methods have achieved state-of-the-art performance in various challenging tasks. However, theoretical understandings of their performance remain elusive and challenging. Existing studies mostly focus on practically uncommon variants such as double-loop or two-timescale stepsize actor-critic algorithms for simplicity. These results certify local convergence on finite state- or action-s…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2208.08744

    Report number: Article No. 422, Pages 3816--3824

    Journal ref: Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)

  45. arXiv:2505.00973  [pdf, ps, other]

    cs.LG math.OC

    A Minimax-MDP Framework with Future-imposed Conditions for Learning-augmented Problems

    Authors: Xin Chen, Yuze Chen, Yuan Zhou

    Abstract: We study a class of sequential decision-making problems with augmented predictions, potentially provided by a machine learning algorithm. In this setting, the decision-maker receives prediction intervals for unknown parameters that become progressively refined over time, and seeks decisions that are competitive with the hindsight optimal under all possible realizations of both parameters and predi…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 64 pages, 1 figure

  46. arXiv:2505.00887  [pdf, other]

    cs.LG cs.AI

    Rethinking Time Encoding via Learnable Transformation Functions

    Authors: Xi Chen, Yateng Tang, Jiarong Xu, Jiawei Zhang, Siwei Zhang, Sijia Peng, Xuehao Zheng, Yun Xiong

    Abstract: Effectively modeling time information and incorporating it into applications or models involving chronologically occurring events is crucial. Real-world scenarios often involve diverse and complex time patterns, which pose significant challenges for time encoding methods. While previous methods focus on capturing time patterns, many rely on specific inductive biases, such as using trigonometric fu…

    Submitted 14 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

    Comments: 26 pages, 19 figures, 10 tables

  47. arXiv:2505.00515  [pdf, other]

    cs.RO cs.AI cs.MA

    Safety-Critical Traffic Simulation with Guided Latent Diffusion Model

    Authors: Mingxing Peng, Ruoyu Yao, Xusen Guo, Yuting Xie, Xianda Chen, Jun Ma

    Abstract: Safety-critical traffic simulation plays a crucial role in evaluating autonomous driving systems under rare and challenging scenarios. However, existing approaches often generate unrealistic scenarios due to insufficient consideration of physical plausibility and suffer from low generation efficiency. To address these limitations, we propose a guided latent diffusion model (LDM) capable of generat…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 7 pages, 3 figures

  48. arXiv:2505.00332  [pdf, other]

    cs.RO

    Active Contact Engagement for Aerial Navigation in Unknown Environments with Glass

    Authors: Xinyi Chen, Yichen Zhang, Hetai Zou, Junzhe Wang, Shaojie Shen

    Abstract: Autonomous aerial robots are increasingly being deployed in real-world scenarios, where transparent glass obstacles present significant challenges to reliable navigation. Researchers have investigated the use of non-contact sensors and passive contact-resilient aerial vehicle designs to detect glass surfaces, which are often limited in terms of robustness and efficiency. In this work, we propose a…

    Submitted 8 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

    Comments: Accepted in the IEEE RA-L. See video at https://youtu.be/AKw-umiDkPU?si=zaNqecdD5sp9m3pZ

  49. arXiv:2505.00063  [pdf, other]

    cs.CL cs.CV

    GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling

    Authors: Siqi Li, Yufan Shen, Xiangnan Chen, Jiayi Chen, Hengwei Ju, Haodong Duan, Song Mao, Hongbin Zhou, Bo Zhang, Pinlong Cai, Licheng Wen, Botian Shi, Yong Liu, Xinyu Cai, Yu Qiao

    Abstract: The rapid advancement of multimodal large language models (MLLMs) has profoundly impacted the document domain, creating a wide array of application scenarios. This progress highlights the need for a comprehensive benchmark to evaluate these models' capabilities across various document-specific tasks. However, existing benchmarks often fail to locate specific model weaknesses or guide systematic im…

    Submitted 30 April, 2025; originally announced May 2025.

  50. arXiv:2505.00028  [pdf, other]

    cs.CL cs.AI cs.IR

    Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation

    Authors: Pengchao Feng, Ziyang Ma, Wenxi Chen, Yao Li, Sheng Wang, Kai Yu, Xie Chen

    Abstract: In recent years, end-to-end speech-to-speech (S2S) dialogue systems have garnered increasing research attention due to their advantages over traditional cascaded systems, including achieving lower latency and more natural integration of nonverbal cues such as emotion and speaker identity. However, these end-to-end systems face key challenges, particularly in incorporating external knowledge, a cap…

    Submitted 27 April, 2025; originally announced May 2025.