Showing 1–50 of 6,155 results for author: Zhang, X

Searching in archive cs.
  1. arXiv:2505.09995  [pdf, ps, other]

    cs.NI cs.DL

    A Survey on Open-Source Edge Computing Simulators and Emulators: The Computing and Networking Convergence Perspective

    Authors: Jianpeng Qi, Chao Liu, Xiao Zhang, Lei Wang, Rui Wang, Junyu Dong, Yanwei Yu

    Abstract: Edge computing, with its low latency, dynamic scalability, and location awareness, along with the convergence of computing and communication paradigms, has been successfully applied in critical domains such as industrial IoT, smart healthcare, smart homes, and public safety. This paper provides a comprehensive survey of open-source edge computing simulators and emulators, presented in our GitHub r…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 10 pages, 2 figures, 5 tables

  2. UICopilot: Automating UI Synthesis via Hierarchical Code Generation from Webpage Designs

    Authors: Yi Gui, Yao Wan, Zhen Li, Zhongyi Zhang, Dongping Chen, Hongyu Zhang, Yi Su, Bohua Chen, Xing Zhou, Wenbin Jiang, Xiangliang Zhang

    Abstract: Automating the synthesis of User Interfaces (UIs) plays a crucial role in enhancing productivity and accelerating the development lifecycle, reducing both development time and manual effort. Recently, the rapid development of Multimodal Large Language Models (MLLMs) has made it possible to generate front-end Hypertext Markup Language (HTML) code directly from webpage designs. However, real-world w…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: WWW' 2025

  3. arXiv:2505.09768  [pdf, other]

    cs.LG

    Self-Consuming Generative Models with Adversarially Curated Data

    Authors: Xiukun Wei, Xueru Zhang

    Abstract: Recent advances in generative models have made it increasingly difficult to distinguish real data from model-generated synthetic data. Using synthetic data for successive training of future model generations creates "self-consuming loops", which may lead to model collapse or training instability. Furthermore, synthetic data is often subject to human feedback and curated by users based on their pre…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.09388  [pdf, other]

    cs.CL

    Qwen3 Technical Report

    Authors: An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, et al. (35 additional authors not shown)

    Abstract: In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration…

    Submitted 14 May, 2025; originally announced May 2025.

  5. arXiv:2505.09377  [pdf, ps, other]

    cs.RO

    Strategic Jenga Play via Graph Based Dynamics Modeling

    Authors: Kavya Puthuveetil, Xinyi Zhang, Kazuto Yokoyama, Tetsuya Narita

    Abstract: Controlled manipulation of multiple objects whose dynamics are closely linked is a challenging problem within contact-rich manipulation, requiring an understanding of how the movement of one will impact the others. Using the Jenga game as a testbed to explore this problem, we use graph-based modeling to tackle two different aspects of the task: 1) block selection and 2) block extraction. For block sel…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 5 pages, Oral Spotlight at ICRA 2025 Workshop "Learning Meets Model-Based Methods for Contact-Rich Manipulation"

  6. arXiv:2505.09325  [pdf, ps, other]

    cs.SD eess.AS

    SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset

    Authors: Yicheng Gu, Chaoren Wang, Junan Zhang, Xueyao Zhang, Zihao Fang, Haorui He, Zhizheng Wu

    Abstract: The lack of a publicly-available large-scale and diverse dataset has long been a significant bottleneck for singing voice applications like Singing Voice Synthesis (SVS) and Singing Voice Conversion (SVC). To tackle this problem, we present SingNet, an extensive, diverse, and in-the-wild singing voice dataset. Specifically, we propose a data processing pipeline to extract ready-to-use training dat…

    Submitted 14 May, 2025; originally announced May 2025.

  7. arXiv:2505.08705  [pdf, ps, other]

    cs.CV cs.AI

    Controllable Image Colorization with Instance-aware Texts and Masks

    Authors: Yanru An, Ling Gui, Qiang Hu, Chunlei Cai, Tianxiao Ye, Xiaoyun Zhang, Yanfeng Wang

    Abstract: Recently, the application of deep learning in image colorization has received widespread attention. The maturation of diffusion models has further advanced the development of image colorization models. However, current mainstream image colorization models still face issues such as color bleeding and color binding errors, and cannot colorize images at the instance level. In this paper, we propose a…

    Submitted 13 May, 2025; originally announced May 2025.

  8. arXiv:2505.08532  [pdf, ps, other]

    cs.SI cs.AI

    The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News

    Authors: Yuhan Liu, Yuxuan Liu, Xiaoqing Zhang, Xiuying Chen, Rui Yan

    Abstract: In today's digital environment, the rapid propagation of fake news via social networks poses significant social challenges. Most existing detection methods either employ traditional classification models, which suffer from low interpretability and limited generalization capabilities, or craft specific prompts for large language models (LLMs) to produce explanations and results directly, failing to…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: SIGIR 2025

  9. arXiv:2505.08367  [pdf, ps, other]

    cs.RO

    MA-ROESL: Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos

    Authors: Xianghui Wang, Xinming Zhang, Yanjun Chen, Xiaoyu Shen, Wei Zhang

    Abstract: Vision-language models (VLMs) have demonstrated excellent high-level planning capabilities, enabling locomotion skill learning from video demonstrations without the need for meticulous human-level reward design. However, the improper frame sampling method and low training efficiency of current methods remain a critical bottleneck, resulting in substantial computational overhead and time costs. To…

    Submitted 13 May, 2025; originally announced May 2025.

  10. arXiv:2505.08341  [pdf, ps, other]

    cs.AI cs.MA q-bio.GN

    Benchmarking AI scientists in omics data-driven biological research

    Authors: Erpai Luo, Jinmeng Jia, Yifan Xiong, Xiangyu Li, Xiaobo Guo, Baoqi Yu, Lei Wei, Xuegong Zhang

    Abstract: The rise of large language models and multi-agent systems has sparked growing interest in AI scientists capable of autonomous biological research. However, existing benchmarks either focus on reasoning without data or on data analysis with predefined statistical answers, lacking realistic, data-driven evaluation settings. Here, we introduce the Biological AI Scientist Benchmark (BaisBench), a benc…

    Submitted 13 May, 2025; originally announced May 2025.

  11. arXiv:2505.08298  [pdf, ps, other]

    cs.IT

    On Analysis of Superimposed Pilot in Multi-User Massive MIMO with Massive Connectivity

    Authors: Shuxiao Ye, Xianchao Zhang, Neng Ye

    Abstract: The simultaneous transmission of numerous users presents substantial challenges due to the inherent trade-off between channel estimation and information transmission in multi-user multiple-input multiple-output (MIMO) systems. In this paper, we explore the use of the superimposed pilot (SP) scheme to tackle the large number of transmitting users, where the number of users may exceed the coherent time. SP sch…

    Submitted 13 May, 2025; originally announced May 2025.

  12. arXiv:2505.08292  [pdf, ps, other]

    cs.CR

    On the Account Security Risks Posed by Password Strength Meters

    Authors: Ming Xu, Weili Han, Jitao Yu, Jing Liu, Xinyi Zhang, Yun Lin, Jin Song Dong

    Abstract: Password strength meters (PSMs) have been widely used by websites to gauge password strength, encouraging users to create stronger passwords. Popular data-driven PSMs, e.g., based on Markov, Probabilistic Context-free Grammar (PCFG) and neural networks, alarm strength based on a model learned from real passwords. Despite their proven effectiveness, the secure utility that arises from the leakage o…

    Submitted 13 May, 2025; originally announced May 2025.

  13. arXiv:2505.08281  [pdf, ps, other]

    cs.CV eess.IV

    Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion

    Authors: Anle Ke, Xu Zhang, Tong Chen, Ming Lu, Chao Zhou, Jiawen Gu, Zhan Ma

    Abstract: Existing multimodal large model-based image compression frameworks often rely on a fragmented integration of semantic retrieval, latent compression, and generative models, resulting in suboptimal performance in both reconstruction fidelity and coding efficiency. To address these challenges, we propose a residual-guided ultra lowrate image compression named ResULIC, which incorporates residual sign…

    Submitted 13 May, 2025; originally announced May 2025.

    Journal ref: ICML 2025

  14. arXiv:2505.08245  [pdf, ps, other]

    cs.CL cs.AI cs.HC

    Large Language Model Psychometrics: A Systematic Review of Evaluation, Validation, and Enhancement

    Authors: Haoran Ye, Jing Jin, Yuhang Xie, Xin Zhang, Guojie Song

    Abstract: The rapid advancement of large language models (LLMs) has outpaced traditional evaluation methodologies. It presents novel challenges, such as measuring human-like psychological constructs, navigating beyond static and task-specific benchmarks, and establishing human-centered evaluation. These challenges intersect with Psychometrics, the science of quantifying the intangible aspects of human psych…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 63 pages, 482 references

  15. arXiv:2505.07961  [pdf, ps, other]

    cs.LG

    Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement

    Authors: Xuechen Zhang, Zijian Huang, Chenshun Ni, Ziyang Xiong, Jiasi Chen, Samet Oymak

    Abstract: Recent research enhances language model reasoning by scaling test-time compute via longer chain-of-thought traces. This often improves accuracy but also introduces redundancy and high computational cost, especially for small language models distilled with supervised fine-tuning (SFT). In this work, we propose new algorithms to improve token-efficient reasoning with small-scale models by effectivel…

    Submitted 13 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  16. arXiv:2505.07889  [pdf, ps, other]

    cs.CL

    BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning

    Authors: Yuyang Liu, Liuzhenghao Lv, Xiancheng Zhang, Li Yuan, Yonghong Tian

    Abstract: Biological protocols are fundamental to reproducible and safe life science research. While LLMs excel on general tasks, their systematic evaluation on these highly specialized, accuracy-critical, and inherently procedural texts remains limited. In this work, we present BioProBench, the first large-scale, integrated multi-task benchmark for biological protocol understanding and reasoning. While lim…

    Submitted 11 May, 2025; originally announced May 2025.

  17. arXiv:2505.07863  [pdf, ps, other]

    cs.CL

    QoSBERT: An Uncertainty-Aware Approach based on Pre-trained Language Models for Service Quality Prediction

    Authors: Ziliang Wang, Xiaohong Zhang, Ze Shi Li, Meng Yan

    Abstract: Accurate prediction of Quality of Service (QoS) metrics is fundamental for selecting and managing cloud-based services. Traditional QoS models rely on manual feature engineering and yield only point estimates, offering no insight into the confidence of their predictions. In this paper, we propose QoSBERT, the first framework that reformulates QoS prediction as a semantic regression task based on p…

    Submitted 8 May, 2025; originally announced May 2025.

  18. arXiv:2505.07747  [pdf, other]

    cs.CV

    Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets

    Authors: Weiyu Li, Xuanyang Zhang, Zheng Sun, Di Qi, Hao Li, Wei Cheng, Weiwei Cai, Shihao Wu, Jiarui Liu, Zihao Wang, Xiao Chen, Feipeng Tian, Jianxiong Pan, Zeming Li, Gang Yu, Xiangyu Zhang, Daxin Jiang, Ping Tan

    Abstract: While generative artificial intelligence has advanced significantly across text, image, audio, and video domains, 3D generation remains comparatively underdeveloped due to fundamental challenges such as data scarcity, algorithmic limitations, and ecosystem fragmentation. To this end, we present Step1X-3D, an open framework addressing these challenges through: (1) a rigorous data curation pipeline…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Technical report

  19. arXiv:2505.07608  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

    Authors: Xiaomi LLM-Core Team: Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, et al. (40 additional authors not shown)

    Abstract: We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with additional Multi-Token Prediction objective…

    Submitted 12 May, 2025; originally announced May 2025.

  20. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  21. arXiv:2505.06923  [pdf, ps, other]

    cs.RO

    YOPOv2-Tracker: An End-to-End Agile Tracking and Navigation Framework from Perception to Action

    Authors: Junjie Lu, Yulin Hui, Xuewei Zhang, Wencan Feng, Hongming Shen, Zhiyu Li, Bailing Tian

    Abstract: Traditional target tracking pipelines including detection, mapping, navigation, and control are comprehensive but introduce high latency, limiting the agility of quadrotors. On the contrary, we follow the design principle of "less is more", striving to simplify the process while maintaining effectiveness. In this work, we propose an end-to-end agile tracking and navigation framework for quadrotor…

    Submitted 11 May, 2025; originally announced May 2025.

  22. arXiv:2505.06896  [pdf, ps, other]

    cs.DC stat.CO

    RCOMPSs: A Scalable Runtime System for R Code Execution on Manycore Systems

    Authors: Xiran Zhang, Javier Conejero, Sameh Abdulah, Jorge Ejarque, Ying Sun, Rosa M. Badia, David E. Keyes, Marc G. Genton

    Abstract: R has become a cornerstone of scientific and statistical computing due to its extensive package ecosystem, expressive syntax, and strong support for reproducible analysis. However, as data sizes and computational demands grow, native R parallelism support remains limited. This paper presents RCOMPSs, a scalable runtime system that enables efficient parallel execution of R applications on multicore…

    Submitted 11 May, 2025; originally announced May 2025.

  23. arXiv:2505.06766  [pdf, other]

    cs.SD eess.AS eess.SP

    Beyond Identity: A Generalizable Approach for Deepfake Audio Detection

    Authors: Yasaman Ahmadiadli, Xiao-Ping Zhang, Naimul Khan

    Abstract: Deepfake audio presents a growing threat to digital security, due to its potential for social engineering, fraud, and identity misuse. However, existing detection models suffer from poor generalization across datasets, due to implicit identity leakage, where models inadvertently learn speaker-specific features instead of manipulation artifacts. To the best of our knowledge, this is the first study…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: Submitted to IEEE Transactions on Biometrics, Behavior, and Identity Science (T-BIOM)

  24. arXiv:2505.06665  [pdf, other]

    cs.CV

    MultiTaskVIF: Segmentation-oriented visible and infrared image fusion via multi-task learning

    Authors: Zixian Zhao, Andrew Howes, Xingchen Zhang

    Abstract: Visible and infrared image fusion (VIF) has attracted significant attention in recent years. Traditional VIF methods primarily focus on generating fused images with high visual quality, while recent advancements increasingly emphasize incorporating semantic information into the fusion model during training. However, most existing segmentation-oriented VIF methods adopt a cascade structure comprisi…

    Submitted 10 May, 2025; originally announced May 2025.

  25. arXiv:2505.06302  [pdf, other]

    cs.LG cs.AI

    QiMeng-TensorOp: Automatically Generating High-Performance Tensor Operators with Hardware Primitives

    Authors: Xuzhi Zhang, Shaohui Peng, Qirui Zhou, Yuanbo Wen, Qi Guo, Ruizhi Chen, Xinguo Zhu, Weiqiang Xiong, Haixin Chen, Congying Ma, Ke Gao, Chen Zhao, Yanjun Wu, Yunji Chen, Ling Li

    Abstract: Computation-intensive tensor operators constitute over 90% of the computations in Large Language Models (LLMs) and Deep Neural Networks. Automatically and efficiently generating high-performance tensor operators with hardware primitives is crucial for diverse and ever-evolving hardware architectures like RISC-V, ARM, and GPUs, as manually optimized implementation takes at least months and lacks po…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures

    ACM Class: I.2.2

  26. arXiv:2505.06269  [pdf]

    cs.LG

    A machine learning model for skillful climate system prediction

    Authors: Chenguang Zhou, Lei Chen, Xiaohui Zhong, Bo Lu, Hao Li, Libo Wu, Jie Wu, Jiahui Hu, Zesheng Dou, Pang-Chi Hsu, Xiaoye Zhang

    Abstract: Climate system models (CSMs), through integrating cross-sphere interactions among the atmosphere, ocean, land, and cryosphere, have emerged as pivotal tools for deciphering climate dynamics and improving forecasting capabilities. Recent breakthroughs in artificial intelligence (AI)-driven meteorological modeling have demonstrated remarkable success in single-sphere systems and partially spheres co…

    Submitted 5 May, 2025; originally announced May 2025.

  27. arXiv:2505.06133  [pdf, ps, other]

    cs.CV

    BrainSegDMlF: A Dynamic Fusion-enhanced SAM for Brain Lesion Segmentation

    Authors: Hongming Wang, Yifeng Wu, Huimin Huang, Hongtao Wu, Jia-Xuan Jiang, Xiaodong Zhang, Hao Zheng, Xian Wu, Yefeng Zheng, Jinping Xu, Jing Cheng

    Abstract: The segmentation of substantial brain lesions is a significant and challenging task in the field of medical image segmentation. Substantial brain lesions in brain imaging exhibit high heterogeneity, with indistinct boundaries between lesion regions and normal brain tissue. Small lesions in single slices are difficult to identify, making the accurate and reproducible segmentation of abnormal region…

    Submitted 9 May, 2025; originally announced May 2025.

  28. arXiv:2505.05840  [pdf, ps, other]

    cs.RO eess.SY

    Versatile Distributed Maneuvering with Generalized Formations using Guiding Vector Fields

    Authors: Yang Lu, Sha Luo, Pengming Zhu, Weijia Yao, Hector Garcia de Marina, Xinglong Zhang, Xin Xu

    Abstract: This paper presents a unified approach to realize versatile distributed maneuvering with generalized formations. Specifically, we decompose the robots' maneuvers into two independent components, i.e., interception and enclosing, which are parameterized by two independent virtual coordinates. Treating these two virtual coordinates as dimensions of an abstract manifold, we derive the corresponding s…

    Submitted 9 May, 2025; originally announced May 2025.

  29. arXiv:2505.05799  [pdf, ps, other]

    cs.LG cs.AI

    MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design

    Authors: Haojie Duanmu, Xiuhong Li, Zhihang Yuan, Size Zheng, Jiangfei Duan, Xingcheng Zhang, Dahua Lin

    Abstract: Mixture-of-Experts (MoE) models face deployment challenges due to their large parameter counts and computational demands. We explore quantization for MoE models and highlight two key insights: 1) linear blocks exhibit varying quantization sensitivity, and 2) divergent expert activation frequencies create heterogeneous computational characteristics. Based on these observations, we introduce MxMoE,…

    Submitted 9 May, 2025; originally announced May 2025.

  30. Smart Starts: Accelerating Convergence through Uncommon Region Exploration

    Authors: Xinyu Zhang, Mário Antunes, Tyler Estro, Erez Zadok, Klaus Mueller

    Abstract: Initialization profoundly affects evolutionary algorithm (EA) efficacy by dictating search trajectories and convergence. This study introduces a hybrid initialization strategy combining empty-space search algorithm (ESA) and opposition-based learning (OBL). OBL initially generates a diverse population, subsequently augmented by ESA, which identifies under-explored regions. This synergy enhances po…

    Submitted 8 May, 2025; originally announced May 2025.

  31. arXiv:2505.05504  [pdf, other]

    eess.IV cs.CV

    Image Restoration via Multi-domain Learning

    Authors: Xingyu Jiang, Ning Gao, Xiuhui Zhang, Hongkun Dou, Shaowen Fu, Xiaoqing Zhong, Hongjue Li, Yue Deng

    Abstract: Due to adverse atmospheric and imaging conditions, natural images suffer from various degradation phenomena. Consequently, image restoration has emerged as a key solution and garnered substantial attention. Although recent Transformer architectures have demonstrated impressive success across various restoration tasks, their considerable model complexity poses significant challenges for both traini…

    Submitted 7 May, 2025; originally announced May 2025.

  32. arXiv:2505.05472  [pdf, other]

    cs.CV

    Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation

    Authors: Chao Liao, Liyang Liu, Xun Wang, Zhengxiong Luo, Xinyu Zhang, Wenliang Zhao, Jie Wu, Liang Li, Zhi Tian, Weilin Huang

    Abstract: Recent progress in unified models for image understanding and generation has been impressive, yet most approaches remain limited to single-modal generation conditioned on multiple modalities. In this paper, we present Mogao, a unified framework that advances this paradigm by enabling interleaved multi-modal generation through a causal approach. Mogao integrates a set of key technical improvements…

    Submitted 11 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: Mogao Technical Report

  33. arXiv:2505.05317  [pdf, ps, other]

    cs.RO

    CottonSim: Development of an autonomous visual-guided robotic cotton-picking system in the Gazebo

    Authors: Thevathayarajh Thayananthan, Xin Zhang, Yanbo Huang, Jingdao Chen, Nuwan K. Wijewardane, Vitor S. Martins, Gary D. Chesser, Christopher T. Goodin

    Abstract: In this study, an autonomous visual-guided robotic cotton-picking system, built on Clearpath's Husky robot platform and the Cotton-Eye perception system, was developed in the Gazebo robotic simulator. Furthermore, a virtual cotton farm was designed and developed as a Robot Operating System (ROS 1) package to deploy the robotic cotton picker in the Gazebo environment for simulating autonomous fie…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 45 pages, 15 figures, 4 tables

  34. arXiv:2505.05283  [pdf, ps, other]

    cs.SE cs.AI

    Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents

    Authors: Kaixin Wang, Tianlin Li, Xiaoyu Zhang, Chong Wang, Weisong Sun, Yang Liu, Bin Shi

    Abstract: Code large language models (CodeLLMs) and agents have shown great promise in tackling complex software engineering tasks. Compared to traditional software engineering methods, CodeLLMs and agents offer stronger abilities, and can flexibly process inputs and outputs in both natural language and code. Benchmarking plays a crucial role in evaluating the capabilities of CodeLLMs and agents, guiding their develo…

    Submitted 8 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

  35. arXiv:2505.05240  [pdf, other]

    cs.CV

    PADriver: Towards Personalized Autonomous Driving

    Authors: Genghua Kou, Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Ziheng Zhang, Osamu Yoshie, Tiancai Wang, Ying Li, Xiangyu Zhang

    Abstract: In this paper, we propose PADriver, a novel closed-loop framework for personalized autonomous driving (PAD). Built upon a Multi-modal Large Language Model (MLLM), PADriver takes streaming frames and personalized textual prompts as inputs. It autoregressively performs scene understanding, danger level estimation and action decision. The predicted danger level reflects the risk of the potential action…

    Submitted 8 May, 2025; originally announced May 2025.

  36. arXiv:2505.05185  [pdf, other]

    cs.DS

    Efficient Parallel Ising Samplers via Localization Schemes

    Authors: Xiaoyu Chen, Hongyang Liu, Yitong Yin, Xinyuan Zhang

    Abstract: We introduce efficient parallel algorithms for sampling from the Gibbs distribution and estimating the partition function of Ising models. These algorithms achieve parallel efficiency, with polylogarithmic depth and polynomial total work, and are applicable to Ising models in the following regimes: (1) Ferromagnetic Ising models with external fields; (2) Ising models with interaction matrix $J$ of…

    Submitted 8 May, 2025; originally announced May 2025.

  37. arXiv:2505.05114  [pdf, other]

    eess.AS cs.SD

    Listen to Extract: Onset-Prompted Target Speaker Extraction

    Authors: Pengjie Shen, Kangrui Chen, Shulin He, Pengru Chen, Shuqi Yuan, He Kong, Xueliang Zhang, Zhong-Qiu Wang

    Abstract: We propose $\textit{listen to extract}$ (LExt), a highly effective yet extremely simple algorithm for monaural target speaker extraction (TSE). Given an enrollment utterance of a target speaker, LExt aims at extracting the target speaker from the speaker's mixed speech with other speakers. For each mixture, LExt concatenates an enrollment utterance of the target speaker to the mixture signal at…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: in submission

  38. arXiv:2505.05017  [pdf, other]

    cs.CL

    Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization

    Authors: Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Jiang Zong, Hao Peng, Jianwei Yin

    Abstract: Pre-trained large language models (LLMs) are commonly fine-tuned to adapt to downstream tasks. Since the majority of knowledge is acquired during pre-training, attributing the predictions of fine-tuned LLMs to their pre-training data may provide valuable insights. Influence functions have been proposed as a means to explain model predictions based on training data. However, existing approaches fai…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 9 pages, accepted by IJCAI 2025

  39. arXiv:2505.04960  [pdf, other]

    cs.IR cs.MM

    Learning Item Representations Directly from Multimodal Features for Effective Recommendation

    Authors: Xin Zhou, Xiaoxiong Zhang, Dusit Niyato, Zhiqi Shen

    Abstract: Conventional multimodal recommender systems predominantly leverage Bayesian Personalized Ranking (BPR) optimization to learn item representations by amalgamating item identity (ID) embeddings with multimodal features. Nevertheless, our empirical and theoretical findings unequivocally demonstrate a pronounced optimization gradient bias in favor of acquiring representations from multimodal features…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Code: https://github.com/enoche/LIRDRec

  40. arXiv:2505.04921  [pdf, other]

    cs.CV cs.CL

    Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

    Authors: Yunxin Li, Zhenyu Liu, Zitao Li, Xuanyu Zhang, Zhenran Xu, Xinyu Chen, Haoyuan Shi, Shenyuan Jiang, Xintong Wang, Jifang Wang, Shouzheng Huang, Xinping Zhao, Borui Jiang, Lanqing Hong, Longyue Wang, Zhuotao Tian, Baoxing Huai, Wenhan Luo, Weihua Luo, Zheng Zhang, Baotian Hu, Min Zhang

    Abstract: Reasoning lies at the heart of intelligence, shaping the ability to make decisions, draw conclusions, and generalize across domains. In artificial intelligence, as systems increasingly operate in open, uncertain, and multimodal environments, reasoning becomes essential for enabling robust and adaptive behavior. Large Multimodal Reasoning Models (LMRMs) have emerged as a promising paradigm, integra…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 75 pages, 10 figures; Project: https://github.com/HITsz-TMG/Awesome-Large-Multimodal-Reasoning-Models

  41. arXiv:2505.04852  [pdf, other]

    cs.SE cs.AI cs.PL

    PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust

    Authors: Yifei Gao, Chengpeng Wang, Pengxiang Huang, Xuwei Liu, Mingwei Zheng, Xiangyu Zhang

    Abstract: There has been a growing interest in translating C code to Rust due to Rust's robust memory and thread safety guarantees. Tools such as C2RUST enable syntax-guided transpilation from C to semantically equivalent Rust code. However, the resulting Rust programs often rely heavily on unsafe constructs--particularly raw pointers--which undermines Rust's safety guarantees. This paper aims to improve th…

    Submitted 9 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  42. arXiv:2505.04718  [pdf, other]

    cs.CV cs.LG

    Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers

    Authors: Divyansh Srivastava, Xiang Zhang, He Wen, Chenru Wen, Zhuowen Tu

    Abstract: We present Lay-Your-Scene (shorthand LayouSyn), a novel text-to-layout generation pipeline for natural scenes. Prior scene layout generation methods are either closed-vocabulary or use proprietary large language models for open-vocabulary generation, limiting their modeling capabilities and broader applicability in controllable image generation. In this work, we propose to use lightweight open-sou…

    Submitted 7 May, 2025; originally announced May 2025.

  43. arXiv:2505.04616  [pdf, other]

    cs.CV

    Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait

    Authors: Feng Liu, Nicholas Chimitt, Lanqing Guo, Jitesh Jain, Aditya Kane, Minchul Kim, Wes Robbins, Yiyang Su, Dingqiang Ye, Xingguang Zhang, Jie Zhu, Siddharth Satyakam, Christopher Perry, Stanley H. Chan, Arun Ross, Humphrey Shi, Zhangyang Wang, Anil Jain, Xiaoming Liu

    Abstract: We address the problem of whole-body person recognition in unconstrained environments. This problem arises in surveillance scenarios such as those in the IARPA Biometric Recognition and Identification at Altitude and Range (BRIAR) program, where biometric data is captured at long standoff distances, elevated viewing angles, and under adverse atmospheric conditions (e.g., turbulence and high wind v…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 18 pages, 12 figures

  44. arXiv:2505.04369  [pdf, other]

    cs.CV

    WDMamba: When Wavelet Degradation Prior Meets Vision Mamba for Image Dehazing

    Authors: Jie Sun, Heng Liu, Yongzhen Wang, Xiao-Ping Zhang, Mingqiang Wei

    Abstract: In this paper, we reveal a novel haze-specific wavelet degradation prior observed through wavelet transform analysis, which shows that haze-related information predominantly resides in low-frequency components. Exploiting this insight, we propose a novel dehazing framework, WDMamba, which decomposes the image dehazing task into two sequential stages: low-frequency restoration followed by detail en…

    Submitted 7 May, 2025; originally announced May 2025.

  45. arXiv:2505.04172  [pdf, other]

    eess.IV cs.HC physics.med-ph

    A Dataset and Toolkit for Multiparameter Cardiovascular Physiology Sensing on Rings

    Authors: Jiankai Tang, Kegang Wang, Yingke Ding, Jiatong Ji, Zeyu Wang, Xiyuxing Zhang, Ping Chen, Yuanchun Shi, Yuntao Wang

    Abstract: Smart rings offer a convenient way to continuously and unobtrusively monitor cardiovascular physiological signals. However, a gap remains between the ring hardware and reliable methods for estimating cardiovascular parameters, partly due to the lack of publicly available datasets and standardized analysis tools. In this work, we present $τ$-Ring, the first open-source ring-based dataset designed f…

    Submitted 8 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  46. arXiv:2505.04123  [pdf, other]

    cs.CR

    A Framework to Prevent Biometric Data Leakage in the Immersive Technologies Domain

    Authors: Keshav Sood, Iynkaran Natgunanathan, Uthayasanker Thayasivam, Vithurabiman Senthuran, Xiaoning Zhang, Shui Yu

    Abstract: Doubtlessly, the immersive technologies have potential to ease people's life and uplift economy, however the obvious data privacy risks cannot be ignored. For example, a participant wears a 3D headset device which detects participant's head motion to track the pose of participant's head to match the orientation of camera with participant's eyes positions in the real-world. In a preliminary study,…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 11 pages, 6 figures

  47. arXiv:2505.04116  [pdf, other]

    cs.MM

    RFNNS: Robust Fixed Neural Network Steganography with Popular Deep Generative Models

    Authors: Yu Cheng, Jiuan Zhou, Jiawei Chen, Zhaoxia Yin, Xinpeng Zhang

    Abstract: Image steganography is a technique that conceals secret information in a cover image to achieve covert communication. Recent research has demonstrated that Fixed Neural Network Steganography (FNNS) exhibits significant practical advantages, as it enables stable and efficient steganographic embedding and extraction without requiring neural network training. However, the stego image generated by exi…

    Submitted 7 May, 2025; originally announced May 2025.

  48. arXiv:2505.04113  [pdf, ps, other]

    cs.SD eess.AS

    Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment

    Authors: Xueyao Zhang, Yuancheng Wang, Chaoren Wang, Ziniu Li, Zhuo Chen, Zhizheng Wu

    Abstract: Modern zero-shot text-to-speech (TTS) systems, despite using extensive pre-training, often struggle in challenging scenarios such as tongue twisters, repeated words, code-switching, and cross-lingual synthesis, leading to intelligibility issues. To address these limitations, this paper leverages preference alignment techniques, which enable targeted construction of out-of-pretraining-distribution…

    Submitted 7 May, 2025; originally announced May 2025.

  49. arXiv:2505.04068  [pdf, other]

    cs.NI eess.SP

    Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications

    Authors: Yuanai Xie, Zhaozhi Liu, Xiao Zhang, Shihua Zhang, Rui Hou, Minrui Xu, Ruichen Zhang, Dusit Niyato

    Abstract: Covert Communications (CC) can secure sensitive transmissions in industrial, military, and mission-critical applications within 6G wireless networks. However, traditional optimization methods based on Artificial Noise (AN), power control, and channel manipulation might not adapt to dynamic and adversarial environments due to the high dimensionality, nonlinearity, and stringent real-time covertness…

    Submitted 6 May, 2025; originally announced May 2025.

  50. arXiv:2505.03896  [pdf, other]

    cs.CV cs.AI

    Novel Extraction of Discriminative Fine-Grained Feature to Improve Retinal Vessel Segmentation

    Authors: Shuang Zeng, Chee Hong Lee, Micky C Nnamdi, Wenqi Shi, J Ben Tamo, Lei Zhu, Hangzhou He, Xinliang Zhang, Qian Chen, May D. Wang, Yanye Lu, Qiushi Ren

    Abstract: Retinal vessel segmentation is a vital early detection method for several severe ocular diseases. Despite significant progress in retinal vessel segmentation with the advancement of Neural Networks, there are still challenges to overcome. Specifically, retinal vessel segmentation aims to predict the class label for every pixel within a fundus image, with a primary focus on intra-image discriminati…

    Submitted 6 May, 2025; originally announced May 2025.