Showing 1–50 of 546 results for author: Guo, W

Searching in archive cs.
  1. arXiv:2507.06116  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis

    Authors: Xintong Hu, Yixuan Chen, Rui Yang, Wenxiang Guo, Changhao Pan

    Abstract: Automatic speech quality assessment plays a crucial role in the development of speech synthesis systems, but existing models exhibit significant performance variations across different granularity levels of prediction tasks. This paper proposes an enhanced MOS prediction system based on self-supervised learning speech models, incorporating a Mixture of Experts (MoE) classification head and utilizi… ▽ More

    Submitted 8 July, 2025; originally announced July 2025.

  2. arXiv:2507.05653  [pdf, ps, other]

    cs.DC

    Archetype-Aware Predictive Autoscaling with Uncertainty Quantification for Serverless Workloads on Kubernetes

    Authors: Guilin Zhang, Srinivas Vippagunta, Raghavendra Nandagopal, Suchitra Raman, Jeff Xu, Marcus Pfeiffer, Shree Chatterjee, Ziqi Tan, Wulan Guo, Hailong Jiang

    Abstract: High-performance extreme computing (HPEC) platforms increasingly adopt serverless paradigms, yet face challenges in efficiently managing highly dynamic workloads while maintaining service-level objectives (SLOs). We propose AAPA, an archetype-aware predictive autoscaling system that leverages weak supervision to automatically classify 300,000+ workload windows into four archetypes (PERIODIC… ▽ More

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: 6 pages, 4 figures, 1 table. First three authors contributed equally. Correspondence to Hailong Jiang

  3. arXiv:2507.04756  [pdf, ps, other]

    cs.CL cs.AI

    CoSteer: Collaborative Decoding-Time Personalization via Local Delta Steering

    Authors: Hang Lv, Sheng Liang, Hao Wang, Hongchao Gu, Yaxiong Wu, Wei Guo, Defu Lian, Yong Liu, Enhong Chen

    Abstract: Personalized text generation has become crucial for adapting language models to diverse and evolving users' personal context across cultural, temporal, and contextual dimensions. While existing methods often rely on centralized fine-tuning or static preference alignment, they struggle to achieve real-time adaptation under resource constraints inherent to personal devices. This limitation creates a… ▽ More

    Submitted 7 July, 2025; originally announced July 2025.

  4. arXiv:2507.04233  [pdf, ps, other]

    eess.IV cs.CV

    Grid-Reg: Grid-Based SAR and Optical Image Registration Across Platforms

    Authors: Xiaochen Wei, Weiwei Guo, Zenghui Zhang, Wenxian Yu

    Abstract: Registering airborne SAR with spaceborne optical images is crucial for SAR image interpretation and geo-localization. It is challenging for this cross-platform heterogeneous image registration due to significant geometric and radiation differences, which current methods fail to handle. To tackle these challenges, we propose a novel grid-based multimodal registration framework (Grid-Reg) across air… ▽ More

    Submitted 5 July, 2025; originally announced July 2025.

  5. arXiv:2507.02456  [pdf, ps, other]

    cs.AR

    System-performance and cost modeling of Large Language Model training and inference

    Authors: Wenzhe Guo, Joyjit Kundu, Uras Tos, Weijiang Kong, Giuliano Sisto, Timon Evenblij, Manu Perumkunnil

    Abstract: Large language models (LLMs), based on transformer architectures, have revolutionized numerous domains within artificial intelligence, science, and engineering due to their exceptional scalability and adaptability. However, the exponential growth in LLM size and complexity has outpaced advancements in compute capacity, memory bandwidth, network performance, and cost efficiency, posing significant… ▽ More

    Submitted 3 July, 2025; originally announced July 2025.

  6. arXiv:2507.02029  [pdf, ps, other]

    cs.RO

    RoboBrain 2.0 Technical Report

    Authors: BAAI RoboBrain Team, Mingyu Cao, Huajie Tan, Yuheng Ji, Minglan Lin, Zhiyu Li, Zhou Cao, Pengwei Wang, Enshen Zhou, Yi Han, Yingbo Tang, Xiangqi Xu, Wei Guo, Yaoxu Lyu, Yijie Xu, Jiayu Shi, Mengfei Du, Cheng Chi, Mengdi Zhao, Xiaoshuai Hao, Junkai Zhao, Xiaojie Zhang, Shanyu Rong, Huaihai Lyu, Zhengliang Cai, et al. (26 additional authors not shown)

    Abstract: We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments. It comes in two variants: a lightweight 7B model and a full-scale 32B model, featuring a heterogeneous architecture with a vision encoder and a language model. Despite its compact size, RoboBrain… ▽ More

    Submitted 5 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

  7. arXiv:2507.00485  [pdf, ps, other]

    cs.LG cs.AI

    PNAct: Crafting Backdoor Attacks in Safe Reinforcement Learning

    Authors: Weiran Guo, Guanjun Liu, Ziyuan Zhou, Ling Wang

    Abstract: Reinforcement Learning (RL) is widely used in tasks where agents interact with an environment to maximize rewards. Building on this foundation, Safe Reinforcement Learning (Safe RL) incorporates a cost metric alongside the reward metric, ensuring that agents adhere to safety constraints during decision-making. In this paper, we identify that Safe RL is vulnerable to backdoor attacks, which can man… ▽ More

    Submitted 1 July, 2025; originally announced July 2025.

  8. arXiv:2506.23485  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent

    Authors: Haocheng Yu, Yaxiong Wu, Hao Wang, Wei Guo, Yong Liu, Yawen Li, Yuyang Ye, Junping Du, Enhong Chen

    Abstract: Interactive recommendation is a typical information-seeking task that allows users to interactively express their needs through natural language and obtain personalized recommendations. Large language model-powered (LLM-powered) agents have become a new paradigm in interactive recommendations, effectively capturing users' real-time needs and enhancing personalized experiences. However, due to limi… ▽ More

    Submitted 29 June, 2025; originally announced June 2025.

  9. arXiv:2506.21142  [pdf, ps, other]

    cs.LG

    Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks

    Authors: Deepak Kumar Panda, Weisi Guo

    Abstract: The growing integration of UAVs into civilian airspace underscores the need for resilient and intelligent intrusion detection systems (IDS), as traditional anomaly detection methods often fail to identify novel threats. A common approach treats unfamiliar attacks as out-of-distribution (OOD) samples; however, this leaves systems vulnerable when mitigation is inadequate. Moreover, conventional OOD… ▽ More

    Submitted 26 June, 2025; originally announced June 2025.

  10. arXiv:2506.21129  [pdf, ps, other]

    cs.LG cs.AI

    Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks

    Authors: Deepak Kumar Panda, Adolfo Perrusquia, Weisi Guo

    Abstract: Reinforcement learning (RL) policies deployed in safety-critical systems, such as unmanned aerial vehicle (UAV) navigation in dynamic airspace, are vulnerable to out-of-distribution (OOD) adversarial attacks in the observation space. These attacks induce distributional shifts that significantly degrade value estimation, leading to unsafe or suboptimal decision making, rendering the existing policy f… ▽ More

    Submitted 26 June, 2025; originally announced June 2025.

  11. arXiv:2506.21127  [pdf, ps, other]

    cs.LG cs.AI

    Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments

    Authors: Deepak Kumar Panda, Weisi Guo

    Abstract: The increasing automation of navigation for unmanned aerial vehicles (UAVs) has exposed them to adversarial attacks that exploit vulnerabilities in reinforcement learning (RL) through sensor manipulation. Although existing robust RL methods aim to mitigate such threats, their effectiveness has limited generalization to out-of-distribution shifts from the optimal value distribution, as they are pri… ▽ More

    Submitted 26 June, 2025; originally announced June 2025.

  12. arXiv:2506.20294  [pdf, ps, other]

    cs.CV

    Ctrl-Z Sampling: Diffusion Sampling with Controlled Random Zigzag Explorations

    Authors: Shunqi Mao, Wei Guo, Chaoyi Zhang, Weidong Cai

    Abstract: Diffusion models have shown strong performance in conditional generation by progressively denoising Gaussian noise toward a target data distribution. This denoising process can be interpreted as a form of hill climbing in a learned latent space, where the model iteratively refines the sample toward regions of higher probability. However, diffusion models often converge to local optima that are loc… ▽ More

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 10 pages, 3 figures, 2 tables

  13. arXiv:2506.17578  [pdf, ps, other]

    cs.CL

    AgriCHN: A Comprehensive Cross-domain Resource for Chinese Agricultural Named Entity Recognition

    Authors: Lingxiao Zeng, Yiqi Tong, Wei Guo, Huarui Wu, Lihao Ge, Yijun Ye, Fuzhen Zhuang, Deqing Wang, Wei Guo, Cheng Chen

    Abstract: Agricultural named entity recognition is a specialized task focusing on identifying distinct agricultural entities within vast bodies of text, including crops, diseases, pests, and fertilizers. It plays a crucial role in enhancing information extraction from extensive agricultural text resources. However, the scarcity of high-quality agricultural datasets, particularly in Chinese, has resulted in… ▽ More

    Submitted 21 June, 2025; originally announced June 2025.

  14. arXiv:2506.13552  [pdf, ps, other]

    cs.CV

    A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects

    Authors: Guohuan Xie, Syed Ariff Syed Hesham, Wenya Guo, Bing Li, Ming-Ming Cheng, Guolei Sun, Yun Liu

    Abstract: Video Scene Parsing (VSP) has emerged as a cornerstone in computer vision, facilitating the simultaneous segmentation, recognition, and tracking of diverse visual entities in dynamic scenes. In this survey, we present a holistic review of recent advances in VSP, covering a wide array of vision tasks, including Video Semantic Segmentation (VSS), Video Instance Segmentation (VIS), Video Panoptic Seg… ▽ More

    Submitted 16 June, 2025; originally announced June 2025.

  15. arXiv:2506.12401  [pdf, ps, other]

    cs.CV

    Feature Complementation Architecture for Visual Place Recognition

    Authors: Weiwei Wang, Meijia Wang, Haoyi Wang, Wenqiang Guo, Jiapan Guo, Changming Sun, Lingkun Ma, Weichuan Zhang

    Abstract: Visual place recognition (VPR) plays a crucial role in robotic localization and navigation. The key challenge lies in constructing feature representations that are robust to environmental changes. Existing methods typically adopt convolutional neural networks (CNNs) or vision Transformers (ViTs) as feature extractors. However, these architectures excel in different aspects -- CNNs are effective at… ▽ More

    Submitted 14 June, 2025; originally announced June 2025.

  16. arXiv:2506.10774  [pdf, ps, other]

    cs.CV cs.AI

    Stroke-based Cyclic Amplifier: Image Super-Resolution at Arbitrary Ultra-Large Scales

    Authors: Wenhao Guo, Peng Lu, Xujun Peng, Zhaoran Zhao, Sheng Li

    Abstract: Prior Arbitrary-Scale Image Super-Resolution (ASISR) methods often experience a significant performance decline when the upsampling factor exceeds the range covered by the training data, introducing substantial blurring. To address this issue, we propose a unified model, Stroke-based Cyclic Amplifier (SbCA), for ultra-large upsampling tasks. The key of SbCA is the stroke vector amplifier, which de… ▽ More

    Submitted 12 June, 2025; originally announced June 2025.

  17. arXiv:2506.09327  [pdf, ps, other]

    cs.CV

    MSSDF: Modality-Shared Self-supervised Distillation for High-Resolution Multi-modal Remote Sensing Image Learning

    Authors: Tong Wang, Guanzhou Chen, Xiaodong Zhang, Chenxi Liu, Jiaqi Wang, Xiaoliang Tan, Wenchao Guo, Qingyuan Yang, Kaiqi Zhang

    Abstract: Remote sensing image interpretation plays a critical role in environmental monitoring, urban planning, and disaster assessment. However, acquiring high-quality labeled data is often costly and time-consuming. To address this challenge, we propose a multi-modal self-supervised learning framework that leverages high-resolution RGB images, multi-spectral data, and digital surface models (DSM) for pr… ▽ More

    Submitted 10 June, 2025; originally announced June 2025.

  18. arXiv:2506.07915  [pdf, other]

    cs.AI cs.CL eess.SY

    LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement

    Authors: Dimitris Panagopoulos, Adolfo Perrusquia, Weisi Guo

    Abstract: In dynamic environments, the rapid obsolescence of pre-existing environmental knowledge creates a gap between an agent's internal model and the evolving reality of its operational context. This disparity between prior and updated environmental valuations fundamentally limits the effectiveness of autonomous decision-making. To bridge this gap, the contextual bias of human domain stakeholders, who n… ▽ More

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 12 pages, 4 Figures, 3 Tables, submitted to the IEEE for possible publication

  19. arXiv:2506.06786  [pdf, ps, other]

    cs.AI

    Learning What Matters Now: A Dual-Critic Context-Aware RL Framework for Priority-Driven Information Gain

    Authors: Dimitris Panagopoulos, Adolfo Perrusquia, Weisi Guo

    Abstract: Autonomous systems operating in high-stakes search-and-rescue (SAR) missions must continuously gather mission-critical information while flexibly adapting to shifting operational priorities. We propose CA-MIQ (Context-Aware Max-Information Q-learning), a lightweight dual-critic reinforcement learning (RL) framework that dynamically adjusts its exploration strategy whenever mission priorities chang… ▽ More

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: 6 pages, 2 figures, 3 tables, submitted as a regular paper to IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2025

  20. arXiv:2506.03576  [pdf, ps, other]

    cs.CL cs.AI

    KG-BiLM: Knowledge Graph Embedding via Bidirectional Language Models

    Authors: Zirui Chen, Xin Wang, Zhao Li, Wenbin Guo, Dongxiao He

    Abstract: Recent advances in knowledge representation learning (KRL) highlight the urgent necessity to unify symbolic knowledge graphs (KGs) with language models (LMs) for richer semantic understanding. However, existing approaches typically prioritize either graph structure or textual semantics, leaving a gap: a unified framework that simultaneously captures global KG connectivity, nuanced linguistic conte… ▽ More

    Submitted 4 June, 2025; originally announced June 2025.

  21. arXiv:2506.03337  [pdf, ps, other]

    cs.LG cs.AI

    Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity

    Authors: Yide Ran, Wentao Guo, Jingwei Sun, Yanzhou Pan, Xiaodong Yu, Hao Wang, Jianwen Xie, Yiran Chen, Denghui Zhang, Zhaozhuo Xu

    Abstract: Federated Learning enables collaborative fine-tuning of Large Language Models (LLMs) across decentralized Non-Independent and Identically Distributed (Non-IID) clients, but such models' massive parameter sizes lead to significant memory and communication challenges. This work introduces Meerkat, a sparse zeroth-order optimization (ZO) method designed for federated LLM fine-tuning. By limiting fine… ▽ More

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 56 pages, 11 figures

  22. arXiv:2506.02467  [pdf, other]

    eess.IV cs.CV

    Multi-modal brain MRI synthesis based on SwinUNETR

    Authors: Haowen Pang, Weiyan Guo, Chuyang Ye

    Abstract: Multi-modal brain magnetic resonance imaging (MRI) plays a crucial role in clinical diagnostics by providing complementary information across different imaging modalities. However, a common challenge in clinical practice is missing MRI modalities. In this paper, we apply SwinUNETR to the synthesis of missing modalities in brain MRI. SwinUNETR is a novel neural network architecture designed for me… ▽ More

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 9 pages, 5 figures

  23. arXiv:2506.00782  [pdf, other]

    cs.AI

    Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning

    Authors: Weiyang Guo, Zesheng Shi, Zhuo Li, Yequan Wang, Xuebo Liu, Wenya Wang, Fangming Liu, Min Zhang, Jing Li

    Abstract: As large language models (LLMs) grow in power and influence, ensuring their safety and preventing harmful output becomes critical. Automated red teaming serves as a tool to detect security vulnerabilities in LLMs without manual labor. However, most existing methods struggle to balance the effectiveness and diversity of red-team generated attack prompts. To address this challenge, we propose \ourap… ▽ More

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: 21 pages, 8 figures

  24. arXiv:2505.21411  [pdf, ps, other]

    cs.CL

    Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity

    Authors: Yehui Tang, Xiaosong Li, Fangcheng Liu, Wei Guo, Hang Zhou, Yaoyuan Wang, Kai Han, Xianzhi Yu, Jinpeng Li, Hui Zang, Fei Mi, Xiaojun Meng, Zhicheng Liu, Hanting Chen, Binfan Zheng, Can Chen, Youliang Yan, Ruiming Tang, Peifeng Qin, Xinghao Chen, Dacheng Tao, Yunhe Wang

    Abstract: The emergence of Mixture of Experts (MoE) in Large Language Models promises a small price of execution cost for a much larger model parameter count and learning capacity, because only a small fraction of parameters are activated for each input token. However, it is commonly observed that some experts are activated far more often than others, leading to system inefficiency when running the experts o… ▽ More

    Submitted 28 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  25. arXiv:2505.20925  [pdf, ps, other]

    cs.CL cs.AI

    Multi-objective Large Language Model Alignment with Hierarchical Experts

    Authors: Zhuo Li, Guodong Du, Weiyang Guo, Yigeng Zhou, Xiucheng Li, Wenya Wang, Fangming Liu, Yequan Wang, Deheng Ye, Min Zhang, Jing Li

    Abstract: Aligning large language models (LLMs) to simultaneously satisfy multiple objectives remains a significant challenge, especially given the diverse and often conflicting nature of human preferences. Existing alignment methods struggle to balance trade-offs effectively, often requiring costly retraining or yielding suboptimal results across the Pareto frontier of preferences. In this paper, we introd… ▽ More

    Submitted 27 May, 2025; originally announced May 2025.

  26. DLF: Enhancing Explicit-Implicit Interaction via Dynamic Low-Order-Aware Fusion for CTR Prediction

    Authors: Kefan Wang, Hao Wang, Wei Guo, Yong Liu, Jianghao Lin, Defu Lian, Enhong Chen

    Abstract: Click-through rate (CTR) prediction is a critical task in online advertising and recommender systems, relying on effective modeling of feature interactions. Explicit interactions capture predefined relationships, such as inner products, but often suffer from data sparsity, while implicit interactions excel at learning complex patterns through non-linear transformations but lack inductive biases fo… ▽ More

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25)

  27. arXiv:2505.18955  [pdf, ps, other]

    cs.AI cs.CR cs.SE

    Co-PatcheR: Collaborative Software Patching with Component(s)-specific Small Reasoning Models

    Authors: Yuheng Tang, Hongwei Li, Kaijie Zhu, Michael Yang, Yangruibo Ding, Wenbo Guo

    Abstract: Motivated by the success of general-purpose large language models (LLMs) in software patching, recent works started to train specialized patching models. Most works trained one model to handle the end-to-end patching pipeline (including issue localization, patch generation, and patch validation). However, it is hard for a small model to handle all tasks, as different sub-tasks have different workf… ▽ More

    Submitted 24 May, 2025; originally announced May 2025.

  28. arXiv:2505.18719  [pdf, ps, other]

    cs.RO cs.AI

    VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning

    Authors: Guanxing Lu, Wenkai Guo, Chubin Zhang, Yuheng Zhou, Haonan Jiang, Zifeng Gao, Yansong Tang, Ziwei Wang

    Abstract: Recent high-capacity vision-language-action (VLA) models have demonstrated impressive performance on a range of robotic manipulation tasks by imitating human demonstrations. However, exploiting offline data with limited visited states will cause execution failure in out-of-distribution scenarios. Intuitively, an exploration-based method that improves on online collected data at test time could add… ▽ More

    Submitted 24 May, 2025; originally announced May 2025.

  29. arXiv:2505.17147  [pdf, other]

    cs.CR cs.AI

    MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming

    Authors: Weiyang Guo, Jing Li, Wenya Wang, YU LI, Daojing He, Jun Yu, Min Zhang

    Abstract: The proliferation of jailbreak attacks against large language models (LLMs) highlights the need for robust security measures. However, in multi-round dialogues, malicious intentions may be hidden in interactions, leading LLMs to be more prone to produce harmful responses. In this paper, we propose the Multi-Turn Safety Alignment (MTSA) framework, to addr… ▽ More

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 19 pages, 6 figures, ACL 2025

  30. arXiv:2505.17011  [pdf, ps, other]

    cs.CV

    Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space

    Authors: Yan Li, Changyao Tian, Renqiu Xia, Ning Liao, Weiwei Guo, Junchi Yan, Hongsheng Li, Jifeng Dai, Hao Li, Xue Yang

    Abstract: We propose AdapTok, an adaptive temporal causal video tokenizer that can flexibly allocate tokens for different frames based on video content. AdapTok is equipped with a block-wise masking strategy that randomly drops tail tokens of each block during training, and a block causal scorer to predict the reconstruction quality of video frames using different numbers of tokens. During inference, an ada… ▽ More

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Code: https://github.com/VisionXLab/AdapTok

  31. arXiv:2505.14916  [pdf]

    eess.IV cs.CV

    Super-Resolution Optical Coherence Tomography Using Diffusion Model-Based Plug-and-Play Priors

    Authors: Yaning Wang, Jinglun Yu, Wenhan Guo, Yu Sun, Jin U. Kang

    Abstract: We propose an OCT super-resolution framework based on a plug-and-play diffusion model (PnP-DM) to reconstruct high-quality images from sparse measurements (OCT B-mode corneal images). Our method formulates reconstruction as an inverse problem, combining a diffusion prior with Markov chain Monte Carlo sampling for efficient posterior inference. We collect high-speed under-sampled B-mode corneal ima… ▽ More

    Submitted 20 May, 2025; originally announced May 2025.

  32. arXiv:2505.14910  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis

    Authors: Yu Zhang, Wenxiang Guo, Changhao Pan, Dongyu Yao, Zhiyuan Zhu, Ziyue Jiang, Yuhan Wang, Tao Jin, Zhou Zhao

    Abstract: Customizable multilingual zero-shot singing voice synthesis (SVS) has various potential applications in music composition and short video dubbing. However, existing SVS models overly depend on phoneme and note boundary annotations, limiting their robustness in zero-shot scenarios and producing poor transitions between phonemes and notes. Moreover, they also lack effective multi-level style control… ▽ More

    Submitted 30 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted by Findings of ACL 2025

  33. arXiv:2505.14560  [pdf, ps, other]

    eess.IV cs.CV

    Neural Inverse Scattering with Score-based Regularization

    Authors: Yuan Gao, Wenhan Guo, Yu Sun

    Abstract: Inverse scattering is a fundamental challenge in many imaging applications, ranging from microscopy to remote sensing. Solving this problem often requires jointly estimating two unknowns -- the image and the scattering field inside the object -- necessitating an effective image prior to regularize the inference. In this paper, we propose a regularized neural field (NF) approach which integrates the d… ▽ More

    Submitted 20 May, 2025; originally announced May 2025.

  34. arXiv:2505.14135  [pdf, other]

    cs.CV

    Hunyuan-Game: Industrial-grade Intelligent Game Creation Model

    Authors: Ruihuang Li, Caijin Zhou, Shoujian Zheng, Jianxiang Lu, Jiabin Huang, Comi Chen, Junshu Tang, Guangzheng Xu, Jiale Tao, Hongmei Wang, Donghao Li, Wenqing Yu, Senbo Wang, Zhimin Li, Yetshuan Shi, Haoyu Yang, Yukun Wang, Wenxun Dai, Jiaqi Li, Linqing Wang, Qixun Wang, Zhiyong Xu, Yingfang Zhang, Jiangfeng Xiong, Weijie Kong, et al. (33 additional authors not shown)

    Abstract: Intelligent game creation represents a transformative advancement in game development, utilizing generative artificial intelligence to dynamically generate and enhance game content. Despite notable progress in generative models, the comprehensive synthesis of high-quality game assets, including both images and videos, remains a challenging frontier. To create high-fidelity game content that simult… ▽ More

    Submitted 28 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  35. arXiv:2505.12754  [pdf, other]

    cs.LG

    ProDS: Preference-oriented Data Selection for Instruction Tuning

    Authors: Wenya Guo, Zhengkun Zhang, Xumeng Liu, Ying Zhang, Ziyu Lu, Haoze Zhu, Xubo Liu, Ruxue Yan

    Abstract: Instruction data selection aims to identify a high-quality subset from the training set that matches or exceeds the performance of the full dataset on target tasks. Existing methods focus on the instruction-to-response mapping, but neglect the human preference for diverse responses. In this paper, we propose Preference-oriented Data Selection method (ProDS) that scores training samples based on th… ▽ More

    Submitted 19 May, 2025; originally announced May 2025.

  36. arXiv:2505.08453  [pdf, ps, other]

    cs.RO cs.LG

    Parameter Estimation using Reinforcement Learning Causal Curiosity: Limits and Challenges

    Authors: Miguel Arana-Catania, Weisi Guo

    Abstract: Causal understanding is important in many disciplines of science and engineering, where we seek to understand how different factors in the system causally affect an experiment or situation and pave a pathway towards creating effective or optimising existing models. Examples of use cases are autonomous exploration and modelling of unknown environments or assessing key variables in optimising large… ▽ More

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 24 pages, 10 figures, 9 tables

  37. arXiv:2505.06335  [pdf, ps, other]

    cs.LG cs.AI cs.CR

    Remote Rowhammer Attack using Adversarial Observations on Federated Learning Clients

    Authors: Jinsheng Yuan, Yuhang Hao, Weisi Guo, Yun Wu, Chongyan Gu

    Abstract: Federated Learning (FL) has the potential for simultaneous global learning amongst a large number of parallel agents, enabling emerging AI such as LLMs to be trained across demographically diverse data. Central to this being efficient is the ability for FL to perform sparse gradient updates and remote direct memory access at the central server. Most of the research in FL security focuses on protec… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

  38. arXiv:2505.05849  [pdf, ps, other]

    cs.CR cs.AI

    AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents

    Authors: Zhun Wang, Vincent Siu, Zhe Ye, Tianneng Shi, Yuzhou Nie, Xuandong Zhao, Chenguang Wang, Wenbo Guo, Dawn Song

    Abstract: The strong planning and reasoning capabilities of Large Language Models (LLMs) have fostered the development of agent-based systems capable of leveraging external tools and interacting with increasingly complex environments. However, these powerful features also introduce a critical security risk: indirect prompt injection, a sophisticated attack vector that compromises the core of these agents, t… ▽ More

    Submitted 13 June, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

  39. arXiv:2505.05595  [pdf, ps, other]

    q-fin.TR cs.AI cs.CE cs.LG

    Trading Under Uncertainty: A Distribution-Based Strategy for Futures Markets Using FutureQuant Transformer

    Authors: Wenhao Guo, Yuda Wang, Zeqiao Huang, Changjiang Zhang, Shumin Ma

    Abstract: In the complex landscape of traditional futures trading, where vast data and variables like real-time Limit Order Books (LOB) complicate price predictions, we introduce the FutureQuant Transformer model, leveraging attention mechanisms to navigate these challenges. Unlike conventional models focused on point predictions, the FutureQuant model excels in forecasting the range and volatility of futur… ▽ More

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 16 pages, 12 figures

  40. arXiv:2505.04936  [pdf, other]

    cs.IT eess.SP

    Fluid Antenna-Assisted MU-MIMO Systems with Decentralized Baseband Processing

    Authors: Tianyi Liao, Wei Guo, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

    Abstract: The fluid antenna system (FAS) has emerged as a disruptive technology, offering unprecedented degrees of freedom (DoF) for wireless communication systems. However, optimizing fluid antenna (FA) positions entails significant computational costs, especially when the number of FAs is large. To address this challenge, we introduce a decentralized baseband processing (DBP) architecture to FAS, which pa… ▽ More

    Submitted 12 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: 7 pages, 5 figures, submitted to an IEEE conference

  41. arXiv:2505.04930  [pdf, ps, other]

    cs.IT eess.SP

    Accurate and Fast Channel Estimation for Fluid Antenna Systems with Diffusion Models

    Authors: Erqiang Tang, Wei Guo, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

    Abstract: Fluid antenna systems (FAS) offer enhanced spatial diversity for next-generation wireless systems. However, acquiring accurate channel state information (CSI) remains challenging due to the large number of reconfigurable ports and the limited availability of radio-frequency (RF) chains -- particularly in high-dimensional FAS scenarios. To address this challenge, we propose an efficient posterior s… ▽ More

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 6 pages, 5 figures, submitted to an IEEE conference

  42. arXiv:2505.04519  [pdf, other]

    cs.CL

    Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

    Authors: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo, Ziyang Zhang, Miao Rang, Fangcheng Liu, Naifu Zhang, Binghan Li, Yonghan Dong, Xiaojun Meng, Yasheng Wang, Dong Li, Yin Li, Dandan Tu, Can Chen, Youliang Yan, Fisher Yu, Ruiming Tang, Yunhe Wang, Botian Huang, Bo Wang, Boxiao Liu, et al. (49 additional authors not shown)

    Abstract: Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters are dominating the realm of most capable language models. However, the massive model scale poses significant challenges for the underlying software and hardware systems. In this paper, we aim to uncover a recipe to harness such scale on Ascend NPUs. The key goals are better usage of the computing r…

    Submitted 7 May, 2025; originally announced May 2025.

  43. arXiv:2505.01431  [pdf, other]

    cs.CV

    ZS-VCOS: Zero-Shot Outperforms Supervised Video Camouflaged Object Segmentation

    Authors: Wenqi Guo, Shan Du

    Abstract: Camouflaged object segmentation presents unique challenges compared to traditional segmentation tasks, primarily due to the high similarity in patterns and colors between camouflaged objects and their backgrounds. Effective solutions to this problem have significant implications in critical areas such as pest control, defect detection, and lesion segmentation in medical imaging. Prior research has…

    Submitted 10 April, 2025; originally announced May 2025.

  44. arXiv:2504.21568  [pdf, other]

    cs.AI

    A Study on Group Decision Making Problem Based on Fuzzy Reasoning and Bayesian Networks

    Authors: Shui-jin Rong, Wei Guo, Da-qing Zhang

    Abstract: Aiming at the group decision-making problem with multi-objective attributes, this study proposes a group decision-making system that integrates fuzzy inference and Bayesian network. A fuzzy rule base is constructed by combining threshold values, membership functions, expert experience, and domain knowledge to address quantitative challenges such as scale differences and expert linguistic var…

    Submitted 30 April, 2025; originally announced April 2025.

  45. arXiv:2504.20630  [pdf, ps, other]

    eess.AS cs.MM cs.SD

    ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting

    Authors: Yu Zhang, Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Tao Jin, Zhou Zhao

    Abstract: Multimodal immersive spatial drama generation focuses on creating continuous multi-speaker binaural speech with dramatic prosody based on multimodal prompts, with potential applications in AR, VR, and others. This task requires simultaneous modeling of spatial information and dramatic prosody based on multimodal inputs, with high data collection costs. To the best of our knowledge, our work is the…

    Submitted 30 May, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

  46. arXiv:2504.19062  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Versatile Framework for Song Generation with Prompt-based Control

    Authors: Yu Zhang, Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Ruiqi Li, Jingyu Lu, Rongjie Huang, Ruiyuan Zhang, Zhiqing Hong, Ziyue Jiang, Zhou Zhao

    Abstract: Song generation focuses on producing controllable high-quality songs based on various prompts. However, existing methods struggle to generate vocals and accompaniments with prompt-based control and proper alignment. Additionally, they fall short in supporting various tasks. To address these challenges, we introduce VersBand, a multi-task song generation framework for synthesizing high-quality, ali…

    Submitted 30 May, 2025; v1 submitted 26 April, 2025; originally announced April 2025.

  47. arXiv:2504.16454  [pdf, other]

    cs.IR

    Killing Two Birds with One Stone: Unifying Retrieval and Ranking with a Single Generative Recommendation Model

    Authors: Luankang Zhang, Kenan Song, Yi Quan Lee, Wei Guo, Hao Wang, Yawen Li, Huifeng Guo, Yong Liu, Defu Lian, Enhong Chen

    Abstract: In recommendation systems, the traditional multi-stage paradigm, which includes retrieval and ranking, often suffers from information loss between stages and diminishes performance. Recent advances in generative models, inspired by natural language processing, suggest the potential for unifying these stages to mitigate such loss. This paper presents the Unified Generative Recommendation Framework…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: This paper has been accepted at SIGIR 2025

  48. arXiv:2504.16320  [pdf, ps, other]

    cs.RO cs.LG

    PCF-Grasp: Converting Point Completion to Geometry Feature to Enhance 6-DoF Grasp

    Authors: Yaofeng Cheng, Fusheng Zha, Wei Guo, Pengfei Wang, Chao Zeng, Lining Sun, Chenguang Yang

    Abstract: The 6-Degree of Freedom (DoF) grasp method based on point clouds has shown significant potential in enabling robots to grasp target objects. However, most existing methods are based on the point clouds (2.5D points) generated from single-view depth images. These point clouds only have one surface side of the object providing incomplete geometry information, which mislead the grasping algorithm to…

    Submitted 26 June, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

  49. arXiv:2504.14462  [pdf, other]

    cs.CL

    CoLoTa: A Dataset for Entity-based Commonsense Reasoning over Long-Tail Knowledge

    Authors: Armin Toroghi, Willis Guo, Scott Sanner

    Abstract: The rise of Large Language Models (LLMs) has redefined the AI landscape, particularly due to their ability to encode factual and commonsense knowledge, and their outstanding performance in tasks requiring reasoning. Despite these advances, hallucinations and reasoning errors remain a significant barrier to their deployment in high-stakes settings. In this work, we observe that even the most promin…

    Submitted 19 April, 2025; originally announced April 2025.

  50. arXiv:2504.13916  [pdf, other]

    cs.HC cs.RO

    Task Matters: Investigating Human Questioning Behavior in Different Household Service for Learning by Asking Robots

    Authors: Yuanda Hu, Hou Jiani, Zhang Junyu, Yate Ge, Xiaohua Sun, Weiwei Guo

    Abstract: Learning by Asking (LBA) enables robots to identify knowledge gaps during task execution and acquire the missing information by asking targeted questions. However, different tasks often require different types of questions, and how to adapt questioning strategies accordingly remains underexplored. This paper investigates human questioning behavior in two representative household service tasks: a G…

    Submitted 10 April, 2025; originally announced April 2025.