Showing 1–50 of 601 results for author: Guo, W

Searching in archive cs.
  1. arXiv:2509.23566  [pdf, ps, other]

    cs.CV

    Towards Interpretable Visual Decoding with Attention to Brain Representations

    Authors: Pinyuan Feng, Hossein Adeli, Wenxuan Guo, Fan Cheng, Ethan Hwang, Nikolaus Kriegeskorte

    Abstract: Recent work has demonstrated that complex visual stimuli can be decoded from human brain activity using deep generative models, helping brain science researchers interpret how the brain represents real-world scenes. However, most current approaches leverage mapping brain signals into intermediate image or text feature spaces before guiding the generative process, masking the effect of contribution…

    Submitted 27 September, 2025; originally announced September 2025.

    Comments: 10 pages, 7 figures, under review

    ACM Class: I.2.0; I.4.9

  2. arXiv:2509.23089  [pdf, ps, other]

    cs.LG cs.NI

    Demystifying Network Foundation Models

    Authors: Sylee Beltiukov, Satyandra Guthula, Wenbo Guo, Walter Willinger, Arpit Gupta

    Abstract: This work presents a systematic investigation into the latent knowledge encoded within Network Foundation Models (NFMs) that focuses on hidden representations analysis rather than pure downstream task performance. Different from existing efforts, we analyze the models through a three-part evaluation: Embedding Geometry Analysis to assess representation space utilization, Metric Alignment Assessmen…

    Submitted 26 September, 2025; originally announced September 2025.

  3. arXiv:2509.22643  [pdf, ps, other]

    cs.RO

    VLA-Reasoner: Empowering Vision-Language-Action Models with Reasoning via Online Monte Carlo Tree Search

    Authors: Wenkai Guo, Guanxing Lu, Haoyuan Deng, Zhenyu Wu, Yansong Tang, Ziwei Wang

    Abstract: Vision-Language-Action models (VLAs) achieve strong performance in general robotic manipulation tasks by scaling imitation learning. However, existing VLAs are limited to predicting short-sighted next-action, which struggle with long-horizon trajectory tasks due to incremental deviations. To address this problem, we propose a plug-in framework named VLA-Reasoner that effectively empowers off-the-s…

    Submitted 26 September, 2025; originally announced September 2025.

    Comments: 9 pages

  4. arXiv:2509.21747  [pdf, ps, other]

    cs.CV

    Incorporating Scene Context and Semantic Labels for Enhanced Group-level Emotion Recognition

    Authors: Qing Zhu, Wangdong Guo, Qirong Mao, Xiaohua Huang, Xiuyan Shao, Wenming Zheng

    Abstract: Group-level emotion recognition (GER) aims to identify holistic emotions within a scene involving multiple individuals. Existing methods underestimate the importance of visual scene contextual information in modeling individual relationships. Furthermore, they overlook the crucial role of semantic information from emotional labels for complete understanding of emotions. To address this limi…

    Submitted 25 September, 2025; originally announced September 2025.

    Comments: 10 pages, 5 figures, submitted to IEEE Transactions on Human-Machine Systems

  5. arXiv:2509.20883  [pdf, ps, other]

    cs.IR cs.DC cs.LG

    RecIS: Sparse to Dense, A Unified Training Framework for Recommendation Models

    Authors: Hua Zong, Qingtao Zeng, Zhengxiong Zhou, Zhihua Han, Zhensong Yan, Mingjie Liu, Hechen Sun, Jiawei Liu, Yiwen Hu, Qi Wang, YiHan Xian, Wenjie Guo, Houyuan Xiang, Zhiyuan Zeng, Xiangrong Sheng, Bencheng Yan, Nan Hu, Yuheng Huang, Jinqing Lian, Ziru Xu, Yan Zhang, Ju Huang, Siran Yang, Huimin Yi, Jiamang Wang , et al. (9 additional authors not shown)

    Abstract: In this paper, we propose RecIS, a unified Sparse-Dense training framework designed to achieve two primary goals: 1. Unified Framework: to create a unified sparse-dense training framework based on the PyTorch ecosystem that meets the training needs of industrial-grade recommendation models that are integrated with large models. 2. System Optimization: to optimize the sparse component, offering superior e…

    Submitted 25 September, 2025; originally announced September 2025.

  6. arXiv:2509.20857  [pdf, ps, other]

    cs.CV cs.AI

    TasselNetV4: A vision foundation model for cross-scene, cross-scale, and cross-species plant counting

    Authors: Xiaonan Hu, Xuebing Li, Jinyu Xu, Abdulkadir Duran Adan, Letian Zhou, Xuhui Zhu, Yanan Li, Wei Guo, Shouyang Liu, Wenzhong Liu, Hao Lu

    Abstract: Accurate plant counting provides valuable information for agriculture such as crop yield prediction, plant density assessment, and phenotype quantification. Vision-based approaches are currently the mainstream solution. Prior art typically uses a detection or a regression model to count a specific plant. However, plants have biodiversity, and new cultivars are increasingly bred each year. It is al…

    Submitted 25 September, 2025; originally announced September 2025.

    Comments: 13 figures, 7 tables, code is available at https://github.com/tiny-smart/tasselnetv4

  7. arXiv:2509.20680  [pdf, ps, other]

    cs.LG cs.CL cs.CR

    Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation

    Authors: Wenkai Guo, Xuefeng Liu, Haolin Wang, Jianwei Niu, Shaojie Tang, Jing Yuan

    Abstract: Fine-tuning large language models (LLMs) with local data is a widely adopted approach for organizations seeking to adapt LLMs to their specific domains. Given the shared characteristics in data across different organizations, the idea of collaboratively fine-tuning an LLM using data from multiple sources presents an appealing opportunity. However, organizations are often reluctant to share local d…

    Submitted 24 September, 2025; originally announced September 2025.

    Comments: 28 pages, 32 figures, accepted to the Findings of EMNLP 2025

  8. arXiv:2509.18957  [pdf, ps, other]

    cs.DC

    TD3-Sched: Learning to Orchestrate Container-based Cloud-Edge Resources via Distributed Reinforcement Learning

    Authors: Shengye Song, Minxian Xu, Kan Hu, Wenxia Guo, Kejiang Ye

    Abstract: Resource scheduling in cloud-edge systems is challenging as edge nodes run latency-sensitive workloads under tight resource constraints, while existing centralized schedulers can suffer from performance bottlenecks and user experience degradation. To address the issues of distributed decisions in cloud-edge environments, we present TD3-Sched, a distributed reinforcement learning (DRL) scheduler ba…

    Submitted 23 September, 2025; originally announced September 2025.

    Comments: 14 pages, 5 figures

    Journal ref: PDCAT 2025

  9. arXiv:2509.18898  [pdf, ps, other]

    cs.CV

    DeblurSplat: SfM-free 3D Gaussian Splatting with Event Camera for Robust Deblurring

    Authors: Pengteng Li, Yunfan Lu, Pinhao Song, Weiyu Guo, Huizai Yao, F. Richard Yu, Hui Xiong

    Abstract: In this paper, we propose the first Structure-from-Motion (SfM)-free deblurring 3D Gaussian Splatting method via event camera, dubbed DeblurSplat. We address the motion-deblurring problem in two ways. First, we leverage the pretrained capability of the dense stereo module (DUSt3R) to directly obtain accurate initial point clouds from blurred images. Without calculating camera poses as an intermedi…

    Submitted 23 September, 2025; originally announced September 2025.

  10. arXiv:2509.18606  [pdf, ps, other]

    eess.AS cs.AI cs.SD

    FlexSED: Towards Open-Vocabulary Sound Event Detection

    Authors: Jiarui Hai, Helin Wang, Weizhe Guo, Mounya Elhilali

    Abstract: Despite recent progress in large-scale sound event detection (SED) systems capable of handling hundreds of sound classes, existing multi-class classification frameworks remain fundamentally limited. They cannot process free-text sound queries, which enable more flexible and user-friendly interaction, and they lack zero-shot capabilities and offer poor few-shot adaptability. Although text-query-bas…

    Submitted 22 September, 2025; originally announced September 2025.

  11. arXiv:2509.16950  [pdf, ps, other]

    cs.CR

    Temporal Logic-Based Multi-Vehicle Backdoor Attacks against Offline RL Agents in End-to-end Autonomous Driving

    Authors: Xuan Chen, Shiwei Feng, Zikang Xiong, Shengwei An, Yunshu Mao, Lu Yan, Guanhong Tao, Wenbo Guo, Xiangyu Zhang

    Abstract: Assessing the safety of autonomous driving (AD) systems against security threats, particularly backdoor attacks, is a stepping stone for real-world deployment. However, existing works mainly focus on pixel-level triggers that are impractical to deploy in the real world. We address this gap by introducing a novel backdoor attack against the end-to-end AD systems that leverage one or more other vehi…

    Submitted 21 September, 2025; originally announced September 2025.

  12. arXiv:2509.16087  [pdf, ps, other]

    cs.CV cs.AI

    See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model

    Authors: Pengteng Li, Pinhao Song, Wuyang Li, Weiyu Guo, Huizai Yao, Yijie Xu, Dugang Liu, Hui Xiong

    Abstract: We introduce SEE&TREK, the first training-free prompting framework tailored to enhance the spatial understanding of Multimodal Large Language Models (MLLMS) under vision-only constraints. While prior efforts have incorporated modalities like depth or point clouds to improve spatial reasoning, purely visual-spatial understanding remains underexplored. SEE&TREK addresses this gap by focusing on two c…

    Submitted 19 September, 2025; originally announced September 2025.

    Comments: Accepted by NeurIPS 2025

  13. arXiv:2509.15507  [pdf, ps, other]

    cs.RO

    STARC: See-Through-Wall Augmented Reality Framework for Human-Robot Collaboration in Emergency Response

    Authors: Shenghai Yuan, Weixiang Guo, Tianxin Hu, Yu Yang, Jinyu Chen, Rui Qian, Zhongyuan Liu, Lihua Xie

    Abstract: In emergency response missions, first responders must navigate cluttered indoor environments where occlusions block direct line-of-sight, concealing both life-threatening hazards and victims in need of rescue. We present STARC, a see-through AR framework for human-robot collaboration that fuses mobile-robot mapping with responder-mounted LiDAR sensing. A ground robot running LiDAR-inertial odometr…

    Submitted 18 September, 2025; originally announced September 2025.

  14. arXiv:2509.15062  [pdf, ps, other]

    cs.RO

    Energy-Constrained Navigation for Planetary Rovers under Hybrid RTG-Solar Power

    Authors: Tianxin Hu, Weixiang Guo, Ruimeng Liu, Xinhang Xu, Rui Qian, Jinyu Chen, Shenghai Yuan, Lihua Xie

    Abstract: Future planetary exploration rovers must operate for extended durations on hybrid power inputs that combine steady radioisotope thermoelectric generator (RTG) output with variable solar photovoltaic (PV) availability. While energy-aware planning has been studied for aerial and underwater robots under battery limits, few works for ground rovers explicitly model power flow or enforce instantaneous p…

    Submitted 18 September, 2025; originally announced September 2025.

  15. arXiv:2509.14915  [pdf, ps, other]

    cs.RO

    PERAL: Perception-Aware Motion Control for Passive LiDAR Excitation in Spherical Robots

    Authors: Shenghai Yuan, Jason Wai Hao Yee, Weixiang Guo, Zhongyuan Liu, Thien-Minh Nguyen, Lihua Xie

    Abstract: Autonomous mobile robots increasingly rely on LiDAR-IMU odometry for navigation and mapping, yet horizontally mounted LiDARs such as the MID360 capture few near-ground returns, limiting terrain awareness and degrading performance in feature-scarce environments. Prior solutions - static tilt, active rotation, or high-density sensors - either sacrifice horizontal perception or incur added actuators,…

    Submitted 18 September, 2025; originally announced September 2025.

  16. arXiv:2509.13784  [pdf, ps, other]

    cs.CV

    CETUS: Causal Event-Driven Temporal Modeling With Unified Variable-Rate Scheduling

    Authors: Hanfang Liang, Bing Wang, Shizhen Zhang, Wen Jiang, Yizhuo Yang, Weixiang Guo, Shenghai Yuan

    Abstract: Event cameras capture asynchronous pixel-level brightness changes with microsecond temporal resolution, offering unique advantages for high-speed vision tasks. Existing methods often convert event streams into intermediate representations such as frames, voxel grids, or point clouds, which inevitably require predefined time windows and thus introduce window latency. Meanwhile, pointwise detection…

    Submitted 17 September, 2025; originally announced September 2025.

    Comments: 8 pages, 6 figures

  17. arXiv:2509.10454  [pdf, ps, other]

    cs.RO cs.CV

    GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation

    Authors: Hang Yin, Haoyu Wei, Xiuwei Xu, Wenxuan Guo, Jie Zhou, Jiwen Lu

    Abstract: In this paper, we propose a training-free framework for vision-and-language navigation (VLN). Existing zero-shot VLN methods are mainly designed for discrete environments or involve unsupervised training in continuous simulator environments, which makes it challenging to generalize and deploy them in real-world scenarios. To achieve a training-free framework in continuous environments, our framewo…

    Submitted 12 September, 2025; originally announced September 2025.

    Comments: Accepted to CoRL 2025. Project page: https://bagh2178.github.io/GC-VLN/

  18. arXiv:2509.08833  [pdf, ps, other]

    cs.CY

    Position: The Pitfalls of Over-Alignment: Overly Caution Health-Related Responses From LLMs are Unethical and Dangerous

    Authors: Wenqi Marshall Guo, Yiyang Du, Heidi J. S. Tworek, Shan Du

    Abstract: Large Language Models (LLMs) are usually aligned with "human values/preferences" to prevent harmful output. Discussions around the alignment of Large Language Models (LLMs) generally focus on preventing harmful outputs. However, in this paper, we argue that in health-related queries, over-alignment, leading to overly cautious responses, can itself be harmful, especially for people with anxiety and o…

    Submitted 27 August, 2025; originally announced September 2025.

  19. arXiv:2509.08748  [pdf, ps, other]

    cs.CR

    Prototype-Guided Robust Learning against Backdoor Attacks

    Authors: Wei Guo, Maura Pintor, Ambra Demontis, Battista Biggio

    Abstract: Backdoor attacks poison the training data to embed a backdoor in the model, causing it to behave normally on legitimate inputs but maliciously when specific trigger signals appear. Training a benign model from a dataset poisoned by backdoor attacks is challenging. Existing works rely on various assumptions and can only defend against backdoor attacks with specific trigger signals, high poisoning r…

    Submitted 3 September, 2025; originally announced September 2025.

  20. arXiv:2509.08747  [pdf, ps, other]

    cs.CR

    Silent Until Sparse: Backdoor Attacks on Semi-Structured Sparsity

    Authors: Wei Guo, Maura Pintor, Ambra Demontis, Battista Biggio

    Abstract: In the deployment phase, semi-structured sparsity accelerates the execution of deep neural networks on modern GPUs via sparse matrix multiplication. In this paper, targeting the semi-structured sparsity, we introduce a Silent Until Sparse (SUS) backdoor attack, where the released full model remains silent (benign), but becomes a backdoored model after sparsification. The attack operates in two pha…

    Submitted 3 September, 2025; originally announced September 2025.

  21. arXiv:2509.07493  [pdf, ps, other]

    cs.CV cs.CG

    Accurate and Complete Surface Reconstruction from 3D Gaussians via Direct SDF Learning

    Authors: Wenzhi Guo, Bing Wang

    Abstract: 3D Gaussian Splatting (3DGS) has recently emerged as a powerful paradigm for photorealistic view synthesis, representing scenes with spatially distributed Gaussian primitives. While highly effective for rendering, achieving accurate and complete surface reconstruction remains challenging due to the unstructured nature of the representation and the absence of explicit geometric supervision. In this…

    Submitted 21 September, 2025; v1 submitted 9 September, 2025; originally announced September 2025.

  22. arXiv:2509.06907  [pdf]

    cs.CV

    FoMo4Wheat: Toward reliable crop vision foundation models with globally curated data

    Authors: Bing Han, Chen Zhu, Dong Han, Rui Yu, Songliang Cao, Jianhui Wu, Scott Chapman, Zijian Wang, Bangyou Zheng, Wei Guo, Marie Weiss, Benoit de Solan, Andreas Hund, Lukas Roth, Kirchgessner Norbert, Andrea Visioni, Yufeng Ge, Wenjuan Li, Alexis Comar, Dong Jiang, Dejun Han, Fred Baret, Yanfeng Ding, Hao Lu, Shouyang Liu

    Abstract: Vision-driven field monitoring is central to digital agriculture, yet models built on general-domain pretrained backbones often fail to generalize across tasks, owing to the interaction of fine, variable canopy structures with fluctuating field conditions. We present FoMo4Wheat, one of the first crop-domain vision foundation models pretrained with self-supervision on ImAg4Wheat, the largest and mos…

    Submitted 8 September, 2025; originally announced September 2025.

  23. arXiv:2509.04273  [pdf, ps, other]

    cs.CV

    Dual-Scale Volume Priors with Wasserstein-Based Consistency for Semi-Supervised Medical Image Segmentation

    Authors: Junying Meng, Gangxuan Zhou, Jun Liu, Weihong Guo

    Abstract: Despite significant progress in semi-supervised medical image segmentation, most existing segmentation networks overlook effective methodological guidance for feature extraction and important prior information from datasets. In this paper, we develop a semi-supervised medical image segmentation framework that effectively integrates spatial regularization methods and volume priors. Specifically, our…

    Submitted 4 September, 2025; originally announced September 2025.

  24. arXiv:2509.01835  [pdf, ps, other]

    cs.CR

    From CVE Entries to Verifiable Exploits: An Automated Multi-Agent Framework for Reproducing CVEs

    Authors: Saad Ullah, Praneeth Balasubramanian, Wenbo Guo, Amanda Burnett, Hammond Pearce, Christopher Kruegel, Giovanni Vigna, Gianluca Stringhini

    Abstract: High-quality datasets of real-world vulnerabilities and their corresponding verifiable exploits are crucial resources in software security research. Yet such resources remain scarce, as their creation demands intensive manual effort and deep security expertise. In this paper, we present CVE-GENIE, an automated, large language model (LLM)-based multi-agent framework designed to reproduce real-world…

    Submitted 1 September, 2025; originally announced September 2025.

  25. arXiv:2509.01526  [pdf]

    cs.LG cs.NE

    Prediction, Generation of WWTPs microbiome community structures and Clustering of WWTPs various feature attributes using DE-BP model, SiTime-GAN model and DPNG-EPMC ensemble clustering algorithm with modulation of microbial ecosystem health

    Authors: Mingzhi Dai, Weiwei Cai, Xiang Feng, Huiqun Yu, Weibin Guo, Miao Guo

    Abstract: Microbiomes not only underpin Earth's biogeochemical cycles but also play crucial roles in both engineered and natural ecosystems, such as the soil, wastewater treatment, and the human gut. However, microbiome engineering faces significant obstacles to surmount to deliver the desired improvements in microbiome control. Here, we use the backpropagation neural network (BPNN), optimized through diffe…

    Submitted 1 September, 2025; originally announced September 2025.

    Comments: 48 pages, 25 figures, three major research sections: Prediction, Generation and Clustering

  26. arXiv:2508.18268  [pdf, ps, other]

    cs.RO cs.AI

    SafeBimanual: Diffusion-based Trajectory Optimization for Safe Bimanual Manipulation

    Authors: Haoyuan Deng, Wenkai Guo, Qianzhun Wang, Zhenyu Wu, Ziwei Wang

    Abstract: Bimanual manipulation has been widely applied in household services and manufacturing, which enables the complex task completion with coordination requirements. Recent diffusion-based policy learning approaches have achieved promising performance in modeling action distributions for bimanual manipulation. However, they ignored the physical safety constraints of bimanual manipulation, which leads t…

    Submitted 25 August, 2025; originally announced August 2025.

    Comments: Project website is at: https://denghaoyuan123.github.io/SafeBimanip/

  27. arXiv:2508.13654  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Input-Time Scaling

    Authors: Rapheal Huang, Weilong Guo

    Abstract: Current Large Language Models (LLMs) are usually post-trained on large-scale carefully curated datasets (data & training scaling) and perform reasoning at test time (inference time scaling). In this work, we present a new scaling paradigm, Input-Time Scaling, to complement previous scaling methods by putting resources on queries (input time). During training and testing, we utilize meta-knowledge fr…

    Submitted 12 September, 2025; v1 submitted 19 August, 2025; originally announced August 2025.

  28. arXiv:2508.13423  [pdf, ps, other]

    cs.IR cs.AI

    AdaptJobRec: Enhancing Conversational Career Recommendation through an LLM-Powered Agentic System

    Authors: Qixin Wang, Dawei Wang, Kun Chen, Yaowei Hu, Puneet Girdhar, Ruoteng Wang, Aadesh Gupta, Chaitanya Devella, Wenlai Guo, Shangwen Huang, Bachir Aoun, Greg Hayworth, Han Li, Xintao Wu

    Abstract: In recent years, recommendation systems have evolved from providing a single list of recommendations to offering a comprehensive suite of topic focused services. To better accomplish this task, conversational recommendation systems (CRS) have progressed from basic retrieval augmented LLM generation to agentic systems with advanced reasoning and self correction capabilities. However, agentic system…

    Submitted 18 August, 2025; originally announced August 2025.

  29. arXiv:2508.11911  [pdf, ps, other]

    math.NA cs.LG physics.comp-ph

    Reduced-order modeling of Hamiltonian dynamics based on symplectic neural networks

    Authors: Yongsheng Chen, Wei Guo, Qi Tang, Xinghui Zhong

    Abstract: We introduce a novel data-driven symplectic reduced-order modeling (ROM) framework for high-dimensional Hamiltonian systems that unifies latent-space discovery and dynamics learning within a single, end-to-end neural architecture. The encoder-decoder is built from Henon neural networks (HenonNets) and may be augmented with linear SGS-reflector layers. This yields an exact symplectic map between fu…

    Submitted 16 August, 2025; originally announced August 2025.

  30. arXiv:2508.10931  [pdf, ps, other]

    cs.CV cs.GR

    VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip

    Authors: Wenqi Guo, Shan Du

    Abstract: We introduce Value Sign Flip (VSF), a simple and efficient method for incorporating negative prompt guidance in few-step diffusion and flow-matching image generation models. Unlike existing approaches such as classifier-free guidance (CFG), NASA, and NAG, VSF dynamically suppresses undesired content by flipping the sign of attention values from negative prompts. Our method requires only small comp…

    Submitted 29 September, 2025; v1 submitted 11 August, 2025; originally announced August 2025.

  31. arXiv:2508.10924  [pdf, ps, other]

    eess.AS cs.SD

    ASAudio: A Survey of Advanced Spatial Audio Research

    Authors: Zhiyuan Zhu, Yu Zhang, Wenxiang Guo, Changhao Pan, Zhou Zhao

    Abstract: With the rapid development of spatial audio technologies today, applications in AR, VR, and other scenarios have garnered extensive attention. Unlike traditional mono sound, spatial audio offers a more realistic and immersive auditory experience. Despite notable progress in the field, there remains a lack of comprehensive surveys that systematically organize and analyze these methods and their und…

    Submitted 20 August, 2025; v1 submitted 8 August, 2025; originally announced August 2025.

  32. arXiv:2508.10684  [pdf, ps, other]

    cs.LG math.OC stat.CO stat.ML

    MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control

    Authors: Yuchen Zhu, Wei Guo, Jaemoo Choi, Guan-Horng Liu, Yongxin Chen, Molei Tao

    Abstract: We study the problem of learning a neural sampler to generate samples from discrete state spaces where the target probability mass function $π\propto\mathrm{e}^{-U}$ is known up to a normalizing constant, which is an important task in fields such as statistical physics, machine learning, combinatorial optimization, etc. To better address this challenging task when the state space has a large cardi…

    Submitted 14 August, 2025; originally announced August 2025.

  33. arXiv:2508.10615  [pdf, ps, other]

    cs.IR

    FuXi-β: Towards a Lightweight and Fast Large-Scale Generative Recommendation Model

    Authors: Yufei Ye, Wei Guo, Hao Wang, Hong Zhu, Yuyang Ye, Yong Liu, Huifeng Guo, Ruiming Tang, Defu Lian, Enhong Chen

    Abstract: Scaling laws for autoregressive generative recommenders reveal potential for larger, more versatile systems but mean greater latency and training costs. To accelerate training and inference, we investigated the recent generative recommendation models HSTU and FuXi-$α$, identifying two efficiency bottlenecks: the indexing operations in relative temporal attention bias and the computation of the que…

    Submitted 14 August, 2025; originally announced August 2025.

  34. arXiv:2508.09166  [pdf, ps, other]

    cs.NI cs.HC

    WPTrack: A Wi-Fi and Pressure Insole Fusion System for Single Target Tracking

    Authors: Wei Guo, Shunsei Yamagishi, Lei Jing

    Abstract: As the Internet of Things (IoT) continues to evolve, indoor location has become a critical element for enabling smart homes, behavioral monitoring, and elderly care. Existing WiFi-based human tracking solutions typically require specialized equipment or multiple Wi-Fi links, a limitation in most indoor settings where only a single pair of Wi-Fi devices is usually available. However, despite effort…

    Submitted 6 August, 2025; originally announced August 2025.

    Comments: 6 pages, 12 figures, conference

  35. arXiv:2508.08606  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Distributed optimization: designed for federated learning

    Authors: Wenyou Guo, Ting Qu, Chunrong Pan, George Q. Huang

    Abstract: Federated Learning (FL), as a distributed collaborative Machine Learning (ML) framework under privacy-preserving constraints, has garnered increasing research attention in cross-organizational data collaboration scenarios. This paper proposes a class of distributed optimization algorithms based on the augmented Lagrangian technique, designed to accommodate diverse communication topologies in both…

    Submitted 28 August, 2025; v1 submitted 11 August, 2025; originally announced August 2025.

    Comments: 16 pages, 6 figures

  36. arXiv:2508.06869  [pdf, ps, other]

    cs.CV cs.AI

    VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding

    Authors: Jianxiang He, Meisheng Hong, Jungang Li, Yijie Xu, Ziyang Chen, Weiyu Guo, Hui Xiong

    Abstract: Long video understanding presents a significant challenge to multimodal large language models (MLLMs) primarily due to the immense data scale. A critical and widely adopted strategy for making this task computationally tractable is keyframe retrieval, which seeks to identify a sparse set of video frames that are most salient to a given textual query. However, the efficacy of this approach is hinde…

    Submitted 6 September, 2025; v1 submitted 9 August, 2025; originally announced August 2025.

    Comments: 9 pages, 3 figures

    ACM Class: I.2.10

  37. arXiv:2508.06832  [pdf, ps, other]

    cs.AI

    Remote Sensing Image Intelligent Interpretation with the Language-Centered Perspective: Principles, Methods and Challenges

    Authors: Haifeng Li, Wang Guo, Haiyang Wu, Mengwei Wu, Jipeng Zhang, Qing Zhu, Yu Liu, Xin Huang, Chao Tao

    Abstract: The mainstream paradigm of remote sensing image interpretation has long been dominated by vision-centered models, which rely on visual features for semantic understanding. However, these models face inherent limitations in handling multi-modal reasoning, semantic abstraction, and interactive decision-making. While recent advances have introduced Large Language Models (LLMs) into remote sensing wor…

    Submitted 9 August, 2025; originally announced August 2025.

  38. arXiv:2508.06004  [pdf, ps, other]

    cs.DL cs.IR

    When a Paper Has 1000 Authors: Rethinking Citation Metrics in the Era of LLMs

    Authors: Weihang Guo, Zhao Song, Jiahao Zhang

    Abstract: Author-level citation metrics provide a practical, interpretable, and scalable signal of scholarly influence in a complex research ecosystem. They have been widely used as a proxy in hiring decisions. However, the past five years have seen the rapid emergence of large-scale publications in the field of large language models and foundation models, with papers featuring hundreds to thousands of co-auth…

    Submitted 8 August, 2025; originally announced August 2025.

  39. arXiv:2508.05543  [pdf, ps, other]

    cs.RO

    CleanUpBench: Embodied Sweeping and Grasping Benchmark

    Authors: Wenbo Li, Guanting Chen, Tao Zhao, Jiyao Wang, Tianxin Hu, Yuwen Liao, Weixiang Guo, Shenghai Yuan

    Abstract: Embodied AI benchmarks have advanced navigation, manipulation, and reasoning, but most target complex humanoid agents or large-scale simulations that are far from real-world deployment. In contrast, mobile cleaning robots with dual mode capabilities, such as sweeping and grasping, are rapidly emerging as realistic and commercially viable platforms. However, no benchmark currently exists that syste…

    Submitted 7 August, 2025; originally announced August 2025.

  40. arXiv:2508.04200  [pdf, ps, other]

    cs.CV cs.LG

    Bootstrap Deep Spectral Clustering with Optimal Transport

    Authors: Wengang Guo, Wei Ye, Chunchun Chen, Xin Sun, Christian Böhm, Claudia Plant, Susanto Rahardja

    Abstract: Spectral clustering is a leading clustering method. Two of its major shortcomings are the disjoint optimization process and the limited representation capacity. To address these issues, we propose a deep spectral clustering model (named BootSC), which jointly learns all stages of spectral clustering -- affinity matrix construction, spectral embedding, and $k$-means clustering -- using a single net…

    Submitted 6 August, 2025; originally announced August 2025.

  41. arXiv:2508.03752  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    M$^3$HL: Mutual Mask Mix with High-Low Level Feature Consistency for Semi-Supervised Medical Image Segmentation

    Authors: Yajun Liu, Zenghui Zhang, Jiang Yue, Weiwei Guo, Dongying Li

    Abstract: Data augmentation methods inspired by CutMix have demonstrated significant potential in recent semi-supervised medical image segmentation tasks. However, these approaches often apply CutMix operations in a rigid and inflexible manner, while paying insufficient attention to feature-level consistency constraints. In this paper, we propose a novel method called Mutual Mask Mix with High-Low level fea…

    Submitted 4 August, 2025; originally announced August 2025.

    Comments: MICCAI 2025

  42. arXiv:2508.03337  [pdf, ps, other]

    cs.CV

    Less is More: Token-Efficient Video-QA via Adaptive Frame-Pruning and Semantic Graph Integration

    Authors: Shaoguang Wang, Ziyang Chen, Yijie Xu, Weiyu Guo, Hui Xiong

    Abstract: The practical application of Multimodal Large Language Models (MLLMs) to Video Question Answering (Video-QA) is severely hindered by the high token cost of processing numerous video frames. While increasing the number of sampled frames is a common strategy, we observe a "less is more" phenomenon where excessive frames can paradoxically degrade performance due to context dilution. Concurrently, sta…

    Submitted 15 September, 2025; v1 submitted 5 August, 2025; originally announced August 2025.

    Comments: Corresponding authors: Weiyu Guo, Hui Xiong. This manuscript is a preprint. An earlier version of this work was submitted to AAAI 2026. This version has been revised and is formatted using the AAAI 2026 style file

  43. arXiv:2508.02066  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    MolReasoner: Toward Effective and Interpretable Reasoning for Molecular LLMs

    Authors: Guojiang Zhao, Sihang Li, Zixiang Lu, Zheng Cheng, Haitao Lin, Lirong Wu, Hanchen Xia, Hengxing Cai, Wentao Guo, Hongshuai Wang, Mingjun Xu, Siyu Zhu, Guolin Ke, Linfeng Zhang, Zhifeng Gao

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various domains, yet their capabilities in molecular reasoning remain insufficiently explored. Current approaches tend to rely heavily on general-purpose prompting, which lacks domain-specific molecular semantics, while those that use fine-tuning strategies often face challenges with interpretability and reasoning depth. T…

    Submitted 4 August, 2025; originally announced August 2025.

  44. arXiv:2508.00823  [pdf, ps, other]

    cs.CV cs.RO

    IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation

    Authors: Wenxuan Guo, Xiuwei Xu, Hang Yin, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu

    Abstract: Visual navigation with an image as goal is a fundamental and challenging problem. Conventional methods either rely on end-to-end RL learning or modular-based policy with topological graph or BEV map as memory, which cannot fully model the geometric relationship between the explored 3D environment and the goal image. In order to efficiently and accurately localize the goal image in 3D space, we bui…

    Submitted 1 August, 2025; originally announced August 2025.

    Comments: Accepted to ICCV 2025. Project page: https://gwxuan.github.io/IGL-Nav/

  45. arXiv:2507.22633  [pdf, ps, other]

    cs.LG cs.AI

    H2Tune: Federated Foundation Model Fine-Tuning with Hybrid Heterogeneity

    Authors: Wei Guo, Siyuan Lu, Yiqi Tong, Zhaojun Hu, Fuzhen Zhuang, Xiao Zhang, Tao Fan, Jin Dong

    Abstract: Different from existing federated fine-tuning (FFT) methods for foundation models, hybrid heterogeneous federated fine-tuning (HHFFT) is an under-explored scenario where clients exhibit double heterogeneity in model architectures and downstream tasks. This hybrid heterogeneity introduces two significant challenges: 1) heterogeneous matrix aggregation, where clients adopt different large-scale foun…

    Submitted 30 July, 2025; v1 submitted 30 July, 2025; originally announced July 2025.

  46. arXiv:2507.22488  [pdf, ps, other]

    cs.LG cs.AI

    Proto-EVFL: Enhanced Vertical Federated Learning via Dual Prototype with Extremely Unaligned Data

    Authors: Wei Guo, Yiyang Duan, Zhaojun Hu, Yiqi Tong, Fuzhen Zhuang, Xiao Zhang, Jin Dong, Ruofan Wu, Tengfei Liu, Yifan Sun

    Abstract: In vertical federated learning (VFL), multiple enterprises address aligned sample scarcity by leveraging massive locally unaligned samples to facilitate collaborative learning. However, unaligned samples across different parties in VFL can be extremely class-imbalanced, leading to insufficient feature representation and limited model prediction space. Specifically, class-imbalanced problems consis…

    Submitted 30 July, 2025; originally announced July 2025.

  47. arXiv:2507.20643  [pdf, ps, other]

    cs.CL cs.AI

    Ontology-Enhanced Knowledge Graph Completion using Large Language Models

    Authors: Wenbin Guo, Xin Wang, Jiaoyan Chen, Zhao Li, Zirui Chen

    Abstract: Large Language Models (LLMs) have been extensively adopted in Knowledge Graph Completion (KGC), showcasing significant research advancements. However, as black-box models driven by deep neural architectures, current LLM-based KGC methods rely on implicit knowledge representation with parallel propagation of erroneous knowledge, thereby hindering their ability to produce conclusive and decisive rea…

    Submitted 28 July, 2025; originally announced July 2025.

  48. arXiv:2507.18671  [pdf, ps, other]

    cs.LG cs.AI

    Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling

    Authors: Ning Liao, Xiaoxing Wang, Zehao Lin, Weiyang Guo, Feng Hong, Shixiang Song, Geng Yu, Zihua Zhao, Sitao Xie, Longxuan Wei, Xiangqi Jin, Xiaohan Qin, Jiale Ma, Kai Chen, Jiangchao Yao, Zhouhan Lin, Junchi Yan, Zhiyu Li, Feiyu Xiong, Yanfeng Wang, Linfeng Zhang

    Abstract: A large language model (LLM) with knowledge in both scientific and general tasks is the foundation of science general intelligence. However, directly continued pretraining an LLM using science data usually leads to catastrophic forgetting, which indicates severe degradation in general ability. In this report, we present Innovator, which solves this problem by upcycling a pre-trained dense LLM into…

    Submitted 24 July, 2025; originally announced July 2025.

    Comments: Technical Report

  49. arXiv:2507.15219  [pdf, ps, other]

    cs.CR cs.AI

    PromptArmor: Simple yet Effective Prompt Injection Defenses

    Authors: Tianneng Shi, Kaijie Zhu, Zhun Wang, Yuqi Jia, Will Cai, Weida Liang, Haonan Wang, Hend Alzahrani, Joshua Lu, Kenji Kawaguchi, Basel Alomair, Xuandong Zhao, William Yang Wang, Neil Gong, Wenbo Guo, Dawn Song

    Abstract: Despite their potential, recent research has demonstrated that LLM agents are vulnerable to prompt injection attacks, where malicious prompts are injected into the agent's input, causing it to perform an attacker-specified task rather than the intended task provided by the user. In this paper, we present PromptArmor, a simple yet effective defense against prompt injection attacks. Specifically, Pr…

    Submitted 20 July, 2025; originally announced July 2025.

  50. arXiv:2507.11173  [pdf, ps, other]

    cs.LG

    Real-Time Bayesian Detection of Drift-Evasive GNSS Spoofing in Reinforcement Learning Based UAV Deconfliction

    Authors: Deepak Kumar Panda, Weisi Guo

    Abstract: Autonomous unmanned aerial vehicles (UAVs) rely on global navigation satellite system (GNSS) pseudorange measurements for accurate real-time localization and navigation. However, this dependence exposes them to sophisticated spoofing threats, where adversaries manipulate pseudoranges to deceive UAV receivers. Among these, drift-evasive spoofing attacks subtly perturb measurements, gradually divert…

    Submitted 15 July, 2025; originally announced July 2025.