Showing 1–50 of 911 results for author: sun, S

Searching in archive cs.
  1. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3278 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  2. arXiv:2507.02303  [pdf, ps, other]

    cs.IT

    Measurements and Modeling of Air-Ground Integrated Channel in Forest Environment Based on OFDM Signals

    Authors: Zhe Xiao, Shu Sun, Na Liu, Lianming Xu, Li Wang

    Abstract: Forests are frequently impacted by climate conditions, vegetation density, and intricate terrain and geology, which contribute to natural disasters. Personnel engaged in or supporting rescue operations in such environments rely on robust communication systems to ensure their safety, highlighting the criticality of channel measurements in forest environments. However, according to current research,…

    Submitted 3 July, 2025; originally announced July 2025.

  3. arXiv:2507.01782  [pdf, ps, other]

    cs.IT

    Symbiotic Backscatter Communication: A Design Perspective on the Modulation Scheme of Backscatter Devices

    Authors: Yinghui Ye, Shuang Lu, Liqin Shi, Xiaoli Chu, Sumei Sun

    Abstract: Symbiotic Backscatter Communication (SBC) has emerged as a spectrum-efficient and low-power communication technology, where backscatter devices (BDs) modulate and reflect incident radio frequency (RF) signals from primary transmitters (PTs). While previous studies have assumed a circularly symmetric complex Gaussian (CSCG) distribution for the BD's signal, this assumption may not be practical beca…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Submitted to IEEE Trans

  4. arXiv:2507.01485  [pdf, ps, other]

    cs.RO cs.AI cs.MA q-bio.QM

    BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments

    Authors: Yibo Qiu, Zan Huang, Zhiyu Wang, Handi Liu, Yiling Qiao, Yifeng Hu, Shu'ang Sun, Hangke Peng, Ronald X Xu, Mingzhai Sun

    Abstract: Large language models (LLMs) and vision-language models (VLMs) have the potential to transform biological research by enabling autonomous experimentation. Yet, their application remains constrained by rigid protocol design, limited adaptability to dynamic lab conditions, inadequate error handling, and high operational complexity. Here we introduce BioMARS (Biological Multi-Agent Robotic System), a…

    Submitted 2 July, 2025; originally announced July 2025.

  5. arXiv:2507.00839  [pdf, ps, other]

    cs.DB

    RapidStore: An Efficient Dynamic Graph Storage System for Concurrent Queries

    Authors: Chiyu Hao, Jixian Su, Shixuan Sun, Hao Zhang, Sen Gao, Jianwen Zhao, Chenyi Zhang, Jieru Zhao, Chen Chen, Minyi Guo

    Abstract: Dynamic graph storage systems are essential for real-time applications such as social networks and recommendation, where graph data continuously evolves. However, they face significant challenges in efficiently handling concurrent read and write operations. We find that existing methods suffer from write queries interfering with read efficiency, substantial time and space overhead due to per-edge…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 17 pages, 18 figures

  6. arXiv:2506.23549  [pdf, ps, other]

    cs.AI cs.HC cs.LG

    CooT: Learning to Coordinate In-Context with Coordination Transformers

    Authors: Huai-Chih Wang, Hsiang-Chun Chuang, Hsi-Chun Cheng, Dai-Jie Wu, Shao-Hua Sun

    Abstract: Effective coordination among artificial agents in dynamic and uncertain environments remains a significant challenge in multi-agent systems. Existing approaches, such as self-play and population-based methods, either generalize poorly to unseen partners or require extensive training. To overcome these limitations, we propose Coordination Transformers (CooT), a novel in-context coordination framewo…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: 23 pages, 10 tables, 8 figures

  7. arXiv:2506.18295  [pdf, ps, other]

    cs.LG cs.AI

    GeNeRT: A Physics-Informed Approach to Intelligent Wireless Channel Modeling via Generalizable Neural Ray Tracing

    Authors: Kejia Bian, Meixia Tao, Shu Sun, Jun Yu

    Abstract: Neural ray tracing (RT) has emerged as a promising paradigm for channel modeling by combining physical propagation principles with neural networks. It enables high modeling accuracy and efficiency. However, current neural RT methods face two key limitations: constrained generalization capability due to strong spatial dependence, and weak adherence to electromagnetic laws. In this paper, we propose…

    Submitted 23 June, 2025; originally announced June 2025.

  8. arXiv:2506.16773  [pdf, ps, other]

    cs.CV

    Infrared and Visible Image Fusion Based on Implicit Neural Representations

    Authors: Shuchen Sun, Ligen Shi, Chang Liu, Lina Wu, Jun Qiu

    Abstract: Infrared and visible light image fusion aims to combine the strengths of both modalities to generate images that are rich in information and fulfill visual or computational requirements. This paper proposes an image fusion method based on Implicit Neural Representations (INR), referred to as INRFuse. This method parameterizes a continuous function through a neural network to implicitly represent t…

    Submitted 20 June, 2025; originally announced June 2025.

  9. arXiv:2506.16716  [pdf, ps, other]

    cs.HC

    V-CASS: Vision-context-aware Expressive Speech Synthesis for Enhancing User Understanding of Videos

    Authors: Qixin Wang, Songtao Zhou, Zeyu Jin, Chenglin Guo, Shikun Sun, Xiaoyu Qin

    Abstract: Automatic video commentary systems are widely used on multimedia social media platforms to extract factual information about video content. However, current systems may overlook essential para-linguistic cues, including emotion and attitude, which are critical for fully conveying the meaning of visual content. The absence of these cues can limit user understanding or, in some cases, distort the vi…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted by IJCNN 2025

  10. arXiv:2506.16101  [pdf, ps, other]

    cs.SE

    Regression Testing Optimization for ROS-based Autonomous Systems: A Comprehensive Review of Techniques

    Authors: Yupeng Jiang, Shuaiyi Sun, Xi Zheng

    Abstract: Regression testing plays a critical role in maintaining software reliability, particularly for ROS-based autonomous systems (ROSAS), which frequently undergo continuous integration and iterative development. However, conventional regression testing techniques face significant challenges when applied to autonomous systems due to their dynamic and non-deterministic behaviors, complex multi-modal sen…

    Submitted 19 June, 2025; originally announced June 2025.

  11. arXiv:2506.14853  [pdf, ps, other]

    q-bio.QM cs.LG

    DisProtEdit: Exploring Disentangled Representations for Multi-Attribute Protein Editing

    Authors: Max Ku, Sun Sun, Hongyu Guo, Wenhu Chen

    Abstract: We introduce DisProtEdit, a controllable protein editing framework that leverages dual-channel natural language supervision to learn disentangled representations of structural and functional properties. Unlike prior approaches that rely on joint holistic embeddings, DisProtEdit explicitly separates semantic factors, enabling modular and interpretable control. To support this, we construct SwissPro…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted to ICMLW (GenBio) 2025 and ICMLW (FM4LS) 2025

  12. arXiv:2506.14851  [pdf, ps, other]

    cs.DC cs.AI cs.LG

    Efficient Serving of LLM Applications with Probabilistic Demand Modeling

    Authors: Yifei Liu, Zuo Gan, Zhenghao Gan, Weiye Wang, Chen Chen, Yizhou Shan, Xusheng Chen, Zhenhua Han, Yifei Zhu, Shixuan Sun, Minyi Guo

    Abstract: Applications based on Large Language Models (LLMs) contain a series of tasks to address real-world problems with boosted capability, which have dynamic demand volumes on diverse backends. Existing serving systems treat the resource demands of LLM applications as a blackbox, compromising end-to-end efficiency due to improper queuing order and backend warm-up latency. We find that the resource dema…

    Submitted 16 June, 2025; originally announced June 2025.

  13. arXiv:2506.14516  [pdf, ps, other]

    cs.IR

    RMIT-ADM+S at the SIGIR 2025 LiveRAG Challenge

    Authors: Kun Ran, Shuoqi Sun, Khoi Nguyen Dinh Anh, Damiano Spina, Oleg Zendel

    Abstract: This paper presents the RMIT-ADM+S participation in the SIGIR 2025 LiveRAG Challenge. Our Generation-Retrieval-Augmented Generation (GRAG) approach relies on generating a hypothetical answer that is used in the retrieval phase, alongside the original question. GRAG also incorporates a pointwise large language model (LLM)-based re-ranking step prior to final answer generation. We describe the syst…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted for oral presentation at SIGIR 2025 LiveRAG

  14. arXiv:2506.13485  [pdf, ps, other]

    q-bio.BM cs.LG

    Curriculum Learning for Biological Sequence Prediction: The Case of De Novo Peptide Sequencing

    Authors: Xiang Zhang, Jiaqi Wei, Zijie Qiu, Sheng Xu, Nanqing Dong, Zhiqiang Gao, Siqi Sun

    Abstract: Peptide sequencing, the process of identifying amino acid sequences from mass spectrometry data, is a fundamental task in proteomics. Non-Autoregressive Transformers (NATs) have proven highly effective for this task, outperforming traditional methods. Unlike autoregressive models, which generate tokens sequentially, NATs predict all positions simultaneously, leveraging bidirectional context through…

    Submitted 16 June, 2025; originally announced June 2025.

  15. arXiv:2506.12787  [pdf, ps, other]

    cs.CV

    Rasterizing Wireless Radiance Field via Deformable 2D Gaussian Splatting

    Authors: Mufan Liu, Cixiao Zhang, Qi Yang, Yujie Cao, Yiling Xu, Yin Xu, Shu Sun, Mingzeng Dai, Yunfeng Guan

    Abstract: Modeling the wireless radiance field (WRF) is fundamental to modern communication systems, enabling key tasks such as localization, sensing, and channel estimation. Traditional approaches, which rely on empirical formulas or physical simulations, often suffer from limited accuracy or require strong scene priors. Recent neural radiance field (NeRF)-based methods improve reconstruction fidelity thro…

    Submitted 18 June, 2025; v1 submitted 15 June, 2025; originally announced June 2025.

  16. arXiv:2506.12735  [pdf, ps, other]

    cs.LG cs.AI

    Revealing the Challenges of Sim-to-Real Transfer in Model-Based Reinforcement Learning via Latent Space Modeling

    Authors: Zhilin Lin, Shiliang Sun

    Abstract: Reinforcement learning (RL) is playing an increasingly important role in fields such as robotic control and autonomous driving. However, the gap between simulation and the real environment remains a major obstacle to the practical deployment of RL. Agents trained in simulators often struggle to maintain performance when transferred to real-world physical environments. In this paper, we propose a l…

    Submitted 15 June, 2025; originally announced June 2025.

  17. arXiv:2506.12430  [pdf, ps, other]

    cs.CR cs.CV

    Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025

    Authors: Zonghao Ying, Siyang Wu, Run Hao, Peng Ying, Shixuan Sun, Pengyu Chen, Junze Chen, Hao Du, Kaiwen Shen, Shangkun Wu, Jiwei Wei, Shiyuan He, Yang Yang, Xiaohai Xu, Ke Ma, Qianqian Xu, Qingming Huang, Shi Lin, Xun Wang, Changting Lin, Meng Han, Yilei Jiang, Siqi Lai, Yaozhi Zheng, Yifei Song, et al. (22 additional authors not shown)

    Abstract: Multimodal Large Language Models (MLLMs) have enabled transformative advancements across diverse applications but remain susceptible to safety threats, especially jailbreak attacks that induce harmful outputs. To systematically evaluate and improve their safety, we organized the Adversarial Testing & Large-model Alignment Safety Grand Challenge (ATLAS) 2025. This technical report presents finding…

    Submitted 14 June, 2025; originally announced June 2025.

  18. arXiv:2506.11659  [pdf]

    cs.SE

    An Empirical study on LLM-based Log Retrieval for Software Engineering Metadata Management

    Authors: Simin Sun, Yuchuan Jin, Miroslaw Staron

    Abstract: Developing autonomous driving systems (ADSs) involves generating and storing extensive log data from test drives, which is essential for verification, research, and simulation. However, these high-frequency logs, recorded over varying durations, pose challenges for developers attempting to locate specific driving scenarios. This difficulty arises due to the wide range of signals representing vario…

    Submitted 13 June, 2025; originally announced June 2025.

  19. arXiv:2506.10521  [pdf, ps, other]

    cs.AI cs.CL

    Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

    Authors: Yuhao Zhou, Yiheng Wang, Xuming He, Ruoyao Xiao, Zhiwei Li, Qiantai Feng, Zijie Guo, Yuejin Yang, Hao Wu, Wenxuan Huang, Jiaqi Wei, Dan Si, Xiuqi Yao, Jia Bu, Haiwen Huang, Tianfan Fu, Shixiang Tang, Ben Fei, Dongzhan Zhou, Fenghua Ling, Yan Lu, Siqi Sun, Chenhui Li, Guanjie Zheng, Jiancheng Lv, et al. (2 additional authors not shown)

    Abstract: Scientific discoveries increasingly rely on complex multimodal reasoning based on information-intensive scientific data and domain-specific expertise. Empowered by expert-level scientific benchmarks, scientific Multimodal Large Language Models (MLLMs) hold the potential to significantly enhance this discovery process in realistic workflows. However, current scientific benchmarks mostly focus on ev…

    Submitted 25 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 82 pages

  20. arXiv:2506.09954  [pdf, ps, other]

    cs.CV cs.AI

    Vision Generalist Model: A Survey

    Authors: Ziyi Wang, Yongming Rao, Shuofeng Sun, Xinrun Liu, Yi Wei, Xumin Yu, Zuyan Liu, Yanbo Wang, Hongmin Liu, Jie Zhou, Jiwen Lu

    Abstract: Recently, we have witnessed the great success of the generalist model in natural language processing. The generalist model is a general framework trained with massive data and is able to process various downstream tasks simultaneously. Encouraged by their impressive performance, an increasing number of researchers are venturing into the realm of applying these models to computer vision tasks. Howe…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted by International Journal of Computer Vision (IJCV)

  21. arXiv:2506.09937  [pdf, ps, other]

    cs.RO cs.AI

    SAFE: Multitask Failure Detection for Vision-Language-Action Models

    Authors: Qiao Gu, Yuanliang Ju, Shengxiang Sun, Igor Gilitschenski, Haruki Nishimura, Masha Itkina, Florian Shkurti

    Abstract: While vision-language-action models (VLAs) have shown promising robotic behaviors across a diverse set of manipulation tasks, they achieve limited success rates when deployed on novel tasks out-of-the-box. To allow these policies to safely interact with their environments, we need a failure detector that gives a timely alert such that the robot can stop, backtrack, or ask for help. However, existi…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Project Page: https://vla-safe.github.io/

  22. arXiv:2506.08844  [pdf, ps, other]

    cs.LG cs.CE

    IMAGIC-500: IMputation benchmark on A Generative Imaginary Country (500k samples)

    Authors: Siyi Sun, David Antony Selby, Yunchuan Huang, Sebastian Vollmer, Seth Flaxman, Anisoara Calinescu

    Abstract: Missing data imputation in tabular datasets remains a pivotal challenge in data science and machine learning, particularly within socioeconomic research. However, real-world socioeconomic datasets are typically subject to strict data protection protocols, which often prohibit public sharing, even for synthetic derivatives. This severely limits the reproducibility and accessibility of benchmark stu…

    Submitted 10 June, 2025; originally announced June 2025.

  23. arXiv:2506.07971  [pdf, ps, other]

    cs.CV

    CyberV: Cybernetics for Test-time Scaling in Video Understanding

    Authors: Jiahao Meng, Shuyang Sun, Yue Tan, Lu Qi, Yunhai Tong, Xiangtai Li, Longyin Wen

    Abstract: Current Multimodal Large Language Models (MLLMs) may struggle with understanding long or complex videos due to computational demands at test time, lack of robustness, and limited accuracy, primarily stemming from their feed-forward processing nature. These limitations could be more severe for models with fewer parameters. To address these limitations, we propose a novel framework inspired by cyber…

    Submitted 9 June, 2025; originally announced June 2025.

  24. arXiv:2506.07019  [pdf, ps, other]

    cs.IT eess.SP

    Passive Detection in Multi-Static ISAC Systems: Performance Analysis and Joint Beamforming Optimization

    Authors: Renjie He, Yiqiu Wang, Meixia Tao, Shu Sun

    Abstract: This paper investigates the passive detection problem in multi-static integrated sensing and communication (ISAC) systems, where multiple sensing receivers (SRs) jointly detect a target using random unknown communication signals transmitted by a collaborative base station. Unlike traditional active detection, the considered passive detection does not require complete prior knowledge of the transmi…

    Submitted 8 June, 2025; originally announced June 2025.

  25. arXiv:2506.06820  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs

    Authors: Wenyu Zhang, Yingxu He, Geyu Lin, Zhuohan Liu, Shuo Sun, Bin Wang, Xunlong Zou, Jeremy H. M. Wong, Qiongqiong Wang, Hardik B. Sailor, Nancy F. Chen, Ai Ti Aw

    Abstract: Audio Large Language Models (AudioLLMs) have achieved strong results in semantic tasks like speech recognition and translation, but remain limited in modeling paralinguistic cues such as emotion. Existing approaches often treat emotion understanding as a classification problem, offering little insight into the underlying rationale behind predictions. In this work, we explore emotion reasoning, a s…

    Submitted 7 June, 2025; originally announced June 2025.

  26. arXiv:2506.05682  [pdf, ps, other]

    cs.AR

    Lumina: Real-Time Mobile Neural Rendering by Exploiting Computational Redundancy

    Authors: Yu Feng, Weikai Lin, Yuge Cheng, Zihan Liu, Jingwen Leng, Minyi Guo, Chen Chen, Shixuan Sun, Yuhao Zhu

    Abstract: 3D Gaussian Splatting (3DGS) has vastly advanced the pace of neural rendering, but it remains computationally demanding on today's mobile SoCs. To address this challenge, we propose Lumina, a hardware-algorithm co-designed system, which integrates two principal optimizations: a novel algorithm, S^2, and a radiance caching mechanism, RC, to improve the efficiency of neural rendering. The S^2 algorithm e…

    Submitted 5 June, 2025; originally announced June 2025.

  27. arXiv:2506.03673  [pdf, ps, other]

    cs.AI

    Reason from Future: Reverse Thought Chain Enhances LLM Reasoning

    Authors: Yinlong Xu, Yanzhao Zheng, Shuoshuo Sun, Shuaihan Huang, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu, Hongxia Xu, Jian Wu

    Abstract: It has been demonstrated that carefully designed reasoning paradigms, like Chain-of-Thought (CoT) and Tree-of-Thought (ToT), can enhance the reasoning capabilities of small language models through detailed thinking and extensive thought searching, but unbounded branching factors in the search space create prohibitive reasoning consumption. However, these methods fall into the trap of local optimum reason…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025 findings

  28. arXiv:2506.02327  [pdf, ps, other]

    cs.CV

    Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning

    Authors: Yijun Yang, Zhao-Yang Wang, Qiuping Liu, Shuwen Sun, Kang Wang, Rama Chellappa, Zongwei Zhou, Alan Yuille, Lei Zhu, Yu-Dong Zhang, Jieneng Chen

    Abstract: Providing effective treatment and making informed clinical decisions are essential goals of modern medicine and clinical care. We are interested in simulating disease dynamics for clinical decision-making, leveraging recent advances in large generative models. To this end, we introduce the Medical World Model (MeWM), the first world model in medicine that visually predicts future disease states ba…

    Submitted 2 June, 2025; originally announced June 2025.

  29. arXiv:2506.02079  [pdf, ps, other]

    cs.LG cs.AI cs.CV stat.ML

    Robust Federated Learning against Noisy Clients via Masked Optimization

    Authors: Xuefeng Jiang, Tian Wen, Zhiqin Yang, Lvhua Wu, Yufeng Chen, Sheng Sun, Yuwei Wang, Min Liu

    Abstract: In recent years, federated learning (FL) has made significant advances in privacy-sensitive applications. However, it can be hard to ensure that FL participants provide well-annotated data for training. The corresponding annotations from different clients often contain complex label noise at varying levels. This label noise issue has a substantial impact on the performance of the trained models, an…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Under review

  30. arXiv:2506.01672  [pdf, ps, other]

    cs.LG

    Minimal Impact ControlNet: Advancing Multi-ControlNet Integration

    Authors: Shikun Sun, Min Zhou, Zixuan Wang, Xubin Li, Tiezheng Ge, Zijie Ye, Xiaoyu Qin, Junliang Xing, Bo Zheng, Jia Jia

    Abstract: With the advancement of diffusion models, there is a growing demand for high-quality, controllable image generation, particularly through methods that utilize one or multiple control signals based on ControlNet. However, in current ControlNet training, each control is designed to influence all areas of an image, which can lead to conflicts when different control signals are expected to manage diff…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: ICLR 2025

  31. A Reverse Causal Framework to Mitigate Spurious Correlations for Debiasing Scene Graph Generation

    Authors: Shuzhou Sun, Li Liu, Tianpeng Liu, Shuaifeng Zhi, Ming-Ming Cheng, Janne Heikkilä, Yongxiang Liu

    Abstract: Existing two-stage Scene Graph Generation (SGG) frameworks typically incorporate a detector to extract relationship features and a classifier to categorize these relationships; therefore, the training paradigm follows a causal chain structure, where the detector's inputs determine the classifier's inputs, which in turn influence the final predictions. However, such a causal chain structure can yie…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 21 pages, 11 figures, 12 tables

    ACM Class: I.2.10; I.4.8

  32. arXiv:2505.23275  [pdf, ps, other]

    cs.NI

    Wireless Agentic AI with Retrieval-Augmented Multimodal Semantic Perception

    Authors: Guangyuan Liu, Yinqiu Liu, Ruichen Zhang, Hongyang Du, Dusit Niyato, Zehui Xiong, Sumei Sun, Abbas Jamalipour

    Abstract: The rapid development of multimodal AI and Large Language Models (LLMs) has greatly enhanced real-time interaction, decision-making, and collaborative tasks. However, in wireless multi-agent scenarios, limited bandwidth poses significant challenges to exchanging semantically rich multimodal information efficiently. Traditional semantic communication methods, though effective, struggle with redunda…

    Submitted 29 May, 2025; originally announced May 2025.

  33. arXiv:2505.21666  [pdf, ps, other]

    cs.LG cs.AI

    Efficient Controllable Diffusion via Optimal Classifier Guidance

    Authors: Owen Oertell, Shikun Sun, Yiding Chen, Jin Peng Zhou, Zhiyong Wang, Wen Sun

    Abstract: The controllable generation of diffusion models aims to steer the model to generate samples that optimize some given objective functions. It is desirable for a variety of applications including image generation, molecule generation, and DNA/sequence generation. Reinforcement Learning (RL) based fine-tuning of the base model is a popular approach but it can overfit the reward function while requiri…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 28 pages, 9 figures, 3 tables

  34. arXiv:2505.21184  [pdf, other]

    cs.LG cs.AI cs.CL

    PoisonSwarm: Universal Harmful Information Synthesis via Model Crowdsourcing

    Authors: Yu Yan, Sheng Sun, Zhifei Zheng, Ziji Hao, Teli Liu, Min Liu

    Abstract: To construct responsible and secure AI applications, harmful information data is widely utilized for adversarial testing and the development of safeguards. Existing studies mainly leverage Large Language Models (LLMs) to synthesize data to obtain high-quality task datasets at scale, thereby avoiding costly human annotation. However, limited by the safety alignment mechanisms of LLMs, the synthesis…

    Submitted 27 May, 2025; originally announced May 2025.

  35. arXiv:2505.21180  [pdf, ps, other]

    cs.LG cs.AI

    Latent label distribution grid representation for modeling uncertainty

    Authors: ShuNing Sun, YinSong Xiong, Yu Zhang, Zhuoran Zheng

    Abstract: Although Label Distribution Learning (LDL) has promising representation capabilities for characterizing the polysemy of an instance, the complexity and high cost of the label distribution annotation lead to inexactness in the construction of the label space. The existence of a large number of inexact labels generates a label space with uncertainty, which misleads the LDL alg…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Under review

  36. arXiv:2505.20694  [pdf, ps, other]

    cs.CV cs.LG

    Temporal Saliency-Guided Distillation: A Scalable Framework for Distilling Video Datasets

    Authors: Xulin Gu, Xinhao Zhong, Zhixing Wei, Yimin Zhou, Shuoyang Sun, Bin Chen, Hongpeng Wang, Yuan Luo

    Abstract: Dataset distillation (DD) has emerged as a powerful paradigm for dataset compression, enabling the synthesis of compact surrogate datasets that approximate the training utility of large-scale ones. While significant progress has been achieved in distilling image datasets, extending DD to the video domain remains challenging due to the high dimensionality and temporal complexity inherent in video d…

    Submitted 27 May, 2025; originally announced May 2025.

  37. arXiv:2505.19507  [pdf, ps, other]

    cs.CV cs.LG

    Multimodal Machine Translation with Visual Scene Graph Pruning

    Authors: Chenyu Lu, Shiliang Sun, Jing Zhao, Nan Zhang, Tengfei Song, Hao Yang

    Abstract: Multimodal machine translation (MMT) seeks to address the challenges posed by linguistic polysemy and ambiguity in translation tasks by incorporating visual information. A key bottleneck in current MMT research is the effective utilization of visual data. Previous approaches have focused on extracting global or region-level image features and using attention or gating mechanisms for multimodal inf…

    Submitted 26 May, 2025; originally announced May 2025.

  38. arXiv:2505.17552  [pdf, ps, other]

    cs.LG cs.AI

    Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing

    Authors: Zijie Qiu, Jiaqi Wei, Xiang Zhang, Sheng Xu, Kai Zou, Zhi Jin, Zhiqiang Gao, Nanqing Dong, Siqi Sun

    Abstract: De novo peptide sequencing is a critical task in proteomics. However, the performance of current deep learning-based methods is limited by the inherent complexity of mass spectrometry data and the heterogeneous distribution of noise signals, leading to data-specific biases. We present RankNovo, the first deep reranking framework that enhances de novo peptide sequencing by leveraging the complement…

    Submitted 30 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  39. arXiv:2505.16834  [pdf, other]

    cs.CL cs.AI cs.IR

    SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis

    Authors: Shuang Sun, Huatong Song, Yuhao Wang, Ruiyang Ren, Jinhao Jiang, Junjie Zhang, Fei Bai, Jia Deng, Wayne Xin Zhao, Zheng Liu, Lei Fang, Zhongyuan Wang, Ji-Rong Wen

    Abstract: Retrieval-augmented generation (RAG) systems have advanced large language models (LLMs) in complex deep search scenarios requiring multi-step reasoning and iterative information retrieval. However, existing approaches face critical limitations: they lack high-quality training trajectories or suffer from distributional mismatches in simulated environments and prohibitive computational costs for…

    Submitted 25 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

  40. arXiv:2505.16811  [pdf, ps, other]

    cs.CV

    Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining

    Authors: Shangquan Sun, Wenqi Ren, Juxiang Zhou, Shu Wang, Jianhou Gan, Xiaochun Cao

    Abstract: Significant progress has been made in video restoration under rainy conditions over the past decade, largely propelled by advancements in deep learning. Nevertheless, existing methods that depend on paired data struggle to generalize effectively to real-world scenarios, primarily due to the disparity between synthetic and authentic rain effects. To address these limitations, we propose a dual-bran…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 11 Pages, 8 figures, CVPR 2025 Oral Presentation

  41. arXiv:2505.16774  [pdf, ps, other]

    cs.CL

    IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models

    Authors: Yiming Gao, Bin Wang, Chengwei Wei, Shuo Sun, AiTi Aw

    Abstract: Large language models (LLMs) have demonstrated strong instruction-following capabilities in text-based tasks. However, this ability often deteriorates in multimodal models after alignment with non-text modalities such as images or audio. While several recent efforts have investigated instruction-following performance in text and vision-language models, instruction-following in audio-based large la…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Link: https://github.com/AudioLLMs/AudioBench/tree/main/IFEval-Audio

  42. arXiv:2505.16502  [pdf, ps, other]

    cs.DC cs.NI

    Recursive Offloading for LLM Serving in Multi-tier Networks

    Authors: Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Bo Gao, Jinda Lu, Zheming Yang, Tian Wen

    Abstract: Heterogeneous device-edge-cloud computing infrastructures have become widely adopted in telecommunication operators and Wide Area Networks (WANs), offering multi-tier computational support for emerging intelligent services. With the rapid proliferation of Large Language Model (LLM) services, efficiently coordinating inference tasks and reducing communication overhead within these multi-tier networ…

    Submitted 24 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: 7 figures, 3 tables

  43. arXiv:2505.16164  [pdf, ps, other]

    cs.CL

    Can LLMs Simulate Human Behavioral Variability? A Case Study in the Phonemic Fluency Task

    Authors: Mengyang Qiu, Zoe Brisebois, Siena Sun

    Abstract: Large language models (LLMs) are increasingly explored as substitutes for human participants in cognitive tasks, but their ability to simulate human behavioral variability remains unclear. This study examines whether LLMs can approximate individual differences in the phonemic fluency task, where participants generate words beginning with a target letter. We evaluated 34 model configurations, varyi…

    Submitted 21 May, 2025; originally announced May 2025.

  44. arXiv:2505.13319  [pdf, ps, other]

    cs.CR cs.DC

    SVAFD: A Secure and Verifiable Co-Aggregation Protocol for Federated Distillation

    Authors: Tian Wen, Sheng Sun, Yuwei Wang, Peiyan Chen, Zhiyuan Wu, Min Liu, Bo Gao

    Abstract: Secure Aggregation (SA) is an indispensable component of Federated Learning (FL) that concentrates on privacy preservation while allowing for robust aggregation. However, most SA designs rely heavily on the unrealistic assumption of homogeneous model architectures. Federated Distillation (FD), which aggregates locally computed logits instead of model parameters, introduces a promising alternative…

    Submitted 20 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: 15 pages, 16 figures, 3 tables, 27 equations

  45. arXiv:2505.13175  [pdf, other]

    cs.AI

    Enhancing LLMs for Time Series Forecasting via Structure-Guided Cross-Modal Alignment

    Authors: Siming Sun, Kai Zhang, Xuejun Jiang, Wenchao Meng, Qinmin Yang

    Abstract: The emerging paradigm of leveraging pretrained large language models (LLMs) for time series forecasting has predominantly employed linguistic-temporal modality alignment strategies through token-level or layer-wise feature mapping. However, these approaches fundamentally neglect a critical insight: the core competency of LLMs resides not merely in processing localized token features but in their i…

    Submitted 19 May, 2025; originally announced May 2025.

  46. arXiv:2505.12844  [pdf, ps, other]

    cs.AI cs.RO

    AGI-Elo: How Far Are We From Mastering A Task?

    Authors: Shuo Sun, Yimin Zhao, Christina Dao Wen Lee, Jiawei Sun, Chengran Yuan, Zefan Huang, Dongen Li, Justin KW Yeoh, Alok Prakash, Thomas W. Malone, Marcelo H. Ang Jr

    Abstract: As the field progresses toward Artificial General Intelligence (AGI), there is a pressing need for more comprehensive and insightful evaluation frameworks that go beyond aggregate performance metrics. This paper introduces a unified rating system that jointly models the difficulty of individual test cases and the competency of AI models (or humans) across vision, language, and action domains. Unli…

    Submitted 24 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  47. arXiv:2505.09315  [pdf, other]

    cs.RO cs.CV cs.LG

    TransDiffuser: End-to-end Trajectory Generation with Decorrelated Multi-modal Representation for Autonomous Driving

    Authors: Xuefeng Jiang, Yuan Ma, Pengxiang Li, Leimeng Xu, Xin Wen, Kun Zhan, Zhongpu Xia, Peng Jia, XianPeng Lang, Sheng Sun

    Abstract: In recent years, diffusion model has shown its potential across diverse domains from vision generation to language modeling. Transferring its capabilities to modern autonomous driving systems has also emerged as a promising direction. In this work, we propose TransDiffuser, an encoder-decoder based generative trajectory planning model for end-to-end autonomous driving. The encoded scene information…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: Under review

  48. arXiv:2505.08392  [pdf, other]

    cs.CL cs.AI

    Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping

    Authors: Ren Zhuang, Ben Wang, Shuifa Sun

    Abstract: Large Language Models leverage Chain-of-Thought (CoT) prompting for complex tasks, but their reasoning traces are often excessively verbose and inefficient, leading to significant computational costs and latency. Current CoT compression techniques typically rely on generic importance metrics and static compression rates, which may inadvertently remove functionally critical tokens or fail to adapt…

    Submitted 17 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  49. arXiv:2505.08366  [pdf]

    eess.SP cs.AI

    Non-contact Vital Signs Detection in Dynamic Environments

    Authors: Shuai Sun, Chong-Xi Liang, Chengwei Ye, Huanzhen Zhang, Kangsheng Wang

    Abstract: Accurate phase demodulation is critical for vital sign detection using millimeter-wave radar. However, in complex environments, time-varying DC offsets and phase imbalances can severely degrade demodulation performance. To address this, we propose a novel DC offset calibration method alongside a Hilbert and Differential Cross-Multiply (HADCM) demodulation algorithm. The approach estimates time-var…

    Submitted 13 May, 2025; originally announced May 2025.

  50. arXiv:2505.08157  [pdf, other]

    cs.IR cs.AI

    Hyperbolic Contrastive Learning with Model-augmentation for Knowledge-aware Recommendation

    Authors: Shengyin Sun, Chen Ma

    Abstract: Benefiting from the effectiveness of graph neural networks (GNNs) and contrastive learning, GNN-based contrastive learning has become mainstream for knowledge-aware recommendation. However, most existing contrastive learning-based methods have difficulties in effectively capturing the underlying hierarchical structure within user-item bipartite graphs and knowledge graphs. Moreover, they commonly…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 18 pages