Showing 1–50 of 1,111 results for author: Gao, X

  1. arXiv:2505.10415  [pdf, ps, other]

    cs.RO cs.HC

    Internal State Estimation in Groups via Active Information Gathering

    Authors: Xuebo Ji, Zherong Pan, Xifeng Gao, Lei Yang, Xinxin Du, Kaiyun Li, Yongjin Liu, Wenping Wang, Changhe Tu, Jia Pan

    Abstract: Accurately estimating human internal states, such as personality traits or behavioral patterns, is critical for enhancing the effectiveness of human-robot interaction, particularly in group settings. These insights are key in applications ranging from social navigation to autism diagnosis. However, prior methods are limited by scalability and passive observation, making real-time estimation in com…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09783  [pdf]

    stat.AP cs.LG

    Pure Component Property Estimation Framework Using Explainable Machine Learning Methods

    Authors: Jianfeng Jiao, Xi Gao, Jie Li

    Abstract: Accurate prediction of pure component physiochemical properties is crucial for process integration, multiscale modeling, and optimization. In this work, an enhanced framework for pure component property prediction by using explainable machine learning methods is proposed. In this framework, the molecular representation method based on the connectivity matrix effectively considers atomic bonding re…

    Submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2505.08854  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    Generative AI for Autonomous Driving: Frontiers and Opportunities

    Authors: Yuping Wang, Shuo Xing, Cui Can, Renjie Li, Hongyuan Hua, Kexin Tian, Zhaobin Mo, Xiangbo Gao, Keshu Wu, Sulong Zhou, Hengxu You, Juntong Peng, Junge Zhang, Zehao Wang, Rui Song, Mingxuan Yan, Walter Zimmer, Xingcheng Zhou, Peiran Li, Zhaohan Lu, Chia-Ju Chen, Yue Huang, Ryan A. Rossi, Lichao Sun, Hongkai Yu, et al. (22 additional authors not shown)

    Abstract: Generative Artificial Intelligence (GenAI) constitutes a transformative technological wave that reconfigures industries through its unparalleled capabilities for content creation, reasoning, planning, and multimodal understanding. This revolutionary force offers the most promising path yet toward solving one of engineering's grandest challenges: achieving reliable, fully autonomous driving, partic…

    Submitted 13 May, 2025; originally announced May 2025.

  4. arXiv:2505.08830  [pdf, ps, other]

    cs.CR cs.AI

    Federated Large Language Models: Feasibility, Robustness, Security and Future Directions

    Authors: Wenhao Jiang, Yuchuan Luo, Guilin Deng, Silong Chen, Xu Yang, Shihong Wu, Xinwen Gao, Lin Liu, Shaojing Fu

    Abstract: The integration of Large Language Models (LLMs) and Federated Learning (FL) presents a promising solution for joint training on distributed data while preserving privacy and addressing data silo issues. However, this emerging field, known as Federated Large Language Models (FLLM), faces significant challenges, including communication and computation overheads, heterogeneity, privacy and security c…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 35 pages

  5. arXiv:2505.08316  [pdf, ps, other]

    cs.CE cs.CV

    Improving Unsupervised Task-driven Models of Ventral Visual Stream via Relative Position Predictivity

    Authors: Dazhong Rong, Hao Dong, Xing Gao, Jiyu Wei, Di Hong, Yaoyao Hao, Qinming He, Yueming Wang

    Abstract: Based on the concept that ventral visual stream (VVS) mainly functions for object recognition, current unsupervised task-driven methods model VVS by contrastive learning, and have achieved good brain similarity. However, we believe functions of VVS extend beyond just object recognition. In this paper, we introduce an additional function involving VVS, named relative position (RP) prediction. We fi…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted for full publication at CogSci 2025 (https://cognitivesciencesociety.org/cogsci-2025/)

  6. arXiv:2505.07894  [pdf, other]

    cs.NI cs.ET cs.LG eess.SP math.ST

    EnvCDiff: Joint Refinement of Environmental Information and Channel Fingerprints via Conditional Generative Diffusion Model

    Authors: Zhenzhou Jin, Li You, Xiang-Gen Xia, Xiqi Gao

    Abstract: The paradigm shift from environment-unaware communication to intelligent environment-aware communication is expected to facilitate the acquisition of channel state information for future wireless communications. Channel Fingerprint (CF), as an emerging enabling technology for environment-aware communication, provides channel-related knowledge for potential locations within the target communication…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 6 pages, 2 figures

  7. arXiv:2505.07893  [pdf, other]

    cs.NI cs.LG eess.SP math.PR math.ST

    Channel Fingerprint Construction for Massive MIMO: A Deep Conditional Generative Approach

    Authors: Zhenzhou Jin, Li You, Xudong Li, Zhen Gao, Yuanwei Liu, Xiang-Gen Xia, Xiqi Gao

    Abstract: Accurate channel state information (CSI) acquisition for massive multiple-input multiple-output (MIMO) systems is essential for future mobile communication networks. Channel fingerprint (CF), also referred to as channel knowledge map, is a key enabler for intelligent environment-aware communication and can facilitate CSI acquisition. However, due to the cost limitations of practical sensing nodes…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 15 pages, 7 figures

  8. arXiv:2505.07687  [pdf, ps, other]

    eess.IV cs.CV

    ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation

    Authors: Feng Yuan, Yifan Gao, Wenbin Wu, Keqing Wu, Xiaotong Guo, Jie Jiang, Xin Gao

    Abstract: Accurate multi-modal medical image translation requires harmonizing global anatomical semantics and local structural fidelity, a challenge complicated by intermodality information loss and structural distortion. We propose ABS-Mamba, a novel architecture integrating the Segment Anything Model 2 (SAM2) for organ-aware semantic representation, specialized convolutional neural networks (CNNs) for pr…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: MICCAI 2025(under view)

  9. arXiv:2505.07539  [pdf, ps, other]

    cs.CV

    GIFStream: 4D Gaussian-based Immersive Video with Feature Stream

    Authors: Hao Li, Sicheng Li, Xiang Gao, Abudouaihati Batuer, Lu Yu, Yiyi Liao

    Abstract: Immersive video offers a 6-Dof-free viewing experience, potentially playing a key role in future video technology. Recently, 4D Gaussian Splatting has gained attention as an effective approach for immersive video due to its high rendering efficiency and quality, though maintaining quality with manageable storage remains challenging. To address this, we introduce GIFStream, a novel 4D Gaussian repr…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 14 pages, 10 figures

  10. arXiv:2505.06900  [pdf, other]

    eess.SP cs.IT cs.LG

    Near-Field Channel Estimation for XL-MIMO: A Deep Generative Model Guided by Side Information

    Authors: Zhenzhou Jin, Li You, Derrick Wing Kwan Ng, Xiang-Gen Xia, Xiqi Gao

    Abstract: This paper investigates the near-field (NF) channel estimation (CE) for extremely large-scale multiple-input multiple-output (XL-MIMO) systems. Considering the pronounced NF effects in XL-MIMO communications, we first establish a joint angle-distance (AD) domain-based spherical-wavefront physical channel model that captures the inherent sparsity of XL-MIMO channels. Leveraging the channel's sparsi…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 15 pages, 11 figures, to appear on IEEE Transactions on Cognitive Communications and Networking

  11. arXiv:2505.06411  [pdf, ps, other]

    cs.CV cs.AI

    MAGE:A Multi-stage Avatar Generator with Sparse Observations

    Authors: Fangyu Du, Yang Yang, Xuehao Gao, Hongye Hou

    Abstract: Inferring full-body poses from Head Mounted Devices, which capture only 3-joint observations from the head and wrists, is a challenging task with wide AR/VR applications. Previous attempts focus on learning one-stage motion mapping and thus suffer from an over-large inference space for unobserved body joint motions. This often leads to unsatisfactory lower-body predictions and poor temporal consis…

    Submitted 9 May, 2025; originally announced May 2025.

  12. arXiv:2505.06146  [pdf, ps, other]

    cs.DS cs.CC cs.LG

    Learning-Augmented Algorithms for Boolean Satisfiability

    Authors: Idan Attias, Xing Gao, Lev Reyzin

    Abstract: Learning-augmented algorithms are a prominent recent development in beyond worst-case analysis. In this framework, a problem instance is provided with a prediction ("advice") from a machine-learning oracle, which provides partial information about an optimal solution, and the goal is to design algorithms that leverage this advice to improve worst-case performance. We study the classic Boolean sa…

    Submitted 9 May, 2025; originally announced May 2025.

  13. Statistical CSI Acquisition for Multi-frequency Massive MIMO Systems

    Authors: Jinke Tang, Li You, Xinrui Gong, Chenjie Xie, Xiqi Gao, Xiang-Gen Xia, Xueyuan Shi

    Abstract: Multi-frequency massive multi-input multi-output (MIMO) communication is a promising strategy for both 5G and future 6G systems, ensuring reliable transmission while enhancing frequency resource utilization. Statistical channel state information (CSI) has been widely adopted in multi-frequency massive MIMO transmissions to reduce overhead and improve transmission performance. In this paper, we pro…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 15 pages, 9 figures. Accepted for publication on IEEE Transactions on Communications

  14. Massive MIMO-OFDM Channel Acquisition with Time-Frequency Phase-Shifted Pilots

    Authors: Jinke Tang, Xiqi Gao, Li You, Ding Shi, Jiyuan Yang, Xiang-Gen Xia, Xinwei Zhao, Peigang Jiang

    Abstract: In this paper, we propose a channel acquisition approach with time-frequency phase-shifted pilots (TFPSPs) for massive multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. We first present a triple-beam (TB) based channel tensor model, allowing for the representation of the space-frequency-time (SFT) domain channel as the product of beam matrices and the TB doma…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 15 pages, 10 figures. Accepted for publication on IEEE Transactions on Communications

  15. arXiv:2505.04905  [pdf, ps, other]

    cs.CV

    Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization

    Authors: Xi Yang, Songsong Duan, Nannan Wang, Xinbo Gao

    Abstract: Weakly Supervised Object Localization (WSOL), which aims to localize objects by only using image-level labels, has attracted much attention because of its low annotation cost in real applications. Current studies focus on the Class Activation Map (CAM) of CNN and the self-attention map of transformer to identify the region of objects. However, both CAM and self-attention maps can not learn pixel-l…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: Accepted by ECCV 2024

  16. arXiv:2505.04758  [pdf, other]

    cs.CV

    Lightweight RGB-D Salient Object Detection from a Speed-Accuracy Tradeoff Perspective

    Authors: Songsong Duan, Xi Yang, Nannan Wang, Xinbo Gao

    Abstract: Current RGB-D methods usually leverage large-scale backbones to improve accuracy but sacrifice efficiency. Meanwhile, several existing lightweight methods are difficult to achieve high-precision performance. To balance the efficiency and performance, we propose a Speed-Accuracy Tradeoff Network (SATNet) for Lightweight RGB-D SOD from three fundamental perspectives: depth quality, modality fusion,…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: Accepted by TIP 2025

  17. GNN-enabled Precoding for Massive MIMO LEO Satellite Communications

    Authors: Huibin Zhou, Xinrui Gong, Christos G. Tsinos, Li You, Xiqi Gao, Björn Ottersten

    Abstract: Low Earth Orbit (LEO) satellite communication is a critical component in the development of sixth generation (6G) networks. The integration of massive multiple-input multiple-output (MIMO) technology is being actively explored to enhance the performance of LEO satellite communications. However, the limited power of LEO satellites poses a significant challenge in improving communication energy effi…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 14 pages, 13 figures

  18. arXiv:2505.01932  [pdf, other]

    cs.GR cs.CV

    OT-Talk: Animating 3D Talking Head with Optimal Transportation

    Authors: Xinmu Wang, Xiang Gao, Xiyun Song, Heather Yu, Zongfang Lin, Liang Peng, Xianfeng Gu

    Abstract: Animating 3D head meshes using audio inputs has significant applications in AR/VR, gaming, and entertainment through 3D avatars. However, bridging the modality gap between speech signals and facial dynamics remains a challenge, often resulting in incorrect lip syncing and unnatural facial movements. To address this, we propose OT-Talk, the first approach to leverage optimal transportation to optim…

    Submitted 10 May, 2025; v1 submitted 3 May, 2025; originally announced May 2025.

  19. arXiv:2505.01168  [pdf, other]

    cs.LG cs.AI

    Harmonizing Intra-coherence and Inter-divergence in Ensemble Attacks for Adversarial Transferability

    Authors: Zhaoyang Ma, Zhihao Wu, Wang Lu, Xin Gao, Jinghang Yue, Taolin Zhang, Lipo Wang, Youfang Lin, Jing Wang

    Abstract: The development of model ensemble attacks has significantly improved the transferability of adversarial examples, but this progress also poses severe threats to the security of deep neural networks. Existing methods, however, face two critical challenges: insufficient capture of shared gradient directions across models and a lack of adaptive weight allocation mechanisms. To address these issues, w…

    Submitted 2 May, 2025; originally announced May 2025.

  20. arXiv:2504.19244  [pdf, other]

    cs.CV

    Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID

    Authors: De Cheng, Lingfeng He, Nannan Wang, Dingwen Zhang, Xinbo Gao

    Abstract: Unsupervised visible-infrared person re-identification (USL-VI-ReID) seeks to match pedestrian images of the same individual across different modalities without human annotations for model learning. Previous methods unify pseudo-labels of cross-modality images through label association algorithms and then design contrastive learning framework for global feature learning. However, these methods ove…

    Submitted 5 May, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

    Comments: Accepted by IJCV 2025

  21. arXiv:2504.19093  [pdf, other]

    cs.CR cs.AI cs.PF

    CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges

    Authors: Yu Li, Qizhi Pei, Mengyuan Sun, Honglin Lin, Chenlin Ming, Xin Gao, Jiang Wu, Conghui He, Lijun Wu

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities, especially the recent advancements in reasoning, such as o1 and o3, pushing the boundaries of AI. Despite these impressive achievements in mathematics and coding, the reasoning abilities of LLMs in domains requiring cryptographic expertise remain underexplored. In this paper, we introduce CipherBank, a comprehensive benchmark…

    Submitted 26 April, 2025; originally announced April 2025.

    Comments: Work in progress

  22. arXiv:2504.18866  [pdf, other]

    cs.CV

    PiercingEye: Dual-Space Video Violence Detection with Hyperbolic Vision-Language Guidance

    Authors: Jiaxu Leng, Zhanjie Wu, Mingpi Tan, Mengjingcheng Mo, Jiankang Zheng, Qingqing Li, Ji Gan, Xinbo Gao

    Abstract: Existing weakly supervised video violence detection (VVD) methods primarily rely on Euclidean representation learning, which often struggles to distinguish visually similar yet semantically distinct events due to limited hierarchical modeling and insufficient ambiguous training samples. To address this challenge, we propose PiercingEye, a novel dual-space learning framework that synergizes Euclide…

    Submitted 26 April, 2025; originally announced April 2025.

    Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

  23. arXiv:2504.17279  [pdf, other]

    cs.CL

    Evaluating and Mitigating Bias in AI-Based Medical Text Generation

    Authors: Xiuying Chen, Tairan Wang, Juexiao Zhou, Zirui Song, Xin Gao, Xiangliang Zhang

    Abstract: Artificial intelligence (AI) systems, particularly those based on deep learning models, have increasingly achieved expert-level performance in medical applications. However, there is growing concern that such AI systems may reflect and amplify human bias, and reduce the quality of their performance in historically under-served populations. The fairness issue has attracted considerable research int…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 12 pages, 8 figures, published in Nature Computational Science

    Journal ref: Nature Computational Science 2025

  24. arXiv:2504.16877  [pdf, other]

    cs.SE

    Context-Enhanced Vulnerability Detection Based on Large Language Model

    Authors: Yixin Yang, Bowen Xu, Xiang Gao, Hailong Sun

    Abstract: Vulnerability detection is a critical aspect of software security. Accurate detection is essential to prevent potential security breaches and protect software systems from malicious attacks. Recently, vulnerability detection methods leveraging deep learning and large language models (LLMs) have garnered increasing attention. However, existing approaches often focus on analyzing individual files or…

    Submitted 23 April, 2025; originally announced April 2025.

  25. arXiv:2504.16616  [pdf, other]

    cs.CV

    EHGCN: Hierarchical Euclidean-Hyperbolic Fusion via Motion-Aware GCN for Hybrid Event Stream Perception

    Authors: Haosheng Chen, Lian Luo, Mengjingcheng Mo, Zhanjie Wu, Guobao Xiao, Ji Gan, Jiaxu Leng, Xinbo Gao

    Abstract: Event cameras, with microsecond temporal resolution and high dynamic range (HDR) characteristics, emit high-speed event stream for perception tasks. Despite the recent advancement in GNN-based perception methods, they are prone to use straightforward pairwise connectivity mechanisms in the pure Euclidean space where they struggle to capture long-range dependencies and fail to effectively character…

    Submitted 27 April, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

  26. arXiv:2504.15054  [pdf, other]

    cs.CV

    Structure-guided Diffusion Transformer for Low-Light Image Enhancement

    Authors: Xiangchen Yin, Zhenda Yu, Longtao Jiang, Xin Gao, Xiao Sun, Zhi Liu, Xun Yang

    Abstract: While the diffusion transformer (DiT) has become a focal point of interest in recent years, its application in low-light image enhancement remains a blank area for exploration. Current methods recover the details from low-light images while inevitably amplifying the noise in images, resulting in poor visual quality. In this paper, we firstly introduce DiT into the low-light enhancement task and de…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted by IEEE Transactions on Multimedia (TMM)

  27. arXiv:2504.14946  [pdf, other]

    cs.LG

    Symmetry-Preserving Architecture for Multi-NUMA Environments (SPANE): A Deep Reinforcement Learning Approach for Dynamic VM Scheduling

    Authors: Tin Ping Chan, Yunlong Cheng, Yizhan Zhu, Xiaofeng Gao, Guihai Chen

    Abstract: As cloud computing continues to evolve, the adoption of multi-NUMA (Non-Uniform Memory Access) architecture by cloud service providers has introduced new challenges in virtual machine (VM) scheduling. To address these challenges and more accurately reflect the complexities faced by modern cloud environments, we introduce the Dynamic VM Allocation problem in Multi-NUMA PM (DVAMP). We formally defin…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 10 pages, 7 figures. Accepted to IEEE INFOCOM 2025

  28. arXiv:2504.13726  [pdf, other]

    cs.CV

    MLEP: Multi-granularity Local Entropy Patterns for Universal AI-generated Image Detection

    Authors: Lin Yuan, Xiaowan Li, Yan Zhang, Jiawei Zhang, Hongbo Li, Xinbo Gao

    Abstract: Advancements in image generation technologies have raised significant concerns about their potential misuse, such as producing misinformation and deepfakes. Therefore, there is an urgent need for effective methods to detect AI-generated images (AIGI). Despite progress in AIGI detection, achieving reliable performance across diverse generation models and scenes remains challenging due to the lack o…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 9 pages, 6 figures

  29. arXiv:2504.13406  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV

    LangCoop: Collaborative Driving with Language

    Authors: Xiangbo Gao, Yuheng Wu, Rujia Wang, Chenxi Liu, Yang Zhou, Zhengzhong Tu

    Abstract: Multi-agent collaboration holds great promise for enhancing the safety, reliability, and mobility of autonomous driving systems by enabling information sharing among multiple connected agents. However, existing multi-agent communication approaches are hindered by limitations of existing communication media, including high bandwidth demands, agent heterogeneity, and information loss. To address the…

    Submitted 20 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Journal ref: CVPRW 2025

  30. arXiv:2504.12322  [pdf, other]

    cs.CL cs.AI cs.LG

    A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis

    Authors: Xin Gao, Qizhi Pei, Zinan Tang, Yu Li, Honglin Lin, Jiang Wu, Lijun Wu, Conghui He

    Abstract: While data synthesis and distillation are promising strategies to enhance small language models, current approaches heavily rely on Large Language Models (LLMs), which suffer from high computational costs, environmental inefficiency, and potential biases inherited from monolithic architectures. In contrast, smaller LLMs are more accessible and sustainable, but their individual capabilities often f…

    Submitted 21 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

  31. arXiv:2504.11739  [pdf, other]

    cs.CV cs.CL

    The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation

    Authors: Bingjie Gao, Xinyu Gao, Xiaoxue Wu, Yujie Zhou, Yu Qiao, Li Niu, Xinyuan Chen, Yaohui Wang

    Abstract: The evolution of Text-to-video (T2V) generative models, trained on large-scale datasets, has been marked by significant progress. However, the sensitivity of T2V generative models to input prompts highlights the critical role of prompt design in influencing generative outcomes. Prior research has predominantly relied on Large Language Models (LLMs) to align user-provided prompts with the distribut…

    Submitted 5 May, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: accepted by CVPR2025, Project website: https://whynothaha.github.io/Prompt_optimizer/RAPO.html

  32. arXiv:2504.11309  [pdf, other]

    cs.CV

    Big Brother is Watching: Proactive Deepfake Detection via Learnable Hidden Face

    Authors: Hongbo Li, Shangchao Yang, Ruiyang Xia, Lin Yuan, Xinbo Gao

    Abstract: As deepfake technologies continue to advance, passive detection methods struggle to generalize with various forgery manipulations and datasets. Proactive defense techniques have been actively studied with the primary aim of preventing deepfake operation effectively working. In this paper, we aim to bridge the gap between passive detection and proactive defense, and seek to solve the detection prob…

    Submitted 15 April, 2025; originally announced April 2025.

  33. arXiv:2504.10647  [pdf, other]

    cs.CL

    Improving In-Context Learning with Reasoning Distillation

    Authors: Nafis Sadeq, Xin Xu, Zhouhang Xie, Julian McAuley, Byungkyu Kang, Prarit Lamba, Xiang Gao

    Abstract: Language models rely on semantic priors to perform in-context learning, which leads to poor performance on tasks involving inductive reasoning. Instruction-tuning methods based on imitation learning can superficially enhance the in-context learning performance of language models, but they often fail to improve the model's understanding of the underlying rules that connect inputs and outputs in few…

    Submitted 14 April, 2025; originally announced April 2025.

  34. arXiv:2504.09441  [pdf, other]

    cs.CV eess.IV

    Structure-Accurate Medical Image Translation based on Dynamic Frequency Balance and Knowledge Guidance

    Authors: Jiahua Xu, Dawei Zhou, Lei Hu, Zaiyi Liu, Nannan Wang, Xinbo Gao

    Abstract: Multimodal medical images play a crucial role in the precise and comprehensive clinical diagnosis. Diffusion model is a powerful strategy to synthesize the required medical images. However, existing approaches still suffer from the problem of anatomical structure distortion due to the overfitting of high-frequency information and the weakening of low-frequency information. Thus, we propose a novel…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: Medical image translation, Diffusion model, 16 pages

  35. arXiv:2504.08411  [pdf, other]

    cs.CV cs.AI

    A Knowledge-guided Adversarial Defense for Resisting Malicious Visual Manipulation

    Authors: Dawei Zhou, Suzhi Gang, Decheng Liu, Tongliang Liu, Nannan Wang, Xinbo Gao

    Abstract: Malicious applications of visual manipulation have raised serious threats to the security and reputation of users in many fields. To alleviate these issues, adversarial noise-based defenses have been enthusiastically studied in recent years. However, "data-only" methods tend to distort fake samples in the low-level feature space rather than the high-level semantic space, leading to limitations in…

    Submitted 11 April, 2025; originally announced April 2025.

  36. arXiv:2504.07691  [pdf, other]

    cs.LG cs.CV

    Distilling Knowledge from Heterogeneous Architectures for Semantic Segmentation

    Authors: Yanglin Huang, Kai Hu, Yuan Zhang, Zhineng Chen, Xieping Gao

    Abstract: Current knowledge distillation (KD) methods for semantic segmentation focus on guiding the student to imitate the teacher's knowledge within homogeneous architectures. However, these methods overlook the diverse knowledge contained in architectures with different inductive biases, which is crucial for enabling the student to acquire a more precise and comprehensive understanding of the data during…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: Accepted to AAAI 2025

  37. arXiv:2504.07634  [pdf, other]

    cs.SE

    Agent That Debugs: Dynamic State-Guided Vulnerability Repair

    Authors: Zhengyao Liu, Yunlong Ma, Jingxuan Xu, Junchen Ai, Xiang Gao, Hailong Sun, Abhik Roychoudhury

    Abstract: In recent years, more vulnerabilities have been discovered every day, while manual vulnerability repair requires specialized knowledge and is time-consuming. As a result, many detected or even published vulnerabilities remain unpatched, thereby increasing the exposure of software systems to attacks. Recent advancements in agents based on Large Language Models have demonstrated their increasing cap…

    Submitted 10 April, 2025; originally announced April 2025.

  38. arXiv:2504.06670  [pdf, other]

    cs.RO

    Dynamic Residual Safe Reinforcement Learning for Multi-Agent Safety-Critical Scenarios Decision-Making

    Authors: Kaifeng Wang, Yinsong Chen, Qi Liu, Xueyuan Li, Xin Gao

    Abstract: In multi-agent safety-critical scenarios, traditional autonomous driving frameworks face significant challenges in balancing safety constraints and task performance. These frameworks struggle to quantify dynamic interaction risks in real-time and depend heavily on manual rules, resulting in low computational efficiency and conservative strategies. To address these limitations, we propose a Dynamic…

    Submitted 9 April, 2025; originally announced April 2025.

  39. arXiv:2504.06544  [pdf, ps, other]

    cs.CV

    LCGC: Learning from Consistency Gradient Conflicting for Class-Imbalanced Semi-Supervised Debiasing

    Authors: Weiwei Xing, Yue Cheng, Hongzhu Yi, Xiaohui Gao, Xiang Wei, Xiaoyu Guo, Yuming Zhang, Xinyu Pang

    Abstract: Classifiers often learn to be biased corresponding to the class-imbalanced dataset, especially under the semi-supervised learning (SSL) set. While previous work tries to appropriately re-balance the classifiers by subtracting a class-irrelevant image's logit, but lacks a firm theoretical basis. We theoretically analyze why exploiting a baseline image can refine pseudo-labels and prove that the bla…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: This paper has been accepted by AAAI 2025

  40. arXiv:2504.05846  [pdf, other]

    cs.IR cs.AI cs.LG

    PathGPT: Leveraging Large Language Models for Personalized Route Generation

    Authors: Steeve Cuthbert Marcelyn, Yucen Gao, Yuzhe Zhang, Xiaofeng Gao, Guihai Chen

    Abstract: The proliferation of GPS enabled devices has led to the accumulation of a substantial corpus of historical trajectory data. By leveraging these data for training machine learning models, researchers have devised novel data-driven methodologies that address the personalized route recommendation (PRR) problem. In contrast to conventional algorithms such as Dijkstra shortest path algorithm, these novel…

    Submitted 8 April, 2025; originally announced April 2025.

  41. arXiv:2504.02327  [pdf, other]

    cs.CL

    LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models

    Authors: Weibin Liao, Xin Gao, Tianyu Jia, Rihong Qiu, Yifan Zhu, Yang Lin, Xu Chu, Junfeng Zhao, Yasha Wang

    Abstract: Natural Language to SQL (NL2SQL) has emerged as a critical task for enabling seamless interaction with databases. Recent advancements in Large Language Models (LLMs) have demonstrated remarkable performance in this domain. However, existing NL2SQL methods predominantly rely on closed-source LLMs leveraging prompt engineering, while open-source models typically require fine-tuning to acquire domain…

    Submitted 3 April, 2025; originally announced April 2025.

  42. SCNR Maximization for MIMO ISAC Assisted by Fluid Antenna System

    Authors: Yuqi Ye, Li You, Hao Xu, Ahmed Elzanaty, Kai-Kit Wong, Xiqi Gao

    Abstract: The integrated sensing and communication (ISAC) technology has been extensively researched to enhance communication rates and radar sensing capabilities. Additionally, a new technology known as fluid antenna system (FAS) has recently been proposed to obtain higher communication rates for future wireless networks by dynamically altering the antenna position to obtain a more favorable channel condit…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: 6 Pages, 3 figures, to appear in IEEE Transactions on Vehicular Technology

  43. arXiv:2504.01016  [pdf, other]

    cs.GR cs.AI cs.CV

    GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

    Authors: Tian-Xing Xu, Xiangjun Gao, Wenbo Hu, Xiaoyu Li, Song-Hai Zhang, Ying Shan

    Abstract: Despite remarkable advancements in video depth estimation, existing methods exhibit inherent limitations in achieving geometric fidelity through the affine-invariant predictions, limiting their applicability in reconstruction and other metrically grounded downstream tasks. We propose GeometryCrafter, a novel framework that recovers high-fidelity point map sequences with temporal coherence from ope…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Project webpage: https://geometrycrafter.github.io/

  44. Challenges of Interaction in Optimizing Mixed Categorical-Continuous Variables

    Authors: Youhei Akimoto, Xilin Gao, Ze Kai Ng, Daiki Morinaga

    Abstract: Optimization of mixed categorical-continuous variables is prevalent in real-world applications of black-box optimization. Recently, CatCMA has been proposed as a method for optimizing such variables and has demonstrated success in hyper-parameter optimization problems. However, it encounters challenges when optimizing categorical variables in the presence of interaction between continuous and cate…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Accepted at GECCO 2025

  45. arXiv:2504.00347  [pdf, other]

    astro-ph.SR cs.LG

    Using machine learning method for variable star classification using the TESS Sectors 1-57 data

    Authors: Li-Heng Wang, Kai Li, Xiang Gao, Ya-Ni Guo, Guo-You Sun

    Abstract: The Transiting Exoplanet Survey Satellite (TESS) is a wide-field all-sky survey mission designed to detect Earth-sized exoplanets. After over four years of photometric surveys, data from sectors 1-57, including approximately 1,050,000 light curves with a 2-minute cadence, were collected. By cross-matching the data with Gaia's variable star catalogue, we obtained labeled datasets for further analysis.…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: 15 pages, 12 figures, 3 tables, accepted by ApJ, Data available via China-VO PaperData repository

  46. arXiv:2503.23035  [pdf, other]

    cs.CV

    FreeInv: Free Lunch for Improving DDIM Inversion

    Authors: Yuxiang Bao, Huijie Liu, Xun Gao, Huan Fu, Guoliang Kang

    Abstract: The naive DDIM inversion process usually suffers from a trajectory deviation issue, i.e., the latent trajectory during reconstruction deviates from the one during inversion. To alleviate this issue, previous methods either learn to mitigate the deviation or design cumbersome compensation strategies to reduce the mismatch error, incurring substantial time and computation cost. In this work, we present a…

    Submitted 29 March, 2025; originally announced March 2025.

  47. arXiv:2503.22263  [pdf, other]

    cs.LG cs.CV

    FLIP: Towards Comprehensive and Reliable Evaluation of Federated Prompt Learning

    Authors: Dongping Liao, Xitong Gao, Yabo Xu, Chengzhong Xu

    Abstract: The increasing emphasis on privacy and data security has driven the adoption of federated learning, a decentralized approach to train machine learning models without sharing raw data. Prompt learning, which fine-tunes prompt embeddings of pretrained models, offers significant advantages in federated settings by reducing computational costs and communication overheads while leveraging the strong pe…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: https://github.com/0-ml/flip

  48. arXiv:2503.22248  [pdf, other]

    cs.LG cs.RO

    CRLLK: Constrained Reinforcement Learning for Lane Keeping in Autonomous Driving

    Authors: Xinwei Gao, Arambam James Singh, Gangadhar Royyuru, Michael Yuhas, Arvind Easwaran

    Abstract: Lane keeping in autonomous driving systems requires scenario-specific weight tuning for different objectives. We formulate lane-keeping as a constrained reinforcement learning problem, where weight coefficients are automatically learned along with the policy, eliminating the need for scenario-specific tuning. Empirically, our approach outperforms traditional RL in efficiency and reliability. Addit…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: Accepted at AAMAS 2025 (Demonstration Track), 3 pages, 2 figures, 1 table

    ACM Class: I.2.6; I.2.9; I.5.1; C.3; I.2.11

  49. arXiv:2503.22197  [pdf, other]

    cs.CV

    Extremely Simple Out-of-distribution Detection for Audio-visual Generalized Zero-shot Learning

    Authors: Yang Liu, Xun Zhang, Jiale Du, Xinbo Gao, Jungong Han

    Abstract: Zero-shot Learning (ZSL) attains knowledge transfer from seen classes to unseen classes by exploring auxiliary category information, which is a promising yet difficult research topic. In this field, Audio-Visual Generalized Zero-Shot Learning (AV-GZSL) has aroused researchers' great interest in which intricate relations within triple modalities (audio, video, and natural language) render this task…

    Submitted 28 March, 2025; originally announced March 2025.

  50. arXiv:2503.22193  [pdf, other]

    cs.CV

    Unbiased Max-Min Embedding Classification for Transductive Few-Shot Learning: Clustering and Classification Are All You Need

    Authors: Yang Liu, Feixiang Liu, Jiale Du, Xinbo Gao, Jungong Han

    Abstract: Convolutional neural networks and supervised learning have achieved remarkable success in various fields but are limited by the need for large annotated datasets. Few-shot learning (FSL) addresses this limitation by enabling models to generalize from only a few labeled examples. Transductive few-shot learning (TFSL) enhances FSL by leveraging both labeled and unlabeled data, though it faces challe…

    Submitted 28 March, 2025; originally announced March 2025.