Showing 1–50 of 2,161 results for author: Xu, X

Searching in archive cs.
  1. arXiv:2505.10359  [pdf, other]

    cs.RO cs.CV

    NVSPolicy: Adaptive Novel-View Synthesis for Generalizable Language-Conditioned Policy Learning

    Authors: Le Shi, Yifei Shi, Xin Xu, Tenglong Liu, Junhua Xi, Chengyuan Chen

    Abstract: Recent advances in deep generative models demonstrate unprecedented zero-shot generalization capabilities, offering great potential for robot manipulation in unstructured environments. Given a partial observation of a scene, deep generative models could generate the unseen regions and therefore provide more context, which enhances the capability of robots to generalize across unseen environments.…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09998  [pdf, other]

    cs.CV

    From Air to Wear: Personalized 3D Digital Fashion with AR/VR Immersive 3D Sketching

    Authors: Ying Zang, Yuanqi Hu, Xinyu Chen, Yuxia Xu, Suhui Wang, Chunan Yu, Lanyun Zhu, Deyi Ji, Xin Xu, Tianrun Chen

    Abstract: In the era of immersive consumer electronics, such as AR/VR headsets and smart devices, people increasingly seek ways to express their identity through virtual fashion. However, existing 3D garment design tools remain inaccessible to everyday users due to steep technical barriers and limited data. In this work, we introduce a 3D sketch-driven 3D garment generation framework that empowers ordinary…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 8 pages, 5 figures

  3. arXiv:2505.09938  [pdf, ps, other]

    cs.HC

    Design and Evaluation of Generative Agent-based Platform for Human-Assistant Interaction Research: A Tale of 10 User Studies

    Authors: Ziyi Xuan, Yiwen Wu, Xuhai Xu, Vinod Namboodiri, Mooi Choo Chuah, Yu Yang

    Abstract: Designing and evaluating personalized and proactive assistant agents remains challenging due to the time, cost, and ethical concerns associated with human-in-the-loop experimentation. Existing Human-Computer Interaction (HCI) methods often require extensive physical setup and human participation, which introduces privacy concerns and limits scalability. Simulated environments offer a partial solut…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.08739  [pdf, ps, other]

    cs.CL

    Probability Consistency in Large Language Models: Theoretical Foundations Meet Empirical Discrepancies

    Authors: Xiaoliang Luo, Xinyi Xu, Michael Ramscar, Bradley C. Love

    Abstract: Can autoregressive large language models (LLMs) learn consistent probability distributions when trained on sequences in different token orders? We prove formally that for any well-defined probability distribution, sequence perplexity is invariant under any factorization, including forward, backward, or arbitrary permutations. This result establishes a rigorous theoretical foundation for studying h…

    Submitted 13 May, 2025; originally announced May 2025.

  5. arXiv:2505.08414  [pdf]

    eess.IV cs.CV

    An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

    Authors: Zhi Da Soh, Yang Bai, Kai Yu, Yang Zhou, Xiaofeng Lei, Sahil Thakur, Zann Lee, Lee Ching Linette Phang, Qingsheng Peng, Can Can Xue, Rachel Shujuan Chong, Quan V. Hoang, Lavanya Raghavan, Yih Chung Tham, Charumathi Sabanayagam, Wei-Chi Wu, Ming-Chih Ho, Jiangnan He, Preeti Gupta, Ecosse Lamoureux, Seang Mei Saw, Vinay Nangia, Songhomitra Panda-Jonas, Jie Xu, Ya Xing Wang, et al. (6 additional authors not shown)

    Abstract: Current deep learning models are mostly task specific and lack a user-friendly interface to operate. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low Rank Adaptati…

    Submitted 13 May, 2025; originally announced May 2025.

  6. arXiv:2505.07968  [pdf, other]

    cs.CL

    Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models

    Authors: Weiyi Wu, Xinwen Xu, Chongyang Gao, Xingjian Diao, Siting Li, Lucas A. Salas, Jiang Gui

    Abstract: Large Language Models (LLMs) have great potential in the field of health care, yet they face great challenges in adapting to rapidly evolving medical knowledge. This can lead to outdated or contradictory treatment suggestions. This study investigated how LLMs respond to evolving clinical guidelines, focusing on concept drift and internal inconsistencies. We developed the DriftMedQA benchmark to si…

    Submitted 12 May, 2025; originally announced May 2025.

  7. arXiv:2505.07608  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

    Authors: Xiaomi LLM-Core Team: Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, et al. (40 additional authors not shown)

    Abstract: We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with additional Multi-Token Prediction objective…

    Submitted 12 May, 2025; originally announced May 2025.

  8. arXiv:2505.07347  [pdf, other]

    cs.CV

    AI-Enabled Accurate Non-Invasive Assessment of Pulmonary Hypertension Progression via Multi-Modal Echocardiography

    Authors: Jiewen Yang, Taoran Huang, Shangwei Ding, Xiaowei Xu, Qinhua Zhao, Yong Jiang, Jiarong Guo, Bin Pu, Jiexuan Zheng, Caojin Zhang, Hongwen Fei, Xiaomeng Li

    Abstract: Echocardiographers can detect pulmonary hypertension using Doppler echocardiography; however, accurately assessing its progression often proves challenging. Right heart catheterization (RHC), the gold standard for precise evaluation, is invasive and unsuitable for routine use, limiting its practicality for timely diagnosis and monitoring of pulmonary hypertension progression. Here, we propose MePH…

    Submitted 12 May, 2025; originally announced May 2025.

  9. arXiv:2505.07257  [pdf, ps, other]

    cs.IR

    DARLR: Dual-Agent Offline Reinforcement Learning for Recommender Systems with Dynamic Reward

    Authors: Yi Zhang, Ruihong Qiu, Xuwei Xu, Jiajun Liu, Sen Wang

    Abstract: Model-based offline reinforcement learning (RL) has emerged as a promising approach for recommender systems, enabling effective policy learning by interacting with frozen world models. However, the reward functions in these world models, trained on sparse offline logs, often suffer from inaccuracies. Specifically, existing methods face two major limitations in addressing this challenge: (1) determ…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: SIGIR 2025

  10. arXiv:2505.06584  [pdf, ps, other]

    cs.RO cs.AI

    JAEGER: Dual-Level Humanoid Whole-Body Controller

    Authors: Ziluo Ding, Haobin Jiang, Yuxuan Wang, Zhenguo Sun, Yu Zhang, Xiaojie Niu, Ming Yang, Weishuai Zeng, Xinrun Xu, Zongqing Lu

    Abstract: This paper presents JAEGER, a dual-level whole-body controller for humanoid robots that addresses the challenges of training a more robust and versatile policy. Unlike traditional single-controller approaches, JAEGER separates the control of the upper and lower bodies into two independent controllers, so that they can better focus on their distinct tasks. This separation alleviates the dimensional…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 15 pages, 2 figures

  11. arXiv:2505.06131  [pdf, ps, other]

    cs.RO

    ELA-ZSON: Efficient Layout-Aware Zero-Shot Object Navigation Agent with Hierarchical Planning

    Authors: Jiawei Hou, Yuting Xiao, Xiangyang Xue, Taiping Zeng

    Abstract: We introduce ELA-ZSON, an efficient layout-aware zero-shot object navigation (ZSON) approach designed for complex multi-room indoor environments. By planning hierarchically, leveraging a global topological map with layout information and a local imperative approach with detailed scene representation memory, ELA-ZSON achieves both efficient and effective navigation. The process is managed by an LL…

    Submitted 9 May, 2025; originally announced May 2025.

  12. arXiv:2505.05840  [pdf, ps, other]

    cs.RO eess.SY

    Versatile Distributed Maneuvering with Generalized Formations using Guiding Vector Fields

    Authors: Yang Lu, Sha Luo, Pengming Zhu, Weijia Yao, Hector Garcia de Marina, Xinglong Zhang, Xin Xu

    Abstract: This paper presents a unified approach to realize versatile distributed maneuvering with generalized formations. Specifically, we decompose the robots' maneuvers into two independent components, i.e., interception and enclosing, which are parameterized by two independent virtual coordinates. Treating these two virtual coordinates as dimensions of an abstract manifold, we derive the corresponding s…

    Submitted 9 May, 2025; originally announced May 2025.

  13. arXiv:2505.05811  [pdf, ps, other]

    cs.RO

    Unsupervised Anomaly Detection for Autonomous Robots via Mahalanobis SVDD with Audio-IMU Fusion

    Authors: Yizhuo Yang, Jiulin Zhao, Xinhang Xu, Kun Cao, Shenghai Yuan, Lihua Xie

    Abstract: Reliable anomaly detection is essential for ensuring the safety of autonomous robots, particularly when conventional detection systems based on vision or LiDAR become unreliable in adverse or unpredictable conditions. In such scenarios, alternative sensing modalities are needed to provide timely and robust feedback. To this end, we explore the use of audio and inertial measurement unit (IMU) senso…

    Submitted 9 May, 2025; originally announced May 2025.

  14. arXiv:2505.05703  [pdf]

    eess.IV cs.CV

    Hybrid Learning: A Novel Combination of Self-Supervised and Supervised Learning for MRI Reconstruction without High-Quality Training Reference

    Authors: Haoyang Pei, Ding Xia, Xiang Xu, William Moore, Yao Wang, Hersh Chandarana, Li Feng

    Abstract: Purpose: Deep learning has demonstrated strong potential for MRI reconstruction, but conventional supervised learning methods require high-quality reference images, which are often unavailable in practice. Self-supervised learning offers an alternative, yet its performance degrades at high acceleration rates. To overcome these limitations, we propose hybrid learning, a novel two-stage training fra…

    Submitted 8 May, 2025; originally announced May 2025.

  15. arXiv:2505.05589  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    ReactDance: Progressive-Granular Representation for Long-Term Coherent Reactive Dance Generation

    Authors: Jingzhong Lin, Yuanyuan Qi, Xinru Li, Wenxuan Huang, Xiangfeng Xu, Bangyan Li, Xuejiao Wang, Gaoqi He

    Abstract: Reactive dance generation (RDG) produces follower movements conditioned on guiding dancer and music while ensuring spatial coordination and temporal coherence. However, existing methods overemphasize global constraints and optimization, overlooking local information, such as fine-grained spatial interactions and localized temporal context. Therefore, we present ReactDance, a novel diffusion-based…

    Submitted 8 May, 2025; originally announced May 2025.

  16. arXiv:2505.05488  [pdf, other]

    cs.CV

    From Events to Enhancement: A Survey on Event-Based Imaging Technologies

    Authors: Yunfan Lu, Xiaogang Xu, Pengteng Li, Yusheng Wang, Yi Cui, Huizai Yao, Hui Xiong

    Abstract: Event cameras offering high dynamic range and low latency have emerged as disruptive technologies in imaging. Despite growing research on leveraging these benefits for different imaging tasks, a comprehensive study of recent advances and challenges is still lacking. This limits the broader understanding of how to utilize events in universal imaging applications. In this survey, we first introdu…

    Submitted 29 April, 2025; originally announced May 2025.

  17. arXiv:2505.05195  [pdf, other]

    cs.LG cs.AI cs.CV

    Concept-Based Unsupervised Domain Adaptation

    Authors: Xinyue Xu, Yueying Hu, Hui Tang, Yi Qin, Lu Mi, Hao Wang, Xiaomeng Li

    Abstract: Concept Bottleneck Models (CBMs) enhance interpretability by explaining predictions through human-understandable concepts but typically assume that training and test data share the same distribution. This assumption often fails under domain shifts, leading to degraded performance and poor generalization. To address these limitations and improve the robustness of CBMs, we propose the Concept-based…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  18. arXiv:2505.04460  [pdf, other]

    cs.CV

    Learning Real Facial Concepts for Independent Deepfake Detection

    Authors: Ming-Hui Liu, Harry Cheng, Tianyi Wang, Xin Luo, Xin-Shun Xu

    Abstract: Deepfake detection models often struggle with generalization to unseen datasets, manifesting as misclassifying real instances as fake in target domains. This is primarily due to an overreliance on forgery artifacts and a limited understanding of real faces. To address this challenge, we propose a novel approach RealID to enhance generalization by learning a comprehensive concept of real faces whil…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI 2025

  19. arXiv:2505.04416  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models

    Authors: Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu

    Abstract: Large language models (LLMs) trained over extensive corpora risk memorizing sensitive, copyrighted, or toxic content. To address this, we propose OBLIVIATE, a robust unlearning framework that removes targeted data while preserving model utility. The framework follows a structured process: extracting target tokens, building retain sets, and fine-tuning with a tailored loss function comprising three…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 18 pages, 2 figures

  20. arXiv:2505.04384  [pdf, other]

    cs.CV

    DATA: Multi-Disentanglement based Contrastive Learning for Open-World Semi-Supervised Deepfake Attribution

    Authors: Ming-Hui Liu, Xiao-Qian Liu, Xin Luo, Xin-Shun Xu

    Abstract: Deepfake attribution (DFA) aims to perform multiclassification on different facial manipulation techniques, thereby mitigating the detrimental effects of forgery content on the social order and personal reputations. However, previous methods focus only on method-specific clues, which easily lead to overfitting, while overlooking the crucial role of common forgery features. Additionally, they strug…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: Accepted by IEEE TMM on 17-Jan-2025; Submitted to IEEE TMM on 11-Jul-2024

  21. arXiv:2505.03334  [pdf, other]

    cs.CV cs.DB

    From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection

    Authors: Guoting Wei, Yu Liu, Xia Yuan, Xizhe Xue, Linlin Guo, Yifan Yang, Chunxia Zhao, Zongwen Bai, Haokui Zhang, Rong Xiao

    Abstract: In recent years, language-guided open-world aerial object detection has gained significant attention due to its better alignment with real-world application needs. However, due to limited datasets, most existing language-guided methods primarily focus on vocabulary, which fails to meet the demands of more fine-grained open-world detection. To address this limitation, we propose constructing a larg…

    Submitted 6 May, 2025; originally announced May 2025.

  22. arXiv:2505.03184  [pdf, other]

    cs.CV

    Interactive Instance Annotation with Siamese Networks

    Authors: Xiang Xu, Ruotong Li, Mengjun Yi, Baile XU, Furao Shen, Jian Zhao

    Abstract: Annotating instance masks is time-consuming and labor-intensive. A promising solution is to predict contours using a deep learning model and then allow users to refine them. However, most existing methods focus on in-domain scenarios, limiting their effectiveness for cross-domain annotation tasks. In this paper, we propose SiamAnno, a framework inspired by the use of Siamese networks in object tra…

    Submitted 6 May, 2025; originally announced May 2025.

  23. arXiv:2505.01288  [pdf, other]

    cs.RO cs.AI

    ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow

    Authors: Changhe Chen, Quantao Yang, Xiaohao Xu, Nima Fazeli, Olov Andersson

    Abstract: One of the central challenges preventing robots from acquiring complex manipulation skills is the prohibitive cost of collecting large-scale robot demonstrations. In contrast, humans are able to learn efficiently by watching others interact with their environment. To bridge this gap, we introduce semantic action flow as a core intermediate representation capturing the essential spatio-temporal man…

    Submitted 12 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

  24. arXiv:2505.01224  [pdf, other]

    cs.CV eess.IV

    RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement

    Authors: Kui Jiang, Yan Luo, Junjun Jiang, Xin Xu, Fei Ma, Fei Yu

    Abstract: Underwater image enhancement (UIE) is a critical preprocessing step for marine vision applications, where wavelength-dependent attenuation causes severe content degradation and color distortion. While recent state space models like Mamba show potential for long-range dependency modeling, their unfolding operations and fixed scan paths on 1D sequences fail to adapt to local object semantics and glo…

    Submitted 2 May, 2025; originally announced May 2025.

  25. arXiv:2504.19746  [pdf, other]

    cs.LG cs.AR

    FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs

    Authors: Xilong Xie, Liang Wang, Limin Xiao, Meng Han, Lin Sun, Shuai Zheng, Xiangrong Xu

    Abstract: Large language models (LLMs) have significantly advanced the natural language processing paradigm but impose substantial demands on memory and computational resources. Quantization is one of the most effective ways to reduce memory consumption of LLMs. However, advanced single-precision quantization methods experience significant accuracy degradation when quantizing to ultra-low bits. Existing mix…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: DATE 2025

  26. arXiv:2504.19086  [pdf, other]

    cs.CV

    Boosting Single-domain Generalized Object Detection via Vision-Language Knowledge Interaction

    Authors: Xiaoran Xu, Jiangang Yang, Wenyue Chong, Wenhui Shi, Shichu Sun, Jing Xing, Jian Liu

    Abstract: Single-Domain Generalized Object Detection (S-DGOD) aims to train an object detector on a single source domain while generalizing well to diverse unseen target domains, making it suitable for multimedia applications that involve various domain shifts, such as intelligent video surveillance and VR/AR technologies. With the success of large-scale Vision-Language Models, recent S-DGOD approaches expl…

    Submitted 26 April, 2025; originally announced April 2025.

  27. arXiv:2504.18425  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.MM cs.SD

    Kimi-Audio Technical Report

    Authors: KimiTeam, Ding Ding, Zeqian Ju, Yichong Leng, Songxiang Liu, Tong Liu, Zeyu Shang, Kai Shen, Wei Song, Xu Tan, Heyi Tang, Zhengtao Wang, Chu Wei, Yifei Xin, Xinran Xu, Jianwei Yu, Yutao Zhang, Xinyu Zhou, Y. Charles, Jun Chen, Yanru Chen, Yulun Du, Weiran He, Zhenxing Hu, Guokun Lai, et al. (15 additional authors not shown)

    Abstract: We present Kimi-Audio, an open-source audio foundation model that excels in audio understanding, generation, and conversation. We detail the practices in building Kimi-Audio, including model architecture, data curation, training recipe, inference deployment, and evaluation. Specifically, we leverage a 12.5Hz audio tokenizer, design a novel LLM-based architecture with continuous features as input a…

    Submitted 25 April, 2025; originally announced April 2025.

  28. arXiv:2504.17807  [pdf, other]

    cs.NI cs.AI cs.LG

    Research on Cloud Platform Network Traffic Monitoring and Anomaly Detection System based on Large Language Models

    Authors: Ze Yang, Yihong Jin, Juntian Liu, Xinhe Xu, Yihan Zhang, Shuyang Ji

    Abstract: The rapidly evolving cloud platforms and the escalating complexity of network traffic demand proper network traffic monitoring and anomaly detection to ensure network security and performance. This paper introduces a large language model (LLM)-based network traffic monitoring and anomaly detection system. In addition to existing models such as autoencoders and decision trees, we harness the power…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Proceedings of 2025 IEEE 7th International Conference on Communications, Information System and Computer Engineering (CISCE 2025)

  29. arXiv:2504.17236  [pdf, ps, other]

    cs.IT cs.LG

    Rate-Distortion-Perception Theory for the Quadratic Wasserstein Space

    Authors: Xiqiang Qu, Jun Chen, Lei Yu, Xiangyu Xu

    Abstract: We establish a single-letter characterization of the fundamental distortion-rate-perception tradeoff with limited common randomness under the squared error distortion measure and the squared Wasserstein-2 perception measure. Moreover, it is shown that this single-letter characterization can be explicitly evaluated for the Gaussian source. Various notions of universal representation are also clarif…

    Submitted 24 April, 2025; originally announced April 2025.

  30. arXiv:2504.16552  [pdf, other]

    cs.DC

    DTVM: Revolutionizing Smart Contract Execution with Determinism and Compatibility

    Authors: Wei Zhou, Changzheng Wei, Ying Yan, Wei Tang, Zhihao Chen, Xiong Xu, Xuebing Huang, Wengang Chen, Jie Zhang, Yang Chen, Xiaofu Zheng, Hanghang Wu, Shenglong Chen, Ermei Wang, Xiangfei Chen, Yang Yu, Meng Wu, Tao Zhu, Liwei Yuan, Feng Yu, Alex Zhang, Wei Wang, Ji Luo, Zhengyu He, Wenbiao Zhao

    Abstract: We introduce the DeTerministic Virtual Machine (DTVM) Stack, a next-generation smart contract execution framework designed to address critical performance, determinism, and ecosystem compatibility challenges in blockchain networks. Building upon WebAssembly (Wasm) while maintaining full Ethereum Virtual Machine (EVM) ABI compatibility, DTVM introduces a Deterministic Middle Intermediate Representa…

    Submitted 23 April, 2025; originally announced April 2025.

  31. arXiv:2504.16423  [pdf, other]

    cs.HC

    Advancing Radar Hand Gesture Recognition: A Hybrid Spectrum Synthetic Framework Merging Simulation with Neural Networks

    Authors: Jiaqi Tang, Xinbo Xu, Yinsong Xu, Qingchao Chen

    Abstract: Millimeter wave (mmWave) radar sensors play a vital role in hand gesture recognition (HGR) by detecting subtle motions while preserving user privacy. However, the limited scale of radar datasets hinders the performance. Existing synthetic data generation methods fall short in two key areas. On the one hand, modeling-based approaches fail to accurately simulate the wave propagation and reflection a…

    Submitted 23 April, 2025; originally announced April 2025.

  32. arXiv:2504.14899  [pdf, other]

    cs.CV

    Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation

    Authors: Chenjie Cao, Jingkai Zhou, Shikai Li, Jingyun Liang, Chaohui Yu, Fan Wang, Xiangyang Xue, Yanwei Fu

    Abstract: Camera and human motion controls have been extensively studied for video generation, but existing approaches typically address them separately, suffering from limited data with high-quality annotations for both aspects. To overcome this, we present Uni3C, a unified 3D-enhanced framework for precise control of both camera and human motion in video generation. Uni3C includes two key contributions. F…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Project page: https://github.com/ewrfcas/Uni3C

  33. arXiv:2504.14888  [pdf, other]

    cs.CV

    WMKA-Net: A Weighted Multi-Kernel Attention NetworkMethod for Retinal Vessel Segmentation

    Authors: Xinran Xu, Yuliang Ma, Sifu Cai

    Abstract: We propose a novel retinal vessel segmentation network, the Weighted Multi-Kernel Attention Network (WMKA-Net), which aims to address the issues of insufficient multiscale feature capture, loss of contextual information, and noise sensitivity in retinal vessel segmentation. WMKA-Net significantly improves the segmentation performance of small vessels and low-contrast regions by integrating several…

    Submitted 21 April, 2025; originally announced April 2025.

  34. arXiv:2504.14655  [pdf, other]

    cs.LG cs.CL cs.SE

    LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs

    Authors: Yunhui Xia, Wei Shen, Yan Wang, Jason Klein Liu, Huifeng Sun, Siyue Wu, Jian Hu, Xiaolong Xu

    Abstract: We introduce LeetCodeDataset, a high-quality benchmark for evaluating and training code-generation models, addressing two key challenges in LLM research: the lack of reasoning-focused coding benchmarks and self-contained training testbeds. By curating LeetCode Python problems with rich metadata, broad coverage, 100+ test cases per problem, and temporal splits (pre/post July 2024), our dataset enab…

    Submitted 20 April, 2025; originally announced April 2025.

  35. arXiv:2504.14257  [pdf, other]

    cs.GR cs.CV

    HoLa: B-Rep Generation using a Holistic Latent Representation

    Authors: Yilin Liu, Duoteng Xu, Xingyao Yu, Xiang Xu, Daniel Cohen-Or, Hao Zhang, Hui Huang

    Abstract: We introduce a novel representation for learning and generating Computer-Aided Design (CAD) models in the form of $\textit{boundary representations}$ (B-Reps). Our representation unifies the continuous geometric properties of B-Rep primitives in different orders (e.g., surfaces and curves) and their discrete topological relations in a $\textit{holistic latent}$ (HoLa) space. This is based on the s…

    Submitted 12 May, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

    Comments: ACM TOG and SIGGRAPH 2025 (Patent Protected); Project page: https://vcc.tech/research/2025/HolaBrep; Demo page: https://huggingface.co/spaces/YuXingyao/HoLa-BRep

  36. arXiv:2504.13845  [pdf, other]

    cs.HC

    Towards Enhanced Learning through Presence: A Systematic Review of Presence in Virtual Reality Across Tasks and Disciplines

    Authors: Zheng Wei, Junxiang Liao, Lik-Hang Lee, Huamin Qu, Xian Xu

    Abstract: The rising interest in Virtual Reality (VR) technology has sparked a desire to create immersive learning platforms capable of handling various tasks across environments. Through immersive interfaces, users can engage deeply with virtual environments, enhancing both learning outcomes and task performance. In fields such as education, engineering, and collaboration, presence has emerged as a critica…

    Submitted 8 February, 2025; originally announced April 2025.

  37. arXiv:2504.13748  [pdf, other]

    cs.CV

    DAM-Net: Domain Adaptation Network with Micro-Labeled Fine-Tuning for Change Detection

    Authors: Hongjia Chen, Xin Xu, Fangling Pu

    Abstract: Change detection (CD) in remote sensing imagery plays a crucial role in various applications such as urban planning, damage assessment, and resource management. While deep learning approaches have significantly advanced CD performance, current methods suffer from poor domain adaptability, requiring extensive labeled data for retraining when applied to new scenarios. This limitation severely restri…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 13 pages, 6 figures

  38. CheatAgent: Attacking LLM-Empowered Recommender Systems via LLM Agent

    Authors: Liang-bo Ning, Shijie Wang, Wenqi Fan, Qing Li, Xin Xu, Hao Chen, Feiran Huang

    Abstract: Recently, Large Language Model (LLM)-empowered recommender systems (RecSys) have brought significant advances in personalized user experience and have attracted considerable attention. Despite the impressive progress, the research question regarding the safety vulnerability of LLM-empowered RecSys still remains largely under-investigated. Given the security and privacy concerns, it is more practic…

    Submitted 23 April, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

    Comments: Accepted by KDD 2024

  39. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, et al. (88 additional authors not shown)

    Abstract: This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  40. arXiv:2504.11457  [pdf, other]

    cs.CV

    Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception

    Authors: Ziqi Pang, Xin Xu, Yu-Xiong Wang

    Abstract: With the success of image generation, generative diffusion models are increasingly adopted for discriminative tasks, as pixel generation provides a unified perception interface. However, directly repurposing the generative denoising process for discriminative objectives reveals critical gaps rarely addressed previously. Generative models tolerate intermediate sampling errors if the final distribut…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: ICLR 2025

    Journal ref: ICLR 2025

  41. arXiv:2504.11230  [pdf, other]

    cs.CV cs.RO

    CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image

    Authors: Jingshun Huang, Haitao Lin, Tianyu Wang, Yanwei Fu, Xiangyang Xue, Yi Zhu

    Abstract: This paper tackles category-level pose estimation of articulated objects in robotic manipulation tasks and introduces a new benchmark dataset. While recent methods estimate part poses and sizes at the category level, they often rely on geometric cues and complex multi-stage pipelines that first segment parts from the point cloud, followed by Normalized Part Coordinate Space (NPCS) estimation for 6…

    Submitted 17 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: To appear in CVPR 2025 (Highlight)

  42. MSCRS: Multi-modal Semantic Graph Prompt Learning Framework for Conversational Recommender Systems

    Authors: Yibiao Wei, Jie Zou, Weikang Guo, Guoqing Wang, Xing Xu, Yang Yang

    Abstract: Conversational Recommender Systems (CRSs) aim to provide personalized recommendations by interacting with users through conversations. Most existing studies of CRS focus on extracting user preferences from conversational contexts. However, due to the short and sparse nature of conversational contexts, it is difficult to fully capture user preferences by conversational contexts only. We argue that…

    Submitted 25 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

  43. arXiv:2504.10828  [pdf, other]

    cs.RO

    Following Is All You Need: Robot Crowd Navigation Using People As Planners

    Authors: Yuwen Liao, Xinhang Xu, Ruofei Bai, Yizhuo Yang, Muqing Cao, Shenghai Yuan, Lihua Xie

    Abstract: Navigating in crowded environments requires the robot to be equipped with high-level reasoning and planning techniques. Existing works focus on developing complex and heavyweight planners while ignoring the role of human intelligence. Since humans are highly capable agents who are also widely available in a crowd navigation setting, we propose an alternative scheme where the robot utilises people…

    Submitted 14 April, 2025; originally announced April 2025.

  44. arXiv:2504.10647  [pdf, other]

    cs.CL

    Improving In-Context Learning with Reasoning Distillation

    Authors: Nafis Sadeq, Xin Xu, Zhouhang Xie, Julian McAuley, Byungkyu Kang, Prarit Lamba, Xiang Gao

    Abstract: Language models rely on semantic priors to perform in-context learning, which leads to poor performance on tasks involving inductive reasoning. Instruction-tuning methods based on imitation learning can superficially enhance the in-context learning performance of language models, but they often fail to improve the model's understanding of the underlying rules that connect inputs and outputs in few…

    Submitted 14 April, 2025; originally announced April 2025.

  45. arXiv:2504.09221  [pdf, ps, other]

    cs.HC cs.LG

    CMCRD: Cross-Modal Contrastive Representation Distillation for Emotion Recognition

    Authors: Siyuan Kan, Huanyu Wu, Zhenyao Cui, Fan Huang, Xiaolong Xu, Dongrui Wu

    Abstract: Emotion recognition is an important component of affective computing, and also human-machine interaction. Unimodal emotion recognition is convenient, but the accuracy may not be high enough; on the contrary, multi-modal emotion recognition may be more accurate, but it also increases the complexity and cost of the data collection system. This paper considers cross-modal emotion recognition, i.e., u…

    Submitted 12 April, 2025; originally announced April 2025.

  46. arXiv:2504.07085  [pdf, other]

    cs.LG

    Identifying Unknown Stochastic Dynamics via Finite expression methods

    Authors: Senwei Liang, Chunmei Wang, Xingjian Xu

    Abstract: Modeling stochastic differential equations (SDEs) is crucial for understanding complex dynamical systems in various scientific fields. Recent methods often employ neural network-based models, which typically represent SDEs through a combination of deterministic and stochastic terms. However, these models usually lack interpretability and have difficulty generalizing beyond their training domain. T…

    Submitted 16 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: 27 pages, 20 figures

  47. arXiv:2504.06672  [pdf, other]

    cs.CV

    RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism

    Authors: Elia Peruzzo, Dejia Xu, Xingqian Xu, Humphrey Shi, Nicu Sebe

    Abstract: Video generation is experiencing rapid growth, driven by advances in diffusion models and the development of better and larger datasets. However, producing high-quality videos remains challenging due to the high-dimensional data and the complexity of the task. Recent efforts have primarily focused on enhancing visual quality and addressing temporal inconsistencies, such as flickering. Despite prog…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Code available at: https://github.com/helia95/ragme

  48. TangibleNet: Synchronous Network Data Storytelling through Tangible Interactions in Augmented Reality

    Authors: Kentaro Takahira, Wong Kam-Kwai, Leni Yang, Xian Xu, Takanori Fujiwara, Huamin Qu

    Abstract: Synchronous data-driven storytelling with network visualizations presents significant challenges due to the complexity of real-time manipulation of network components. While existing research addresses asynchronous scenarios, there is a lack of effective tools for live presentations. To address this gap, we developed TangibleNet, a projector-based AR prototype that allows presenters to interact wi…

    Submitted 6 April, 2025; originally announced April 2025.

    Journal ref: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25), April 26-May 1, 2025, Yokohama, Japan

  49. arXiv:2504.04099  [pdf, other]

    cs.CV cs.AI

    TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection

    Authors: Chunzhao Xie, Tongxuan Liu, Lei Jiang, Yuting Zeng, jinrong Guo, Yunheng Shen, Weizhe Huang, Jing Li, Xiaohua Xu

    Abstract: Large Vision-Language Models have demonstrated remarkable performance across various tasks; however, the challenge of hallucinations constrains their practical applications. The hallucination problem arises from multiple factors, including the inherent hallucinations in language models, the limitations of visual encoders in perception, and biases introduced by multimodal data. Extensive research h…

    Submitted 5 April, 2025; originally announced April 2025.

  50. arXiv:2504.02402  [pdf, other]

    cs.SD cs.AI eess.AS

    EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling

    Authors: Hao Yin, Shi Guo, Xu Jia, Xudong XU, Lu Zhang, Si Liu, Dong Wang, Huchuan Lu, Tianfan Xue

    Abstract: When sound waves hit an object, they induce vibrations that produce high-frequency and subtle visual changes, which can be used for recovering the sound. Early studies always encounter trade-offs related to sampling rate, bandwidth, field of view, and the simplicity of the optical path. Recent advances in event camera hardware show good potential for its application in visual sound recovery, becau…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Our project page: https://yyzq1.github.io/EvMic/