Showing 1–50 of 1,126 results for author: XU, L

Searching in archive cs.
  1. arXiv:2505.09568 [pdf, ps, other]

    cs.CV cs.AI

    BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

    Authors: Jiuhai Chen, Zhiyang Xu, Xichen Pan, Yushi Hu, Can Qin, Tom Goldstein, Lifu Huang, Tianyi Zhou, Saining Xie, Silvio Savarese, Le Xue, Caiming Xiong, Ran Xu

    Abstract: Unifying image understanding and generation has gained growing attention in recent research on multimodal models. Although design choices for image understanding have been extensively studied, the optimal model architecture and training recipe for a unified framework with image generation remain underexplored. Motivated by the strong potential of autoregressive and diffusion models for high-qualit…

    Submitted 14 May, 2025; originally announced May 2025.

  2. arXiv:2505.09315 [pdf, other]

    cs.RO cs.CV cs.LG

    TransDiffuser: End-to-end Trajectory Generation with Decorrelated Multi-modal Representation for Autonomous Driving

    Authors: Xuefeng Jiang, Yuan Ma, Pengxiang Li, Leimeng Xu, Xin Wen, Kun Zhan, Zhongpu Xia, Peng Jia, XianPeng Lang, Sheng Sun

    Abstract: In recent years, diffusion models have shown their potential across diverse domains from vision generation to language modeling. Transferring their capabilities to modern autonomous driving systems has also emerged as a promising direction. In this work, we propose TransDiffuser, an encoder-decoder based generative trajectory planning model for end-to-end autonomous driving. The encoded scene information…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: Under review

  3. Towards Adaptive Meta-Gradient Adversarial Examples for Visual Tracking

    Authors: Wei-Long Tian, Peng Gao, Xiao Liu, Long Xu, Hamido Fujita, Hanan Aljuai, Mao-Li Wang

    Abstract: In recent years, visual tracking methods based on convolutional neural networks and Transformers have achieved remarkable performance and have been successfully applied in fields such as autonomous driving. However, the numerous security issues exposed by deep learning models have gradually affected the reliable application of visual tracking methods in real-world scenarios. Therefore, how to reve…

    Submitted 13 May, 2025; originally announced May 2025.

  4. arXiv:2505.08535 [pdf]

    eess.SY cs.LG

    Diffusion-assisted Model Predictive Control Optimization for Power System Real-Time Operation

    Authors: Linna Xu, Yongli Zhu

    Abstract: This paper presents a modified model predictive control (MPC) framework for real-time power system operation. The framework incorporates a diffusion model tailored for time series generation to enhance the accuracy of the load forecasting module used in the system operation. In the absence of an explicit state transition law, a model-identification procedure is leveraged to derive the system dynamics…

    Submitted 14 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted by the 2025 IEEE PES General Meeting (PESGM), which will be held in Austin, TX, July 27-31, 2025

  5. arXiv:2505.07854 [pdf]

    cs.AI cs.MA

    CCL: Collaborative Curriculum Learning for Sparse-Reward Multi-Agent Reinforcement Learning via Co-evolutionary Task Evolution

    Authors: Yufei Lin, Chengwei Ye, Huanzhen Zhang, Kangsheng Wang, Linuo Xu, Shuyan Liu, Zeyu Zhang

    Abstract: Sparse reward environments pose significant challenges in reinforcement learning, especially within multi-agent systems (MAS) where feedback is delayed and shared across agents, leading to suboptimal learning. We propose Collaborative Multi-dimensional Course Learning (CCL), a novel curriculum learning framework that addresses this by (1) refining intermediate tasks for individual agents, (2) usin…

    Submitted 8 May, 2025; originally announced May 2025.

  6. arXiv:2505.07573 [pdf, other]

    cs.CV cs.AI

    Robust Kidney Abnormality Segmentation: A Validation Study of an AI-Based Framework

    Authors: Sarah de Boer, Hartmut Häntze, Kiran Vaidhya Venkadesh, Myrthe A. D. Buser, Gabriel E. Humpire Mamani, Lina Xu, Lisa C. Adams, Jawed Nawabi, Keno K. Bressem, Bram van Ginneken, Mathias Prokop, Alessa Hering

    Abstract: Kidney abnormality segmentation has important potential to enhance the clinical workflow, especially in settings requiring quantitative assessments. Kidney volume could serve as an important biomarker for renal diseases, with changes in volume correlating directly with kidney function. Currently, clinical practice often relies on subjective visual assessment for evaluating kidney size and abnormal…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 35 pages, 11 figures

  7. Virtualized 3D Gaussians: Flexible Cluster-based Level-of-Detail System for Real-Time Rendering of Composed Scenes

    Authors: Xijie Yang, Linning Xu, Lihan Jiang, Dahua Lin, Bo Dai

    Abstract: 3D Gaussian Splatting (3DGS) enables the reconstruction of intricate digital 3D assets from multi-view images by leveraging a set of 3D Gaussian primitives for rendering. Its explicit and discrete representation facilitates the seamless composition of complex digital worlds, offering significant advantages over previous neural implicit methods. However, when applied to large-scale compositions, su…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: project page: https://xijie-yang.github.io/V3DG/

  8. arXiv:2505.05520 [pdf]

    cs.CV cs.AI

    GaMNet: A Hybrid Network with Gabor Fusion and NMamba for Efficient 3D Glioma Segmentation

    Authors: Chengwei Ye, Huanzhen Zhang, Yufei Lin, Kangsheng Wang, Linuo Xu, Shuyan Liu

    Abstract: Gliomas are aggressive brain tumors that pose serious health risks. Deep learning aids in lesion segmentation, but CNN and Transformer-based models often lack context modeling or demand heavy computation, limiting real-time use on mobile medical devices. We propose GaMNet, integrating the NMamba module for global modeling and a multi-scale CNN for efficient local feature extraction. To improve int…

    Submitted 8 May, 2025; originally announced May 2025.

  9. arXiv:2505.03846 [pdf]

    cs.CV cs.AI

    GAME: Learning Multimodal Interactions via Graph Structures for Personality Trait Estimation

    Authors: Kangsheng Wang, Yuhang Li, Chengwei Ye, Yufei Lin, Huanzhen Zhang, Bohan Hu, Linuo Xu, Shuyan Liu

    Abstract: Apparent personality analysis from short videos poses significant challenges due to the complex interplay of visual, auditory, and textual cues. In this paper, we propose GAME, a Graph-Augmented Multimodal Encoder designed to robustly model and fuse multi-source features for automatic personality prediction. For the visual stream, we construct a facial graph and introduce a dual-branch Geo Two-St…

    Submitted 5 May, 2025; originally announced May 2025.

  10. arXiv:2505.02166 [pdf, other]

    cs.RO

    CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation

    Authors: Xiaoqi Li, Lingyun Xu, Mingxu Zhang, Jiaming Liu, Yan Shen, Iaroslav Ponomarenko, Jiahui Xu, Liang Heng, Siyuan Huang, Shanghang Zhang, Hao Dong

    Abstract: In robotics, task goals can be conveyed through various modalities, such as language, goal images, and goal videos. However, natural language can be ambiguous, while images or videos may offer overly detailed specifications. To tackle these challenges, we introduce CrayonRobo, which leverages comprehensive multi-modal prompts that explicitly convey both low-level actions and high-level planning in a…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  11. arXiv:2505.01743 [pdf, other]

    cs.CV cs.AI cs.LG

    An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding

    Authors: Siyang Jiang, Bufang Yang, Lilin Xu, Mu Yuan, Yeerzhati Abudunuer, Kaiwei Liu, Liekang Zeng, Hongkai Chen, Zhenyu Yan, Xiaofan Jiang, Guoliang Xing

    Abstract: The rapid advancements in Large Vision Language Models (LVLMs) offer the potential to surpass conventional labeling by generating richer, more detailed descriptions of on-device human behavior understanding (HBU) in low-resolution vision systems, such as depth, thermal, and infrared. However, existing large vision language model (LVLM) approaches are unable to understand low-resolution data well a…

    Submitted 3 May, 2025; originally announced May 2025.

  12. arXiv:2505.01083 [pdf, other]

    cs.RO

    DexFlow: A Unified Approach for Dexterous Hand Pose Retargeting and Interaction

    Authors: Xiaoyi Lin, Kunpeng Yao, Lixin Xu, Xueqiang Wang, Xuetao Li, Yuchen Wang, Miao Li

    Abstract: Despite advances in hand-object interaction modeling, generating realistic dexterous manipulation data for robotic hands remains a challenge. Retargeting methods often suffer from low accuracy and fail to account for hand-object interactions, leading to artifacts like interpenetration. Generative methods, lacking human hand priors, produce limited and unnatural poses. We propose a data transformat…

    Submitted 2 May, 2025; originally announced May 2025.

  13. arXiv:2504.21226 [pdf, other]

    cs.CV cs.AI

    MemeBLIP2: A novel lightweight multimodal system to detect harmful memes

    Authors: Jiaqi Liu, Ran Tong, Aowei Shen, Shuzheng Li, Changlin Yang, Lisha Xu

    Abstract: Memes often merge visuals with brief text to share humor or opinions, yet some memes contain harmful messages such as hate speech. In this paper, we introduce MemeBLIP2, a lightweight multimodal system that detects harmful memes by combining image and text features effectively. We build on previous studies by adding modules that align image and text representations into a shared space and fuse t…

    Submitted 6 May, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

    Comments: 11 pages, 3 figures, manuscript in preparation

  14. arXiv:2504.19730 [pdf, other]

    cs.SE cs.CL

    Evaluate-and-Purify: Fortifying Code Language Models Against Adversarial Attacks Using LLM-as-a-Judge

    Authors: Wenhan Mu, Ling Xu, Shuren Pei, Le Mi, Huichi Zhou

    Abstract: The widespread adoption of code language models in software engineering tasks has exposed vulnerabilities to adversarial attacks, especially identifier substitution attacks. Although existing identifier substitution attackers demonstrate high success rates, they often produce adversarial examples with unnatural code patterns. In this paper, we systematically assess the quality of adversarial e…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 25 pages, 6 figures

  15. arXiv:2504.19649 [pdf, other]

    cs.LG cs.AR

    Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models

    Authors: Lei Xu, Shanshan Wang, Emmanuel Casseau, Chenglong Xiao

    Abstract: High-level synthesis (HLS) design space exploration (DSE) is an optimization process in electronic design automation (EDA) that systematically explores high-level design configurations to achieve Pareto-optimal hardware implementations balancing performance, area, and power (PPA). To optimize this process, HLS prediction tasks often employ message-passing neural networks (MPNNs), leveraging comple…

    Submitted 28 April, 2025; originally announced April 2025.

  16. arXiv:2504.19566 [pdf, other]

    cs.CR

    Metadata-private Messaging without Coordination

    Authors: Peipei Jiang, Yihao Wu, Lei Xu, Wentao Dong, Peiyuan Chen, Yulong Ming, Cong Wang, Xiaohua Jia, Qian Wang

    Abstract: For those seeking end-to-end private communication free from pervasive metadata tracking and censorship, the Tor network has been the de-facto choice in practice, despite its susceptibility to traffic analysis attacks. Recently, numerous metadata-private messaging proposals have emerged with the aim to surpass Tor in the messaging context by obscuring the relationships between any two messaging bu…

    Submitted 28 April, 2025; originally announced April 2025.

  17. arXiv:2504.16943 [pdf, other]

    cs.CY cs.LG

    Flexibility of German gas-fired generation: evidence from clustering empirical operation

    Authors: Chiara Fusar Bassini, Alice Lixuan Xu, Jorge Sánchez Canales, Lion Hirth, Lynn H. Kaack

    Abstract: Key inputs to energy models are assumptions about the flexibility of power generation units, i.e., how quickly and often they can start up. These assumptions are usually calibrated on the technical characteristics of the units, such as installed capacity or technology type. However, even if power generation units technically can dispatch flexibly, service obligations and market incentives may con…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 29 pages, 6 figures, 6 tables

  18. arXiv:2504.15770 [pdf, other]

    cs.CV

    Multi-Scale Tensorial Summation and Dimensional Reduction Guided Neural Network for Edge Detection

    Authors: Lei Xu, Mehmet Yamac, Mete Ahishali, Moncef Gabbouj

    Abstract: Edge detection has attracted considerable attention thanks to its exceptional ability to enhance performance in downstream computer vision tasks. In recent years, various deep learning methods have been explored for edge detection tasks resulting in a significant performance improvement compared to conventional computer vision algorithms. In neural networks, edge detection tasks require considerab…

    Submitted 22 April, 2025; originally announced April 2025.

  19. arXiv:2504.15742 [pdf, other]

    cs.DB cs.SE

    Proving Cypher Query Equivalence

    Authors: Lei Tang, Wensheng Dou, Yingying Zheng, Lijie Xu, Wei Wang, Jun Wei, Tao Huang

    Abstract: Graph database systems store graph data as nodes and relationships, and utilize graph query languages (e.g., Cypher) for efficiently querying graph data. Proving the equivalence of graph queries is an important foundation for optimizing graph query performance, ensuring graph query reliability, etc. Although researchers have proposed many SQL query equivalence provers for relational database syste…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 14 pages, accepted by ICDE 2025

  20. arXiv:2504.15247 [pdf, other]

    cs.DB

    Lance: Efficient Random Access in Columnar Storage through Adaptive Structural Encodings

    Authors: Weston Pace, Chang She, Lei Xu, Will Jones, Albert Lockett, Jun Wang, Raunak Shah

    Abstract: The growing interest in artificial intelligence has created workloads that require both sequential and random access. At the same time, NVMe-backed storage solutions have emerged, providing caching capability for large columnar datasets in cloud storage. Current columnar storage libraries fall short of effectively utilizing an NVMe device's capabilities, especially when it comes to random access.…

    Submitted 21 April, 2025; originally announced April 2025.

    ACM Class: H.3.2

  21. arXiv:2504.13167 [pdf, other]

    cs.CV

    ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos

    Authors: Zetong Zhang, Manuel Kaufmann, Lixin Xue, Jie Song, Martin R. Oswald

    Abstract: Creating a photorealistic scene and human reconstruction from a single monocular in-the-wild video figures prominently in the perception of a human-centric 3D world. Recent neural rendering advances have enabled holistic human-scene reconstruction but require pre-calibrated camera and human poses, and days of training time. In this work, we introduce a novel unified framework that simultaneously p…

    Submitted 18 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Accepted at CVPR 2025

    ACM Class: I.4.5

  22. arXiv:2504.12710 [pdf, other]

    quant-ph cs.ET

    Quantum circuit synthesis with qudit phase gadget method

    Authors: Shuai Yang, Lihao Xu, Guojing Tian, Xiaoming Sun

    Abstract: Current quantum devices have unutilized high-level quantum resources. More and more attention has been paid to qudit quantum systems with more than two dimensions to maximize the potential computing power of quantum computation. Then, a natural problem arises: How do we implement quantum algorithms on qudit quantum systems? In this work, we propose a novel qudit phase gadget method for synth…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 13 pages, 10 figures

  23. arXiv:2504.12597 [pdf, other]

    cs.CL

    GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning

    Authors: Liangyu Xu, Yingxiu Zhao, Jingyun Wang, Yingyao Wang, Bu Pi, Chen Wang, Mingliang Zhang, Jihao Gu, Xiang Li, Xiaoyong Zhu, Jun Song, Bo Zheng

    Abstract: Geometry problem-solving (GPS), a challenging task requiring both visual comprehension and symbolic reasoning, effectively measures the reasoning capabilities of multimodal large language models (MLLMs). Humans exhibit strong reasoning ability in this task through accurate identification and adaptive application of geometric principles within visual contexts. However, existing benchmarks fail to j…

    Submitted 23 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: 10 pages, 8 figures

  24. arXiv:2504.12341 [pdf, other]

    cs.CL

    Streamlining Biomedical Research with Specialized LLMs

    Authors: Linqing Chen, Weilei Wang, Yubin Xia, Wentao Wu, Peng Xu, Zilong Bai, Jie Fang, Chaobo Xu, Ran Hu, Licong Xu, Haoran Hua, Jing Sun, Hanmeng Zhong, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yong Gu, Tao Shi, Chaochao Wang, Jianping Lu, Cheng Sun, Yixin Wang, et al. (8 additional authors not shown)

    Abstract: In this paper, we propose a novel system that integrates state-of-the-art, domain-specific large language models with advanced information retrieval techniques to deliver comprehensive and context-aware responses. Our approach facilitates seamless interaction among diverse components, enabling cross-validation of outputs to produce accurate, high-quality responses enriched with relevant data, imag…

    Submitted 15 April, 2025; originally announced April 2025.

    Journal ref: Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations, pp. 9-19, 2025

  25. Graph-Driven Multimodal Feature Learning Framework for Apparent Personality Assessment

    Authors: Kangsheng Wang, Chengwei Ye, Huanzhen Zhang, Linuo Xu, Shuyan Liu

    Abstract: Predicting personality traits automatically has become a challenging problem in computer vision. This paper introduces an innovative multimodal feature learning framework for personality analysis in short video clips. For visual processing, we construct a facial graph and design a Geo-based two-stream network incorporating an attention mechanism, leveraging both Graph Convolutional Networks (GCN)…

    Submitted 15 April, 2025; originally announced April 2025.

    Journal ref: IECE Trans. Emerg. Top. Artif. Intell. 2 (2025) 57--67

  26. arXiv:2504.09446 [pdf, other]

    cs.CV

    Sparse Deformable Mamba for Hyperspectral Image Classification

    Authors: Lincoln Linlin Xu, Yimin Zhu, Zack Dewis, Zhengsen Xu, Motasem Alkayid, Mabel Heffring, Saeid Taleghanidoozdoozan

    Abstract: Although Mamba models significantly improve hyperspectral image (HSI) classification, one critical challenge is the difficulty in building the sequence of Mamba tokens efficiently. This paper presents a Sparse Deformable Mamba (SDMamba) approach for enhanced HSI classification, with the following contributions. First, to enhance Mamba sequence, an efficient Sparse Deformable Sequencing (SDS) appro…

    Submitted 15 April, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

  27. arXiv:2504.05049 [pdf, other]

    cs.CV

    CMaP-SAM: Contraction Mapping Prior for SAM-driven Few-shot Segmentation

    Authors: Shuai Chen, Fanman Meng, Haoran Wei, Chenhao Wu, Qingbo Wu, Linfeng Xu, Hongliang Li

    Abstract: Few-shot segmentation (FSS) aims to segment new classes using few annotated images. While recent FSS methods have shown considerable improvements by leveraging Segment Anything Model (SAM), they face two critical limitations: insufficient utilization of structural correlations in query images, and significant information loss when converting continuous position priors to discrete point prompts. To…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: 7 figures

  28. arXiv:2504.04516 [pdf, other]

    cs.RO

    DexSinGrasp: Learning a Unified Policy for Dexterous Object Singulation and Grasping in Cluttered Environments

    Authors: Lixin Xu, Zixuan Liu, Zhewei Gui, Jingxiang Guo, Zeyu Jiang, Zhixuan Xu, Chongkai Gao, Lin Shao

    Abstract: Grasping objects in cluttered environments remains a fundamental yet challenging problem in robotic manipulation. While prior works have explored learning-based synergies between pushing and grasping for two-fingered grippers, few have leveraged the high degrees of freedom (DoF) in dexterous hands to perform efficient singulation for grasping in cluttered settings. In this work, we introduce DexSi…

    Submitted 6 April, 2025; originally announced April 2025.

  29. arXiv:2504.04121 [pdf, other]

    cs.AI

    Improving Question Embeddings with Cognitiv Representation Optimization for Knowledge Tracing

    Authors: Lixiang Xu, Xianwei Ding, Xin Yuan, Zhanlong Wang, Lu Bai, Enhong Chen, Philip S. Yu, Yuanyan Tang

    Abstract: Knowledge Tracing (KT) aims to track changes in students' knowledge status and predict their future answers based on their historical answer records. Current research on KT modeling focuses on predicting students' future performance based on existing, unupdated records of student learning interactions. However, these approaches ignore the distractors (such as slipping and guessing) in the answe…

    Submitted 5 April, 2025; originally announced April 2025.

  30. arXiv:2504.03748 [pdf, other]

    cs.LG cs.AI cs.CL

    TDBench: Benchmarking Vision-Language Models in Understanding Top-Down Images

    Authors: Kaiyuan Hou, Minghui Zhao, Lilin Xu, Yuang Fan, Xiaofan Jiang

    Abstract: The rapid emergence of Vision-Language Models (VLMs) has significantly advanced multimodal understanding, enabling applications in scene comprehension and visual reasoning. While these models have been primarily evaluated and developed for front-view image understanding, their capabilities in interpreting top-down images have received limited attention, partly due to the scarcity of diverse top-do…

    Submitted 1 April, 2025; originally announced April 2025.

  31. arXiv:2504.03648 [pdf, other]

    cs.DC cs.AI

    AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure

    Authors: The AIBrix Team, Jiaxin Shan, Varun Gupta, Le Xu, Haiyang Shi, Jingyuan Zhang, Ning Wang, Linhui Xu, Rong Kang, Tongping Liu, Yifei Zhang, Yiqing Zhu, Shuowei Jin, Gangmuk Lim, Binbin Chen, Zuzhi Chen, Xiao Liu, Xin Chen, Kante Yin, Chak-Pong Chung, Chenyu Jiang, Yicheng Lu, Jianjun Chen, Caixue Lin, Wu Xiang, et al. (2 additional authors not shown)

    Abstract: We introduce AIBrix, a cloud-native, open-source framework designed to optimize and simplify large-scale LLM deployment in cloud environments. Unlike traditional cloud-native stacks, AIBrix follows a co-design philosophy, ensuring every layer of the infrastructure is purpose-built for seamless integration with inference engines like vLLM. AIBrix introduces several key innovations to reduce inferen…

    Submitted 22 February, 2025; originally announced April 2025.

  32. arXiv:2504.02878 [pdf, other]

    cs.CV cs.LG

    Exploring the Capabilities of LLMs for IMU-based Fine-grained Human Activity Understanding

    Authors: Lilin Xu, Kaiyuan Hou, Xiaofan Jiang

    Abstract: Human activity recognition (HAR) using inertial measurement units (IMUs) increasingly leverages large language models (LLMs), yet existing approaches focus on coarse activities like walking or running. Our preliminary study indicates that pretrained LLMs fail catastrophically on fine-grained HAR tasks such as air-written letter recognition, achieving only near-random guessing accuracy. In this wor…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Accepted to The 2nd International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things (FMSys 2025)

  33. arXiv:2504.02222 [pdf, other]

    eess.IV cs.CV

    APSeg: Auto-Prompt Model with Acquired and Injected Knowledge for Nuclear Instance Segmentation and Classification

    Authors: Liying Xu, Hongliang He, Wei Han, Hanbin Huang, Siwei Feng, Guohong Fu

    Abstract: Nuclear instance segmentation and classification provide critical quantitative foundations for digital pathology diagnosis. With the advent of the foundational Segment Anything Model (SAM), the accuracy and efficiency of nuclear segmentation have improved significantly. However, SAM imposes a strong reliance on precise prompts, and its class-agnostic design renders its classification results entir…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: 10 pages, 3 figures

  34. arXiv:2504.01735 [pdf, other]

    cs.CV cs.AI

    AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization

    Authors: Chaohu Liu, Tianyi Gui, Yu Liu, Linli Xu

    Abstract: Large Vision-Language Models (LVLMs), such as GPT-4o and LLaVA, have recently witnessed remarkable advancements and are increasingly being deployed in real-world applications. However, inheriting the sensitivity of visual neural networks, LVLMs remain vulnerable to adversarial attacks, which can result in erroneous or malicious outputs. While existing efforts utilize adversarial fine-tuning to enh…

    Submitted 2 April, 2025; originally announced April 2025.

  35. arXiv:2504.00241 [pdf, other]

    cs.CL cs.AI

    Synthesizing Public Opinions with LLMs: Role Creation, Impacts, and the Future to eDemorcacy

    Authors: Rabimba Karanjai, Boris Shor, Amanda Austin, Ryan Kennedy, Yang Lu, Lei Xu, Weidong Shi

    Abstract: This paper investigates the use of Large Language Models (LLMs) to synthesize public opinion data, addressing challenges in traditional survey methods like declining response rates and non-response bias. We introduce a novel technique: role creation based on knowledge injection, a form of in-context learning that leverages RAG and specified personality profiles from the HEXACO model and demographi…

    Submitted 31 March, 2025; originally announced April 2025.

  36. arXiv:2503.23875 [pdf, other]

    cs.RO cs.AI cs.MA

    GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models

    Authors: Wenkang Ji, Huaben Chen, Mingyang Chen, Guobin Zhu, Lufeng Xu, Roderich Groß, Rui Zhou, Ming Cao, Shiyu Zhao

    Abstract: The development of control policies for multi-robot systems traditionally follows a complex and labor-intensive process, often lacking the flexibility to adapt to dynamic tasks. This has motivated research on methods to automatically create control policies. However, these methods require iterative processes of manually crafting and refining objective functions, thereby prolonging the development…

    Submitted 31 March, 2025; originally announced March 2025.

  37. arXiv:2503.23776 [pdf, other]

    cs.DB

    VIDEX: A Disaggregated and Extensible Virtual Index for the Cloud and AI Era

    Authors: Rong Kang, Shuai Wang, Tieying Zhang, Xianghong Xu, Linhui Xu, Zhimin Liang, Lei Zhang, Rui Shi, Jianjun Chen

    Abstract: Virtual indexes, also known as hypothetical indexes, play a crucial role in database query optimization. However, with the rapid advancement of cloud computing and AI-driven models for database optimization, traditional virtual index approaches face significant challenges. Cloud-native environments often prohibit conducting the query optimization process directly on production databases due to stability r…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: 4 pages, 2 figures

  38. arXiv:2503.23085 [pdf]

    cs.RO

    Microscopic Robots That Sense, Think, Act, and Compute

    Authors: Maya M. Lassiter, Jungho Lee, Kyle Skelil, Li Xu, Lucas Hanson, William H. Reinhardt, Dennis Sylvester, Mark Yim, David Blaauw, Marc Z. Miskin

    Abstract: While miniaturization has been a goal in robotics for nearly 40 years, roboticists have struggled to access sub-millimeter dimensions without making sacrifices to on-board information processing due to the unique physics of the microscale. Consequently, microrobots often lack the key features that distinguish their macroscopic cousins from other machines, namely on-robot systems for decision makin…

    Submitted 29 March, 2025; originally announced March 2025.

    Comments: 17 pages, 5 figures with supplement

  39. arXiv:2503.21979 [pdf, other]

    cs.CV

    Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

    Authors: Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Zhonghua Wu, Qingyi Tao, Wentao Liu, Wei Li, Chen Change Loy

    Abstract: Unifying visual understanding and generation within a single multimodal framework remains a significant challenge, as the two inherently heterogeneous tasks require representations at different levels of granularity. Current approaches that utilize vector quantization (VQ) or variational autoencoders (VAE) for unified visual representation prioritize intrinsic imagery features over semantics, comp…

    Submitted 22 April, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

  40. arXiv:2503.21268 [pdf, other]

    cs.CV

    ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate

    Authors: Ming Yan, Xincheng Lin, Yuhua Luo, Shuqi Fan, Yudi Dai, Qixin Zhong, Lincai Zhong, Yuexin Ma, Lan Xu, Chenglu Wen, Siqi Shen, Cheng Wang

    Abstract: Human Motion Recovery (HMR) research mainly focuses on ground-based motions such as running. Research on capturing climbing motion, an off-ground motion, is sparse. This is partly due to the limited availability of climbing motion datasets, especially large-scale and challenging 3D labeled datasets. To address the insufficiency of climbing motion datasets, we collect AscendMotion, a large-scale w…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: CVPR 2025, project page: http://www.lidarhumanmotion.net/climbingcap/

  41. arXiv:2503.20648 [pdf, other]

    cs.CL cs.AI

    TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes

    Authors: Raj Sanjay Shah, Lei Xu, Qianchu Liu, Jon Burnsky, Drew Bertagnolli, Chaitanya Shivade

    Abstract: Behavioral therapy notes are important for both legal compliance and patient care. Unlike progress notes in physical health, quality standards for behavioral therapy notes remain underdeveloped. To address this gap, we collaborated with licensed therapists to design a comprehensive rubric for evaluating therapy notes across key dimensions: completeness, conciseness, and faithfulness. Further, we e…

    Submitted 26 March, 2025; originally announced March 2025.

  42. arXiv:2503.18349 [pdf, other]

    cs.CV

    Human-Object Interaction with Vision-Language Model Guided Relative Movement Dynamics

    Authors: Zekai Deng, Ye Shi, Kaiyang Ji, Lan Xu, Shaoli Huang, Jingya Wang

    Abstract: Human-Object Interaction (HOI) is vital for advancing simulation, animation, and robotics, enabling the generation of long-term, physically plausible motions in 3D environments. However, existing methods often fall short of achieving physics realism and supporting diverse types of interactions. To address these challenges, this paper introduces a unified Human-Object Interaction framework that pro…

    Submitted 24 March, 2025; originally announced March 2025.

  43. arXiv:2503.17666 [pdf, other]

    cs.LG q-bio.QM

    Multi-Modality Representation Learning for Antibody-Antigen Interactions Prediction

    Authors: Peijin Guo, Minghui Li, Hewen Pan, Ruixiang Huang, Lulu Xue, Shengqing Hu, Zikang Guo, Wei Wan, Shengshan Hu

    Abstract: While deep learning models play a crucial role in predicting antibody-antigen interactions (AAI), the scarcity of publicly available sequence-structure pairings constrains their generalization. Current AAI methods often focus on residue-level static details, overlooking fine-grained structural representations of antibodies and their inter-antibody similarities. To tackle this challenge, we introdu…

    Submitted 22 March, 2025; originally announced March 2025.

    Comments: 2025 IEEE International Conference on Multimedia and Expo (ICME 2025), June 30 - July 4, 2025, Nantes, France

  44. arXiv:2503.16187 [pdf, other]

    cs.LG stat.ML

    Manifold learning in metric spaces

    Authors: Liane Xu, Amit Singer

    Abstract: Laplacian-based methods are popular for dimensionality reduction of data lying in $\mathbb{R}^N$. Several theoretical results for these algorithms depend on the fact that the Euclidean distance approximates the geodesic distance on the underlying submanifold which the data are assumed to lie on. However, for some applications, other metrics, such as the Wasserstein distance, may provide a more app…

    Submitted 20 March, 2025; originally announced March 2025.

  45. Comparative and Interpretative Analysis of CNN and Transformer Models in Predicting Wildfire Spread Using Remote Sensing Data

    Authors: Yihang Zhou, Ruige Kong, Zhengsen Xu, Linlin Xu, Sibo Cheng

    Abstract: Facing the escalating threat of global wildfires, numerous computer vision techniques using remote sensing data have been applied in this area. However, the selection of deep learning methods for wildfire prediction remains uncertain due to the lack of comparative analysis in a quantitative and explainable manner, crucial for improving prevention measures and refining models. This study aims to th…

    Submitted 18 March, 2025; originally announced March 2025.

  46. arXiv:2503.13559 [pdf]

    cs.LG

    Dynamical Mode Recognition of Turbulent Flames in a Swirl-stabilized Annular Combustor by a Time-series Learning Approach

    Authors: Tao Yang, Weiming Xu, Liangliang Xu, Peng Zhang

    Abstract: Thermoacoustic instability in annular combustors, essential to aero engines and modern gas turbines, can severely impair operational stability and efficiency. Accurately recognizing and understanding the various combustion modes is a prerequisite for understanding and controlling combustion instabilities. However, the high-dimensional spatial-temporal dynamics of turbulent flames typically pose cons…

    Submitted 16 March, 2025; originally announced March 2025.

    Comments: 5 pages, 3 figures

  47. arXiv:2503.12662 [pdf, other]

    cs.LG

    TuneNSearch: a hybrid transfer learning and local search approach for solving vehicle routing problems

    Authors: Arthur Corrêa, Cristóvão Silva, Liming Xu, Alexandra Brintrup, Samuel Moniz

    Abstract: This paper introduces TuneNSearch, a hybrid transfer learning and local search approach for addressing different variants of vehicle routing problems (VRP). Recently, multi-task learning has gained much attention for solving VRP variants. However, this adaptability often compromises the performance of the models. To address this challenge, we first pre-train a reinforcement learning model on the m…

    Submitted 14 May, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

  48. arXiv:2503.12242 [pdf, other]

    cs.CV

    RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance

    Authors: Yuheng Jiang, Zhehao Shen, Chengcheng Guo, Yu Hong, Zhuo Su, Yingliang Zhang, Marc Habermann, Lan Xu

    Abstract: Human-centric volumetric videos offer immersive free-viewpoint experiences, yet existing methods focus either on replaying general dynamic scenes or animating human avatars, limiting their ability to re-perform general dynamic scenes. In this paper, we present RePerformer, a novel Gaussian-based representation that unifies playback and re-performance for high-fidelity human-centric volumetric vide…

    Submitted 15 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025. Project Page: https://moqiyinlun.github.io/Reperformer/

  49. arXiv:2503.12220 [pdf, other]

    cs.LG cs.CR

    PA-CFL: Privacy-Adaptive Clustered Federated Learning for Transformer-Based Sales Forecasting on Heterogeneous Retail Data

    Authors: Yunbo Long, Liming Xu, Ge Zheng, Alexandra Brintrup

    Abstract: Federated learning (FL) enables retailers to share model parameters for demand forecasting while maintaining privacy. However, heterogeneous data across diverse regions, driven by factors such as varying consumer behavior, poses challenges to the effectiveness of federated learning. To tackle this challenge, we propose Privacy-Adaptive Clustered Federated Learning (PA-CFL) tailored for demand fore…

    Submitted 21 March, 2025; v1 submitted 15 March, 2025; originally announced March 2025.

  50. arXiv:2503.12156 [pdf, other]

    cs.LG cs.SI

    Efficient and Privacy-Preserved Link Prediction via Condensed Graphs

    Authors: Yunbo Long, Liming Xu, Alexandra Brintrup

    Abstract: Link prediction is crucial for uncovering hidden connections within complex networks, enabling applications such as identifying potential customers and products. However, this research faces significant challenges, including concerns about data privacy, as well as high computational and storage costs, especially when dealing with large-scale networks. Condensed graphs, which are much smaller than…

    Submitted 15 March, 2025; originally announced March 2025.