Showing 1–50 of 3,842 results for author: Chen, H

Searching in archive cs. Search in all archives.
  1. arXiv:2505.10257  [pdf, ps, other]

    cs.CV

    Sage Deer: A Super-Aligned Driving Generalist Is Your Copilot

    Authors: Hao Lu, Jiaqi Tang, Jiyao Wang, Yunfan LU, Xu Cao, Qingyong Hu, Yin Wang, Yuting Zhang, Tianxin Xie, Yunpeng Zhang, Yong Chen, Jiayu Gao, Bin Huang, Dengbo He, Shuiguang Deng, Hao Chen, Ying-Cong Chen

    Abstract: The intelligent driving cockpit, an important part of intelligent driving, needs to match different users' comfort, interaction, and safety needs. This paper aims to build a Super-Aligned and GEneralist DRiving agent, SAGE DeeR. Sage Deer achieves three highlights: (1) Super alignment: It achieves different reactions according to different people's preferences and biases. (2) Generalist: It can u… ▽ More

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.10134  [pdf, other]

    eess.SP cs.AI cs.LG

    Large Wireless Localization Model (LWLM): A Foundation Model for Positioning in 6G Networks

    Authors: Guangjin Pan, Kaixuan Huang, Hui Chen, Shunqing Zhang, Christian Häger, Henk Wymeersch

    Abstract: Accurate and robust localization is a critical enabler for emerging 5G and 6G applications, including autonomous driving, extended reality (XR), and smart manufacturing. While data-driven approaches have shown promise, most existing models require large amounts of labeled data and struggle to generalize across deployment scenarios and wireless configurations. To address these limitations, we propo… ▽ More

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 13 pages, 16 figures. This work has been submitted to the IEEE for possible publication

  3. arXiv:2505.10039  [pdf, other]

    cs.LG

    Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates

    Authors: Hang Chen, Jiaying Zhu, Xinyu Yang, Wenya Wang

    Abstract: Circuit discovery has gradually become one of the prominent methods for mechanistic interpretability, and research on circuit completeness has also garnered increasing attention. Methods of circuit discovery that do not guarantee completeness not only result in circuits that are not fixed across different runs but also cause key mechanisms to be omitted. The nature of incompleteness arises from th… ▽ More

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 10 pages

  4. arXiv:2505.09882  [pdf, ps, other]

    cs.HC

    SnapNCode: An Integrated Development Environment for Programming Physical Objects Interactions

    Authors: Xiaoyan Wei, Zijian Yue, Hsiang-Ting Chen

    Abstract: Spatial computing technologies have the potential to revolutionize how we interact with the world around us. However, most modern integrated development environments (IDEs) have not fully adapted to this paradigm shift. For example, physical 3D objects in the real world are still represented as 2D text variables in code, creating a significant perceptual distance between these representations. In… ▽ More

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 18 pages, HCII 2025

  5. arXiv:2505.09872  [pdf, ps, other]

    cs.HC

    Context-AI Tunes: Context-Aware AI-Generated Music for Stress Reduction

    Authors: Xiaoyan Wei, Zebang Zhang, Zijian Yue, Hsiang-Ting Chen

    Abstract: Music plays a critical role in emotional regulation and stress relief; however, individuals often need different types of music tailored to their unique stress levels or surrounding environment. Choosing the right music can be challenging due to the overwhelming number of options and the time-consuming trial-and-error process. To address this, we propose Context-AI Tune (CAT), a system that genera… ▽ More

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 17 pages, HCII 2025

  6. arXiv:2505.09809  [pdf, ps, other]

    math.CO cs.DM

    On Alternating 6-Cycles in Edge-Coloured Graphs

    Authors: Hao Chen, Jonathan A. Noel

    Abstract: In this short note, we use flag algebras to prove that the number of colour alternating 6-cycles in a red/blue colouring of a large clique is asymptotically maximized by a uniformly random colouring. This settles the first open case of a problem of Basit, Granet, Horsley, Kündgen and Staden.

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 19 pages, 2 figures

    MSC Class: 05C35

  7. arXiv:2505.09702  [pdf, ps, other]

    cs.LG

    Enabling Group Fairness in Graph Unlearning via Bi-level Debiasing

    Authors: Yezi Liu, Prathyush Poduval, Wenjun Huang, Yang Ni, Hanning Chen, Mohsen Imani

    Abstract: Graph unlearning is a crucial approach for protecting user privacy by erasing the influence of user data on trained graph models. Recent developments in graph unlearning methods have primarily focused on maintaining model prediction performance while removing user information. However, we have observed that when user information is deleted from the model, the prediction distribution across differe… ▽ More

    Submitted 14 May, 2025; originally announced May 2025.

  8. arXiv:2505.08438  [pdf, other]

    cs.CV cs.AI

    A Survey of 3D Reconstruction with Event Cameras: From Event-based Geometry to Neural 3D Rendering

    Authors: Chuanzhi Xu, Haoxian Zhou, Langyi Chen, Haodong Chen, Ying Zhou, Vera Chung, Qiang Qu

    Abstract: Event cameras have emerged as promising sensors for 3D reconstruction due to their ability to capture per-pixel brightness changes asynchronously. Unlike conventional frame-based cameras, they produce sparse and temporally rich data streams, which enable more accurate 3D reconstruction and open up the possibility of performing reconstruction in extreme environments such as high-speed motion, low l… ▽ More

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 35 pages, 12 figures, 11 tables

  9. arXiv:2505.08295  [pdf, ps, other]

    cs.LG cs.AI

    A Practical Introduction to Deep Reinforcement Learning

    Authors: Yinghan Sun, Hongxi Wang, Hua Chen, Wei Zhang

    Abstract: Deep reinforcement learning (DRL) has emerged as a powerful framework for solving sequential decision-making problems, achieving remarkable success in a wide range of applications, including game AI, autonomous driving, biomedicine, and large language models. However, the diversity of algorithms and the complexity of theoretical foundations often pose significant challenges for beginners seeking t… ▽ More

    Submitted 13 May, 2025; originally announced May 2025.

  10. arXiv:2505.08013  [pdf, ps, other]

    cs.CV

    RDD: Robust Feature Detector and Descriptor using Deformable Transformer

    Authors: Gonglin Chen, Tianwen Fu, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, Yajie Zhao

    Abstract: As a core step in structure-from-motion and SLAM, robust feature detection and description under challenging scenarios such as significant viewpoint changes remain unresolved despite their ubiquity. While recent works have identified the importance of local features in modeling geometric transformations, these methods fail to learn the visual cues present in long-range relationships. We present Ro… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

  11. arXiv:2505.07921  [pdf, ps, other]

    cs.LG cs.AI

    Self-cross Feature based Spiking Neural Networks for Efficient Few-shot Learning

    Authors: Qi Xu, Junyang Zhu, Dongdong Zhou, Hao Chen, Yang Liu, Jiangrong Shen, Qiang Zhang

    Abstract: Deep neural networks (DNNs) excel in computer vision tasks, especially few-shot learning (FSL), which is increasingly important for generalizing from limited examples. However, DNNs are computationally expensive and have scalability issues in the real world. Spiking Neural Networks (SNNs), with their event-driven nature and low energy consumption, are particularly efficient in processing sparse and dynam… ▽ More

    Submitted 14 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  12. arXiv:2505.07839  [pdf]

    eess.IV cs.AI

    Sub-diffraction terahertz backpropagation compressive imaging

    Authors: Yongsheng Zhu, Shaojing Liu, Ximiao Wang, Runli Li, Haili Yang, Jiali Wang, Hongjia Zhu, Yanlin Ke, Ningsheng Xu, Huanjun Chen, Shaozhi Deng

    Abstract: Terahertz single-pixel imaging (TSPI) has garnered significant attention due to its simplicity and cost-effectiveness. However, the relatively long wavelength of THz waves limits sub-diffraction-scale imaging resolution. Although TSPI technique can achieve sub-wavelength resolution, it requires harsh experimental conditions and time-consuming processes. Here, we propose a sub-diffraction THz backp… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

  13. KAQG: A Knowledge-Graph-Enhanced RAG for Difficulty-Controlled Question Generation

    Authors: Ching Han Chen, Ming Fang Shiu

    Abstract: KAQG introduces a decisive breakthrough for Retrieval-Augmented Generation (RAG) by explicitly tackling the two chronic weaknesses of current pipelines: transparent multi-step reasoning and fine-grained cognitive difficulty control. This transforms RAG from a passive retriever into an accountable generator of calibrated exam items. Technically, the framework fuses knowledge graphs, RAG retrieval,… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

  14. AgentFlow: Resilient Adaptive Cloud-Edge Framework for Multi-Agent Coordination

    Authors: Ching Han Chen, Ming Fang Shiu

    Abstract: This paper presents AgentFlow, a MAS-based framework for programmable distributed systems in heterogeneous cloud-edge environments. It introduces logistics objects and abstract agent interfaces to enable dynamic service flows and modular orchestration. AgentFlow supports decentralized publish-subscribe messaging and many-to-many service elections, enabling decision coordination without a central s… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 8 pages, 9 figures, 3 tables

  15. arXiv:2505.07130  [pdf, ps, other]

    cs.IT

    Minimal Linear Codes Violating the Ashikhmin-Barg Condition from Arbitrary Projective Linear Codes

    Authors: Hao Chen, Yaqi Chen, Conghui Xie, Huimin Lao

    Abstract: In recent years, there have been many constructions of minimal linear codes violating the Ashikhmin-Barg condition from Boolean functions, linear codes with few nonzero weights or partial difference sets. In this paper, we first give a general method to transform a minimal code satisfying the Ashikhmin-Barg condition to a minimal code violating the Ashikhmin-Barg condition. Then we give a construc… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 27 pages

  16. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

  17. arXiv:2505.06948  [pdf, other]

    cs.CV cs.LG

    Unsupervised Learning for Class Distribution Mismatch

    Authors: Pan Du, Wangbo Zhao, Xinai Lu, Nian Liu, Zhikai Li, Chaoyu Gong, Suyun Zhao, Hong Chen, Cuiping Li, Kai Wang, Yang You

    Abstract: Class distribution mismatch (CDM) refers to the discrepancy between class distributions in training data and target tasks. Previous methods address this by designing classifiers to categorize classes known during training, while grouping unknown or new classes into an "other" category. However, they focus on semi-supervised scenarios and heavily rely on labeled data, limiting their applicability a… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  18. arXiv:2505.06905  [pdf, ps, other]

    cs.CV eess.IV

    Enhancing Monocular Height Estimation via Sparse LiDAR-Guided Correction

    Authors: Jian Song, Hongruixuan Chen, Naoto Yokoya

    Abstract: Monocular height estimation (MHE) from very-high-resolution (VHR) remote sensing imagery via deep learning is notoriously challenging due to the lack of sufficient structural information. Conventional digital elevation models (DEMs), typically derived from airborne LiDAR or multi-view stereo, remain costly and geographically limited. Recently, models trained on synthetic data and refined through d… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

  19. arXiv:2505.06576  [pdf, ps, other]

    cs.CV cs.AI

    Two-Stage Random Alternation Framework for Zero-Shot Pansharpening

    Authors: Haorui Chen, Zeyu Ren, Jiaxuan Ren, Ran Ran, Jinliang Shao, Jie Huang, Liangjian Deng

    Abstract: In recent years, pansharpening has seen rapid advancements with deep learning methods, which have demonstrated impressive fusion quality. However, the challenge of acquiring real high-resolution images limits the practical applicability of these methods. To address this, we propose a two-stage random alternating framework (TRA-PAN) that effectively integrates strong supervision constraints from re… ▽ More

    Submitted 10 May, 2025; originally announced May 2025.

  20. arXiv:2505.06302  [pdf, other]

    cs.LG cs.AI

    QiMeng-TensorOp: Automatically Generating High-Performance Tensor Operators with Hardware Primitives

    Authors: Xuzhi Zhang, Shaohui Peng, Qirui Zhou, Yuanbo Wen, Qi Guo, Ruizhi Chen, Xinguo Zhu, Weiqiang Xiong, Haixin Chen, Congying Ma, Ke Gao, Chen Zhao, Yanjun Wu, Yunji Chen, Ling Li

    Abstract: Computation-intensive tensor operators constitute over 90% of the computations in Large Language Models (LLMs) and Deep Neural Networks. Automatically and efficiently generating high-performance tensor operators with hardware primitives is crucial for diverse and ever-evolving hardware architectures like RISC-V, ARM, and GPUs, as manually optimized implementation takes at least months and lacks po… ▽ More

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures

    ACM Class: I.2.2

  21. arXiv:2505.06274  [pdf, ps, other]

    cs.LG cs.AI

    PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model

    Authors: Baijiong Lin, Weisen Jiang, Yuancheng Xu, Hao Chen, Ying-Cong Chen

    Abstract: Multi-objective test-time alignment aims to adapt large language models (LLMs) to diverse multi-dimensional user preferences during inference while keeping LLMs frozen. Recently, GenARM (Xu et al., 2025) first independently trains Autoregressive Reward Models (ARMs) for each preference dimension without awareness of each other, then combines their outputs based on user-specific preference vectors… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  22. arXiv:2505.06117  [pdf, other]

    cs.CV

    Photovoltaic Defect Image Generator with Boundary Alignment Smoothing Constraint for Domain Shift Mitigation

    Authors: Dongying Li, Binyi Su, Hua Zhang, Yong Li, Haiyong Chen

    Abstract: Accurate defect detection of photovoltaic (PV) cells is critical for ensuring quality and efficiency in intelligent PV manufacturing systems. However, the scarcity of rich defect data poses substantial challenges for effective model training. While existing methods have explored generative models to augment datasets, they often suffer from instability, limited diversity, and domain shifts. To addr… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

  23. arXiv:2505.05517  [pdf, other]

    cs.CV cs.LG cs.RO

    Web2Grasp: Learning Functional Grasps from Web Images of Hand-Object Interactions

    Authors: Hongyi Chen, Yunchao Yao, Yufei Ye, Zhixuan Xu, Homanga Bharadhwaj, Jiashun Wang, Shubham Tulsiani, Zackory Erickson, Jeffrey Ichnowski

    Abstract: Functional grasp is essential for enabling dexterous multi-finger robot hands to manipulate objects effectively. However, most prior work either focuses on power grasping, which simply involves holding an object still, or relies on costly teleoperated robot demonstrations to teach robots how to grasp each object functionally. Instead, we propose extracting human grasp information from web images s… ▽ More

    Submitted 12 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

  24. arXiv:2505.05209  [pdf, other]

    cs.CV

    EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution

    Authors: Haizhen Xie, Kunpeng Du, Qiangyu Yan, Sen Lu, Jianhong Han, Hanting Chen, Hailin Hu, Jie Hu

    Abstract: Utilizing pre-trained Text-to-Image (T2I) diffusion models to guide Blind Super-Resolution (BSR) has become a predominant approach in the field. While T2I models have traditionally relied on U-Net architectures, recent advancements have demonstrated that Diffusion Transformers (DiT) achieve significantly higher performance in this domain. In this work, we introduce Enhancing Anything Model (EAM),… ▽ More

    Submitted 8 May, 2025; originally announced May 2025.

  25. arXiv:2505.04652  [pdf, other]

    eess.IV cs.CV

    Rethinking Boundary Detection in Deep Learning-Based Medical Image Segmentation

    Authors: Yi Lin, Dong Zhang, Xiao Fang, Yufan Chen, Kwang-Ting Cheng, Hao Chen

    Abstract: Medical image segmentation is a pivotal task within the realms of medical image analysis and computer vision. While current methods have shown promise in accurately segmenting major regions of interest, the precise segmentation of boundary areas remains challenging. In this study, we propose a novel network architecture named CTO, which combines Convolutional Neural Networks (CNNs), Vision Transfo… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted by Medical Image Analysis

  26. arXiv:2505.04461  [pdf, other]

    cs.LG cs.AI cs.SI

    A Survey on Temporal Interaction Graph Representation Learning: Progress, Challenges, and Opportunities

    Authors: Pengfei Jiao, Hongjiang Chen, Xuan Guo, Zhidong Zhao, Dongxiao He, Di Jin

    Abstract: Temporal interaction graphs (TIGs), defined by sequences of timestamped interaction events, have become ubiquitous in real-world applications due to their capability to model complex dynamic system behaviors. As a result, temporal interaction graph representation learning (TIGRL) has garnered significant attention in recent years. TIGRL aims to embed nodes in TIGs into low-dimensional representati… ▽ More

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: IJCAI 2025 Survey Track

  27. arXiv:2505.03850  [pdf, other]

    cs.CR cs.AI

    Impact Analysis of Inference Time Attack of Perception Sensors on Autonomous Vehicles

    Authors: Hanlin Chen, Simin Chen, Wenyu Li, Wei Yang, Yiheng Feng

    Abstract: As a safety-critical cyber-physical system, cybersecurity and related safety issues for Autonomous Vehicles (AVs) have been important research topics for a while. Among all the modules on AVs, perception is one of the most accessible attack surfaces, as drivers and AVs have no control over the outside environment. Most current work targeting perception security for AVs focuses on perception correc… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Accepted and presented in TRBAM 2024

  28. arXiv:2505.03743  [pdf]

    cs.CR math.NT quant-ph

    Implementation of Shor Algorithm: Factoring a 4096-Bit Integer Under Specific Constraints

    Authors: Abel C. H. Chen

    Abstract: In recent years, advancements in quantum chip technology, such as Willow, have contributed to reducing quantum computation error rates, potentially accelerating the practical adoption of quantum computing. As a result, the design of quantum algorithms suitable for real-world applications has become a crucial research direction. This study focuses on the implementation of Shor algorithm, aiming to… ▽ More

    Submitted 6 April, 2025; originally announced May 2025.

    Comments: in Chinese language

  29. arXiv:2505.03418  [pdf, other]

    cs.LG

    Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey

    Authors: Da Zheng, Lun Du, Junwei Su, Yuchen Tian, Yuqi Zhu, Jintian Zhang, Lanning Wei, Ningyu Zhang, Huajun Chen

    Abstract: Problem-solving has been a fundamental driver of human progress in numerous domains. With advancements in artificial intelligence, Large Language Models (LLMs) have emerged as powerful tools capable of tackling complex problems across diverse domains. Unlike traditional computational systems, LLMs combine raw computational power with an approximation of human reasoning, allowing them to generate s… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

  30. arXiv:2505.03146  [pdf, other]

    cs.RO cs.LG

    Learn to Swim: Data-Driven LSTM Hydrodynamic Model for Quadruped Robot Gait Optimization

    Authors: Fei Han, Pengming Guo, Hao Chen, Weikun Li, Jingbo Ren, Naijun Liu, Ning Yang, Dixia Fan

    Abstract: This paper presents a Long Short-Term Memory network-based Fluid Experiment Data-Driven model (FED-LSTM) for predicting unsteady, nonlinear hydrodynamic forces on the underwater quadruped robot we constructed. Trained on experimental data from leg force and body drag tests conducted in both a recirculating water tank and a towing tank, FED-LSTM outperforms traditional Empirical Formulas (EF) commo… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: This work has been accepted for publication in the IEEE International Conference on Robotics and Automation (ICRA) 2025. The final version will be available in IEEE Xplore (DOI to be assigned upon publication)

  31. DPNet: Dynamic Pooling Network for Tiny Object Detection

    Authors: Luqi Gong, Haotian Chen, Yikun Chen, Tianliang Yao, Chao Li, Shuai Zhao, Guangjie Han

    Abstract: In unmanned aerial systems, especially in complex environments, accurately detecting tiny objects is crucial. Resizing images is a common strategy to improve detection accuracy, particularly for small objects. However, simply enlarging images significantly increases computational costs and the number of negative samples, severely degrading detection performance and limiting its applicability. This… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 15 pages, 12 figures. Haotian Chen and Luqi Gong contributed equally to this work

  32. arXiv:2505.02690  [pdf, other]

    cs.CV

    Dance of Fireworks: An Interactive Broadcast Gymnastics Training System Based on Pose Estimation

    Authors: Haotian Chen, Ziyu Liu, Xi Cheng, Chuangqi Li

    Abstract: This study introduces Dance of Fireworks, an interactive system designed to combat sedentary health risks by enhancing engagement in radio calisthenics. Leveraging mobile device cameras and lightweight pose estimation (PoseNet/TensorFlow Lite), the system extracts body keypoints, computes joint angles, and compares them with standardized motions to deliver real-time corrective feedback. To incenti… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 21 pages, 13 figures

  33. arXiv:2505.02385  [pdf, other]

    eess.IV cs.CV

    An Arbitrary-Modal Fusion Network for Volumetric Cranial Nerves Tract Segmentation

    Authors: Lei Xie, Huajun Zhou, Junxiong Huang, Jiahao Huang, Qingrun Zeng, Jianzhong He, Jiawei Zhang, Baohua Fan, Mingchu Li, Guoqiang Xie, Hao Chen, Yuanjing Feng

    Abstract: The segmentation of cranial nerves (CNs) tract provides a valuable quantitative tool for the analysis of the morphology and trajectory of individual CNs. Multimodal CNs tract segmentation networks, e.g., CNTSeg, which combine structural Magnetic Resonance Imaging (MRI) and diffusion MRI, have achieved promising segmentation performance. However, it is laborious or even infeasible to collect comple… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

  34. arXiv:2505.02304  [pdf, other]

    cs.CL cs.CV

    Generative Sign-description Prompts with Multi-positive Contrastive Learning for Sign Language Recognition

    Authors: Siyu Liang, Yunan Li, Wentian Xin, Huizhou Chen, Xujie Liu, Kang Liu, Qiguang Miao

    Abstract: Sign language recognition (SLR) faces fundamental challenges in creating accurate annotations due to the inherent complexity of simultaneous manual and non-manual signals. To the best of our knowledge, this is the first work to integrate generative large language models (LLMs) into SLR tasks. We propose a novel Generative Sign-description Prompts Multi-positive Contrastive learning (GSP-MC) method… ▽ More

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 9 pages, 6 figures

  35. arXiv:2505.01958  [pdf, ps, other]

    cs.CV cs.CL

    A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models

    Authors: Liqiang Jing, Guiming Hardy Chen, Ehsan Aghazadeh, Xin Eric Wang, Xinya Du

    Abstract: Large Vision-Language Models (LVLMs) demonstrate remarkable capabilities in multimodal tasks, but visual object hallucination remains a persistent issue. It refers to scenarios where models generate inaccurate visual object-related information based on the query input, potentially leading to misinformation and concerns about safety and reliability. Previous works focus on the evaluation and mitiga… ▽ More

    Submitted 3 May, 2025; originally announced May 2025.

  36. arXiv:2505.01822  [pdf, other]

    cs.LG cs.AI

    Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

    Authors: Jifeng Hu, Sili Huang, Zhejian Yang, Shengchao Hu, Li Shen, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao

    Abstract: Conditional decision generation with diffusion models has shown powerful competitiveness in reinforcement learning (RL). Recent studies reveal the relation between energy-function-guidance diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable due to the log-expectation formulation during the generation process. To address… ▽ More

    Submitted 3 May, 2025; originally announced May 2025.

  37. arXiv:2505.01743  [pdf, other]

    cs.CV cs.AI cs.LG

    An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding

    Authors: Siyang Jiang, Bufang Yang, Lilin Xu, Mu Yuan, Yeerzhati Abudunuer, Kaiwei Liu, Liekang Zeng, Hongkai Chen, Zhenyu Yan, Xiaofan Jiang, Guoliang Xing

    Abstract: The rapid advancements in Large Vision Language Models (LVLMs) offer the potential to surpass conventional labeling by generating richer, more detailed descriptions of on-device human behavior understanding (HBU) in low-resolution vision systems, such as depth, thermal, and infrared. However, existing large vision language model (LVLM) approaches are unable to understand low-resolution data well a… ▽ More

    Submitted 3 May, 2025; originally announced May 2025.

  38. arXiv:2505.01695  [pdf, other]

    cs.IR

    SimAug: Enhancing Recommendation with Pretrained Language Models for Dense and Balanced Data Augmentation

    Authors: Yuying Zhao, Xiaodong Yang, Huiyuan Chen, Xiran Fan, Yu Wang, Yiwei Cai, Tyler Derr

    Abstract: Deep Neural Networks (DNNs) are extensively used in collaborative filtering due to their impressive effectiveness. These systems depend on interaction data to learn user and item embeddings that are crucial for recommendations. However, the data often suffers from sparsity and imbalance issues: limited observations of user-item interactions can result in sub-optimal performance, and a predominance… ▽ More

    Submitted 3 May, 2025; originally announced May 2025.

  39. arXiv:2505.01489  [pdf, other]

    cs.LG cs.CR

    Machine Learning for Cyber-Attack Identification from Traffic Flows

    Authors: Yujing Zhou, Marc L. Jacquet, Robel Dawit, Skyler Fabre, Dev Sarawat, Faheem Khan, Madison Newell, Yongxin Liu, Dahai Liu, Hongyun Chen, Jian Wang, Huihui Wang

    Abstract: This paper presents our simulation of cyber-attacks and detection strategies on the traffic control system in Daytona Beach, FL, using Raspberry Pi virtual machines and the OPNSense firewall, along with traffic dynamics from SUMO and exploitation via the Metasploit framework. We try to answer the research question: are we able to identify cyber-attacks by only analyzing traffic flow patterns? In… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

  40. arXiv:2505.01488  [pdf, other]

    cs.LG cs.CR

    Explainable Machine Learning for Cyberattack Identification from Traffic Flows

    Authors: Yujing Zhou, Marc L. Jacquet, Robel Dawit, Skyler Fabre, Dev Sarawat, Faheem Khan, Madison Newell, Yongxin Liu, Dahai Liu, Hongyun Chen, Jian Wang, Huihui Wang

    Abstract: The increasing automation of traffic management systems has made them prime targets for cyberattacks, disrupting urban mobility and public safety. Traditional network-layer defenses are often inaccessible to transportation agencies, necessitating a machine learning-based approach that relies solely on traffic flow data. In this study, we simulate cyberattacks in a semi-realistic environment, using… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

  41. arXiv:2505.01305  [pdf, other]

    cs.AI

    Early Detection of Patient Deterioration from Real-Time Wearable Monitoring System

    Authors: Lo Pang-Yun Ting, Hong-Pei Chen, An-Shan Liu, Chun-Yin Yeh, Po-Lin Chen, Kun-Ta Chuang

    Abstract: Early detection of patient deterioration is crucial for reducing mortality rates. Heart rate data has shown promise in assessing patient health, and wearable devices offer a cost-effective solution for real-time monitoring. However, extracting meaningful insights from diverse heart rate data and handling missing values in wearable device data remain key challenges. To address these challenges, we… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

  42. arXiv:2505.01255  [pdf, other]

    cs.CL cs.IR cs.MM

    PREMISE: Matching-based Prediction for Accurate Review Recommendation

    Authors: Wei Han, Hui Chen, Soujanya Poria

    Abstract: We present PREMISE (PREdict with Matching ScorEs), a new architecture for matching-based learning in multimodal fields for the multimodal review helpfulness (MRHP) task. Distinct from previous fusion-based methods, which obtain multimodal representations via cross-modal attention for downstream tasks, PREMISE computes the multi-scale and multi-field representations, filters duplicated semant… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 19 pages, 16 figures

  43. arXiv:2505.01008  [pdf, other]

    cs.LG

    Where's the liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content

    Authors: Haoyue Bai, Yiyou Sun, Wei Cheng, Haifeng Chen

    Abstract: The recent proliferation of photorealistic images created by generative models has sparked both excitement and concern, as these images are increasingly indistinguishable from real ones to the human eye. While offering new creative and commercial possibilities, the potential for misuse, such as in misinformation and fraud, highlights the need for effective detection methods. Current detection appr… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  44. arXiv:2505.00976  [pdf, ps, other]

    cs.CR cs.AI cs.CL cs.LG

    Attack and defense techniques in large language models: A survey and new perspectives

    Authors: Zhiyu Liao, Kang Chen, Yuanguo Lin, Kangkang Li, Yunxuan Liu, Hefeng Chen, Xingwang Huang, Yuanhui Yu

    Abstract: Large Language Models (LLMs) have become central to numerous natural language processing tasks, but their vulnerabilities present significant security and ethical challenges. This systematic survey explores the evolving landscape of attack and defense techniques in LLMs. We classify attacks into adversarial prompt attack, optimized attacks, model theft, as well as attacks on application of LLMs, d…

    Submitted 1 May, 2025; originally announced May 2025.

  45. arXiv:2504.21853  [pdf, other]

    cs.CV

    A Survey of Interactive Generative Video

    Authors: Jiwen Yu, Yiran Qin, Haoxuan Che, Quande Liu, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Hao Chen, Xihui Liu

    Abstract: Interactive Generative Video (IGV) has emerged as a crucial technology in response to the growing demand for high-quality, interactive video content across various domains. In this paper, we define IGV as a technology that combines generative capabilities to produce diverse high-quality video content with interactive features that enable user engagement through control signals and responsive feedb…

    Submitted 30 April, 2025; originally announced April 2025.

  46. arXiv:2504.21447  [pdf, other]

    cs.CV cs.AI

    Rethinking Visual Layer Selection in Multimodal LLMs

    Authors: Haoran Chen, Junyan Lin, Xinhao Chen, Yue Fan, Xin Jin, Hui Su, Jianfeng Dong, Jinlan Fu, Xiaoyu Shen

    Abstract: Multimodal large language models (MLLMs) have achieved impressive performance across a wide range of tasks, typically using CLIP-ViT as their visual encoder due to its strong text-image alignment capabilities. While prior studies suggest that different CLIP-ViT layers capture different types of information, with shallower layers focusing on fine visual details and deeper layers aligning more close…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 8 pages, 4 figures, submitted to ICCV 2025

  47. arXiv:2504.21336  [pdf, other]

    cs.CV

    UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation

    Authors: Linshan Wu, Yuxiang Nie, Sunan He, Jiaxin Zhuang, Hao Chen

    Abstract: Multi-modal interpretation of biomedical images opens up novel opportunities in biomedical image analysis. Conventional AI approaches typically rely on disjointed training, i.e., Large Language Models (LLMs) for clinical text generation and segmentation models for target extraction, which results in inflexible real-world deployment and a failure to leverage holistic biomedical information. To this…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: The first universal foundation model for grounded biomedical image interpretation

  48. arXiv:2504.21304  [pdf, other]

    cs.LG

    Unsupervised Feature Transformation via In-context Generation, Generator-critic LLM Agents, and Duet-play Teaming

    Authors: Nanxu Gong, Xinyuan Wang, Wangyang Ying, Haoyue Bai, Sixun Dong, Haifeng Chen, Yanjie Fu

    Abstract: Feature transformation involves generating a new set of features from the original dataset to enhance the data's utility. In certain domains like material performance screening, dimensionality is large and collecting labels is expensive and lengthy. This makes it necessary to transform feature spaces efficiently and without supervision to enhance data readiness and AI utility. However, existing met…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: Accepted to IJCAI 2025

  49. arXiv:2504.21054  [pdf, other]

    cs.CR cs.AI

    FFCBA: Feature-based Full-target Clean-label Backdoor Attacks

    Authors: Yangxu Yin, Honglong Chen, Yudong Gao, Peng Sun, Liantao Wu, Zhe Li, Weifeng Liu

    Abstract: Backdoor attacks pose a significant threat to deep neural networks, as backdoored models would misclassify poisoned samples with specific triggers into target classes while maintaining normal performance on clean samples. Among these, multi-target backdoor attacks can simultaneously target multiple classes. However, existing multi-target backdoor attacks all follow the dirty-label paradigm, where…

    Submitted 29 April, 2025; originally announced April 2025.

  50. arXiv:2504.21052  [pdf, other]

    cs.CR cs.AI

    SFIBA: Spatial-based Full-target Invisible Backdoor Attacks

    Authors: Yangxu Yin, Honglong Chen, Yudong Gao, Peng Sun, Zhishuai Li, Weifeng Liu

    Abstract: Multi-target backdoor attacks pose significant security threats to deep neural networks, as they can preset multiple target classes through a single backdoor injection. This allows attackers to control the model to misclassify poisoned samples with triggers into any desired target class during inference, exhibiting superior attack performance compared with conventional backdoor attacks. However, e…

    Submitted 29 April, 2025; originally announced April 2025.