
Showing 1–50 of 1,005 results for author: Hu, H

Searching in archive cs.
  1. arXiv:2507.01827  [pdf, ps, other]

    cs.SE

    APRMCTS: Improving LLM-based Automated Program Repair with Iterative Tree Search

    Authors: Haichuan Hu, Congqing He, Hao Zhang, Xiaochen Xie, Quanjun Zhang

    Abstract: Automated Program Repair (APR) attempts to fix software bugs without human intervention, which plays a crucial role in software development and maintenance. Recently, with the advances in Large Language Models (LLMs), a rapidly increasing number of APR techniques have been proposed with remarkable performance. However, existing LLM-based APR techniques typically adopt trial-and-error strategies, w…

    Submitted 2 July, 2025; originally announced July 2025.

  2. arXiv:2506.23077  [pdf, ps, other]

    cs.CV

    Dynamic Contrastive Learning for Hierarchical Retrieval: A Case Study of Distance-Aware Cross-View Geo-Localization

    Authors: Suofei Zhang, Xinxin Wang, Xiaofu Wu, Quan Zhou, Haifeng Hu

    Abstract: Existing deep learning-based cross-view geo-localization methods primarily focus on improving the accuracy of cross-domain image matching, rather than enabling models to comprehensively capture contextual information around the target and minimize the cost of localization errors. To support systematic research into this Distance-Aware Cross-View Geo-Localization (DACVGL) problem, we construct Dist…

    Submitted 28 June, 2025; originally announced June 2025.

  3. arXiv:2506.22316  [pdf, ps, other]

    cs.CL

    Evaluating Scoring Bias in LLM-as-a-Judge

    Authors: Qingquan Li, Shaoyu Dou, Kailai Shao, Chao Chen, Haixiang Hu

    Abstract: The remarkable performance of Large Language Models (LLMs) gives rise to "LLM-as-a-Judge", where LLMs are employed as evaluators for complex tasks. Moreover, it has been widely adopted across fields such as Natural Language Processing (NLP), preference learning, and various specific domains. However, there are various biases within LLM-as-a-Judge, which adversely affect the fairness and reliabili…

    Submitted 27 June, 2025; originally announced June 2025.

  4. arXiv:2506.21591   

    cs.CL

    FinEval-KR: A Financial Domain Evaluation Framework for Large Language Models' Knowledge and Reasoning

    Authors: Shaoyu Dou, Yutian Shen, Mofan Chen, Zixuan Wang, Jiajie Xu, Qi Guo, Kailai Shao, Chao Chen, Haixiang Hu, Haibo Shi, Min Min, Liwen Zhang

    Abstract: Large Language Models (LLMs) demonstrate significant potential but face challenges in complex financial reasoning tasks requiring both domain knowledge and sophisticated reasoning. Current evaluation benchmarks often fall short by not decoupling these capabilities indicators from single task performance and lack root cause analysis for task failure. To address this, we introduce FinEval-KR, a nove…

    Submitted 29 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

    Comments: The statistics included in the paper are incomplete (e.g., Tables 2 and 5 report only the results of a single run), which may lead readers to misunderstand

  5. arXiv:2506.18651  [pdf, ps, other]

    cs.AI

    Dual-level Behavioral Consistency for Inter-group and Intra-group Coordination in Multi-Agent Systems

    Authors: Shuocun Yang, Huawen Hu, Enze Shi, Shu Zhang

    Abstract: Behavioral diversity in multi-agent reinforcement learning (MARL) represents an emerging and promising research area. Prior work has largely centered on intra-group behavioral consistency in multi-agent systems, with limited attention given to behavioral consistency in multi-agent grouping scenarios. In this paper, we introduce Dual-Level Behavioral Consistency (DLBC), a novel MARL control method d…

    Submitted 23 June, 2025; originally announced June 2025.

  6. arXiv:2506.18048  [pdf, ps, other]

    cs.CV

    CLGRPO: Reasoning Ability Enhancement for Small VLMs

    Authors: Fanyi Wang, Binzhi Dong, Haotian Hu, Jinjin Xu, Zhiwang Zhang

    Abstract: Small Vision Language Models (SVLMs) generally refer to models with parameter sizes less than or equal to 2B. Their low cost and power consumption characteristics confer high commercial value. However, their reasoning abilities are limited by the number of parameters. To address this issue, this paper proposes a post-training optimization paradigm called the Incremental Training Strategy to enhanc…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 11 pages, 5 figures

  7. arXiv:2506.17733  [pdf, ps, other]

    cs.CV

    YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception

    Authors: Mengqi Lei, Siqi Li, Yihong Wu, Han Hu, You Zhou, Xinhu Zheng, Guiguang Ding, Shaoyi Du, Zongze Wu, Yue Gao

    Abstract: The YOLO series models reign supreme in real-time object detection due to their superior accuracy and computational efficiency. However, both the convolutional architectures of YOLO11 and earlier versions and the area-based self-attention mechanism introduced in YOLOv12 are limited to local information aggregation and pairwise correlation modeling, lacking the capability to capture global multi-to…

    Submitted 21 June, 2025; originally announced June 2025.

  8. arXiv:2506.15693  [pdf, ps, other]

    cs.LG

    Verifiable Safety Q-Filters via Hamilton-Jacobi Reachability and Multiplicative Q-Networks

    Authors: Jiaxing Li, Hanjiang Hu, Yujie Yang, Changliu Liu

    Abstract: Recent learning-based safety filters have outperformed conventional methods, such as hand-crafted Control Barrier Functions (CBFs), by effectively adapting to complex constraints. However, these learning-based approaches lack formal safety guarantees. In this work, we introduce a verifiable model-free safety filter based on Hamilton-Jacobi reachability analysis. Our primary contributions include:…

    Submitted 27 May, 2025; originally announced June 2025.

    Comments: 6 pages, 3 figures

  9. arXiv:2506.14648  [pdf, ps, other]

    cs.RO cs.AI

    SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning

    Authors: Hexian Ni, Tao Lu, Haoyuan Hu, Yinghao Cai, Shuo Wang

    Abstract: Preference-based Reinforcement Learning (PbRL) methods provide a solution to avoid reward engineering by learning reward models based on human preferences. However, poor feedback and sample efficiency remain problems that hinder the application of PbRL. In this paper, we present a novel efficient query selection and preference-guided exploration method, called SENIOR, which could selec…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 8 pages, 8 figures

  10. arXiv:2506.13695  [pdf, ps, other]

    cs.IR

    OneRec Technical Report

    Authors: Guorui Zhou, Jiaxin Deng, Jinghao Zhang, Kuo Cai, Lejian Ren, Qiang Luo, Qianqian Wang, Qigen Hu, Rui Huang, Shiyao Wang, Weifeng Ding, Wuchao Li, Xinchen Luo, Xingmei Wang, Zexuan Cheng, Zixing Zhang, Bin Zhang, Boxuan Wang, Chaoyi Ma, Chengru Song, Chenhui Wang, Di Wang, Dongxue Meng, Fan Yang, Fangyu Zhang , et al. (40 additional authors not shown)

    Abstract: Recommender systems have been widely used in various large-scale user-oriented platforms for many years. However, compared to the rapid developments in the AI community, recommendation systems have not achieved a breakthrough in recent years. For instance, they still rely on a multi-stage cascaded architecture rather than an end-to-end approach, leading to computational fragmentation and optimizat…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Authors are listed alphabetically by their first name

  11. arXiv:2506.11496  [pdf, ps, other]

    eess.IV cs.CV

    Taming Stable Diffusion for Computed Tomography Blind Super-Resolution

    Authors: Chunlei Li, Yilei Shi, Haoxi Hu, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou

    Abstract: High-resolution computed tomography (CT) imaging is essential for medical diagnosis but requires increased radiation exposure, creating a critical trade-off between image quality and patient safety. While deep learning methods have shown promise in CT super-resolution, they face challenges with complex degradations and limited medical training data. Meanwhile, large-scale pre-trained diffusion mod…

    Submitted 13 June, 2025; originally announced June 2025.

  12. arXiv:2506.09202  [pdf, ps, other]

    cs.LG cs.AI

    Policy-Based Trajectory Clustering in Offline Reinforcement Learning

    Authors: Hao Hu, Xinqi Wang, Simon Shaolei Du

    Abstract: We introduce a novel task of clustering trajectories from offline reinforcement learning (RL) datasets, where each cluster center represents the policy that generated its trajectories. By leveraging the connection between the KL-divergence of offline trajectory distributions and a mixture of policy-induced distributions, we formulate a natural clustering objective. To solve this, we propose Policy…

    Submitted 11 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  13. arXiv:2506.08967  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

    Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang , et al. (51 additional authors not shown)

    Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a du…

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures

  14. arXiv:2506.08640  [pdf, ps, other]

    cs.CV

    Orientation Matters: Making 3D Generative Models Orientation-Aligned

    Authors: Yichong Lu, Yuzhuo Tian, Zijin Jiang, Yikun Zhao, Yuanbo Yang, Hao Ouyang, Haoji Hu, Huimin Yu, Yujun Shen, Yiyi Liao

    Abstract: Humans intuitively perceive object shape and orientation from a single image, guided by strong priors about canonical poses. However, existing 3D generative models often produce misaligned results due to inconsistent training data, limiting their usability in downstream tasks. To address this gap, we introduce the task of orientation-aligned 3D object generation: producing 3D objects from single i…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Project Page: https://xdimlab.github.io/Orientation_Matters

  15. arXiv:2506.07972  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization

    Authors: Hongzheng Chen, Yingheng Wang, Yaohui Cai, Hins Hu, Jiajie Li, Shirley Huang, Chenhui Deng, Rongjian Liang, Shufeng Kong, Haoxing Ren, Samitha Samaranayake, Carla P. Gomes, Zhiru Zhang

    Abstract: While Large Language Models (LLMs) have demonstrated significant advancements in reasoning and agent-based problem-solving, current evaluation methodologies fail to adequately assess their capabilities: existing benchmarks either rely on closed-ended questions prone to saturation and memorization, or subjective comparisons that lack consistency and rigor. In this work, we introduce HeuriGym, an ag…

    Submitted 9 June, 2025; originally announced June 2025.

  16. arXiv:2506.06843  [pdf, ps, other]

    cs.AI

    United Minds or Isolated Agents? Exploring Coordination of LLMs under Cognitive Load Theory

    Authors: HaoYang Shang, Xuan Liu, Zi Liang, Jie Zhang, Haibo Hu, Song Guo

    Abstract: Large Language Models (LLMs) exhibit a notable performance ceiling on complex, multi-faceted tasks, as they often fail to integrate diverse information or adhere to multiple constraints. We posit that such limitation arises when the demands of a task exceed the LLM's effective cognitive load capacity. This interpretation draws a strong analogy to Cognitive Load Theory (CLT) in cognitive science, w…

    Submitted 7 June, 2025; originally announced June 2025.

  17. When Better Features Mean Greater Risks: The Performance-Privacy Trade-Off in Contrastive Learning

    Authors: Ruining Sun, Hongsheng Hu, Wei Luo, Zhaoxi Zhang, Yanjun Zhang, Haizhuan Yuan, Leo Yu Zhang

    Abstract: With the rapid advancement of deep learning technology, pre-trained encoder models have demonstrated exceptional feature extraction capabilities, playing a pivotal role in the research and application of deep learning. However, their widespread use has raised significant concerns about the risk of training data privacy leakage. This paper systematically investigates the privacy threats posed by me…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Accepted in ACM ASIA Conference on Computer and Communications Security (ASIA CCS '25), August 25-29, 2025, Ha Noi, Vietnam. For code, see https://github.com/SeroneySun/LpLA_code

  18. arXiv:2506.05729  [pdf, ps, other]

    cs.HC

    Regenerating Daily Routines for Young Adults with Depression through User-Led Indoor Environment Modifications Using Local Natural Materials

    Authors: Ziqun Hua, Ao Jiang, Haoling Yang, Hao Fan, Huizhong Hu, Bernard Foing

    Abstract: Young adults with depression often experience prolonged indoor stays, limiting their access to natural environments and exacerbating mental health challenges. While nature therapy is recognized for its psychological benefits, existing interventions frequently require outdoor engagement, which may not be accessible for all individuals. This study explores the potential of user-led indoor modificati…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: This work (7 pages, 2 figures) was accepted as a poster at HCII 2025. Due to limited funding, it will not appear in the official Springer proceedings. The version uploaded here is the original preprint submitted for review. The preprint is shared to support further discussion on user-led, nature-integrated mental health interventions in HCI contexts

  19. arXiv:2506.05404  [pdf, ps, other]

    cs.CV cs.AI

    AD-EE: Early Exiting for Fast and Reliable Vision-Language Models in Autonomous Driving

    Authors: Lianming Huang, Haibo Hu, Yufei Cui, Jiacheng Zuo, Shangyu Wu, Nan Guan, Chun Jason Xue

    Abstract: With the rapid advancement of autonomous driving, deploying Vision-Language Models (VLMs) to enhance perception and decision-making has become increasingly common. However, the real-time application of VLMs is hindered by high latency and computational overhead, limiting their effectiveness in time-critical driving scenarios. This challenge is particularly evident when VLMs exhibit over-inference,…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 8 pages

  20. arXiv:2506.01185  [pdf, ps, other]

    cs.RO

    HoMeR: Learning In-the-Wild Mobile Manipulation via Hybrid Imitation and Whole-Body Control

    Authors: Priya Sundaresan, Rhea Malhotra, Phillip Miao, Jingyun Yang, Jimmy Wu, Hengyuan Hu, Rika Antonova, Francis Engelmann, Dorsa Sadigh, Jeannette Bohg

    Abstract: We introduce HoMeR, an imitation learning framework for mobile manipulation that combines whole-body control with hybrid action modes that handle both long-range and fine-grained motion, enabling effective performance on realistic in-the-wild tasks. At its core is a fast, kinematics-based whole-body controller that maps desired end-effector poses to coordinated motion across the mobile base and ar…

    Submitted 1 June, 2025; originally announced June 2025.

  21. arXiv:2506.00854  [pdf, ps, other]

    cs.CL cs.AI cs.LG cs.MM q-bio.NC

    EEG2TEXT-CN: An Exploratory Study of Open-Vocabulary Chinese Text-EEG Alignment via Large Language Model and Contrastive Learning on ChineseEEG

    Authors: Jacky Tai-Yu Lu, Jung Chiang, Chi-Sheng Chen, Anna Nai-Yun Tung, Hsiang Wei Hu, Yuan Chiao Cheng

    Abstract: We propose EEG2TEXT-CN, which, to the best of our knowledge, represents one of the earliest open-vocabulary EEG-to-text generation frameworks tailored for Chinese. Built on a biologically grounded EEG encoder (NICE-EEG) and a compact pretrained language model (MiniLM), our architecture aligns multichannel brain signals with natural language representations via masked pretraining and contrastive le…

    Submitted 17 June, 2025; v1 submitted 1 June, 2025; originally announced June 2025.

  22. arXiv:2506.00388  [pdf, ps, other]

    cs.LG

    CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries

    Authors: Ni Mu, Hao Hu, Xiao Hu, Yiqin Yang, Bo Xu, Qing-Shan Jia

    Abstract: Preference-based reinforcement learning (PbRL) bypasses explicit reward engineering by inferring reward functions from human preference comparisons, enabling better alignment with human intentions. However, humans often struggle to label a clear preference between similar segments, reducing label efficiency and limiting PbRL's real-world applicability. To address this, we propose an offline PbRL m…

    Submitted 10 June, 2025; v1 submitted 31 May, 2025; originally announced June 2025.

    Comments: ICML 2025

  23. arXiv:2505.23201  [pdf, ps, other]

    cs.CV

    WTEFNet: Real-Time Low-Light Object Detection for Advanced Driver Assistance Systems

    Authors: Hao Wu, Junzhou Chen, Ronghui Zhang, Nengchao Lyu, Hongyu Hu, Yanyong Guo, Tony Z. Qiu

    Abstract: Object detection is a cornerstone of environmental perception in advanced driver assistance systems (ADAS). However, most existing methods rely on RGB cameras, which suffer from significant performance degradation under low-light conditions due to poor image quality. To address this challenge, we propose WTEFNet, a real-time object detection framework specifically designed for low-light scenarios,…

    Submitted 29 May, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

    Comments: This paper is expected to be submitted to IEEE Transactions on Instrumentation and Measurement

  24. arXiv:2505.22375  [pdf, ps, other]

    cs.CL

    Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition

    Authors: Hanting Chen, Yasheng Wang, Kai Han, Dong Li, Lin Li, Zhenni Bi, Jinpeng Li, Haoyu Wang, Fei Mi, Mingjian Zhu, Bin Wang, Kaikai Song, Yifei Fu, Xu He, Yu Luo, Chong Zhu, Quan He, Xueyu Wu, Wei He, Hailin Hu, Yehui Tang, Dacheng Tao, Xinghao Chen, Yunhe Wang

    Abstract: This work presents Pangu Embedded, an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs), featuring flexible fast and slow thinking capabilities. Pangu Embedded addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs. We propose a two-stage training framework for its construction. In…

    Submitted 28 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  25. arXiv:2505.20279  [pdf, ps, other]

    cs.CV cs.CL

    VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

    Authors: Zhiwen Fan, Jian Zhang, Renjie Li, Junge Zhang, Runjin Chen, Hezhen Hu, Kevin Wang, Huaizhi Qu, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Tianlong Chen, Jiachen Li, Zhengzhong Tu, Zhangyang Wang, Rakesh Ranjan

    Abstract: The rapid advancement of Large Multimodal Models (LMMs) for 2D images and videos has motivated extending these models to understand 3D scenes, aiming for human-like visual-spatial intelligence. Nevertheless, achieving deep spatial understanding comparable to human capabilities poses significant challenges in model encoding and data acquisition. Existing methods frequently depend on external depth…

    Submitted 1 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: Project Page: https://vlm-3r.github.io/

  26. arXiv:2505.20004  [pdf, ps, other]

    cs.SE

    Requirements Coverage-Guided Minimization for Natural Language Test Cases

    Authors: Rongqi Pan, Feifei Niu, Lionel C. Briand, Hanyang Hu

    Abstract: As software systems evolve, test suites tend to grow in size and often contain redundant test cases. Such redundancy increases testing effort, time, and cost. Test suite minimization (TSM) aims to eliminate such redundancy while preserving key properties such as requirement coverage and fault detection capability. In this paper, we propose RTM (Requirement coverage-guided Test suite Minimization),…

    Submitted 26 May, 2025; originally announced May 2025.

  27. arXiv:2505.18302  [pdf, ps, other]

    cs.CV cs.IT

    Sampling Strategies for Efficient Training of Deep Learning Object Detection Algorithms

    Authors: Gefei Shen, Yung-Hong Sun, Yu Hen Hu, Hongrui Jiang

    Abstract: Two sampling strategies are investigated to enhance efficiency in training a deep learning object detection model. These sampling strategies are employed under the assumption of Lipschitz continuity of deep learning models. The first strategy is uniform sampling which seeks to obtain samples evenly yet randomly through the state space of the object dynamics. The second strategy of frame difference…

    Submitted 27 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  28. arXiv:2505.17470  [pdf, other]

    cs.CL cs.AI

    SLearnLLM: A Self-Learning Framework for Efficient Domain-Specific Adaptation of Large Language Models

    Authors: Xiang Liu, Zhaoxiang Liu, Peng Wang, Kohou Wang, Huan Hu, Kai Wang, Shiguo Lian

    Abstract: When using supervised fine-tuning (SFT) to adapt large language models (LLMs) to specific domains, a significant challenge arises: should we use the entire SFT dataset for fine-tuning? Common practice often involves fine-tuning directly on the entire dataset due to limited information on the LLM's past training data. However, if the SFT dataset largely overlaps with the model's existing knowledge,…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 12 pages, 5 figures

  29. arXiv:2505.17118  [pdf, other]

    cs.CL

    After Retrieval, Before Generation: Enhancing the Trustworthiness of Large Language Models in RAG

    Authors: Xinbang Dai, Huikang Hu, Yuncheng Hua, Jiaqi Li, Yongrui Chen, Rihui Jin, Nan Hu, Guilin Qi

    Abstract: Retrieval-augmented generation (RAG) systems face critical challenges in balancing internal (parametric) and external (retrieved) knowledge, especially when these sources conflict or are unreliable. To analyze these scenarios comprehensively, we construct the Trustworthiness Response Dataset (TRD) with 36,266 questions spanning four RAG settings. We reveal that existing approaches address isolated…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 24 pages, 8 figures

    ACM Class: I.2.7

  30. arXiv:2505.16831  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs

    Authors: Xiaoyu Xu, Xiang Yue, Yang Liu, Qingqing Ye, Haibo Hu, Minxin Du

    Abstract: Unlearning in large language models (LLMs) is intended to remove the influence of specific data, yet current evaluations rely heavily on token-level metrics such as accuracy and perplexity. We show that these metrics can be misleading: models often appear to forget, but their original behavior can be rapidly restored with minimal fine-tuning, revealing that unlearning may obscure information rathe…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 44 pages

  31. arXiv:2505.16778  [pdf, ps, other]

    cs.CV

    Single Domain Generalization for Few-Shot Counting via Universal Representation Matching

    Authors: Xianing Chen, Si Huo, Borui Jiang, Hailin Hu, Xinghao Chen

    Abstract: Few-shot counting estimates the number of target objects in an image using only a few annotated exemplars. However, domain shift severely hinders existing methods from generalizing to unseen scenarios. This falls into the realm of single domain generalization that remains unexplored in few-shot counting. To solve this problem, we begin by analyzing the main limitations of current methods, which typica…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: CVPR 2025

  32. arXiv:2505.16770  [pdf, ps, other]

    cs.CV

    RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs

    Authors: Meng-Hao Guo, Xuanyu Chu, Qianrui Yang, Zhe-Han Mo, Yiqing Shen, Pei-lin Li, Xinjie Lin, Jinnian Zhang, Xin-Sheng Chen, Yi Zhang, Kiyohiro Nakayama, Zhengyang Geng, Houwen Peng, Han Hu, Shi-Min Hu

    Abstract: The rapid advancement of native multi-modal models and omni-models, exemplified by GPT-4o, Gemini, and o3, with their capability to process and generate content across modalities such as text and images, marks a significant milestone in the evolution of intelligence. Systematic evaluation of their multi-modal output capabilities in visual thinking processes (also known as multi-modal chain of thou…

    Submitted 23 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: 12 pages

  33. arXiv:2505.16211  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

    Authors: Kai Li, Can Shen, Yile Liu, Jirui Han, Kelong Zheng, Xuechao Zou, Zhe Wang, Xingjian Du, Shun Zhang, Hanjun Luo, Yingbin Jin, Xinxin Xing, Ziyang Ma, Yue Liu, Xiaojun Jia, Yifan Zhang, Junfeng Fang, Kun Wang, Yibo Yan, Haoyang Li, Yiming Li, Xiaobin Zhuang, Yang Liu, Haibo Hu, Zhizheng Wu , et al. (6 additional authors not shown)

    Abstract: The rapid advancement and expanding applications of Audio Large Language Models (ALLMs) demand a rigorous understanding of their trustworthiness. However, systematic research on evaluating these models, particularly concerning risks unique to the audio modality, remains largely unexplored. Existing evaluation frameworks primarily focus on the text modality or address only a restricted set of safet…

    Submitted 1 July, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Technical Report

  34. arXiv:2505.15715  [pdf, ps, other]

    cs.CL

    Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling

    Authors: He Hu, Yucheng Zhou, Juzheng Si, Qianning Wang, Hengheng Zhang, Fuji Ren, Fei Ma, Laizhong Cui

    Abstract: Large language models (LLMs) hold significant potential for mental health support, capable of generating empathetic responses and simulating therapeutic conversations. However, existing LLM-based approaches often lack the clinical grounding necessary for real-world psychological counseling, particularly in explicit diagnostic reasoning aligned with standards like the DSM/ICD and incorporating dive…

    Submitted 21 May, 2025; originally announced May 2025.

  35. arXiv:2505.15431  [pdf, ps, other]

    cs.CL

    Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

    Authors: Tencent Hunyuan Team, Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao, Dong Du, Dong Wang, Feng Zhang, Fengzong Lian, Guanghui Xu, Guanwei Zhang, Hai Wang, Haipeng Luo, Han Hu, Huilin Xu, Jiajia Wu, Jianchen Zhu, Jianfeng Yan, Jiaqi Zhu , et al. (230 additional authors not shown)

    Abstract: As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid response…

    Submitted 22 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  36. arXiv:2505.15284  [pdf, ps, other]

    cs.LG cs.CV

    Kernel PCA for Out-of-Distribution Detection: Non-Linear Kernel Selections and Approximations

    Authors: Kun Fang, Qinghua Tao, Mingzhen He, Kexin Lv, Runze Yang, Haibo Hu, Xiaolin Huang, Jie Yang, Longbin Cao

    Abstract: Out-of-Distribution (OoD) detection is vital for the reliability of deep neural networks, the key of which lies in effectively characterizing the disparities between OoD and In-Distribution (InD) data. In this work, such disparities are exploited through a fresh perspective of non-linear feature subspace. That is, a discriminative non-linear subspace is learned from InD features to capture represe…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: This study is an extension of its conference version published in NeurIPS'24, see https://proceedings.neurips.cc/paper_files/paper/2024/hash/f2543511e5f4d4764857f9ad833a977d-Abstract-Conference.html

  37. arXiv:2505.14418  [pdf, other]

    cs.CL

    Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents

    Authors: Pengzhou Cheng, Haowen Hu, Zheng Wu, Zongru Wu, Tianjie Ju, Zhuosheng Zhang, Gongshen Liu

    Abstract: Graphical user interface (GUI) agents powered by multimodal large language models (MLLMs) have shown greater promise for human-interaction. However, due to the high fine-tuning cost, users often rely on open-source GUI agents or APIs offered by AI providers, which introduces a critical but underexplored supply chain threat: backdoor attacks. In this work, we first unveil that MLLM-powered GUI agen…

    Submitted 22 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: 25 pages, 10 figures, 12 Tables

  38. arXiv:2505.14298  [pdf]

    cs.CV

    A Review of Vision-Based Assistive Systems for Visually Impaired People: Technologies, Applications, and Future Directions

    Authors: Fulong Yao, Wenju Zhou, Huosheng Hu

    Abstract: Visually impaired individuals rely heavily on accurate and timely information about obstacles and their surrounding environments to achieve independent living. In recent years, significant progress has been made in the development of assistive technologies, particularly vision-based systems, that enhance mobility and facilitate interaction with the external world in both indoor and outdoor setting…

    Submitted 20 May, 2025; originally announced May 2025.

  39. arXiv:2505.12934  [pdf, ps, other]

    cs.RO

    Granular Loco-Manipulation: Repositioning Rocks Through Strategic Sand Avalanche

    Authors: Haodi Hu, Yue Wu, Feifei Qian, Daniel Seita

    Abstract: Legged robots have the potential to leverage obstacles to climb steep sand slopes. However, efficiently repositioning these obstacles to desired locations is challenging. Here we present DiffusiveGRAIN, a learning-based method that enables a multi-legged robot to strategically induce localized sand avalanches during locomotion and indirectly manipulate obstacles. We conducted 375 trials, systemati…

    Submitted 19 May, 2025; originally announced May 2025.

  40. arXiv:2505.12871  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR

    Does Low Rank Adaptation Lead to Lower Robustness against Training-Time Attacks?

    Authors: Zi Liang, Haibo Hu, Qingqing Ye, Yaxin Xiao, Ronghua Li

    Abstract: Low rank adaptation (LoRA) has emerged as a prominent technique for fine-tuning large language models (LLMs) thanks to its superb efficiency gains over previous methods. While extensive studies have examined the performance and structural properties of LoRA, its behavior under training-time attacks remains underexplored, posing significant security risks. In this paper, we theoretically investigate…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: To appear at ICML 25

  41. arXiv:2505.12371  [pdf, ps, other]

    cs.AI cs.CL cs.LG cs.MA

    MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks

    Authors: Yinghao Zhu, Ziyi He, Haoran Hu, Xiaochen Zheng, Xichen Zhang, Zixiang Wang, Junyi Gao, Liantao Ma, Lequan Yu

    Abstract: The rapid advancement of Large Language Models (LLMs) has stimulated interest in multi-agent collaboration for addressing complex medical tasks. However, the practical advantages of multi-agent collaboration approaches remain insufficiently understood. Existing evaluations often lack generalizability, failing to cover diverse tasks reflective of real-world clinical practice, and frequently omit ri…

    Submitted 18 May, 2025; originally announced May 2025.

  42. arXiv:2505.10786  [pdf, ps, other]

    eess.SP cs.HC

    Bridging BCI and Communications: A MIMO Framework for EEG-to-ECoG Wireless Channel Modeling

    Authors: Jiaheng Wang, Zhenyu Wang, Tianheng Xu, Yuan Si, Ang Li, Ting Zhou, Xi Zhao, Honglin Hu

    Abstract: As a means of connecting the human brain to external devices, brain-computer interfaces (BCIs) are receiving extensive research attention. Recently, the integration of communication theory with BCI has emerged as a popular trend, offering potential to enhance system performance and shape next-generation communications. A key challenge in this field is modeling the brain wireless communication channel…

    Submitted 15 May, 2025; originally announced May 2025.

  43. arXiv:2505.07101  [pdf, other]

    stat.ML cs.LG

    Constrained Online Decision-Making: A Unified Framework

    Authors: Haichen Hu, David Simchi-Levi, Navid Azizan

    Abstract: Contextual online decision-making problems with constraints appear in a wide range of real-world applications, such as adaptive experimental design under safety constraints, personalized recommendation with resource limits, and dynamic pricing under fairness requirements. In this paper, we investigate a general formulation of sequential decision-making with stage-wise feasibility constraints, wher…

    Submitted 22 May, 2025; v1 submitted 11 May, 2025; originally announced May 2025.

  44. arXiv:2505.06832  [pdf, other]

    cs.RO

    UniDiffGrasp: A Unified Framework Integrating VLM Reasoning and VLM-Guided Part Diffusion for Open-Vocabulary Constrained Grasping with Dual Arms

    Authors: Xueyang Guo, Hongwei Hu, Chengye Song, Jiale Chen, Zilin Zhao, Yu Fu, Bowen Guan, Zhenze Liu

    Abstract: Open-vocabulary, task-oriented grasping of specific functional parts, particularly with dual arms, remains a key challenge, as current Vision-Language Models (VLMs), while enhancing task understanding, often struggle with precise grasp generation within defined constraints and effective dual-arm coordination. We innovatively propose UniDiffGrasp, a unified framework integrating VLM reasoning with…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 8 pages, 5 figures

  45. arXiv:2505.05209   

    cs.CV

    EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution

    Authors: Haizhen Xie, Kunpeng Du, Qiangyu Yan, Sen Lu, Jianhong Han, Hanting Chen, Hailin Hu, Jie Hu

    Abstract: Utilizing pre-trained Text-to-Image (T2I) diffusion models to guide Blind Super-Resolution (BSR) has become a predominant approach in the field. While T2I models have traditionally relied on U-Net architectures, recent advancements have demonstrated that Diffusion Transformers (DiT) achieve significantly higher performance in this domain. In this work, we introduce Enhancing Anything Model (EAM),…

    Submitted 10 June, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: The company audit did not pass; there are some mistakes in the paper

  46. arXiv:2505.04979  [pdf, other]

    cs.CV

    Federated Deconfounding and Debiasing Learning for Out-of-Distribution Generalization

    Authors: Zhuang Qi, Sijin Zhou, Lei Meng, Han Hu, Han Yu, Xiangxu Meng

    Abstract: Attribute bias in federated learning (FL) typically leads local models to optimize inconsistently due to the learning of non-causal associations, resulting in degraded performance. Existing methods either use data augmentation for increasing sample diversity or knowledge distillation for learning invariant representations to address this problem. However, they lack a comprehensive analysis of the inf…

    Submitted 10 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: IJCAI-25 Accepted

  47. arXiv:2505.04416  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models

    Authors: Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu

    Abstract: Large language models (LLMs) trained over extensive corpora risk memorizing sensitive, copyrighted, or toxic content. To address this, we propose OBLIVIATE, a robust unlearning framework that removes targeted data while preserving model utility. The framework follows a structured process: extracting target tokens, building retain sets, and fine-tuning with a tailored loss function comprising three…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 18 pages, 2 figures

  48. arXiv:2505.03983  [pdf, other]

    cs.LG cs.AI

    Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation

    Authors: Hengyuan Hu, Aniket Das, Dorsa Sadigh, Nima Anari

    Abstract: Denoising Diffusion Probabilistic Models (DDPMs) have emerged as powerful tools for generative modeling. However, their sequential computation requirements lead to significant inference-time bottlenecks. In this work, we utilize the connection between DDPMs and Stochastic Localization to prove that, under an appropriate reparametrization, the increments of DDPM satisfy an exchangeability property.…

    Submitted 6 May, 2025; originally announced May 2025.

  49. arXiv:2505.03380  [pdf, other]

    cs.CV cs.AI eess.IV

    Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant

    Authors: Haonan Wang, Jiaji Mao, Lehan Wang, Qixiang Zhang, Marawan Elbatel, Yi Qin, Huijun Hu, Baoxun Li, Wenhui Deng, Weifeng Qin, Hongrui Li, Jialin Liang, Jun Shen, Xiaomeng Li

    Abstract: Medical AI assistants support doctors in disease diagnosis, medical image analysis, and report generation. However, they still face significant challenges in clinical use, including limited accuracy with multimodal content and insufficient validation in real-world settings. We propose RCMed, a full-stack AI assistant that improves multimodal alignment in both input and output, enabling precise ana…

    Submitted 6 May, 2025; originally announced May 2025.

  50. arXiv:2505.03373  [pdf, ps, other]

    cs.LG cs.AI math.OC

    SPAP: Structured Pruning via Alternating Optimization and Penalty Methods

    Authors: Hanyu Hu, Xiaoming Yuan

    Abstract: The deployment of large language models (LLMs) is often constrained by their substantial computational and memory demands. While structured pruning presents a viable approach by eliminating entire network components, existing methods suffer from performance degradation, reliance on heuristic metrics, or expensive finetuning. To address these challenges, we propose SPAP (Structured Pruning via Alte…

    Submitted 6 May, 2025; originally announced May 2025.