Showing 1–50 of 838 results for author: Han, Y

Searching in archive cs.
  1. WebANNS: Fast and Efficient Approximate Nearest Neighbor Search in Web Browsers

    Authors: Mugeng Liu, Siqi Zhong, Qi Yang, Yudong Han, Xuanzhe Liu, Yun Ma

    Abstract: Approximate nearest neighbor search (ANNS) has become vital to modern AI infrastructure, particularly in retrieval-augmented generation (RAG) applications. Numerous in-browser ANNS engines have emerged to seamlessly integrate with popular LLM-based web applications, while addressing privacy protection and challenges of heterogeneous device deployments. However, web browsers present unique challeng…

    Submitted 1 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

    Comments: SIGIR 2025

  2. arXiv:2507.00390  [pdf, ps, other]

    cs.LG

    MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE

    Authors: Geng Zhang, Yuxuan Han, Yuxuan Lou, Wangbo Zhao, Yiqi Zhang, Yang You

    Abstract: Mixture-of-Experts (MoE) enables efficient scaling of large language models by activating only a subset of experts per input token. However, deploying MoE-based models incurs significant memory overhead due to the need to retain all experts in memory. While structured pruning is promising to reduce memory costs, existing methods often show suboptimal performance and unstable degradation in three d…

    Submitted 30 June, 2025; originally announced July 2025.

  3. arXiv:2506.24063  [pdf, ps, other]

    cs.CV

    Continual Adaptation: Environment-Conditional Parameter Generation for Object Detection in Dynamic Scenarios

    Authors: Deng Li, Aming Wu, Yang Li, Yaowei Wang, Yahong Han

    Abstract: In practice, environments constantly change over time and space, posing significant challenges for object detectors trained based on a closed-set assumption, i.e., training and test data share the same distribution. To this end, continual test-time adaptation has attracted much attention, aiming to improve detectors' generalization by fine-tuning a few specific parameters, e.g., BatchNorm layers.…

    Submitted 30 June, 2025; originally announced June 2025.

  4. arXiv:2506.23351  [pdf, ps, other]

    cs.RO cs.AI cs.LG cs.MA

    Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop

    Authors: Tianxing Chen, Kaixuan Wang, Zhaohui Yang, Yuhao Zhang, Zanxin Chen, Baijun Chen, Wanxi Dong, Ziyuan Liu, Dong Chen, Tianshuo Yang, Haibao Yu, Xiaokang Yang, Yusen Qin, Zhiqiang Xie, Yao Mu, Ping Luo, Tian Nian, Weiliang Deng, Yiheng Ge, Yibin Liu, Zixuan Li, Dehui Wang, Zhixuan Liang, Haohui Xie, Rijie Zeng, et al. (74 additional authors not shown)

    Abstract: Embodied Artificial Intelligence (Embodied AI) is an emerging frontier in robotics, driven by the need for autonomous systems that can perceive, reason, and act in complex physical environments. While single-arm systems have shown strong task performance, collaborative dual-arm systems are essential for handling more intricate tasks involving rigid, deformable, and tactile-sensitive objects. To ad…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Challenge Webpage: https://robotwin-benchmark.github.io/cvpr-2025-challenge/

  5. arXiv:2506.22784  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    Single-Frame Point-Pixel Registration via Supervised Cross-Modal Feature Matching

    Authors: Yu Han, Zhiwei Huang, Yanting Zhang, Fangjun Ding, Shen Cai, Rui Fan

    Abstract: Point-pixel registration between LiDAR point clouds and camera images is a fundamental yet challenging task in autonomous driving and robotic perception. A key difficulty lies in the modality gap between unstructured point clouds and structured images, especially under sparse single-frame LiDAR settings. Existing methods typically extract features separately from point clouds and images, then rely…

    Submitted 28 June, 2025; originally announced June 2025.

  6. arXiv:2506.21560  [pdf, ps, other]

    cs.CL cs.AI

    Reinforcement Learning Fine-Tuning of Language Model for Instruction Following and Math Reasoning

    Authors: Yifu Han, Geo Zhang

    Abstract: This study investigates the effectiveness of reinforcement learning (RL) fine-tuning techniques on a compact language model (Qwen2.5-0.5B Base) for two challenging tasks: instruction following and mathematical reasoning. We compare supervised fine-tuning (SFT), Direct Preference Optimization (DPO) using preference-labeled data, and Reinforce Leave-One-Out (RLOO) with reward models. Our experiments…

    Submitted 11 June, 2025; originally announced June 2025.

  7. arXiv:2506.18511  [pdf, ps, other]

    cs.AI

    Standard Applicability Judgment and Cross-jurisdictional Reasoning: A RAG-based Framework for Medical Device Compliance

    Authors: Yu Han, Aaron Ceross, Jeroen H. M. Bergmann

    Abstract: Identifying the appropriate regulatory standard applicability remains a critical yet understudied challenge in medical device compliance, frequently necessitating expert interpretation of fragmented and heterogeneous documentation across different jurisdictions. To address this challenge, we introduce a modular AI system that leverages a retrieval-augmented generation (RAG) pipeline to automate st…

    Submitted 23 June, 2025; originally announced June 2025.

  8. arXiv:2506.14028  [pdf, ps, other]

    cs.CL

    MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

    Authors: Xueqing Peng, Lingfei Qian, Yan Wang, Ruoyu Xiang, Yueru He, Yang Ren, Mingyang Jiang, Jeff Zhao, Huan He, Yi Han, Yun Feng, Yuechen Jiang, Yupeng Cao, Haohang Li, Yangyang Yu, Xiaoyu Wang, Penglei Gao, Shengyuan Lin, Keyi Wang, Shanshan Yang, Yilun Zhao, Zhiwei Liu, Peng Lu, Jerry Huang, Suyuchen Wang, et al. (19 additional authors not shown)

    Abstract: Recent advances in large language models (LLMs) have accelerated progress in financial NLP and applications, yet existing benchmarks remain limited to monolingual and unimodal settings, often over-relying on simple tasks and failing to reflect the complexity of real-world financial communication. We introduce MultiFinBen, the first multilingual and multimodal benchmark tailored to the global finan…

    Submitted 19 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  9. arXiv:2506.12622  [pdf, ps, other]

    cs.LG cs.AI math.OC

    DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty

    Authors: Mingxuan Cui, Duo Zhou, Yuxuan Han, Grani A. Hanasusanto, Qiong Wang, Huan Zhang, Zhengyuan Zhou

    Abstract: Deep reinforcement learning (RL) has achieved significant success, yet its application in real-world scenarios is often hindered by a lack of robustness to environmental uncertainties. To solve this challenge, some robust RL algorithms have been proposed, but most are limited to tabular settings. In this work, we propose Distributionally Robust Soft Actor-Critic (DR-SAC), a novel algorithm designe…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: 24 Pages

  10. arXiv:2506.11081  [pdf, ps, other]

    cs.CL

    SAGE: Specification-Aware Grammar Extraction for Automated Test Case Generation with LLMs

    Authors: Aditi, Hyunwoo Park, Sicheol Sung, Yo-Sub Han, Sang-Ki Ko

    Abstract: Grammar-based test case generation has proven effective for competitive programming problems, but generating valid and general grammars from natural language specifications remains a key challenge, especially under limited supervision. Context-Free Grammars with Counters (CCFGs) have recently been introduced as a formalism to represent such specifications with logical constraints by storing and re…

    Submitted 4 June, 2025; originally announced June 2025.

  11. arXiv:2506.10264  [pdf, ps, other]

    cs.AI

    WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models

    Authors: Qiyue Yin, Pei Xu, Qiaozhe Li, Shengda Liu, Shengqi Shen, Tong Wang, Yihong Han, Xiaonan Zhao, Likun Yang, Shiyue Cao, Shiyu Qiu, Yuxuan Liu, Shizhao Yu, Lei Cui, Chengxin Yan, Jie Sun, Xiangquan Tang, Kaiqi Huang

    Abstract: Recent breakthroughs in Large Language Models (LLMs) have led to a qualitative leap in artificial intelligence's performance on reasoning tasks, particularly demonstrating remarkable capabilities in mathematical, symbolic, and commonsense reasoning. However, as a critical component of advanced human cognition, strategic reasoning, i.e., the ability to assess multi-agent behaviors in dynamic envir…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 15 pages, 17 figures

  12. arXiv:2506.07900  [pdf, ps, other]

    cs.CL cs.AI

    MiniCPM4: Ultra-Efficient LLMs on End Devices

    Authors: MiniCPM Team, Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, Jie Cai, Haotian Chen, Wentong Chen, Xin Cong, Ganqu Cui, Ning Ding, Shengdan Fan, Yewei Fang, Zixuan Fu, Wenyu Guan, Yitong Guan, Junshao Guo, Yufeng Han, Bingxiang He, Yuxiang Huang, Cunliang Kong, Qiuzuo Li, Siyuan Li, Wenhao Li, Yanghao Li, et al. (50 additional authors not shown)

    Abstract: This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelera…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: MiniCPM4 Technical Report

  13. arXiv:2506.06803  [pdf, ps, other]

    cs.CY

    Spatial Disparities in Fire Shelter Accessibility: Capacity Challenges in the Palisades and Eaton Fires

    Authors: Su Yeon Han, Yubin Lee, Jooyoung Yoo, Jeon-Young Kang, Jinwoo Park, Soe W. Myint, Eunsang Cho, Xin Gu, Joon-Seok Kim

    Abstract: The increasing frequency and severity of wildfire in California, exacerbated by prolonged drought and environmental changes, pose significant challenges to urban community resilience and equitable emergency response. The study investigates issues of accessibility to shelters during the Palisades and Eaton Fires which started in January 2025 in Southern California that led to over 180,000 displacem…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: 35 pages, 11 figures

  14. arXiv:2506.06106  [pdf, ps, other]

    cs.SI physics.soc-ph

    Measuring the co-evolution of online engagement with (mis)information and its visibility at scale

    Authors: Yueting Han, Paolo Turrini, Marya Bazzi, Giulia Andrighetto, Eugenia Polizzi, Manlio De Domenico

    Abstract: Online attention is an increasingly valuable resource in the digital age, with extraordinary events such as the COVID-19 pandemic fuelling fierce competition around it. As misinformation pervades online platforms, users seek credible sources, while news outlets compete to attract and retain their attention. Here we measure the co-evolution of online "engagement" with (mis)information and its "visi…

    Submitted 6 June, 2025; originally announced June 2025.

  15. arXiv:2506.04810  [pdf, ps, other]

    cs.CL cs.AI cs.LO

    Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study

    Authors: Yujun Zhou, Jiayi Ye, Zipeng Ling, Yufei Han, Yue Huang, Haomin Zhuang, Zhenwen Liang, Kehan Guo, Taicheng Guo, Xiangqi Wang, Xiangliang Zhang

    Abstract: Logical reasoning is a core capability for many applications of large language models (LLMs), yet existing benchmarks often rely solely on final-answer accuracy, failing to capture the quality and structure of the reasoning process. We propose FineLogic, a fine-grained evaluation framework that assesses logical reasoning across three dimensions: overall benchmark accuracy, stepwise soundness, and…

    Submitted 5 June, 2025; originally announced June 2025.

  16. arXiv:2506.04648  [pdf, ps, other]

    cs.CV

    FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion

    Authors: Akide Liu, Zeyu Zhang, Zhexin Li, Xuehai Bai, Yizeng Han, Jiasheng Tang, Yuanjie Xing, Jichao Wu, Mingyang Yang, Weihua Chen, Jiahao He, Yuanyu He, Fan Wang, Gholamreza Haffari, Bohan Zhuang

    Abstract: Diffusion generative models have become the standard for producing high-quality, coherent video content, yet their slow inference speeds and high computational demands hinder practical deployment. Although both quantization and sparsity can independently accelerate inference while maintaining generation quality, naively combining these techniques in existing training-free approaches leads to signi…

    Submitted 5 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: Project Page: https://fps.ziplab.co

  17. arXiv:2506.04308  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

    Authors: Enshen Zhou, Jingkun An, Cheng Chi, Yi Han, Shanyu Rong, Chi Zhang, Pengwei Wang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, Shanghang Zhang

    Abstract: Spatial referring is a fundamental capability of embodied robots to interact with the 3D physical world. However, even with the powerful pretrained vision language models (VLMs), recent approaches are still not qualified to accurately understand the complex 3D scenes and dynamically reason about the instruction-indicated locations for interaction. To this end, we propose RoboRefer, a 3D-aware VLM…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Project page: https://zhoues.github.io/RoboRefer/

  18. arXiv:2506.03478  [pdf, ps, other]

    cs.GR cs.CV

    Facial Appearance Capture at Home with Patch-Level Reflectance Prior

    Authors: Yuxuan Han, Junfeng Lyu, Kuan Sheng, Minghao Que, Qixuan Zhang, Lan Xu, Feng Xu

    Abstract: Existing facial appearance capture methods can reconstruct plausible facial reflectance from smartphone-recorded videos. However, the reconstruction quality is still far behind the ones based on studio recordings. This paper fills the gap by developing a novel daily-used solution with a co-located smartphone and flashlight video capture setting in a dim room. To enhance the quality, our key observ…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: ACM Transactions on Graphics (Proc. of SIGGRAPH), 2025. Code: https://github.com/yxuhan/DoRA; Project Page: https://yxuhan.github.io/DoRA

  19. arXiv:2506.03176  [pdf, ps, other]

    cs.LG

    Non-collective Calibrating Strategy for Time Series Forecasting

    Authors: Bin Wang, Yongqi Han, Minbo Ma, Tianrui Li, Junbo Zhang, Feng Hong, Yanwei Yu

    Abstract: Deep learning-based approaches have demonstrated significant advancements in time series forecasting. Despite these ongoing developments, the complex dynamics of time series make it challenging to establish the rule of thumb for designing the golden model architecture. In this study, we argue that refining existing advanced models through a universal calibrating strategy can deliver substantial be…

    Submitted 2 July, 2025; v1 submitted 29 May, 2025; originally announced June 2025.

    Comments: Accepted by IJCAI 2025

  20. arXiv:2506.02929  [pdf, ps, other]

    cs.AR

    Large Processor Chip Model

    Authors: Kaiyan Chang, Mingzhi Chen, Yunji Chen, Zhirong Chen, Dongrui Fan, Junfeng Gong, Nan Guo, Yinhe Han, Qinfen Hao, Shuo Hou, Xuan Huang, Pengwei Jin, Changxin Ke, Cangyuan Li, Guangli Li, Huawei Li, Kuan Li, Naipeng Li, Shengwen Liang, Cheng Liu, Hongwei Liu, Jiahua Liu, Junliang Lv, Jianan Mu, Jin Qin, et al. (18 additional authors not shown)

    Abstract: Computer System Architecture serves as a crucial bridge between software applications and the underlying hardware, encompassing components like compilers, CPUs, coprocessors, and RTL designs. Its development, from early mainframes to modern domain-specific architectures, has been driven by rising computational demands and advancements in semiconductor technology. However, traditional paradigms in…

    Submitted 3 June, 2025; originally announced June 2025.

  21. arXiv:2506.02875  [pdf, ps, other]

    cs.CV

    NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results

    Authors: Xiaohong Liu, Xiongkuo Min, Qiang Hu, Xiaoyun Zhang, Jie Guo, Guangtao Zhai, Shushi Wang, Yingjie Zhou, Lu Liu, Jingxin Li, Liu Yang, Farong Wen, Li Xu, Yanwei Jiang, Xilei Zhu, Chunyi Li, Zicheng Zhang, Huiyu Duan, Xiele Wu, Yixuan Gao, Yuqin Cao, Jun Jia, Wei Sun, Jiezhang Cao, Radu Timofte, et al. (70 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2025 XGC Quality Assessment Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. This challenge is to address a major challenge in the field of video and talking head processing. The challenge is divided into three tracks, including user generated video, AI generated video and talking he…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: NTIRE 2025 XGC Quality Assessment Challenge Report. arXiv admin note: text overlap with arXiv:2404.16687

  22. arXiv:2506.00671  [pdf, ps, other]

    cs.CL

    DeepRAG: Integrating Hierarchical Reasoning and Process Supervision for Biomedical Multi-Hop QA

    Authors: Yuelyu Ji, Hang Zhang, Shiven Verma, Hui Ji, Chun Li, Yushui Han, Yanshan Wang

    Abstract: We propose DeepRAG, a novel framework that integrates DeepSeek hierarchical question decomposition capabilities with RAG Gym unified retrieval-augmented generation optimization using process level supervision. Targeting the challenging MedHopQA biomedical question answering task, DeepRAG systematically decomposes complex queries into precise sub-queries and employs concept level reward signals inf…

    Submitted 31 May, 2025; originally announced June 2025.

  23. arXiv:2506.00486  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    It Takes a Good Model to Train a Good Model: Generalized Gaussian Priors for Optimized LLMs

    Authors: Jun Wu, Yirong Xiong, Jiangtao Wen, Yuxing Han

    Abstract: Despite rapid advancements in the research and deployment of large language models (LLMs), the statistical distribution of model parameters, as well as their influence on initialization, training dynamics, and downstream efficiency, has received surprisingly little attention. A recent work introduced BackSlash, a training-time compression algorithm. It first demonstrated that pre-trained LLM param…

    Submitted 4 June, 2025; v1 submitted 31 May, 2025; originally announced June 2025.

  24. arXiv:2506.00089  [pdf, other]

    cs.CY cs.AI

    TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents

    Authors: Hyundong Jin, Sicheol Sung, Shinwoo Park, SeungYeop Baik, Yo-Sub Han

    Abstract: The reasoning, writing, text-editing, and retrieval capabilities of proprietary large language models (LLMs) have advanced rapidly, providing users with an ever-expanding set of functionalities. However, this growing utility has also led to a serious societal concern: the over-reliance on LLMs. In particular, users increasingly delegate tasks such as homework, assignments, or the processing of sen…

    Submitted 30 May, 2025; originally announced June 2025.

  25. arXiv:2505.21545  [pdf, other]

    cs.CV cs.LG

    Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation

    Authors: Chika Maduabuchi, Hao Chen, Yujin Han, Jindong Wang

    Abstract: Latent Video Diffusion Models (LVDMs) achieve high-quality generation but are sensitive to imperfect conditioning, which causes semantic drift and temporal incoherence on noisy, web-scale video-text datasets. We introduce CAT-LVDM, the first corruption-aware training framework for LVDMs that improves robustness through structured, data-aligned noise injection. Our method includes Batch-Centered No…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: Code: https://github.com/chikap421/catlvdm Models: https://huggingface.co/Chikap421/catlvdm-checkpoints/tree/main

  26. arXiv:2505.20773  [pdf, ps, other]

    cs.IR

    Cold-Start Recommendation with Knowledge-Guided Retrieval-Augmented Generation

    Authors: Wooseong Yang, Weizhi Zhang, Yuqing Liu, Yuwei Han, Yu Wang, Junhyun Lee, Philip S. Yu

    Abstract: Cold-start items remain a persistent challenge in recommender systems due to their lack of historical user interactions, which collaborative models rely on. While recent zero-shot methods leverage large language models (LLMs) to address this, they often struggle with sparse metadata and hallucinated or incomplete knowledge. We propose ColdRAG, a retrieval-augmented generation approach that builds…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 10 pages

    MSC Class: 68T05 68T05

  27. arXiv:2505.20650  [pdf, ps, other]

    cs.CL cs.AI cs.CE

    FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information

    Authors: Yan Wang, Yang Ren, Lingfei Qian, Xueqing Peng, Keyi Wang, Yi Han, Dongji Feng, Xiao-Yang Liu, Jimin Huang, Qianqian Xie

    Abstract: We introduce FinTagging, the first full-scope, table-aware XBRL benchmark designed to evaluate the structured information extraction and semantic alignment capabilities of large language models (LLMs) in the context of XBRL-based financial reporting. Unlike prior benchmarks that oversimplify XBRL tagging as flat multi-class classification and focus solely on narrative text, FinTagging decomposes t…

    Submitted 26 May, 2025; originally announced May 2025.

  28. arXiv:2505.20589  [pdf, ps, other]

    cs.LG q-bio.QM

    Prot2Token: A Unified Framework for Protein Modeling via Next-Token Prediction

    Authors: Mahdi Pourmirzaei, Farzaneh Esmaili, Salhuldin Alqarghuli, Mohammadreza Pourmirzaei, Ye Han, Kai Chen, Mohsen Rezaei, Duolin Wang, Dong Xu

    Abstract: The diverse nature of protein prediction tasks has traditionally necessitated specialized models, hindering the development of broadly applicable and computationally efficient Protein Language Models (PLMs). In this work, we introduce Prot2Token, a unified framework that overcomes these challenges by converting a wide spectrum of protein-related predictions, from sequence-level properties and resi…

    Submitted 26 May, 2025; originally announced May 2025.

  29. arXiv:2505.20536  [pdf, ps, other]

    stat.ML cs.LG econ.EM stat.ME

    Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models

    Authors: Guanhao Zhou, Yuefeng Han, Xiufan Yu

    Abstract: This paper studies the task of estimating heterogeneous treatment effects in causal panel data models, in the presence of covariate effects. We propose a novel Covariate-Adjusted Deep Causal Learning (CoDEAL) for panel data models, that employs flexible model structures and powerful neural network architectures to cohesively deal with the underlying heterogeneity and nonlinearity of both panel uni…

    Submitted 26 May, 2025; originally announced May 2025.

  30. arXiv:2505.19607  [pdf, other]

    cs.LG cs.AI

    Energy-based Preference Optimization for Test-time Adaptation

    Authors: Yewon Han, Seoyun Yang, Taesup Kim

    Abstract: Test-Time Adaptation (TTA) enhances model robustness by enabling adaptation to target distributions that differ from training distributions, improving real-world generalizability. Existing TTA approaches focus on adjusting the conditional distribution; however these methods often depend on uncertain predictions in the absence of label information, leading to unreliable performance. Energy-based fr…

    Submitted 26 May, 2025; originally announced May 2025.

  31. arXiv:2505.19528  [pdf, other]

    cs.CL cs.AI cs.CY

    AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection

    Authors: Yejin Lee, Joonghyuk Hahn, Hyeseon Ahn, Yo-Sub Han

    Abstract: Implicit hate speech detection is challenging due to its subtlety and reliance on contextual interpretation rather than explicit offensive words. Current approaches rely on contrastive learning, which are shown to be effective on distinguishing hate and non-hate sentences. Humans, however, detect implicit hate speech by first identifying specific targets within the text and subsequently interpreti…

    Submitted 27 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: 13 pages, 4 figures, Under Review

    MSC Class: 68T50 ACM Class: I.2.7

  32. arXiv:2505.19435  [pdf, ps, other]

    cs.CL

    Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection

    Authors: Zhihong Pan, Kai Zhang, Yuze Zhao, Yupeng Han

    Abstract: The inherent capabilities of a language model (LM) and the reasoning strategies it employs jointly determine its performance in reasoning tasks. While test-time scaling is regarded as an effective approach to tackling complex reasoning tasks, it incurs substantial computational costs and often leads to "overthinking", where models become trapped in "thought pitfalls". To address this challenge, we…

    Submitted 25 May, 2025; originally announced May 2025.

  33. arXiv:2505.18695  [pdf, other]

    cs.AI

    AI for Regulatory Affairs: Balancing Accuracy, Interpretability, and Computational Cost in Medical Device Classification

    Authors: Yu Han, Aaron Ceross, Jeroen H. M. Bergmann

    Abstract: Regulatory affairs, which sits at the intersection of medicine and law, can benefit significantly from AI-enabled automation. Classification task is the initial step in which manufacturers position their products to regulatory authorities, and it plays a critical role in determining market access, regulatory scrutiny, and ultimately, patient safety. In this study, we investigate a broad range of A…

    Submitted 24 May, 2025; originally announced May 2025.

  34. arXiv:2505.18654  [pdf, ps, other]

    cs.IR

    MTGR: Industrial-Scale Generative Recommendation Framework in Meituan

    Authors: Ruidong Han, Bin Yin, Shangyu Chen, He Jiang, Fei Jiang, Xiang Li, Chi Ma, Mincong Huang, Xiaoguang Li, Chunzhen Jing, Yueming Han, Menglei Zhou, Lei Yu, Chuan Liu, Wei Lin

    Abstract: Scaling law has been extensively validated in many domains such as natural language processing and computer vision. In the recommendation system, recent work has adopted generative recommendations to achieve scalability, but their generative approaches require abandoning the carefully constructed cross features of traditional recommendation models. We found that this approach significantly degrade…

    Submitted 20 June, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

  35. arXiv:2505.17477  [pdf, ps, other]

    cs.LG cs.SD eess.AS

    Reverse-Speech-Finder: A Neural Network Backtracking Architecture for Generating Alzheimer's Disease Speech Samples and Improving Diagnosis Performance

    Authors: Victor OK Li, Yang Han, Jacqueline CK Lam, Lawrence YL Cheung

    Abstract: This study introduces Reverse-Speech-Finder (RSF), a groundbreaking neural network backtracking architecture designed to enhance Alzheimer's Disease (AD) diagnosis through speech analysis. Leveraging the power of pre-trained large language models, RSF identifies and utilizes the most probable AD-specific speech markers, addressing both the scarcity of real AD speech samples and the challenge of li…

    Submitted 23 May, 2025; originally announced May 2025.

  36. arXiv:2505.15039  [pdf, ps, other]

    cs.SE cs.AI

    LogiCase: Effective Test Case Generation from Logical Description in Competitive Programming

    Authors: Sicheol Sung, Aditi, Dogyu kim, Yo-Sub Han, Sang-Ki Ko

    Abstract: Automated Test Case Generation (ATCG) is crucial for evaluating software reliability, particularly in competitive programming where robust algorithm assessments depend on diverse and accurate test cases. However, existing ATCG methods often fail to meet complex specifications or generate effective corner cases, limiting their utility. In this work, we introduce Context-Free Grammars with Counters…

    Submitted 20 May, 2025; originally announced May 2025.

  37. arXiv:2505.14305  [pdf, ps, other]

    cs.CL

    JOLT-SQL: Joint Loss Tuning of Text-to-SQL with Confusion-aware Noisy Schema Sampling

    Authors: Jinwang Song, Hongying Zan, Kunli Zhang, Lingling Mu, Yingjie Han, Haobo Hua, Min Peng

    Abstract: Text-to-SQL, which maps natural language to SQL queries, has benefited greatly from recent advances in Large Language Models (LLMs). While LLMs offer various paradigms for this task, including prompting and supervised fine-tuning (SFT), SFT approaches still face challenges such as complex multi-stage pipelines and poor robustness to noisy schema information. To address these limitations, we presen…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: Work in progress. 13 pages, 6 figures

  38. arXiv:2505.14244  [pdf, ps, other]

    cs.CL

    TransBench: Benchmarking Machine Translation for Industrial-Scale Applications

    Authors: Haijun Li, Tianqi Shi, Zifu Shang, Yuxuan Han, Xueyu Zhao, Hao Wang, Yu Qian, Zhiqiang Qian, Linlong Xu, Minghao Wu, Chenyang Lyu, Longyue Wang, Gongbo Tang, Weihua Luo, Zhao Xu, Kaifu Zhang

    Abstract: Machine translation (MT) has become indispensable for cross-border communication in globalized industries like e-commerce, finance, and legal services, with recent advancements in large language models (LLMs) significantly enhancing translation quality. However, applying general-purpose MT models to industrial scenarios reveals critical limitations due to domain-specific terminology, cultural nuan…

    Submitted 20 May, 2025; originally announced May 2025.

  39. arXiv:2505.14059  [pdf, ps, other]

    cs.CV

    Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting

    Authors: Hao Feng, Shu Wei, Xiang Fei, Wei Shi, Yingdong Han, Lei Liao, Jinghui Lu, Binghong Wu, Qi Liu, Chunhui Lin, Jingqun Tang, Hao Liu, Can Huang

    Abstract: Document image parsing is challenging due to its complexly intertwined elements such as text paragraphs, figures, formulas, and tables. Current approaches either assemble specialized expert models or directly generate page-level content autoregressively, facing integration overhead, efficiency bottlenecks, and layout structure degradation despite their decent performance. To address these limitati…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted to ACL 2025

  40. arXiv:2505.14024  [pdf, ps, other]

    cs.LG cs.AI cs.CR cs.DC

    FedGraM: Defending Against Untargeted Attacks in Federated Learning via Embedding Gram Matrix

    Authors: Di Wu, Qian Li, Heng Yang, Yong Han

    Abstract: Federated Learning (FL) enables geographically distributed clients to collaboratively train machine learning models by sharing only their local models, ensuring data privacy. However, FL is vulnerable to untargeted attacks that aim to degrade the global model's performance on the underlying data distribution. Existing defense mechanisms attempt to improve FL's resilience against such attacks, but…

    Submitted 20 May, 2025; originally announced May 2025.

  41. arXiv:2505.11999  [pdf, ps, other]

    cs.AI

    MRGRP: Empowering Courier Route Prediction in Food Delivery Service with Multi-Relational Graph

    Authors: Chang Liu, Huan Yan, Hongjie Sui, Haomin Wen, Yuan Yuan, Yuyang Han, Hongsen Liao, Xuetao Ding, Jinghua Hao, Yong Li

    Abstract: Instant food delivery has become one of the most popular web services worldwide due to its convenience in daily life. A fundamental challenge is accurately predicting courier routes to optimize task dispatch and improve delivery efficiency. This enhances satisfaction for couriers and users and increases platform profitability. The current heuristic prediction method uses only limited human-selecte…

    Submitted 17 May, 2025; originally announced May 2025.

  42. arXiv:2505.11166  [pdf, ps, other]

    cs.CL cs.AI

    SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization

    Authors: Huashan Sun, Shengyi Liao, Yansen Han, Yu Bai, Yang Gao, Cheng Fu, Weizhou Shen, Fanqi Wan, Ming Yan, Ji Zhang, Fei Huang

    Abstract: Despite advances in pretraining with extended context lengths, large language models (LLMs) still face challenges in effectively utilizing real-world long-context information, primarily due to insufficient long-context alignment caused by data quality issues, training inefficiencies, and the lack of well-designed optimization objectives. To address these limitations, we propose a framework named…

    Submitted 16 May, 2025; originally announced May 2025.

  43. arXiv:2505.10259  [pdf, other]

    cs.LG

    SpecOffload: Unlocking Latent GPU Capacity for LLM Inference on Resource-Constrained Devices

    Authors: Xiangwen Zhuge, Xu Shen, Zeyu Wang, Fan Dang, Xuan Ding, Danyang Li, Yahui Han, Tianxiang Hao, Zheng Yang

    Abstract: Efficient LLM inference on resource-constrained devices presents significant challenges in compute and memory utilization. Due to limited GPU memory, existing systems offload model weights to CPU memory, incurring substantial I/O overhead between the CPU and GPU. This leads to two major inefficiencies: (1) GPU cores are underutilized, often remaining idle while waiting for data to be loaded; and (…

    Submitted 21 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

  44. arXiv:2505.10183  [pdf, other]

    cs.DC cs.AI

    KAITIAN: A Unified Communication Framework for Enabling Efficient Collaboration Across Heterogeneous Accelerators in Embodied AI Systems

    Authors: Jieke Lin, Wanyu Wang, Longxiang Yin, Yinhe Han

    Abstract: Embodied Artificial Intelligence (AI) systems, such as autonomous robots and intelligent vehicles, are increasingly reliant on diverse heterogeneous accelerators (e.g., GPGPUs, NPUs, FPGAs) to meet stringent real-time processing and energy-efficiency demands. However, the proliferation of vendor-specific proprietary communication libraries creates significant interoperability barriers, hindering s…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 9 pages, 4 figures. Jieke Lin and Wanyu Wang contributed equally to this work

  45. arXiv:2505.07374  [pdf, other]

    cs.AI cs.LG

    AIS Data-Driven Maritime Monitoring Based on Transformer: A Comprehensive Review

    Authors: Zhiye Xie, Enmei Tu, Xianping Fu, Guoliang Yuan, Yi Han

    Abstract: With the increasing demands for safety, efficiency, and sustainability in global shipping, Automatic Identification System (AIS) data plays an increasingly important role in maritime monitoring. AIS data contains spatial-temporal variation patterns of vessels that hold significant research value in the marine domain. However, due to its massive scale, the full potential of AIS data has long remain…

    Submitted 12 May, 2025; originally announced May 2025.

  46. arXiv:2505.06738  [pdf, ps, other]

    cs.CR

    I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference

    Authors: Zibo Gao, Junjie Hu, Feng Guo, Yixin Zhang, Yinglong Han, Siyuan Liu, Haiyang Li, Zhiqiang Lv

    Abstract: Large Language Models (LLMs) that can be deployed locally have recently gained popularity for privacy-sensitive tasks, with companies such as Meta, Google, and Intel playing significant roles in their development. However, the security of local LLMs through the lens of hardware cache side-channels remains unexplored. In this paper, we unveil novel side-channel vulnerabilities in local LLM inferenc…

    Submitted 15 June, 2025; v1 submitted 10 May, 2025; originally announced May 2025.

    Comments: Submitted for review on January 22, 2025, revised under shepherding

    ACM Class: K.6.5

  47. arXiv:2505.06285  [pdf, other]

    eess.SP cs.CV

    FEMSN: Frequency-Enhanced Multiscale Network for fault diagnosis of rotating machinery under strong noise environments

    Authors: Yuhan Yuan, Xiaomo Jiang, Yanfeng Han, Ke Xiao

    Abstract: Rolling bearings are critical components of rotating machinery, and their proper functioning is essential for industrial production. Most existing condition monitoring methods focus on extracting discriminative features from time-domain signals to assess bearing health status. However, under complex operating conditions, periodic impulsive characteristics related to fault information are often obs…

    Submitted 7 May, 2025; originally announced May 2025.

  48. arXiv:2505.02655  [pdf, ps, other]

    cs.LG cs.AI

    SCFormer: Structured Channel-wise Transformer with Cumulative Historical State for Multivariate Time Series Forecasting

    Authors: Shiwei Guo, Ziang Chen, Yupeng Ma, Yunfei Han, Yi Wang

    Abstract: The Transformer model has shown strong performance in multivariate time series forecasting by leveraging channel-wise self-attention. However, this approach lacks temporal constraints when computing temporal features and does not utilize cumulative historical series effectively. To address these limitations, we propose the Structured Channel-wise Transformer with Cumulative Historical state (SCForm…

    Submitted 5 May, 2025; originally announced May 2025.

  49. arXiv:2505.02444  [pdf, other]

    cs.IT eess.SP

    Reconfigurable Intelligent Surface Aided Integrated Communication and Localization with a Single Access Point

    Authors: Xiyu Wang, Yixuan Huang, Jie Yang, Yu Han, Shi Jin

    Abstract: Reconfigurable intelligent surfaces (RISs) not only assist communication but also help the localization of user equipment (UE). This study focuses on the indoor localization of UE with a single access point (AP) aided by multiple RISs. First, we propose a two-stage channel estimation scheme where the phase shifts of RIS elements are tuned to obtain multiple channel soundings. In the first stage, t…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Accepted by China Communications

  50. arXiv:2505.02073  [pdf, other]

    cs.LG cs.AI

    Lightweight Defense Against Adversarial Attacks in Time Series Classification

    Authors: Yi Han

    Abstract: As time series classification (TSC) gains prominence, ensuring robust TSC models against adversarial attacks is crucial. While adversarial defense is well-studied in Computer Vision (CV), the TSC field has primarily relied on adversarial training (AT), which is computationally expensive. In this paper, five data augmentation-based defense methods tailored for time series are developed, with the mo…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 13 pages, 8 figures. Accepted at RAFDA Workshop, PAKDD 2025 (Springer, EI & Scopus indexed). Code: https://github.com/Yi126/Lightweight-Defence

    MSC Class: 68T05; 62H30 ACM Class: I.2.6; I.5.1; G.3