Showing 1–50 of 1,013 results for author: Han, X

Searching in archive cs.
  1. arXiv:2505.09203  [pdf, other]

    cond-mat.mtrl-sci cond-mat.supr-con cs.AI cs.LG

    InvDesFlow-AL: Active Learning-based Workflow for Inverse Design of Functional Materials

    Authors: Xiao-Qi Han, Peng-Jie Guo, Ze-Feng Gao, Hao Sun, Zhong-Yi Lu

    Abstract: Developing inverse design methods for functional materials with specific properties is critical to advancing fields like renewable energy, catalysis, energy storage, and carbon capture. Generative models based on diffusion principles can directly produce new materials that meet performance constraints, thereby significantly accelerating the material design process. However, existing methods for ge…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 29 pages, 11 figures

  2. arXiv:2505.08744  [pdf, other]

    cs.AI

    DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models

    Authors: Xiaoyang Chen, Xinan Dai, Yu Du, Qian Feng, Naixu Guo, Tingshuo Gu, Yuting Gao, Yingyi Gao, Xudong Han, Xiang Jiang, Yilin Jin, Hongyi Lin, Shisheng Lin, Xiangnan Li, Yuante Li, Yixing Li, Zhentao Lai, Zilu Ma, Yingrong Peng, Jiacheng Qian, Hao-Yu Sun, Jianbo Sun, Zirui Wang, Siwei Wu, Zian Wang, et al. (6 additional authors not shown)

    Abstract: To advance the mathematical proficiency of large language models (LLMs), the DeepMath team has launched an open-source initiative aimed at developing an open mathematical LLM and systematically evaluating its mathematical creativity. This paper represents the initial contribution of this initiative. While recent developments in mathematical LLMs have predominantly emphasized reasoning skills, as e…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 14 pages, 4 figures

  3. arXiv:2505.08235  [pdf, other]

    cs.CV

    EventDiff: A Unified and Efficient Diffusion Model Framework for Event-based Video Frame Interpolation

    Authors: Hanle Zheng, Xujie Han, Zegang Peng, Shangbin Zhang, Guangxun Du, Zhuo Zou, Xilin Wang, Jibin Wu, Hao Guo, Lei Deng

    Abstract: Video Frame Interpolation (VFI) is a fundamental yet challenging task in computer vision, particularly under conditions involving large motion, occlusion, and lighting variation. Recent advancements in event cameras have opened up new opportunities for addressing these challenges. While existing event-based VFI methods have succeeded in recovering large and complex motions by leveraging handcrafte…

    Submitted 13 May, 2025; originally announced May 2025.

  4. arXiv:2505.07674  [pdf]

    cs.LG

    Joint Graph Convolution and Sequential Modeling for Scalable Network Traffic Estimation

    Authors: Nan Jiang, Wenxuan Zhu, Xu Han, Weiqiang Huang, Yumeng Sun

    Abstract: This study focuses on the challenge of predicting network traffic within complex topological environments. It introduces a spatiotemporal modeling approach that integrates Graph Convolutional Networks (GCN) with Gated Recurrent Units (GRU). The GCN component captures spatial dependencies among network nodes, while the GRU component models the temporal evolution of traffic data. This combination al…

    Submitted 12 May, 2025; originally announced May 2025.

  5. arXiv:2505.06690  [pdf]

    cs.LG

    E2E-FANet: A Highly Generalizable Framework for Waves prediction Behind Floating Breakwaters via Exogenous-to-Endogenous Variable Attention

    Authors: Jianxin Zhang, Lianzi Jiang, Xinyu Han, Xiangrong Wang, Weinan Huang

    Abstract: Accurate prediction of waves behind floating breakwaters (FB) is crucial for optimizing coastal engineering structures, enhancing safety, and improving design efficiency. Existing methods demonstrate limitations in capturing nonlinear interactions between waves and structures, while exhibiting insufficient capability in modeling the complex frequency-domain relationships among elevations of differ…

    Submitted 10 May, 2025; originally announced May 2025.

  6. arXiv:2505.06688  [pdf]

    cs.LG

    A Novel Framework for Significant Wave Height Prediction based on Adaptive Feature Extraction Time-Frequency Network

    Authors: Jianxin Zhang, Lianzi Jiang, Xinyu Han, Xiangrong Wang

    Abstract: Precise forecasting of significant wave height (Hs) is essential for the development and utilization of wave energy. The challenges in predicting Hs arise from its non-linear and non-stationary characteristics. The combination of decomposition preprocessing and machine learning models has demonstrated significant effectiveness in Hs prediction by extracting data features. However, decomposing the…

    Submitted 10 May, 2025; originally announced May 2025.

  7. arXiv:2505.06145  [pdf]

    cs.CL cs.LG

    Towards Robust Few-Shot Text Classification Using Transformer Architectures and Dual Loss Strategies

    Authors: Xu Han, Yumeng Sun, Weiqiang Huang, Hongye Zheng, Junliang Du

    Abstract: Few-shot text classification has important application value in low-resource environments. This paper proposes a strategy that combines adaptive fine-tuning, contrastive learning, and regularization optimization to improve the classification performance of Transformer-based models. Experiments on the FewRel 2.0 dataset show that T5-small, DeBERTa-v3, and RoBERTa-base perform well in few-shot tasks…

    Submitted 9 May, 2025; originally announced May 2025.

  8. arXiv:2505.06118  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    The Application of Deep Learning for Lymph Node Segmentation: A Systematic Review

    Authors: Jingguo Qu, Xinyang Han, Man-Lik Chui, Yao Pu, Simon Takadiyi Gunda, Ziman Chen, Jing Qin, Ann Dorothy King, Winnie Chiu-Wing Chu, Jing Cai, Michael Tin-Cheung Ying

    Abstract: Automatic lymph node segmentation is the cornerstone for advances in computer vision tasks for early detection and staging of cancer. Traditional segmentation methods are constrained by manual delineation and variability in operator proficiency, limiting their ability to achieve high accuracy. The introduction of deep learning technologies offers new possibilities for improving the accuracy of lym…

    Submitted 9 May, 2025; originally announced May 2025.

  9. arXiv:2505.05989  [pdf]

    cs.IR cs.LG

    Modeling Multi-Hop Semantic Paths for Recommendation in Heterogeneous Information Networks

    Authors: Hongye Zheng, Yue Xing, Lipeng Zhu, Xu Han, Junliang Du, Wanyu Cui

    Abstract: This study focuses on the problem of path modeling in heterogeneous information networks and proposes a multi-hop path-aware recommendation framework. The method centers on multi-hop paths composed of various types of entities and relations. It models user preferences through three stages: path selection, semantic representation, and attention-based fusion. In the path selection stage, a path filt…

    Submitted 9 May, 2025; originally announced May 2025.

  10. arXiv:2505.05427  [pdf, other]

    cs.CL

    Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data

    Authors: Yudong Wang, Zixuan Fu, Jie Cai, Peijun Tang, Hongya Lyu, Yewei Fang, Zhi Zheng, Jie Zhou, Guoyang Zeng, Chaojun Xiao, Xu Han, Zhiyuan Liu

    Abstract: Data quality has become a key factor in enhancing model performance with the rapid development of large language models (LLMs). Model-driven data filtering has increasingly become a primary approach for acquiring high-quality data. However, it still faces two main challenges: (1) the lack of an efficient data verification strategy makes it difficult to provide timely feedback on data quality; and…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: The datasets are available on https://huggingface.co/datasets/openbmb/UltraFineWeb

  11. arXiv:2505.05112  [pdf, ps, other]

    eess.IV cs.CV

    MDAA-Diff: CT-Guided Multi-Dose Adaptive Attention Diffusion Model for PET Denoising

    Authors: Xiaolong Niu, Zanting Ye, Xu Han, Yanchao Huang, Hao Sun, Hubing Wu, Lijun Lu

    Abstract: Acquiring high-quality Positron Emission Tomography (PET) images requires administering high-dose radiotracers, which increases radiation exposure risks. Generating standard-dose PET (SPET) from low-dose PET (LPET) has become a potential solution. However, previous studies have primarily focused on single low-dose PET denoising, neglecting two critical factors: discrepancies in dose response cause…

    Submitted 8 May, 2025; originally announced May 2025.

  12. arXiv:2505.04622  [pdf, other]

    cs.GR cs.CV

    PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer

    Authors: Jingwen Ye, Yuze He, Yanning Zhou, Yiqin Zhu, Kaiwen Xiao, Yong-Jin Liu, Wei Yang, Xiao Han

    Abstract: Shape primitive abstraction, which decomposes complex 3D shapes into simple geometric elements, plays a crucial role in human visual cognition and has broad applications in computer vision and graphics. While recent advances in 3D content generation have shown remarkable progress, existing primitive abstraction methods either rely on geometric optimization with limited semantic understanding or le…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: SIGGRAPH 2025. 14 pages, 15 figures

  13. arXiv:2505.04608  [pdf, other]

    cs.LG cs.AI stat.ML

    WATCH: Adaptive Monitoring for AI Deployments via Weighted-Conformal Martingales

    Authors: Drew Prinster, Xing Han, Anqi Liu, Suchi Saria

    Abstract: Responsibly deploying artificial intelligence (AI) / machine learning (ML) systems in high-stakes settings arguably requires not only proof of system reliability, but moreover continual, post-deployment monitoring to quickly detect and address any unsafe behavior. Statistical methods for nonparametric change-point detection -- especially the tools of conformal test martingales (CTMs) and anytime-v…

    Submitted 12 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

    Comments: To be published in The International Conference on Machine Learning (ICML), 2025

  14. arXiv:2505.03135  [pdf, other]

    cs.AI

    Holmes: Automated Fact Check with Large Language Models

    Authors: Haoran Ou, Gelei Deng, Xingshuo Han, Jie Zhang, Xinlei He, Han Qiu, Shangwei Guo, Tianwei Zhang

    Abstract: The rise of Internet connectivity has accelerated the spread of disinformation, threatening societal trust, decision-making, and national security. Disinformation has evolved from simple text to complex multimodal forms combining images and text, challenging existing detection methods. Traditional deep learning models struggle to capture the complexity of multimodal disinformation. Inspired by adv…

    Submitted 5 May, 2025; originally announced May 2025.

  15. arXiv:2505.01838  [pdf, other]

    cs.CV

    MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization

    Authors: Chenghong Li, Hongjie Liao, Yihao Zhi, Xihe Yang, Zhengwentai Sun, Jiahao Chang, Shuguang Cui, Xiaoguang Han

    Abstract: In this era, the success of large language models and text-to-image models can be attributed to the driving force of large-scale datasets. However, in the realm of 3D vision, while significant progress has been achieved in object-centric tasks through large-scale datasets like Objaverse and MVImgNet, human-centric tasks have seen limited advancement, largely due to the absence of a comparable larg…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: project page: https://kevinlee09.github.io/research/MVHumanNet++/. arXiv admin note: substantial text overlap with arXiv:2312.02963

  16. arXiv:2505.00161  [pdf, other]

    cs.RO

    Optimized Lattice-Structured Flexible EIT Sensor for Tactile Reconstruction and Classification

    Authors: Huazhi Dong, Sihao Teng, Xu Han, Xiaopeng Wu, Francesco Giorgio-Serchi, Yunjie Yang

    Abstract: Flexible electrical impedance tomography (EIT) offers a promising alternative to traditional tactile sensing approaches, enabling low-cost, scalable, and deformable sensor designs. Here, we propose an optimized lattice-structured flexible EIT tactile sensor incorporating a hydrogel-based conductive layer, systematically designed through three-dimensional coupling field simulations to optimize stru…

    Submitted 30 April, 2025; originally announced May 2025.

  17. arXiv:2504.21530  [pdf, other]

    cs.RO cs.CV

    RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

    Authors: Haifeng Huang, Xinyi Chen, Yilun Chen, Hao Li, Xiaoshen Han, Zehan Wang, Tai Wang, Jiangmiao Pang, Zhou Zhao

    Abstract: Recent advancements in robotic manipulation have highlighted the potential of intermediate representations for improving policy generalization. In this work, we explore grounding masks as an effective intermediate representation, balancing two key advantages: (1) effective spatial guidance that specifies target objects and placement areas while also conveying information about object shape and siz…

    Submitted 30 April, 2025; originally announced April 2025.

  18. arXiv:2504.21394  [pdf, other]

    cs.OS

    Concurrency Testing in the Linux Kernel via eBPF

    Authors: Jiacheng Xu, Dylan Wolff, Xing Yi Han, Jialin Li, Abhik Roychoudhury

    Abstract: Concurrency is vital for our critical software to meet modern performance requirements, yet concurrency bugs are notoriously difficult to detect and reproduce. Controlled Concurrency Testing (CCT) can make bugs easier to expose by enabling control over thread interleavings and systematically exploring the interleaving space through scheduling algorithms. However, existing CCT solutions for kernel…

    Submitted 30 April, 2025; originally announced April 2025.

  19. arXiv:2504.18629  [pdf, other]

    cs.CY stat.AP

    Fairness Is More Than Algorithms: Racial Disparities in Time-to-Recidivism

    Authors: Jessy Xinyi Han, Kristjan Greenewald, Devavrat Shah

    Abstract: Racial disparities in recidivism remain a persistent challenge within the criminal justice system, increasingly exacerbated by the adoption of algorithmic risk assessment tools. Past works have primarily focused on bias induced by these tools, treating recidivism as a binary outcome. Limited attention has been given to non-algorithmic factors (including socioeconomic ones) in driving racial dispar…

    Submitted 25 April, 2025; originally announced April 2025.

  20. arXiv:2504.15721  [pdf, other]

    cs.AR

    BBAL: A Bidirectional Block Floating Point-Based Quantisation Accelerator for Large Language Models

    Authors: Xiaomeng Han, Yuan Cheng, Jing Wang, Junyang Lu, Hui Wang, X. x. Zhang, Ning Xu, Dawei Yang, Zhe Jiang

    Abstract: Large language models (LLMs), with their billions of parameters, pose substantial challenges for deployment on edge devices, straining both memory capacity and computational resources. Block Floating Point (BFP) quantisation reduces memory and computational overhead by converting high-overhead floating point operations into low-bit fixed point operations. However, BFP requires aligning all data to…

    Submitted 22 April, 2025; originally announced April 2025.

  21. Shape-Guided Clothing Warping for Virtual Try-On

    Authors: Xiaoyu Han, Shunyuan Zheng, Zonglin Li, Chenyang Wang, Xin Sun, Quanling Meng

    Abstract: Image-based virtual try-on aims to seamlessly fit in-shop clothing to a person image while maintaining pose consistency. Existing methods commonly employ the thin plate spline (TPS) transformation or appearance flow to deform in-shop clothing for aligning with the person's body. Despite their promising performance, these methods often lack precise control over fine details, leading to inconsistenc…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted by ACM MM 2024. The code is available at https://github.com/xyhanHIT/SCW-VTON

  22. arXiv:2504.14621  [pdf, other]

    cs.CV

    Talk is Not Always Cheap: Promoting Wireless Sensing Models with Text Prompts

    Authors: Zhenkui Yang, Zeyi Huang, Ge Wang, Han Ding, Tony Xiao Han, Fei Wang

    Abstract: Wireless signal-based human sensing technologies, such as WiFi, millimeter-wave (mmWave) radar, and Radio Frequency Identification (RFID), enable the detection and interpretation of human presence, posture, and activities, thereby providing critical support for applications in public security, healthcare, and smart environments. These technologies exhibit notable advantages due to their non-contac…

    Submitted 22 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: 10 pages

  23. arXiv:2504.14239  [pdf, other]

    cs.AI cs.CL

    InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners

    Authors: Yuhang Liu, Pengxiang Li, Congkai Xie, Xavier Hu, Xiaotian Han, Shengyu Zhang, Hongxia Yang, Fei Wu

    Abstract: Multimodal Large Language Models (MLLMs) have powered Graphical User Interface (GUI) Agents, showing promise in automating tasks on computing devices. Recent works have begun exploring reasoning in GUI tasks with encouraging results. However, many current approaches rely on manually designed reasoning templates, which may result in reasoning that is not sufficiently robust and adaptive for complex…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: 10 pages, 3 figures, work in progress

  24. arXiv:2504.13825  [pdf, other]

    cs.CL cs.LG

    Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models

    Authors: Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Yichao Zhang, Qian Niu, Benji Peng, Keyu Chen, Ming Liu

    Abstract: Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various applications including image classification, object detection, language modeling, text classification, and sentiment analysis. Recent innovations in KD methods, suc…

    Submitted 18 April, 2025; originally announced April 2025.

  25. arXiv:2504.13420  [pdf, other]

    cs.RO cs.SE

    Testing the Fault-Tolerance of Multi-Sensor Fusion Perception in Autonomous Driving Systems

    Authors: Haoxiang Tian, Wenqiang Ding, Xingshuo Han, Guoquan Wu, An Guo, Junqi Zhang, Wei Chen, Jun Wei, Tianwei Zhang

    Abstract: High-level Autonomous Driving Systems (ADSs), such as Google Waymo and Baidu Apollo, typically rely on multi-sensor fusion (MSF) based approaches to perceive their surroundings. This strategy increases perception robustness by combining the respective strengths of the camera and LiDAR and directly affects the safety-critical driving decisions of autonomous vehicles (AVs). However, in real-world au…

    Submitted 17 April, 2025; originally announced April 2025.

  26. arXiv:2504.13392  [pdf, ps, other]

    cs.CV cs.HC

    POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation

    Authors: Evans Xu Han, Alice Qian Zhang, Hong Shen, Haiyi Zhu, Paul Pu Liang, Jane Hsieh

    Abstract: State-of-the-art visual generative AI tools hold immense potential to assist users in the early ideation stages of creative tasks -- offering the ability to generate (rather than search for) novel and unprecedented (instead of existing) images of considerable quality that also adhere to boundless combinations of user specifications. However, many large-scale text-to-image systems are designed for…

    Submitted 17 April, 2025; originally announced April 2025.

  27. arXiv:2504.12800  [pdf, other]

    cs.GR cs.CV

    CAGE-GS: High-fidelity Cage Based 3D Gaussian Splatting Deformation

    Authors: Yifei Tong, Runze Tian, Xiao Han, Dingyao Liu, Fenggen Yu, Yan Zhang

    Abstract: As 3D Gaussian Splatting (3DGS) gains popularity as a 3D representation of real scenes, enabling user-friendly deformation to create novel scenes while preserving fine details from the original 3DGS has attracted significant research attention. We introduce CAGE-GS, a cage-based 3DGS deformation method that seamlessly aligns a source 3DGS scene with a user-defined target shape. Our approach learns…

    Submitted 17 April, 2025; originally announced April 2025.

  28. arXiv:2504.12788  [pdf, other]

    cs.GR cs.CV

    ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior

    Authors: Xiao Han, Runze Tian, Yifei Tong, Fenggen Yu, Dingyao Liu, Yan Zhang

    Abstract: Drag-driven editing has become popular among designers for its ability to modify complex geometric structures through simple and intuitive manipulation, allowing users to adjust and reshape content with minimal technical skill. This drag operation has been incorporated into numerous methods to facilitate the editing of 2D images and 3D meshes in design. However, few studies have explored drag-driv…

    Submitted 17 April, 2025; originally announced April 2025.

  29. arXiv:2504.12711  [pdf, other]

    cs.CV cs.AI eess.IV

    NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, et al. (112 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ…

    Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

  30. arXiv:2504.12329  [pdf, other]

    cs.CL cs.AI

    Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

    Authors: Wang Yang, Xiang Yue, Vipin Chaudhary, Xiaotian Han

    Abstract: Recent advances leverage post-training to enhance model reasoning performance, which typically requires costly training pipelines and still suffers from inefficient, overly lengthy outputs. We introduce Speculative Thinking, a training-free framework that enables large reasoning models to guide smaller ones during inference at the reasoning level, distinct from speculative decoding, which operates…

    Submitted 12 April, 2025; originally announced April 2025.

  31. arXiv:2504.11966  [pdf, other]

    cs.CV cs.LG cs.RO eess.IV

    Exploring Video-Based Driver Activity Recognition under Noisy Labels

    Authors: Linjuan Fan, Di Wen, Kunyu Peng, Kailun Yang, Jiaming Zhang, Ruiping Liu, Yufan Chen, Junwei Zheng, Jiamin Wu, Xudong Han, Rainer Stiefelhagen

    Abstract: As an open research topic in the field of deep learning, learning with noisy labels has attracted much attention and grown rapidly over the past ten years. Learning with label noise is crucial for driver distraction behavior recognition, as real-world video data often contains mislabeled samples, impacting model reliability and performance. However, label noise learning is barely explored in the d…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: The source code is available at https://github.com/ilonafan/DAR-noisy-labels

  32. arXiv:2504.10937  [pdf, other]

    cs.DB

    Finding Locally Densest Subgraphs: Convex Programming with Edge and Triangle Density

    Authors: Yi Yang, Chenhao Ma, Reynold Cheng, Laks V. S. Lakshmanan, Xiaolin Han

    Abstract: Finding the densest subgraph (DS) from a graph is a fundamental problem in graph databases. The DS obtained, which reveals closely related entities, has been found to be useful in various application domains such as e-commerce, social science, and biology. However, in a big graph that contains billions of edges, it is desirable to find more than one subgraph cluster that is not necessarily the den…

    Submitted 15 April, 2025; originally announced April 2025.

  33. arXiv:2504.10906  [pdf, other]

    cs.CL

    Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From

    Authors: Changjiang Gao, Hankun Lin, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Jiajun Chen

    Abstract: The ability of cross-lingual context retrieval is a fundamental aspect of cross-lingual alignment of large language models (LLMs), where the model extracts context information in one language based on requests in another language. Despite its importance in real-life applications, this ability has not been adequately investigated for state-of-the-art models. In this paper, we evaluate the cross-lin…

    Submitted 15 April, 2025; originally announced April 2025.

  34. arXiv:2504.10893  [pdf, other]

    cs.AI cs.CL

    ARise: Towards Knowledge-Augmented Reasoning via Risk-Adaptive Search

    Authors: Yize Zhang, Tianshu Wang, Sirui Chen, Kun Wang, Xingyu Zeng, Hongyu Lin, Xianpei Han, Le Sun, Chaochao Lu

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities and are receiving increasing attention to enhance their reasoning through scaling test-time compute. However, their application in open-ended, knowledge-intensive, complex reasoning scenarios is still limited. Reasoning-oriented methods struggle to generalize to open-ended scenarios due to implicit assumptions of complete…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: Project homepage: https://opencausalab.github.io/ARise

  35. arXiv:2504.09466  [pdf, other]

    cs.CR cs.CL

    AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender

    Authors: Weixiang Zhao, Jiahe Guo, Yulin Hu, Yang Deng, An Zhang, Xingyu Sui, Xinyang Han, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu

    Abstract: Despite extensive efforts in safety alignment, large language models (LLMs) remain vulnerable to jailbreak attacks. Activation steering offers a training-free defense method but relies on fixed steering coefficients, resulting in suboptimal protection and increased false rejections of benign inputs. To address this, we propose AdaSteer, an adaptive activation steering method that dynamically adjus…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 17 pages, 6 figures, 9 tables

  36. arXiv:2504.09344  [pdf]

    cs.LG

    Context-Aware Adaptive Sampling for Intelligent Data Acquisition Systems Using DQN

    Authors: Weiqiang Huang, Juecen Zhan, Yumeng Sun, Xu Han, Tai An, Nan Jiang

    Abstract: Multi-sensor systems are widely used in the Internet of Things, environmental monitoring, and intelligent manufacturing. However, traditional fixed-frequency sampling strategies often lead to severe data redundancy, high energy consumption, and limited adaptability, failing to meet the dynamic sensing needs of complex environments. To address these issues, this paper proposes a DQN-based multi-sen…

    Submitted 12 April, 2025; originally announced April 2025.

  37. arXiv:2504.09294  [pdf, other]

    cs.RO

    Adaptive Planning Framework for UAV-Based Surface Inspection in Partially Unknown Indoor Environments

    Authors: Hanyu Jin, Zhefan Xu, Haoyu Shen, Xinming Han, Kanlong Ye, Kenji Shimada

    Abstract: Inspecting indoor environments such as tunnels, industrial facilities, and construction sites is essential for infrastructure monitoring and maintenance. While manual inspection in these environments is often time-consuming and potentially hazardous, Unmanned Aerial Vehicles (UAVs) can improve efficiency by autonomously handling inspection tasks. Such inspection tasks usually rely on reference map…

    Submitted 12 April, 2025; originally announced April 2025.

  38. arXiv:2504.08257  [pdf, other]

    physics.app-ph cs.AI

    Bayesian Reasoning Enabled by Spin-Orbit Torque Magnetic Tunnel Junctions

    Authors: Yingqian Xu, Xiaohan Li, Caihua Wan, Ran Zhang, Bin He, Shiqiang Liu, Jihao Xia, Dehao Kong, Shilong Xiong, Guoqiang Yu, Xiufeng Han

    Abstract: Bayesian networks play an increasingly important role in data mining, inference, and reasoning with the rapid development of artificial intelligence. In this paper, we present proof-of-concept experiments demonstrating the use of spin-orbit torque magnetic tunnel junctions (SOT-MTJs) in Bayesian network reasoning. Not only can the target probability distribution function (PDF) of a Bayesian networ…

    Submitted 11 April, 2025; originally announced April 2025.

  39. arXiv:2504.08129  [pdf, other]

    cs.LG cs.SI

    Between Linear and Sinusoidal: Rethinking the Time Encoder in Dynamic Graph Learning

    Authors: Hsing-Huan Chung, Shravan Chaudhari, Xing Han, Yoav Wald, Suchi Saria, Joydeep Ghosh

    Abstract: Dynamic graph learning is essential for applications involving temporal networks and requires effective modeling of temporal relationships. Seminal attention-based models like TGAT and DyGFormer rely on sinusoidal time encoders to capture temporal relationships between edge events. In this paper, we study a simpler alternative: the linear time encoder, which avoids temporal information loss caused…

    Submitted 10 April, 2025; originally announced April 2025.

  40. arXiv:2504.07827  [pdf, other]

    eess.IV cs.CV

    HarmonySeg: Tubular Structure Segmentation with Deep-Shallow Feature Fusion and Growth-Suppression Balanced Loss

    Authors: Yi Huang, Ke Zhang, Wei Liu, Yuanyuan Wang, Vishal M. Patel, Le Lu, Xu Han, Dakai Jin, Ke Yan

    Abstract: Accurate segmentation of tubular structures in medical images, such as vessels and airway trees, is crucial for computer-aided diagnosis, radiotherapy, and surgical planning. However, significant challenges exist in algorithm design when faced with diverse sizes, complex topologies, and (often) incomplete data annotation of these structures. We address these difficulties by proposing a new tubular…

    Submitted 10 April, 2025; originally announced April 2025.

  41. arXiv:2504.07813  [pdf, other]

    cs.CV

    P2Object: Single Point Supervised Object Detection and Instance Segmentation

    Authors: Pengfei Chen, Xuehui Yu, Xumeng Han, Kuiran Wang, Guorong Li, Lingxi Xie, Zhenjun Han, Jianbin Jiao

    Abstract: Object recognition using single-point supervision has attracted increasing attention recently. However, the performance gap compared with fully-supervised algorithms remains large. Previous works generated class-agnostic proposals in an image offline and then treated mixed candidates as a single bag, putting a huge burden on multiple instance learning (MIL). In this paper, we int…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: Accepted by IJCV

  42. arXiv:2504.06011  [pdf, other]

    cs.CL

    Llama-3-Nanda-10B-Chat: An Open Generative Large Language Model for Hindi

    Authors: Monojit Choudhury, Shivam Chauhan, Rocktim Jyoti Das, Dhruv Sahnan, Xudong Han, Haonan Li, Aaryamonvikram Singh, Alok Anil Jadhav, Utkarsh Agarwal, Mukund Choudhary, Debopriyo Banerjee, Fajri Koto, Junaid Bhat, Awantika Shukla, Samujjwal Ghosh, Samta Kamboj, Onkar Pandit, Lalit Pradhan, Rahul Pal, Sunil Sahu, Soundar Doraiswamy, Parvez Mullah, Ali El Filali, Neha Sengupta, Gokul Ramakrishnan, et al. (5 additional authors not shown)

    Abstract: Developing high-quality large language models (LLMs) for moderately resourced languages presents unique challenges in data availability, model adaptation, and evaluation. We introduce Llama-3-Nanda-10B-Chat, or Nanda for short, a state-of-the-art Hindi-centric instruction-tuned generative LLM, designed to push the boundaries of open-source Hindi language models. Built upon Llama-3-8B, Nanda incorp…

    Submitted 8 April, 2025; originally announced April 2025.

  43. arXiv:2504.05613  [pdf, other]

    cs.CV

    Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation

    Authors: Xiao Zhang, Xiangyu Han, Xiwen Lai, Yao Sun, Pei Zhang, Konrad Kording

    Abstract: Today's unsupervised image segmentation algorithms often segment suboptimally. Modern graph-cut based approaches rely on high-dimensional attention maps from Transformer-based foundation models, typically employing a relaxed Normalized Cut solved recursively via the Fiedler vector (the eigenvector of the second smallest eigenvalue). Consequently, they still lag behind supervised methods in both ma…

    Submitted 7 April, 2025; originally announced April 2025.

  44. arXiv:2504.04001  [pdf, other]

    cs.CV cs.AI

    Edge Approximation Text Detector

    Authors: Chuang Yang, Xu Han, Tao Han, Han Han, Bingxuan Zhao, Qi Wang

    Abstract: Pursuing efficient text shape representations helps scene text detection models focus on compact foreground regions and optimize the contour reconstruction steps to simplify the whole detection pipeline. Current approaches either represent irregular shapes via a box-to-polygon strategy or decompose a contour into pieces for gradual fitting; the deficiency of coarse contours or complex pipelines…

    Submitted 4 April, 2025; originally announced April 2025.

  45. arXiv:2504.02477  [pdf, other]

    cs.RO cs.CV

    Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision

    Authors: Xiaofeng Han, Shunpeng Chen, Zenghuang Fu, Zhe Feng, Lue Fan, Dong An, Changwei Wang, Li Guo, Weiliang Meng, Xiaopeng Zhang, Rongtao Xu, Shibiao Xu

    Abstract: Robot vision has greatly benefited from advancements in multimodal fusion techniques and vision-language models (VLMs). We systematically review the applications of multimodal fusion in key robotic vision tasks, including semantic scene understanding, simultaneous localization and mapping (SLAM), 3D object detection, navigation and localization, and robot manipulation. We compare VLMs based on lar…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 27 pages, 11 figures, survey paper submitted to Information Fusion

  46. arXiv:2504.02310  [pdf]

    cs.CL

    Improving Harmful Text Detection with Joint Retrieval and External Knowledge

    Authors: Zidong Yu, Shuo Wang, Nan Jiang, Weiqiang Huang, Xu Han, Junliang Du

    Abstract: Harmful text detection has become a crucial task in the development and deployment of large language models, especially as AI-generated content continues to expand across digital platforms. This study proposes a joint retrieval framework that integrates pre-trained language models with knowledge graphs to improve the accuracy and robustness of harmful text detection. Experimental results demonstra…

    Submitted 3 April, 2025; originally announced April 2025.

  47. arXiv:2504.01801  [pdf, other]

    cs.CL

    Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training

    Authors: Zhijun Wang, Jiahuan Li, Hao Zhou, Rongxiang Weng, Jingang Wang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

    Abstract: Large language models (LLMs) exhibit remarkable multilingual capabilities despite the extreme language imbalance in the pre-training data. In this paper, we closely examine the reasons behind this phenomenon, focusing on the pre-training corpus. We find that the existence of code-switching, alternating between different languages within a context, is key to multilingual capabilities. We conduct an…

    Submitted 22 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

  48. arXiv:2504.00502  [pdf, other]

    cs.CV cs.CL

    ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers

    Authors: Qianhao Yuan, Qingyu Zhang, Yanjiang Liu, Jiawei Chen, Yaojie Lu, Hongyu Lin, Jia Zheng, Xianpei Han, Le Sun

    Abstract: Multimodal Large Language Models (MLLMs) suffer from high computational costs due to their massive size and the large number of visual tokens. In this paper, we investigate layer-wise redundancy in MLLMs by introducing a novel metric, Layer Contribution (LC), which quantifies the impact of a layer's transformations on visual and text tokens, respectively. The calculation of LC involves measuring t…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Project page: https://github.com/icip-cas/ShortV

  49. arXiv:2504.00472  [pdf, other]

    cs.CL cs.AI

    Memorizing is Not Enough: Deep Knowledge Injection Through Reasoning

    Authors: Ruoxi Xu, Yunjie Ji, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Ben He, Yingfei Sun, Xiangang Li, Le Sun

    Abstract: Although large language models (LLMs) excel in knowledge recall and reasoning, their static nature leads to outdated information as the real world evolves or when adapting to domain-specific knowledge, highlighting the need for effective knowledge injection. However, current research on knowledge injection remains superficial, mainly focusing on knowledge memorization and retrieval. This paper pro…

    Submitted 1 April, 2025; originally announced April 2025.

  50. arXiv:2503.23463  [pdf, other]

    cs.CV

    OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model

    Authors: Xingcheng Zhou, Xuyuan Han, Feng Yang, Yunpu Ma, Alois C. Knoll

    Abstract: We present OpenDriveVLA, a Vision-Language Action (VLA) model designed for end-to-end autonomous driving. OpenDriveVLA builds upon open-source pre-trained large Vision-Language Models (VLMs) to generate reliable driving actions, conditioned on 3D environmental perception, ego vehicle states, and driver commands. To bridge the modality gap between driving visual representations and language embeddi…

    Submitted 30 March, 2025; originally announced March 2025.