
Showing 1–50 of 806 results for author: Zheng, H

Searching in archive cs.
  1. arXiv:2505.08235  [pdf, other]

    cs.CV

    EventDiff: A Unified and Efficient Diffusion Model Framework for Event-based Video Frame Interpolation

    Authors: Hanle Zheng, Xujie Han, Zegang Peng, Shangbin Zhang, Guangxun Du, Zhuo Zou, Xilin Wang, Jibin Wu, Hao Guo, Lei Deng

    Abstract: Video Frame Interpolation (VFI) is a fundamental yet challenging task in computer vision, particularly under conditions involving large motion, occlusion, and lighting variation. Recent advancements in event cameras have opened up new opportunities for addressing these challenges. While existing event-based VFI methods have succeeded in recovering large and complex motions by leveraging handcrafte…

    Submitted 13 May, 2025; originally announced May 2025.

  2. arXiv:2505.07611  [pdf]

    cs.CV

    Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods, Datasets, and Future Directions

    Authors: Yi Zhang, Wenye Zhou, Ruonan Lin, Xin Yang, Hao Zheng

    Abstract: Traffic accident prediction and detection are critical for enhancing road safety, and vision-based traffic accident anticipation (Vision-TAA) has emerged as a promising approach in the era of deep learning. This paper reviews 147 recent studies, focusing on the application of supervised, unsupervised, and hybrid deep learning models for accident prediction, alongside the use of real-world and synthetic…

    Submitted 12 May, 2025; originally announced May 2025.

  3. arXiv:2505.07373  [pdf, other]

    cs.CV

    Geometric Prior-Guided Neural Implicit Surface Reconstruction in the Wild

    Authors: Lintao Xiang, Hongpei Zheng, Bailin Deng, Hujun Yin

    Abstract: Neural implicit surface reconstruction using volume rendering techniques has recently achieved significant advancements in creating high-fidelity surfaces from multiple 2D images. However, current methods primarily target scenes with consistent illumination and struggle to accurately reconstruct 3D geometry in uncontrolled environments with transient occlusions or varying appearances. While some n…

    Submitted 12 May, 2025; originally announced May 2025.

  4. arXiv:2505.06899  [pdf, ps, other]

    cs.NI

    ContribChain: A Stress-Balanced Blockchain Sharding Protocol with Node Contribution Awareness

    Authors: Xinpeng Huang, Wanqing Jie, Shiwen Zhang, Haofu Yang, Wangjie Qiu, Qinnan Zhang, Huawei Huang, Zehui Xiong, Shaoting Tang, Hongwei Zheng, Zhiming Zheng

    Abstract: Existing blockchain sharding protocols have focused on eliminating imbalanced workload distributions. However, even with workload balance, disparities in processing capabilities can lead to differential stress among shards, resulting in transaction backlogs in certain shards. Therefore, achieving stress balance among shards in the dynamic and heterogeneous environment presents a significant challe…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: Accepted by INFOCOM 2025

  5. arXiv:2505.06145  [pdf]

    cs.CL cs.LG

    Towards Robust Few-Shot Text Classification Using Transformer Architectures and Dual Loss Strategies

    Authors: Xu Han, Yumeng Sun, Weiqiang Huang, Hongye Zheng, Junliang Du

    Abstract: Few-shot text classification has important application value in low-resource environments. This paper proposes a strategy that combines adaptive fine-tuning, contrastive learning, and regularization optimization to improve the classification performance of Transformer-based models. Experiments on the FewRel 2.0 dataset show that T5-small, DeBERTa-v3, and RoBERTa-base perform well in few-shot tasks…

    Submitted 9 May, 2025; originally announced May 2025.

  6. arXiv:2505.06133  [pdf, ps, other]

    cs.CV

    BrainSegDMlF: A Dynamic Fusion-enhanced SAM for Brain Lesion Segmentation

    Authors: Hongming Wang, Yifeng Wu, Huimin Huang, Hongtao Wu, Jia-Xuan Jiang, Xiaodong Zhang, Hao Zheng, Xian Wu, Yefeng Zheng, Jinping Xu, Jing Cheng

    Abstract: The segmentation of substantial brain lesions is a significant and challenging task in the field of medical image segmentation. Substantial brain lesions in brain imaging exhibit high heterogeneity, with indistinct boundaries between lesion regions and normal brain tissue. Small lesions in single slices are difficult to identify, making the accurate and reproducible segmentation of abnormal region…

    Submitted 9 May, 2025; originally announced May 2025.

  7. arXiv:2505.05989  [pdf]

    cs.IR cs.LG

    Modeling Multi-Hop Semantic Paths for Recommendation in Heterogeneous Information Networks

    Authors: Hongye Zheng, Yue Xing, Lipeng Zhu, Xu Han, Junliang Du, Wanyu Cui

    Abstract: This study focuses on the problem of path modeling in heterogeneous information networks and proposes a multi-hop path-aware recommendation framework. The method centers on multi-hop paths composed of various types of entities and relations. It models user preferences through three stages: path selection, semantic representation, and attention-based fusion. In the path selection stage, a path filt…

    Submitted 9 May, 2025; originally announced May 2025.

  8. arXiv:2505.04965  [pdf, ps, other]

    cs.CV

    DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding

    Authors: Henry Zheng, Hao Shi, Qihang Peng, Yong Xien Chng, Rui Huang, Yepeng Weng, Zhongchao Shi, Gao Huang

    Abstract: Enabling intelligent agents to comprehend and interact with 3D environments through natural language is crucial for advancing robotics and human-computer interaction. A fundamental task in this field is ego-centric 3D visual grounding, where agents locate target objects in real-world 3D spaces based on verbal descriptions. However, this task faces two significant challenges: (1) loss of fine-grain…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by ICLR 2025

  9. arXiv:2505.04846  [pdf, ps, other]

    cs.IR cs.CE cs.CL cs.DC cs.LG

    HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights

    Authors: Ozan Gokdemir, Carlo Siebenschuh, Alexander Brace, Azton Wells, Brian Hsu, Kyle Hippe, Priyanka V. Setty, Aswathy Ajith, J. Gregory Pauloski, Varuni Sastry, Sam Foreman, Huihuo Zheng, Heng Ma, Bharat Kale, Nicholas Chia, Thomas Gibbs, Michael E. Papka, Thomas Brettin, Francis J. Alexander, Anima Anandkumar, Ian Foster, Rick Stevens, Venkatram Vishwanath, Arvind Ramanathan

    Abstract: The volume of scientific literature is growing exponentially, leading to underutilized discoveries, duplicated efforts, and limited cross-disciplinary collaboration. Retrieval Augmented Generation (RAG) offers a way to assist scientists by improving the factuality of Large Language Models (LLMs) in processing this influx of information. However, scaling RAG to handle millions of articles introduce…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted at the Platform for Advanced Scientific Computing Conference (PASC 25), June 16-18, 2025, Brugg-Windisch, Switzerland

    ACM Class: H.3.3; I.2.7

  10. arXiv:2504.21043  [pdf, other]

    cs.CR cs.AI

    CodeBC: A More Secure Large Language Model for Smart Contract Code Generation in Blockchain

    Authors: Lingxiang Wang, Hainan Zhang, Qinnan Zhang, Ziwei Wang, Hongwei Zheng, Jin Dong, Zhiming Zheng

    Abstract: Large language models (LLMs) excel at generating code from natural language instructions, yet they often lack an understanding of security vulnerabilities. This limitation makes it difficult for LLMs to avoid security risks in generated code, particularly in high-security programming tasks such as smart contract development for blockchain. Researchers have attempted to enhance the vulnerability aw…

    Submitted 6 May, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

  11. arXiv:2504.20525  [pdf, other]

    cs.CV

    Geometry-aware Temporal Aggregation Network for Monocular 3D Lane Detection

    Authors: Huan Zheng, Wencheng Han, Tianyi Yan, Cheng-zhong Xu, Jianbing Shen

    Abstract: Monocular 3D lane detection aims to estimate the 3D position of lanes from frontal-view (FV) images. However, current monocular 3D lane detection methods suffer from two limitations: inaccurate geometric information of the predicted 3D lanes and difficulties in maintaining lane integrity. To address these issues, we seek to fully exploit the potential of multiple input frames. First, we aim…

    Submitted 29 April, 2025; originally announced April 2025.

  12. arXiv:2504.19436  [pdf]

    cs.CL cs.LG

    Context-Guided Dynamic Retrieval for Improving Generation Quality in RAG Models

    Authors: Jacky He, Guiran Liu, Binrong Zhu, Hanlu Zhang, Hongye Zheng, Xiaokai Wang

    Abstract: This paper focuses on the dynamic optimization of the Retrieval-Augmented Generation (RAG) architecture. It proposes a state-aware dynamic knowledge retrieval mechanism to enhance semantic understanding and knowledge scheduling efficiency in large language models for open-domain question answering and complex generation tasks. The method introduces a multi-level perceptive retrieval vector constru…

    Submitted 27 April, 2025; originally announced April 2025.

  13. arXiv:2504.19136  [pdf, other]

    cs.CV cs.AI eess.IV

    PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification

    Authors: Huiling Zheng, Xian Zhong, Bin Liu, Yi Xiao, Bihan Wen, Xiaofeng Li

    Abstract: The fusion of Synthetic Aperture Radar (SAR) and RGB imagery for land cover classification remains challenging due to modality heterogeneity and the underutilization of spectral complementarity. Existing methods often fail to decouple shared structural features from modality-specific radiometric attributes, leading to feature conflicts and information loss. To address this issue, we propose Phase-…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 13 pages, 8 figures

  14. arXiv:2504.17255  [pdf]

    eess.IV cs.AI physics.optics

    3D Deep-learning-based Segmentation of Human Skin Sweat Glands and Their 3D Morphological Response to Temperature Variations

    Authors: Shaoyu Pei, Renxiong Wu, Hao Zheng, Lang Qin, Shuaichen Lin, Yuxing Gan, Wenjing Huang, Zhixuan Wang, Mohan Qin, Yong Liu, Guangming Ni

    Abstract: Skin, the primary regulator of heat exchange, relies on sweat glands for thermoregulation. Alterations in sweat gland morphology play a crucial role in various pathological conditions and clinical diagnoses. Current methods for observing sweat gland morphology are limited by their two-dimensional, in vitro, and destructive nature, underscoring the urgent need for real-time, non-invasive, quantifia…

    Submitted 24 April, 2025; originally announced April 2025.

  15. arXiv:2504.14904  [pdf, other]

    cs.SI cs.AI cs.CL cs.MM

    VLM as Policy: Common-Law Content Moderation Framework for Short Video Platform

    Authors: Xingyu Lu, Tianke Zhang, Chang Meng, Xiaobei Wang, Jinpeng Wang, YiFan Zhang, Shisong Tang, Changyi Liu, Haojie Ding, Kaiyu Jiang, Kaiyu Tang, Bin Wen, Hai-Tao Zheng, Fan Yang, Tingting Gao, Di Zhang, Kun Gai

    Abstract: Exponentially growing short video platforms (SVPs) face significant challenges in moderating content detrimental to users' mental health, particularly for minors. The dissemination of such content on SVPs can lead to catastrophic societal consequences. Although substantial efforts have been dedicated to moderating such content, existing methods suffer from critical limitations: (1) Manual review i…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 20 pages, 6 figures

  16. arXiv:2504.14636  [pdf]

    cs.LG cs.AI

    AlphaZero-Edu: Making AlphaZero Accessible to Everyone

    Authors: Binjie Guo, Hanyu Zheng, Guowei Su, Ru Zhang, Haohan Jiang, Xurong Lin, Hongyan Wei, Aisheng Mo, Jie Li, Zhiyuan Qian, Zhuhao Zhang, Xiaoyuan Cheng

    Abstract: Recent years have witnessed significant progress in reinforcement learning, especially with Zero-like paradigms, which have greatly boosted the generalization and reasoning abilities of large-scale language models. Nevertheless, existing frameworks are often plagued by high implementation complexity and poor reproducibility. To tackle these challenges, we present AlphaZero-Edu, a lightweight, educ…

    Submitted 20 April, 2025; originally announced April 2025.

  17. arXiv:2504.14620  [pdf, other]

    cs.CL

    A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models

    Authors: Hongming Tan, Shaoxiong Zhan, Fengwei Jia, Hai-Tao Zheng, Wai Kin Chan

    Abstract: Measuring scientific paper innovation is both important and challenging. Existing content-based methods often overlook the full-paper context, fail to capture the full scope of innovation, and lack generalization. We propose HSPIM, a hierarchical and training-free framework based on large language models (LLMs). It introduces a Paper-to-Sections-to-QAs decomposition to assess innovation. We segmen…

    Submitted 20 April, 2025; originally announced April 2025.

  18. arXiv:2504.14350  [pdf, other]

    cs.AI

    Time's Up! An Empirical Study of LLM Reasoning Ability Under Output Length Constraint

    Authors: Yi Sun, Han Wang, Jiaqiang Li, Jiacheng Liu, Xiangyu Li, Hao Wen, Huiwen Zheng, Yan Liang, Yuanchun Li, Yunxin Liu

    Abstract: Recent work has demonstrated the remarkable potential of Large Language Models (LLMs) in test-time scaling. By making the models think before answering, they are able to achieve much higher accuracy with extra inference computation. However, in many real-world scenarios, models are used under time constraints, where an answer should be given to the user within a certain output length. It is unclea…

    Submitted 22 April, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

  19. arXiv:2504.12959  [pdf, other]

    cs.CV

    Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction

    Authors: Dubing Chen, Huan Zheng, Jin Fang, Xingping Dong, Xianfei Li, Wenlong Liao, Tao He, Pai Peng, Jianbing Shen

    Abstract: We present GDFusion, a temporal fusion method for vision-based 3D semantic occupancy prediction (VisionOcc). GDFusion opens up the underexplored aspects of temporal fusion within the VisionOcc framework, focusing on both temporal cues and fusion strategies. It systematically examines the entire VisionOcc pipeline, identifying three fundamental yet previously overlooked temporal cues: scene-level c…

    Submitted 18 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: CVPR 2025

  20. arXiv:2504.11544  [pdf, other]

    cs.AI

    NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes

    Authors: Tianyang Xu, Haojie Zheng, Chengze Li, Haoxiang Chen, Yixin Liu, Ruoxi Chen, Lichao Sun

    Abstract: Retrieval-augmented generation (RAG) empowers large language models to access external and private corpora, enabling factually consistent responses in specific domains. By exploiting the inherent structure of the corpus, graph-based RAG methods further enrich this process by building a knowledge graph index and leveraging the structural nature of graphs. However, current graph-based RAG approaches…

    Submitted 15 April, 2025; originally announced April 2025.

  21. arXiv:2504.10686  [pdf, other]

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  22. arXiv:2504.09910  [pdf, other]

    cs.CL

    Learning to Erase Private Knowledge from Multi-Documents for Retrieval-Augmented Large Language Models

    Authors: Yujing Wang, Hainan Zhang, Liang Pang, Yongxin Tong, Binghui Guo, Hongwei Zheng, Zhiming Zheng

    Abstract: Retrieval-Augmented Generation (RAG) is a promising technique for applying LLMs to proprietary domains. However, retrieved documents may contain sensitive knowledge, posing risks of privacy leakage in generative results. Thus, effectively erasing private information from retrieved documents is a key challenge for RAG. Unlike traditional text anonymization, RAG should consider: (1) the inherent mul…

    Submitted 14 April, 2025; originally announced April 2025.

  23. arXiv:2504.09827  [pdf, other]

    cs.HC

    Redesign of Online Design Communities: Facilitating Personalized Visual Design Learning with Structured Comments

    Authors: Xia Chen, Xinyue Chen, Weixian Hu, Haojia Zheng, YuJun Qian, Zhenhui Peng

    Abstract: Online Design Communities (ODCs) offer various artworks with members' comments for beginners to learn visual design. However, as identified by our Formative Study (N = 10), current ODCs lack features customized for personal learning purposes, e.g., searching artworks and digesting useful comments to learn design principles about buttons. In this paper, we present DesignLearner, a redesigned interf…

    Submitted 14 April, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

  24. arXiv:2504.09502  [pdf, other]

    cs.CV cs.LG

    PCM-SAR: Physics-Driven Contrastive Mutual Learning for SAR Classification

    Authors: Pengfei Wang, Hao Zheng, Zhigang Hu, Aikun Xu, Meiguang Zheng, Liu Yang

    Abstract: Existing SAR image classification methods based on Contrastive Learning often rely on sample generation strategies designed for optical images, failing to capture the distinct semantic and physical characteristics of SAR data. To address this, we propose Physics-Driven Contrastive Mutual Learning for SAR Classification (PCM-SAR), which incorporates domain-specific physical insights to improve samp…

    Submitted 13 April, 2025; originally announced April 2025.

  25. arXiv:2504.08591  [pdf, other]

    cs.CV

    ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration

    Authors: Yongsheng Yu, Haitian Zheng, Zhifei Zhang, Jianming Zhang, Yuqian Zhou, Connelly Barnes, Yuchen Liu, Wei Xiong, Zhe Lin, Jiebo Luo

    Abstract: Recent progress in generative models has significantly improved image restoration capabilities, particularly through powerful diffusion models that offer remarkable recovery of semantic details and local fidelity. However, deploying these models at ultra-high resolutions faces a critical trade-off between quality and efficiency due to the computational demands of long-range attention mechanisms. T…

    Submitted 11 April, 2025; originally announced April 2025.

  26. arXiv:2504.08019  [pdf, other]

    cs.CV cs.AI

    DGFamba: Learning Flow Factorized State Space for Visual Domain Generalization

    Authors: Qi Bi, Jingjun Yi, Hao Zheng, Haolan Zhan, Wei Ji, Yawen Huang, Yuexiang Li

    Abstract: Domain generalization aims to learn a representation from the source domain, which can be generalized to arbitrary unseen target domains. A fundamental challenge for visual domain generalization is the domain gap caused by the dramatic style variation whereas the image content is stable. The realm of selective state space, exemplified by VMamba, demonstrates its global receptive field in represent…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: accepted by AAAI2025

  27. arXiv:2504.07491  [pdf, other]

    cs.CV

    Kimi-VL Technical Report

    Authors: Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, et al. (68 additional authors not shown)

    Abstract: We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities, all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B). Kimi-VL demonstrates strong performance across challenging domains: as a general-purpose VLM, Kimi-VL excels in multi-…

    Submitted 15 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  28. arXiv:2504.07433  [pdf, other]

    cs.CL

    From Token to Line: Enhancing Code Generation with a Long-Term Perspective

    Authors: Tingwei Lu, Yangning Li, Liyuan Wang, Binghuai Lin, Jiwei Tang, Wanshi Xu, Hai-Tao Zheng, Yinghui Li, Bingxu An, Zhao Wei, Yong Xu

    Abstract: The emergence of large language models (LLMs) has significantly promoted the development of the code generation task, sparking a surge in pertinent literature. Current research is hindered by redundant generation results and a tendency to overfit local patterns in the short term. Although existing studies attempt to alleviate the issue by adopting a multi-token prediction strategy, there remains limit…

    Submitted 18 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  29. arXiv:2504.07288  [pdf, other]

    cs.CL

    MDIT: A Model-free Data Interpolation Method for Diverse Instruction Tuning

    Authors: Yangning Li, Zihua Lan, Lv Qingsong, Yinghui Li, Hai-Tao Zheng

    Abstract: As Large Language Models (LLMs) are increasingly applied across various tasks, instruction tuning has emerged as a critical method for enhancing model performance. However, current data management strategies face substantial challenges in generating diverse and comprehensive data, restricting further improvements in model performance. To address this gap, we propose MDIT, a novel model-free data i…

    Submitted 14 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  30. arXiv:2504.07282  [pdf, other]

    cs.CL

    RAISE: Reinforenced Adaptive Instruction Selection For Large Language Models

    Authors: Lv Qingsong, Yangning Li, Zihua Lan, Zishan Xu, Jiwei Tang, Yinghui Li, Wenhao Jiang, Hai-Tao Zheng, Philip S. Yu

    Abstract: In the instruction fine-tuning of large language models (LLMs), it has become a consensus that a few high-quality instructions are superior to a large number of low-quality instructions. At present, many instruction selection methods have been proposed, but most of these methods select instructions based on heuristic quality metrics, and only consider data selection before training. These designs l…

    Submitted 14 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  31. arXiv:2504.04861  [pdf, other]

    cs.CL cs.AI

    SAFT: Structure-aware Transformers for Textual Interaction Classification

    Authors: Hongtao Wang, Renchi Yang, Hewen Wang, Haoran Zheng, Jianliang Xu

    Abstract: Textual interaction networks (TINs) are an omnipresent data structure used to model the interplay between users and items on e-commerce websites, social networks, etc., where each interaction is associated with a text description. Classifying such textual interactions (TIC) finds extensive use in detecting spam reviews in e-commerce, fraudulent transactions in finance, and so on. Existing TIC solu…

    Submitted 7 April, 2025; originally announced April 2025.

  32. arXiv:2504.04385  [pdf]

    cs.CL

    Pre-trained Language Models and Few-shot Learning for Medical Entity Extraction

    Authors: Xiaokai Wang, Guiran Liu, Binrong Zhu, Jacky He, Hongye Zheng, Hanlu Zhang

    Abstract: This study proposes a medical entity extraction method based on Transformer to enhance the information extraction capability of medical literature. Considering the professionalism and complexity of medical texts, we compare the performance of different pre-trained language models (BERT, BioBERT, PubMedBERT, ClinicalBERT) in medical entity extraction tasks. Experimental results show that PubMedBERT…

    Submitted 6 April, 2025; originally announced April 2025.

  33. arXiv:2504.03686  [pdf, other]

    cs.NI cs.AI cs.LG

    Revisiting Outage for Edge Inference Systems

    Authors: Zhanwei Wang, Qunsong Zeng, Haotian Zheng, Kaibin Huang

    Abstract: One of the key missions of sixth-generation (6G) mobile networks is to deploy large-scale artificial intelligence (AI) models at the network edge to provide remote-inference services for edge devices. The resultant platform, known as edge inference, will support a wide range of Internet-of-Things applications, such as autonomous driving, industrial automation, and augmented reality. Given the miss…

    Submitted 28 April, 2025; v1 submitted 22 March, 2025; originally announced April 2025.

  34. arXiv:2504.00996  [pdf, other]

    cs.CV

    TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting

    Authors: Liangbin Xie, Daniil Pakhomov, Zhonghao Wang, Zongze Wu, Ziyan Chen, Yuqian Zhou, Haitian Zheng, Zhifei Zhang, Zhe Lin, Jiantao Zhou, Chao Dong

    Abstract: This paper introduces TurboFill, a fast image inpainting model that enhances a few-step text-to-image diffusion model with an inpainting adapter for high-quality and efficient inpainting. While standard diffusion models generate high-quality results, they incur high computational costs. We overcome this by training an inpainting adapter on a few-step distilled text-to-image model, DMD2, using a no…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Project webpage available at https://liangbinxie.github.io/projects/TurboFill/

  35. arXiv:2503.23793  [pdf, other]

    cs.CV

    Pan-LUT: Efficient Pan-sharpening via Learnable Look-Up Tables

    Authors: Zhongnan Cai, Yingying Wang, Yunlong Lin, Hui Zheng, Ge Meng, Zixu Lin, Jiaxin Xie, Junbin Lu, Yue Huang, Xinghao Ding

    Abstract: Recently, deep learning-based pan-sharpening algorithms have achieved notable advancements over traditional methods. However, many deep learning-based approaches incur substantial computational overhead during inference, especially with high-resolution images. This excessive computational demand limits the applicability of these methods in real-world scenarios, particularly in the absence of dedic…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: 12 pages, 6 figures

  36. arXiv:2503.23712  [pdf, other]

    cs.CV

    ElimPCL: Eliminating Noise Accumulation with Progressive Curriculum Labeling for Source-Free Domain Adaptation

    Authors: Jie Cheng, Hao Zheng, Meiguang Zheng, Lei Wang, Hao Wu, Jian Zhang

    Abstract: Source-Free Domain Adaptation (SFDA) aims to train a target model without source data, and the key is to generate pseudo-labels using a pre-trained source model. However, we observe that the source model often produces highly uncertain pseudo-labels for hard samples, particularly those heavily affected by domain shifts, leading to these noisy pseudo-labels being introduced even before adaptation a…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: ICME 2025 camera-ready

  37. arXiv:2503.23331  [pdf, other]

    cs.CV cs.LG

    HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation

    Authors: Hongwei Zheng, Han Li, Wenrui Dai, Ziyang Zheng, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: Existing 2D-to-3D human pose estimation (HPE) methods struggle with the occlusion issue by enriching information like temporal and visual cues in the lifting stage. In this paper, we argue that these methods ignore the limitation of the sparse skeleton 2D input representation, which fundamentally restricts the 2D-to-3D lifting and worsens the occlusion issue. To address these, we propose a novel t…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: CVPR2025

  38. arXiv:2503.20488  [pdf, other]

    cs.SI cs.DS cs.LG

    Adaptive Local Clustering over Attributed Graphs

    Authors: Haoran Zheng, Renchi Yang, Jianliang Xu

    Abstract: Given a graph $G$ and a seed node $v_s$, the objective of local graph clustering (LGC) is to identify a subgraph $C_s \in G$ (a.k.a. local cluster) surrounding $v_s$ in time roughly linear with the size of $C_s$. This approach yields personalized clusters without needing to access the entire graph, which makes it highly suitable for numerous applications involving large graphs. However, most exist…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: Accepted by ICDE2025. The code is available at https://github.com/HaoranZ99/alac

  39. arXiv:2503.17896  [pdf, other]

    eess.IV cs.CV

    Multi-Disease-Aware Training Strategy for Cardiac MR Image Segmentation

    Authors: Hong Zheng, Yucheng Chen, Nan Mu, Xiaoning Li

    Abstract: Accurate segmentation of the ventricles from cardiac magnetic resonance images (CMRIs) is crucial for enhancing the diagnosis and analysis of heart conditions. Deep learning-based segmentation methods have recently garnered significant attention due to their impressive performance. However, these segmentation methods are typically good at partitioning regularly shaped organs, such as the left vent…

    Submitted 24 March, 2025; v1 submitted 22 March, 2025; originally announced March 2025.

  40. arXiv:2503.16514  [pdf, other]

    cs.AR cs.AI cs.LG cs.PL

    VeriMind: Agentic LLM for Automated Verilog Generation with a Novel Evaluation Metric

    Authors: Bardia Nadimi, Ghali Omar Boutaib, Hao Zheng

    Abstract: Designing Verilog modules requires meticulous attention to correctness, efficiency, and adherence to design specifications. However, manually writing Verilog code remains a complex and time-consuming task that demands both expert knowledge and iterative refinement. Leveraging recent advancements in large language models (LLMs) and their structured text generation capabilities, we propose VeriMind,…

    Submitted 16 April, 2025; v1 submitted 15 March, 2025; originally announced March 2025.

  41. arXiv:2503.14558  [pdf, other]

    cs.CV cs.RO

    SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization

    Authors: Yi Du, Zhipeng Zhao, Shaoshu Su, Sharath Golluri, Haoze Zheng, Runmao Yao, Chen Wang

    Abstract: Point cloud (PC) processing tasks, such as completion, upsampling, denoising, and colorization, are crucial in applications like autonomous driving and 3D reconstruction. Despite substantial advancements, prior approaches often address each of these tasks independently, with separate models focused on individual issues. However, this isolated approach fails to account for the fact that defects like…

    Submitted 21 March, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

  42. arXiv:2503.12153  [pdf, ps, other]

    cs.MA cs.SI

    Decentralized Hidden Markov Modeling with Equal Exit Probabilities

    Authors: Dongyan Sui, Haitian Zheng, Siyang Leng, Stefan Vlaski

    Abstract: Social learning strategies enable agents to infer the underlying true state of nature in a distributed manner by receiving private environmental signals and exchanging beliefs with their neighbors. Previous studies have extensively focused on static environments, where the underlying true state remains unchanged over time. In this paper, we consider a dynamic setting where the true state evolves a…

    Submitted 15 March, 2025; originally announced March 2025.

  43. arXiv:2503.11084  [pdf]

    cs.CL

    Semantic and Contextual Modeling for Malicious Comment Detection with BERT-BiLSTM

    Authors: Zhou Fang, Hanlu Zhang, Jacky He, Zhen Qi, Hongye Zheng

    Abstract: This study aims to develop an efficient and accurate model for detecting malicious comments, addressing the increasingly severe issue of false and harmful content on social media platforms. We propose a deep learning model that combines BERT and BiLSTM. The BERT model, through pre-training, captures deep semantic features of text, while the BiLSTM network excels at processing sequential data and c…

    Submitted 14 March, 2025; originally announced March 2025.

  44. arXiv:2503.11043  [pdf, other]

    cs.LG

    InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences

    Authors: Hongkai Zheng, Wenda Chu, Bingliang Zhang, Zihui Wu, Austin Wang, Berthy T. Feng, Caifeng Zou, Yu Sun, Nikola Kovachki, Zachary E. Ross, Katherine L. Bouman, Yisong Yue

    Abstract: Plug-and-play diffusion priors (PnPDP) have emerged as a promising research direction for solving inverse problems. However, current studies primarily focus on natural image restoration, leaving the performance of these algorithms in scientific inverse problems largely unexplored. To address this gap, we introduce InverseBench, a framework that evaluates diffusion models across five dis…

    Submitted 13 March, 2025; originally announced March 2025.

  45. arXiv:2503.10403  [pdf, other]

    cs.CV

    Hyper3D: Efficient 3D Representation via Hybrid Triplane and Octree Feature for Enhanced 3D Shape Variational Auto-Encoders

    Authors: Jingyu Guo, Sensen Gao, Jia-Wang Bian, Wanhu Sun, Heliang Zheng, Rongfei Jia, Mingming Gong

    Abstract: Recent 3D content generation pipelines often leverage Variational Autoencoders (VAEs) to encode shapes into compact latent representations, facilitating diffusion-based generation. Efficiently compressing 3D shapes while preserving intricate geometric details remains a key challenge. Existing 3D shape VAEs often employ uniform point sampling and 1D/2D latent representations, such as vector sets or…

    Submitted 13 March, 2025; originally announced March 2025.

  46. arXiv:2503.09148  [pdf, other]

    cs.RO eess.SY

    Predictor-Based Time Delay Control of A Hex-Jet Unmanned Aerial Vehicle

    Authors: Junning Liang, Haowen Zheng, Yuying Zhang, Yongzhuo Gao, Wei Dong, Ximin Lyu

    Abstract: Turbojet-powered VTOL UAVs have garnered increased attention in heavy-load transport and emergency services, due to their superior power density and thrust-to-weight ratio compared to existing electronic propulsion systems. The main challenge with jet-powered UAVs lies in the complexity of thrust vectoring mechanical systems, which aim to mitigate the slow dynamics of the turbojet. In this letter,…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: Accepted by IEEE Robotics and Automation Letters. 8 pages, 11 figures

  47. arXiv:2503.08677  [pdf, other]

    cs.CV

    OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting

    Authors: Yongsheng Yu, Ziyun Zeng, Haitian Zheng, Jiebo Luo

    Abstract: Diffusion-based generative models have revolutionized object-oriented image editing, yet their deployment in realistic object removal and insertion remains hampered by challenges such as the intricate interplay of physical effects and insufficient paired training data. In this work, we introduce OmniPaint, a unified framework that re-conceptualizes object removal and insertion as interdependent pr…

    Submitted 12 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  48. arXiv:2503.08619  [pdf, other]

    cs.CV

    LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization

    Authors: Xianfeng Wu, Yajing Bai, Haoze Zheng, Harold Haodong Chen, Yexin Liu, Zihao Wang, Xuran Ma, Wen-Jie Shu, Xianzu Wu, Harry Yang, Ser-Nam Lim

    Abstract: Recent advances in text-to-image generation have primarily relied on extensive datasets and parameter-heavy architectures. These requirements severely limit accessibility for researchers and practitioners who lack substantial computational resources. In this paper, we introduce LightGen, an efficient training paradigm for image generation models that uses knowledge distillation (KD) and Direct Prefe…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: Code: https://github.com/XianfengWu01/LightGen

  49. arXiv:2503.06998  [pdf, other]

    cs.CV

    SOYO: A Tuning-Free Approach for Video Style Morphing via Style-Adaptive Interpolation in Diffusion Models

    Authors: Haoyu Zheng, Qifan Yu, Binghe Yu, Yang Dai, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang

    Abstract: Diffusion models have achieved remarkable progress in image and video stylization. However, most existing methods focus on single-style transfer, while video stylization involving multiple styles necessitates seamless transitions between them. We refer to this smooth style transition between video frames as video style morphing. Current approaches often generate stylized video frames with disconti…

    Submitted 10 March, 2025; originally announced March 2025.

  50. arXiv:2503.06888  [pdf]

    cs.CL

    A LongFormer-Based Framework for Accurate and Efficient Medical Text Summarization

    Authors: Dan Sun, Jacky He, Hanlu Zhang, Zhen Qi, Hongye Zheng, Xiaokai Wang

    Abstract: This paper proposes a medical text summarization method based on LongFormer, aimed at addressing the challenges faced by existing models when processing long medical texts. Traditional summarization methods are often limited by short-term memory, leading to information loss or reduced summary quality in long texts. LongFormer, by introducing long-range self-attention, effectively captures long-ran…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: Paper accepted by 2025 8th International Conference on Advanced Algorithms and Control Engineering (ICAACE 2025)