Showing 1–50 of 124 results for author: Zhan, Z

  1. arXiv:2507.00449  [pdf, ps, other]

    cs.LG cs.CL

    Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention

    Authors: Zhihao Zhan, Jianan Zhao, Zhaocheng Zhu, Jian Tang

    Abstract: Efficient long-context modeling remains a critical challenge for natural language processing (NLP), as the time complexity of the predominant Transformer architecture scales quadratically with the sequence length. While state-space models (SSMs) offer alternative sub-quadratic solutions, they struggle to capture long-range dependencies effectively. In this work, we focus on analyzing and improving… ▽ More

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: Proceedings of the 42nd International Conference on Machine Learning, ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models, 18 pages, 9 figures

    ACM Class: I.2.7

  2. arXiv:2506.20806  [pdf, ps, other]

    cs.CR cs.AI

    Poster: Enhancing GNN Robustness for Network Intrusion Detection via Agent-based Analysis

    Authors: Zhonghao Zhan, Huichi Zhou, Hamed Haddadi

    Abstract: Graph Neural Networks (GNNs) show great promise for Network Intrusion Detection Systems (NIDS), particularly in IoT environments, but suffer performance degradation due to distribution drift and lack robustness against realistic adversarial attacks. Current robustness evaluations often rely on unrealistic synthetic perturbations and lack demonstrations on systematic analysis of different kinds of… ▽ More

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Poster accepted at the 10th IEEE European Symposium on Security and Privacy (Euro S&P 2025)

  3. arXiv:2506.18145  [pdf, ps, other]

    cs.LG cs.AI

    Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection

    Authors: Zheng Zhan, Liliang Ren, Shuohang Wang, Liyuan Liu, Yang Liu, Yeyun Gong, Yanzhi Wang, Yelong Shen

    Abstract: Linear State Space Models (SSMs) offer remarkable performance gains in efficient sequence modeling, with constant inference-time computation and memory complexity. Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations, positioning them as strong alternatives to Transformers for long sequence modeling. However, efficiently scaling the ex… ▽ More

    Submitted 22 June, 2025; originally announced June 2025.

  4. arXiv:2506.07646  [pdf, other]

    cs.CL cs.SD eess.AS

    Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation

    Authors: Rui Hu, Xiaolong Lin, Jiawang Liu, Shixi Huang, Zhenpeng Zhan

    Abstract: In this paper, we propose a method for annotating phonemic and prosodic labels on a given audio-transcript pair, aimed at constructing Japanese text-to-speech (TTS) datasets. Our approach involves fine-tuning a large-scale pre-trained automatic speech recognition (ASR) model, conditioned on ground truth transcripts, to simultaneously output phrase-level graphemes and annotation labels. To further… ▽ More

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025

  5. arXiv:2506.01285  [pdf, ps, other]

    cs.GT

    A Reliable Vertical Federated Learning Framework for Traffic State Estimation with Data Selection and Incentive Mechanisms

    Authors: Zijun Zhan, Yaxian Dong, Daniel Mawunyo Doe, Yuqing Hu, Shuai Li, Shaohua Cao, Zhu Han

    Abstract: Vertical Federated Learning (VFL)-based Traffic State Estimation (TSE) offers a promising approach for integrating vertically distributed traffic data from municipal authorities (MA) and mobility providers (MP) while safeguarding privacy. However, given the variations in MPs' data collection capabilities and the potential for MPs to underperform in data provision, we propose a reliable VFL-based T… ▽ More

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Submitted to the IEEE Transactions on Intelligent Transportation Systems

  6. arXiv:2506.00533  [pdf, ps, other]

    cs.LG cs.NE

    RsGCN: Rescaling Enhances Generalization of GCNs for Solving Scalable Traveling Salesman Problems

    Authors: Junquan Huang, Zong-Gan Chen, Yuncheng Jiang, Zhi-Hui Zhan

    Abstract: Neural traveling salesman problem (TSP) solvers face two critical challenges: poor generalization for scalable TSPs and high training costs. To address these challenges, we propose a new Rescaling Graph Convolutional Network (RsGCN). Focusing on the scale-dependent features (i.e., features varied with problem scales) related to nodes and edges that influence the sensitivity of GCNs to the problem… ▽ More

    Submitted 12 June, 2025; v1 submitted 31 May, 2025; originally announced June 2025.

  7. arXiv:2505.23864  [pdf, ps, other]

    cs.LG cs.AI

    Personalized Subgraph Federated Learning with Differentiable Auxiliary Projections

    Authors: Wei Zhuo, Zhaohuan Zhan, Ziduo Yang, Han Yu

    Abstract: Federated learning (FL) on graph-structured data typically faces non-IID challenges, particularly in scenarios where each client holds a distinct subgraph sampled from a global graph. In this paper, we introduce Federated learning with Auxiliary projections (FedAux), a personalized subgraph FL framework that learns to align, compare, and aggregate heterogeneously distributed local models without s… ▽ More

    Submitted 29 May, 2025; originally announced May 2025.

  8. arXiv:2505.23844  [pdf, ps, other]

    cs.CL

    Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation

    Authors: Zhenglun Kong, Zheng Zhan, Shiyue Hou, Yifan Gong, Xin Meng, Pengwei Sui, Peiyan Dong, Xuan Shen, Zifeng Wang, Pu Zhao, Hao Tang, Stratis Ioannidis, Yanzhi Wang

    Abstract: Large language models (LLMs) have shown remarkable promise but remain challenging to continually improve through traditional finetuning, particularly when integrating capabilities from other specialized LLMs. Popular methods like ensemble and weight merging require substantial memory and struggle to adapt to changing data environments. Recent efforts have transferred knowledge from multiple LLMs i… ▽ More

    Submitted 28 May, 2025; originally announced May 2025.

  9. arXiv:2505.19645  [pdf, ps, other]

    cs.LG cs.AI

    MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE

    Authors: Zongle Huang, Lei Zhu, Zongyuan Zhan, Ting Hu, Weikai Mao, Xianzhi Yu, Yongpan Liu, Tianyu Zhang

    Abstract: Large Language Models (LLMs) have achieved remarkable success across many applications, with Mixture of Experts (MoE) models demonstrating great potential. Compared to traditional dense models, MoEs achieve better performance with less computation. Speculative decoding (SD) is a widely used technique to accelerate LLM inference without accuracy loss, but it has been considered efficient only for d… ▽ More

    Submitted 13 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  10. arXiv:2505.13000  [pdf, ps, other]

    cs.SD eess.AS

    DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation

    Authors: Jiaqi Li, Xiaolong Lin, Zhekai Li, Shixi Huang, Yuancheng Wang, Chaoren Wang, Zhenpeng Zhan, Zhizheng Wu

    Abstract: Neural audio codecs form the foundational building blocks for language model (LM)-based speech generation. Typically, there is a trade-off between frame rate and audio quality. This study introduces a low-frame-rate, semantically enhanced codec model. Existing approaches distill semantically rich self-supervised (SSL) representations into the first-layer codec tokens. This work proposes DualCodec,… ▽ More

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted to Interspeech 2025. Github: https://github.com/jiaqili3/dualcodec

  11. arXiv:2505.06678  [pdf, other]

    cs.NI eess.SP

    Distributionally Robust Contract Theory for Edge AIGC Services in Teleoperation

    Authors: Zijun Zhan, Yaxian Dong, Daniel Mawunyo Doe, Yuqing Hu, Shuai Li, Shaohua Cao, Lei Fan, Zhu Han

    Abstract: Advanced AI-Generated Content (AIGC) technologies have injected new impetus into teleoperation, further enhancing its security and efficiency. Edge AIGC networks have been introduced to meet the stringent low-latency requirements of teleoperation. However, the inherent uncertainty of AIGC service quality and the need to incentivize AIGC service providers (ASPs) make the design of a robust incentiv… ▽ More

    Submitted 10 May, 2025; originally announced May 2025.

  12. arXiv:2505.04519  [pdf, other]

    cs.CL

    Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

    Authors: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo, Ziyang Zhang, Miao Rang, Fangcheng Liu, Naifu Zhang, Binghan Li, Yonghan Dong, Xiaojun Meng, Yasheng Wang, Dong Li, Yin Li, Dandan Tu, Can Chen, Youliang Yan, Fisher Yu, Ruiming Tang, Yunhe Wang, Botian Huang, Bo Wang, Boxiao Liu , et al. (49 additional authors not shown)

    Abstract: Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters are dominating the realm of most capable language models. However, the massive model scale poses significant challenges for the underlying software and hardware systems. In this paper, we aim to uncover a recipe to harness such scale on Ascend NPUs. The key goals are better usage of the computing r… ▽ More

    Submitted 7 May, 2025; originally announced May 2025.

  13. arXiv:2505.04089  [pdf]

    cs.NE

    A New Scope and Domain Measure Comparison Method for Global Convergence Analysis in Evolutionary Computation

    Authors: Liu-Yue Luo, Zhi-Hui Zhan, Kay Chen Tan, Jun Zhang

    Abstract: Convergence analysis is a fundamental research topic in evolutionary computation (EC). The commonly used analysis method models the EC algorithm as a homogeneous Markov chain for analysis, which is not always suitable for different EC variants, and also sometimes causes misuse and confusion due to their complex process. In this article, we categorize the existing researches on convergence analysis… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 14 pages, 8 figures

  14. arXiv:2505.03467  [pdf]

    cs.CL

    Uncertainty-Aware Large Language Models for Explainable Disease Diagnosis

    Authors: Shuang Zhou, Jiashuo Wang, Zidu Xu, Song Wang, David Brauer, Lindsay Welton, Jacob Cogan, Yuen-Hei Chung, Lei Tian, Zaifu Zhan, Yu Hou, Mingquan Lin, Genevieve B. Melton, Rui Zhang

    Abstract: Explainable disease diagnosis, which leverages patient information (e.g., signs and symptoms) and computational models to generate probable diagnoses and reasonings, offers clear clinical values. However, when clinical notes encompass insufficient evidence for a definite diagnosis, such as the absence of definitive symptoms, diagnostic uncertainty usually arises, increasing the risk of misdiagnosi… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 22 pages, 8 figures

  15. arXiv:2505.02087  [pdf, other]

    cs.AI

    Retrieval-augmented in-context learning for multimodal large language models in disease classification

    Authors: Zaifu Zhan, Shuang Zhou, Xiaoshan Zhou, Yongkang Xiao, Jun Wang, Jiawen Deng, He Zhu, Yu Hou, Rui Zhang

    Abstract: Objectives: We aim to dynamically retrieve informative demonstrations, enhancing in-context learning in multimodal large language models (MLLMs) for disease classification. Methods: We propose a Retrieval-Augmented In-Context Learning (RAICL) framework, which integrates retrieval-augmented generation (RAG) and in-context learning (ICL) to adaptively select demonstrations with similar disease pat… ▽ More

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 17 Pages, 1 figure, 7 tables

  16. arXiv:2504.20314  [pdf, other]

    cs.LG cs.AI

    Perturbation-efficient Zeroth-order Optimization for Hardware-friendly On-device Training

    Authors: Qitao Tan, Sung-En Chang, Rui Xia, Huidong Ji, Chence Yang, Ci Zhang, Jun Liu, Zheng Zhan, Zhou Zou, Yanzhi Wang, Jin Lu, Geng Yuan

    Abstract: Zeroth-order (ZO) optimization is an emerging deep neural network (DNN) training paradigm that offers computational simplicity and memory savings. However, this seemingly promising approach faces a significant and long-ignored challenge. ZO requires generating a substantial number of Gaussian random numbers, which poses significant difficulties and even makes it infeasible for hardware platforms,… ▽ More

    Submitted 28 April, 2025; originally announced April 2025.

  17. arXiv:2504.17528  [pdf, other]

    cs.LG cs.AI

    TACO: Tackling Over-correction in Federated Learning with Tailored Adaptive Correction

    Authors: Weijie Liu, Ziwei Zhan, Carlee Joe-Wong, Edith Ngai, Jingpu Duan, Deke Guo, Xu Chen, Xiaoxi Zhang

    Abstract: Non-independent and identically distributed (Non-IID) data across edge clients have long posed significant challenges to federated learning (FL) training in edge computing environments. Prior works have proposed various methods to mitigate this statistical heterogeneity. While these works can achieve good theoretical performance, in this work we provide the first investigation into a hidden over-c… ▽ More

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 11 pages, 7 figures, accepted by ICDCS 2025

    ACM Class: I.2.6

  18. arXiv:2503.19703  [pdf, other]

    cs.CV eess.IV

    High-Quality Spatial Reconstruction and Orthoimage Generation Using Efficient 2D Gaussian Splatting

    Authors: Qian Wang, Zhihao Zhan, Jialei He, Zhituo Tu, Xiang Zhu, Jie Yuan

    Abstract: Highly accurate geometric precision and dense image features characterize True Digital Orthophoto Maps (TDOMs), which are in great demand for applications such as urban planning, infrastructure management, and environmental monitoring. Traditional TDOM generation methods need sophisticated processes, such as Digital Surface Models (DSM) and occlusion detection, which are computationally expensive a… ▽ More

    Submitted 13 May, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

  19. arXiv:2503.13086  [pdf, other]

    cs.CV

    Gaussian On-the-Fly Splatting: A Progressive Framework for Robust Near Real-Time 3DGS Optimization

    Authors: Yiwei Xu, Yifei Yu, Wentian Gan, Tengfei Wang, Zongqian Zhan, Hao Cheng, Xin Wang

    Abstract: 3D Gaussian Splatting (3DGS) achieves high-fidelity rendering with fast real-time performance, but existing methods rely on offline training after full Structure-from-Motion (SfM) processing. In contrast, this work introduces On-the-Fly GS, a progressive framework enabling near real-time 3DGS optimization during image capture. As each image arrives, its pose and sparse points are updated via on-th… ▽ More

    Submitted 17 March, 2025; originally announced March 2025.

  20. arXiv:2503.12335  [pdf, other]

    cs.CV

    GS-I$^{3}$: Gaussian Splatting for Surface Reconstruction from Illumination-Inconsistent Images

    Authors: Tengfei Wang, Yongmao Hou, Zhaoning Zhang, Yiwei Xu, Zongqian Zhan, Xin Wang

    Abstract: Accurate geometric surface reconstruction, providing essential environmental information for navigation and manipulation tasks, is critical for enabling robotic self-exploration and interaction. Recently, 3D Gaussian Splatting (3DGS) has gained significant attention in the field of surface reconstruction due to its impressive geometric quality and computational efficiency. While recent relevant ad… ▽ More

    Submitted 18 March, 2025; v1 submitted 15 March, 2025; originally announced March 2025.

    Comments: This work has been submitted to the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025) for possible publication

  21. arXiv:2503.11408  [pdf, other]

    cs.LG cs.AI

    A Neural Network Architecture Based on Attention Gate Mechanism for 3D Magnetotelluric Forward Modeling

    Authors: Xin Zhong, Weiwei Ling, Kejia Pan, Pinxia Wu, Jiajing Zhang, Zhiliang Zhan, Wenbo Xiao

    Abstract: Traditional three-dimensional magnetotelluric (MT) numerical forward modeling methods, such as the finite element method (FEM) and finite volume method (FVM), suffer from high computational costs and low efficiency due to limitations in mesh refinement and computational resources. We propose a novel neural network architecture named MTAGU-Net, which integrates an attention gating mechanism for 3D… ▽ More

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 12 pages, 16 figures

  22. arXiv:2503.08161  [pdf, other]

    cs.CL cs.IR

    OASIS: Order-Augmented Strategy for Improved Code Search

    Authors: Zuchen Gao, Zizheng Zhan, Xianming Li, Erxin Yu, Haotian Zhang, Bin Chen, Yuqun Zhang, Jing Li

    Abstract: Code embeddings capture the semantic representations of code and are crucial for various code-related large language model (LLM) applications, such as code search. Previous training primarily relies on optimizing the InfoNCE loss by comparing positive natural language (NL)-code pairs with in-batch negatives. However, due to the sparse nature of code contexts, training solely by comparing the major… ▽ More

    Submitted 14 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  23. arXiv:2503.08056  [pdf, other]

    cs.CV

    DDO-IN: Dual Domains Optimization for Implicit Neural Network to Eliminate Motion Artifact in Magnetic Resonance Imaging

    Authors: Zhongyu Mai, Zewei Zhan, Hanyu Guo, Yulang Huang, Weifeng Su

    Abstract: Magnetic resonance imaging (MRI) motion artifacts can seriously affect clinical diagnostics, making it challenging to interpret images accurately. Existing methods for eliminating motion artifacts struggle to retain fine structural details and simultaneously lack the necessary vividness and sharpness. In this study, we present a novel dual-domain optimization (DDO) approach that integrates informa… ▽ More

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: 10 pages, 2 figures

  24. arXiv:2503.03355  [pdf, other]

    cs.CV cs.LG eess.IV

    Rethinking Video Super-Resolution: Towards Diffusion-Based Methods without Motion Alignment

    Authors: Zhihao Zhan, Wang Pang, Xiang Zhu, Yechao Bai

    Abstract: In this work, we rethink the approach to video super-resolution by introducing a method based on the Diffusion Posterior Sampling framework, combined with an unconditional video diffusion transformer operating in latent space. The video generation model, a diffusion transformer, functions as a space-time model. We argue that a powerful model, which learns the physics of the real world, can easily… ▽ More

    Submitted 8 May, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

  25. arXiv:2503.02053  [pdf, other]

    cs.AI cs.CL cs.CV

    EPEE: Towards Efficient and Effective Foundation Models in Biomedicine

    Authors: Zaifu Zhan, Shuang Zhou, Huixue Zhou, Zirui Liu, Rui Zhang

    Abstract: Foundation models, including language models, e.g., GPT, and vision models, e.g., CLIP, have significantly advanced numerous biomedical tasks. Despite these advancements, the high inference latency and the "overthinking" issues in model inference impair the efficiency and effectiveness of foundation models, thus limiting their application in real-time clinical settings. To address these challenges… ▽ More

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Submitted to npj Digital Medicine

  26. arXiv:2503.01202  [pdf, other]

    cs.CV cs.RO eess.IV

    A Multi-Sensor Fusion Approach for Rapid Orthoimage Generation in Large-Scale UAV Mapping

    Authors: Jialei He, Zhihao Zhan, Zhituo Tu, Xiang Zhu, Jie Yuan

    Abstract: Rapid generation of large-scale orthoimages from Unmanned Aerial Vehicles (UAVs) has been a long-standing focus of research in the field of aerial mapping. A multi-sensor UAV system, integrating the Global Positioning System (GPS), Inertial Measurement Unit (IMU), 4D millimeter-wave radar and camera, can provide an effective solution to this problem. In this paper, we utilize multi-sensor data to… ▽ More

    Submitted 4 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

  27. arXiv:2503.00624  [pdf]

    cs.CL cs.AI

    An evaluation of DeepSeek Models in Biomedical Natural Language Processing

    Authors: Zaifu Zhan, Shuang Zhou, Huixue Zhou, Jiawen Deng, Yu Hou, Jeremy Yeung, Rui Zhang

    Abstract: The advancement of Large Language Models (LLMs) has significantly impacted biomedical Natural Language Processing (NLP), enhancing tasks such as named entity recognition, relation extraction, event extraction, and text classification. In this context, the DeepSeek series of models have shown promising potential in general NLP tasks, yet their capabilities in the biomedical domain remain underexplo… ▽ More

    Submitted 1 March, 2025; originally announced March 2025.

    Comments: Plan to submit to AMIA 2025 Annual Symposium. 10 pages

  28. arXiv:2502.15954  [pdf]

    cs.CL cs.AI

    MMRAG: Multi-Mode Retrieval-Augmented Generation with Large Language Models for Biomedical In-Context Learning

    Authors: Zaifu Zhan, Jun Wang, Shuang Zhou, Jiawen Deng, Rui Zhang

    Abstract: Objective: To optimize in-context learning in biomedical natural language processing by improving example selection. Methods: We introduce a novel multi-mode retrieval-augmented generation (MMRAG) framework, which integrates four retrieval strategies: (1) Random Mode, selecting examples arbitrarily; (2) Top Mode, retrieving the most relevant examples based on similarity; (3) Diversity Mode, ensuri… ▽ More

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: Submitted to JAMIA

  29. arXiv:2502.14271  [pdf, other]

    cs.CL

    PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant

    Authors: Congrui Yin, Evan Wei, Zhongxing Zhang, Zaifu Zhan

    Abstract: In the paper, we introduce a paper reading assistant, PaperHelper, a potent tool designed to enhance the capabilities of researchers in efficiently browsing and understanding scientific literature. Utilizing the Retrieval-Augmented Generation (RAG) framework, PaperHelper effectively minimizes hallucinations commonly encountered in large language models (LLMs), optimizing the extraction of accurate… ▽ More

    Submitted 20 February, 2025; originally announced February 2025.

  30. arXiv:2502.03304  [pdf, other]

    cs.LG cs.AI cs.CL

    Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning

    Authors: Qitao Tan, Jun Liu, Zheng Zhan, Caiwei Ding, Yanzhi Wang, Jin Lu, Geng Yuan

    Abstract: Large language models (LLMs) excel across various tasks, but standard first-order (FO) fine-tuning demands considerable memory, significantly limiting real-world deployment. Recently, zeroth-order (ZO) optimization stood out as a promising memory-efficient training paradigm, avoiding backward passes and relying solely on forward passes for gradient estimation, making it attractive for resource-con… ▽ More

    Submitted 5 February, 2025; originally announced February 2025.

  31. arXiv:2501.12654  [pdf, other]

    cs.RO

    AnyNav: Visual Neuro-Symbolic Friction Learning for Off-road Navigation

    Authors: Taimeng Fu, Zitong Zhan, Zhipeng Zhao, Shaoshu Su, Xiao Lin, Ehsan Tarkesh Esfahani, Karthik Dantu, Souma Chowdhury, Chen Wang

    Abstract: Off-road navigation is essential for a wide range of applications in field robotics such as planetary exploration and disaster response. However, it remains an unresolved challenge due to the unstructured environments and inherent complexity of terrain-vehicle interactions. Traditional physics-based methods struggle to accurately model the nonlinear dynamics of these interactions, while data-drive… ▽ More

    Submitted 22 January, 2025; originally announced January 2025.

  32. arXiv:2501.12604  [pdf, other]

    eess.IV cs.CV cs.LG

    Image Motion Blur Removal in the Temporal Dimension with Video Diffusion Models

    Authors: Wang Pang, Zhihao Zhan, Xiang Zhu, Yechao Bai

    Abstract: Most motion deblurring algorithms rely on spatial-domain convolution models, which struggle with the complex, non-linear blur arising from camera shake and object motion. In contrast, we propose a novel single-image deblurring approach that treats motion blur as a temporal averaging phenomenon. Our core innovation lies in leveraging a pre-trained video diffusion transformer model to capture divers… ▽ More

    Submitted 21 January, 2025; originally announced January 2025.

  33. arXiv:2501.01677  [pdf, other]

    cs.CV

    PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings Reconstruction via Semantic-Aware Grouping

    Authors: Tengfei Wang, Xin Wang, Yongmao Hou, Yiwei Xu, Wendi Zhang, Zongqian Zhan

    Abstract: 3D Gaussian Splatting (3DGS) has emerged as a transformative method in the field of real-time novel synthesis. Based on 3DGS, recent advancements cope with large-scale scenes via spatial-based partition strategy to reduce video memory and optimization time costs. In this work, we introduce a parallel Gaussian splatting method, termed PG-SAG, which fully exploits semantic cues for both partitioning… ▽ More

    Submitted 3 January, 2025; originally announced January 2025.

  34. arXiv:2501.00879  [pdf, other]

    cs.CL

    TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation

    Authors: Huichi Zhou, Kin-Hei Lee, Zhonghao Zhan, Yue Chen, Zhenhao Li, Zhaoyang Wang, Hamed Haddadi, Emine Yilmaz

    Abstract: Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user queries. These systems, however, remain susceptible to corpus poisoning attacks, which can severely impair the performance of LLMs. To address this challenge, we propose TrustRAG, a robust framework that sy… ▽ More

    Submitted 22 May, 2025; v1 submitted 1 January, 2025; originally announced January 2025.

  35. Contrastive Conditional Alignment based on Label Shift Calibration for Imbalanced Domain Adaptation

    Authors: Xiaona Sun, Zhenyu Wu, Zhiqiang Zhan, Yang Ji

    Abstract: Many existing unsupervised domain adaptation (UDA) methods primarily focus on covariate shift, limiting their effectiveness in imbalanced domain adaptation (IDA) where both covariate shift and label shift coexist. Recent IDA methods have achieved promising results based on self-training using target pseudo labels. However, under the IDA scenarios, the classifier learned in the source domain will e… ▽ More

    Submitted 28 December, 2024; originally announced December 2024.

    Comments: accepted by ICPR 2024

  36. arXiv:2412.20062  [pdf, other]

    cs.CV

    MADiff: Text-Guided Fashion Image Editing with Mask Prediction and Attention-Enhanced Diffusion

    Authors: Zechao Zhan, Dehong Gao, Jinxia Zhang, Jiale Huang, Yang Hu, Xin Wang

    Abstract: Text-guided image editing model has achieved great success in general domain. However, directly applying these models to the fashion domain may encounter two issues: (1) Inaccurate localization of editing region; (2) Weak editing magnitude. To address these issues, the MADiff model is proposed. Specifically, to more accurately identify editing region, the MaskNet is proposed, in which the foregrou… ▽ More

    Submitted 15 January, 2025; v1 submitted 28 December, 2024; originally announced December 2024.

  37. arXiv:2412.19997  [pdf, other]

    cs.CV

    FashionFAE: Fine-grained Attributes Enhanced Fashion Vision-Language Pre-training

    Authors: Jiale Huang, Dehong Gao, Jinxia Zhang, Zechao Zhan, Yang Hu, Xin Wang

    Abstract: Large-scale Vision-Language Pre-training (VLP) has demonstrated remarkable success in the general domain. However, in the fashion domain, items are distinguished by fine-grained attributes like texture and material, which are crucial for tasks such as retrieval. Existing models often fail to leverage these fine-grained attributes from both text and image modalities. To address the above issues, we… ▽ More

    Submitted 12 January, 2025; v1 submitted 27 December, 2024; originally announced December 2024.

    Comments: 5 pages, Accepted by ICASSP2025, full paper

  38. arXiv:2412.11455  [pdf, other]

    cs.CL cs.AI

    Towards Better Multi-task Learning: A Framework for Optimizing Dataset Combinations in Large Language Models

    Authors: Zaifu Zhan, Rui Zhang

    Abstract: To efficiently select optimal dataset combinations for enhancing multi-task learning (MTL) performance in large language models, we proposed a novel framework that leverages a neural network to predict the best dataset combinations. The framework iteratively refines the selection, greatly improving efficiency, while being model-, dataset-, and domain-independent. Through experiments on 12 biomedic… ▽ More

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: 14 pages, 5 figures, 4 tables

    Journal ref: Findings of the Association for Computational Linguistics: NAACL 2025

  39. arXiv:2412.09496  [pdf, other]

    cs.RO

    iKap: Kinematics-aware Planning with Imperative Learning

    Authors: Qihang Li, Zhuoqun Chen, Haoze Zheng, Haonan He, Zitong Zhan, Shaoshu Su, Junyi Geng, Chen Wang

    Abstract: Trajectory planning in robotics aims to generate collision-free pose sequences that can be reliably executed. Recently, vision-to-planning systems have gained increasing attention for their efficiency and ability to interpret and adapt to surrounding environments. However, traditional modular systems suffer from increased latency and error propagation, while purely data-driven approaches often ove… ▽ More

    Submitted 20 March, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: 6 pages, 6 figures

  40. arXiv:2412.08948  [pdf, other]

    cs.CV cs.CL

    Mojito: Motion Trajectory and Intensity Control for Video Generation

    Authors: Xuehai He, Shuohang Wang, Jianwei Yang, Xiaoxia Wu, Yiping Wang, Kuan Wang, Zheng Zhan, Olatunji Ruwase, Yelong Shen, Xin Eric Wang

    Abstract: Recent advancements in diffusion models have shown great promise in producing high-quality video content. However, efficiently training video diffusion models capable of integrating directional guidance and controllable motion intensity remains a challenging and under-explored area. To tackle these challenges, this paper introduces Mojito, a diffusion model that incorporates both motion trajectory… ▽ More

    Submitted 5 February, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

  41. arXiv:2412.03013  [pdf]

    cs.NE

    A Performance Investigation of Multimodal Multiobjective Optimization Algorithms in Solving Two Types of Real-World Problems

    Authors: Zhiqiu Chen, Zong-Gan Chen, Yuncheng Jiang, Zhi-Hui Zhan

    Abstract: In recent years, multimodal multiobjective optimization algorithms (MMOAs) based on evolutionary computation have been widely studied. However, existing MMOAs are mainly tested on benchmark function sets such as the 2019 IEEE Congress on Evolutionary Computation test suite (CEC 2019), and their performance on real-world problems is neglected. In this paper, two types of real-world multimodal multi… ▽ More

    Submitted 3 December, 2024; originally announced December 2024.

    Comments: The 2024 International Annual Conference on Complex Systems and Intelligent Science, 6 pages

  42. arXiv:2412.01485  [pdf, other]

    cs.CV

    SerialGen: Personalized Image Generation by First Standardization Then Personalization

    Authors: Cong Xie, Han Zou, Ruiqi Yu, Yan Zhang, Zhenpeng Zhan

    Abstract: In this work, we are interested in achieving both high text controllability and whole-body appearance consistency in the generation of personalized human characters. We propose a novel framework, named SerialGen, which is a serial generation method consisting of two stages: first, a standardization stage that standardizes reference images, and then a personalized generation stage based on the stan…

    Submitted 11 May, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

  43. arXiv:2411.19594  [pdf, other]

    cs.CV

    Tortho-Gaussian: Splatting True Digital Orthophoto Maps

    Authors: Xin Wang, Wendi Zhang, Hong Xie, Haibin Ai, Qiangqiang Yuan, Zongqian Zhan

    Abstract: True Digital Orthophoto Maps (TDOMs) are essential products for digital twins and Geographic Information Systems (GIS). Traditionally, TDOM generation involves a complex set of photogrammetric processes, whose results may deteriorate due to various challenges, including inaccurate Digital Surface Models (DSMs), degenerated occlusion detection, and visual artifacts in weak texture regions and refl…

    Submitted 29 November, 2024; originally announced November 2024.

    Comments: This work has been submitted to the IEEE Transactions on Geoscience and Remote Sensing for possible publication

  44. arXiv:2411.15700  [pdf, other]

    cs.CL cs.AI cs.CE

    RAMIE: Retrieval-Augmented Multi-task Information Extraction with Large Language Models on Dietary Supplements

    Authors: Zaifu Zhan, Shuang Zhou, Mingchen Li, Rui Zhang

    Abstract: Objective: We aimed to develop an advanced multi-task large language model (LLM) framework to extract multiple types of information about dietary supplements (DS) from clinical records. Methods: We used four core DS information extraction tasks - namely, named entity recognition (NER: 2,949 clinical sentences), relation extraction (RE: 4,892 sentences), triple extraction (TE: 2…

    Submitted 23 November, 2024; originally announced November 2024.

    Journal ref: Journal of the American Medical Informatics Association, Volume 32, Issue 3, March 2025, Pages 545–554

  45. arXiv:2411.12279  [pdf, other]

    cs.CV

    HouseTune: Two-Stage Floorplan Generation with LLM Assistance

    Authors: Ziyang Zong, Guanying Chen, Zhaohuan Zhan, Fengcheng Yu, Guang Tan

    Abstract: This paper proposes a two-stage text-to-floorplan generation framework that combines the reasoning capability of Large Language Models (LLMs) with the generative power of diffusion models. In the first stage, we leverage a Chain-of-Thought (CoT) prompting strategy to guide an LLM in generating an initial layout (Layout-Init) from natural language descriptions, which ensures a user-friendly and int…

    Submitted 10 March, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

  46. arXiv:2411.02115  [pdf, other]

    cs.LG cs.DC

    FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation

    Authors: Ziwei Zhan, Wenkuan Zhao, Yuanqing Li, Weijie Liu, Xiaoxi Zhang, Chee Wei Tan, Chuan Wu, Deke Guo, Xu Chen

    Abstract: Federated learning (FL) is a collaborative machine learning approach that enables multiple clients to train models without sharing their private data. With the rise of deep learning, large-scale models have garnered significant attention due to their exceptional performance. However, a key challenge in FL is the limitation imposed by clients with constrained computational and communication resourc…

    Submitted 27 December, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: 8 pages, 5 figures, accepted by The 20th International Conference on Mobility, Sensing and Networking (MSN 2024)

  47. FedReMa: Improving Personalized Federated Learning via Leveraging the Most Relevant Clients

    Authors: Han Liang, Ziwei Zhan, Weijie Liu, Xiaoxi Zhang, Chee Wei Tan, Xu Chen

    Abstract: Federated Learning (FL) is a distributed machine learning paradigm that achieves a globally robust model through decentralized computation and periodic model synthesis, primarily focusing on the global model's accuracy over aggregated datasets of all participating clients. Personalized Federated Learning (PFL) instead tailors exclusive models for each client, aiming to enhance the accuracy of clie…

    Submitted 26 November, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: 8 pages, 4 figures, accepted by European Conference on Artificial Intelligence (2024 ECAI)

    Journal ref: In ECAI 2024 (pp. 2090-2097). IOS Press (2024)

  48. arXiv:2411.01171  [pdf, other]

    cs.CV cs.AI

    Fast and Memory-Efficient Video Diffusion Using Streamlined Inference

    Authors: Zheng Zhan, Yushu Wu, Yifan Gong, Zichong Meng, Zhenglun Kong, Changdi Yang, Geng Yuan, Pu Zhao, Wei Niu, Yanzhi Wang

    Abstract: The rapid progress in artificial intelligence-generated content (AIGC), especially with diffusion models, has significantly advanced the development of high-quality video generation. However, current video diffusion models exhibit demanding computational requirements and high peak memory usage, especially for generating longer and higher-resolution videos. These limitations greatly hinder the practica…

    Submitted 2 November, 2024; originally announced November 2024.

    Comments: Accepted to NeurIPS 2024

  49. arXiv:2410.16663  [pdf, other]

    cs.LG

    FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs

    Authors: Haoran Lin, Xianzhi Yu, Kang Zhao, Lu Hou, Zongyuan Zhan, Stanislav Kamenev, Han Bao, Ting Hu, Mingkai Wang, Qixin Chang, Siyue Sui, Weihao Sun, Jiaxin Hu, Jun Yao, Zekun Yin, Cheng Qian, Ying Zhang, Yinfei Pan, Yu Yang, Weiguo Liu

    Abstract: The FlashAttention series has been widely applied in the inference of large language models (LLMs). However, the FlashAttention series only supports high-end GPU architectures, e.g., Ampere and Hopper. At present, the FlashAttention series is not easily transferable to NPUs and low-resource GPUs. Moreover, the FlashAttention series is inefficient for multi-NPU or multi-GPU inference scenarios. In this work, w…

    Submitted 21 October, 2024; originally announced October 2024.

  50. arXiv:2410.14725  [pdf, other]

    cs.LG cs.CL

    Rethinking Token Reduction for State Space Models

    Authors: Zheng Zhan, Yushu Wu, Zhenglun Kong, Changdi Yang, Yifan Gong, Xuan Shen, Xue Lin, Pu Zhao, Yanzhi Wang

    Abstract: Recent advancements in State Space Models (SSMs) have attracted significant interest, particularly in models optimized for parallel training and handling long-range dependencies. Architectures like Mamba have scaled to billions of parameters with selective SSM. To facilitate broader applications using Mamba, exploring its efficiency is crucial. While token reduction techniques offer a straightforw…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: EMNLP 2024