
Showing 1–50 of 675 results for author: Zhang, L

Searching in archive eess.
  1. arXiv:2507.08603  [pdf, ps, other]

    cs.AI cs.SD eess.AS

    Unlocking Speech Instruction Data Potential with Query Rewriting

    Authors: Yonghua Hei, Yibo Yan, Shuliang Liu, Huiyu Zhou, Linfeng Zhang, Xuming Hu

    Abstract: End-to-end Large Speech Language Models (LSLMs) demonstrate strong potential in response latency and speech comprehension capabilities, showcasing general intelligence across speech understanding tasks. However, the ability to follow speech instructions has not been fully realized due to the lack of datasets and heavily biased training tasks. Leveraging the rich ASR datasets, previous app…

    Submitted 11 July, 2025; originally announced July 2025.

    Comments: ACL 2025 Findings

  2. arXiv:2507.06326  [pdf, ps, other]

    cs.LG cs.AI eess.SY q-bio.NC

    Sample-Efficient Reinforcement Learning Controller for Deep Brain Stimulation in Parkinson's Disease

    Authors: Harsh Ravivarapu, Gaurav Bagwe, Xiaoyong Yuan, Chunxiu Yu, Lan Zhang

    Abstract: Deep brain stimulation (DBS) is an established intervention for Parkinson's disease (PD), but conventional open-loop systems lack adaptability, are energy-inefficient due to continuous stimulation, and provide limited personalization to individual neural dynamics. Adaptive DBS (aDBS) offers a closed-loop alternative, using biomarkers such as beta-band oscillations to dynamically modulate stimulati…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: Accepted by IEEE IMC 2025

  3. arXiv:2507.03872  [pdf, ps, other]

    eess.IV cs.CV

    PLUS: Plug-and-Play Enhanced Liver Lesion Diagnosis Model on Non-Contrast CT Scans

    Authors: Jiacheng Hao, Xiaoming Zhang, Wei Liu, Xiaoli Yin, Yuan Gao, Chunli Li, Ling Zhang, Le Lu, Yu Shi, Xu Han, Ke Yan

    Abstract: Focal liver lesions (FLL) are common clinical findings during physical examination. Early diagnosis and intervention of liver malignancies are crucial to improving patient survival. Although the current 3D segmentation paradigm can accurately detect lesions, it faces limitations in distinguishing between malignant and benign liver lesions, primarily due to its inability to differentiate subtle var…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: MICCAI 2025 (Early Accepted)

  4. arXiv:2507.03315  [pdf, ps, other]

    eess.IV cs.CV

    Towards Interpretable PolSAR Image Classification: Polarimetric Scattering Mechanism Informed Concept Bottleneck and Kolmogorov-Arnold Network

    Authors: Jinqi Zhang, Fangzhou Han, Di Zhuang, Lamei Zhang, Bin Zou, Li Yuan

    Abstract: In recent years, Deep Learning (DL) based methods have received extensive and sufficient attention in the field of PolSAR image classification, which show excellent performance. However, due to the "black-box" nature of DL methods, the interpretation of the high-dimensional features extracted and the backtracking of the decision-making process based on the features are still unresolved problems.…

    Submitted 4 July, 2025; originally announced July 2025.

  5. arXiv:2507.00316  [pdf, ps, other]

    cs.LG cs.CL eess.IV

    $μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation

    Authors: Siyou Li, Pengyao Qin, Huanan Wu, Dong Nie, Arun J. Thirunavukarasu, Juntao Yu, Le Zhang

    Abstract: Automated radiology report generation (RRG) aims to produce detailed textual reports from clinical imaging, such as computed tomography (CT) scans, to improve the accuracy and efficiency of diagnosis and provision of management advice. RRG is complicated by two key challenges: (1) inherent complexity in extracting relevant information from imaging data under resource constraints, and (2) difficult…

    Submitted 1 July, 2025; v1 submitted 30 June, 2025; originally announced July 2025.

    Comments: Accepted by MICCAI 2025

  6. arXiv:2506.23701  [pdf, ps, other]

    eess.IV cs.CV

    MDPG: Multi-domain Diffusion Prior Guidance for MRI Reconstruction

    Authors: Lingtong Zhang, Mengdie Song, Xiaohan Hao, Huayu Mai, Bensheng Qiu

    Abstract: Magnetic Resonance Imaging (MRI) reconstruction is essential in medical diagnostics. As the latest generative models, diffusion models (DMs) have struggled to produce high-fidelity images due to their stochastic nature in image domains. Latent diffusion models (LDMs) yield both compact and detailed prior knowledge in latent domains, which could effectively guide the model towards more effective le…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025

  7. arXiv:2506.16210  [pdf, ps, other]

    eess.IV cs.CV

    From Coarse to Continuous: Progressive Refinement Implicit Neural Representation for Motion-Robust Anisotropic MRI Reconstruction

    Authors: Zhenxuan Zhang, Lipei Zhang, Yanqi Cheng, Zi Wang, Fanwen Wang, Haosen Zhang, Yue Yang, Yinzhe Wu, Jiahao Huang, Angelica I Aviles-Rivero, Zhifan Gao, Guang Yang, Peter J. Lally

    Abstract: In motion-robust magnetic resonance imaging (MRI), slice-to-volume reconstruction is critical for recovering anatomically consistent 3D brain volumes from 2D slices, especially under accelerated acquisitions or patient motion. However, this task remains challenging due to hierarchical structural disruptions. It includes local detail loss from k-space undersampling, global structural aliasing cause…

    Submitted 24 June, 2025; v1 submitted 19 June, 2025; originally announced June 2025.

  8. arXiv:2506.15748  [pdf, ps, other]

    eess.IV cs.CV

    Diffusion-based Counterfactual Augmentation: Towards Robust and Interpretable Knee Osteoarthritis Grading

    Authors: Zhe Wang, Yuhua Ru, Aladine Chetouani, Tina Shiang, Fang Chen, Fabian Bauer, Liping Zhang, Didier Hans, Rachid Jennane, William Ewing Palmer, Mohamed Jarraya, Yung Hsin Chen

    Abstract: Automated grading of Knee Osteoarthritis (KOA) from radiographs is challenged by significant inter-observer variability and the limited robustness of deep learning models, particularly near critical decision boundaries. To address these limitations, this paper proposes a novel framework, Diffusion-based Counterfactual Augmentation (DCA), which enhances model robustness and interpretability by gene…

    Submitted 18 June, 2025; originally announced June 2025.

  9. arXiv:2506.13624  [pdf, ps, other]

    eess.SY cs.RO

    Parallel Branch Model Predictive Control on GPUs

    Authors: Luyao Zhang, Chenghuai Lin, Sergio Grammatico

    Abstract: We present a parallel GPU-accelerated solver for branch Model Predictive Control problems. Based on iterative LQR methods, our solver exploits the tree-sparse structure and implements temporal parallelism using the parallel scan algorithm. Consequently, the proposed solver enables parallelism across both the prediction horizon and the scenarios. In addition, we utilize an augmented Lagrangian meth…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 12 pages, 9 figures

  10. arXiv:2506.12006  [pdf, ps, other]

    eess.IV cs.CV

    crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023

    Authors: Navodini Wijethilake, Reuben Dorent, Marina Ivory, Aaron Kujawa, Stefan Cornelissen, Patrick Langenhuizen, Mohamed Okasha, Anna Oviedova, Hexin Dong, Bogyeong Kang, Guillaume Sallé, Luyi Han, Ziyuan Zhao, Han Liu, Tao Yang, Shahad Hardan, Hussain Alasmawi, Santosh Sanjeev, Yuzhou Zhuang, Satoshi Kondo, Maria Baldeon Calisto, Shaikh Muhammad Uzair Noman, Cancan Chen, Ipek Oguz, Rongguo Zhang, et al. (14 additional authors not shown)

    Abstract: The cross-Modality Domain Adaptation (crossMoDA) challenge series, initiated in 2021 in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), focuses on unsupervised cross-modality segmentation, learning from contrast-enhanced T1 (ceT1) and transferring to T2 MRI. The task is an extreme example of domain shift chosen to serve as a mea…

    Submitted 24 June, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

  11. arXiv:2506.08530  [pdf, ps, other]

    eess.SY

    The Invariant Zonotopic Set-Membership Filter for State Estimation on Groups

    Authors: Tao Li, Yi Li, Lulin Zhang, Jiuxiang Dong

    Abstract: The invariant filtering theory based on the group theory has been successful in statistical filtering methods. However, there exists a class of state estimation problems with unknown statistical properties of noise disturbances, and it is worth discussing whether the invariant observer still has performance advantages. In this paper, considering the problem of state estimation with unknown but bou…

    Submitted 10 June, 2025; originally announced June 2025.

  12. arXiv:2506.07715  [pdf, ps, other]

    cs.NI eess.SY

    Delay Optimization in Remote ID-Based UAV Communication via BLE and Wi-Fi Switching

    Authors: Yian Zhu, Ziye Jia, Lei Zhang, Yao Wu, Qiuming Zhu, Qihui Wu

    Abstract: The remote identification (Remote ID) broadcast capability allows unmanned aerial vehicles (UAVs) to exchange messages, which is a pivotal technology for inter-UAV communications. Although this capability enhances the operational visibility, low delay in Remote ID-based communications is critical for ensuring the efficiency and timeliness of multi-UAV operations in dynamic environments. To address…

    Submitted 9 June, 2025; originally announced June 2025.

  13. arXiv:2506.07709  [pdf, ps, other]

    eess.IV cs.CV

    Fine-Grained Motion Compression and Selective Temporal Fusion for Neural B-Frame Video Coding

    Authors: Xihua Sheng, Peilin Chen, Meng Wang, Li Zhang, Shiqi Wang, Dapeng Oliver Wu

    Abstract: With the remarkable progress in neural P-frame video coding, neural B-frame coding has recently emerged as a critical research direction. However, most existing neural B-frame codecs directly adopt P-frame coding tools without adequately addressing the unique challenges of B-frame compression, leading to suboptimal performance. To bridge this gap, we propose novel enhancements for motion compressi…

    Submitted 9 June, 2025; originally announced June 2025.

  14. arXiv:2506.07294  [pdf, ps, other]

    cs.SD cs.CR cs.LG eess.AS

    Towards Generalized Source Tracing for Codec-Based Deepfake Speech

    Authors: Xuanjun Chen, I-Ming Lin, Lin Zhang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang

    Abstract: Recent attempts at source tracing for codec-based deepfake speech (CodecFake), generated by neural audio codec-based speech generation (CoSG) models, have exhibited suboptimal performance. However, how to train source tracing models using simulated CoSG data while maintaining strong performance on real CoSG-generated audio remains an open challenge. In this paper, we show that models trained solel…

    Submitted 9 June, 2025; v1 submitted 8 June, 2025; originally announced June 2025.

    Comments: Work in progress

  15. arXiv:2506.03133  [pdf, ps, other]

    cs.LG cs.AI eess.SP math.OC

    PoLAR: Polar-Decomposed Low-Rank Adapter Representation

    Authors: Kai Lion, Liang Zhang, Bingcong Li, Niao He

    Abstract: We show that low-rank adaptation of large-scale models suffers from a low stable rank that is well below the linear algebraic rank of the subspace, degrading fine-tuning performance. To mitigate the underutilization of the allocated subspace, we propose PoLAR, a parameterization inspired by the polar decomposition that factorizes the low-rank update into two direction matrices constrained to Stief…

    Submitted 3 June, 2025; originally announced June 2025.

  16. arXiv:2506.02958  [pdf, ps, other]

    eess.AS cs.SD

    PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing

    Authors: You Zhang, Baotong Tian, Lin Zhang, Zhiyao Duan

    Abstract: Neural speech editing enables seamless partial edits to speech utterances, allowing modifications to selected content while preserving the rest of the audio unchanged. This useful technique, however, also poses new risks of deepfakes. To encourage research on detecting such partially edited deepfake speech, we introduce PartialEdit, a deepfake speech dataset curated using advanced neural editing t…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Interspeech 2025 camera ready. Project page: https://yzyouzhang.com/PartialEdit/

  17. arXiv:2506.02197  [pdf, ps, other]

    eess.IV cs.CV

    NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution

    Authors: Marcos V. Conde, Radu Timofte, Zihao Lu, Xiangyu Kong, Xiaoxia Xing, Fan Wang, Suejin Han, MinKyu Park, Tianyu Zhang, Xin Luo, Yeda Chen, Dong Liu, Li Pang, Yuhang Yang, Hongzhong Wang, Xiangyong Cao, Ruixuan Jiang, Senyan Xu, Siyuan Jiang, Xueyang Fu, Zheng-Jun Zha, Tianyu Hao, Yuhong He, Ruoqi Li, Yueqi Yang, et al. (14 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 RAW Image Restoration and Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Restoration and Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. The goal of this challenge is two fold, (i) restore RAW images with blur and…

    Submitted 4 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: CVPR 2025 - New Trends in Image Restoration and Enhancement (NTIRE)

  18. arXiv:2506.01394  [pdf, ps, other]

    eess.IV cs.CV

    NTIRE 2025 the 2nd Restore Any Image Model (RAIM) in the Wild Challenge

    Authors: Jie Liang, Radu Timofte, Qiaosi Yi, Zhengqiang Zhang, Shuaizheng Liu, Lingchen Sun, Rongyuan Wu, Xindong Zhang, Hui Zeng, Lei Zhang

    Abstract: In this paper, we present a comprehensive overview of the NTIRE 2025 challenge on the 2nd Restore Any Image Model (RAIM) in the Wild. This challenge established a new benchmark for real-world image restoration, featuring diverse scenarios with and without reference ground truth. Participants were tasked with restoring real-captured images suffering from complex and unknown degradations, where both…

    Submitted 2 June, 2025; originally announced June 2025.

  19. arXiv:2506.01213  [pdf, ps, other]

    cs.LG eess.SP stat.ML

    On the Stability of Graph Convolutional Neural Networks: A Probabilistic Perspective

    Authors: Ning Zhang, Henry Kenlay, Li Zhang, Mihai Cucuringu, Xiaowen Dong

    Abstract: Graph convolutional neural networks (GCNNs) have emerged as powerful tools for analyzing graph-structured data, achieving remarkable success across diverse applications. However, the theoretical understanding of the stability of these models, i.e., their sensitivity to small changes in the graph structure, remains in rather limited settings, hampering the development and deployment of robust and t…

    Submitted 12 June, 2025; v1 submitted 1 June, 2025; originally announced June 2025.

  20. arXiv:2506.00885  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching

    Authors: Leying Zhang, Yao Qian, Xiaofei Wang, Manthan Thakker, Dongmei Wang, Jianwei Yu, Haibin Wu, Yuxuan Hu, Jinyu Li, Yanmin Qian, Sheng Zhao

    Abstract: Generating natural-sounding, multi-speaker dialogue is crucial for applications such as podcast creation, virtual agents, and multimedia content generation. However, existing systems struggle to maintain speaker consistency, model overlapping speech, and synthesize coherent conversations efficiently. In this paper, we introduce CoVoMix2, a fully non-autoregressive framework for zero-shot multi-tal…

    Submitted 1 June, 2025; originally announced June 2025.

  21. arXiv:2505.23207  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM

    Authors: Zhaokai Sun, Li Zhang, Qing Wang, Pan Zhou, Lei Xie

    Abstract: Overlapping Speech Detection (OSD) aims to identify regions where multiple speakers overlap in a conversation, a critical challenge in multi-party speech processing. This work proposes a speaker-aware progressive OSD model that leverages a progressive training strategy to enhance the correlation between subtasks such as voice activity detection (VAD) and overlap detection. To improve acoustic repr…

    Submitted 29 May, 2025; originally announced May 2025.

  22. arXiv:2505.20984  [pdf, ps, other]

    eess.IV cs.CV

    Generative Image Compression by Estimating Gradients of the Rate-variable Feature Distribution

    Authors: Minghao Han, Weiyi You, Jinhua Zhang, Leheng Zhang, Ce Zhu, Shuhang Gu

    Abstract: While learned image compression (LIC) focuses on efficient data transmission, generative image compression (GIC) extends this framework by integrating generative modeling to produce photo-realistic reconstructed images. In this paper, we propose a novel diffusion-based generative modeling framework tailored for generative image compression. Unlike prior diffusion-based approaches that indirectly e…

    Submitted 27 May, 2025; originally announced May 2025.

  23. arXiv:2505.19146  [pdf, ps, other]

    physics.med-ph eess.SP

    Design of a Wearable Parallel Electrical Impedance Imaging System for Healthcare

    Authors: Bowen Li, Zekun Chen, Xuefei Chen, Luhao Zhang, Shili Liang

    Abstract: A wireless wearable Electrical Impedance Tomography (EIT) system has been developed utilizing the AD5933 chip to achieve real-time imaging of lung respiration. The system employs a voltage excitation method tailored to human impedance characteristics, injecting current by applying a known voltage and measuring the resulting current through the body. Additionally, specific measures have been implem…

    Submitted 19 June, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

  24. arXiv:2505.16403  [pdf, ps, other]

    cs.LG eess.SY

    Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach

    Authors: Huazi Pan, Yanjun Zhang, Leo Yu Zhang, Scott Adams, Abbas Kouzani, Suiyang Khoo

    Abstract: Manipulation of local training data and local updates, i.e., the poisoning attack, is the main threat arising from the collaborative nature of the federated learning (FL) paradigm. Most existing poisoning attacks aim to manipulate local data/models in a way that causes denial-of-service (DoS) issues. In this paper, we introduce a novel attack method, named Federated Learning Sliding Attack (FedSA)…

    Submitted 28 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: This paper is to appear in IJCAI 2025, code available at: https://github.com/Halsey777/FedSA

  25. arXiv:2505.15320  [pdf, ps, other]

    eess.AS cs.SD

    Analysis of ABC Frontend Audio Systems for the NIST-SRE24

    Authors: Sara Barahona, Anna Silnova, Ladislav Mošner, Junyi Peng, Oldřich Plchot, Johan Rohdin, Lin Zhang, Jiangyu Han, Petr Palka, Federico Landini, Lukáš Burget, Themos Stafylakis, Sandro Cumani, Dominik Boboš, Miroslav Hlavaček, Martin Kodovsky, Tomáš Pavlíček

    Abstract: We present a comprehensive analysis of the embedding extractors (frontends) developed by the ABC team for the audio track of NIST SRE 2024. We follow the two scenarios imposed by NIST: using only a provided set of telephone recordings for training (fixed) or adding publicly available data (open condition). Under these constraints, we develop the best possible speaker embedding extractors for the p…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted at Interspeech 2025

  26. arXiv:2505.12994  [pdf, ps, other]

    cs.SD eess.AS

    Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy

    Authors: Xuanjun Chen, I-Ming Lin, Lin Zhang, Jiawei Du, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang

    Abstract: Recent advances in neural audio codec-based speech generation (CoSG) models have produced remarkably realistic audio deepfakes. We refer to deepfake speech generated by CoSG systems as codec-based deepfake, or CodecFake. Although existing anti-spoofing research on CodecFake predominantly focuses on verifying the authenticity of audio samples, almost no attention was given to tracing the CoSG used…

    Submitted 31 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  27. arXiv:2505.11843  [pdf, ps, other]

    eess.SP cs.LG

    S-Crescendo: A Nested Transformer Weaving Framework for Scalable Nonlinear System in S-Domain Representation

    Authors: Junlang Huang, Hao Chen, Li Luo, Yong Cai, Lexin Zhang, Tianhao Ma, Yitian Zhang, Zhong Guan

    Abstract: Simulation of high-order nonlinear system requires extensive computational resources, especially in modern VLSI backend design where bifurcation-induced instability and chaos-like transient behaviors pose challenges. We present S-Crescendo - a nested transformer weaving framework that synergizes S-domain with neural operators for scalable time-domain prediction in high-order nonlinear networks, al…

    Submitted 17 May, 2025; originally announced May 2025.

  28. arXiv:2505.11724  [pdf, ps, other]

    cs.CV eess.IV

    Semantically-Aware Game Image Quality Assessment

    Authors: Kai Zhu, Vignesh Edithal, Le Zhang, Ilia Blank, Imran Junejo

    Abstract: Assessing the visual quality of video game graphics presents unique challenges due to the absence of reference images and the distinct types of distortions, such as aliasing, texture blur, and geometry level of detail (LOD) issues, which differ from those in natural images or user-generated content. Existing no-reference image and video quality assessment (NR-IQA/VQA) methods fail to generalize to…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 16 pages, 12 figures

  29. arXiv:2505.09193  [pdf, ps, other]

    eess.IV cs.CV

    BiECVC: Gated Diversification of Bidirectional Contexts for Learned Video Compression

    Authors: Wei Jiang, Junru Li, Kai Zhang, Li Zhang

    Abstract: Recent forward prediction-based learned video compression (LVC) methods have achieved impressive results, even surpassing VVC reference software VTM under the Low Delay B (LDB) configuration. In contrast, learned bidirectional video compression (BVC) remains underexplored and still lags behind its forward-only counterparts. This performance gap is mainly due to the limited ability to extract diver…

    Submitted 6 July, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

    Comments: Accepted to ACMMM 2025

  30. arXiv:2505.08203  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Not that Groove: Zero-Shot Symbolic Music Editing

    Authors: Li Zhang

    Abstract: Most work in AI music generation focused on audio, which has seen limited use in the music production industry due to its rigidity. To maximize flexibility while assuming only textual instructions from producers, we are among the first to tackle symbolic music editing. We circumvent the known challenge of lack of labeled data by proving that LLMs with zero-shot prompting can effectively edit drum…

    Submitted 12 May, 2025; originally announced May 2025.

  31. arXiv:2505.06918  [pdf, other]

    eess.IV cs.CV cs.LG

    Uni-AIMS: AI-Powered Microscopy Image Analysis

    Authors: Yanhui Hong, Nan Wang, Zhiyi Xia, Haoyi Tao, Xi Fang, Yiming Li, Jiankun Wang, Peng Jin, Xiaochen Cai, Shengyu Li, Ziqi Chen, Zezhong Zhang, Guolin Ke, Linfeng Zhang

    Abstract: This paper presents a systematic solution for the intelligent recognition and automatic analysis of microscopy images. We developed a data engine that generates high-quality annotated datasets through a combination of the collection of diverse microscopy images from experiments, synthetic data generation and a human-in-the-loop annotation process. To address the unique challenges of microscopy ima…

    Submitted 11 May, 2025; originally announced May 2025.

  32. arXiv:2505.06277  [pdf, other]

    eess.SP cs.AI cs.CV cs.NI

    Terahertz Spatial Wireless Channel Modeling with Radio Radiance Field

    Authors: John Song, Lihao Zhang, Feng Ye, Haijian Sun

    Abstract: Terahertz (THz) communication is a key enabler for 6G systems, offering ultra-wide bandwidth and unprecedented data rates. However, THz signal propagation differs significantly from lower-frequency bands due to severe free space path loss, minimal diffraction and specular reflection, and prominent scattering, making conventional channel modeling and pilot-based estimation approaches inefficient. I…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: submitted to IEEE conferences

  33. arXiv:2505.03266  [pdf]

    physics.optics cs.IT eess.SP

    Rapid diagnostics of reconfigurable intelligent surfaces using space-time-coding modulation

    Authors: Yi Ning Zheng, Lei Zhang, Xiao Qing Chen, Marco Rossi, Giuseppe Castaldi, Shuo Liu, Tie Jun Cui, Vincenzo Galdi

    Abstract: Reconfigurable intelligent surfaces (RISs) have emerged as a key technology for shaping smart wireless environments in next-generation wireless communication systems. To support the large-scale deployment of RISs, a reliable and efficient diagnostic method is essential to ensure optimal performance. In this work, a robust and efficient approach for RIS diagnostics is proposed using a space-time co…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 30 pages, 6 figures, 1 table, supporting information

  34. arXiv:2505.01831  [pdf, other]

    eess.IV cs.CV

    Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement

    Authors: Haofan Wu, Yin Huang, Yuqing Wu, Qiuyu Yang, Bingfang Wang, Li Zhang, Muhammad Fahadullah Khan, Ali Zia, M. Saleh Memon, Syed Sohail Bukhari, Abdul Fattah Memon, Daizong Ji, Ya Zhang, Ghulam Mustafa, Yin Fang

    Abstract: High-quality fundus images provide essential anatomical information for clinical screening and ophthalmic disease diagnosis. Yet, due to hardware limitations, operational variability, and patient compliance, fundus images often suffer from low resolution and signal-to-noise ratio. Recent years have witnessed promising progress in fundus image enhancement. However, existing works usually focus on r…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: Under review at Neural Networks

  35. arXiv:2504.20454  [pdf]

    eess.IV cs.CV

    LymphAtlas- A Unified Multimodal Lymphoma Imaging Repository Delivering AI-Enhanced Diagnostic Insight

    Authors: Jiajun Ding, Beiyao Zhu, Xiaosheng Liu, Lishen Zhang, Zhao Liu

    Abstract: This study integrates PET metabolic information with CT anatomical structures to establish a 3D multimodal segmentation dataset for lymphoma based on whole-body FDG PET/CT examinations, which bridges the gap of the lack of standardised multimodal segmentation datasets in the field of haematological malignancies. We retrospectively collected 483 examination datasets acquired between March 2011 and…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: 17 pages, 4 figures

  36. arXiv:2504.20383  [pdf, other]

    cs.CV eess.IV

    Neural Stereo Video Compression with Hybrid Disparity Compensation

    Authors: Shiyin Jiang, Zhenghao Chen, Minghao Han, Xingyu Zhou, Leheng Zhang, Shuhang Gu

    Abstract: Disparity compensation represents the primary strategy in stereo video compression (SVC) for exploiting cross-view redundancy. These mechanisms can be broadly categorized into two types: one that employs explicit horizontal shifting, and another that utilizes an implicit cross-attention mechanism to reduce cross-view disparity redundancy. In this work, we propose a hybrid disparity compensation (H…

    Submitted 28 April, 2025; originally announced April 2025.

  37. arXiv:2504.19438  [pdf, other]

    eess.IV cs.CV

    Dual Attention Driven Lumbar Magnetic Resonance Image Feature Enhancement and Automatic Diagnosis of Herniation

    Authors: Lingrui Zhang, Liang Guo, Xiao An, Feng Lin, Binlong Zheng, Jiankun Wang, Zhirui Li

    Abstract: Lumbar disc herniation (LDH) is a common musculoskeletal disease that requires magnetic resonance imaging (MRI) for effective clinical management. However, the interpretation of MRI images heavily relies on the expertise of radiologists, leading to delayed diagnosis and high costs for training physicians. Therefore, this paper proposes an innovative automated LDH classification framework. To addre…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 9 pages, 7 figures

  38. arXiv:2504.19401  [pdf]

    physics.med-ph cs.CV cs.GR eess.IV

    Innovative Integration of 4D Cardiovascular Reconstruction and Hologram: A New Visualization Tool for Coronary Artery Bypass Grafting Planning

    Authors: Shuo Wang, Tong Ren, Nan Cheng, Li Zhang, Rong Wang

    Abstract: Background: Coronary artery bypass grafting (CABG) planning requires advanced spatial visualization and consideration of coronary artery depth, calcification, and pericardial adhesions. Objective: To develop and evaluate a dynamic cardiovascular holographic visualization tool for preoperative CABG planning. Methods: Using 4D cardiac computed tomography angiography data from 14 CABG candidates, we…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 35 pages, 9 figures

    ACM Class: J.3; I.3.8

  39. arXiv:2504.16099  [pdf, other]

    eess.SP cs.AI cs.IT

    Two-Timescale Joint Transmit and Pinching Beamforming for Pinching-Antenna Systems

    Authors: Luyuan Zhang, Xidong Mu, An Liu, Yuanwei Liu

    Abstract: Pinching antenna systems (PASS) have been proposed as a revolutionary flexible antenna technology which facilitates line-of-sight links via numerous low-cost pinching antennas with adjustable activation positions over waveguides. This letter proposes a two-timescale joint transmit and pinching beamforming design for the maximization of sum rate of a PASS-based downlink multi-user multiple input si…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 5 pages, 4 figures, letter

  40. arXiv:2504.15545  [pdf, other]

    eess.IV cs.CV

    VLM-based Prompts as the Optimal Assistant for Unpaired Histopathology Virtual Staining

    Authors: Zizhi Chen, Xinyu Zhang, Minghao Han, Yizhou Liu, Ziyun Qian, Weifeng Zhang, Xukun Zhang, Jingwei Wei, Lihua Zhang

    Abstract: In histopathology, tissue sections are typically stained using common H&E staining or special stains (MAS, PAS, PASM, etc.) to clearly visualize specific tissue structures. The rapid advancement of deep learning offers an effective solution for generating virtually stained images, significantly reducing the time and labor costs associated with traditional histochemical staining. However, a new cha…

    Submitted 21 April, 2025; originally announced April 2025.

  41. arXiv:2504.14641  [pdf, ps, other]

    cs.SE eess.SY

    HLSTester: Efficient Testing of Behavioral Discrepancies with LLMs for High-Level Synthesis

    Authors: Kangwei Xu, Bing Li, Grace Li Zhang, Ulf Schlichtmann

    Abstract: In high-level synthesis (HLS), C/C++ programs with synthesis directives are used to generate circuits for FPGA implementations. However, hardware-specific and platform-dependent characteristics in these implementations can introduce behavioral discrepancies between the original C/C++ programs and the circuits after high-level synthesis. Existing methods for testing behavioral discrepancies in HLS…

    Submitted 9 July, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2407.03889

  42. arXiv:2504.13010  [pdf, other]

    eess.SP

    Simultaneous Polysomnography and Cardiotocography Reveal Temporal Correlation Between Maternal Obstructive Sleep Apnea and Fetal Hypoxia

    Authors: Jingyu Wang, Donglin Xie, Jingying Ma, Yunliang Sun, Linyan Zhang, Rui Bai, Zelin Tu, Liyue Xu, Jun Wei, Jingjing Yang, Yanan Liu, Huijie Yi, Bing Zhou, Long Zhao, Xueli Zhang, Mengling Feng, Xiaosong Dong, Guoli Liu, Fang Han, Shenda Hong

    Abstract: Background: Obstructive sleep apnea syndrome (OSAS) during pregnancy is common and can negatively affect fetal outcomes. However, studies on the immediate effects of maternal hypoxia on fetal heart rate (FHR) changes are lacking. Methods: We used time-synchronized polysomnography (PSG) and cardiotocography (CTG) data from two cohorts to analyze the correlation between maternal hypoxia and FHR chan…

    Submitted 17 April, 2025; originally announced April 2025.

  43. arXiv:2504.10952  [pdf]

    eess.SP

    A Signal Matrix-Based Local Flaw Detection Framework for Steel Wire Ropes Using Convolutional Neural Networks

    Authors: Siyu You, Leilei Yang, Zixu Kuang, Huayi Gou, Longlong Zhang, Zhiliang Liu

    Abstract: Steel wire ropes (SWRs) are critical load-bearing components in industrial applications, yet their structural integrity is often compromised by local flaws (LFs). Magnetic Flux Leakage (MFL) is a widely used non-destructive testing method that detects defects by measuring perturbations in magnetic fields. Traditional MFL detection methods suffer from critical limitations: one-dimensional approache…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: Submitted to 2025 International Conference on Mechatronics and Automation

  44. arXiv:2504.10686  [pdf, other]

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  45. arXiv:2504.07996  [pdf, ps, other]

    eess.SP cs.LG

    Fusing Global and Local: Transformer-CNN Synergy for Next-Gen Current Estimation

    Authors: Junlang Huang, Hao Chen, Li Luo, Yong Cai, Lexin Zhang, Tianhao Ma, Yitian Zhang, Zhong Guan

    Abstract: This paper presents a hybrid model combining Transformer and CNN for predicting the current waveform in signal lines. Unlike traditional approaches such as current source models, driver linear representations, waveform functional fitting, or equivalent load capacitance methods, our model does not rely on fixed simplified models of standard-cell drivers or RC loads. Instead, it replaces the complex…

    Submitted 10 June, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

  46. arXiv:2504.07429  [pdf, other]

    eess.SP

    DS-Pnet: FM-Based Positioning via Downsampling

    Authors: Shilian Zheng, Xinjiang Qiu, Luxin Zhang, Quan Lin, Zhijin Zhao, Xiaoniu Yang

    Abstract: In this paper, we present DS-Pnet, a novel framework for FM signal-based positioning that addresses the challenges of high computational complexity and limited deployment in resource-constrained environments. Two downsampling methods, IQ signal downsampling and time-frequency representation downsampling, are proposed to reduce data dimensionality while preserving critical positioning features. By int…

    Submitted 9 April, 2025; originally announced April 2025.

  47. arXiv:2504.07427  [pdf, other]

    eess.SP

    Deep Learning-Based Wideband Spectrum Sensing with Dual-Representation Inputs and Subband Shuffling Augmentation

    Authors: Shilian Zheng, Zhihao Ye, Luxin Zhang, Keqiang Yue, Zhijin Zhao

    Abstract: The widespread adoption of mobile communication technology has led to a severe shortage of spectrum resources, driving the development of cognitive radio technologies aimed at improving spectrum utilization, with spectrum sensing being the key enabler. This paper presents a novel deep learning-based wideband spectrum sensing framework that leverages multi-taper power spectral inputs to achieve hig…

    Submitted 9 April, 2025; originally announced April 2025.

  48. arXiv:2504.07399  [pdf, other]

    eess.SP

    WK-Pnet: FM-Based Positioning via Wavelet Packet Decomposition and Knowledge Distillation

    Authors: Shilian Zheng, Quan Lin, Peihan Qi, Luxin Zhang, Xinjiang Qiu, Zhijin Zhao, Xiaoniu Yang

    Abstract: Accurate and efficient positioning in complex environments is critical for applications where traditional satellite-based systems face limitations, such as indoors or urban canyons. This paper introduces WK-Pnet, an FM-based indoor positioning framework that combines wavelet packet decomposition (WPD) and knowledge distillation. WK-Pnet leverages WPD to extract rich time-frequency features from FM…

    Submitted 9 April, 2025; originally announced April 2025.

  49. arXiv:2504.07308  [pdf, other]

    eess.IV cs.CV

    MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution

    Authors: Zhe Wang, Yuhua Ru, Aladine Chetouani, Fang Chen, Fabian Bauer, Liping Zhang, Didier Hans, Rachid Jennane, Mohamed Jarraya, Yung Hsin Chen

    Abstract: Magnetic Resonance Imaging (MRI) at lower field strengths (e.g., 3T) suffers from limited spatial resolution, making it challenging to capture fine anatomical details essential for clinical diagnosis and neuroimaging research. To overcome this limitation, we propose MoEDiff-SR, a Mixture of Experts (MoE)-guided diffusion model for region-adaptive MRI Super-Resolution (SR). Unlike conventional diff…

    Submitted 9 April, 2025; originally announced April 2025.

  50. arXiv:2504.07119  [pdf, other]

    eess.SP

    UAV-Assisted MEC for Disaster Response: Stackelberg Game-Based Resource Optimization

    Authors: Yafei Guo, Ziye Jia, Lei Zhang, Jia He, Yu Zhang, Qihui Wu

    Abstract: Unmanned aerial vehicle-assisted multi-access edge computing (UAV-MEC) technology has been widely applied in the sixth-generation era. However, due to the limitations of energy and computing resources in disaster areas, how to efficiently offload the tasks of damaged user equipments (UEs) to UAVs is a key issue. In this work, we consider a multiple UAV-MECs assisted task offloading scenario, wh…

    Submitted 26 March, 2025; originally announced April 2025.